> # VM_ARGS="-Xmx100G" bin/riot --out=RDF/XML
It's JVM_ARGS, but it's unlikely to help.
I tried 50G (it's not a good idea to ask for more than the physical size
of the machine).
gz/non-gz is not a factor.
Titanium is in the step of reading the JSON document.
The parsing code hasn't got to the JSON-LD processing. Titanium is
reading in the file and creating a JSON structure.
It is the nature of JSON that the whole file has to be read for robust
processing. Streaming processing of JSON is done by relaxing that
robustness (duplicate keys, @context the last in the object, etc.).
If there is a way to stream process, then we're missing it but I don't
see that capability in Titanium.
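To illustrate the duplicate-key point only (a small Python sketch, not how Titanium or Jena work internally): whether a JSON object repeats a key can only be decided once the whole object has been read, so a parser that hands data on early has to give up that check. The helper name reject_duplicates and the example IRIs are made up for this illustration.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key.

    Hypothetical helper for this example; it is called with all
    key/value pairs of an object once that object is complete.
    """
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

doc = '{"@id": "http://example/a", "@id": "http://example/b"}'

# Default behaviour: the last value silently wins.
lenient = json.loads(doc)

# Strict behaviour: the complete object is inspected before it is accepted.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

A streaming reader that emits the first "@id" as soon as it sees it has no way to take that back when the second one turns up.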
Ideally, there would be a TTL file.
As there isn't, the file authorities-gnd_entityfacts.jsonld.gz has a
structure you can exploit.
Each line is an entry in a JSON array and it is complete: it has the
@context on each line.
There are 9 million lines.
A possibility is to split the file on newlines, then clean up each line:
* Drop the last line (the "]").
* Remove the first character of each line (which is ",", or "[" on
the first line).
Then parse each line.
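A minimal sketch of that split-and-clean step, in Python. It assumes the layout described above: one complete entry per line, a leading "[" or "," on each entry line, and a final line holding only "]". The function name iter_entries is invented for the example, and I have not run it against the real 9 million line file.

```python
import gzip

def iter_entries(path):
    """Yield one cleaned JSON-LD document (as text) per entry line.

    Reads line by line, so memory use is bounded by the longest line
    rather than by the size of the whole file.
    """
    # Accept both the .gz file and the uncompressed file.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line or line == "]":
                continue                 # skip blank lines and the closing "]"
            if line[0] in "[,":
                line = line[1:].strip()  # drop the leading "[" or ","
            if line:
                yield line
```

Each yielded string is a standalone JSON-LD document with its own @context, so it can be handed to a JSON-LD parser on its own, or written out in batches to smaller files that are then converted one at a time.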
Andy
On 05/07/2024 11:37, Sorin Gheorghiu wrote:
I always get an OutOfMemoryError, no matter how much RAM is available, and
it fails whether the jsonld file is archived or not.
I am attempting to convert this file
https://data.dnb.de/opendata/authorities-gnd_entityfacts.jsonld.gz with
Jena 4.10.0 or 5.0.0.
Here are my tests:
# free -m
              total        used        free      shared  buff/cache   available
Mem:         128821       24038      100091           1        4691      103714
Swap:          1903           0        1903
# JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"; export JAVA_HOME
# cd /opt/apache-jena-4.10.0
# VM_ARGS="-Xmx100G" bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld.gz > authorities-gnd_entityfacts.rdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
    at java.base/java.util.HashMap.putVal(HashMap.java:627)
    at java.base/java.util.HashMap.put(HashMap.java:608)
    at org.glassfish.json.JsonObjectBuilderImpl.putValueMap(JsonObjectBuilderImpl.java:209)
    at org.glassfish.json.JsonObjectBuilderImpl.add(JsonObjectBuilderImpl.java:81)
    at org.glassfish.json.JsonParserImpl.getObject(JsonParserImpl.java:334)
    at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:175)
    at org.glassfish.json.JsonParserImpl.getArray(JsonParserImpl.java:321)
    at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:173)
    at com.apicatalog.jsonld.document.JsonDocument.doParse(JsonDocument.java:163)
    at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:112)
    at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:90)
    at org.apache.jena.riot.lang.LangJSONLD11.read(LangJSONLD11.java:73)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:416)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:385)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:360)
    at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:384)
    at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:331)
    at riotcmd.CmdLangParse.exec$(CmdLangParse.java:229)
    at riotcmd.CmdLangParse.exec(CmdLangParse.java:169)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
    at riotcmd.riot.main(riot.java:29)
# VM_ARGS="-Xmx100G" bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld > authorities-gnd_entityfacts.rdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
# JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"; export JAVA_HOME
# cd /opt/apache-jena-5.0.0
# VM_ARGS="-Xmx100G" bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld.gz > authorities-gnd_entityfacts.rdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
# VM_ARGS="-Xmx100G" bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld > authorities-gnd_entityfacts.rdf
09:42:56 INFO riot :: File: /var/tmp/authorities-gnd_entityfacts.jsonld
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.lang.StringUTF16.compress(StringUTF16.java:161)
    at java.base/java.lang.String.<init>(String.java:4501)
    at java.base/java.lang.String.<init>(String.java:300)
    at org.glassfish.json.JsonTokenizer.getValue(JsonTokenizer.java:510)
    at org.glassfish.json.JsonParserImpl.getString(JsonParserImpl.java:101)
    at org.glassfish.json.JsonParserImpl.getObject(JsonParserImpl.java:332)
    at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:175)
    at org.glassfish.json.JsonParserImpl.getArray(JsonParserImpl.java:321)
    at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:173)
    at com.apicatalog.jsonld.document.JsonDocument.doParse(JsonDocument.java:163)
    at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:112)
    at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:90)
    at org.apache.jena.riot.lang.LangJSONLD11.read(LangJSONLD11.java:73)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:444)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:413)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:375)
    at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:391)
    at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:337)
    at riotcmd.CmdLangParse.exec$(CmdLangParse.java:234)
    at riotcmd.CmdLangParse.exec(CmdLangParse.java:174)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
    at riotcmd.riot.main(riot.java:29)
Regards
Sorin
On 05.07.2024 at 00:35, Andy Seaborne wrote:
On 03/07/2024 10:22, Sorin Gheorghiu wrote:
Greetings,
Here is my attempt to convert a large file from JSON-LD to RDF format.
Does the riot tool support archived files?
Yes.
$ riot --out=RDF/XML filein.jsonld.gz > fileout.rdf
That should work (Jena 5.0.0)
What happened?
Best regards
Sorin