> # VM_ARGS="-Xmx100G"  bin/riot --out=RDF/XML

It's JVM_ARGS, but it's unlikely to help.

I tried 50G (it's not a good idea to ask for more than the physical memory of the machine).

gz/non-gz is not a factor.

Titanium is in the step of reading the JSON document.

The parsing code hasn't got to the JSON-LD processing. Titanium is reading in the file and creating a JSON structure.

It is the nature of JSON that the whole file has to be read for robust processing. Streaming processing of JSON is done by relaxing that robustness (duplicate keys, @context being the last entry in the object, etc.). If there is a way to stream process, then we're missing it, but I don't see that capability in Titanium.
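To illustrate the duplicate-key point, here is a minimal Python sketch (standard library only, nothing to do with Titanium or Jena; the helper name reject_duplicate_keys is made up for the example). It shows that a strict duplicate check can only be made once the whole object has been read, which is the kind of check a streaming reader has to give up.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"name": "first", "name": "second"}'

# The default parser silently keeps the last value for a repeated key.
lenient = json.loads(doc)

# The hook receives all pairs of an object at once, so the strict check
# only runs after the closing "}" of that object has been read.
try:
    json.loads(doc, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as err:
    strict_error = str(err)

print(lenient)
print(strict_error)
```

A streaming reader that had already emitted output for the first "name" could not take it back when the second one arrived.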

Ideally, there would be a TTL file.

As there isn't, the file authorities-gnd_entityfacts.jsonld.gz has a structure you can exploit.

Each line is an entry in a JSON array and it is complete: it has the @context on each line.

There are 9 million lines.

One possibility is to split the file on newlines and
clean up each line:
* Drop the last line (the "]").
* Remove the first character of each line (which is ",", or "[" on the first line).

Parse each line.
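The clean-up steps above could be sketched in Python roughly as follows. This is only a sketch under the layout described in this message (one array entry per line, a leading "[" or "," on each entry, a final "]" line); I have not run it against the real file, and the names iter_entries and split_file are made up for the example.

```python
import gzip
import json
import sys
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object string per array entry (one entry per line)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the closing "]" of the array.
        if not line or line == "]":
            continue
        # Drop the leading "[" (first line) or "," (every later line).
        if line[0] in "[,":
            line = line[1:].lstrip()
        if line:
            yield line


def split_file(src_path: str, dst_path: str) -> int:
    """Write one standalone JSON document per line; return the entry count."""
    count = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for entry in iter_entries(src):
            # Sanity check: each cleaned line must be a complete JSON value.
            json.loads(entry)
            dst.write(entry + "\n")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    print(split_file(sys.argv[1], sys.argv[2]))
```

Only one line is held in memory at a time, so the heap size stops being the limit. Each output line is then a small JSON-LD document in its own right (it carries its own @context) and can be parsed separately.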

    Andy

On 05/07/2024 11:37, Sorin Gheorghiu wrote:
I always get an OutOfMemoryError, no matter how much RAM is available, and it fails whether the jsonld file is archived or not.

I am trying to convert this file https://data.dnb.de/opendata/authorities-gnd_entityfacts.jsonld.gz with Jena 4.10.0 or 5.0.0.

Here are my tests:

# free -m
               gesamt      belegt        frei   gemeinsam    Zwischen   verfügbar
Speicher:      128821       24038      100091           1        4691      103714
Auslager:        1903           0        1903

# JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"; export JAVA_HOME

# cd /opt/apache-jena-4.10.0

# VM_ARGS="-Xmx100G"  bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld.gz > authorities-gnd_entityfacts.rdf

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
         at java.base/java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
          at java.base/java.util.HashMap.putVal(HashMap.java:627)
          at java.base/java.util.HashMap.put(HashMap.java:608)
          at org.glassfish.json.JsonObjectBuilderImpl.putValueMap(JsonObjectBuilderImpl.java:209)
          at org.glassfish.json.JsonObjectBuilderImpl.add(JsonObjectBuilderImpl.java:81)
          at org.glassfish.json.JsonParserImpl.getObject(JsonParserImpl.java:334)
          at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:175)
          at org.glassfish.json.JsonParserImpl.getArray(JsonParserImpl.java:321)
          at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:173)
          at com.apicatalog.jsonld.document.JsonDocument.doParse(JsonDocument.java:163)
          at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:112)
          at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:90)
          at org.apache.jena.riot.lang.LangJSONLD11.read(LangJSONLD11.java:73)
          at org.apache.jena.riot.RDFParser.read(RDFParser.java:416)
          at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:385)
          at org.apache.jena.riot.RDFParser.parse(RDFParser.java:360)
          at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:384)
          at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:331)
          at riotcmd.CmdLangParse.exec$(CmdLangParse.java:229)
          at riotcmd.CmdLangParse.exec(CmdLangParse.java:169)
          at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
          at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
          at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
          at riotcmd.riot.main(riot.java:29)

# VM_ARGS="-Xmx100G"  bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld > authorities-gnd_entityfacts.rdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects

# JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"; export JAVA_HOME

# cd /opt/apache-jena-5.0.0

# VM_ARGS="-Xmx100G"  bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld.gz > authorities-gnd_entityfacts.rdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects

# VM_ARGS="-Xmx100G"  bin/riot --out=RDF/XML /var/tmp/authorities-gnd_entityfacts.jsonld > authorities-gnd_entityfacts.rdf
09:42:56 INFO  riot            :: File: /var/tmp/authorities-gnd_entityfacts.jsonld
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
          at java.base/java.lang.StringUTF16.compress(StringUTF16.java:161)
          at java.base/java.lang.String.<init>(String.java:4501)
          at java.base/java.lang.String.<init>(String.java:300)
          at org.glassfish.json.JsonTokenizer.getValue(JsonTokenizer.java:510)
          at org.glassfish.json.JsonParserImpl.getString(JsonParserImpl.java:101)
          at org.glassfish.json.JsonParserImpl.getObject(JsonParserImpl.java:332)
          at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:175)
          at org.glassfish.json.JsonParserImpl.getArray(JsonParserImpl.java:321)
          at org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:173)
          at com.apicatalog.jsonld.document.JsonDocument.doParse(JsonDocument.java:163)
          at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:112)
          at com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:90)
          at org.apache.jena.riot.lang.LangJSONLD11.read(LangJSONLD11.java:73)
          at org.apache.jena.riot.RDFParser.read(RDFParser.java:444)
          at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:413)
          at org.apache.jena.riot.RDFParser.parse(RDFParser.java:375)
          at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:391)
          at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:337)
          at riotcmd.CmdLangParse.exec$(CmdLangParse.java:234)
          at riotcmd.CmdLangParse.exec(CmdLangParse.java:174)
          at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
          at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
          at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
          at riotcmd.riot.main(riot.java:29)

Regards
Sorin

On 05/07/2024 00:35, Andy Seaborne wrote:


On 03/07/2024 10:22, Sorin Gheorghiu wrote:
Greetings,

Here is my attempt to convert a large file from JSON-LD to RDF format. Does the riot tool support archived files?

Yes.


$ riot --out=RDF/XML filein.jsonld.gz > fileout.rdf

That should work (Jena 5.0.0)

What happened?


Best regards
Sorin
