You're dealing with two formats that both require context to be processed. In other words, they build up state in the heap as they are parsed or written. Ideally, you could switch to a streamable format like N-Triples, but if that is not an option, there are a couple of things you could try. You could go Turtle -> N-Triples, then N-Triples -> JSON-LD. This might work a bit better because you don't have to hold the state for _both_ contextual formats in the heap at the same time. I don't know how you intend to use the JSON-LD, but if that use can cope with multiple files, you might try splitting the file and processing it in pieces.
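To make the two-step route concrete, here is a rough sketch of how it might look from the shell. It is untested against a file of your size. The function names, file names, and piece size are made up for illustration, and the `NT` and `JSON-LD` format names are as I recall them, so check them against `riot --help`. One caveat I would expect: blank nodes that end up in different pieces will no longer be recognised as the same node across the resulting JSON-LD files.

```shell
#!/bin/sh
# Sketch of the two-step route: Turtle -> N-Triples, split, then each
# piece -> JSON-LD. Function and file names are illustrative, not part of Jena.

# Step 1: write the Turtle file out as N-Triples.
to_ntriples() {
    # $1 = input Turtle file, $2 = output N-Triples file
    riot --syntax=TURTLE --output=NT "$1" > "$2"
}

# Step 2: cut the N-Triples file into pieces of at most $2 lines each.
# N-Triples holds one complete triple per line, so a plain line split
# leaves every piece a valid N-Triples file.
split_nt() {
    # $1 = N-Triples file, $2 = lines per piece, $3 = prefix for piece names
    split -l "$2" "$1" "$3"
}

# Step 3: convert each piece to its own JSON-LD file.
pieces_to_jsonld() {
    # $1 = the prefix that was passed to split_nt
    for piece in "$1"*; do
        case "$piece" in
            *.jsonld) continue ;;   # skip output left by an earlier run
        esac
        # The pieces have no file extension, so the syntax is given explicitly.
        riot --syntax=NT --output=JSON-LD "$piece" > "$piece.jsonld"
    done
}

# Example use (file names and piece size are made up):
#   to_ntriples large_file.ttl large_file.nt
#   split_nt large_file.nt 1000000 piece_
#   pieces_to_jsonld piece_
```

Only the last step has to hold JSON-LD state in the heap, and then only for one piece at a time, so the piece size is the knob to turn if it still runs out of memory.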
Can you tell us a bit more about your use case? There might be another approach someone can recommend.

ajs6f

> On Jul 19, 2019, at 8:18 AM, Ankit Dangi <[email protected]> wrote:
>
> Hi,
>
> I am using Apache Jena 3.12.0 with OpenJDK version 1.8.0_212 on a 64-Bit
> Ubuntu 18.04.2 LTS (bionic) server with no changes to any default
> configurations.
>
> I have a 3.2G sized-Turtle (.ttl) RDF file that has ~25M triples that I'd
> like to convert to a JSON-LD representation. I first looked at jena.rdfcat
> which suggested I should be using 'riot' instead. I then tried
> riotcmd.turtle with 2 different GCs with up to 40G max-heap size but in
> about 12 mins it ran into a "java.lang.OutOfMemoryError: Java heap space"
> (stack trace at the end).
>
> $ cd apache-jena-3.12.0/bin
>
> FAILED-1: $ riotcmd.turtle --time --verbose --syntax=TURTLE
>> --output=JSON-LD large_file.ttl -Xmx40G -XX:+OptimizeStringConcat
>> -XX:+UseG1GC -XX:+UseStringDeduplication
>> -XX:+PrintStringDeduplicationStatistics
>> -Dlog4j.configuration=file:~/apache-jena-3.12.0/jena-log4j.properties
>
> FAILED-2: $ riotcmd.turtle --time --verbose --syntax=TURTLE
>> --output=JSON-LD large_file.ttl -Xmx40G -XX:+OptimizeStringConcat
>> -XX:+UseConcMarkSweepGC
>> -Dlog4j.configuration=file:~/apache-jena-3.12.0/jena-log4j.properties
>
> Question: I believe I may be missing some parameters or configurations that
> I could fine-tune. Any suggestions on what could I try? If not, are there
> any alternate mechanisms by which I could convert the large TTL to a
> JSON-LD?
>
> Stack trace follows below:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> at java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
>> at java.util.HashMap.putVal(HashMap.java:631)
>> at java.util.HashMap.put(HashMap.java:612)
>> at com.github.jsonldjava.core.RDFDataset$IRI.<init>(RDFDataset.java:317)
>> at com.github.jsonldjava.core.RDFDataset$Quad.<init>(RDFDataset.java:52)
>> at com.github.jsonldjava.core.RDFDataset.addQuad(RDFDataset.java:540)
>> at org.apache.jena.riot.writer.JenaRDF2JSONLD.parse(JenaRDF2JSONLD.java:85)
>> at org.apache.jena.riot.writer.JsonLDWriter.toJsonLDJavaAPI(JsonLDWriter.java:205)
>> at org.apache.jena.riot.writer.JsonLDWriter.serialize(JsonLDWriter.java:178)
>> at org.apache.jena.riot.writer.JsonLDWriter.write(JsonLDWriter.java:139)
>> at org.apache.jena.riot.writer.JsonLDWriter.write(JsonLDWriter.java:145)
>> at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:207)
>> at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:165)
>> at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:112)
>> at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:178)
>> at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1277)
>> at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1162)
>> at riotcmd.CmdLangParse$1.postParse(CmdLangParse.java:334)
>> at riotcmd.CmdLangParse.exec$(CmdLangParse.java:170)
>> at riotcmd.CmdLangParse.exec(CmdLangParse.java:128)
>> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>> at riotcmd.turtle.main(turtle.java:30)
>
> --
> Ankit Dangi
