Thanks Johannes for starting this thread. I am facing the exact same problem with tdb2: any significantly large file takes forever to load. I hope this problem has a solution. Thank you. -Ahmed
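For reference, a sketch of the kind of load command being discussed (paths, dataset location, and heap size are placeholders, and this assumes Jena's bin/ scripts are on PATH; `--loader=parallel` selects TDB2's parallel bulk loader):

```shell
# Sketch of a TDB2 bulk load; locations and heap size are placeholders.
# The bulk loader is largely I/O-bound, so a very large heap may not help;
# --loader=parallel trades more CPU and memory for throughput.
export JVM_ARGS="-Xmx8G"
tdb2.tdbloader --loader=parallel --loc /data/wikidata-tdb2 latest-all.ttl.bz2
```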
On Mon, Jun 8, 2020 at 11:55 AM Hoffart, Johannes <[email protected]> wrote:
> Hi,
>
> I want to load the full Wikidata dump, available at
> https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2,
> to use in Jena.
>
> I tried it using the tdb2.tdbloader with $JVM_ARGS set to -Xmx120G.
> Initially, the progress (measured by dataset size) is quick. It slows
> down very much after a couple of hundred GB written, and finally, at
> around 500GB, the progress almost halts.
>
> Did anyone ingest Wikidata into Jena before? What are the system
> requirements? Is there a specific tdb2.tdbloader configuration that
> would speed things up? For example, building an index after data ingest?
>
> Thanks
> Johannes
>
> Johannes Hoffart, Executive Director, Technology Division
> Goldman Sachs Bank Europe SE | Marienturm | Taunusanlage 9-10 | D-60329 Frankfurt am Main
> Email: [email protected] | Tel: +49 (0)69 7532 3558
> Vorstand: Dr. Wolfgang Fink (Vorsitzender) | Thomas Degn-Petersen | Dr. Matthias Bock
> Vorsitzender des Aufsichtsrats: Dermot McDonogh
> Sitz: Frankfurt am Main | Amtsgericht Frankfurt am Main HRB 114190
