SSD. The first phase ran at 50-90k triples per second until 3B triples, where it started dropping from 50k to 20k per second (took 3 days). The SPO => SPO->POS, SPO->OSP phase ran at 25-50k triples per second until 1B triples, where it dropped from 25k to 4k per second; it is currently at 3.7B triples.

On Sun, 12 Sept 2021 at 04:59, Laura Morales <[email protected]> wrote:

> Just a personal curiosity... are you building it on a SSD or HDD? What is
> your "triples loaded per second" rate?
>
>
> > Sent: Sunday, September 12, 2021 at 2:39 AM
> > From: "Cristóbal Miranda" <[email protected]>
> > To: [email protected]
> > Subject: Faster TDB2 build?
> >
> > Hi,
> >
> > I'm running tdb2.tdbloader on Wikidata, but it's
> > taking too long, now it's on day 11 and still indexing,
> > whereas tdbloader2 (for TDB) didn't take as much for me.
> > I was wondering if something could be done to allow
> > more space on RAM for the build phase in order to be faster,
> > for example passing a memory budget parameter to the
> > loader. Not sure exactly how the extra RAM space would be
> > used, but I was thinking that maybe if more b+tree blocks
> > were kept in RAM this processing would be faster, for
> > example keeping 2 upper levels of the tree in primary memory,
> > or even everything in there if the given budget allowed it.
> >
> > What would it take to implement such a feature? maybe in a
> > tdb2.tdbloader2? I was looking at the code for a way to do something
> > but couldn't find an easy modification to achieve this.
> >
>
