> I've achieved concurrent 120K on the server hardware but it depends on the
input.

Good to see that it can go faster. I do understand that this metric is 
dependent on input, but it still looks rather slow considering that datasets 
keep growing. At this (constant) rate, Wikidata would still take at least 12-13 
hours.
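For a quick sanity check on that estimate, here's the back-of-envelope arithmetic; the ~5 billion triple figure for a full Wikidata dump is my assumption and varies by dump date:

```python
# Back-of-envelope load-time estimate at a constant ingest rate.
# The triple count (~5B) is an assumption; the real number depends
# on which Wikidata dump you grab.
triples = 5_000_000_000
rate = 120_000                 # triples/second, the best rate reported above
hours = triples / rate / 3600
print(f"{hours:.1f} hours")    # -> 11.6 hours
```

At 120K/s that's about 11.6 hours; at 100K/s it's already ~13.9, which is where the 12-13 hour estimate comes from.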

> What the server hardware does do is allow me to run multiple processes and 
> average 60K.

tdb2.tdbloader is single-threaded though, so I don't see how multiple cores are 
going to help.

> We tend towards running multiple TDB's and present them as one, a legacy of
> overcoming the one writer in TDB1.

One graph per TDB store?

> On the minefield subject of hardware, do you have DDR3 or DDR4?

DDR3 1600MHz

> What
> chipset is driving it because Haswell’s dual-channel memory controller is
> going to have a hard time keeping up with the quad-channel memory
> controllers on Ivy Bridge-E and Haswell-E

Haswell, dual-channel I think.

> What files are you trying to import and i'll run them through?

The 1.1GB that I mentioned contains data that I can't make public on the 
Internet, but you can try with the Wikidata dump 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
You probably don't have to convert all of it; just after starting the 
conversion you should already see how many triples per second it's handling. I 
ran this command: 
`./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`.
If it goes any faster than 70K AVG triples/second, I'd be interested to know 
what hardware components you've got.
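If you'd rather not abort a full load, one option is to benchmark on a prefix of the dump. A sketch only (file names are assumptions); note that `head` may truncate the last statement, so the loader can error out at the very end, but the triples/second it reports before that is still usable:

```shell
# Take a 1M-line prefix of the dump as a benchmark sample.
# NOTE: head can cut the final statement mid-triple, so the loader may
# fail at the tail; the throughput reported up to that point is still
# a meaningful number.
zcat latest-all.ttl.gz | head -n 1000000 > sample.ttl
./tdb2.tdbloader --loc sample-db --verbose sample.ttl
```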
