I've had loads take over 24 hours and produce 350GB TDB1 instances... You can run multiple loaders into separate instances, and on sufficient kit they don't slow down. As background, I convert CAD files to triples or quads, typically 100M but some can be 500M. That's triples output, not input file size. OK with the data, I have that somewhere and will run it through, hopefully tonight if paid work doesn't get in the way ;-)
Dick

-------- Original message --------
From: Laura Morales <[email protected]>
Date: 28/11/2017 18:34 (GMT+00:00)
To: [email protected]
Cc: [email protected]
Subject: Re: tdb2.tdbloader performance

> I've achieved concurrent 120K on the server hardware but it depends on the input.

Good to see that it can go faster. I do understand that this metric is dependent on input, but it still looks rather slow considering that datasets keep growing. At this (constant) rate, Wikidata would still take at least 12-13 hours.

> What the server hardware does do is allow me to run multiple processes and average 60K.

tdb2.tdbloader is single threaded though, I don't know how multiple cores are going to help.

> We tend towards running multiple TDB's and present them as one, a legacy of overcoming the one writer in TDB1.

One graph per TDB store?

> On the minefield subject of hardware, do you have DDR3 or DDR4?

DDR3 1600MHz

> What chipset is driving it because Haswell’s dual-channel memory controller is going to have a hard time keeping up with the quad-channel memory controllers on Ivy Bridge-E and Haswell-E

Haswell, dual-channel I think.

> What files are you trying to import and I'll run them through?

The 1.1GB that I mentioned contains data that I can't make public on the Internet, but you can try with the Wikidata dump https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz

You probably don't have to convert all of it. Just starting the conversion you should already see how many triples it's handling. I ran this command: `./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`. If it goes any faster than 70K AVG triples/second, I'd be interested to know what hardware components you've got.
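
For anyone wanting to sanity-check the rates quoted in this thread, here is a rough back-of-the-envelope sketch. It assumes a constant average rate, which real loads will not hold. The function names `load_hours` and `triples_in` are made up for illustration and are not part of Jena. The 500M figure is the larger CAD dataset size mentioned above. The 12.5 hours is just the midpoint of the "12-13 hours" estimate, read against the 120K figure, which is only one possible reading.

```python
def load_hours(triples: int, triples_per_second: float) -> float:
    """Hours needed to load `triples` at a constant average rate."""
    if triples_per_second <= 0:
        raise ValueError("rate must be positive")
    return triples / triples_per_second / 3600


def triples_in(hours: float, triples_per_second: float) -> float:
    """Triples loaded in `hours` at a constant average rate."""
    return hours * 3600 * triples_per_second


if __name__ == "__main__":
    # Rates mentioned in the thread: 60K (multi-process average),
    # 70K (single tdb2.tdbloader run), 120K (server hardware peak).
    for rate in (60_000, 70_000, 120_000):
        hours = load_hours(500_000_000, rate)
        print(f"500M triples at {rate:,}/s: {hours:.1f} h")

    # Dataset size implied by a 12.5 h load at 120K triples/second.
    print(f"12.5 h at 120,000/s: {triples_in(12.5, 120_000):.2e} triples")
```

The same arithmetic shows why running several loaders into separate instances matters: the per-process rate may be lower, but the wall-clock time is divided by the number of loaders that can run without slowing each other down.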
