Absolutely it does, preferably an NVMe SSD. The tdbloaders are almost a showcase in themselves for good up-to-date hardware.
If possible I'd like to load the Wikidata dataset* at some point to see where 57GB fits in terms of TDB. The Wikidata team is currently looking at new solutions that can go beyond Blazegraph, and I get the impression that they have not yet actively considered giving Jena TDB a try.

* https://dumps.wikimedia.org/wikidatawiki/entities/

On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius <[email protected]> wrote:

> What about SSD disks, don't they make a difference?
>
> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann <[email protected]> wrote:
> >
> > that did the trick Andy, very good. might be a good idea to add this to the
> > distribution in jena-log4j.properties
> >
> > I am getting these numbers for a midsize dedicated server, very nice
> > numbers indeed Andy. well done!
> >
> > 00:24:53 INFO loader :: Loader = LoaderPhased
> > 00:24:53 INFO loader :: Start: ../../public_html/lotico.ttl.gz
> > 00:24:55 INFO loader :: Add: 500,000 lotico.ttl.gz (Batch: 237,755 / Avg: 237,755)
> > 00:24:56 INFO loader :: Add: 1,000,000 lotico.ttl.gz (Batch: 305,250 / Avg: 267,308)
> > 00:24:58 INFO loader :: Add: 1,500,000 lotico.ttl.gz (Batch: 313,087 / Avg: 281,004)
> > 00:25:00 INFO loader :: Add: 2,000,000 lotico.ttl.gz (Batch: 328,299 / Avg: 291,502)
> > 00:25:01 INFO loader :: Add: 2,500,000 lotico.ttl.gz (Batch: 341,763 / Avg: 300,336)
> > 00:25:03 INFO loader :: Add: 3,000,000 lotico.ttl.gz (Batch: 337,381 / Avg: 305,935)
> > 00:25:04 INFO loader :: Add: 3,500,000 lotico.ttl.gz (Batch: 318,877 / Avg: 307,719)
> > 00:25:06 INFO loader :: Add: 4,000,000 lotico.ttl.gz (Batch: 295,857 / Avg: 306,184)
> > 00:25:07 INFO loader :: Add: 4,500,000 lotico.ttl.gz (Batch: 327,225 / Avg: 308,388)
> > 00:25:09 INFO loader :: Add: 5,000,000 lotico.ttl.gz (Batch: 349,406 / Avg: 312,051)
> > 00:25:09 INFO loader :: Elapsed: 16.02 seconds [2019/06/15 00:25:09 CEST]
> > 00:25:11 INFO loader :: Add: 5,500,000 lotico.ttl.gz (Batch: 285,062 / Avg: 309,388)
> > 00:25:13 INFO loader :: Add: 6,000,000 lotico.ttl.gz (Batch: 203,665 / Avg: 296,559)
> > 00:25:16 INFO loader :: Add: 6,500,000 lotico.ttl.gz (Batch: 189,393 / Avg: 284,190)
> >
> > on another machine that sits in the Azure infrastructure somewhere,
> > tdbloader doesn't look as good. even with decent hardware it seems to die a
> > slow death of memory exhaustion at 16GB. it started off with 70kT/s and is now
> > down to 17kT/s and still going.
> >
> > lesson learned: big iron and big memory is the way to go with Jena
> > tdbloaders.
> >
> > On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne <[email protected]> wrote:
> >
> > > These messages are logged (to logger "org.apache.jena.tdb2.loader") - do
> > > you have log4j.properties in the current working directory?
> > >
> > > Do you get any output?
> > >
> > > INFO Loader = LoaderParallel
> > > INFO Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
> > > INFO Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
> > > INFO Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
> > > INFO Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
> > > INFO Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
> > > INFO Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
> > > INFO Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
> > > INFO Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
> > > INFO Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
> > > INFO Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
> > > INFO Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
> > > INFO Elapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
> > > INFO Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples in 28.63s (Avg: 174,644)
> > > INFO Finish - index SPO
> > > INFO Finish - index POS
> > > INFO Finish - index OSP
> > > INFO Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
> > >
> > > There is a pause after the first "Finished:" - at that point the data is
> > > all in, but the index threads are still running and the pause comes from
> > > the flush to disk.
> > >
> > > Andy
> > >
> > > On 14/06/2019 20:16, Marco Neumann wrote:
> > > > let me fire up one of the big machines to see what I will get there.
> > > > currently I have no info display during load with tdb2.tdbloader. if -v is
> > > > specified I get some extra info but no load info.
> > > >
> > > > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <[email protected]> wrote:
> > > >
> > > >> On 14/06/2019 18:13, Marco Neumann wrote:
> > > >>> I am collecting jena loader benchmarks. if you have results please post
> > > >>> them directly.
> > > >>>
> > > >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> > > >>
> > > >> tdb2.tdbloader has variations controlled by --loader.
> > > >>
> > > >> --loader=
> > > >> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> > > >> 'light'
> > > >>
> > > >> "basic" is a super naive parser-add triple loop - it is used if a loader
> > > >> can't cope with an already loaded database.
> > > >>
> > > >> "phased" is a balanced loader that does not saturate the machine. Some
> > > >> parallelism.
> > > >>
> > > >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> > > >>
> > > >> "parallel" is as much parallelism as it wants. (5 for triples, more for
> > > >> quads)
> > > >>
> > > >> "light" is two threaded. Slightly lighter than "phased".
> > > >>
> > > >> See LoaderPlans.
> > > >>
> > > >>> On a linux machine I am using "time" to collect data.
> > > >>>
> > > >>> Is there a flag on tdb2.tdbloader to report time and triples per second?
> > > >>>
> > > >>> I have noticed that storage space use for tdbloader2 is significantly
> > > >>> smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > > >>> straightforward explanation here?
> > > >>>
> > > >>
> > > >
> >
> > --
> > ---
> > Marco Neumann
> > KONA
>

--
---
Marco Neumann
KONA
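
For anyone collecting loader numbers from this thread, the "Add:" progress lines can be turned into comparable figures with a few lines of Python. This is only a rough sketch: the names `Progress`, `parse_progress` and `summarize` are made up for illustration and are not part of Jena, and the pattern assumes the line layout shown in the logs above, with each log line on a single unwrapped line.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed layout, taken from the log lines quoted in this thread:
#   INFO Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
ADD_LINE = re.compile(
    r"Add:\s+(?P<total>[\d,]+)\s+(?P<file>\S+)\s+"
    r"\(Batch:\s+(?P<batch>[\d,]+)\s+/\s+Avg:\s+(?P<avg>[\d,]+)\)"
)


class Progress(NamedTuple):
    """One progress line (hypothetical helper type, not a Jena class)."""
    total: int  # tuples loaded so far
    file: str   # input file name as printed by the loader
    batch: int  # rate over the last batch, tuples per second
    avg: int    # running average rate, tuples per second


def _num(text: str) -> int:
    """Convert '1,000,000' to 1000000."""
    return int(text.replace(",", ""))


def parse_progress(line: str) -> Optional[Progress]:
    """Return the parsed progress line, or None if the line is not an 'Add:' line."""
    m = ADD_LINE.search(line)
    if m is None:
        return None
    return Progress(_num(m["total"]), m["file"], _num(m["batch"]), _num(m["avg"]))


def summarize(lines: Iterable[str]) -> Optional[dict]:
    """Summarize all 'Add:' lines; returns None if there are none."""
    rows = [p for p in map(parse_progress, lines) if p is not None]
    if not rows:
        return None
    return {
        "tuples": rows[-1].total,
        "final_avg": rows[-1].avg,
        "min_batch": min(r.batch for r in rows),
        "max_batch": max(r.batch for r in rows),
    }


if __name__ == "__main__":
    sample = [
        "INFO Loader = LoaderParallel",
        "INFO Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)",
        "INFO Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)",
    ]
    print(summarize(sample))
```

The gap between `min_batch` and `final_avg` is a quick way to spot a load that is slowing down over time, as in the 70kT/s to 17kT/s case mentioned above. Lines that were wrapped by a mail client would need to be rejoined before parsing.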
