What about SSD disks, don't they make a difference?

On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann <[email protected]> wrote:
>
> That did the trick, Andy, very good. It might be a good idea to add this to the
> distribution in jena-log4j.properties.
>
> I am getting these numbers for a midsize dedicated server, very nice
> numbers indeed Andy. well done!
>
> 00:24:53 INFO  loader               :: Loader = LoaderPhased
> 00:24:53 INFO  loader               :: Start: ../../public_html/lotico.ttl.gz
> 00:24:55 INFO  loader               :: Add: 500,000 lotico.ttl.gz (Batch: 237,755 / Avg: 237,755)
> 00:24:56 INFO  loader               :: Add: 1,000,000 lotico.ttl.gz (Batch: 305,250 / Avg: 267,308)
> 00:24:58 INFO  loader               :: Add: 1,500,000 lotico.ttl.gz (Batch: 313,087 / Avg: 281,004)
> 00:25:00 INFO  loader               :: Add: 2,000,000 lotico.ttl.gz (Batch: 328,299 / Avg: 291,502)
> 00:25:01 INFO  loader               :: Add: 2,500,000 lotico.ttl.gz (Batch: 341,763 / Avg: 300,336)
> 00:25:03 INFO  loader               :: Add: 3,000,000 lotico.ttl.gz (Batch: 337,381 / Avg: 305,935)
> 00:25:04 INFO  loader               :: Add: 3,500,000 lotico.ttl.gz (Batch: 318,877 / Avg: 307,719)
> 00:25:06 INFO  loader               :: Add: 4,000,000 lotico.ttl.gz (Batch: 295,857 / Avg: 306,184)
> 00:25:07 INFO  loader               :: Add: 4,500,000 lotico.ttl.gz (Batch: 327,225 / Avg: 308,388)
> 00:25:09 INFO  loader               :: Add: 5,000,000 lotico.ttl.gz (Batch: 349,406 / Avg: 312,051)
> 00:25:09 INFO  loader               ::   Elapsed: 16.02 seconds [2019/06/15 00:25:09 CEST]
> 00:25:11 INFO  loader               :: Add: 5,500,000 lotico.ttl.gz (Batch: 285,062 / Avg: 309,388)
> 00:25:13 INFO  loader               :: Add: 6,000,000 lotico.ttl.gz (Batch: 203,665 / Avg: 296,559)
> 00:25:16 INFO  loader               :: Add: 6,500,000 lotico.ttl.gz (Batch: 189,393 / Avg: 284,190)
>
> On another machine that sits somewhere in the Azure infrastructure,
> tdbloader doesn't look as good. Even with decent hardware it seems to die a
> slow death of memory exhaustion at 16GB. It started off at 70kT/s and is now
> down to 17kT/s and still going.
>
> Lesson learned: big iron and big memory are the way to go with Jena
> tdbloaders.
>
>
>
>
> On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne <[email protected]> wrote:
>
> > These messages are logged (to logger "org.apache.jena.tdb2.loader") - do
> > you have log4j.properties in the current working directory?
> >
> > Do you get any output?
> >
> > INFO  Loader = LoaderParallel
> > INFO  Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
> > INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
> > INFO  Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
> > INFO  Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
> > INFO  Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
> > INFO  Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
> > INFO  Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
> > INFO  Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
> > INFO  Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
> > INFO  Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
> > INFO  Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
> > INFO    Elapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
> > INFO  Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples in 28.63s (Avg: 174,644)
> > INFO  Finish - index SPO
> > INFO  Finish - index POS
> > INFO  Finish - index OSP
> > INFO  Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
> >
> >
> > There is a pause after the first "Finished:" - at that point the data has
> > all been read in, but the index threads are still running, and the pause
> > comes from flushing to disk.
> >
> >      Andy
> >
> > On 14/06/2019 20:16, Marco Neumann wrote:
> > > let me fire up one of the big machines to see what I will get there.
> > > currently I have no info display during load with tdb2.tdbloader. if -v is
> > > specified I get some extra info but no load info.
> > >
> > > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <[email protected]> wrote:
> > >
> > >>
> > >>
> > >> On 14/06/2019 18:13, Marco Neumann wrote:
> > >>> I am collecting jena loader benchmarks. if you have results please post
> > >>> them directly.
> > >>>
> > >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> > >>
> > >> tdb2.tdbloader has variations controlled by --loader.
> > >>
> > >> --loader=
> > >> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> > >> 'light'
> > >>
> > >> "basic" is a super naive parser-add triple loop - it is used if a loader
> > >> can't cope with an already loaded database.
> > >>
> > >> "phased" is a balanced loader that does not saturate the machine. Some
> > >> parallelism.
> > >>
> > >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> > >>
> > >> "parallel" is as much parallelism as it wants. (5 for triples, more for
> > >> quads)
> > >>
> > >> "light" is two threaded. Slightly lighter than "phased".
> > >>
> > >> See LoaderPlans.
> > >>
> > >>> On a linux machine I am using "time" to collect data.
> > >>>
> > >>> Is there a flag on tdb2.tdbloader to report time and triples per
> > >>> second?
> > >>>
> > >>> I have noticed that storage space use for tdbloader2 is significantly
> > >>> smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > >>> straightforward explanation here?
> > >>>
> > >>
> > >
> > >
> >
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
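
For anyone collecting these numbers, here is a minimal Python sketch for pulling the rates out of the progress lines quoted above. It assumes each "Add:" entry sits on one physical line (unwrapped, without the "> " quote prefixes). The names parse_progress and overall_rate are made up for this example and are not part of Jena. The last line redoes the arithmetic behind the final "Time = ... : Rate = ..." line of Andy's run.

```python
import re

# Matches tdb2.tdbloader progress lines as they appear in the quoted logs, e.g.
#   INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
ADD_LINE = re.compile(
    r"Add:\s+([\d,]+)\s+(\S+)\s+\(Batch:\s+([\d,]+)\s+/\s+Avg:\s+([\d,]+)\)"
)


def to_int(text):
    """Convert a comma-grouped number such as '1,000,000' to an int."""
    return int(text.replace(",", ""))


def parse_progress(log_text):
    """Return (triples_so_far, batch_rate, avg_rate) for each 'Add:' line."""
    rows = []
    for m in ADD_LINE.finditer(log_text):
        rows.append((to_int(m.group(1)), to_int(m.group(3)), to_int(m.group(4))))
    return rows


def overall_rate(triples, seconds):
    """Triples per second over the whole run (parsing plus index flush)."""
    return triples / seconds


# Two lines copied from the LoaderParallel run quoted in this thread.
sample = """\
INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
INFO  Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
"""

if __name__ == "__main__":
    for total, batch, avg in parse_progress(sample):
        print(total, batch, avg)
    # Figures from the "Time = 35.572 seconds : Triples = 5,000,599" line.
    print(round(overall_rate(5_000_599, 35.572)))
```

The overall rate comes out lower than the "Avg:" on the "Finished:" line because the total time also covers the wait for the index threads to flush.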
