These messages are logged (to the logger "org.apache.jena.tdb2.loader") - do
you have a log4j.properties file in the current working directory?
Do you get any output?
INFO Loader = LoaderParallel
INFO Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
INFO Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
INFO Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
INFO Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
INFO Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
INFO Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
INFO Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
INFO Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
INFO Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
INFO Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
INFO Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
INFO Elapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
INFO Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples in 28.63s (Avg: 174,644)
INFO Finish - index SPO
INFO Finish - index POS
INFO Finish - index OSP
INFO Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
There is a pause after the first "Finished:" line - at that point all the data
has been read in, but the index threads are still running, and the pause comes
from flushing to disk.
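As a rough check of those numbers, here is a minimal Python sketch (not part of Jena; the helper name and the regular expression are assumptions based only on the log lines above) that parses the final "Time = ..." line, recomputes the rate, and estimates how long the index threads kept running after the data was in:

```python
import re

# Hypothetical helper, not a Jena API: the pattern is inferred from the
# sample output above and may not match other Jena versions.
SUMMARY = re.compile(
    r"Time = ([\d.]+) seconds : Triples = ([\d,]+) : Rate = ([\d,]+) /s"
)

def parse_summary(line):
    """Return (seconds, triples, reported_rate) from the final loader line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a loader summary line: " + line)
    seconds = float(m.group(1))
    triples = int(m.group(2).replace(",", ""))
    rate = int(m.group(3).replace(",", ""))
    return seconds, triples, rate

line = "INFO Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s"
seconds, triples, reported = parse_summary(line)

# Overall rate is total triples over total wall-clock time.
recomputed = round(triples / seconds)

# Data-in finished at 28.63s (from the "Finished:" line); the remainder
# is index work and flushing to disk.
index_tail = seconds - 28.63

print(recomputed, reported)
print(round(index_tail, 2))
```

For the run above, the recomputed rate agrees with the reported one, and the gap between "Finished:" and the final "Time =" line is a little under 7 seconds.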
Andy
On 14/06/2019 20:16, Marco Neumann wrote:
Let me fire up one of the big machines to see what I get there.
Currently I get no progress display during a load with tdb2.tdbloader. If -v is
specified I get some extra info, but no load info.
On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <[email protected]> wrote:
On 14/06/2019 18:13, Marco Neumann wrote:
I am collecting Jena loader benchmarks. If you have results, please post
them directly.
http://www.lotico.com/index.php/JENA_Loader_Benchmarks
tdb2.tdbloader has variations controlled by --loader.
--loader=
Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or 'light'
"basic" is a super naive parser-add triple loop - it is used if a loader
can't cope with an already loaded database.
"phased" is a balanced loader that does not saturate the machine. Some
parallelism.
"sequential" is the tdbloader algorithm for TDB2, more for reference.
"parallel" uses as much parallelism as it wants (5 threads for triples, more
for quads).
"light" is two-threaded. Slightly lighter than "phased".
See LoaderPlans.
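To compare the loader variants for a benchmark, something like the following Python sketch could be used. It is only an illustration under stated assumptions: the helper names are hypothetical, tdb2.tdbloader is assumed to be on the PATH, and the database directory is assumed to be passed with --loc.

```python
import subprocess
import time

# Loader names as listed in the --loader help text above.
LOADERS = ("basic", "phased", "sequential", "parallel", "light")

def build_command(loader, db_dir, data_file):
    """Build a tdb2.tdbloader command line for one loader variant."""
    if loader not in LOADERS:
        raise ValueError("unknown loader: " + loader)
    # Assumption: tdb2.tdbloader is on PATH and --loc names the database directory.
    return ["tdb2.tdbloader", "--loader=" + loader, "--loc", db_dir, data_file]

def timed_load(loader, db_dir, data_file):
    """Run one load and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(build_command(loader, db_dir, data_file), check=True)
    return time.perf_counter() - start
```

A call such as timed_load("parallel", "DB2", "bsbm-5m.nt.gz") would then give one data point. It is probably best to use a fresh, empty database directory for each run, so that the loaders are compared on the same starting state.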
On a Linux machine I am using "time" to collect data.
Is there a flag on tdb2.tdbloader to report time and triples per second?
I have noticed that the on-disk storage used by tdbloader2 is significantly
smaller than that of tdbloader and tdb2.tdbloader. Is there a
straightforward explanation for this?