Hello Andy,

On Tue, Mar 31, 2015 at 10:25:32AM +0100, Andy Seaborne wrote:
> >Also, tdbloader2 seems to be gradually slowed down from 100k triples/s to
> >< 1000 triples/s on a normal disk drive by random access after ca. 10 million
> >triples. Is this unavoidable? I made this change to tdbloader2 but I think it
> >is not relevant during the data phase:
> >
> >-    SORT_ARGS="--buffer-size=50%"
> >+    SORT_ARGS="--buffer-size=2048M"
> >
> >I have tried with Jena 2.13.0 and 2.11.1.
> 
> What's the machine it's running on?  OS?

Xeon E5502 with 48GB RAM, Linux 3.4.105 with glibc 2.19 and jdk-8u31-linux-x64.

> As this is the data phase, tdbloader2 is, roughly, streaming the parser to
> disk, allocating nodeids (which is a bad access pattern).  What size are the
> node-related files?

I have it running right now at 

"INFO  Add: 138,800,000 Data (Batch: 15,792 / Avg: 9,656)"

-rw-r--r-- 1 java java 7070640000 Mar 31 13:10 data-triples.17513
-rw-r--r-- 1 java java 2021654528 Mar 31 13:10 node2id.dat
-rw-r--r-- 1 java java   16777216 Mar 31 13:10 node2id.idn
-rw-r--r-- 1 java java 3858513162 Mar 31 13:10 nodes.dat
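
To put these numbers in perspective, here is a small sketch that converts the sizes above to GiB and relates them to the triple count from the log line. It assumes the listing and the log line were taken at roughly the same moment; the names `sizes`, `triples_loaded` and `gib` are only illustrative and are not part of TDB.

```python
# Sizes in bytes, copied from the ls output above.
sizes = {
    "data-triples.17513": 7070640000,
    "node2id.dat": 2021654528,
    "node2id.idn": 16777216,
    "nodes.dat": 3858513162,
}

# Triple count from the "Add:" log line (assumed to match the listing in time).
triples_loaded = 138_800_000


def gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Per-file size in GiB and average bytes per loaded triple.
for name, n_bytes in sizes.items():
    per_triple = n_bytes / triples_loaded
    print(f"{name:20s} {gib(n_bytes):6.2f} GiB  {per_triple:6.1f} bytes/triple")

# Combined size of the node-related files asked about above.
node_bytes = sizes["node2id.dat"] + sizes["node2id.idn"] + sizes["nodes.dat"]
print(f"node files total: {gib(node_bytes):.2f} GiB")
```

The per-triple figures are only rough averages, since the files also depend on how many distinct nodes the data contains.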

> Does tdbloader do better? (sometimes it does, sometimes it doesn't).

I will try tdbloader if tdbloader2 fails, but I expect it to work now because
I switched to an SSD for the TDB directory.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail [email protected]
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
