Hello Andy,

On Tue, Mar 31, 2015 at 10:25:32AM +0100, Andy Seaborne wrote:
> >Also, tdbloader2 seems to be gradually slowed down from 100k triples/s to
> >< 1000 triples/s on a normal disk drive by random access after ca. 10 million
> >triples. Is this unavoidable? I made this change to tdbloader2 but I think it
> >is not relevant during the data phase:
> >
> >- SORT_ARGS="--buffer-size=50%"
> >+ SORT_ARGS="--buffer-size=2048M"
> >
> >I have tried with Jena 2.13.0 and 2.11.1.
>
> What's the machine it's running on? OS?

Xeon E5502 with 48GB RAM, Linux 3.4.105 with glibc 2.19 and jdk-8u31-linux-x64.

> As this is the data phase, tdbloader2 is, roughly, streaming the parser to
> disk, allocating nodeids (which is a bad access pattern). What size are the
> node-related files?

I have it running right now at
"INFO Add: 138,800,000 Data (Batch: 15,792 / Avg: 9,656)"

-rw-r--r-- 1 java java 7070640000 Mar 31 13:10 data-triples.17513
-rw-r--r-- 1 java java 2021654528 Mar 31 13:10 node2id.dat
-rw-r--r-- 1 java java   16777216 Mar 31 13:10 node2id.idn
-rw-r--r-- 1 java java 3858513162 Mar 31 13:10 nodes.dat

> Does tdbloader do better? (sometimes it does, sometimes it doesn't).

I will try if I fail with tdbloader2 but I guess it will work now because I
switched to a SSD for the tdb dir.

Regards,

Michael Brunnbauer

--
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89
++  E-Mail [email protected]
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
