This is for the large amount of temporary space that tdbloader2 uses?

I got "latest-all" to load, but I had to change tdbloader2 so that it works with a compressed data-triples.tmp.gz and so that sort writes compressed temporary files (I messed up a bit and set the gzip compression level too high, which slowed things down).

There are some small problems with tdbloader2 and complex --sort-args (it only handles a single arg/value pair correctly). My main trick was to put in a script for "sort" that had the required settings built in. I wanted to set --compress, -T and the buffer size.
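A minimal sketch of that kind of wrapper, not the actual script used for the load. It assumes GNU sort, where the compression setting is spelled --compress-program, the temporary directory is -T and the buffer size is -S. The function name make_sort_wrapper, the directory names and the sizes are hypothetical. The function writes an executable named "sort" into a directory of your choice; putting that directory first on PATH makes tdbloader2 pick it up instead of the real sort.

```shell
#!/bin/sh
# Hypothetical helper: generate a "sort" wrapper with fixed settings.
# Assumes GNU sort (--compress-program, -T, -S).
make_sort_wrapper() {
    wrap_dir=$1   # directory for the wrapper; must NOT already be on PATH
    sort_tmp=$2   # where sort should write its temporary files
    buf_size=$3   # sort buffer size, e.g. 8G

    # Resolve the real sort now, before PATH is changed by the caller.
    real_sort=$(command -v sort) || return 1
    mkdir -p "$wrap_dir" "$sort_tmp" || return 1

    # Fixed settings come first; the caller's arguments follow unchanged.
    cat > "$wrap_dir/sort" <<EOF
#!/bin/sh
exec "$real_sort" --compress-program=gzip -T "$sort_tmp" -S "$buf_size" "\$@"
EOF
    chmod +x "$wrap_dir/sort"
}

# Example use (paths and sizes are placeholders):
#   make_sort_wrapper "$HOME/sortwrap" /big/disk/sort-tmp 8G
#   PATH="$HOME/sortwrap:$PATH" tdbloader2 ...
```

GNU sort runs the compression program with no arguments to compress and with -d to decompress, so choosing a specific gzip level would need a second small script in place of "gzip" here.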

On 10/12/17 21:18, Dick Murray wrote:
Ryzen 1920X 3.5GHz, 32GB DDR4 quad channel, 3 x M.2 Samsung 960 EVO,
172K/sec 3h45m for truthy.

Is it possible to split the index files into separate folders?

Not built-in.  Symbolic links will work.

I'm keen on symbolic links here because built-in support would be hard to keep working for all cases.


Or sym link the files, if I run the data phase, sym link, then run the
index phase?

Symbolic links will work.

"sort" can be configured to use a temporary folder as well.

The only place symbolic links will not work is for data-triples.tmp. It must not exist at all - we ought to change that to make it OK to have a zero-length file in place so it can be redirected ahead of time.
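As a sketch of the "data phase, sym link, index phase" idea: between the two phases, an existing database file can be moved to another disk with a symbolic link left in its place. This is an illustration under assumptions, not something tdbloader2 does itself. The function name relocate_with_symlink is hypothetical, and the file names in the usage comment are only examples of what a TDB database directory may contain. It only handles files that already exist, so it says nothing about data-triples.tmp.

```shell
#!/bin/sh
# Hypothetical helper: move one existing file elsewhere, leave a symlink behind.
relocate_with_symlink() {
    file=$1        # existing regular file inside the database directory
    target_dir=$2  # absolute path of a directory on the other disk

    # Refuse anything that is missing or already a symbolic link.
    if [ ! -f "$file" ] || [ -L "$file" ]; then
        echo "not a regular file: $file" >&2
        return 1
    fi
    mkdir -p "$target_dir" || return 1
    name=$(basename "$file")
    mv "$file" "$target_dir/$name" || return 1
    # The link keeps the original path usable for the next phase.
    ln -s "$target_dir/$name" "$file"
}

# Example use between phases (paths and file names are placeholders):
#   relocate_with_symlink /data/DB/SPO.dat /mnt/nvme2/DB
#   relocate_with_symlink /data/DB/SPO.idn /mnt/nvme2/DB
```

The target directory is taken as an absolute path because a relative link target would be resolved relative to the directory holding the link, not the current directory.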

    Andy


Point me in the right direction and I'll extend the TDB file open code.

Dick


On 7 Dec 2017 22:21, "Andy Seaborne" <[email protected]> wrote:



On 07/12/17 19:01, Laura Morales wrote:

Thank you a lot Andy, very informative (special thanks for specifying the
hardware).
For anybody reading this, I'd like to highlight the fact that the data
source is "latest-truthy" and not "latest-all".
From what I understand, truthy leaves out a lot of data (50% ??) and "all"
is more than 4 billion triples.


4,787,194,669 Triples

Dick reported figures for truthy as well.

I used a *16G* machine, and it is a portable with all its memory
architecture tradeoffs.

"all" is running ATM - it will be much slower due to RAM needs of
tdbloader2 for the data phase.  Not sure the figures will mean anything for
you.

I'd need a machine with (guess) 32G RAM which is still a small server these
days.

(A similar tree builder technique could be applied to the node index and
reduce the max RAM needs but - hey, ho - that's free software for you.)

     Andy
