This is for the large amount of temporary space that tdbloader2 uses?
I got "latest-all" to load but I had to do some things with tdbloader2
to work with a compressed data-triples.tmp.gz and also have sort write
compressed temporary files (I messed up a bit and set the gzip
compression too high, so it slowed things down).
There are some small problems with tdbloader2 and complex --sort-args
(it handles only a single arg/value pair correctly). My main trick was to
put in a script for "sort" that had the required settings built in. I
wanted to set --compress, -T and the buffer size.
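
A minimal sketch of that kind of stand-in for "sort", not the script
actually used for this load. It assumes GNU sort, whose options
--compress-program, -T and -S cover the three settings mentioned. The
path to the real sort, the temporary directory and the buffer size are
placeholder values, and the function name is made up for illustration.

```python
#!/usr/bin/env python3
"""Stand-in for "sort": forwards to the real sort with fixed settings."""
import os
import sys

# All four values are assumptions for illustration; adjust for the machine.
REAL_SORT = "/usr/bin/sort"   # path to the real GNU sort
TMP_DIR = "/data/tmp"         # hypothetical directory for sort's temporary files
BUFFER_SIZE = "4G"            # hypothetical in-memory buffer size
COMPRESS_PROGRAM = "gzip"     # sort runs it plain to compress, with -d to decompress


def build_sort_command(args):
    """Return the real sort command line with the fixed options placed first."""
    return [
        REAL_SORT,
        "--compress-program=" + COMPRESS_PROGRAM,
        "-T", TMP_DIR,
        "-S", BUFFER_SIZE,
    ] + list(args)


# Only take over the process when installed under the name "sort",
# so importing or testing this file never launches the real program.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "sort":
    command = build_sort_command(sys.argv[1:])
    os.execv(command[0], command)
```

Saved as an executable file named "sort" in a directory that comes
before the real one on PATH, it would be picked up in place of the real
sort, provided the loader finds sort through PATH. Plain gzip runs at
its default level, which avoids the slowdown from setting the
compression too high.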
On 10/12/17 21:18, Dick Murray wrote:
Ryzen 1920X 3.5GHz, 32GB DDR4 quad channel, 3 x M.2 Samsung 960 EVO,
172K/sec 3h45m for truthy.
Is it possible to split the index files into separate folders?
Not built-in. Symbolic links will work.
I'm keen on symbolic links here because with built-in support it would
be hard to keep all cases covered.
Or symlink the files: could I run the data phase, create the symlinks,
then run the index phase?
Symbolic links will work.
"sort" can be configured to use a temporary folder as well.
The only place symbolic links will not work is for data-triples.tmp. It
must not exist at all - we ought to change that to make it OK to have a
zero-length file in place so it can be redirected ahead of time.
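
A rough sketch of the symbolic link approach for files that already
exist in the database directory: move them to another disk and leave a
link behind under the old name. The function name, the directories and
the "POS.*" pattern in the usage note are illustrative assumptions;
check the actual file names in the database directory first. Per the
note above, it should not be used on data-triples.tmp.

```python
import glob
import os
import shutil


def relocate_with_symlinks(db_dir, target_dir, pattern):
    """Move files matching pattern out of db_dir and link them back."""
    target_dir = os.path.abspath(target_dir)
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(db_dir, pattern))):
        if os.path.islink(path):
            continue  # already redirected, leave it alone
        destination = os.path.join(target_dir, os.path.basename(path))
        shutil.move(path, destination)   # works across filesystems
        os.symlink(destination, path)    # old name now points at the new place
        moved.append(os.path.basename(path))
    return moved
```

For example, relocate_with_symlinks("/data/tdb", "/mnt/disk2/tdb",
"POS.*") would move the matching files to the second disk, with both
paths being placeholders. It should only be run while nothing has the
database open.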
Andy
Point me in the right direction and I'll extend the TDB file open code.
Dick
On 7 Dec 2017 22:21, "Andy Seaborne" <[email protected]> wrote:
On 07/12/17 19:01, Laura Morales wrote:
Thank you a lot Andy, very informative (special thanks for specifying the
hardware).
For anybody reading this, I'd like to highlight the fact that the data
source is "latest-truthy" and not "latest-all".
From what I understand, truthy leaves out a lot of data (50% ??) and "all"
is more than 4 billion triples.
4,787,194,669 Triples
Dick reported figures for truthy as well.
I used a *16G* machine, and it is a portable with all its memory
architecture tradeoffs.
"all" is running ATM - it will be much slower due to RAM needs of
tdbloader2 for the data phase. Not sure the figures will mean anything for
you.
I'd need a machine with (guess) 32G RAM which is still a small server these
days.
(A similar tree builder technique could be applied to the node index and
reduce the max RAM needs but - hey, ho - that's free software for you.)
Andy