On 07/12/17 19:01, Laura Morales wrote:
> Thank you a lot Andy, very informative (special thanks for specifying the
> hardware).
> For anybody reading this, I'd like to highlight the fact that the data source is
> "latest-truthy" and not "latest-all".
> From what I understand, truthy leaves out a lot of data (50% ??) and "all" is
> more than 4 billion triples.
4,787,194,669 Triples
Dick reported figures for truthy as well.
I used a *16G* machine, and it is a portable with all its memory
architecture tradeoffs.
"all" is running at the moment - it will be much slower due to the RAM
needs of tdbloader2 in the data phase. Not sure the figures will mean
anything for you.
I'd need a machine with (at a guess) 32G RAM, which is still a small server
these days.
(A similar tree builder technique could be applied to the node index and
reduce the max RAM needs but - hey, ho - that's free software for you.)
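
To illustrate what "tree builder technique" means here, this is a minimal
sketch of bottom-up index building from already-sorted keys. It is not TDB's
code: the function name build_bottom_up, the fanout parameter and the
list-of-lists layout are made up for the example. The point is only that,
once the input is sorted, each level of the tree can be packed left to right
without random inserts.

```python
from typing import List


def build_bottom_up(sorted_keys: List[int], fanout: int = 4) -> List[List[List[int]]]:
    """Pack sorted keys into leaves, then build parent levels over them.

    Hypothetical illustration only: returns every level of the tree,
    leaves first and the root level last. Each node is a plain list of keys.
    """
    if not sorted_keys:
        return []

    # Leaf level: consecutive runs of up to `fanout` keys.
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]

    # Each parent node holds the first key of up to `fanout` children.
    while len(level) > 1:
        firsts = [node[0] for node in level]
        level = [firsts[i:i + fanout] for i in range(0, len(firsts), fanout)]
        levels.append(level)

    return levels


levels = build_bottom_up(list(range(10)), fanout=4)
print(levels[0])   # leaf nodes
print(levels[-1])  # root level
```

This sketch keeps every level in memory to stay short. A real builder would
presumably write each full node out as it is completed, so that only a small
working set is held in RAM at any time.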
Andy