> > Creating the node table index may be amenable to the same approach as
> > index building, caveat details.
> 
> Or switch to NodeId being hash based. What blocks certain parallel
> processing currently is that NodeIds are allocated sequentially.
> 
> But that has an impact when the loaded data is used (and is a different
> on-disk format).
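
To make the quoted point concrete, here is a minimal sketch of why sequential
allocation blocks parallel processing while hash-based ids do not. This is not
Jena's actual code: `SequentialAllocator` and `hash_id` are hypothetical names,
and the 8-byte truncated SHA-256 is just an assumed example of a hash-based id.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class SequentialAllocator:
    """Hypothetical allocator: ids come from a shared counter,
    so the id a term gets depends on the order terms arrive in."""

    def __init__(self):
        self.table = {}
        self.next_id = 0

    def id_for(self, term):
        if term not in self.table:
            self.table[term] = self.next_id
            self.next_id += 1
        return self.table[term]


def hash_id(term: str) -> int:
    """Hypothetical hash-based id: derived from the term alone,
    so no shared counter or lookup table is needed to compute it."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


terms = ["<http://example/a>", "<http://example/b>", '"literal"']

# Sequential ids: the same terms in a different order get different ids.
seq_fwd = SequentialAllocator()
forward = {t: seq_fwd.id_for(t) for t in terms}
seq_bwd = SequentialAllocator()
backward = {t: seq_bwd.id_for(t) for t in reversed(terms)}
order_dependent = forward != backward

# Hash ids: workers can compute them independently, in any order.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_ids = dict(zip(terms, pool.map(hash_id, terms)))
serial_ids = {t: hash_id(t) for t in terms}
```

With the shared counter, every worker has to coordinate on `next_id`; with the
hash, the id is a pure function of the term. The costs are the ones mentioned
in the quote: ids are no longer small and dense, which changes the on-disk
format, and a real implementation would also have to deal with collisions.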


Do you think dividing the load like this: "node2id files on SSD, everything
else on SATA" could prevent that huge slowdown? It could be a good compromise
given that I) the nodes table (node2id?) usually seems pretty small compared to
the whole dataset (for Wikidata it is <32GB IIRC), and II) a 32GB SSD is much
more affordable than 32GB of RAM.

Did you also experience such a remarkable slowdown when loading truthy on your
SSD? Or did you have enough RAM for the whole table?
