> > Creating the node table index may be amenable to the same approach as
> > index building, caveat details.
>
> Or switch to NodeId being hash based. What blocks certain parallel
> processing currently is that NodeIds are allocated sequentially.
>
> But that has an impact when the loaded data is used (and is a different
> on-disk format).

Do you think dividing the load like this, "node2id files on SSD, everything else on SATA", could prevent that huge slowdown? It could be a good compromise given that:

I) the nodes table (node2id?) usually seems pretty small compared to the whole dataset (for Wikidata it is <32GB IIRC), and
II) a 32GB SSD is much more affordable than 32GB of RAM.

Did you also experience such a remarkable slowdown when loading truthy on your SSD? Or did you have enough RAM for the whole table?
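
To check my understanding of the quoted point about NodeIds, here is a toy sketch of the two allocation schemes. It is only an illustration under my own assumptions, not how TDB2 actually does it: SequentialAllocator and hash_node_id are hypothetical names, and a real hash-based scheme would also need to deal with collisions.

```python
import hashlib
from itertools import count


class SequentialAllocator:
    """Hypothetical allocator: ids are 0, 1, 2, ... in first-seen order.

    The counter and the lookup table are shared mutable state, so parallel
    workers would have to coordinate on every new term.
    """

    def __init__(self):
        self._next = count()
        self._ids = {}

    def node_id(self, term: str) -> int:
        if term not in self._ids:
            self._ids[term] = next(self._next)
        return self._ids[term]


def hash_node_id(term: str) -> int:
    """Hypothetical hash-based id: a 64-bit value derived from the term alone.

    No shared state is needed, so any worker can compute it independently.
    Collision handling is deliberately left out of this sketch.
    """
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


terms = ["<http://example/a>", "<http://example/b>", "<http://example/c>"]

# Sequential ids depend on the order in which terms are encountered.
forward, backward = SequentialAllocator(), SequentialAllocator()
ids_forward = {t: forward.node_id(t) for t in terms}
ids_backward = {t: backward.node_id(t) for t in reversed(terms)}
sequential_is_order_dependent = ids_forward != ids_backward

# Hash-based ids are the same whatever the processing order.
hash_forward = {t: hash_node_id(t) for t in terms}
hash_backward = {t: hash_node_id(t) for t in reversed(terms)}
hash_is_order_independent = hash_forward == hash_backward
```

If that is roughly right, the sequential scheme forces every worker through one counter, while the hash-based one does not, at the cost of a different id layout on disk.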
