On 16/11/17 11:30, Osma Suominen wrote:


Rob Vesse wrote on 16.11.2017 at 13:13:

This is by design. As has been discussed in the past, tdbloader2 produces maximally packed B+Trees by preprocessing the data, which minimises disk space usage.
[...]
As Andy mentioned on an earlier thread, tdb2.tdbloader essentially has the same behaviour as tdbloader. Because of the different data structures, load performance should be much better anyway, and he did not think there would be much benefit to having a tdb2.tdbloader2 variant. Also, given the different data structures, I’m not sure this would be as practical.

Right. I was just surprised at how big the difference is.

tdbloader2 is both fast and space-efficient, which makes it a lot more appealing than tdb2.tdbloader, which in my (very limited) experience is slow and space-hungry (but similar to tdbloader for TDB1).

But the real surprise was the space overhead of named graphs. More than twice the space just because I decide to put the data in a named graph instead of the default graph? And that seems to be the case both for TDB1 (both tdbloader and tdbloader2) and TDB2.

I assume you are seeing the difference between a triple store and quad store configuration.

If a TDB image only has a default graph, no named graphs at all, then it acts as a triple store and only needs the three SPO, POS, OSP indexes. In that configuration it doesn't generate the graph indexes at all.

As soon as you have one named graph (even if small) then it acts as a quad store and needs all 9 indexes (GOSP etc). The extra indexes take more space, even if the underlying quad count is the same as the triple count in the default-only case.
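
As a rough illustration of why the ratio can come out above 2x, here is a back-of-envelope sketch in Python. It is not Jena code. It assumes that each default-graph triple gets one entry in each of the three triple indexes, that each named-graph quad gets one entry in each of the six quad indexes, and that every entry is a tuple of fixed-width 8-byte node ids. The function name index_bytes and the 8-byte width are assumptions made for the example, and the model ignores the node table, B+Tree overhead and block packing density.

```python
# Back-of-envelope model of index entry volume (a sketch, not Jena code).
TRIPLE_INDEXES = ("SPO", "POS", "OSP")
QUAD_INDEXES = ("GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG")
NODE_ID_BYTES = 8  # assumption: fixed-width node ids


def index_bytes(default_graph_triples: int, named_graph_quads: int) -> int:
    """Estimate raw index entry bytes for a dataset (hypothetical helper)."""
    # Each triple: one 3-slot entry in every triple index.
    triple_bytes = default_graph_triples * len(TRIPLE_INDEXES) * 3 * NODE_ID_BYTES
    # Each quad: one 4-slot entry in every quad index.
    quad_bytes = named_graph_quads * len(QUAD_INDEXES) * 4 * NODE_ID_BYTES
    return triple_bytes + quad_bytes


n = 1_000_000
as_default = index_bytes(n, 0)  # all statements in the default graph
as_named = index_bytes(0, n)    # the same statements in one named graph
print(as_default, as_named, round(as_named / as_default, 2))
```

Under these assumptions the same statements need twice as many index entries once they move into a named graph, and each entry is one slot wider, so the estimate lands in "more than twice" territory. Real on-disk sizes depend on the loader and on how full the B+Tree blocks are.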

Dave
