On 26/01/16 15:08, Chris Dollin wrote:
> Dear All
> If a dataset graph G backed by TDB runs .find(ANY, ANY, ANY, ANY),
Is that Graph.find(?,?,?) on graph G or DatasetGraph.find(?,?,?,?)?
There isn't a find/4 on Graph. Or is it a default union graph G?
These all make a difference to what non-promise you get.
> are there any promises made about the order in which quads come out of
> the iterator?
Promised - no.
(It has even changed between versions in one case.)
> Failing a promise, how about a strong likelihood of
> some specific order? [1]
Yes. Currently.

DatasetGraph.find(?,?,?,?) uses SPO for the default graph, then SPOG for
the named graphs (caution - this is a special case).

But DatasetGraph.find(G,?,?,?) uses GSPO for a fixed G (not SPOG).

The special case is because of the default union graph. Normally, GSPO is
the "primary" index.
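To make the effect of the SPOG special case concrete for a per-(graph, subject) pass, here is a small sketch in plain Java. It does not use Jena: the `Quad` record and the `SPOG`/`GSPO` comparators are hypothetical stand-ins that compare terms as strings, whereas TDB sorts its indexes by internal NodeId, so only the grouping property carries over, not the actual order. It also ignores the default-graph part of find(?,?,?,?) and just checks whether each (graph, subject) pair forms one unbroken run.

```java
import java.util.*;

public class IndexOrderDemo {
    // Hypothetical stand-in for a quad; terms are plain strings, not Jena Nodes.
    record Quad(String g, String s, String p, String o) {}

    // Sort orders named after the TDB indexes (string order, not NodeId order).
    static final Comparator<Quad> SPOG = Comparator.comparing(Quad::s)
            .thenComparing(Quad::p).thenComparing(Quad::o).thenComparing(Quad::g);
    static final Comparator<Quad> GSPO = Comparator.comparing(Quad::g)
            .thenComparing(Quad::s).thenComparing(Quad::p).thenComparing(Quad::o);

    // True if, in the given order, each (graph, subject) pair appears as one unbroken run.
    static boolean pairsContiguous(List<Quad> quads, Comparator<Quad> order) {
        List<Quad> sorted = new ArrayList<>(quads);
        sorted.sort(order);
        Set<List<String>> finished = new HashSet<>(); // pairs whose run has ended
        List<String> current = null;
        for (Quad q : sorted) {
            List<String> key = List.of(q.g(), q.s());
            if (!key.equals(current)) {
                if (current != null) finished.add(current);
                if (finished.contains(key)) return false; // pair reappears after its run ended
                current = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
                new Quad("g1", "s1", "p1", "o1"),
                new Quad("g2", "s1", "p1", "o1"),
                new Quad("g1", "s1", "p2", "o2"));
        System.out.println("SPOG contiguous: " + pairsContiguous(quads, SPOG));
        System.out.println("GSPO contiguous: " + pairsContiguous(quads, GSPO));
    }
}
```

With these three quads, SPOG order puts the g2 quad between the two g1/s1 quads, so the check is false for SPOG and true for GSPO.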
> I ask because we have a (large) dataset for which we wish to apply
> an operation (as it happens, text indexing) to each subject+graph
> in the graph exactly once. Currently we write code that runs the above
> find() call and processes the graph+subject if it has not already
> seen it, using a Set<Node> to remember subject Nodes.
This confuses me - where has the graph name gone?
Or are you assuming subjects only in one graph?
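If a subject can appear in more than one graph, a Set<Node> of subjects alone would skip it in every graph after the first. A minimal sketch of the set-based approach keyed on the (graph, subject) pair instead, in plain Java with hypothetical `Quad` and `GraphSubject` records; with Jena itself the iterator would come from DatasetGraph.find(Node.ANY, Node.ANY, Node.ANY, Node.ANY) and the key from Quad.getGraph() and Quad.getSubject(). `process` is a hypothetical callback standing in for the text-indexing step.

```java
import java.util.*;
import java.util.function.BiConsumer;

public class SeenSetDedup {
    // Hypothetical stand-in for a quad returned by find(ANY, ANY, ANY, ANY).
    record Quad(String g, String s, String p, String o) {}

    // Key on the pair, not the subject alone, so a subject in two graphs is visited in both.
    record GraphSubject(String g, String s) {}

    // Calls process once per distinct (graph, subject), in first-seen order.
    // Memory grows with the number of distinct pairs: this is the "big set".
    static int visitEachPairOnce(Iterator<Quad> quads, BiConsumer<String, String> process) {
        Set<GraphSubject> seen = new HashSet<>();
        while (quads.hasNext()) {
            Quad q = quads.next();
            if (seen.add(new GraphSubject(q.g(), q.s()))) { // add() is false if already present
                process.accept(q.g(), q.s());
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
                new Quad("g1", "s1", "p1", "o1"),
                new Quad("g2", "s1", "p1", "o1"),
                new Quad("g1", "s1", "p2", "o2"));
        int n = visitEachPairOnce(quads.iterator(),
                (g, s) -> System.out.println("index " + s + " in " + g));
        System.out.println(n + " distinct pairs");
    }
}
```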
> If all the quads with the same graph+subject turned up together we could
> dispense with this machinery and its overhead.
As you are prepared to make version- and TDB-specific assumptions, you
could access the specific TDB index you are interested in directly. You
will need to reconstruct Nodes from the NodeIds held in the index. Make a
QuadTable or TripleTable over just that one index. That way, you will see
GSPO order, which is the index you want.
GSPO is sorted by G then S then P then O.
Caveat emptor.
Caveat emptor^2 if it is a live database with write transactions going on
(still possible, but harder).
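With quads in GSPO order, all quads for one (graph, subject) pair form a single contiguous run, so remembering only the current key replaces the big set. A minimal sketch of that run detection in plain Java, assuming an iterator that already delivers quads in GSPO order; building one over a TDB index (QuadTable, NodeId to Node reconstruction) is not shown, and `Quad` and `onGroup` are hypothetical names.

```java
import java.util.*;
import java.util.function.BiConsumer;

public class SortedRunGrouping {
    // Hypothetical stand-in for a quad read from a GSPO-ordered source.
    record Quad(String g, String s, String p, String o) {}

    // Assumes equal (g, s) pairs are adjacent in the input (e.g. GSPO order).
    // Calls onGroup once per pair with that pair's quads; only the current group is kept.
    static void forEachGroup(Iterator<Quad> sorted, BiConsumer<List<String>, List<Quad>> onGroup) {
        List<String> currentKey = null;
        List<Quad> group = new ArrayList<>();
        while (sorted.hasNext()) {
            Quad q = sorted.next();
            List<String> key = List.of(q.g(), q.s());
            if (currentKey != null && !key.equals(currentKey)) {
                onGroup.accept(currentKey, group); // run ended: hand it off
                group = new ArrayList<>();
            }
            currentKey = key;
            group.add(q);
        }
        if (currentKey != null) onGroup.accept(currentKey, group); // flush the last run
    }

    public static void main(String[] args) {
        List<Quad> gspo = List.of(
                new Quad("g1", "s1", "p1", "o1"),
                new Quad("g1", "s1", "p2", "o2"),
                new Quad("g1", "s2", "p1", "o3"),
                new Quad("g2", "s1", "p1", "o1"));
        forEachGroup(gspo.iterator(),
                (key, quads) -> System.out.println(key + " has " + quads.size() + " quad(s)"));
    }
}
```

Memory is bounded by the largest single group rather than by the number of distinct pairs; if the indexing step needs only the pair, the group list can be dropped as well.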
> If not, well, we have other approaches in mind (to avoid big sets).
Do a backup, sort the N-Quads so that quads with the same (S,G) are
adjacent, and read that as input. This avoids the in-memory workspace for
(S,G) or (S), depending on which case we are in.
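The backup route needs a sort key that brings each (S,G) together. A naive sketch of extracting that key from one N-Quads line, in plain Java, under loud assumptions: one quad per line, whitespace between terms, no comment lines, and every line carries a graph label (for a default-graph line the last term is the object, which this split cannot tell apart). The class and method names are hypothetical, and the in-memory List.sort only stands in for the external sort a large dump would need.

```java
import java.util.*;

public class NQuadsSortKey {
    // Returns "subject<TAB>graph" for one N-Quads line.
    // ASSUMPTIONS: whitespace-separated terms, a graph label on every line, no comments.
    // A literal object may contain spaces; it is never inspected here.
    static String sortKey(String line) {
        String t = line.strip();
        if (t.endsWith(".")) t = t.substring(0, t.length() - 1).strip(); // drop the terminator
        String subject = t.split("\\s+", 2)[0];                          // first term
        int i = t.length() - 1;
        while (i >= 0 && !Character.isWhitespace(t.charAt(i))) i--;      // start of last term
        String graph = t.substring(i + 1);
        return subject + "\t" + graph;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(List.of(
                "<http://ex/s1> <http://ex/p> \"a b\" <http://ex/g2> .",
                "<http://ex/s2> <http://ex/p> <http://ex/o> <http://ex/g1> .",
                "<http://ex/s1> <http://ex/q> <http://ex/o> <http://ex/g1> .",
                "<http://ex/s1> <http://ex/r> \"c\" <http://ex/g2> ."));
        // In-memory stand-in for an external sort of a large backup file.
        lines.sort(Comparator.comparing(NQuadsSortKey::sortKey));
        lines.forEach(System.out::println);
    }
}
```

Any total order on the key works here, because grouping only needs equal keys to be adjacent; the sorted file can then be read with the same run detection as for a GSPO index, keyed on (S,G).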
Even more "caveat emptor" - it all depends.
Andy
> Chris
> [1] I'm not expecting such a promise, but it would be remiss of me
> not to check and dismiss it a priori ...