On 12/09/2019 07:23, Laura Morales wrote:
:dataset a ja:RDFDataset ;
ja:namedGraph [ ja:graphName "http://example/name" ;
ja:graph :graph1 ] ;
ja:namedGraph ...
a quick test of this one with ?default-graph-uri=urn:x-arq:UnionGraph on Fuseki
3.12.0 seems to run just fine (actually, it seems to respond even quicker than
the ja:UnionModel configuration).
Could you please explain in just a few words the difference between using
?default-graph-uri=urn:x-arq:UnionGraph and ja:UnionModel? I'm having a hard
time wrapping my head around this.
Fair point. There are different ways to achieve the same functionality.
ja:UnionModel builds a graph which is implemented with a number of
subgraphs. They can be any kind of graph - it's general purpose.
The implementation is class MultiUnion, which is also used in ontologies
(owl:imports) and in inference.
That is then put inside a general purpose dataset which can hold any
kind of graph.
A SPARQL query executes on this general structure and everything, even
the basic graph pattern evaluations, works in Node objects. Nodes
themselves are compound objects and the upshot is that comparisons are
being done on strings, via several layers of object reference indirection.
Access to the union graph is general purpose but involves removing
duplicates, and in this general case that is a bit expensive: it uses a
set to remember what has been seen.
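
To illustrate the "set to remember what's been seen" point, here is a minimal sketch in Python. It is not Jena's MultiUnion code; the function name `union_find` and the sample triples are made up for the example. It only shows why the general case needs memory proportional to the number of distinct triples.

```python
from typing import Iterable, Iterator, Set, Tuple

# A triple of plain strings stands in for a Jena Triple of Node objects.
Triple = Tuple[str, str, str]

def union_find(subgraphs: Iterable[Iterable[Triple]]) -> Iterator[Triple]:
    """Yield each triple once across all subgraphs (hypothetical helper)."""
    seen: Set[Triple] = set()  # grows with every distinct triple returned
    for graph in subgraphs:
        for triple in graph:
            if triple not in seen:  # hashing and comparing strings
                seen.add(triple)
                yield triple

g1 = [("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")]
g2 = [("ex:a", "ex:p", "ex:b"), ("ex:d", "ex:p", "ex:e")]
result = list(union_find([g1, g2]))
```

The subgraphs can arrive in any order, so nothing can be forgotten until the whole iteration is finished.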
?default-graph-uri=urn:x-arq:UnionGraph or GRAPH <urn:x-arq:UnionGraph>
or DatasetGraph.getUnionGraph all go to the actual TDB dataset and query
execution of basic graph patterns is done with TDB's NodeIds, which are
cheaper to compare.
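
As a sketch of the first form, this is one way to build the request URL in Python. The endpoint `http://localhost:3030/ds/query` and the helper name `union_query_url` are assumptions for the example; substitute your own Fuseki service URL.

```python
from urllib.parse import urlencode

# Hypothetical local Fuseki query endpoint; adjust to your own service.
ENDPOINT = "http://localhost:3030/ds/query"

def union_query_url(endpoint: str, sparql: str) -> str:
    """Build a SPARQL protocol GET URL that names the union graph as default graph."""
    params = {
        "query": sparql,
        "default-graph-uri": "urn:x-arq:UnionGraph",
    }
    # urlencode percent-encodes the query text and the ':' in the graph URI.
    return endpoint + "?" + urlencode(params)

url = union_query_url(ENDPOINT, "SELECT * { ?s ?p ?o } LIMIT 10")
print(url)
```

The second form needs no parameter: the query itself would read `SELECT * { GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o } }`.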
String comparison (the general dataset with general graphs) is more
expensive than comparing longs (TDB NodeIds):
extra object indirection, more bytes to compare, more data moved from
main RAM, and less efficient use of L2 cache (very *very* roughly, that's
10x faster to access than RAM).
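
A toy sketch of the idea, in Python: each term is stored once in a table and triples hold only integer ids, so pattern matching compares integers and strings are looked up only for the final results. The `NodeTable` class here is invented for illustration and is not TDB's node table; TDB's real NodeIds are fixed-size values managed by its own storage layer.

```python
class NodeTable:
    """Toy dictionary encoding: term string <-> small integer id."""

    def __init__(self):
        self._ids = {}    # term -> id
        self._terms = []  # id -> term

    def id_for(self, term):
        node_id = self._ids.get(term)
        if node_id is None:
            node_id = len(self._terms)
            self._ids[term] = node_id
            self._terms.append(term)
        return node_id

    def term_for(self, node_id):
        return self._terms[node_id]

table = NodeTable()
triples = [
    ("http://example/alice", "http://example/knows", "http://example/bob"),
    ("http://example/bob", "http://example/knows", "http://example/carol"),
    ("http://example/alice", "http://example/name", "Alice"),
]
# Store triples as tuples of ids.
encoded = [tuple(table.id_for(t) for t in triple) for triple in triples]

# Match the pattern { ?s <http://example/knows> ?o } by comparing integers only.
knows = table.id_for("http://example/knows")
matches = [t for t in encoded if t[1] == knows]

# Turn ids back into terms only for the rows that are actually returned.
decoded = [tuple(table.term_for(i) for i in t) for t in matches]
```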
What is more, TDB does access to the union graph more efficiently
because it is close to the storage. Union access uses one quad index
(which one depends on the pattern details). Duplicate removal is cheaper
- the way indexing is done across graphs means that duplicate triples
are adjacent, so there is no need to remember more than the previous row.
cf. DISTINCT vs REDUCED.
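
A minimal sketch of that "previous row only" duplicate removal, in Python. It assumes a quad index sorted with the graph component last (subject, predicate, object, graph), and the integer values stand in for NodeIds; both are illustrative choices, not a description of which index TDB picks for a given pattern.

```python
def union_scan(spog_index):
    """Yield distinct triples from quads sorted as (s, p, o, g).

    Because the same triple in different graphs sorts next to itself,
    one remembered row is enough; no set of seen triples is needed.
    """
    previous = None
    for s, p, o, _g in spog_index:
        triple = (s, p, o)
        if triple != previous:
            yield triple
            previous = triple

# Integers stand in for NodeIds; the last component is the graph id.
quads = [
    (5, 2, 3, 12),
    (1, 2, 3, 10),
    (1, 2, 4, 10),
    (1, 2, 3, 11),
    (5, 2, 3, 11),
]
index = sorted(quads)  # a real index is already stored in sorted order
triples = list(union_scan(index))
```

Memory use is constant here, compared with a set that grows with the number of distinct results.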
More: a SPARQL COUNT(*) doesn't even create Nodes from TDB; it counts
over the internal data structures used by TDB.
Andy
set union mode on the dataset or (3.13.0) the
service for default to query all graphs.
my previous question about this one also stands, is there a new feature being
introduced in v3.13.0 regarding the union graph?
No - only the way to set the context on a per endpoint basis.
Thank you Andy for all the help.