On 12/09/2019 07:23, Laura Morales wrote:
:dataset a ja:RDFDataset ;
    ja:namedGraph [ ja:graphName "http://example/name" ;
                    ja:graph :graph1 ] ;
    ja:namedGraph ...


A quick test of this one with ?default-graph-uri=urn:x-arq:UnionGraph on Fuseki 
3.12.0 seems to run just fine (actually, it seems to respond even quicker than 
the ja:UnionModel configuration).
Could you please explain in just a few words the difference between using 
?default-graph-uri=urn:x-arq:UnionGraph and ja:UnionModel? I'm having a hard 
time wrapping my head around this.

Fair point. There are different ways to achieve the same functionality.

ja:UnionGraph builds a graph that is implemented with a number of subgraphs. The subgraphs can be any kind of graph; it's general purpose.

It is implemented by the class MultiUnion, which is also used for ontologies (owl:imports) and in inference.

That graph is then put inside a general-purpose dataset that can hold any kind of graph.

A SPARQL query executes on this general structure, and everything, even the basic graph pattern evaluation, works on Node objects. Nodes themselves are compound objects, and the upshot is that comparisons are done on strings, via several layers of object-reference indirection.

Access to the union graph is general purpose but involves removing duplicates, and in this general case that is a bit expensive: it needs a set to remember what has been seen.
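To make that cost concrete, here is a minimal sketch of the general-purpose approach, written in Python for brevity (Jena itself is Java) and not taken from Jena's MultiUnion code. It assumes triples are plain string tuples and that each subgraph can simply be iterated; the names `general_union` and `Triple` are hypothetical. The set of seen triples grows with the number of distinct results, which is the bookkeeping described above.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


def general_union(subgraphs: Iterable[Iterable[Triple]]) -> Iterator[Triple]:
    """Yield each distinct triple across all subgraphs, in first-seen order."""
    seen: Set[Triple] = set()  # must hold every distinct triple produced so far
    for graph in subgraphs:
        for triple in graph:
            # Membership test hashes and compares the strings themselves.
            if triple not in seen:
                seen.add(triple)
                yield triple
```

For example, with `g1 = [("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")]` and `g2 = [("ex:a", "ex:p", "ex:b"), ("ex:d", "ex:p", "ex:e")]`, `list(general_union([g1, g2]))` returns the three distinct triples, with the shared `ex:a ex:p ex:b` reported once.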

?default-graph-uri=urn:x-arq:UnionGraph or GRAPH <urn:x-arq:UnionGraph> or DatasetGraph.getUnionGraph all go to the actual TDB dataset, and query execution of basic graph patterns is done with TDB's NodeIds, which are cheaper to compare.

String comparison (the general dataset with general graphs) is more expensive than comparing longs (TDB NodeIds).
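A minimal sketch of the idea behind NodeIds, assuming a simple dictionary encoding: each distinct term is stored once and given an integer id, and pattern matching then compares only integers. `NodeTable`, `get_or_alloc`, `term` and `match` are hypothetical names chosen for illustration, not TDB's API, and in Python this shows the structure rather than the real performance difference.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

EncodedTriple = Tuple[int, int, int]  # (s, p, o) as integer ids


class NodeTable:
    """Hypothetical dictionary that assigns each distinct term an integer id."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def get_or_alloc(self, term: str) -> int:
        """Return the id for term, allocating the next free id if it is new."""
        node_id = self._ids.get(term)
        if node_id is None:
            node_id = len(self._terms)
            self._ids[term] = node_id
            self._terms.append(term)
        return node_id

    def term(self, node_id: int) -> str:
        """Decode an id back to its term (only needed when showing results)."""
        return self._terms[node_id]


def match(triples: Iterable[EncodedTriple], s: Optional[int] = None,
          p: Optional[int] = None, o: Optional[int] = None) -> Iterator[EncodedTriple]:
    """Yield encoded triples matching the pattern; None acts as a wildcard.

    Only integers are compared here; no string is touched.
    """
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t
```

In this sketch a long IRI is compared character by character at most once, when it is first looked up in the table; after that every join or filter step compares two integers.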

Extra object indirection, more bytes to compare, more data moved from main RAM, and less efficient use of the L2 cache (very *very* roughly, L2 is 10x faster to access than RAM).

What is more, TDB accesses the union graph more efficiently because it is close to the storage. A union access reads a single quad index (which one depends on the pattern details). Duplicate removal is also cheaper: the way indexing is done across graphs means that duplicate triples are adjacent, so there is no need to remember more than the previous row.

Cf. DISTINCT vs REDUCED.
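A minimal sketch of the cheaper duplicate removal, assuming a hypothetical quad index sorted in (subject, predicate, object, graph) order with NodeIds as plain integers. Because the graph comes last in the sort key, the same triple from different graphs lands in adjacent rows, so only the previous row has to be remembered. That is the kind of elimination REDUCED permits; here the sort order makes it remove every duplicate, as DISTINCT would. `union_from_spog` is a hypothetical name, not TDB code.

```python
from typing import Iterable, Iterator, Optional, Tuple

Quad = Tuple[int, int, int, int]  # (s, p, o, g) NodeIds, sorted in this order
Triple = Tuple[int, int, int]


def union_from_spog(index: Iterable[Quad]) -> Iterator[Triple]:
    """Project sorted quads onto triples, dropping adjacent duplicates.

    Assumes the input is sorted by (s, p, o, g); then all copies of a
    triple from different graphs are consecutive.
    """
    previous: Optional[Triple] = None  # the only state kept
    for s, p, o, _g in index:
        triple = (s, p, o)
        if triple != previous:
            yield triple
            previous = triple
```

For example, `list(union_from_spog(sorted([(1, 2, 3, 10), (1, 2, 3, 11), (1, 2, 4, 10), (5, 2, 3, 11)])))` yields `(1, 2, 3)`, `(1, 2, 4)` and `(5, 2, 3)`: the triple present in graphs 10 and 11 is reported once, with constant memory.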

More: a SPARQL COUNT(*) doesn't even create Nodes from TDB; it counts the internal data structure used by TDB.
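A minimal sketch of why COUNT(*) can skip Node creation, assuming result rows are tuples of integer ids and a hypothetical `CountingDecoder` that records each id-to-term lookup. Producing terms costs one lookup per slot per row, while counting needs none. `CountingDecoder`, `select_star` and `count_star` are illustrative names, not Jena or TDB API.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[int, ...]  # one result row of NodeIds


class CountingDecoder:
    """Hypothetical id -> term lookup that counts how often it is used."""

    def __init__(self, terms: Sequence[str]) -> None:
        self.terms = terms
        self.lookups = 0

    def decode(self, node_id: int) -> str:
        self.lookups += 1
        return self.terms[node_id]


def select_star(rows: Iterable[Row], decoder: CountingDecoder) -> List[Tuple[str, ...]]:
    """Materialise a term for every slot of every row (like SELECT *)."""
    return [tuple(decoder.decode(i) for i in row) for row in rows]


def count_star(rows: Iterable[Row]) -> int:
    """Count rows without decoding anything (like COUNT(*))."""
    return sum(1 for _ in rows)
```

With two rows of three ids each, `count_star` returns 2 and leaves `lookups` at 0, whereas `select_star` performs six lookups.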

    Andy





set union mode on the dataset or (3.13.0) the
service for default to query all graphs.


My previous question about this one also stands: is there a new feature being 
introduced in v3.13.0 regarding the union graph?

No - only the way to set the context on a per endpoint basis.



Thank you Andy for all the help.
