Re: "not in a transaction", but only sometimes

Andy Seaborne Thu, 12 Sep 2019 06:14:10 -0700



On 12/09/2019 07:23, Laura Morales wrote:

:dataset a ja:RDFDataset ;
    ja:namedGraph [ ja:graphName "http://example/name" ;
                    ja:graph :graph1 ] ;
    ja:namedGraph ...



A quick test of this one with ?default-graph-uri=urn:x-arq:UnionGraph on Fuseki 
3.12.0 seems to run just fine (actually, it seems to respond even quicker than 
the ja:UnionModel configuration).
Could you please explain in just a few words the difference between using 
?default-graph-uri=urn:x-arq:UnionGraph and ja:UnionModel? I'm having a hard 
time wrapping my head around this.


Fair point. There are different ways to achieve the same functionality.

ja:UnionModel builds a graph which is implemented with a number of subgraphs. Any kind of graph - it's general purpose.

It is the class MultiUnion, as used for ontologies (owl:imports) and in inference.

That is then put inside a general-purpose dataset which can hold any kind of graph.

A SPARQL query executes on this general structure, and everything, even the basic graph pattern evaluation, works on Node objects. Nodes themselves are compound objects, and the upshot is that comparisons are done on strings, via several layers of object reference indirection.

Access to the union graph is general purpose but involves removing duplicates, and in this general case that is a bit expensive: it needs a set to remember what has been seen.
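To make that cost concrete, here is a rough Java sketch of a general-purpose union with duplicate removal. It is not Jena's MultiUnion code: the class SetUnion and the Triple record are hypothetical, and terms are plain strings rather than Jena Nodes. The point it illustrates is that the "seen" set has to hold every distinct triple emitted so far.

```java
import java.util.*;

// Hypothetical sketch (not Jena's MultiUnion): union several "graphs"
// and remove duplicates by remembering every triple already emitted.
public class SetUnion {
    record Triple(String s, String p, String o) {}

    static List<Triple> union(List<List<Triple>> graphs) {
        Set<Triple> seen = new HashSet<>();   // grows with the size of the result
        List<Triple> out = new ArrayList<>();
        for (List<Triple> g : graphs) {
            for (Triple t : g) {
                // Set.add returns false if the triple was already seen
                if (seen.add(t)) {
                    out.add(t);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Triple a = new Triple(":s", ":p", ":o1");
        Triple b = new Triple(":s", ":p", ":o2");
        List<Triple> g1 = List.of(a, b);
        List<Triple> g2 = List.of(b);         // b appears in both graphs
        System.out.println(union(List.of(g1, g2)).size()); // 2
    }
}
```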

?default-graph-uri=urn:x-arq:UnionGraph, GRAPH <urn:x-arq:UnionGraph>, or DatasetGraph.getUnionGraph all go to the actual TDB dataset, and query execution of basic graph patterns is done with TDB's NodeIds, which are cheaper to compare.

String comparison (the general dataset with general graphs) is more expensive than comparing longs (TDB NodeIds).

Extra object indirection, more bytes to compare, more data moved from main RAM, less efficient use of the L2 cache (very *very* roughly, L2 is 10x faster to access than RAM).
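A hypothetical sketch of the NodeId idea: each distinct term string is assigned a long id once (a dictionary, loosely in the spirit of a node table), after which matching compares longs instead of strings. The class and method names here are made up for illustration and are not TDB's API.

```java
import java.util.*;

// Hypothetical dictionary encoding: each distinct term gets a long id once,
// so later matching can compare longs instead of strings.
public class NodeTableSketch {
    private final Map<String, Long> toId = new HashMap<>();
    private final List<String> toNode = new ArrayList<>();

    // Return the existing id for the term, or allocate the next one.
    long getOrAllocate(String term) {
        return toId.computeIfAbsent(term, t -> {
            toNode.add(t);
            return (long) (toNode.size() - 1);
        });
    }

    // Decode an id back to its term string (only needed for output).
    String getNode(long id) {
        return toNode.get((int) id);
    }

    public static void main(String[] args) {
        NodeTableSketch nt = new NodeTableSketch();
        long a = nt.getOrAllocate("<http://example/alice>");
        long b = nt.getOrAllocate("<http://example/alice>");
        long c = nt.getOrAllocate("<http://example/bob>");
        System.out.println(a == b);       // true: same term, same id
        System.out.println(a == c);       // false: different terms
        System.out.println(nt.getNode(c)); // <http://example/bob>
    }
}
```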

What is more, TDB accesses the union graph more efficiently because it is close to the storage. Union access uses one quad index (which one depends on the pattern details). Duplicate removal is cheaper - the way indexing is done across graphs means that duplicate triples are adjacent, so there is no need to remember more than the previous row.


Cf. DISTINCT (must remove all duplicates) vs REDUCED (may remove some or all of them, e.g. only adjacent ones).
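A rough sketch of the "only remember the previous row" trick, assuming quads are held as NodeId tuples sorted in S,P,O,G order (just one possible index order; which index is actually used depends on the pattern). The names are hypothetical. When the graph id is projected out, the same triple from different graphs lands on adjacent rows, so comparing with the previous row removes every duplicate.

```java
import java.util.*;

// Hypothetical sketch: quads as NodeId tuples {s, p, o, g} sorted in S,P,O,G order.
// After dropping g, duplicate triples are adjacent, so only the previous row is kept.
public class AdjacentDedup {
    static List<long[]> unionTriples(List<long[]> sortedQuads) {
        List<long[]> out = new ArrayList<>();
        long[] prev = null;
        for (long[] q : sortedQuads) {
            long[] t = {q[0], q[1], q[2]};   // project out the graph id
            if (prev == null || !Arrays.equals(prev, t)) {
                out.add(t);
            }
            prev = t;
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> quads = new ArrayList<>(List.of(
                new long[]{1, 2, 4, 10},
                new long[]{1, 2, 3, 11},     // same triple as the next row, other graph
                new long[]{1, 2, 3, 10}));
        // Lexicographic order on the id tuples gives S,P,O,G order.
        quads.sort((x, y) -> Arrays.compare(x, y));
        System.out.println(unionTriples(quads).size()); // 2
    }
}
```

Unlike the set-based version, memory use here does not grow with the result; the price is that the input must already be sorted, which is what the index provides.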

More: a SPARQL COUNT(*) doesn't even create Nodes from TDB; it counts the internal data structure used by TDB.
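A minimal sketch of counting on ids alone, assuming a hypothetical in-memory node table (a plain Map) and triples stored as long tuples: the constant in the pattern is looked up once, and the count never turns ids back into strings. This only illustrates the idea; it is not how TDB's COUNT is implemented.

```java
import java.util.*;

// Hypothetical sketch: COUNT(*) for the pattern (?s <p> ?o) using NodeIds only.
public class CountById {
    static long countWithPredicate(List<long[]> triples, long predId) {
        long n = 0;
        for (long[] t : triples) {
            if (t[1] == predId) {   // long comparison; no term string is built
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Long> nodeTable = Map.of("<http://example/p>", 7L, "<http://example/q>", 8L);
        List<long[]> triples = List.of(
                new long[]{1, 7, 2},
                new long[]{3, 7, 4},
                new long[]{1, 8, 2});
        long p = nodeTable.get("<http://example/p>"); // one lookup for the constant
        System.out.println(countWithPredicate(triples, p)); // 2
    }
}
```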


    Andy

set union mode on the dataset or (3.13.0) the
service for default to query all graphs.



My previous question about this one also stands: is there a new feature being 
introduced in v3.13.0 regarding the union graph?


No - only the way to set the context on a per endpoint basis.


Thank you Andy for all the help.
