Re: Jena ARQ Graph Pattern Matching Algorithm Question

Andy Seaborne Sat, 05 Jul 2014 13:24:13 -0700

On 03/07/14 08:44, Paulo Picota wrote:

Thanks for your answer Andy,


It really helped me to understand a little more how the BGP  works.

I have been readding a little more online and I found this:
http://answers.semanticweb.com/questions/2481/sparql-basic-graph-pattern-matching

Following that forum answer,  i have looked more into different Join
algorithm, mainly in wikipedia and some papers and one or two articles...

My initial interest was to learn about  the asymptotic notation of the
algorithm in order to have a reference in terms of nodes and links (within
the graph). Now i see that the joins algorithms work in terms of
relationships.

My understanding so far is that the Index-Join  algorithm that you guys
have implemented works on a Memory  constant value (pretty cool i got to
say)

So is a merge join (if the indexes were PSO and PSO, then any triplepattern with constant predicates can be done that way).


> and the algorith is close to O(n^2). The last part is a conclusion i

have come out with after reading about other algorithms (Merge-sort, Hash
join, nested loop) and also from the example (very basic one) found here:

http://oak.cs.ucla.edu/cs143/notes/join.pdf

Friend Andy, could you kindly tell me if Im on the right track? Is it safe
to say that the index-join algorithm used in ARQ to solve BGP  could  be
O(n^2) where is n the amount of relationships ? maybe im shooting a little
over the moon with this, but im very interested in knowing the asymptotic
notation.


Worst case, yes.

Any join when there is no shared key is |R|x|S| (size of right side *size of left) and there are |R|x|S| result rows.

There are other cost issues hidden behind the asymptotic performance ofthe join algorithm. For example, that all access into the data coststhe same fixed amount. That is not true - one (uncached) disk access isof the order of a million CPU instructions. SSDs are faster ("no" seektime but a long code path from program to SSD and back). Indexstructures have different growth characteristics. RAM cache levels makea difference.


        Andy


once again, thanks a lot for your time!

Paulo.





2014-06-27 14:40 GMT-05:00 Andy Seaborne <[email protected]>:

On 27/06/14 08:50, Paulo Picota wrote:

Hello friends =)

Im using ARQ to do some sparql queries to a OWL ontolgy for my Master
Thesis. One thing I need for my thesis is the order O(n) of the  graph
pattern matching algorithm that is used to resolve the querie after its
final transformations and optimization steps.

(http://jena.apache.org/documentation/query/arq-query-eval.html)

I have been looking for a while now and I cant find any reference to what
algorithm is been used.


You mean the basic graph pattern matcher?

In the released code, it is a form of index-join done by substituting
results from earlier parts of the pattern repeated into the later parts.
It's not O(n) for whatever the n is measuring.

It is constant memory - one of the advantages of the implementation is
that it does not consume increased working space as the data increases.

For patterns that have moderate

There is no reason it has to be this algorithms - there is code elsewhere
that uses hash joins, merge joins or sorted merge joins.

Merge joins are useful in TDB (if you run with a different index set).


  if there is a paper or any one has the information about the order of the

algorithm, its medium and worst case scenario, or maybe some documentation
I could read about this, it will be a great help.


The code is good place to look.

         Andy

thanks So much,

Best regards =)

Re: Jena ARQ Graph Pattern Matching Algorithm Question

Reply via email to