On 03/07/14 08:44, Paulo Picota wrote:
Thanks for your answer Andy,

It really helped me to understand a little more how the BGP  works.

I have been readding a little more online and I found this:
http://answers.semanticweb.com/questions/2481/sparql-basic-graph-pattern-matching

Following that forum answer,  i have looked more into different Join
algorithm, mainly in wikipedia and some papers and one or two articles...

My initial interest was to learn about  the asymptotic notation of the
algorithm in order to have a reference in terms of nodes and links (within
the graph). Now i see that the joins algorithms work in terms of
relationships.

My understanding so far is that the Index-Join  algorithm that you guys
have implemented works on a Memory  constant value (pretty cool i got to
say)

So is a merge join (if the indexes were PSO and PSO, then any triple pattern with constant predicates can be done that way).

> and the algorith is close to O(n^2). The last part is a conclusion i
have come out with after reading about other algorithms (Merge-sort, Hash
join, nested loop) and also from the example (very basic one) found here:

http://oak.cs.ucla.edu/cs143/notes/join.pdf

Friend Andy, could you kindly tell me if Im on the right track? Is it safe
to say that the index-join algorithm used in ARQ to solve BGP  could  be
O(n^2) where is n the amount of relationships ? maybe im shooting a little
over the moon with this, but im very interested in knowing the asymptotic
notation.

Worst case, yes.

Any join when there is no shared key is |R|x|S| (size of right side * size of left) and there are |R|x|S| result rows.

There are other cost issues hidden behind the asymptotic performance of the join algorithm. For example, that all access into the data costs the same fixed amount. That is not true - one (uncached) disk access is of the order of a million CPU instructions. SSDs are faster ("no" seek time but a long code path from program to SSD and back). Index structures have different growth characteristics. RAM cache levels make a difference.

        Andy


once again, thanks a lot for your time!

Paulo.





2014-06-27 14:40 GMT-05:00 Andy Seaborne <[email protected]>:

On 27/06/14 08:50, Paulo Picota wrote:

Hello friends =)

Im using ARQ to do some sparql queries to a OWL ontolgy for my Master
Thesis. One thing I need for my thesis is the order O(n) of the  graph
pattern matching algorithm that is used to resolve the querie after its
final transformations and optimization steps.

(http://jena.apache.org/documentation/query/arq-query-eval.html)

I have been looking for a while now and I cant find any reference to what
algorithm is been used.


You mean the basic graph pattern matcher?

In the released code, it is a form of index-join done by substituting
results from earlier parts of the pattern repeated into the later parts.
It's not O(n) for whatever the n is measuring.

It is constant memory - one of the advantages of the implementation is
that it does not consume increased working space as the data increases.

For patterns that have moderate

There is no reason it has to be this algorithms - there is code elsewhere
that uses hash joins, merge joins or sorted merge joins.

Merge joins are useful in TDB (if you run with a different index set).


  if there is a paper or any one has the information about the order of the
algorithm, its medium and worst case scenario, or maybe some documentation
I could read about this, it will be a great help.


The code is good place to look.

         Andy



thanks So much,

Best regards =)





Reply via email to