Thank you SO much for your help =)


2014-07-05 15:23 GMT-05:00 Andy Seaborne <[email protected]>:

> On 03/07/14 08:44, Paulo Picota wrote:
>
>> Thanks for your answer Andy,
>>
>> It really helped me to understand a little more how the BGP  works.
>>
>> I have been readding a little more online and I found this:
>> http://answers.semanticweb.com/questions/2481/sparql-
>> basic-graph-pattern-matching
>>
>> Following that forum answer,  i have looked more into different Join
>> algorithm, mainly in wikipedia and some papers and one or two articles...
>>
>> My initial interest was to learn about  the asymptotic notation of the
>> algorithm in order to have a reference in terms of nodes and links (within
>> the graph). Now i see that the joins algorithms work in terms of
>> relationships.
>>
>> My understanding so far is that the Index-Join  algorithm that you guys
>> have implemented works on a Memory  constant value (pretty cool i got to
>> say)
>>
>
> So is a merge join (if the indexes were PSO and PSO, then any triple
> pattern with constant predicates can be done that way).
>
>
> > and the algorith is close to O(n^2). The last part is a conclusion i
>
>> have come out with after reading about other algorithms (Merge-sort, Hash
>> join, nested loop) and also from the example (very basic one) found here:
>>
>> http://oak.cs.ucla.edu/cs143/notes/join.pdf
>>
>> Friend Andy, could you kindly tell me if Im on the right track? Is it safe
>> to say that the index-join algorithm used in ARQ to solve BGP  could  be
>> O(n^2) where is n the amount of relationships ? maybe im shooting a little
>> over the moon with this, but im very interested in knowing the asymptotic
>> notation.
>>
>
> Worst case, yes.
>
> Any join when there is no shared key is |R|x|S| (size of right side * size
> of left) and there are |R|x|S| result rows.
>
> There are other cost issues hidden behind the asymptotic performance of
> the join algorithm.  For example, that all access into the data costs the
> same fixed amount.  That is not true - one (uncached) disk access is of the
> order of a million CPU instructions.  SSDs are faster ("no" seek time but a
> long code path from program to SSD and back).  Index structures have
> different growth characteristics.  RAM cache levels make a difference.
>
>         Andy
>
>
>> once again, thanks a lot for your time!
>>
>> Paulo.
>>
>>
>>
>>
>>
>> 2014-06-27 14:40 GMT-05:00 Andy Seaborne <[email protected]>:
>>
>>  On 27/06/14 08:50, Paulo Picota wrote:
>>>
>>>  Hello friends =)
>>>>
>>>> Im using ARQ to do some sparql queries to a OWL ontolgy for my Master
>>>> Thesis. One thing I need for my thesis is the order O(n) of the  graph
>>>> pattern matching algorithm that is used to resolve the querie after its
>>>> final transformations and optimization steps.
>>>>
>>>> (http://jena.apache.org/documentation/query/arq-query-eval.html)
>>>>
>>>> I have been looking for a while now and I cant find any reference to
>>>> what
>>>> algorithm is been used.
>>>>
>>>>
>>> You mean the basic graph pattern matcher?
>>>
>>> In the released code, it is a form of index-join done by substituting
>>> results from earlier parts of the pattern repeated into the later parts.
>>> It's not O(n) for whatever the n is measuring.
>>>
>>> It is constant memory - one of the advantages of the implementation is
>>> that it does not consume increased working space as the data increases.
>>>
>>> For patterns that have moderate
>>>
>>> There is no reason it has to be this algorithms - there is code elsewhere
>>> that uses hash joins, merge joins or sorted merge joins.
>>>
>>> Merge joins are useful in TDB (if you run with a different index set).
>>>
>>>
>>>   if there is a paper or any one has the information about the order of
>>> the
>>>
>>>> algorithm, its medium and worst case scenario, or maybe some
>>>> documentation
>>>> I could read about this, it will be a great help.
>>>>
>>>>
>>> The code is good place to look.
>>>
>>>          Andy
>>>
>>>
>>>
>>>  thanks So much,
>>>>
>>>> Best regards =)
>>>>
>>>>
>>>>
>>>
>>
>

Reply via email to