Thank you SO much for your help =)
2014-07-05 15:23 GMT-05:00 Andy Seaborne <[email protected]>: > On 03/07/14 08:44, Paulo Picota wrote: > >> Thanks for your answer Andy, >> >> It really helped me to understand a little more how the BGP works. >> >> I have been readding a little more online and I found this: >> http://answers.semanticweb.com/questions/2481/sparql- >> basic-graph-pattern-matching >> >> Following that forum answer, i have looked more into different Join >> algorithm, mainly in wikipedia and some papers and one or two articles... >> >> My initial interest was to learn about the asymptotic notation of the >> algorithm in order to have a reference in terms of nodes and links (within >> the graph). Now i see that the joins algorithms work in terms of >> relationships. >> >> My understanding so far is that the Index-Join algorithm that you guys >> have implemented works on a Memory constant value (pretty cool i got to >> say) >> > > So is a merge join (if the indexes were PSO and PSO, then any triple > pattern with constant predicates can be done that way). > > > > and the algorith is close to O(n^2). The last part is a conclusion i > >> have come out with after reading about other algorithms (Merge-sort, Hash >> join, nested loop) and also from the example (very basic one) found here: >> >> http://oak.cs.ucla.edu/cs143/notes/join.pdf >> >> Friend Andy, could you kindly tell me if Im on the right track? Is it safe >> to say that the index-join algorithm used in ARQ to solve BGP could be >> O(n^2) where is n the amount of relationships ? maybe im shooting a little >> over the moon with this, but im very interested in knowing the asymptotic >> notation. >> > > Worst case, yes. > > Any join when there is no shared key is |R|x|S| (size of right side * size > of left) and there are |R|x|S| result rows. > > There are other cost issues hidden behind the asymptotic performance of > the join algorithm. For example, that all access into the data costs the > same fixed amount. That is not true - one (uncached) disk access is of the > order of a million CPU instructions. SSDs are faster ("no" seek time but a > long code path from program to SSD and back). Index structures have > different growth characteristics. RAM cache levels make a difference. > > Andy > > >> once again, thanks a lot for your time! >> >> Paulo. >> >> >> >> >> >> 2014-06-27 14:40 GMT-05:00 Andy Seaborne <[email protected]>: >> >> On 27/06/14 08:50, Paulo Picota wrote: >>> >>> Hello friends =) >>>> >>>> Im using ARQ to do some sparql queries to a OWL ontolgy for my Master >>>> Thesis. One thing I need for my thesis is the order O(n) of the graph >>>> pattern matching algorithm that is used to resolve the querie after its >>>> final transformations and optimization steps. >>>> >>>> (http://jena.apache.org/documentation/query/arq-query-eval.html) >>>> >>>> I have been looking for a while now and I cant find any reference to >>>> what >>>> algorithm is been used. >>>> >>>> >>> You mean the basic graph pattern matcher? >>> >>> In the released code, it is a form of index-join done by substituting >>> results from earlier parts of the pattern repeated into the later parts. >>> It's not O(n) for whatever the n is measuring. >>> >>> It is constant memory - one of the advantages of the implementation is >>> that it does not consume increased working space as the data increases. >>> >>> For patterns that have moderate >>> >>> There is no reason it has to be this algorithms - there is code elsewhere >>> that uses hash joins, merge joins or sorted merge joins. >>> >>> Merge joins are useful in TDB (if you run with a different index set). >>> >>> >>> if there is a paper or any one has the information about the order of >>> the >>> >>>> algorithm, its medium and worst case scenario, or maybe some >>>> documentation >>>> I could read about this, it will be a great help. >>>> >>>> >>> The code is good place to look. >>> >>> Andy >>> >>> >>> >>> thanks So much, >>>> >>>> Best regards =) >>>> >>>> >>>> >>> >> >
