There was some interesting research from Microsoft about their Trinity
RDF store, could be relevant here:
http://research.microsoft.com/apps/pubs/default.aspx?id=183717

On Fri, Jan 9, 2015 at 7:05 PM, Andy Seaborne <[email protected]> wrote:
> On 09/01/15 11:08, Olivier Rossel wrote:
>>
>> Hi everyone.
>>
>> I have this (classical) graph traversal use case:
>> in a graph, let's find all the paths from resource A to resource B.
>>
>> AFAIU, SPARQL has a dedicated syntax for that.
>
>
> http://www.w3.org/TR/sparql11-query/#propertypaths
>
>> (called property path).
>
>
> Yes-ish.  You can write
>
> ?x :property* ?y .
>
> You can't find out about the path chosen, e.g. length (or how many). ?var*
> is illegal but there are tricks for "any property" like usage with
> NegatedPropertySets (probably perform badly).
>
>> In my use case, the property paths to match have unknown length.
>>
>> I think Virtuoso stores triples in a g,s,p,o table.
>> So property paths of unknown length will be translated into a massive
>> amount
>> of self-joins.
>> Does it sound right?
>
>
> Ask Openlink?
>
> Seems likely. As ever there are implementation choices - in RDF stores,
> self-joins are usually not bad (and they are very common).
>
>> How would TDB handle those property paths?
>> With a similar strategy?
>
>
> Yes [*]
>
> ([*] unless you run some magic in the query execution engine, for example,
> to support RDFS at scale.)
>
>> And then, the graph databases.
>
>
> Of several different kinds - Apache Giraph is very different to Neo4J in
> target usage.
>
> https://giraph.apache.org/
> https://spark.apache.org/graphx/
>
>> Do they have an internal structure that makes this graph traversal more
>> natural?
>> Or do they rely, eventually, on similar strategies?
>
>
> It's hard to answer simply - it depends on how you look at them.
>
> One step of parallel graph traversal looks very much like a join to me.
>
> Neo claim they have no joins because they have pointers (it's comceptually
> an in-memory design as described in the Graph Databases book - the design
> has moved on since that description).  I'd say that looking at all the
> pointer following in parallel, the pointers are remarkably like index joins
> and hash tables, especially as it goes cluster.
>
> http://graphdatabases.com/
>
> The analytics engines - Apache Giraph and many others - are fundamentally
> different.  They are node engines with highly parallel transversal.
>
> "Designing Data-Intensive Applications" has some material on graph
> databases.
> http://shop.oreilly.com/product/0636920032175.do
>
>         Andy
>
>
>
>>
>> Any help of that subject would be of great interest.
>>
>
>
>

Reply via email to