Re: Graph traversal and Jena TDB

Andy Seaborne Fri, 09 Jan 2015 10:07:51 -0800

On 09/01/15 11:08, Olivier Rossel wrote:

Hi everyone.


I have this (classical) graph traversal use case:
in a graph, let's find all the paths from resource A to resource B.

AFAIU, SPARQL has a dedicated syntax for that.


http://www.w3.org/TR/sparql11-query/#propertypaths

(called property path).


Yes-ish.  You can write

?x :property* ?y .

You can't find out about the path chosen, e.g. length (or how many).?var* is illegal but there are tricks for "any property" like usage withNegatedPropertySets (probably perform badly).

In my use case, the property paths to match have unknown length.

I think Virtuoso stores triples in a g,s,p,o table.
So property paths of unknown length will be translated into a massive amount
of self-joins.
Does it sound right?


Ask Openlink?

Seems likely. As ever there are implementation choices - in RDF stores,self-joins are usually not bad (and they are very common).

How would TDB handle those property paths?
With a similar strategy?


Yes [*]

([*] unless you run some magic in the query execution engine, forexample, to support RDFS at scale.)

And then, the graph databases.

Of several different kinds - Apache Giraph is very different to Neo4J intarget usage.


https://giraph.apache.org/
https://spark.apache.org/graphx/

Do they have an internal structure that makes this graph traversal more natural?
Or do they rely, eventually, on similar strategies?


It's hard to answer simply - it depends on how you look at them.

One step of parallel graph traversal looks very much like a join to me.

Neo claim they have no joins because they have pointers (it'scomceptually an in-memory design as described in the Graph Databasesbook - the design has moved on since that description). I'd say thatlooking at all the pointer following in parallel, the pointers areremarkably like index joins and hash tables, especially as it goes cluster.


http://graphdatabases.com/

The analytics engines - Apache Giraph and many others - arefundamentally different. They are node engines with highly paralleltransversal.

"Designing Data-Intensive Applications" has some material on graphdatabases.

http://shop.oreilly.com/product/0636920032175.do

        Andy


Any help of that subject would be of great interest.

Re: Graph traversal and Jena TDB

Reply via email to