On 09/01/15 11:08, Olivier Rossel wrote:
Hi everyone.
I have this (classical) graph traversal use case:
in a graph, let's find all the paths from resource A to resource B.
AFAIU, SPARQL has a dedicated syntax for that.
http://www.w3.org/TR/sparql11-query/#propertypaths
(called property path).
Yes-ish. You can write
?x :property* ?y .
You can't find out about the path chosen, e.g. length (or how many).
?var* is illegal but there are tricks for "any property" like usage with
NegatedPropertySets (probably perform badly).
In my use case, the property paths to match have unknown length.
I think Virtuoso stores triples in a g,s,p,o table.
So property paths of unknown length will be translated into a massive amount
of self-joins.
Does it sound right?
Ask Openlink?
Seems likely. As ever there are implementation choices - in RDF stores,
self-joins are usually not bad (and they are very common).
How would TDB handle those property paths?
With a similar strategy?
Yes [*]
([*] unless you run some magic in the query execution engine, for
example, to support RDFS at scale.)
And then, the graph databases.
Of several different kinds - Apache Giraph is very different to Neo4J in
target usage.
https://giraph.apache.org/
https://spark.apache.org/graphx/
Do they have an internal structure that makes this graph traversal more natural?
Or do they rely, eventually, on similar strategies?
It's hard to answer simply - it depends on how you look at them.
One step of parallel graph traversal looks very much like a join to me.
Neo claim they have no joins because they have pointers (it's
comceptually an in-memory design as described in the Graph Databases
book - the design has moved on since that description). I'd say that
looking at all the pointer following in parallel, the pointers are
remarkably like index joins and hash tables, especially as it goes cluster.
http://graphdatabases.com/
The analytics engines - Apache Giraph and many others - are
fundamentally different. They are node engines with highly parallel
transversal.
"Designing Data-Intensive Applications" has some material on graph
databases.
http://shop.oreilly.com/product/0636920032175.do
Andy
Any help of that subject would be of great interest.