There was some interesting research from Microsoft about their Trinity RDF store, could be relevant here: http://research.microsoft.com/apps/pubs/default.aspx?id=183717
On Fri, Jan 9, 2015 at 7:05 PM, Andy Seaborne <[email protected]> wrote: > On 09/01/15 11:08, Olivier Rossel wrote: >> >> Hi everyone. >> >> I have this (classical) graph traversal use case: >> in a graph, let's find all the paths from resource A to resource B. >> >> AFAIU, SPARQL has a dedicated syntax for that. > > > http://www.w3.org/TR/sparql11-query/#propertypaths > >> (called property path). > > > Yes-ish. You can write > > ?x :property* ?y . > > You can't find out about the path chosen, e.g. length (or how many). ?var* > is illegal but there are tricks for "any property" like usage with > NegatedPropertySets (probably perform badly). > >> In my use case, the property paths to match have unknown length. >> >> I think Virtuoso stores triples in a g,s,p,o table. >> So property paths of unknown length will be translated into a massive >> amount >> of self-joins. >> Does it sound right? > > > Ask Openlink? > > Seems likely. As ever there are implementation choices - in RDF stores, > self-joins are usually not bad (and they are very common). > >> How would TDB handle those property paths? >> With a similar strategy? > > > Yes [*] > > ([*] unless you run some magic in the query execution engine, for example, > to support RDFS at scale.) > >> And then, the graph databases. > > > Of several different kinds - Apache Giraph is very different to Neo4J in > target usage. > > https://giraph.apache.org/ > https://spark.apache.org/graphx/ > >> Do they have an internal structure that makes this graph traversal more >> natural? >> Or do they rely, eventually, on similar strategies? > > > It's hard to answer simply - it depends on how you look at them. > > One step of parallel graph traversal looks very much like a join to me. > > Neo claim they have no joins because they have pointers (it's comceptually > an in-memory design as described in the Graph Databases book - the design > has moved on since that description). I'd say that looking at all the > pointer following in parallel, the pointers are remarkably like index joins > and hash tables, especially as it goes cluster. > > http://graphdatabases.com/ > > The analytics engines - Apache Giraph and many others - are fundamentally > different. They are node engines with highly parallel transversal. > > "Designing Data-Intensive Applications" has some material on graph > databases. > http://shop.oreilly.com/product/0636920032175.do > > Andy > > > >> >> Any help of that subject would be of great interest. >> > > >
