[Neo4j] Enhanced API rewrite

Niels Hoogeveen Fri, 05 Aug 2011 17:51:37 -0700

Today I pushed a major rewrite of the Enhanced API. See: 
https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/graphdb


Originally the Enhanced API was a drop-in replacement of the standard Neo4j 
API. This resulted in lots of wrapper classes that needed to be maintained.

The rewrite of Enhanced API is no longer a drop-in replacement and contains no 
interface/class names that can be found in the standard API.

Enhanced API no longer speaks of Nodes but of Vertices and doesn't speak of 
Relationships but of Edges. This helps to prevent name clashes at the expense 
of somewhat less recognizable names (Relationship is after all a more common 
word than Edge). 

This rewrite is not merely a renaming of classes and interfaces; it is for the 
most part a complete rewrite and also a rethinking of the API on my part.

Enhanced API consists of two basic elements: Vertex and EdgeRole. Most elements 
are a subclass of Vertex, though there are some specialized versions of 
EdgeRole.

Let me start with an example:

Suppose we have two vertices denoting the persons Tom and Paula, and we want to 
state that Tom is the father of Paula.

For standard Neo4j we tend to write such a fact as:

Tom --Father--> Paula

For Enhanced API we can conceptually write this fact as follows:

       --StartRole--Tom
Father 
       --EndRole--Paula

This should be read as follows: We have two Vertices: Tom and Paula and we have 
a BinaryEdge (similar to a Relationship in the standard API) of type "Father", 
where Tom has the StartRole for that edge and Paula has the EndRole for that 
edge.

So instead of a directed graph, we conceptually have an undirected bipartite 
graph.

For binary edges (edges between two vertices), this is mostly conceptually the 
case, because the API will simply allow you to write: tom.createEdgeTo(paula, 
FATHER) (similar to tom.createRelationshipTo(paula, FATHER) as we would have in 
the standard API). 

It is also possible to fetch the start vertex of the binary relationship with 
the method: edge.getStartVertex() (similar to relationship.getStartNode()), 
although it is also possible to treat the binary edge as a generic edge and 
fetch that Vertex as: edge.getElement(db.getStartRole()). 

BinaryEdges are a special case and have special methods which cover the same 
functionality as can be found in the standard Neo4j API.
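
To make the two views of a binary edge concrete, here is a minimal, 
self-contained sketch. The classes below (BinaryEdgeSketch, Role, Vertex, Edge) 
are hypothetical stand-ins written for illustration and are not the actual 
Enhanced API types; in particular, this toy Edge is not itself a Vertex. The 
sketch only shows how a convenience method such as getStartVertex() can be a 
thin wrapper around generic, role-based access.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical toy model, NOT the real Enhanced API classes.
final class BinaryEdgeSketch {
    // The two predefined roles of a binary edge.
    enum Role { START, END }

    static final class Vertex {
        final String name;

        Vertex(String name) { this.name = name; }

        // Convenience method in the spirit of tom.createEdgeTo(paula, FATHER).
        Edge createEdgeTo(Vertex end, String type) {
            return new Edge(type, this, end);
        }
    }

    static final class Edge {
        final String type;
        // Each role maps to the vertex connected to the edge in that role.
        private final Map<Role, Vertex> elements = new EnumMap<>(Role.class);

        Edge(String type, Vertex start, Vertex end) {
            this.type = type;
            elements.put(Role.START, start);
            elements.put(Role.END, end);
        }

        // Generic, role-based access.
        Vertex getElement(Role role) { return elements.get(role); }

        // Binary shortcuts expressed in terms of the generic access.
        Vertex getStartVertex() { return getElement(Role.START); }
        Vertex getEndVertex() { return getElement(Role.END); }
    }
}
```

With this model, tom.createEdgeTo(paula, "FATHER").getStartVertex() and 
getElement(Role.START) on the same edge return the same Vertex object.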

In general, we can say that Vertices are connected to Edges by means of 
EdgeRoles. In the binary case there are two predefined EdgeRoles: StartRole and 
EndRole.

Before we get deeper into the general case of n-ary edges, let's first look at 
another special case: Properties.

Properties can be thought of as unary edges: edges that connect to only one 
Vertex (as opposed to two in the binary case). 

Suppose we want to state that Tom is 49 years old. We can write that as:

age(49)--PropertyRole--Tom

We have an edge of type "age" that is connected to the vertex Tom in the role 
of a property.

Again this is mostly conceptually true, because there are lots of methods in 
Enhanced API that are very similar to the ones found in the standard API: 
getProperty, hasProperty, setProperty. Alternatively, we can call methods on 
the property itself; after all, the age property connected to the Vertex "Tom" 
is an object in its own right. More precisely, it is a Property and with that 
it is a UnaryEdge, which is an Edge, which is a Vertex.

From the age property we can fetch the PropertyType, but we can also ask for 
the Vertex it is connected to: getVertex(). Since a Property is an Edge we can 
also fetch the connected vertex (Tom) as follows: 
age.getElement(db.getPropertyRole()).

So we have seen the two special cases: unary edges and binary edges, which work 
very much the same as properties and Relationships in the standard Neo4j API, 
though we have given them a conceptually different perspective that unifies the 
two and fits them neatly into the general case of N-ary edges.

As said before, an Edge is a Vertex that connects other Vertices by means of 
EdgeRoles. Since Edges are Vertices, they can have other Edges connected to 
them. Or in standard API talk: relationships can be connected to other 
relationships and they can have properties.

The concept of EdgeRoles separates Edges from Vertices, so we will effectively 
have a bipartite graph where Vertices can only connect to Edges and Edges can 
only connect to Vertices. Given the fact that Edges are also Vertices, Edges 
can be connected to Edges, but in such a case it is unambiguous which plays the 
role of Edge and which plays the role of Vertex in that connection. 

Let's look at an example of an N-ary edge:

Suppose we want to state the fact that Tom gives Paula a Bicycle (no golden 
helicopters in stock today). We can write that as follows:

      --Giver--Tom
GIVES --Recipient -- Paula
      --Gift -- Bicycle

There is an EdgeType GIVES which defines three EdgeRoles: Giver, Recipient and 
Gift, which connect Tom, Paula and Bicycle to the Edge.

The edge is created by first creating three EdgeElement objects that each 
contain a Role and the connected Vertex. We can then make the call 
db.createEdge(GIVES, edgeElements).

An EdgeElement is whatever is connected to an Edge for a particular EdgeRole 
(including that EdgeRole itself). 

An EdgeElement can contain more than one connected Vertex. We can for example 
state: Tom and Dick give Paula a Bicycle. 

In Enhanced API notation:

      --Giver--Tom, Dick
GIVES --Recipient -- Paula
      --Gift -- Bicycle

Or we may want to state: Tom, Dick and Harry give Paula and Josephine a Bicycle 
and an Icecream. 

In Enhanced API notation:

      --Giver--Tom, Dick, Harry
GIVES --Recipient -- Paula, Josephine
      --Gift -- Bicycle, Icecream

The API allows the user to fetch an EdgeElement by means of an EdgeRole and 
iterate over the connected Vertices:

for(EdgeElement givers: gives.getElements(Giver)){
  for(Vertex giver: givers.getVertices()){
     //do something with the giver Vertex
  }
}
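
For readers who want to experiment with the idea, the following is a 
self-contained sketch of an N-ary edge built from EdgeElements. All classes 
here are hypothetical stand-ins (roles are plain strings, for instance) and the 
real Enhanced API may differ in names and signatures; the sketch only shows the 
shape of the data and the nested iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT the real Enhanced API classes.
final class NaryEdgeSketch {
    static final class Vertex {
        final String name;

        Vertex(String name) { this.name = name; }
    }

    // A role together with the vertices connected to the edge in that role.
    static final class EdgeElement {
        final String role;
        private final List<Vertex> vertices;

        EdgeElement(String role, Vertex... vertices) {
            this.role = role;
            this.vertices = new ArrayList<>(Arrays.asList(vertices));
        }

        List<Vertex> getVertices() { return vertices; }
    }

    static final class Edge {
        final String type;
        private final List<EdgeElement> elements;

        Edge(String type, EdgeElement... elements) {
            this.type = type;
            this.elements = new ArrayList<>(Arrays.asList(elements));
        }

        // Returns the elements attached to this edge for the given role.
        List<EdgeElement> getElements(String role) {
            List<EdgeElement> result = new ArrayList<>();
            for (EdgeElement element : elements) {
                if (element.role.equals(role)) {
                    result.add(element);
                }
            }
            return result;
        }
    }

    // Collects the names of all vertices playing the given role in the edge.
    static List<String> namesInRole(Edge edge, String role) {
        List<String> names = new ArrayList<>();
        for (EdgeElement element : edge.getElements(role)) {
            for (Vertex vertex : element.getVertices()) {
                names.add(vertex.name);
            }
        }
        return names;
    }
}
```

Building the "Tom and Dick give Paula a Bicycle" edge in this model means 
passing three EdgeElements to the Edge constructor, one per role, with the 
Giver element holding two vertices.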

For those cases where an EdgeElement can contain only one Vertex, there is a 
FunctionalEdgeElement, which can only be used in conjunction with 
FunctionalEdgeRoles. 

StartRole, EndRole and PropertyRole are all FunctionalEdgeRoles, since we can 
have only one start Vertex and one end Vertex per BinaryEdge (just like there 
can only be one StartNode and one EndNode for a Relationship in the standard 
API) and we can only have one Vertex associated with a Property (just like a 
property cannot belong to two different Nodes in the standard Neo4j API).
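
One way to picture the restriction that functional roles impose is a check at 
construction time. The sketch below is purely illustrative: the "functional" 
flag and the exception are assumptions of this example, and the real API uses 
separate FunctionalEdgeRole and FunctionalEdgeElement types rather than a 
boolean.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, NOT the real Enhanced API classes.
final class FunctionalRoleSketch {
    static final class EdgeRole {
        final String name;
        // Assumption of this sketch: a flag marks roles that allow one vertex only.
        final boolean functional;

        EdgeRole(String name, boolean functional) {
            this.name = name;
            this.functional = functional;
        }
    }

    static final class EdgeElement {
        final EdgeRole role;
        final List<String> vertices;

        EdgeElement(EdgeRole role, String... vertices) {
            // A functional role must be filled by exactly one vertex.
            if (role.functional && vertices.length != 1) {
                throw new IllegalArgumentException(
                    role.name + " is functional and needs exactly one vertex");
            }
            this.role = role;
            this.vertices = Collections.unmodifiableList(Arrays.asList(vertices.clone()));
        }
    }
}
```

Here a role such as StartRole would be created with the flag set, so an 
element with two vertices for it is rejected, while a role such as Giver 
accepts any number of vertices.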

The Enhanced API can be used in conjunction with standard Neo4j API. The only 
replacement needed is that of the database instance. The Enhanced API defines a 
DatabaseService interface, which extends the standard GraphDatabaseService 
interface and adds several enhanced methods for the creation and lookup of 
Vertices, Edges and several kinds of VertexTypes.

Now the big question is of course, what do we gain with this entire apparatus?

First of all, we have unification of the storage elements of Neo4j. Everything 
that can be stored in Neo4j is a Vertex:

Node is very much like a Vertex (with a slightly different interface that has 
similar features to the standard Neo4j API, and more...).
Relationship is very much like BinaryEdge, which is an Edge, which is a Vertex.
RelationshipType is covered by BinaryEdgeType, which is an EdgeType, which is a 
VertexType, which is a Vertex.
Property name is wrapped as a PropertyType, which is an EdgeType, which is a 
VertexType, which is a Vertex.
Property value is wrapped as a Property, which is a UnaryEdge, which is an Edge, 
which is a Vertex.

Having this unification, it is possible to write traversals to every part of 
the Neo4j database. And that is the big boon of this unification.

Every part of the database can be accessed with a traversal description. 

The standard Neo4j API only allows traversals to return Nodes given a start 
Node. The Enhanced API allows traversals from any part of the graph, whether it 
is a regular Vertex, an Edge or a Property (or a type thereof), to any other 
part of the graph, no matter if it is a regular Vertex, an Edge or a Property 
(or a type thereof).

All that needs to be supplied are the EdgeTypes that need to be followed in a 
traversal (and the regular evaluators that go with it).

Now the big downer to this all: 

I still have to write the traversal framework, which will actually follow the 
Standard Neo4j framework, but will certainly make traversals composable.

Every Vertex is not just a Vertex, but it is also a bunch of paths. Well not 
really a bunch, it is a bunch of size one, and not much of a path either, since 
it only contains one path element, the Vertex itself.

A traversal returns a bunch of paths (Iterable<Path>) and starts from a bunch 
of paths (still Iterable<Path>).

Since the output of a traversal is the same as the input of a traversal we can 
now compose them. This makes it possible to write a traversal description which 
states that we want to retrieve the parents of our friends, or the neighbours 
of the parents of our friends, and even: the names of the dogs of the 
neighbours of the parents of our friends (after all, we can now traverse to a 
property). 

This can be achieved when we make traversal descriptions composable. Most users 
probably don't want to manually compose traversals; they would much rather 
compose traversal descriptions and let those descriptions do the composition of 
the traversals. 
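
Since the traversal framework is not written yet, the following is only a 
sketch of the composition idea, not a preview of the eventual API. It assumes a 
path is simply a list of vertex names and that each edge type is given as an 
adjacency map; Traversal, follow and then are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the (yet to be written) traversal framework.
final class ComposableTraversalSketch {
    // A traversal maps a bunch of paths to a bunch of paths,
    // so its output can be fed straight into another traversal.
    interface Traversal {
        List<List<String>> apply(List<List<String>> paths);

        // Composition: run this traversal, then continue with the next one.
        default Traversal then(Traversal next) {
            return paths -> next.apply(apply(paths));
        }
    }

    // Builds a traversal that extends every path by one step along the
    // given edge type, represented here as a plain adjacency map.
    static Traversal follow(Map<String, List<String>> edges) {
        return paths -> {
            List<List<String>> result = new ArrayList<>();
            for (List<String> path : paths) {
                String last = path.get(path.size() - 1);
                List<String> targets =
                    edges.getOrDefault(last, Collections.<String>emptyList());
                for (String next : targets) {
                    List<String> extended = new ArrayList<>(path);
                    extended.add(next);
                    result.add(extended);
                }
            }
            return result;
        };
    }
}
```

With a "friend" map and a "parent" map, follow(friend).then(follow(parent)) 
applied to the single one-element path containing the start vertex yields the 
paths to the parents of that vertex's friends.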

These are some things to work on over the weekend, plus documentation 
(especially Javadoc) and more test cases (especially the integration of 
IndexedRelationships as SortableBinaryEdges needs thorough testing).

For the rest, I'd like to hear opinions and suggestions for improvement.

Niels                                     
