Today I pushed a major rewrite of the Enhanced API. See:
https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/graphdb
Originally the Enhanced API was a drop-in replacement for the standard Neo4j
API. This resulted in lots of wrapper classes that needed to be maintained.
The rewritten Enhanced API is no longer a drop-in replacement and contains no
interface/class names that can be found in the standard API.
The Enhanced API no longer speaks of Nodes and Relationships, but of Vertices
and Edges. This prevents name clashes at the expense of somewhat less
recognizable names (Relationship is, after all, a more common word than Edge).
This rewrite is not merely a renaming of classes and interfaces; for the most
part it is a complete rewrite and also a rethinking of the API on my part.
The Enhanced API consists of two basic elements: Vertex and EdgeRole. Most
elements are subclasses of Vertex, though there are also some specialized
versions of EdgeRole.
Let me start with an example:
Suppose we have two vertices denoting the persons Tom and Paula, and we want to
state that Tom is the father of Paula.
For standard Neo4j we tend to write such a fact as:
Tom --Father--> Paula
For the Enhanced API we can conceptually write this fact as follows:
       --StartRole--Tom
Father
       --EndRole--Paula
This should be read as follows: we have two Vertices, Tom and Paula, and we
have a BinaryEdge (similar to a Relationship in the standard API) of type
"Father", where Tom has the StartRole for that edge and Paula has the EndRole.
So instead of a directed graph, we conceptually have an undirected bipartite
graph.
For binary edges (edges between two vertices), this bipartite view is mostly
conceptual, because the API simply allows you to write
tom.createEdgeTo(paula, FATHER) (similar to tom.createRelationshipTo(paula,
FATHER) in the standard API).
The start vertex of a binary edge can be fetched with edge.getStartVertex()
(similar to relationship.getStartNode()), but it is also possible to treat the
binary edge as a generic edge and fetch that Vertex with
edge.getElement(db.getStartRole()).
BinaryEdges are a special case and have special methods that cover the same
functionality as the standard Neo4j API.
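To make the two access paths concrete, here is a minimal, self-contained Java
sketch. It is a toy model, not the graph-collections code: the class and
method names follow this post, while the static EdgeRole.START/END constants
(standing in for db.getStartRole() and an assumed end-role counterpart) and
the String edge type are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: class and method names mirror the post, but this is
// not the graph-collections implementation.
class Vertex {
    final String label;

    Vertex(String label) { this.label = label; }

    // Like tom.createEdgeTo(paula, FATHER): StartRole and EndRole are filled in for us.
    BinaryEdge createEdgeTo(Vertex end, String type) {
        return new BinaryEdge(type, this, end);
    }
}

final class EdgeRole {
    // Assumed stand-ins for db.getStartRole() and an end-role counterpart.
    static final EdgeRole START = new EdgeRole("StartRole");
    static final EdgeRole END = new EdgeRole("EndRole");

    final String name;

    private EdgeRole(String name) { this.name = name; }
}

// An Edge is itself a Vertex; its label holds the edge type.
class Edge extends Vertex {
    // Maps each role to the vertex playing it (one vertex per role in this sketch).
    protected final Map<EdgeRole, Vertex> elements = new HashMap<>();

    Edge(String type) { super(type); }

    // Generic, role-based access, like edge.getElement(db.getStartRole()).
    Vertex getElement(EdgeRole role) { return elements.get(role); }
}

class BinaryEdge extends Edge {
    BinaryEdge(String type, Vertex start, Vertex end) {
        super(type);
        elements.put(EdgeRole.START, start);
        elements.put(EdgeRole.END, end);
    }

    // Convenience accessors, like Relationship.getStartNode() and getEndNode().
    Vertex getStartVertex() { return getElement(EdgeRole.START); }
    Vertex getEndVertex() { return getElement(EdgeRole.END); }
}
```

In this model both calls return the same Vertex: getStartVertex() is only a
shortcut over the generic role-based lookup.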
In general, we can say that Vertices are connected to Edges by means of
EdgeRoles. In the binary case there are two predefined EdgeRoles: StartRole and
EndRole.
Before we get deeper into the general case of n-ary edges, let's first look at
another special case: Properties.
Properties can be thought of as unary edges: edges that connect to only one
Vertex (as opposed to two in the binary case).
Suppose we want to state that Tom is 49 years old. We can write that as:
age(49)--PropertyRole--Tom
We have an edge of type "age" that is connected to the vertex Tom in the role
of a property.
Again, this is mostly conceptual, because the Enhanced API has lots of methods
that are very similar to the ones found in the standard API: getProperty,
hasProperty and setProperty. But we can also call methods on the property
itself; after all, the age property connected to the Vertex "Tom" is an object
in its own right. More precisely, it is a Property, and as such it is a
UnaryEdge, which is an Edge, which is a Vertex.
From the age property we can fetch the PropertyType, but we can also ask for
the Vertex it is connected to: getVertex(). Since a Property is an Edge, we can
also fetch the connected vertex (Tom) as follows:
age.getElement(db.getPropertyRole()).
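The same toy-model approach, sketched in Java, shows the age property both as
a value on Tom and as an object in its own right. The getPropertyObject
accessor, the EdgeRole.PROPERTY constant (standing in for db.getPropertyRole())
and the map-based storage are assumptions, not the Enhanced API's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of "a Property is a UnaryEdge, which is an Edge, which is a Vertex".
class Vertex {
    final String name;
    // Properties attached to this vertex, keyed by property type name (assumed storage).
    private final Map<String, Property> properties = new HashMap<>();

    Vertex(String name) { this.name = name; }

    void setProperty(String type, Object value) {
        properties.put(type, new Property(new PropertyType(type), value, this));
    }

    Object getProperty(String type) {
        Property p = properties.get(type);
        return p == null ? null : p.value;
    }

    boolean hasProperty(String type) { return properties.containsKey(type); }

    // Hypothetical accessor returning the Property object itself rather than its value.
    Property getPropertyObject(String type) { return properties.get(type); }
}

final class EdgeRole {
    // Assumed stand-in for db.getPropertyRole().
    static final EdgeRole PROPERTY = new EdgeRole();
}

final class PropertyType {
    final String name;
    PropertyType(String name) { this.name = name; }
}

abstract class Edge extends Vertex {
    Edge(String label) { super(label); }
    abstract Vertex getElement(EdgeRole role);
}

// A unary edge connects exactly one vertex.
class UnaryEdge extends Edge {
    private final Vertex vertex;

    UnaryEdge(String label, Vertex vertex) {
        super(label);
        this.vertex = vertex;
    }

    Vertex getVertex() { return vertex; }

    // Generic access: the single vertex is reachable through the property role.
    Vertex getElement(EdgeRole role) { return role == EdgeRole.PROPERTY ? vertex : null; }
}

class Property extends UnaryEdge {
    final PropertyType type;
    final Object value;

    Property(PropertyType type, Object value, Vertex owner) {
        super(type.name, owner);
        this.type = type;
        this.value = value;
    }

    PropertyType getPropertyType() { return type; }
}
```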
So we have seen the two special cases, unary edges and binary edges, which work
very much the same as properties and Relationships in the standard Neo4j API,
though we have given them a conceptually different perspective that unifies
the two and fits them neatly into the general case of N-ary edges.
As said before, an Edge is a Vertex that connects other Vertices by means of
EdgeRoles. Since Edges are Vertices, they can have other Edges connected to
them. Or in standard API terms: relationships can be connected to other
relationships, and they can have properties.
The concept of EdgeRoles separates Edges from Vertices, so we effectively have
a bipartite graph where Vertices can only connect to Edges and Edges can only
connect to Vertices. Given that Edges are also Vertices, Edges can be
connected to Edges, but in such a case it is unambiguous which one plays the
role of Edge and which one plays the role of Vertex in that connection.
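A small Java sketch of this bipartite view, assuming a representation the post
does not specify: every connection is stored as an (edge, role, vertex) triple.
Because the edge and vertex slots are fixed by position, an Edge attached to
another Edge (here a hypothetical "note" property on the Father edge) still has
an unambiguous part in each connection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the bipartite view; not the real Enhanced API.
class Vertex {
    final String label;
    Vertex(String label) { this.label = label; }
}

// An Edge is a Vertex, so it can itself sit on the vertex side of another edge.
class Edge extends Vertex {
    Edge(String label) { super(label); }
}

final class EdgeRole {
    final String name;
    EdgeRole(String name) { this.name = name; }
}

// One connection in the bipartite graph. The edge side and the vertex side are
// fixed by position, so it is never ambiguous which element plays which part.
final class Connection {
    final Edge edge;
    final EdgeRole role;
    final Vertex vertex;

    Connection(Edge edge, EdgeRole role, Vertex vertex) {
        this.edge = edge;
        this.role = role;
        this.vertex = vertex;
    }
}

final class BipartiteGraph {
    final List<Connection> connections = new ArrayList<>();

    void connect(Edge edge, EdgeRole role, Vertex vertex) {
        connections.add(new Connection(edge, role, vertex));
    }

    // Connections in which v plays the Vertex part (the edges attached to v).
    List<Connection> attachedTo(Vertex v) {
        List<Connection> out = new ArrayList<>();
        for (Connection c : connections) if (c.vertex == v) out.add(c);
        return out;
    }

    // Connections in which e plays the Edge part (the vertices e connects).
    List<Connection> connectedBy(Edge e) {
        List<Connection> out = new ArrayList<>();
        for (Connection c : connections) if (c.edge == e) out.add(c);
        return out;
    }
}
```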
Let's look at an example of an N-ary edge:
Suppose we want to state the fact that Tom gives Paula a Bicycle (no golden
helicopters in stock today). We can write that as follows:
       --Giver--Tom
GIVES  --Recipient--Paula
       --Gift--Bicycle
There is an EdgeType GIVES which defines three EdgeRoles: Giver, Recipient and
Gift, which connect Tom, Paula and Bicycle to the Edge.
The edge is created by first creating three EdgeElement objects that each
contain a Role and the connected Vertex. We can then make the call
db.createEdge(GIVES, edgeElements).
An EdgeElement is what is connected to an Edge for a particular EdgeRole
(including that EdgeRole itself).
An EdgeElement can contain more than one connected Vertex. For example, we can
state: Tom and Dick give Paula a Bicycle.
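A minimal Java sketch of creating the GIVES edge from three EdgeElements. Db,
createEdge and the role check against the EdgeType are illustrative
assumptions modelled on the db.createEdge(GIVES, edgeElements) call described
above, not the library's actual implementation.

```java
import java.util.*;

// Hypothetical sketch; names follow the post, the implementation is an assumption.
class Vertex {
    final String name;
    Vertex(String name) { this.name = name; }
}

final class EdgeRole {
    final String name;
    EdgeRole(String name) { this.name = name; }
}

// An EdgeType declares which roles its edges may have (e.g. GIVES: Giver, Recipient, Gift).
final class EdgeType {
    final String name;
    final Set<EdgeRole> roles;

    EdgeType(String name, EdgeRole... roles) {
        this.name = name;
        this.roles = new HashSet<>(Arrays.asList(roles));
    }
}

// A role together with the vertex (or vertices) connected in that role.
final class EdgeElement {
    final EdgeRole role;
    final List<Vertex> vertices;

    EdgeElement(EdgeRole role, Vertex... vertices) {
        this.role = role;
        this.vertices = Arrays.asList(vertices.clone());
    }
}

// An edge is a vertex whose connections are grouped per role.
class Edge extends Vertex {
    final EdgeType type;
    private final Map<EdgeRole, EdgeElement> elements = new HashMap<>();

    Edge(EdgeType type, Collection<EdgeElement> elems) {
        super(type.name);
        this.type = type;
        for (EdgeElement e : elems) elements.put(e.role, e);
    }

    EdgeElement getElement(EdgeRole role) { return elements.get(role); }
}

class Db {
    // Mirrors db.createEdge(GIVES, edgeElements); rejects roles the type does not define.
    Edge createEdge(EdgeType type, Collection<EdgeElement> elems) {
        for (EdgeElement e : elems) {
            if (!type.roles.contains(e.role)) {
                throw new IllegalArgumentException(
                    "role " + e.role.name + " is not defined by " + type.name);
            }
        }
        return new Edge(type, elems);
    }
}
```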
In Enhanced API notation:
       --Giver--Tom, Dick
GIVES  --Recipient--Paula
       --Gift--Bicycle
Or we may want to state: Tom, Dick and Harry give Paula and Josephine a Bicycle
and an Icecream.
In Enhanced API notation:
       --Giver--Tom, Dick, Harry
GIVES  --Recipient--Paula, Josephine
       --Gift--Bicycle, Icecream
The API allows the user to fetch the EdgeElements for an EdgeRole and iterate
over the connected Vertices:
for (EdgeElement givers : gives.getElements(Giver)) {
    for (Vertex giver : givers.getVertices()) {
        // do something with the giver Vertex
    }
}
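The nested loop above can be run against a small, hypothetical in-memory model
of the Tom, Dick and Harry example. Edge.getElements and
EdgeElement.getVertices mirror the calls in the snippet; everything else (the
GivesDemo class, the namesInRole helper, plain String labels) is an assumption
made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; only getElements/getVertices mirror names from the post.
class Vertex {
    final String name;
    Vertex(String name) { this.name = name; }
}

final class EdgeRole {
    final String name;
    EdgeRole(String name) { this.name = name; }
}

final class EdgeElement {
    final EdgeRole role;
    private final List<Vertex> vertices;

    EdgeElement(EdgeRole role, Vertex... vertices) {
        this.role = role;
        this.vertices = Arrays.asList(vertices.clone());
    }

    Iterable<Vertex> getVertices() { return vertices; }
}

class Edge extends Vertex {
    private final List<EdgeElement> elements;

    Edge(String type, EdgeElement... elements) {
        super(type);
        this.elements = Arrays.asList(elements.clone());
    }

    // All elements attached to this edge in the given role.
    Iterable<EdgeElement> getElements(EdgeRole role) {
        List<EdgeElement> out = new ArrayList<>();
        for (EdgeElement e : elements) if (e.role == role) out.add(e);
        return out;
    }
}

class GivesDemo {
    static final EdgeRole GIVER = new EdgeRole("Giver");
    static final EdgeRole RECIPIENT = new EdgeRole("Recipient");
    static final EdgeRole GIFT = new EdgeRole("Gift");

    // Tom, Dick and Harry give Paula and Josephine a Bicycle and an Icecream.
    static Edge build() {
        return new Edge("GIVES",
            new EdgeElement(GIVER, new Vertex("Tom"), new Vertex("Dick"), new Vertex("Harry")),
            new EdgeElement(RECIPIENT, new Vertex("Paula"), new Vertex("Josephine")),
            new EdgeElement(GIFT, new Vertex("Bicycle"), new Vertex("Icecream")));
    }

    // The nested loop from the post, collecting names instead of "do something".
    static List<String> namesInRole(Edge edge, EdgeRole role) {
        List<String> names = new ArrayList<>();
        for (EdgeElement element : edge.getElements(role)) {
            for (Vertex v : element.getVertices()) {
                names.add(v.name);
            }
        }
        return names;
    }
}
```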
For those cases where an EdgeElement can contain only one Vertex, there is a
FunctionalEdgeElement, which can only be used in conjunction with
FunctionalEdgeRoles.
StartRole, EndRole and PropertyRole are all FunctionalEdgeRoles, since we can
have only one start Vertex and one end Vertex per BinaryEdge (just like there
can only be one StartNode and one EndNode for a Relationship in the standard
API), and we can have only one Vertex associated with a Property (just like a
property cannot belong to two different Nodes in the standard Neo4j API).
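One way the FunctionalEdgeRole restriction could be enforced, sketched in
Java under the assumption that FunctionalEdgeRole is a subtype of EdgeRole and
that a factory picks the element class. EdgeElement.of and the Roles holder
are hypothetical names, not part of the published API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of functional roles; names partly from the post, logic assumed.
class Vertex {
    final String name;
    Vertex(String name) { this.name = name; }
}

class EdgeRole {
    final String name;
    EdgeRole(String name) { this.name = name; }
}

// A role that at most one vertex can play on a given edge.
final class FunctionalEdgeRole extends EdgeRole {
    FunctionalEdgeRole(String name) { super(name); }
}

class EdgeElement {
    final EdgeRole role;
    final List<Vertex> vertices;

    EdgeElement(EdgeRole role, List<Vertex> vertices) {
        this.role = role;
        this.vertices = vertices;
    }

    // Hypothetical factory: functional roles get a FunctionalEdgeElement with exactly one vertex.
    static EdgeElement of(EdgeRole role, Vertex... vs) {
        if (role instanceof FunctionalEdgeRole) {
            if (vs.length != 1) {
                throw new IllegalArgumentException(role.name + " accepts exactly one vertex");
            }
            return new FunctionalEdgeElement((FunctionalEdgeRole) role, vs[0]);
        }
        return new EdgeElement(role, Arrays.asList(vs.clone()));
    }
}

final class FunctionalEdgeElement extends EdgeElement {
    FunctionalEdgeElement(FunctionalEdgeRole role, Vertex v) { super(role, List.of(v)); }

    // No iteration needed: there is only one connected vertex.
    Vertex getVertex() { return vertices.get(0); }
}

// Assumed holders for the three predefined functional roles.
class Roles {
    static final FunctionalEdgeRole START = new FunctionalEdgeRole("StartRole");
    static final FunctionalEdgeRole END = new FunctionalEdgeRole("EndRole");
    static final FunctionalEdgeRole PROPERTY = new FunctionalEdgeRole("PropertyRole");
}
```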
The Enhanced API can be used in conjunction with the standard Neo4j API. The
only replacement needed is that of the database instance: the Enhanced API
defines a DatabaseService interface, which extends the standard
GraphDatabaseService interface and adds several methods for creating and
looking up Vertices, Edges and several kinds of VertexTypes.
Now the big question is of course, what do we gain with this entire apparatus?
First of all, we have unification of the storage elements of Neo4j. Everything
that can be stored in Neo4j is a Vertex:
- Node is very much like a Vertex (with a slightly different interface that
  has similar features to the standard Neo4j API, and more...)
- Relationship is very much like BinaryEdge, which is an Edge, which is a
  Vertex
- RelationshipType is covered by BinaryEdgeType, which is an EdgeType, which
  is a VertexType, which is a Vertex
- a property name is wrapped as a PropertyType, which is an EdgeType, which is
  a VertexType, which is a Vertex
- a property value is wrapped as a Property, which is a UnaryEdge, which is an
  Edge, which is a Vertex
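The hierarchy listed above can be written down as a bare Java interface
skeleton. Only the subtype relationships come from this post; the empty
interfaces and the describe helper are illustrative assumptions.

```java
// Hypothetical skeleton: only the subtyping mirrors the post; no methods are modelled.
interface Vertex {}
interface Edge extends Vertex {}
interface UnaryEdge extends Edge {}
interface BinaryEdge extends Edge {}      // ~ Relationship
interface Property extends UnaryEdge {}   // ~ property value
interface VertexType extends Vertex {}
interface EdgeType extends VertexType {}
interface BinaryEdgeType extends EdgeType {}  // ~ RelationshipType
interface PropertyType extends EdgeType {}    // ~ property name

class Unification {
    // Because everything is a Vertex, one method can accept any stored element.
    static String describe(Vertex v) {
        if (v instanceof Property) return "property value";
        if (v instanceof BinaryEdge) return "relationship";
        if (v instanceof PropertyType) return "property name";
        if (v instanceof BinaryEdgeType) return "relationship type";
        return "vertex";
    }
}
```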
With this unification it is possible to write traversals to every part of the
Neo4j database, and that is the big boon: every part of the database can be
accessed with a traversal description.
The standard Neo4j API only allows traversals to return Nodes given a start
Node. The Enhanced API allows traversals from any part of the graph, whether it
is a regular Vertex, an Edge or a Property (or a type thereof), to any other
part of the graph, no matter if it is a regular Vertex, an Edge or a Property
(or a type thereof).
All that needs to be supplied are the EdgeTypes that need to be followed in a
traversal (and the regular evaluators that go with it).
Now the big downer to all this:
I still have to write the traversal framework, which will largely follow the
standard Neo4j framework, but will certainly make traversals composable.
Every Vertex is not just a Vertex, it is also a bunch of paths. Well, not
really a bunch: it is a bunch of size one, and not much of a path either,
since it only contains one path element, the Vertex itself.
A traversal returns a bunch of paths (Iterable<Path>) and starts from a bunch
of paths (still Iterable<Path>).
Since the output of a traversal is the same as the input of a traversal we can
now compose them. This makes it possible to write a traversal description which
states that we want to retrieve the parents of our friends, or the neighbours
of the parents of our friends, and even: the names of the dogs of the
neighbours of the parents of our friends (after all, we can now traverse to a
property).
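A minimal sketch of composable traversals, assuming paths are plain lists of
labels and a traversal is a java.util.function.Function from a list of paths
to a list of paths. The Graph class, step and asPaths are hypothetical, and a
property is modelled as an ordinary "name" edge to its value so that it can be
traversed to. Since each step has the same input and output type,
Function.andThen composes them.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch: a traversal maps a collection of paths to a collection of paths.
// Vertices, edges and properties are all plain labels here, so a property value is
// reachable like any other element.
final class Graph {
    private final Map<String, Map<String, List<String>>> adj = new HashMap<>();

    void connect(String from, String edgeType, String to) {
        adj.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(edgeType, k -> new ArrayList<>())
           .add(to);
    }

    List<String> follow(String from, String edgeType) {
        return adj.getOrDefault(from, Map.of()).getOrDefault(edgeType, List.of());
    }

    // A single vertex seen as "a bunch of paths" of size one, holding only the vertex.
    static List<List<String>> asPaths(String vertex) {
        return List.of(List.of(vertex));
    }

    // One traversal step: extend every input path along the given edge type.
    Function<List<List<String>>, List<List<String>>> step(String edgeType) {
        return paths -> {
            List<List<String>> out = new ArrayList<>();
            for (List<String> path : paths) {
                String last = path.get(path.size() - 1);
                for (String next : follow(last, edgeType)) {
                    List<String> extended = new ArrayList<>(path);
                    extended.add(next);
                    out.add(extended);
                }
            }
            return out;
        };
    }
}
```

A starting vertex enters as Graph.asPaths("me"), and a composition such as
g.step("FRIEND").andThen(g.step("PARENT")) is itself a traversal that can be
composed further, which is the property the text above relies on.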
This can be achieved when we make traversal descriptions composable. Most users
probably don't want to manually compose traversals, they would much rather
compose traversal descriptions and let those descriptions do the composition of
the traversals.
These are some things to work on over the weekend, plus documentation
(especially Javadoc) and more test cases (especially the integration of
IndexedRelationships as SortableBinaryEdges needs thorough testing).
For the rest, I'd like to hear opinions and suggestions for improvement.
Niels
_______________________________________________
Neo4j mailing list
https://lists.neo4j.org/mailman/listinfo/user