Re: [Neo4j] Spring Data Neo4j 2.0 Roadmap

Jean-Pierre Bergamin Wed, 05 Oct 2011 14:39:03 -0700

Hello everyone

I'm really glad to see that SDG or SDN (as it will be called from
now on) is actively maintained and that new ideas are followed to
solve the day to day "problems" when working with graphs. I'm not
really sad to hear that the AspectJ mapping will not be the main focus
for the future. Working with AspectJ can be really cumbersome and
error-prone. And I personally don't like the active record pattern for
enterprise applications anyway, because it ties the database state
directly to your domain object. But this may also be a matter of
taste...


Interestingly enough, we have just had many internal discussions about
how to model our domain and how Spring Data Neo4j will be used to
implement it. Let me describe the challenges we have had while working
with Neo4j and SDN:

"Multiple inheritance"
------------------------------
Our domain model is quite complex and entities can play many different
roles in a graph and therefore they can have many different
relationships to many other entities of different types. Some entity
types can be the starting- *and* end-point of some relationships,
whereas other types can only be a start- *or* end-point for certain
relationships, and so on. Let me show a very, very simplified example:

Having three entity types Service, Device and Process (among many
others), there are the following valid relationships:
(Service)---[DEPENDS_ON]--->(Service)
(Service)---[DEPENDS_ON]--->(Device)
(Process)---[DEPENDS_ON]--->(Service)

(Service)---[HAS_FLOW_TO]--->(Service)
(Service)---[HAS_FLOW_TO]--->(Device)
(Device)---[HAS_FLOW_TO]--->(Service)
(Device)---[HAS_FLOW_TO]--->(Device)

Such a model cannot be represented with a class hierarchy under Java's
single inheritance. Classes in such a tree-like hierarchy
cannot take on so many different roles. It requires interfaces to model
the various roles an entity can play. The entity class can then
implement many interfaces as roles (~ "multiple inheritance").

We ended up defining one interface per start- and one per end-node for
each relationship (role):
(IDepender)---[DEPENDS_ON]--->(IDependee)
(IDataProvider)---[HAS_FLOW_TO]--->(IDataConsumer)

The interfaces define the allowed operations:

interface IDepender {
    DependsOnRelationship dependsOn(IDependee dependee);
}

The actual entity interface we use in client code then extends all
these interfaces (roles).

interface IService extends IDepender, IDependee, IDataProvider, IDataConsumer {}
interface IDevice extends IDependee, IDataProvider, IDataConsumer {}
interface IProcess extends IDepender {}

The node entity classes then implement these entity interfaces.

@NodeEntity class Service implements IService { ... }
@NodeEntity class Device implements IDevice { ... }
@NodeEntity class Process implements IProcess { ... }
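
To make the role idea concrete without any SDN dependency, here is a
minimal plain-Java sketch restricted to the DEPENDS_ON roles. The
interface and class names follow the ones above; the name() and
dependencies() methods and the void return type of dependsOn are
assumptions added for illustration (the original returns a
DependsOnRelationship), and the SDN annotations are left out so the
snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Role of an end node of DEPENDS_ON.
interface IDependee {
    String name(); // hypothetical accessor, only used for the demo
}

// Role of a start node of DEPENDS_ON.
interface IDepender {
    // Hypothetical accessor: the dependees recorded so far.
    List<IDependee> dependencies();

    // Simplified: records the dependee instead of creating a relationship entity.
    default void dependsOn(IDependee dependee) {
        dependencies().add(dependee);
    }
}

// Entity interfaces combine the roles an entity may play.
interface IService extends IDepender, IDependee {}
interface IDevice extends IDependee {}
interface IProcess extends IDepender {}

final class Service implements IService {
    private final String name;
    private final List<IDependee> dependencies = new ArrayList<>();

    Service(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public List<IDependee> dependencies() { return dependencies; }
}

final class Device implements IDevice {
    private final String name;

    Device(String name) { this.name = name; }

    @Override public String name() { return name; }
}

final class Process implements IProcess {
    private final List<IDependee> dependencies = new ArrayList<>();

    @Override public List<IDependee> dependencies() { return dependencies; }
}
```

With these types, web.dependsOn(router) compiles for a Service and a
Device, while router.dependsOn(web) is rejected by the compiler
because Device never takes the IDepender role.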


This works quite well, but does not fit very well into the SDN
concept. How would, e.g., the DEPENDS_ON relationship be declared?
Maybe...

@RelationshipEntity
class DependsOnRelationship {
    @StartNode
    private IDepender start;
    @EndNode
    private IDependee end;
}

This unfortunately is not possible, because concrete NodeBacked
classes are expected as @StartNode and @EndNode here. Since there is
no way to define base classes for all the start- and end-points for
all relationships in a complex model, it becomes impossible to have a
type-safe declaration here. We ended up defining a common base class
for all entities (class BaseEntity) and use it wherever SDN requires
a NodeBacked class:

@NodeEntity class BaseEntity {}
@NodeEntity class Service extends BaseEntity implements IService { ... }
...

@RelationshipEntity
class DependsOnRelationship {
    @StartNode
    private BaseEntity start;
    @EndNode
    private BaseEntity end;

    public IDepender start() { return (IDepender)start; }
    public IDependee end() { return (IDependee)end; }
}

But this is not strictly type safe and does not "describe" what the
start and endpoint of a DependsOnRelationship can be.
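
One way to win back some compile-time safety while keeping BaseEntity
fields might be a static factory with intersection-type bounds. This
is only a sketch in plain Java: the between() factory is a
hypothetical name, the annotations are omitted, and whether SDN would
honour such a factory (rather than instantiating the class itself) is
an open assumption.

```java
// Stand-ins for the types from the text, without SDN annotations.
interface IDepender {}
interface IDependee {}

class BaseEntity {}
class Service extends BaseEntity implements IDepender, IDependee {}
class Device extends BaseEntity implements IDependee {}

final class DependsOnRelationship {
    // Fields keep the common base type, as in the workaround above.
    private final BaseEntity start;
    private final BaseEntity end;

    private DependsOnRelationship(BaseEntity start, BaseEntity end) {
        this.start = start;
        this.end = end;
    }

    // Hypothetical factory: the bounds only accept arguments that are both
    // a BaseEntity and the matching role, so the casts in the accessors
    // cannot fail for instances created here.
    static <S extends BaseEntity & IDepender, E extends BaseEntity & IDependee>
    DependsOnRelationship between(S start, E end) {
        return new DependsOnRelationship(start, end);
    }

    IDepender start() { return (IDepender) start; }
    IDependee end() { return (IDependee) end; }
}
```

DependsOnRelationship.between(new Service(), new Device()) compiles,
whereas passing a Device as the start node does not, because Device
is not an IDepender.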

POJOs
-----------------------------------------
We have a graphical editor that allows the user to design a data flow
graphically. The editor has its own model to read and store this "flow
model" the user is drawing. When a flow model is loaded, we have to
extract a sub-graph from the Neo4j DB and convert it to the editor's
format. When the user saves the flow model, we convert it back,
compare it with the DB and save the changes. What sounds really simple
is actually quite complex. Because the @NodeEntity classes are not
POJOs but are wired directly to the DB, we cannot use them to hold an
in-memory representation of the flow model we get from the editor for
further comparison with the DB. We had to introduce POJO
implementations for all our entity types. So it would be very helpful
to have POJOs as entity types so that they also can "live" without a
database representation.
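
The compare-and-save step can be sketched with detached value
objects. FlowEdge and FlowDiff are hypothetical names, and reducing a
flow model to a set of (from, to) edges is a simplifying assumption;
the point is only that plain objects can be diffed without touching
the database.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical detached edge: a plain value object with no link to the DB.
final class FlowEdge {
    final String from;
    final String to;

    FlowEdge(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FlowEdge)) return false;
        FlowEdge other = (FlowEdge) o;
        return from.equals(other.from) && to.equals(other.to);
    }

    @Override public int hashCode() {
        return Objects.hash(from, to);
    }
}

// Result of comparing the stored sub-graph with the edited flow model.
final class FlowDiff {
    final Set<FlowEdge> toCreate;
    final Set<FlowEdge> toDelete;

    private FlowDiff(Set<FlowEdge> toCreate, Set<FlowEdge> toDelete) {
        this.toCreate = toCreate;
        this.toDelete = toDelete;
    }

    static FlowDiff between(Set<FlowEdge> stored, Set<FlowEdge> edited) {
        // Edges only in the edited model must be created.
        Set<FlowEdge> create = new HashSet<>(edited);
        create.removeAll(stored);
        // Edges only in the stored model must be deleted.
        Set<FlowEdge> delete = new HashSet<>(stored);
        delete.removeAll(edited);
        return new FlowDiff(create, delete);
    }
}
```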

Subgraphs/use-cases/schema
-----------------------------------------
As you mentioned in your possible loading strategies, it would be very
helpful if the graph model could be described somehow. Although this
collides with the idea of a "schema-free" database, which Neo4j is,
having a schema would make working with static languages like Java
much more comfortable. What I would really like to see is a way to
express the model with interfaces, which would give one the highest
flexibility. What form the description of the model takes can be
discussed (TraversalDescriptions, queries, some schema definition DSL,
etc.).
Fetching and especially updating subgraphs would be just too wonderful...

Dynamic content
-----------------------------------------
A way to dynamically store "any" content in a node would come in
handy. What if you could just store, e.g., a JSON document instead of
single properties? :-o
Tell me if there already is such a feature request for Neo4j itself -
otherwise I'll create one immediately.
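
Until something like that exists, one possible workaround is to
flatten a nested document into dotted property keys before storing
it. This is a sketch of that idea only, not a Neo4j or SDN feature:
DocumentFlattener and the dotted-key convention are assumptions, and
the document is modelled as nested Maps rather than parsed JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DocumentFlattener {

    // Turns {"vendor": {"name": "Acme"}} into {"vendor.name": "Acme"}.
    static Map<String, Object> flatten(Map<String, ?> document) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", document, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> document,
                                    Map<String, Object> out) {
        for (Map.Entry<String, ?> entry : document.entrySet()) {
            String key = prefix.isEmpty()
                    ? entry.getKey()
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested object: recurse with the extended key prefix.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                flattenInto(key, nested, out);
            } else {
                // Leaf value: becomes one flat property.
                out.put(key, value);
            }
        }
    }
}
```

The flattened map can then be written property by property. Lists
inside the document are passed through unchanged here and would need
their own convention.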

Traversals
-----------------------------------------
Working with traversals is a bit cumbersome. Using @GraphTraversal
only allows very simple traversals out of the box. When an
evaluator is needed that decides based on information about the path
actually traversed, this Neo4j path must be converted with
ConvertingEntityPath. Doing this requires a GraphDatabaseContext, which
is not available in an entity, because it is declared private in the
aspect. So one has to annotate the entity with @Configurable to be able
to autowire the GraphDatabaseContext.
The repositories could also provide more flexible traversal methods.
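
To illustrate what "an evaluator that decides on the traversed path"
means, here is a library-free sketch over an in-memory adjacency map.
It does not use the Neo4j or SDN traversal API; PathTraversal and its
method names are hypothetical, and nodes are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class PathTraversal {

    // Depth-first walk that hands the whole path so far to the evaluator
    // and collects a copy of every path the evaluator accepts.
    static List<List<String>> traverse(Map<String, List<String>> graph,
                                       String start,
                                       Predicate<List<String>> include) {
        List<List<String>> result = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(start);
        walk(graph, path, include, result);
        return result;
    }

    private static void walk(Map<String, List<String>> graph,
                             List<String> path,
                             Predicate<List<String>> include,
                             List<List<String>> result) {
        if (include.test(path)) {
            result.add(new ArrayList<>(path));
        }
        String last = path.get(path.size() - 1);
        List<String> neighbours =
                graph.getOrDefault(last, Collections.<String>emptyList());
        for (String next : neighbours) {
            if (path.contains(next)) {
                continue; // skip nodes already on the path to avoid cycles
            }
            path.add(next);
            walk(graph, path, include, result);
            path.remove(path.size() - 1); // backtrack
        }
    }
}
```

An evaluator such as p -> p.size() == 3 then selects exactly the
paths that are two hops long.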

Javadoc
-----------------------------------------
I like the "Good Relationships" tutorial and reference guide. It gives
a very good introduction. But for the daily work one needs up-to-date
Javadoc - which is honestly missing in some parts of the project. So
there's room for improvement in this area... :-P



Regarding your loading strategies:
> #1 static declarations on @RelatedTo* annotations (like in JPA) -> don't like 
> b/c of context ignorance
I think this is what people will expect and maybe understand best.
Simple models can be put together very easily with such an
approach.

> #2 dynamic programatic fetching within the session context (by navigating the 
> required relationships)
A method to programmatically traverse and navigate through the graph
must be provided anyway. This is fundamental.

> #3 use-case specific fetch-groups, declarative (w/ annotations) + 
> use-case/fetch-group name as context when loading (would be also possible 
> with repository annotations)
> #4 use-case specific fetch groups that are "auto-learned" by the 
> infrastructure when the user navigates relationships manually and then are 
> stored as meta information so that at the next fetch with that use-case 
> pre-fetches the data
> #5 declare/define fetch-groups as cypher-queries or traversals (which might 
> also be the "metadata" for #4)
This sounds promising, but it could be very hard for the user to
define a model in such a way. If an approach like this is implemented,
there must be some easy DSL that can be used to describe the
fetch-groups/use-cases. I don't know if it's feasible, e.g., to have
such complex information in annotations (#3).
Do you maybe know Apache Camel? It has a very, very intuitive DSL to
define "communication routes" in Java code. I could imagine having a
DSL like that to describe traversals and maybe even to describe a
schema that defines which entities connect over which relationship to
which other entities. You surely know TinkerPop's Pipes... We make its
DSL typed and here we go... :-)
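
As a rough idea of what such a schema DSL could look like, here is a
small fluent builder in plain Java. Everything in it (GraphSchema,
from/via/to, allows) is a hypothetical name invented for this sketch;
it is neither Camel's nor Pipes' API, and the rules are checked at
runtime rather than by the compiler.

```java
import java.util.HashSet;
import java.util.Set;

// Placeholder entity types for the example.
class Service {}
class Device {}

final class GraphSchema {
    private final Set<String> rules = new HashSet<>();

    static GraphSchema define() {
        return new GraphSchema();
    }

    // Starts a rule: from(Start).via("REL").to(End)
    From from(Class<?> startType) {
        return new From(startType);
    }

    // True if a rule for exactly this start type, relationship and end type exists.
    boolean allows(Class<?> start, String relationship, Class<?> end) {
        return rules.contains(key(start, relationship, end));
    }

    private static String key(Class<?> start, String relationship, Class<?> end) {
        return start.getName() + "-[" + relationship + "]->" + end.getName();
    }

    final class From {
        private final Class<?> start;

        private From(Class<?> start) { this.start = start; }

        Via via(String relationship) {
            return new Via(start, relationship);
        }
    }

    final class Via {
        private final Class<?> start;
        private final String relationship;

        private Via(Class<?> start, String relationship) {
            this.start = start;
            this.relationship = relationship;
        }

        // Completes the rule and returns the schema so rules can be chained.
        GraphSchema to(Class<?> endType) {
            rules.add(key(start, relationship, endType));
            return GraphSchema.this;
        }
    }
}
```

A schema for part of the earlier example would then read
GraphSchema.define().from(Service.class).via("DEPENDS_ON").to(Device.class),
and allows(Device.class, "DEPENDS_ON", Service.class) would return
false.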

> #6 have a use-case specific version of the domain objects that just contain 
> the subgraph that the use-case is interested in and no outgoing relationships 
> elsewhere (aka. DDD aggregate) then the infrastructure can fetch this whole 
> subgraph and return it, with the projection abilities the entities can still 
> be projected to other types (and the fetch-group-subgraph could probably used 
> within the domain model to define mapping-contexts/boundaries (DDD again).
Where's the "+1000" or the "I like this very much" button? :-D

There is plenty of room for discussion though...

But anyway. Keep up the good work. We will be glad to see version
2.0.0 solve all our graph problems. :-D


Best regards
James


2011/10/4 Michael Hunger <[email protected]>:
> Hi,
>
> I just wanted to share the news around the next version of the Spring Data 
> Neo4j project with you.
>
> First of all - the library will be renamed to Spring Data Neo4j and the next 
> release will be version 2.0
> due to the many breaking changes and new approaches. The new github 
> repository can be found at
>
> http://github.com/springsource/spring-data-neo4j
>
> the old one will stay around for a while (as there are some forks and 
> watchers) and point to the new one.
> We plan to release a milestone around the end of the week and the release 
> candidate in time for Spring One.
>
> We've gotten a lot of feedback and some contribution to the library - thank 
> you all for that.
>
> From the feedback we've seen that many people are not comfortable using full 
> blown AspectJ to handle their
> mapping. So we've decided to expand the approach - the AspectJ mapping will 
> be available as separate library
> but the main Spring Data Neo4j project will use a mapping that is more in 
> line with the other Spring Data
> projects using the mapping facilities from Spring Data Commons.
>
> The project will be split into separate parts:
>
> * spring-data-neo4j - contains all the core code, annotations, template, 
> repository support and Spring Data Commons Mapping based persistence
> * spring-data-neo4j-aspects - contains the aspects for the field interception 
> and the Active Record like method introduction as separate mixin
> * spring-data-neo4j-cross-store - contains additional handling for JPA (and 
> other) cross store implementations
> * spring-data-neo4j-rest - will use the neo4j-java-rest-binding and provide a 
> thin integration layer for Spring Data Neo4j
>
> Some features that we're currently discussing / are going to address:
>
> * provide complete object-graph persistence based on spring-data-commons 
> mapping
> * direct-field access mapping (using AJ) will probably be limited to tx-scope
> * remove current lifecycle handling in favor of an explicit detach operation 
> that uses one of the strategies outlined below
> * examples will be included in the library and always up to date
> * REST Batch Mode
> * extensive documentation updates
> * a version of the cineasts tutorial to work against REST server
> * address open jira issues like multiple same type relationships, traversal 
> results, serialization, type-identifier-indirection
> * much more
>
> Planned:
> * Cypher DSL + Query-DSL support
> * derived finder support
>
> I'm still discussing an important issue for the mapping - namely loading 
> strategies. So far I have about 5-6 ways to think about none of which I think 
> is the best fit, so getting input on this would be great too.
>
> #1 static declarations on @RelatedTo* annotations (like in JPA) -> don't like 
> b/c of context ignorance
> #2 dynamic programatic fetching within the session context (by navigating the 
> required relationships)
> #3 use-case specific fetch-groups, declarative (w/ annotations) + 
> use-case/fetch-group name as context when loading (would be also possible 
> with repository annotations)
> #4 use-case specific fetch groups that are "auto-learned" by the 
> infrastructure when the user navigates relationships manually and then are 
> stored as meta information so that at the next fetch with that use-case 
> pre-fetches the data
> #5 declare/define fetch-groups as cypher-queries or traversals (which might 
> also be the "metadata" for #4)
> #6 have a use-case specific version of the domain objects that just contain 
> the subgraph that the use-case is interested in and no outgoing relationships 
> elsewhere (aka. DDD aggregate) then the infrastructure can fetch this whole 
> subgraph and return it, with the projection abilities the entities can still 
> be projected to other types (and the fetch-group-subgraph could probably used 
> within the domain model to define mapping-contexts/boundaries (DDD again).
>
> Paradox of choice as always. I'd prefer either something like 5 as the 
> advanced version or #6 as type safe explicit one. But probably something 
> simpler for the start would be more sensible.
>
> Looking forward to your feedback for this roadmap and any helpful input you 
> can provide.
>
> Thanks
>
> Michael
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>