[Neo4j] [Spring Data Graph] Some questions/suggestions about cross-store persistence

Michel Domenjoud Fri, 02 Sep 2011 03:16:05 -0700

Hi,
I'm currently testing Spring Data Graph, with a focus on polyglot
persistence use cases, in order to give a short presentation at the Spring
User Group in Paris in September.
This email follows my previous discussion with Michael Hunger (pasted
below), and I have some questions/suggestions:


1. Add a real detached state for entities:
In my previous discussion, I was a bit worried about the behaviour of node
entities, which make every getter call read through to the graph database,
even outside of a transaction.
If I understood correctly, there is indeed no real detached state for node
entities.
I think this is really an issue because it doesn't match the domain-centric
purpose of Spring Data. IMHO, this is a semantic problem: if my NodeEntities
are domain objects, I expect the value returned by a getter to be immutable,
and so the call not to be a database read operation (at least once I'm out
of a transaction).

=> Imagine I have a big process, for example a computation engine using
node entities retrieved from the graph, with a long computation and output
to a file or another storage engine:
- With the current behaviour, the only way to be sure that all properties
of a node entity stay unchanged during processing is either to keep a
transaction open during the whole process, or to use clones of all nodes.
- Keeping a very long transaction open, knowing I may use many nodes, is
IMHO definitely a bad idea.
- I can clone my entities, but I don't think this is a good idea either, as
I would use exactly the same class without any backing node.
=> This can be even more confusing with cross-store persistence, as JPA
entities have a real detached state.
To answer Michael, I don't think this must always come with complicated
fetch strategies: you could implement lazy loading, which would only
retrieve node properties by default; the developer would then need to
retrieve relationships and related nodes with an explicit call.
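A minimal sketch of the lazy-loading idea above, assuming a hypothetical in-memory `InMemoryGraph` in place of Neo4j and a hypothetical `DetachedNode` class (neither is Spring Data Graph API): properties are copied once when the node is detached, so getters no longer touch the store, and relationships are only read through an explicit `loadRelated` call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the graph database: node properties and
// outgoing relationships, keyed by node id. Not a Neo4j or SDG API.
final class InMemoryGraph {
    final Map<Long, Map<String, Object>> properties = new HashMap<>();
    final Map<Long, List<Long>> relationships = new HashMap<>();
    int reads = 0; // number of times the store has been accessed

    Map<String, Object> readProperties(long nodeId) {
        reads++;
        return new HashMap<>(properties.getOrDefault(nodeId, Map.of()));
    }

    List<Long> readRelated(long nodeId) {
        reads++;
        return new ArrayList<>(relationships.getOrDefault(nodeId, List.of()));
    }
}

// Hypothetical detached node: properties are copied once at detach time,
// relationships are only read on an explicit call.
final class DetachedNode {
    private final long nodeId;
    private final Map<String, Object> properties;

    DetachedNode(InMemoryGraph graph, long nodeId) {
        this.nodeId = nodeId;
        this.properties = Map.copyOf(graph.readProperties(nodeId)); // one read, then none
    }

    // Served from the detached copy: never touches the store.
    Object get(String key) {
        return properties.get(key);
    }

    // Explicit, visible read of the related node ids.
    List<Long> loadRelated(InMemoryGraph graph) {
        return graph.readRelated(nodeId);
    }
}
```

The trade-off is the one Michael raises later in the thread: the copy is stable, but whatever was not copied (here, relationships) needs its own explicit fetch.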

2. The persist() operation is a bit confusing and could lead to mistakes: I'd
suggest splitting it into two methods, save and merge.
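The proposed split can be sketched as follows. `MovieStore`, `save` and `merge` are hypothetical names chosen for illustration, not Spring Data Graph API, and a map stands in for the graph: `save` only accepts entities that have never been stored, and `merge` only accepts entities that already have a node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity: id stays null until the entity has been saved once.
final class Movie {
    Long id;
    String title;

    Movie(String title) {
        this.title = title;
    }
}

// Hypothetical store illustrating the proposed split of persist():
// save() only creates, merge() only updates an existing node.
final class MovieStore {
    private final Map<Long, String> titles = new HashMap<>(); // stands in for the graph
    private long nextId = 1;

    Movie save(Movie movie) {
        if (movie.id != null) {
            throw new IllegalStateException("already saved, use merge(): " + movie.id);
        }
        movie.id = nextId++;
        titles.put(movie.id, movie.title);
        return movie;
    }

    Movie merge(Movie movie) {
        if (movie.id == null || !titles.containsKey(movie.id)) {
            throw new IllegalStateException("not saved yet, use save()");
        }
        titles.put(movie.id, movie.title);
        return movie;
    }

    String titleOf(long id) {
        return titles.get(id);
    }
}
```

Making the two intents separate methods turns a wrong call into an immediate error instead of a silent create-or-update.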

3. Cross-store persistence: allow an explicit re-attach operation for the
JPA side.
Currently, when retrieving a partial NodeEntity from the graph database, its
JPA side is automatically retrieved. On the other hand, when retrieving an
entity from the relational database, I have to make an explicit call to
persist() to merge the graph side.
=> I think this can lead to errors and performance leaks, for example:
I use a traversal to retrieve some partial entities in order to update only
their graph-side properties. This works, but for each retrieved entity an
implicit JPA merge call is made...

4. Last question: what are the plans for the cross-store persistence API in
Spring Data Graph? Are you planning to make some enhancements to it, or is
it just some sugar over the Spring Data Graph API?

Thanks in advance for your answers!
Michel

Hi Michael,
> Ok, I get your point now. In fact, what I hadn't understood yet was that
> each get call on an entity can be compared to a SELECT on a relational db,
> even if no explicit call to the graph repository is made.
>
> So, if I understand correctly, I'd improve the documentation by adding
> something like this after the paragraph
> Existing: All entities returned by library functions are initially in an
> attached state. Just as with any other entity, changing them outside of a
> transaction detaches them, and they must be reattached with persist() for
> the data to be saved.
> Add this after: However, all entities are still attached when reading
> fields, as all getters read through to the latest data in the graph. For
> people used to developing with relational databases, this must be
> understood as: each getter call can be likened to a SELECT operation.
>
> Finally, I understand your point about read-through vs. fetch strategies
> issues, but this only means that developers will have to code this glue
> themselves in each application. I think that if SDG is intended to become a
> reference for using graph databases, this kind of API will have to come
> with it one day (but maybe I'm mistaken because I'm too used to relational
> DBs).
>
> Moreover, I see one point that could be really confusing with this approach
> in SDG: cross-store persistence. With this API, you provided the ability to
> manage JPA entities, which can use various fetch strategies, and graph
> entities, which use read-through, in the same entity class.
> This point is not mentioned in the documentation; I think you should add a
> big warning about it. Something like:
> > As mentioned in Chapter 18.8 Detached node entities, node entities use
> > read-through. On the other hand, JPA entities can use various fetch
> > strategies. This must be considered with caution when developing
> > applications.
>
> HTH, and more questions will certainly come about cross-store persistence!
> :)
>
> Michel
>
> 2011/8/23 Michael Hunger <michael.hun...@neotechnology.com>
> Hi Michel,
>
> they are implicitly detached when modified outside of a transaction. But
> even in detached mode, unmodified fields are still read through!
>
>
> Could you point out how the docs could be improved to make that easier to
> understand:
>
> http://static.springsource.org/spring-data/data-graph/snapshot-site/reference/html/#reference:programming-model:lifecycle
>
> They always read through, but the db uses a cache of course.
>
> Regarding your example with different clients.
>
> Assuming the operation is persisted before #4, the title will be the new
> one, as this is the new state in the db.
>
> It is the same as in a relational db: if you do two selects (which the
> read-through is), you get back the value that is current in the db.
>
> I understand your issue though. Right now the only option would be to copy
> the values needed for the output to a separate data structure if you never
> want that to happen.
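Michael's copy-out workaround can be sketched like this. `ReadThroughMovie` is a hypothetical stand-in whose getter reads from a shared map standing in for the node, and `MovieView` is a hypothetical plain value object; the copy is taken once, so later changes to the node no longer show up in the output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-through entity: the getter always reads the current
// value from the backing map, which stands in for the graph node.
final class ReadThroughMovie {
    private final Map<String, String> node;

    ReadThroughMovie(Map<String, String> node) {
        this.node = node;
    }

    String getTitle() {
        return node.get("title");
    }
}

// Hypothetical plain value object holding only what the output needs.
final class MovieView {
    private final String title;

    private MovieView(String title) {
        this.title = title;
    }

    // Copies the values once; later changes to the node are not seen.
    static MovieView copyOf(ReadThroughMovie movie) {
        return new MovieView(movie.getTitle());
    }

    String title() {
        return title;
    }
}
```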
>
> The problem with detaching and copying is that you quickly get into all the
> annoyances of fetch depths, fetch groups etc. again; that's a path I don't
> want to walk, it leads to hell :)
>
> Michael
>
> On 23.08.2011 at 12:55, Michel Domenjoud wrote:
>
> > Michael,
> > Thanks for your quick answer.
> >
> > This leads me to two new points:
> >
> > - You said that an entity is attached when freshly loaded, but I found no
> > way to explicitly detach entities. Am I right?
> > If so, I think you should update the documentation, which is quite
> > confusing on this point, and explain clearly that detached entities should
> > be used in a "write-only" mode.
> >
> > - Moreover, I think there could be some confusing side effects if entities
> > always use read-through:
> > Does this work with a cache, or do the entities always read through to the
> > database?
> >
> > How would this example behave with two different clients?
> >
> > A client X does the following (let's say the title property is indexed):
> > 1. Movie retrievedMovie = movieRepository.findByPropertyValue("Babel");
> > 2. output(retrievedMovie.getTitle()) // prepare some output like a Web page
> > 3. ... do some other operations
> > 4. output(retrievedMovie.getTitle()) // for some reason, a second output is needed
> >
> > At the same time, a client Y executes the following code:
> > 1. Movie retrievedMovie = movieRepository.findByPropertyValue("Babel");
> > 2. retrievedMovie.setTitle("New title");
> > 3. retrievedMovie.persist();
> > 4. Some other stuff we don't care about
> >
> > What should the value of the movie title be for client X at step 4?
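The interleaving above can be replayed in a small single-threaded simulation. `AttachedMovie` and `TwoClients` are hypothetical names, a shared map stands in for the node (this is not the SDG `Movie` class), and Y's setTitle()/persist() pair is collapsed into one write. Under these assumptions, X's second output shows the new title, which matches the answer quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-through entity over a shared map standing in for the node.
final class AttachedMovie {
    private final Map<String, String> node;

    AttachedMovie(Map<String, String> node) {
        this.node = node;
    }

    String getTitle() {
        return node.get("title"); // every call reads the current value, like a SELECT
    }

    void setTitle(String title) {
        node.put("title", title); // simplification: change and persist in one step
    }
}

final class TwoClients {
    // Replays the example: Y's update lands between X's steps 2 and 4.
    static List<String> run() {
        Map<String, String> node = new HashMap<>(Map.of("title", "Babel"));
        List<String> outputOfX = new ArrayList<>();

        AttachedMovie movieOfX = new AttachedMovie(node); // X, step 1
        outputOfX.add(movieOfX.getTitle());               // X, step 2

        AttachedMovie movieOfY = new AttachedMovie(node); // Y, step 1
        movieOfY.setTitle("New title");                   // Y, steps 2-3

        outputOfX.add(movieOfX.getTitle());               // X, step 4
        return outputOfX;
    }
}
```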
> >
> > Thanks in advance for your answer.
> > Michel
> >
> >
> >> Date: Tue, 23 Aug 2011 11:42:13 +0200
> >> From: Michael Hunger <michael.hun...@neotechnology.com>
> >> Subject: Re: [Neo4j] [Spring Data Graph] Precisions about Detached
> >>       Entities and SDG under the hood
> >> To: Neo4j user discussions <user@lists.neo4j.org>
> >> Message-ID: <3a2f0a73-6183-4b32-a02a-7219f0a7f...@neotechnology.com>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> There are two states, attached and detached:
> >>
> >> An entity is detached when it is created or when it is changed outside of
> >> a transaction.
> >>
> >> Otherwise (when it is freshly loaded, or after persist()) it is attached.
> >>
> >> For detached entities, persist() writes the changed properties and
> >> relationships to the graph. If attached (and inside a tx), all changes
> >> are written directly.
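The lifecycle rules above can be modelled as a tiny state machine. `EntityState` and `LifecycleModel` are hypothetical names for this sketch, not Spring Data Graph types, and the model assumes that a change made inside a transaction leaves the state as it is.

```java
// Hypothetical model of the attached/detached rules described above; not SDG code.
enum EntityState { ATTACHED, DETACHED }

final class LifecycleModel {
    private EntityState state;

    private LifecycleModel(EntityState initial) {
        this.state = initial;
    }

    // A newly created entity starts detached.
    static LifecycleModel created() {
        return new LifecycleModel(EntityState.DETACHED);
    }

    // A freshly loaded entity starts attached.
    static LifecycleModel loaded() {
        return new LifecycleModel(EntityState.ATTACHED);
    }

    // Outside a transaction a change detaches the entity; inside one the
    // state is left unchanged (an attached entity writes the change directly).
    void modify(boolean inTransaction) {
        if (!inTransaction) {
            state = EntityState.DETACHED;
        }
    }

    // persist() writes the pending changes and (re)attaches the entity.
    void persist() {
        state = EntityState.ATTACHED;
    }

    EntityState state() {
        return state;
    }
}
```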
> >>
> >> In your example you just overwrote the title with Babel and persisted
> >> that information to the graph. The retrieved movie is attached; it is
> >> never detached, so it always refers to the node in the graph
> >> (read-through), and the data is _not_ copied. So the assert should say:
> >>
> >>> assertEquals("Babel", retrievedMovie.getTitle());
> >>
> >>
> >> Attached entities read their data directly from the underlying node.
> >>
> >> HTH
> >>
> >> Michael
> >>
> >> The model is different from Hibernate's, as Hibernate has no
> >> read-through. We would have loved not to support detached entities, but
> >> as they are so common in web frameworks we had to.
> >>
> >> The best way of working with SDG is to use domain-level service methods
> >> which are transactional and do the interaction with the graph. Detached
> >> entities should only be used (if at all) to persist user input (form
> >> data) from the UI.
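A minimal sketch of that service-layer style, assuming a hypothetical `Tx` helper in place of Spring's transaction support (in a real application `@Transactional` would play this role) and a map in place of the graph: each service method does its graph access inside one unit of work and hands back plain values rather than attached entities. `MovieService` and its methods are likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical transaction helper standing in for Spring's transaction
// support; it only tracks whether a unit of work is open and how many
// units completed without throwing.
final class Tx {
    private boolean open;
    int completed;

    <T> T inTransaction(Supplier<T> work) {
        open = true;
        try {
            T result = work.get();
            completed++;
            return result;
        } finally {
            open = false;
        }
    }

    boolean isOpen() {
        return open;
    }
}

// Hypothetical domain service: every graph access happens inside one unit
// of work, and callers only receive plain values, never attached entities.
final class MovieService {
    private final Map<Long, String> titles = new HashMap<>(); // stands in for the graph
    private final Tx tx;

    MovieService(Tx tx) {
        this.tx = tx;
    }

    void create(long id, String title) {
        tx.inTransaction(() -> titles.put(id, title));
    }

    // Updates the title in one unit of work and returns the previous one.
    String rename(long id, String newTitle) {
        return tx.inTransaction(() -> titles.put(id, newTitle));
    }

    String title(long id) {
        return tx.inTransaction(() -> titles.get(id));
    }
}
```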
> >>
> >>
> >>
> >> On 23.08.2011 at 10:56, Michel Domenjoud wrote:
> >>
> >>> Hello,
> >>> I'm currently testing some Spring Data Graph features, and I have a few
> >>> questions about some usages.
> >>>
> >>> Could someone explain to me how the following example works?
> >>> I run the following unit test:
> >>>
> >>> @Test
> >>> public void testUpdatingEntitiesNotInTransaction() {
> >>>     Movie m = new Movie();
> >>>     m.setTitle("Leon");
> >>>     m.persist();
> >>>     Long id = m.getNodeId();
> >>>     // load the movie again, by the id of the node m was persisted to
> >>>     Movie retrievedMovie = movieRepository.findOne(id);
> >>>     // change and persist the title through m, still outside any transaction
> >>>     m.setTitle("Babel");
> >>>     m.persist();
> >>>     assertEquals("Leon", retrievedMovie.getTitle());
> >>> }
> >>>
> >>> And the assertion at the end fails, as retrievedMovie.getTitle() equals
> >>> "Babel" and not "Leon".
> >>> This point is not really clear in the documentation:
> >>> Does this occur because of some cache? If so, is it the Neo4j cache? And
> >>> what exactly is its scope: thread, session, ...?
> >>> Or does every getter call trigger an access to the database because of
> >>> AspectJ?
> >>>
> >>> Anyway, unless I misunderstood something, it's a bit confusing,
> >>> especially when used to APIs like Hibernate, which don't refresh
> >>> retrieved entities once we are outside of a transaction.
> >>>
> >>> When I read this in the documentation, I don't expect any persist
> >>> operation to affect other retrieved entities:
> >>> Changing an attached entity inside a transaction will immediately write
> >>> through the changes to the datastore. Whenever an entity is changed
> >>> outside of a transaction it becomes detached. The changes are stored in
> >>> the entity itself until the next call to persist().
> >>>
> >>> All entities returned by library functions are initially in an attached
> >>> state. Just as with any other entity, changing them outside of a
> >>> transaction detaches them, and they must be reattached with persist()
> >>> for the data to be saved.
> >>> Let me clarify some points:
> >>>
> >>>  - I'm using an embedded database, cleaned before each test
> >>>  - I don't use any transaction in this test.
> >>>
> >>>
> >>> Thanks in advance for your help!
> >>> Michel
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
