Re: [Neo] Starting pain with Neo4j

Emil Eifrem Tue, 02 Dec 2008 12:00:50 -0800

Robert --

First off: great and relevant feedback. So thank you for that. Let's
see what we can address. Comments inline.

On Tue, Dec 2, 2008 at 8:41 PM, Emil Eifrem <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Robert Rees gave Neo4j a shot last week, trying to drive it from
> Groovy but came away disappointed:
>
>   http://twitter.com/rrees/statuses/1022507656
>
> I asked why:
>
>   http://twitter.com/emileifrem/status/1023049302
>
> And here is Robert's excellent reply. I asked permission to repost on
> the list to get everyone's input and kept Robert in cc since he's not
> subscribed.
>
> See mail below.
>
> -EE
>
>
> ---------- Forwarded message ----------
> From: Robert Rees <[EMAIL PROTECTED]>
> Date: Fri, Nov 28, 2008 at 4:04 PM
> Subject: Starting pain with Neo4j
> To: [EMAIL PROTECTED]
>
>
> Hi Emil,
>
> This is a follow up to the Tweets we've exchanged. I might also blog
> about this at some point, for my own reference.
>
> So what was the pain with getting started with Neo4j?
>
> Well first I found I couldn't run it without the Shell and JTA jars. I
> wouldn't mind but the documentation implied that if you just want to
> embed a graph db and go you only need the main jar.

Hmm, yea that sucks.

So, the 'neo' component has a compile-time dependency on the 'shell'
component, but only a soft runtime dependency. I.e. it detects the
presence of the shell jar on the classpath when enableRemoteShell() is
invoked and only starts up the shell server if it's available. If not,
it'll print an error message. So shell shouldn't have caused you any
trouble, unless you wanted to use it, at which point of course you
need the jar.
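
To make the soft-dependency idea concrete, here is a minimal sketch of
checking for an optional class on the classpath at call time. It is
not Neo4j's actual code: OptionalDependency, isAvailable and the class
name org.example.shell.MissingShellServer are made up for
illustration, and enableRemoteShell here is only a stand-in that takes
the class name as a parameter.

```java
// Sketch only: names are placeholders, not Neo4j's real classes.
final class OptionalDependency {

    /** Returns true if the named class can be loaded from the classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only check presence, do not run static initializers.
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Hypothetical stand-in for enableRemoteShell(): proceeds only if the class is present. */
    static boolean enableRemoteShell(String shellServerClass) {
        if (!isAvailable(shellServerClass)) {
            System.err.println("Shell jar not on classpath; remote shell not started.");
            return false;
        }
        // A real implementation would create the server reflectively here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(enableRemoteShell("java.util.ArrayList"));
        System.out.println(enableRemoteShell("org.example.shell.MissingShellServer"));
    }
}
```

The point of the reflective lookup is that nothing in the calling code
links against the optional jar, so the class loads fine without it.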

We do have a 'hard' dependency on JTA. I've toyed with the idea of
replacing this dependency with a soft dep, but at the end of the day I
think it introduces more complexity than it's worth. It's probably
better to solve that through an assembly as discussed below.

Anyway, I fixed the line on neo4j.org that said "a single <500k jar."
(Now says that it's a single jar w/ one dependency.)

> I think in terms
> of distributables if Neo4J requires these things then it is worth
> having a "complete" jar for when you do want to just get going and
> play around with it.

Yea, we should definitely make an aggregate assembly of some sort that
includes the most common artifacts like neo, shell, index-util and so
on.

Anders: you suggested this last week, did you create a ticket for it?

>
> The EmbeddedNeo name also caused me a lot of confusion. I read that as
> being "in-memory" in the style of Hsql or Derby. I didn't understand
> that the path I was passing had to be a real directory rather than a
> virtual or conceptual path. I then felt that if the process was going
> to create a lot of files and be quite fussy as to whether it was
> explicitly asked to shutdown then it would actually be easier to have
> it run as a server process.

I think this potential name confusion is something we'll have to live
with. Embedded really means "in-process" and hopefully most people
won't associate it with "in-memory."

> The Abstract Server classes and the whole
> process felt like I had all the process of a big server architecture
> with all the micromanagement involved but none of the power of being
> able to connect and share multiple clients.

Hmm, I'm not sure exactly what you're referring to. I believe Mattias'
shell work has an AbstractServer (?), but that's an internal
implementation class. We're currently not a standalone server but are
doing work in that area (mainly via the RemoteNeo project).

(As a side note: I believe running standalone database servers is an
architectural bug, a boiling frog situation we've got stuck in because
of the 'best practices' and inertia of past paradigms when it was for
some reason deemed ok to expose your persistence layer to everyone and
their mom and use the database as the de-facto integration bus. That's
not the way to roll it in a world of services, IMHO, where you should
expose a domain abstraction rather than your underlying representation
on the wire. In that world, having an embedded database is the only
thing that makes sense. </rant>

Having said that, I realize that a lot of people still expect a
standalone server and it IS very convenient in some situations. So
it's still something we should and will do. But that's the reason why
we didn't start out with that.)

>
> The error messages you get when you fire up an "Embedded" datastore on
> a directory that is either locked or non-existent didn't feel that
> intuitive. The locked message says something like "cannot create
> neoidb" or something similar rather than informing that the store was
> already in use.

Hmm, the exception message is:

               throw new IllegalStateException( "Unable to lock store ["
                   + storageFileName + "], this is usually a result of some "
                   + "other Neo running using the same store." );

where storageFileName will be the file we weren't able to lock. (The
file name is printed for debugging purposes so we can figure out what
went wrong before then.) How should we better phrase it to be more
intuitive?

Traditionally, we've sucked at our exception names (less) and messages
(more). Please help us improve this by letting us know every time an
exception wasn't helpful!

>
> Once everything was running I had two issues. The first was having to
> open a transaction just to read data seemed wrong. I see pure read
> data as being one of the most common tasks and if I am not going to be
> changing anything I don't see why I have to manage a transaction.

Well, the problem is that whether a read must happen inside a
transaction depends on the transaction's isolation level. For example,
with pessimistic locking there are isolation levels where you want
read locks to block both other writes and reads.

I like grouping logical operations together in transactions. It's a
good way to convey intent ("all these things that I do logically
belong together") and it allows the underlying infrastructure to do
cool optimizations, like batch up flushes and reorder disk operations
sequentially.

I think the overarching issue is that using programmatic transactions
kinda sucks for most cases. That's what we support out of the box, but
we should integrate well with any JTA-compliant container so if you
run something like Spring you can use declarative transactions. That
really cleans up your code.

Furthermore, Michael Hunger has experimented with a neat
template-based API that mitigates the requirement of programmatic
transactions, see http://components.neo4j.org/neo-template-api/.
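
As a rough illustration of the template idea in general (this is not
Michael's API, which may look quite different), the sketch below hides
the begin/success/finish ceremony behind a single call. Tx, TxSource,
inTransaction and RecordingSource are hypothetical stand-ins defined
only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch only: Tx and TxSource are hypothetical stand-ins, not Neo4j types.
final class TxTemplateSketch {

    interface Tx {
        void success(); // mark the unit of work as OK to commit
        void finish();  // commit if marked successful, otherwise roll back
    }

    interface TxSource {
        Tx beginTx();
    }

    /** Runs the work inside a transaction so callers never handle Tx directly. */
    static <T> T inTransaction(TxSource source, Supplier<T> work) {
        Tx tx = source.beginTx();
        try {
            T result = work.get();
            tx.success();
            return result;
        } finally {
            tx.finish(); // always runs; rolls back if success() was never reached
        }
    }

    /** Tiny in-memory TxSource that records what happened, for demonstration. */
    static final class RecordingSource implements TxSource {
        final List<String> log = new ArrayList<>();

        public Tx beginTx() {
            log.add("begin");
            return new Tx() {
                private boolean ok;
                public void success() { ok = true; }
                public void finish() { log.add(ok ? "commit" : "rollback"); }
            };
        }
    }

    public static void main(String[] args) {
        RecordingSource source = new RecordingSource();
        String value = inTransaction(source, () -> "some read result");
        System.out.println(value + " " + source.log);
    }
}
```

With a wrapper like this, a pure read still runs in a transaction, but
the caller only writes the body of the work.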

> Secondly the property setting felt quite cumbersome, I would expect to
> be able to set multiple Properties via a Map<String, type> for
> example. I also think it should be part of the Core API to retrieve
> Nodes by Property although if I am reading the documentation correctly
> I think that might already be on your roadmap.

Properties as a Map: Yea, I've also wanted a node.propertiesAsMap() a
few times. But when I've thought it through I've always ended up with
ambiguous (or at least non-intuitive) semantics for it. Maybe if we
throw some more cycles at it we'll figure it out.
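
For illustration, here is a minimal sketch of helpers that could be
layered on top of a node-like object. PropertyHolder is a hypothetical
stand-in, not the real Node interface, and setAll, snapshot and
MapBackedHolder are made-up names. The sketch picks one possible
semantics for the read side (a detached copy); a live view that writes
through to the node would be the other obvious choice, which is one
place the ambiguity shows up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: PropertyHolder is a hypothetical stand-in for a node-like object.
final class PropertyMaps {

    interface PropertyHolder {
        void setProperty(String key, Object value);
        Object getProperty(String key);
        Iterable<String> getPropertyKeys();
    }

    /** Sets every entry of the map as a property, one call per entry. */
    static void setAll(PropertyHolder holder, Map<String, ?> properties) {
        for (Map.Entry<String, ?> e : properties.entrySet()) {
            holder.setProperty(e.getKey(), e.getValue());
        }
    }

    /** Returns a detached copy: later changes to the map do NOT reach the holder. */
    static Map<String, Object> snapshot(PropertyHolder holder) {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (String key : holder.getPropertyKeys()) {
            copy.put(key, holder.getProperty(key));
        }
        return copy;
    }

    /** In-memory holder used only to exercise the helpers. */
    static final class MapBackedHolder implements PropertyHolder {
        private final Map<String, Object> data = new LinkedHashMap<>();
        public void setProperty(String key, Object value) { data.put(key, value); }
        public Object getProperty(String key) { return data.get(key); }
        public Iterable<String> getPropertyKeys() { return new ArrayList<>(data.keySet()); }
    }
}
```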

We will improve the integration between index-util and the Neo4j
kernel in the future, at least by bundling them up together in a
convenience assembly (see above) and by using an event framework for
keeping indexes in sync with the node space.

> Again it might be just to do with the name but I would not expect to
> have to explicitly shutdown an Embedded process. The shutdown should
> be on the finalizer for the server. Once an Embedded object goes out
> of scope it is, to my mind, not in use any longer.

Well, it's really tricky to make that stable and easy to use on top of
the JVM. We have very little control over the GC and in particular on
Windows all hell breaks loose with open files.

Idea: What if EmbeddedNeo's constructor registered a shutdown hook
with the JVM to shut down (if not already done) when the JVM exits?
There's obvious goodness with that, but any bad consequences that I'm
missing?
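
A minimal sketch of what that could look like, assuming an idempotent
shutdown. EmbeddedStore is a hypothetical stand-in for EmbeddedNeo,
and the release counter exists only to make the behaviour observable.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: EmbeddedStore is a hypothetical stand-in for EmbeddedNeo.
final class EmbeddedStore {
    private final AtomicBoolean shutDown = new AtomicBoolean(false);
    private final AtomicInteger releases = new AtomicInteger();
    private final Thread hook;

    EmbeddedStore() {
        // The hook runs when the JVM exits normally, if shutdown() was never called.
        hook = new Thread(this::release, "store-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
    }

    /** Explicit shutdown; safe to call more than once. */
    void shutdown() {
        if (release()) {
            try {
                // Deregister so the hook no longer keeps this instance reachable.
                Runtime.getRuntime().removeShutdownHook(hook);
            } catch (IllegalStateException jvmAlreadyExiting) {
                // JVM is shutting down; the hook will find nothing left to do.
            }
        }
    }

    /** Releases resources at most once; returns true if this call did the work. */
    private boolean release() {
        if (!shutDown.compareAndSet(false, true)) {
            return false;
        }
        releases.incrementAndGet(); // a real store would flush and close files here
        return true;
    }

    int releaseCount() {
        return releases.get();
    }
}
```

Some consequences to weigh: a registered hook holds a strong reference
to the instance until it is removed, the JVM runs hooks in no
guaranteed order, and hooks do not run at all if the process is killed
hard or halted via Runtime.halt().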

>
> Finally Traversing queries again felt quite heavyweight, I felt that
> some things like Direction.BOTH and StopEvaluator.END_OF_GRAPH could
> be assumed unless I stated otherwise. I felt that maybe a Traverser
> Builder would have helped my pain by creating Traversers with common
> characteristics quickly.
>
> The same is true of nodes as I kind of envisaged a builder that would
> build up a set of properties and perhaps even the relationships and
> then build the required node and store it.

Hmm, yea. I've heard this come up before. I've opted so far to keep
the API as clean, simple and compact as possible -- my reasoning is
that it's really easy to layer stuff like this on top of the core API
and if a lot of people really end up doing it, we'll move it into a
component and if THAT ends up being used almost all the time, then
we'll push it down into the core API. ("Vote through code.")

Would love to see your (or anyone else's) take on a builder API. (I
think Michael may have included one in his neo-template-api
component.)
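
To give the discussion something concrete, here is a minimal sketch of
a builder that assumes Direction.BOTH and an end-of-graph stop rule
unless told otherwise. Every type in it is a hypothetical stand-in
defined for the example (it does not use or mirror the real core API
or the neo-template-api component), and it only builds a description
of a traversal rather than running one.

```java
import java.util.function.IntPredicate;

// Sketch only: every type here is a hypothetical stand-in, not the Neo4j core API.
final class TraversalSketch {

    enum Direction { OUTGOING, INCOMING, BOTH }

    /** Stop rule as "stop at this depth?". Never stopping means: run to the end of the graph. */
    static final IntPredicate END_OF_GRAPH = depth -> false;

    /** Immutable description of a traversal, produced by the builder. */
    static final class Spec {
        final String relationshipType;
        final Direction direction;
        final IntPredicate stopAt;

        Spec(String relationshipType, Direction direction, IntPredicate stopAt) {
            this.relationshipType = relationshipType;
            this.direction = direction;
            this.stopAt = stopAt;
        }
    }

    static final class Builder {
        private final String relationshipType;
        private Direction direction = Direction.BOTH; // assumed unless overridden
        private IntPredicate stopAt = END_OF_GRAPH;   // assumed unless overridden

        Builder(String relationshipType) {
            this.relationshipType = relationshipType;
        }

        Builder direction(Direction d) {
            this.direction = d;
            return this;
        }

        /** Stop once the given depth is reached. */
        Builder maxDepth(int max) {
            this.stopAt = depth -> depth >= max;
            return this;
        }

        Spec build() {
            return new Spec(relationshipType, direction, stopAt);
        }
    }

    static Builder along(String relationshipType) {
        return new Builder(relationshipType);
    }

    public static void main(String[] args) {
        Spec defaults = along("KNOWS").build();
        Spec custom = along("KNOWS").direction(Direction.OUTGOING).maxDepth(2).build();
        System.out.println(defaults.direction + " " + custom.direction);
    }
}
```

The common case becomes a single chained call, and anything unusual is
still expressible by overriding a default.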

> So to sum up, things I liked: a working graph database(!),
> relationships being first order.
>
> Things that were difficult: micromanaging the database process, lack of
> lightweight query modes, ceremony for node creation.
>
> I don't know if I am going to get time next week but I might try and
> generate some code illustrating what I mean about wrapping the current
> API in something that assumes basic options until overridden.

Would be great! Thanks a lot for the feedback!

Cheers,

-- 
Emil Eifrém, CEO [EMAIL PROTECTED]
Neo Technology, www.neotechnology.com
Cell: +46 733 462 271 | US: 206 403 8808
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
