Robert -- First off: great and relevant feedback. So thank you for that. Let's see what we can address. Comments inline.
On Tue, Dec 2, 2008 at 8:41 PM, Emil Eifrem <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Robert Rees gave Neo4j a shot last week, trying to drive it from
> Groovy but came away disappointed:
>
> http://twitter.com/rrees/statuses/1022507656
>
> I asked why:
>
> http://twitter.com/emileifrem/status/1023049302
>
> And here is Robert's excellent reply. I asked permission to repost on
> the list to get everyone's input and kept Robert in cc since he's not
> subscribed.
>
> See mail below.
>
> -EE
>
>
> ---------- Forwarded message ----------
> From: Robert Rees <[EMAIL PROTECTED]>
> Date: Fri, Nov 28, 2008 at 4:04 PM
> Subject: Starting pain with Neo4j
> To: [EMAIL PROTECTED]
>
>
> Hi Emil,
>
> This is a follow up to the Tweets we've exchanged. I might also blog
> about this at some point, for my own reference.
>
> So what was the pain with getting started with Neo4j?
>
> Well first I found I couldn't run it without the Shell and JTA jars. I
> wouldn't mind but the documentation implied that if you just want to
> embed a graph db and go you only need the main jar.

Hmm, yea that sucks. So, the 'neo' component has a compile-time dependency on the 'shell' component, but only a soft runtime dependency. I.e. it detects the presence of the shell jar on the classpath when enableRemoteShell() is invoked and only starts up the shell server if the jar is available. If not, it'll print an error message. So shell shouldn't have caused you any trouble, unless of course you wanted to use it, at which point you do need the jar.

We do have a 'hard' dependency on JTA. I've toyed with the idea of replacing this dependency with a soft dep, but at the end of the day I think it introduces more complexity than it's worth. It's probably better to solve that through an assembly as discussed below.

Anyway, I fixed the line on neo4j.org that said "a single <500k jar." (Now says that it's a single jar w/ one dependency.)
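To make the "soft runtime dependency" idea concrete, here is a minimal sketch of how an optional component can be detected on the classpath before it is started. This is not the actual Neo4j code: the class OptionalComponent, its method names and the error text are all made up for illustration, and the only real API used is Class.forName from the JDK.

```java
// Hypothetical sketch; none of these names come from the Neo4j code base.
final class OptionalComponent {
    private OptionalComponent() {}

    /** Returns true if the named class can be loaded from the classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only check for presence, do not run static initializers
            Class.forName(className, false, OptionalComponent.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar is missing or incomplete: treat the component as absent
            return false;
        }
    }

    /**
     * Runs the starter only if the marker class is present.
     * Otherwise prints an error message and returns false.
     */
    static boolean startIfAvailable(String markerClass, Runnable starter) {
        if (!isAvailable(markerClass)) {
            System.err.println("Cannot start optional component: "
                + markerClass + " not found on the classpath");
            return false;
        }
        starter.run();
        return true;
    }
}
```

With something like this, the main jar never fails to load just because the optional jar is absent. The check only happens at the moment the optional feature is requested.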
> I think in terms
> of distributables if Neo4J requires these things then it is worth
> having a "complete" jar for when you do want to just get going and
> play around with it.

Yea, we should definitely make an aggregate assembly of some sort that includes the most common artifacts like neo, shell, index-util and so on. Anders: you suggested this last week, did you create a ticket for it?

>
> The EmbeddedNeo name also caused me a lot of confusion. I read that as
> being "in-memory" in the style of Hsql or Derby. I didn't understand
> that the path I was passing had to be a real directory rather than a
> virtual or conceptual path. I then felt that if the process was going
> to create a lot of files and be quite fussy as to whether it was
> explicitly asked to shutdown then it would actually be easier to have
> it run as a server process.

I think this potential name confusion is something we'll have to live with. Embedded really means "in-process" and hopefully most people won't associate it with "in-memory."

> The Abstract Server classes and the whole
> process felt like I had all the process of a big server architecture
> with all the micromanagement involved but none of the power of being
> able to connect and share multiple clients.

Hmm, I'm not sure exactly what you're referring to. I believe Mattias' shell work has an AbstractServer (?), but that's an internal implementation class. We're currently not a standalone server but are doing work in that area (mainly via the RemoteNeo project).

(As a side note: I believe running standalone database servers is an architectural bug, a boiling frog situation we've got stuck in because of the 'best practices' and inertia of past paradigms when it was for some reason deemed ok to expose your persistence layer to everyone and their mom and use the database as the de-facto integration bus.
That's not the way to roll it in a world of services, IMHO, where you should expose a domain abstraction rather than your underlying representation on the wire. In that world, having an embedded database is the only thing that makes sense. </rant> Having said that, I realize that a lot of people still expect a standalone server and it IS very convenient in some situations. So it's still something we should and will do. But that's the reason why we didn't start out with that.)

>
> The error messages you get when you fire up an "Embedded" datastore on
> a directory that is either locked or non-existent didn't feel that
> intuitive. The locked message says something like "cannot create
> neoidb" or something similar rather than informing that the store was
> already in use.

Hmm, the exception message is:

    throw new IllegalStateException( "Unable to lock store ["
        + storageFileName + "], this is usually a result of some "
        + "other Neo running using the same store." );

where storageFileName will be the file we weren't able to lock. (The file name is printed for debugging purposes so we can figure out what went wrong before then.) How should we better phrase it to be more intuitive?

Traditionally, we've sucked at our exception names (less) and messages (more). Please help us improve this by letting us know every time an exception wasn't helpful!

>
> Once everything was running I had two issues. The first was having to
> open a transaction just to read data seemed wrong. I see pure read
> data as being one of the most common tasks and if I am not going to be
> changing anything I don't see why I have to manage a transaction.

Well, the problem is that whether reading data must happen in the context of a transaction depends on the transaction's isolation level. For example, there are isolation levels where with pessimistic locking you want read locks to block both other writes and reads.

I like grouping logical operations together in transactions.
It's a good way to convey intent ("all these things that I do logically belong together") and it allows the underlying infrastructure to do cool optimizations, like batch up flushes and reorder disk operations sequentially.

I think the overarching issue is that using programmatic transactions kinda sucks for most cases. That's what we support out of the box, but we should integrate well with any JTA-compliant container so if you run something like Spring you can use declarative transactions. That really cleans up your code. Furthermore, Michael Hunger has experimented with a neat template-based API that mitigates the requirement of programmatic transactions, see http://components.neo4j.org/neo-template-api/.

> Secondly the property setting felt quite cumbersome, I would expect to
> be able to set multiple Properties via a Map<String, type> for
> example. I also think it should be part of the Core API to retrieve
> Nodes by Property although if I am reading the documentation correctly
> I think that might already be on your roadmap.

Properties as a Map: Yea, I've also wanted a node.propertiesAsMap() a few times. But when I've thought it through I've always ended up with ambiguous (or at least non-intuitive) semantics for it. Maybe if we throw some more cycles at it we'll figure it out.

We will improve the integration between index-util and the Neo4j kernel in the future, at least by bundling them up together in a convenience assembly (see above) and by using an event framework for keeping indexes in sync with the node space.

> Again it might be just to do with the name but I would not expect to
> have to explicitly shutdown an Embedded process. The shutdown should
> be on the finalizer for the server. Once an Embedded object goes out
> of scope it is, to my mind, not in use any longer.

Well, it's really tricky to make that stable and easy to use on top of the JVM.
We have very little control over the GC and in particular on Windows all hell breaks loose with open files.

Idea: What if EmbeddedNeo's constructor registered a shutdown hook with the JVM to shut down (if not already done) when the JVM exits? There's obvious goodness with that, but are there any bad consequences that I'm missing?

>
> Finally Traversing queries again felt quite heavyweight, I felt that
> some things like Direction.BOTH and StopEvaluator.END_OF_GRAPH could
> be assumed unless I stated otherwise. I felt that maybe a Traverser
> Builder would have helped my pain by creating Traversers with common
> characteristics quickly.
>
> The same is true of nodes as I kind of envisaged a builder that would
> build up a set of properties and perhaps even the relationships and
> the build the required node and store it.

Hmm, yea. I've heard this come up before. I've opted so far to keep the API as clean, simple and compact as possible -- my reasoning is that it's really easy to layer stuff like this on top of the core API and if a lot of people really end up doing it, we'll move it into a component and if THAT ends up being used almost all the time, then we'll push it down into the core API. ("Vote through code.")

Would love to see your (or anyone else's) take on a builder API. (I think Michael may have included one in his neo-template-api component.)

> So to sum up, things I liked: a working graph database(!),
> relationships being first order.
>
> Things that were difficult: micromanging the database process, lack of
> lightweight query modes, ceremony for node creation.
>
> I don't know if I am going to get time next week but I might try and
> generate some code illustrating what I mean about wrapping the current
> API in something that assumes basic options until overridden.

Would be great! Thanks a lot for the feedback!
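To illustrate the shutdown hook idea from earlier in this mail, here is a minimal sketch of what it could look like. HookedStore is a hypothetical stand-in, not the real EmbeddedNeo class, and the "release" step is just a counter where the real store would flush and close its files. The only real APIs used are Runtime.addShutdownHook and Runtime.removeShutdownHook from the JDK.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an embedded store; not the real EmbeddedNeo.
final class HookedStore {
    private final AtomicBoolean shutDown = new AtomicBoolean(false);
    private final AtomicInteger releases = new AtomicInteger();
    private final Thread hook;

    HookedStore() {
        // Registered at construction so the store is released when the JVM exits
        hook = new Thread(() -> releaseOnce(), "hooked-store-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
    }

    /** Explicit shutdown; safe to call more than once. */
    void shutdown() {
        if (releaseOnce()) {
            try {
                // Unregister so the hook does not keep this object reachable
                Runtime.getRuntime().removeShutdownHook(hook);
            } catch (IllegalStateException e) {
                // JVM is already shutting down; nothing more to do
            }
        }
    }

    /** Releases resources at most once, whether called explicitly or by the hook. */
    private boolean releaseOnce() {
        if (!shutDown.compareAndSet(false, true)) {
            return false;
        }
        releases.incrementAndGet(); // placeholder for flushing and closing files
        return true;
    }

    boolean isShutDown() { return shutDown.get(); }

    int releaseCount() { return releases.get(); }
}
```

Two caveats that apply to any approach along these lines: shutdown hooks do not run if the JVM is killed forcibly or stopped via Runtime.halt(), and a registered hook keeps the store object reachable until it is removed, which is why the sketch unregisters it on explicit shutdown.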
Cheers,

--
Emil Eifrém, CEO [EMAIL PROTECTED]
Neo Technology, www.neotechnology.com
Cell: +46 733 462 271 | US: 206 403 8808