Re: [Neo4j] Clarification on read and write locks
Yeah I had seen that, this would allow partially for what I was hoping to do, and is still obviously part of the implementation API. The problem with this is that (from my reading of LockReleaser and LockManager) you can't release a read lock that has been attached to a transaction, otherwise you will have an exception thrown at the point the transaction is doing its lock cleanups. Quick explanation of exactly why this is of interest to me: When client code gets a node collection they can get an iterator from it. While they are iterating over the node collection data structure I would like to read lock as little of the data structure as possible (e.g. say a node representing a sub branch of a tree), then when the iteration moves out of that particular scope release the read lock. that works perfectly fine. The problem is that client code can abandon an iterator at any point meaning that either there needs to be a call made to a node collection cleanup method, or preferably the cleanup releasing of locks is done based on a transaction finishing (which happens if you enlist the lock into the transaction using the lock releaser). Guess I want the best of both worlds, releasing read locks (and even write locks) when I am sure within the library code that they aren't needed anymore, but then having dangling locks auto cleaned up at the end of a transaction. Cheers Bryce On Tue, Sep 27, 2011 at 10:28 AM, McKinley mckinley1...@gmail.com wrote: Bryce, Have you looked at LockManage.releaseReadLock(...)? https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/impl/transaction/LockManager.java#L143 You can call that method with a null argument for the transaction which will take you here: https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/impl/transaction/RWLock.java#L190 Which creates a https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/impl/transaction/RWLock.java#L469 I haven't thought through what you are describing, but I just wanted to make sure you saw that part of the API. Cheers, McKinley ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Modelling with neo4j
Following up on the part of this discussion about moving the enhanced api out of the graph collections module, was meaning to get to this earlier but got side tracked. The dependency that IndexedRelationship had on the ComparablePropertyType which I am assuming is what you are referring to there no longer exists. PropertySortedTree still has a dependency on ComparablePropertyType but this collection could be taken along with the enhanced api as a specialisation of the SortedTree collection specific to the enhanced api. As it still implements NodeCollection it can be used as the basis of an IndexedRelationship, but IndexedRelationship doesn't need to know about it at all. I am wondering however whether there is a problem with this as one thing I just realised isn't happening that previously did is the storage of the property type into the node collection, and for that matter PropertySortedTree currently has no node only constructor so wouldn't currently work correctly I will look into that (hadn't done that since I haven't even had a good look at the enhanced api yet). On Sun, Sep 25, 2011 at 3:28 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: +1 Enhanced API grew out of a couple of classes I added to make IndexedRelationship work more easily (not exposing comparators), but it is essentially a separate component. Giving it that status would help other's improve it. Having laid some of the ground work, I feel it needs other people's input too. As it stands now, it is very much a one-man's work and while I am confident it contains plenty of good ideas, it can only grow with the input of other developers, just like IndexedRelationships has become much better thanks to the work Bryce put into it, and the work of others to include graph-collections with structures I would not even have thought about. There is however one thing we need to look at. Right now IndexRelationships has a dependency on Enhanced API for the indexing of nodes based on a property. At the same time Enhanced API has a dependency on graph-collections, transparently supporting IndexedRelationships in the API. I think it would be best to remove the dependency of graph-collections on enhanced-api and only offer the slightly more complex option where the user needs to provide a comparator. The other dependency can remain and in fact can even be made stronger. Enhanced API could in principle be made to support any type of collection, now that Bryce has added a generic nodecollection interface. I agree enhanced api is not a great name, it says what it does, but certainly has little appeal. So I will be happy if someone can come up with something sexier. Niels From: peter.neuba...@neotechnology.com Date: Sat, 24 Sep 2011 15:42:13 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Modelling with neo4j Great thoughts guys! I think it would be interesting to break out the Enhanced API from graph-collections, rename it into something better (we can think of a name together) and provide a more fully fledged example that we can document and evolve. WDYT? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: That's a great summary, Niels. Very similar to how we've applied Neo4J here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer. A few other items that we leveraged in our implementation that you may wish to consider: - A common pattern we encountered was a collection of typed entities (e.g. a typed collection), and we implemented a specific model using supernodes for this. This also allowed us to rapidly and easily iterate/search collections and also to organize nodes in a human comprehensible way that can be readily viewed with something like Neoclipse for troubleshooting purposes. Also, if the type was truck, we stamped the node with the type truck as a property (using enumerations with a custom int member) and used that same enum as the relationship type between the node and the collection node. In our model, an entity has a single type, but we implemented the concept of supertyping/subtyping in our domain model - We found quite a few examples where a one-way relationship was more than adequate and, instead of incurring the overhead of a relationship (particularly when millions of these relationships were attached to a single supernode), we used a *property* on a node containing the node
Re: [Neo4j] Clarification on read and write locks
Hi Tobias, Thanks for the response, these are exactly the kind of details that I was wanting to get a better understanding of. Would be good to get a public api lock system working, though I know you guys have lots of time pressures :-) From what you have said my understanding is that there is no supported public api way of getting a read lock at present. Doing the best with the tools at hand at present, and understanding this could break in the future if the implementation changes, would the following fix the acquireLock method I posted earlier: private void acquireLock( LockType lockType ) { GraphDatabaseService graphDb = baseNode.getGraphDatabase(); if ( lockType == LockType.READ graphDb instanceof AbstractGraphDatabase ) { Config config = ((AbstractGraphDatabase) baseNode.getGraphDatabase()).getConfig(); config.getLockManager().getReadLock( baseNode ); config.getLockReleaser().addLockToTransaction( baseNode, LockType.READ ); } else { // default to write lock if read locks unavailable baseNode.removeProperty( ___dummy_property_to_acquire_lock___ ); } } My thoughts on the questions you have posed would be: 1) Should there be release() methods for locks as well? or should locks always be tied to the transaction? I like the idea of having the flexibility of locking a certain node for only a part of a transaction, especially in the case of read locks. write locks I am not so sure about, maybe being able to release a write lock only if you didn't actually change the locked element. I am not sure if you have looked at the graph collections stuff, and particularly the UnrolledLinkedList implementation I recently added, but that is the specific situation I am thinking of. My thinking of how an ideal locking situation would work here would be, for reading: - get read lock on the node representing the first page of the ULL - iterate over the value relationships associated with this page - get read lock on the node representing the second page - release the read lock on the first page - any read locks left at the end would need to be tied to transaction, therefore would need to be able to attach and detach locks from transactions For writing: - get write lock on the base node (representing the entry point to the data structure) - do processing to find where the write is going to occur within the data structure (i.e. which page(s) are changed) - get write lock on these page nodes - release write lock on base node (since it hasn't changed then this would be instant release) - change the page nodes - the write locks associated with the page nodes would be released with the transaction commit With this as little of the data structure is being locked at a given time but enough to stop concurrency issues (I think, this is all off the top of my head right now). 2) Where should the methods live? On the entities themselves? i.e. node.acquireReadLock() / node.aquireWriteLock() or on the Transaction object to signal that locks are tied to the life cycle of the transaction? i.e. tx.readLock(node) / tx.writeLock(node) Bit less sure about this but something like this might work: Lock lock = node.acquireReadLock(); // or node.aquireWriteLock() tx.addLock(lock); // now lock is tied to transaction tx.removeLock(lock); // no longer tied to transaction Cheers Bryce On Tue, Sep 27, 2011 at 12:12 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: On Mon, Sep 26, 2011 at 2:28 AM, Bryce bryc...@gmail.com wrote: Is there any reason that access to the lock manager is a little difficult? Is there any issue with using it (both single server and HA)? Are their any issues to look out for? And does the above code look workable? The reason the LockManager is hidden away is because it is considered an implementation detail. We reserve the right to change it at any point without prior notice. From one version to another, the API of this class could potentially look completely different, or have very different semantics. If you write code that depend on implementation internal classes it will at best cause you compilation errors when upgrading to a later version of Neo4j, at worst break in subtle ways after you've deployed your application. There is another important thing to note about your sample code: On Mon, Sep 26, 2011 at 2:28 AM, Bryce bryc...@gmail.com wrote: private void acquireLock( LockType lockType ) { GraphDatabaseService graphDb = baseNode.getGraphDatabase(); if ( lockType == LockType.READ graphDb instanceof AbstractGraphDatabase ) { ((AbstractGraphDatabase) baseNode.getGraphDatabase()).getConfig().getLockManager().getReadLock( baseNode ); } else { // default to write lock if read locks unavailable
Re: [Neo4j] add property to all relationships
I do similar processing a lot, i.e. changes across the entire graph database (either relationships or nodes). Code along the lines of: public void process() { int count = 0; Transaction tx = null; for (Node node : graphDb.getAllNodes()) { if ((count % 1) == 0) { if (tx != null) { System.err.println(Processed: + count); tx.success(); tx.finish(); } tx = graphDb.beginTx(); } // do some processing } if (tx != null) { tx.success(); tx.finish(); } } will split processing of all nodes up into transaction batches of 10,000 changes. On Mon, Sep 26, 2011 at 11:37 PM, sometime dons...@gmail.com wrote: Can you please tell me where to read about transaction batches ? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/add-property-to-all-relationships-tp3368779p3368893.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Traversal performance
One initial suggestion would be that your memory mapped settings are probably not very near optimal. If you have a look at the file sizes in your graph data directory then the closer you can get to covering each db files entire size the better. I would assume that some of the files will be bigger than others and in fact you will probably find a few of them are very small, so you are wasting memory on them that you could assign to another memory mapping. So in one of mine I have: 5807428 neostore.nodestore.db 335536170 neostore.relationshipstore.db 398675470 neostore.propertystore.db 1208 neostore.propertystore.db.index 6906 neostore.propertystore.db.index.keys 1112428784 neostore.propertystore.db.strings 158 neostore.propertystore.db.arrays In which case there is no point in me assigning much if any memory to: neostore.propertystore.db.arrays.mapped_memory neostore.propertystore.db.index.keys.mapped_memory neostore.propertystore.db.index.mapped_memory The other thing to take into account is that the neostore.nodestore.db.mapped_memory and neostore.relationshipstore.db.mapped_memory settings have a lot more impact on traversal than the property story settings. The property store settings will help when you are reading properties from nodes or relationships. So if you can assign memory mapping settings for nodes and relationships to fit it all in memory map that would be good, otherwise still best to assign more to those, and definitely don't give the ones like arrays much memory (unless you are using them a lot). On Tue, Sep 27, 2011 at 12:52 PM, Rick rick.devin...@gmail.com wrote: Looking for help on how to tune traversals, this is a great product with the best API and I want to make sure Im getting the most from it. I'm trying to understand if 62,500 traversals per second is the best I can do given the following scenario: - 15.6M nodes - 15.6M relationships - Data is structured as shown below so that the root has 250 children, each of its children have 250 children, and each of their children have 250 children - If i get the entire list of children and grandchildren for a top node (max 3 levels deep), I get 62,500 nodes, and this takes about 800-1000ms - The server is a dual core quad 3.2ghz Xeon with 16gb ram - The neo4j.props settings are: neostore.nodestore.db.mapped_memory=1G neostore.relationshipstore.db.mapped_memory=1G neostore.propertystore.db.mapped_memory=1G neostore.propertystore.db.index.mapped_memory=1G neostore.propertystore.db.index.keys.mapped_memory=1G neostore.propertystore.db.strings.mapped_memory=1G neostore.propertystore.db.arrays.mapped_memory=1G - The code that does the traversal is Traverser trav = user.traverse( Order.BREADTH_FIRST, new StopEvaluator() { public boolean isStopNode(TraversalPosition pos) { return pos.depth() = 3; } }, new ReturnableEvaluator() { public boolean isReturnableNode(TraversalPosition pos) { return pos.depth() 3; } }, KNOWS, Direction.BOTH ); for ( Node node : trav ) { // Do something with node... i++; } Data example root node 0-0-0 node 0-0-1 node 0-0-2 ... node 0-1-0 node 0-1-1 node 0-1-2 ... -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371038.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Traversal performance
It wont make any difference if the memory mapping settings are just larger than the file sizes, or a lot larger therefore fiddling with those settings wont make any difference from your original test. Generally when people see very high performance it is because a lot of the data they are traversing over is already in memory, i.e. the caches are warmed. So is this test you are running just from a cold start, and if so can you try the test twice, within the same vm that is. On Tue, Sep 27, 2011 at 3:48 PM, Rick Devinsus rick.devin...@gmail.comwrote: I took a look at the files and none were larger than 500MB, however it makes a lot of sense to change the memory as you suggested so I altered the options as shown below. I also started eclipse with different memory options than the defaults (eclipse -vmargs -Xmx2000m -server). The changes didn't make it any faster though. I had read about people getting 2M traversals per second, since I'm only seeing around 65000/sec I'm starting to think that represented the number of nodes searched through not the number returned based on the traversal's criteria. neostore.nodestore.db.mapped_memory=1.5G neostore.relationshipstore.db.mapped_memory=1.5G neostore.propertystore.db.mapped_memory=1.5G neostore.propertystore.db.index.mapped_memory=1.5G neostore.propertystore.db.index.keys.mapped_memory=50M neostore.propertystore.db.strings.mapped_memory=50M neostore.propertystore.db.arrays.mapped_memory=50M my file sizes: neostore.relationshipstore.db 500MB neostore.propertystore.db 383MB neostore.nodestore.db 137MB (others are all less than 1MB) the largest lucene node is 367MB -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371379.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Clarification on read and write locks
Hi, I am just trying to clarify my understanding of read and write locks. There has been discussions around getting a write lock on a given node by removing a non-existent property from the node, which in turn triggers a nodeManager.acquireLock which calls lockManager.getWriteLock. The only other option I see is casting the GraphDatabaseService to an AbstractGraphDatabase then calling getConfig().getLockManager().getWriteLock( node ). I am guessing the main reason for the removing non-existent property hack is ease of use write locking? From the lock manager you can also get a read lock, which would come in handy, but since access to this is through AbstractGraphDatabase I would have to check the graph db instance to make sure I can cast it, then have if / else code handling the different cases, e.g.: private void acquireLock( LockType lockType ) { GraphDatabaseService graphDb = baseNode.getGraphDatabase(); if ( lockType == LockType.READ graphDb instanceof AbstractGraphDatabase ) { ((AbstractGraphDatabase) baseNode.getGraphDatabase()).getConfig().getLockManager().getReadLock( baseNode ); } else { // default to write lock if read locks unavailable baseNode.removeProperty( ___dummy_property_to_acquire_lock___ ); } } Is there any reason that access to the lock manager is a little difficult? Is there any issue with using it (both single server and HA)? Are their any issues to look out for? And does the above code look workable? Cheers Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Clarification on read and write locks
I am looking at this in relation to an graph collections class, UnrolledLinkedList. Since this is a library class that others might use it has to rely on the top level interface, or more correctly it has to be able to work off being passed a Node. Since Node has getGraphDatabase() returning a GraphDatabaseService that is what I have to work with. On Mon, Sep 26, 2011 at 2:44 PM, McKinley mckinley1...@gmail.com wrote: Bryce, The situation you are facing seems to only be a problem with choosing the type of the interface for your variable instead of the type that implements the getConfig method. For example: EmbeddedGraphDatabase graphDb = new EmbeddedGraphDatabase(data-dir); lockManager = graphDb.getConfig().getLockManager(); That will work straight without any type checks needed. If you have a need to only rely on GraphDatabaseService in other parts of your code to provide a common base for both EmbeddedGraphDatabase and something that is not EmbeddedGraphDatabase then you coud perhaps easier assign: GraphDatabaseService graphDbService = graphDb; Cheers, McKinley ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Unrolled Linked List
Thad, there is some good advice contained within those links, but it doesn't really fit the situation that I am discussing. The locks I am talking about are locks on data within the graph database, not java programmatic locks (though in a non-HA environment java locks could well be used as an optimisation). Think about this as row level locks in a database. The main difference is gaining a lock on a given node within neo4j means that the thread that has the lock will keep the lock for the entirety of the current transaction. I think there is no way to remove the lock earlier, as far as I am aware, it could be done theoretically if neo4j supported nested transactions (but presently it doesn't) -- again I am not 100% sure on this point. I have however realised I had missed the read locks (or at least forgotten about them) that neo4j lock manager offers. So most of what I was talking about in regards to pseudo read locks is solved by actual read locks :-) On Sat, Sep 24, 2011 at 2:33 PM, Thad Guidry thadgui...@gmail.com wrote: Non-blocking has a new meaning compared to the old ways: http://en.wikipedia.org/wiki/Non-blocking_algorithm So.. Try to Reduce Lock Contention, instead of getting rid of them entirely: http://www.thinkingparallel.com/2007/07/31/10-ways-to-reduce-lock-contention-in-threaded-programs/ But also do not forgot the ways of old, which are now buried deep in our most useful java libraries. http://java.sun.com/developer/technicalArticles/Programming/Performance/ On Fri, Sep 23, 2011 at 9:09 PM, Bryce bryc...@gmail.com wrote: As long as all access to these nodes is through the ULL class then the lock at the base node for writes, giving a datastructure wide lock, should make that threadsafe (i.e. only one writer can lock the base at a given time, therefore only one writer can be adjusting the item count at a given time). Though in this case the lock can be gained against the page node very easily just by changing the order of the code: Node page = checkSplitNode( getPage( node ), node ); page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); return page.createRelationshipTo( node, GraphCollection.RelationshipTypes.VALUE ); To: Node page = checkSplitNode( getPage( node ), node ); Relationship relationship = page.createRelationshipTo( node, GraphCollection.RelationshipTypes.VALUE ); page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); return relationship; Where from my understanding the creation of the relationship from the page node will lock that node therefore making the subsequent line thread safe even without the global base node lock. I could live easily with having to have a transaction for reading over the collection, as long as multiple readers could be in the collection at the same time. I am trying to think of a way of doing this, basically at present my thinking is along the lines of: 1. multiple read lock nodes are attached to the base node (say N) 2. when a thread accesses the collection for read access it locks one of these nodes (by removing non-existent property) 3. therefore, as long as each read thread locks a different node, N reading threads can be in at once 4. for a write to occur the writer first locks the base node, then locks each of the read lock nodes meaning that neither readers or writers can get their required lock The difficulty comes in point 3 above, how to have threads lock different nodes? To do this accurately you would need to be able to check whether someone else has a lock on a given node at a given time. A simple half working option would be having say 20 read lock nodes hanging off the base node, and randomly choose one. Would mean there is a good chance that multiple readers could run at once. On Sat, Sep 24, 2011 at 10:20 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Read integrity is really a dog. We haven't even begun to address that in the other collections. With regards to write locks ( and this is something we should check in sortedtree too ) is code like: page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); This is only threadsafe if the value returned by page.getProperty( ITEM_COUNT ) is read from a locked node. Niels Date: Sat, 24 Sep 2011 09:14:07 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Unrolled Linked List For writing that works well, and in fact for add node I am doing that (just realised I am not for remove node but I should be). The problems for reading are: - it should allow multiple threads to read at the same time - it shouldn't dictate that client code has a transaction in order to read As a simple solution thats probably workable (and probably
Re: [Neo4j] HyperRelationship example
Here you go: https://github.com/neo4j/graph-collections/wiki/HyperRelationship-example Though that page just has a link to: https://github.com/neo4j/graph-collections/wiki/Enhanced-API Bryce On Sat, Sep 24, 2011 at 5:00 PM, loldrup lold...@gmail.com wrote: Niels Hoogeveen wrote: I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example This link now gives 404. Does it have a new address? If so, what is it? Jon -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-HyperRelationship-example-tp3204449p3363779.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Unrolled Linked List
For writing that works well, and in fact for add node I am doing that (just realised I am not for remove node but I should be). The problems for reading are: - it should allow multiple threads to read at the same time - it shouldn't dictate that client code has a transaction in order to read As a simple solution thats probably workable (and probably the safest), and means that HA will just work, but restricting one thread at a time into a given node collection isn't the best. Maybe the client code should set whether it locks the data structure when reading, or fails with a ConcurrentModificationException when reading and data is changed. On Sat, Sep 24, 2011 at 6:00 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: A quick skim of the code shows me you have a baseNode which is an entrypoint for the ULL. This is a logical candidate node to use for the purpose of locking. What are the pros and cons to locking the baseNode on every read and write operation? Niels Date: Fri, 23 Sep 2011 09:39:38 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Unrolled Linked List Good stuff. I am presently looking into concurrent use of a given UnrolledLinkedList at least within the same graph database instance, might be a little bit harder in HA environment. Its hard enough writing test cases for this, maybe even harder than making it work properly! Hoping that some utility code I am going to produce will help with testing concurrency of other data structures. By concurrent use I mean concurrent use of the data within the graph, not of the given instantiation of the class, e.g. what happens when one thread gets an instance of ULL based off a given node and is iterating over it, then another thread gets an instance of a ULL and writes into it. Cheers Bryce On Fri, Sep 23, 2011 at 4:57 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: It looks really cool. I always find it fun to create something and later find out it is an already known construction (something worth inventing). Anyway, I pulled your code and will removed the dependencies to the Enhanced API stuff this week. After that we can start adding some documentation. Niels Date: Thu, 22 Sep 2011 15:57:13 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Unrolled Linked List Hi all, I have added an in graph representation of an unrolled linked list to the graph collections code, currently just in my githug repo: https://github.com/brycenz/graph-collections See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list And it works roughly in the same manner, though I had the idea prior to reading the wiki article. As the UnrolledLinkedList class implements the NodeCollection interface it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java The main reason for me being interested in this, and an example of where this is (probably) really useful is in the following case: - you have a number of tag (or category, folder etc.) nodes - they each link to a large number of document (or article, comments, post etc.) nodes - using a single relationship type - you generally only are interested in showing the newest documents in descending date order (showing the head, in a paged ui) - documents are generally added in ascending date order (added to the head) The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore aids with reduction in garbage collection. In my case I have a large number of tags with a large number of items against them, I might read the first 100-200 items out of a collection of 30,000 and therefore by not reading in the other 29800 relationships / nodes (per tag) I should be saving 90% or more. here's hoping. From the java doc: The structure is broken into pages of links to nodes where the size of the page can be controlled at initial construction time. Page size is not fixed but instead can float between a lower bound, and an upper bound. The bounds are at a fixed margin from the page size of M. When a page drops below the lower bound it will be joined onto
Re: [Neo4j] Unrolled Linked List
As long as all access to these nodes is through the ULL class then the lock at the base node for writes, giving a datastructure wide lock, should make that threadsafe (i.e. only one writer can lock the base at a given time, therefore only one writer can be adjusting the item count at a given time). Though in this case the lock can be gained against the page node very easily just by changing the order of the code: Node page = checkSplitNode( getPage( node ), node ); page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); return page.createRelationshipTo( node, GraphCollection.RelationshipTypes.VALUE ); To: Node page = checkSplitNode( getPage( node ), node ); Relationship relationship = page.createRelationshipTo( node, GraphCollection.RelationshipTypes.VALUE ); page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); return relationship; Where from my understanding the creation of the relationship from the page node will lock that node therefore making the subsequent line thread safe even without the global base node lock. I could live easily with having to have a transaction for reading over the collection, as long as multiple readers could be in the collection at the same time. I am trying to think of a way of doing this, basically at present my thinking is along the lines of: 1. multiple read lock nodes are attached to the base node (say N) 2. when a thread accesses the collection for read access it locks one of these nodes (by removing non-existent property) 3. therefore, as long as each read thread locks a different node, N reading threads can be in at once 4. for a write to occur the writer first locks the base node, then locks each of the read lock nodes meaning that neither readers or writers can get their required lock The difficulty comes in point 3 above, how to have threads lock different nodes? To do this accurately you would need to be able to check whether someone else has a lock on a given node at a given time. A simple half working option would be having say 20 read lock nodes hanging off the base node, and randomly choose one. Would mean there is a good chance that multiple readers could run at once. On Sat, Sep 24, 2011 at 10:20 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Read integrity is really a dog. We haven't even begun to address that in the other collections. With regards to write locks ( and this is something we should check in sortedtree too ) is code like: page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 ); This is only threadsafe if the value returned by page.getProperty( ITEM_COUNT ) is read from a locked node. Niels Date: Sat, 24 Sep 2011 09:14:07 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Unrolled Linked List For writing that works well, and in fact for add node I am doing that (just realised I am not for remove node but I should be). The problems for reading are: - it should allow multiple threads to read at the same time - it shouldn't dictate that client code has a transaction in order to read As a simple solution thats probably workable (and probably the safest), and means that HA will just work, but restricting one thread at a time into a given node collection isn't the best. Maybe the client code should set whether it locks the data structure when reading, or fails with a ConcurrentModificationException when reading and data is changed. On Sat, Sep 24, 2011 at 6:00 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: A quick skim of the code shows me you have a baseNode which is an entrypoint for the ULL. This is a logical candidate node to use for the purpose of locking. What are the pros and cons to locking the baseNode on every read and write operation? Niels Date: Fri, 23 Sep 2011 09:39:38 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Unrolled Linked List Good stuff. I am presently looking into concurrent use of a given UnrolledLinkedList at least within the same graph database instance, might be a little bit harder in HA environment. Its hard enough writing test cases for this, maybe even harder than making it work properly! Hoping that some utility code I am going to produce will help with testing concurrency of other data structures. By concurrent use I mean concurrent use of the data within the graph, not of the given instantiation of the class, e.g. what happens when one thread gets an instance of ULL based off a given node and is iterating over it, then another thread gets an instance of a ULL and writes into it. Cheers Bryce On Fri, Sep 23, 2011 at 4:57 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: It looks really cool. I always find it fun
Re: [Neo4j] Unrolled Linked List
Good stuff. I am presently looking into concurrent use of a given UnrolledLinkedList at least within the same graph database instance, might be a little bit harder in HA environment. Its hard enough writing test cases for this, maybe even harder than making it work properly! Hoping that some utility code I am going to produce will help with testing concurrency of other data structures. By concurrent use I mean concurrent use of the data within the graph, not of the given instantiation of the class, e.g. what happens when one thread gets an instance of ULL based off a given node and is iterating over it, then another thread gets an instance of a ULL and writes into it. Cheers Bryce On Fri, Sep 23, 2011 at 4:57 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: It looks really cool. I always find it fun to create something and later find out it is an already known construction (something worth inventing). Anyway, I pulled your code and will removed the dependencies to the Enhanced API stuff this week. After that we can start adding some documentation. Niels Date: Thu, 22 Sep 2011 15:57:13 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Unrolled Linked List Hi all, I have added an in graph representation of an unrolled linked list to the graph collections code, currently just in my githug repo: https://github.com/brycenz/graph-collections See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list And it works roughly in the same manner, though I had the idea prior to reading the wiki article. As the UnrolledLinkedList class implements the NodeCollection interface it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java The main reason for me being interested in this, and an example of where this is (probably) really useful is in the following case: - you have a number of tag (or category, folder etc.) nodes - they each link to a large number of document (or article, comments, post etc.) nodes - using a single relationship type - you generally only are interested in showing the newest documents in descending date order (showing the head, in a paged ui) - documents are generally added in ascending date order (added to the head) The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore aids with reduction in garbage collection. In my case I have a large number of tags with a large number of items against them, I might read the first 100-200 items out of a collection of 30,000 and therefore by not reading in the other 29800 relationships / nodes (per tag) I should be saving 90% or more. here's hoping. From the java doc: The structure is broken into pages of links to nodes where the size of the page can be controlled at initial construction time. Page size is not fixed but instead can float between a lower bound, and an upper bound. The bounds are at a fixed margin from the page size of M. When a page drops below the lower bound it will be joined onto the an adjacent page, and when the page goes above the upper bound it will be split in half. I am about to do some tests with this based on my use case and will report back on the performance impacts. Cheers Bryce P.S. still thinking about how to make this thread safe, any suggestions would be appreciated (presently only one thread will be able to write at a time, I am worried about a thread reading while another is writing, especially when it joins / splits pages or changes the head). ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Unrolled Linked List
Hi all, I have added an in graph representation of an unrolled linked list to the graph collections code, currently just in my githug repo: https://github.com/brycenz/graph-collections See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list And it works roughly in the same manner, though I had the idea prior to reading the wiki article. As the UnrolledLinkedList class implements the NodeCollection interface it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java The main reason for me being interested in this, and an example of where this is (probably) really useful is in the following case: - you have a number of tag (or category, folder etc.) nodes - they each link to a large number of document (or article, comments, post etc.) nodes - using a single relationship type - you generally only are interested in showing the newest documents in descending date order (showing the head, in a paged ui) - documents are generally added in ascending date order (added to the head) The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore aids with reduction in garbage collection. In my case I have a large number of tags with a large number of items against them, I might read the first 100-200 items out of a collection of 30,000 and therefore by not reading in the other 29800 relationships / nodes (per tag) I should be saving 90% or more. here's hoping. From the java doc: The structure is broken into pages of links to nodes where the size of the page can be controlled at initial construction time. Page size is not fixed but instead can float between a lower bound, and an upper bound. The bounds are at a fixed margin from the page size of M. When a page drops below the lower bound it will be joined onto the an adjacent page, and when the page goes above the upper bound it will be split in half. I am about to do some tests with this based on my use case and will report back on the performance impacts. Cheers Bryce P.S. still thinking about how to make this thread safe, any suggestions would be appreciated (presently only one thread will be able to write at a time, I am worried about a thread reading while another is writing, especially when it joins / splits pages or changes the head). ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface
Hi Niels, Probably is a good idea. I will try to get something done around that soon, flat out with work issues/features at present (including a nice concurrency bug, argh). Cheers Bryce On Wed, Sep 21, 2011 at 2:01 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Hi Bryce, Sorry for the late response. I understand it's difficult to come up with a really good use-case for making NodeCollection more general in the context of IndexedRelationships, but I like to think of that interface as something we can eventually use for all sorts of collections, not just the ones derived from SortedTree. There is of course the issue that relationships can not attach to relationships, so collections of relationships will need to be addressed by Id. This is not necessarily a bad thing, because it decouples the container and the elements. In other words the container knows what elements it contains, but the elements don't know in what containers they are placed. Another option would be to create shadow nodes for contained relationships. Instead of adding a relationships to the collection, its shadow node is added and both the shadow node and the relationship contain pointers (properties with Id values) towards each other. I think it would be best if we do indeed create a GraphCollection interface parameterized by T extends PropertyContainer even if that type parameter for now is always a Node. It doesn't add much complexity now to do it, and later on we may regret it and by then it becomes harder to do because there is an installed base. Niels Date: Sat, 17 Sep 2011 14:19:04 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface Hi Niels, I had wondered about having a collection interface that covered both nodes and relationships. There were a couple of reasons I didn't go with that right now, though well worthwhile discussing it and going with a GraphCollection super interface if it fits properly. Firstly I wanted to get something out there so people could have a look, and having something that matched what IndexedRelationship currently required was easiest first step. Biggest thing specific in there to that functionality is the addNode method returning a relationship. The other issue was more wondering how a relationship collection would work. Say I have a relationship collection, and I have a relationship R1 between node A and B, how am I going to represent that relationship withing some graph based data structure that makes sense. There could be a node X that is part of the relationship collection data structure (e.g. tree) and that node could have an attribute that has the relationship id on it, but that doesn't seem like it would be very performant. There could be a relationship between X and A that also gave the relationship type of R1, so you could find the relationship based on that, but there isn't any guarantee of the relationship type being unique. What it would need to properly model it is the ability to have a relationship between X and R1, i.e. a relationship from a node to a relationship. If instead of being able to add any given relationship to the relationship collection you instead restrict it to being relationships matching a certain criteria from a given node then it is practically the same thing as a relationship expander. Or if you instead have a way through the relationship collection to create relationships from a given node to a set of other arbitrary nodes, with the relationship collection having a fixed relationship type and direction, then that is practically the current IndexedRelationship. I guess a way it could work is similar to IndexedRelationship, basically more general case of that class, where you have a method on the relationship collection createRelationship(startNode, endNode, relationshipType, direction) that was then stored in an internal data structure to create a pseudo relationship between the start and end, and then being able to iterate over this set of relationships. Not sure exactly what the use case of that would be. Maybe of more interest could be the same situation where the relationship type and direction are fixed, then you may have a friend of set of relationships that you create between arbitrary nodes and then iterate over all of those. I can't personally think of a good way of adding a set of arbitrary relationships into a collection stored in a graph data structure. Thoughts? Cheers Bryce P.S. Peter, I had thought to remove the passing in of the graph database and instead just getting it from the node, or only passing in the graph database and creating the node internally. On Sat, Sep 17, 2011 at 2:19 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Hi Bryce, I really like what
Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface
Hi Niels, I had wondered about having a collection interface that covered both nodes and relationships. There were a couple of reasons I didn't go with that right now, though well worthwhile discussing it and going with a GraphCollection super interface if it fits properly. Firstly I wanted to get something out there so people could have a look, and having something that matched what IndexedRelationship currently required was easiest first step. Biggest thing specific in there to that functionality is the addNode method returning a relationship. The other issue was more wondering how a relationship collection would work. Say I have a relationship collection, and I have a relationship R1 between node A and B, how am I going to represent that relationship withing some graph based data structure that makes sense. There could be a node X that is part of the relationship collection data structure (e.g. tree) and that node could have an attribute that has the relationship id on it, but that doesn't seem like it would be very performant. There could be a relationship between X and A that also gave the relationship type of R1, so you could find the relationship based on that, but there isn't any guarantee of the relationship type being unique. What it would need to properly model it is the ability to have a relationship between X and R1, i.e. a relationship from a node to a relationship. If instead of being able to add any given relationship to the relationship collection you instead restrict it to being relationships matching a certain criteria from a given node then it is practically the same thing as a relationship expander. Or if you instead have a way through the relationship collection to create relationships from a given node to a set of other arbitrary nodes, with the relationship collection having a fixed relationship type and direction, then that is practically the current IndexedRelationship. I guess a way it could work is similar to IndexedRelationship, basically more general case of that class, where you have a method on the relationship collection createRelationship(startNode, endNode, relationshipType, direction) that was then stored in an internal data structure to create a pseudo relationship between the start and end, and then being able to iterate over this set of relationships. Not sure exactly what the use case of that would be. Maybe of more interest could be the same situation where the relationship type and direction are fixed, then you may have a friend of set of relationships that you create between arbitrary nodes and then iterate over all of those. I can't personally think of a good way of adding a set of arbitrary relationships into a collection stored in a graph data structure. Thoughts? Cheers Bryce P.S. Peter, I had thought to remove the passing in of the graph database and instead just getting it from the node, or only passing in the graph database and creating the node internally. On Sat, Sep 17, 2011 at 2:19 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Hi Bryce, I really like what you are trying to achieve here. One question: Instead of having NodeCollection, why not have GraphCollectionT extends PropertyContainer. That way we can have collections of both Relationships and Nodes. Niels Date: Fri, 16 Sep 2011 17:37:29 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Neo4j graph collections introduction of NodeCollection interface Hi, I had mentioned in a previous thread that I was working on introducing a NodeCollection interface to remove the dependency from IndexedRelationship to SortedTree. I have an initial cut of this up now in my github repo: https://github.com/brycenz/graph-collections It would be great to get community feedback on this as I think that having a well designed and common NodeCollection interface would help for multiple use cases, e.g. sortedTreeNodeCollection.addAll(linkedListNodeCollection) doing exactly what you think it would. IndexedRelationship now takes a node to index relationships from, a relationship type, and a direction, as well as a NodeCollection at creation time. As in the unit tests this then leads to: Node indexedNode = graphDb().createNode(); SortedTree st = new SortedTree( graphDb(), graphDb().createNode(), new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() ); IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, st ); To create the IndexedRelationship. To later add nodes to the relationship you need to create an instance of IndexedRelationship without the NodeCollection: IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING ); What this means from a NodeCollection implementation point of view is that firstly it needs to use the NodeCollection.RelationshipType.VALUE
[Neo4j] Neo4j graph collections introduction of NodeCollection interface
Hi, I had mentioned in a previous thread that I was working on introducing a NodeCollection interface to remove the dependency from IndexedRelationship to SortedTree. I have an initial cut of this up now in my github repo: https://github.com/brycenz/graph-collections It would be great to get community feedback on this as I think that having a well designed and common NodeCollection interface would help for multiple use cases, e.g. sortedTreeNodeCollection.addAll(linkedListNodeCollection) doing exactly what you think it would. IndexedRelationship now takes a node to index relationships from, a relationship type, and a direction, as well as a NodeCollection at creation time. As in the unit tests this then leads to: Node indexedNode = graphDb().createNode(); SortedTree st = new SortedTree( graphDb(), graphDb().createNode(), new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() ); IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, st ); To create the IndexedRelationship. To later add nodes to the relationship you need to create an instance of IndexedRelationship without the NodeCollection: IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING ); What this means from a NodeCollection implementation point of view is that firstly it needs to use the NodeCollection.RelationshipType.VALUE relationship to connect from its internal data structure to the nodes being added to the collection, and it needs to be able to recreate itself from a base node that is passed into a constructor (that only takes the base node). A node collection also needs to store its class name on the base node for later construction purposes, as well as any other data required to recreate the NodeCollection instance (in the case of SortedTree this is the comparator class, the tree name, and whether it is a unique index. Niels, you may want to have a good look over SortedTree, I have made a few changes to it, mainly around introduction of a base node, and changing of the end value relationships. This could be cleaned up better, but I wanted to start with minimal changes. Both IndexedRelationship and IndexedRelationshipExpander have no dependencies on SortedTree now, and should work with any properly implemented NodeCollection. I will be putting together a paged linked list NodeCollection next to try this. Some future thoughts for NodeCollection, addition of as many of the java.util.Collection methods (e.g. addAll, removeAll, retainAll, contains, containsAll) as well as an abstract base NodeCollection to help provide non-optimised support for these methods. Cheers Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neo4j Write Test Compile Error
Hi James, I initially gave this a go and it worked, but found I had cached version of the parent pom in my local maven repo. Removing this I got the same issue you had. You can get this working by replacing the parent pom reference in the pom.xml file with: parent groupIdorg.neo4j.build/groupId artifactIdparent-central/artifactId version25/version /parent Instead of the current: parent groupIdorg.neo4j/groupId artifactIdparent-pom/artifactId version6/version /parent It still wont work quite as is though. The build will fail on a license check, so do compile like: mvn -Dlicense.failIfMissing=false compile Should work. Cheers Bryce On Fri, Sep 9, 2011 at 2:12 PM, espeed ja...@jamesthornton.com wrote: Hi Guys - I'm trying to compile and run the write test (https://svn.neo4j.org/laboratory/users/johan/write-test) from http://wiki.neo4j.org/content/Linux_Performance_Guide, and I'm getting this error: https://gist.github.com/1205327 Is there a newer version of this? Thanks. - James -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Write-Test-Compile-Error-tp3321729p3321729.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Issues with IndexedRelationship
Hi, As I mentioned a while ago I am looking at using IndexedRelationship's within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through the the root end. So I have been working on getting that support in there. However in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I was right into the extra support for expanding the relationships it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships. Quick run down of the issues, note: N -- IR(X) -- {A,B} below means there is a indexed relationship from N to A B, of type X. 1) Exception thrown when more than one IR terminates at a given node, e.g.: N1 -- IR(X) -- {A,B,C,D} N2 -- IR(X) -- {A,X,Y,Z} Will throw an exception when using the IndexedRelationshipExpander on either N1, or N2. 2) Start / End nodes are transposed when the IR has an direction of incoming, i.e. the IR is created against N but across a set of incoming relationships: N -- IR(Y) -- {A,B,C} Will return 3 relationships N -- A, N -- B, N -- C. I have written tests for each of these, as well as a couple of other tests. Still completing (1) and have a little question about this. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree. So the question is adding this ID would mean that anyone who is already using this wont have the ID, and therefore without care will be data incompatible with the updated code. This could be managed via a check for the ID when accessing the tree and if it isn't there doing a walk over the tree to populate all the places where it is required. In general in developing against this code where do we sit on data compatibility and API compatibility? Cheers Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Issues with IndexedRelationship
Hi Niels, Sorry I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship: this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING ); So if the end node has more than one incoming KEY_VALUE relationship a more than one relationship exception is thrown. Instead of the getSingleRelationship I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type). Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that might need . Is storing the UUID as two longs much quicker than storing it as a string? Curious about this since in my current model I have all the domain objects with UUID's, and these are all stored as strings. If it was going to help with either memory or performance then I would be keen to migrate this to two longs. Cheers Bryce On Thu, Sep 8, 2011 at 11:07 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Great work Bryce, I do have a question though. What is the rationale for the restriction mentioned under 1). Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option. A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of of the two long values. IndexRelationships are a relatively new development, so I think you are one of the first to actually try it out. Personally I have chosen to directly work with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods. I don't think API changes will be an issue at the moment. Niels Date: Thu, 8 Sep 2011 10:22:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Issues with IndexedRelationship Hi, As I mentioned a while ago I am looking at using IndexedRelationship's within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through the the root end. So I have been working on getting that support in there. However in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I was right into the extra support for expanding the relationships it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships. Quick run down of the issues, note: N -- IR(X) -- {A,B} below means there is a indexed relationship from N to A B, of type X. 1) Exception thrown when more than one IR terminates at a given node, e.g.: N1 -- IR(X) -- {A,B,C,D} N2 -- IR(X) -- {A,X,Y,Z} Will throw an exception when using the IndexedRelationshipExpander on either N1, or N2. 2) Start / End nodes are transposed when the IR has an direction of incoming, i.e. the IR is created against N but across a set of incoming relationships: N -- IR(Y) -- {A,B,C} Will return 3 relationships N -- A, N -- B, N -- C. I have written tests for each of these, as well as a couple of other tests. Still completing (1) and have a little question about this. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree. So the question is adding this ID would mean that anyone who is already using this wont have the ID, and therefore without care will be data incompatible with the updated code. This could be managed via a check for the ID when accessing the tree and if it isn't there doing a walk over the tree to populate all the places where it is required. In general in developing against this code where do we sit on data compatibility and API compatibility
Re: [Neo4j] Issues with IndexedRelationship
Will have to experiment with changing my id's to be stored as longs, it does make perfect sense really that it would be better. Thanks for the hint. In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. Could end up being a significant change to SortedTree though, e.g.: sortedTree.addNode( node ); Would need to return the KEY_VALUE relationship instead of a boolean. Which not knowing where else SortedTree is used could be a large change? Maybe SortedTree would have two iterator's available a key_value relationship iterator, and a node iterator. Having a quick look at it now it seems that it could work ok that way without introducing much code duplication. On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Two longs is certainly cheaper than a string. Two longs take 128 bit and are stored in the main record of the PropertyContainer, while a String would require a 64 bit pointer in the main record of the PropertyContainer, and an additional read in the String store where the string representation will take up 256 bits. So both memory-wise, as perfomance wise, it is better to store a UUID as two long values. The main issue is something that needs a deeper fix than adding ID's. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free. Niels Date: Thu, 8 Sep 2011 11:36:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Hi Niels, Sorry I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship: this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING ); So if the end node has more than one incoming KEY_VALUE relationship a more than one relationship exception is thrown. Instead of the getSingleRelationship I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type). Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that might need . Is storing the UUID as two longs much quicker than storing it as a string? Curious about this since in my current model I have all the domain objects with UUID's, and these are all stored as strings. If it was going to help with either memory or performance then I would be keen to migrate this to two longs. Cheers Bryce On Thu, Sep 8, 2011 at 11:07 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Great work Bryce, I do have a question though. What is the rationale for the restriction mentioned under 1). Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option. A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of of the two long values. IndexRelationships are a relatively new development, so I think you are one of the first to actually try it out. Personally I have chosen to directly work with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods. I don't think API changes will be an issue at the moment. Niels Date: Thu, 8 Sep 2011 10:22:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Issues with IndexedRelationship Hi, As I mentioned a while ago I am looking at using IndexedRelationship's within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through the the root end. So I have been working on getting that support in there. However in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I
Re: [Neo4j] Issues with IndexedRelationship
this to two longs. Cheers Bryce On Thu, Sep 8, 2011 at 11:07 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Great work Bryce, I do have a question though. What is the rationale for the restriction mentioned under 1). Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option. A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of of the two long values. IndexRelationships are a relatively new development, so I think you are one of the first to actually try it out. Personally I have chosen to directly work with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods. I don't think API changes will be an issue at the moment. Niels Date: Thu, 8 Sep 2011 10:22:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Issues with IndexedRelationship Hi, As I mentioned a while ago I am looking at using IndexedRelationship's within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through the the root end. So I have been working on getting that support in there. However in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I was right into the extra support for expanding the relationships it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships. Quick run down of the issues, note: N -- IR(X) -- {A,B} below means there is a indexed relationship from N to A B, of type X. 1) Exception thrown when more than one IR terminates at a given node, e.g.: N1 -- IR(X) -- {A,B,C,D} N2 -- IR(X) -- {A,X,Y,Z} Will throw an exception when using the IndexedRelationshipExpander on either N1, or N2. 2) Start / End nodes are transposed when the IR has an direction of incoming, i.e. the IR is created against N but across a set of incoming relationships: N -- IR(Y) -- {A,B,C} Will return 3 relationships N -- A, N -- B, N -- C. I have written tests for each of these, as well as a couple of other tests. Still completing (1) and have a little question about this. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree. So the question is adding this ID would mean that anyone who is already using this wont have the ID, and therefore without care will be data incompatible with the updated code. This could be managed via a check for the ID when accessing the tree and if it isn't there doing a walk over the tree to populate all the places where it is required. In general in developing against this code where do we sit on data compatibility and API compatibility? Cheers Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Issues with IndexedRelationship
I have made the changes in regards to SortedTree in regards to relationships vs nodes, and have got all the tests passing. The changes are pushed up to my github account (and pull request has been raised). The changes can be seen here: https://github.com/brycenz/graph-collections On Thu, Sep 8, 2011 at 3:41 PM, Bryce bryc...@gmail.com wrote: Another thought if there is going to be a larger refactor of the code is whether the indexing mechanism should be broken out as a strategy for the IndexedRelationship. At present it is tied to SortedTree, but if an interface was extracted out that had addNode, removeNode, iterator, and isUniqueIndex then other indexing implementations could be used in certain cases. The particular other implementation I am currently thinking of that could be of use to me would be a paged linked list. So that would have a linked list of pages, each with min x max KEY_VALUE (or equivalent) relationships. I think that could work quite well for the situation where the index is descending date ordered, and generally just appended at the most recent end, and results are retrieved in a paged manner generally from near the most recent. But more to the point there could be any number of implementations that would be good for given different situations. That does bring up a question though, there was some discussion a while ago about some functionality along the lines of IndexedRelationship being pulled into the core, so is that overkill for now if there is going to be another core offering later? On Thu, Sep 8, 2011 at 2:38 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I think we don't have to worry about backwards compatibility much yet. There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers. Indeed addNode should return the KEY_VALUE relationship and I think we should change the signature of SortedTree to turn it into IterableRelationship. No need to maintain a Node iterator, the node is always one getEndNode away. Niels Date: Thu, 8 Sep 2011 14:17:59 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Will have to experiment with changing my id's to be stored as longs, it does make perfect sense really that it would be better. Thanks for the hint. In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. Could end up being a significant change to SortedTree though, e.g.: sortedTree.addNode( node ); Would need to return the KEY_VALUE relationship instead of a boolean. Which not knowing where else SortedTree is used could be a large change? Maybe SortedTree would have two iterator's available a key_value relationship iterator, and a node iterator. Having a quick look at it now it seems that it could work ok that way without introducing much code duplication. On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Two longs is certainly cheaper than a string. Two longs take 128 bit and are stored in the main record of the PropertyContainer, while a String would require a 64 bit pointer in the main record of the PropertyContainer, and an additional read in the String store where the string representation will take up 256 bits. So both memory-wise, as perfomance wise, it is better to store a UUID as two long values. The main issue is something that needs a deeper fix than adding ID's. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free. Niels Date: Thu, 8 Sep 2011 11:36:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Hi Niels, Sorry I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship: this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING ); So if the end node has more than one incoming KEY_VALUE relationship a more than one relationship exception is thrown. Instead of the getSingleRelationship I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type). Note: there will actually still be an issue if the same IR has multiple relationships to the same
Re: [Neo4j] IndexedRelationship some observations and questions
Hi Niels, Thanks for your feedback below. Specifically on the anonymous inner class issue in general they can be used as a Comparator without a problem, and in this case it worked perfectly for creating the IndexedRelationship. The issue was when then trying to instantiate the anonymous inner class in IndexedRelationshipExpander which was throwing a ClassNotFoundException (I think, from memory). I also think non-static inner classes also wouldn't work as it is coded right now, since they need a reference to the outer class, and personally supporting them may be rather difficult. Even if anonymous inner classes could be supported, is it really a good idea to store as a comparator XYZClass$5 when adding another anonymous inner class to the XYZClass would increment the number and mean the index will no longer work. Given what you have said about how the tree is represented it should be relatively easy to have IndexedRelationshipExpander support following relationships from the leaf nodes through to the root (as well as what it presently does: root - leaf direct relationships). I might have a look at doing that later on today. Cheers Bryce On Tue, Sep 6, 2011 at 6:58 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Hi Bryce, Sorry for my belated response. I have been away for a couple of days and wasn't able to check my emails. I am glad you took the time to look into the IndexRelationship module. It certainly could use some scrutiny. Remarks: 1) Good catch... Something the unit test didn't catch because it runs in the same namespace as IndexedRelationship itself. Didn't catch it in user code either, because personally I prefer to directly call SortedTree. 2) Agreed. It should be possible to define more than one IndexRelationship per node. 3) I haven't tried out an anonymous inner class as Comparator. As far as I can tell any object implementing ComparatorNode should be able to work as a comparator. Questions: 1) That is certainly an option. IndexRelationships however offer you the possibility to sort your Relationships based on some value associated with a node (for example creation/edit date of the document). This may be a reason to use IndexRelationships even in the situation where you have less than 500 entries per tag (though it would be possible to do that sorting in memory too). 2) The end node of an IndexRelationship is always referred to by a Relationship with RelationshipType KEY_VALUE, and has a property tree_name (both are defined in SortedTree). The tree_name property has the same value as the RelationshipType.name used in IndexRelationship. To traverse from a leaf node to the tree root, keep following the incoming relationships: KEY_VALUE (there is only one), KEY_ENTRY (there can be many), SUB_TREE (there can be many), TREE_ROOT (there is only one) Niels Date: Fri, 2 Sep 2011 11:44:40 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] IndexedRelationship some observations and questions Hi, I have been looking at performance options for Neo4j as presently I have been observing a number of performance issues. I am still investigating the way to get the best performance out of what I am doing, and one thing it might be are longer running transactions stopping other work going on (but thats an aside to what this message is about). One of the things that I investigated using was the IndexedRelationship work by Niels. Thought I would give a bit of feedback, although I haven't quite got this implemented at present. 1) I had to change the IndexedRelationshipExpander to be a public class in order to use it outside the package its in. 2) IndexedRelationship assumes only one tree root per node, whereas the expander allows for multiple (IndexedRelationship uses getSingleRelationship vs expander using getRelationships then matching on tree name). Having multiple would obviously be good as it means you could have two types of relationships covered by IndexedRelationship's. 3) Might pay to make it clear in the Javadocs for IndexedRelationship that the comparator can't be an anonymous inner class. Then I have some questions about usage of this. First a little background of the model I have, from reading a few things it seems quite standard. There are a lot of document nodes each of which have a relationship with multiple tag nodes. Documents generally have in the order of 10-20 tags, and tags can have as few as 1 document and sometimes tens of thousands. When tags are viewed through the UI they are almost always displayed with a descending date ordered list of documents. Seemed to be to fit quite well with IndexedRelationship. 1) I was thinking of having a switch over point at say around 500 documents for a given node where I will switch from using normal relationships to an IndexedRelationship as I was thinking at small numbers
[Neo4j] IndexedRelationship some observations and questions
Hi, I have been looking at performance options for Neo4j as presently I have been observing a number of performance issues. I am still investigating the way to get the best performance out of what I am doing, and one thing it might be are longer running transactions stopping other work going on (but thats an aside to what this message is about). One of the things that I investigated using was the IndexedRelationship work by Niels. Thought I would give a bit of feedback, although I haven't quite got this implemented at present. 1) I had to change the IndexedRelationshipExpander to be a public class in order to use it outside the package its in. 2) IndexedRelationship assumes only one tree root per node, whereas the expander allows for multiple (IndexedRelationship uses getSingleRelationship vs expander using getRelationships then matching on tree name). Having multiple would obviously be good as it means you could have two types of relationships covered by IndexedRelationship's. 3) Might pay to make it clear in the Javadocs for IndexedRelationship that the comparator can't be an anonymous inner class. Then I have some questions about usage of this. First a little background of the model I have, from reading a few things it seems quite standard. There are a lot of document nodes each of which have a relationship with multiple tag nodes. Documents generally have in the order of 10-20 tags, and tags can have as few as 1 document and sometimes tens of thousands. When tags are viewed through the UI they are almost always displayed with a descending date ordered list of documents. Seemed to be to fit quite well with IndexedRelationship. 1) I was thinking of having a switch over point at say around 500 documents for a given node where I will switch from using normal relationships to an IndexedRelationship as I was thinking at small numbers of relationships normal relationships would be quicker. Would that be correct, or not worth it? 2) On the tag end (which is the incoming end of the document-tag relationship) I was going to use a IndexedRelationshipExpander which would cover the case of whether the relationship was done through normal relationships, or through an IndexedRelationship. I also need to get a set of tags from the document end where their may be both normal relationships, and relationships coming from multiple IndexedRelationship's. From looking at it IndexedRelationshipExpander doesn't cover the reverse direction, but I would imagine using a relationship expander here would be correct. What would the best way of doing this be? As an aside it may be a good idea to note in the configuration settings page: http://wiki.neo4j.org/content/Configuration_Settings#Optimizing_for_traversals_example that -XX:+UseNUMA only works when using the Parallel Scavenger garbage collector (default or -XX:+UseParallelGC) not the concurrent mark and sweep one. Based on Cheers Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Invitation to connect on LinkedIn
LinkedIn bryce hendrix requested to add you as a connection on LinkedIn: -- Craig, I'd like to add you to my professional network on LinkedIn. - bryce Accept invitation from bryce hendrix http://www.linkedin.com/e/5gyj7a-gqzdnnl0-p/h9LPQ_TdyUOQHKzIpND15vYO56OQOUsn/blk/I181440225_9/pmpxnSRJrSdvj4R5fnhv9ClRsDgZp6lQs6lzoQ5AomZIpn8_elYRcz8Md3gNe359bQR9pQ9Tt6NBbPoRdPcPdz0TdPcLrCBxbOYWrSlI/EML_comm_afe/ View invitation from bryce hendrix http://www.linkedin.com/e/5gyj7a-gqzdnnl0-p/h9LPQ_TdyUOQHKzIpND15vYO56OQOUsn/blk/I181440225_9/0VnPkOcz0Qd34UckALqnpPbOYWrSlI/svi/ -- DID YOU KNOW that LinkedIn can find the answers to your most difficult questions? Post those vexing questions on LinkedIn Answers to tap into the knowledge of the world's foremost business experts: http://www.linkedin.com/e/5gyj7a-gqzdnnl0-p/ask/inv-23/ -- (c) 2011, LinkedIn Corporation ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Invitation to connect on LinkedIn
LinkedIn bryce hendrix requested to add you as a connection on LinkedIn: -- Craig, I'd like to add you to my professional network on LinkedIn. - bryce Accept invitation from bryce hendrix http://www.linkedin.com/e/5gyj7a-gr021ccv-5u/h9LPQ_TdyUOQHKzIpND15vYO56OQOUsn/blk/I181685058_9/pmpxnSRJrSdvj4R5fnhv9ClRsDgZp6lQs6lzoQ5AomZIpn8_elYUdj0Re3oNe359bQR9pQ9Tt6NBbPoRdPcPdz0TdPcLrCBxbOYWrSlI/EML_comm_afe/ View invitation from bryce hendrix http://www.linkedin.com/e/5gyj7a-gr021ccv-5u/h9LPQ_TdyUOQHKzIpND15vYO56OQOUsn/blk/I181685058_9/0VnPwRc3kUdz4UckALqnpPbOYWrSlI/svi/ -- DID YOU KNOW that LinkedIn can find the answers to your most difficult questions? Post those vexing questions on LinkedIn Answers to tap into the knowledge of the world's foremost business experts: http://www.linkedin.com/e/5gyj7a-gr021ccv-5u/ask/inv-23/ -- (c) 2011, LinkedIn Corporation ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] path finding using OSM ways
I am finally getting back to experimenting with Neo4j. Because it has been a while since I last looked at it, I've forgotten just about everything. I want to start with something simple, is there any sample code which does A* path finding over OSM ways? Thanks, Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Getting started with Neo4J Spatial
Nolan, The first experience with Neo4j Spatial was with the texas.osm file. I imported it on my notebook I think it took 15 hours, if I remember correctly. I quickly decided to play around with just Austin for the time being. If you'd like, i can zip up my Austin file (just Austin, not any of the suburbs) and send it to you. Austin takes about 30 secs to import and index. Bryce On Fri, Feb 18, 2011 at 2:41 PM, Nolan Darilek no...@thewordnerd.infowrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 02/18/2011 11:23 AM, Craig Taverner wrote: I will answer inline below. Thanks, I'll do the same. Neo4j Spatial is a good choice for OSM data, and can load OSM files and expose them to GIS application for analysis. Are you planning to run this on a mobile device? The question of whether or not, or how well, Neo4j runs on embedded devices would need to be answered by others. I only use it (so far) on PCs. I know there was an android prototype a few years back, but I do not believe android (or other platforms) are officially supported. If you have a backend server for neo4j (which it sounds like you do), then there is no problem. Right, currently my project is client-server, with an Android app or web browser communicating with a web-based server. I've made a rough cut at a database abstraction so I can move away from LibOSM to Neo4J and, presumably, to other database types in the future should I find something that works well on mobile devices. At the moment, that's almost certainly not Neo4J Spatial. It's been chugging away importing Texas on my modest desktop for several hours, and the file is nearly at 30 gigs. Considering that this is much larger than either H2 or MongoDB's storage of the same data, I'm hoping there are plans to optimize this in the future. :) I saw something about small strings here recently, and how this might decrease storage requirements by 80%. Is that just for strings, or across the entire database? For reference, the uncompressed OSM data is only 5G so a 6X increase worries me. Anyhow, information on what I'm doing can be found at my project's development site, http://hermesgps.info. There's also an instance which may or may not be running. A live demo of an actual map can sometimes be seen at http://thewordnerd.homeip.net/#30.267153,-97.74306 though performance is pretty bad right now since my desktop is taxed with the Neo4J import. I'll refer back to that later on. what you really want. I think I would need to know more about your needs before recommending further. So at the above link, you see a list of nearby points, along with a direction and distance. The same information is also exported via web services to mobile apps in a scrollable list. What I'd hoped to do was to create an infinitely-scrolling list as commonly found in other Android apps. Unfortunately, it's hard to do that by simply bumping out the distance, since you can't necessarily know whether adding X to the bounding box dimensions might give you no points or another thousand. :) Since MongoDB can do this with a rather limited geospatial implementation, I was hoping that Neo4J or something tuned for spatial queries would *really* rock at it. Certainly Neo4j can import OSM data into a decent graph. The OSM model in Neo4j is somewhat similar to the graph model in the OSM xml files, but expressed in a more graph-friendly way. The only public information on the model so far is in the presentation I gave at FOSS4J. See these at slideshare: http://www.slideshare.net/craigtaverner/neo4j-spatial-backing-a-gis-with-a-true-graph-database (slides 13, 14 and 15 show graphs of the OSM model in the database). I'd appreciate a description of those slides if you or someone else has the time, as I can't see them. :) In any case, I'll try playing with the database myself once this import finishes and see what I can learn that way. If you want to do this in Java, take a look at the sample code at https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java I'm doing this in Scala, so thanks for pointing me to the Java example. I've been seeing lots of old testcases and such so it was tough to find ones that were still valid. So I wrote a simple Scala script that is currently importing OSM data, and this raises a few questions. The README shows a shapefile being imported without a BatchInserter. Is there any way to do this with OSM as well? One feature I'd like to provide at some point are automatic map updates, so each week an OSM changeset would be fetched and merged in. It would be great if I could do that merge without having to shut down the live database, and to handle that changeset merge as a single transaction. I gather changeset imports aren't yet possible, but is there any way to forgo the BatchInserter? Even if the process is slower
Re: [Neo4j] Searching for ways in imported OSM data
Peter, I think my problem is more of understanding how the graph is built from OSM data. The method of find the closest edge gives me the way geometry, but (please correct me if I am wrong) I need to find the closest OSM node (Node with lat and lon properties) in the geometry. Thanks, Bryce On Tue, Jan 18, 2011 at 2:05 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Bryce, are you looking for some kind of snapping to the nearest geometry? I was doing something like that, Craig did a test on that, see https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSpatialUtils.java#L53 , there is some basic support for that in SpatialTopologyUtils.java. Does that help to get started? Feel free to add more utilities and maybe a routing example, when you get it working. Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jan 18, 2011 at 5:50 AM, bryce hendrix brycehend...@gmail.com wrote: Craig, I've ran into a bit of a stumbling block. I am attempting to do find a simple route using A* from 2 nodes on 2 different ways. What is the best way to find the points closest to 2 reference points which are on ways? Assuming I've got those nodes, is there anything special I have to do, other than setting up the estimate and cost evaluators? If I can figure this out, I'll submit my example back to you guys. I'm excited about this stuff, but I seem to be discovering the limits of the docs. Bryce On Sun, Jan 16, 2011 at 7:11 PM, Craig Taverner cr...@amanzi.com wrote: Great that they all work :-) Good luck with the rest of the project and keep us posted, we're interested in any feedback on the API. (getting lat/long mixed up is one of those things we all keep doing, I'm pretty sure I did it once as recently as last month ... ;-) On Sun, Jan 16, 2011 at 11:52 PM, bryce hendrix brycehend...@gmail.com wrote: Craig, Peter, Its useful if I get the latitude and longitude in the correct order for the Point. Ugh. I've found that SearchPointsWithinOrthodromicDistance, SearchClosest, and SpatialTopologyUtils.findClosestEdges all work for me. Looks like my project is well on its way now, thanks for the help. Bryce On Sun, Jan 16, 2011 at 3:16 PM, Craig Taverner cr...@amanzi.com wrote: The SearchPointsWithinOrthodromicDistance basically does a search on a rectangular bounding box, and then inside the result set filters by distance from the center. The filter probably works only on points as implied by the class name. The SpatialTopologyUtils class has a method findClosestEdge, which will do what you are looking for. If you call it without a distance value, it will take 1% of the total span of your layer as the search window, so if this does not make sense for your data (eg. your layer covers a small area, as you hinted at), then pass in the distance in units of the coordinate system of the layer (probably WGS84, degrees, if you are using only OSM data). Try it out and let us know. See: - https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java - https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSpatialUtils.java On Sun, Jan 16, 2011 at 11:19 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Bryce, I think (Craig, correct me if I'm wrong) you need to have a Point layer to be able to do that search. The default OSM layer is containing a lot of geometries, so I think you first should define a layer on top of the full imported one, then search. I did something like that in another spike, see https://github.com/popdevelop/snapplr/blob/master/server_java/src/main/java/com/geosnappr/TaginfoImporter.java#L312 The layer is defined with something like https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L26 on top of the imported full data layer. Does that help? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party
Re: [Neo4j] Searching for ways in imported OSM data
Thanks Craig! I'll not only play around with it tonight, but you nudged me towards finally installing uDig. Bryce On Tue, Jan 18, 2011 at 6:45 PM, Craig Taverner cr...@amanzi.com wrote: Hi Bryce, While I had this on my mind, I decided to look into the code and I went ahead developed the first approach. So now we index osm nodes in the r-tree also (in addition to ways). By default, only nodes that have tags are added to the index, because these are seen as 'points of interest', while all other nodes are assumed to only be interesting as part of a way. However, I added a new method to the OSMImporter that allows you to turn on indexing of all nodes, regardless of whether they are points of interest or not. Just replace the old call to osmImporter.importFile(batchInserter,filePath) with instead osmImporter.importFile(batchInserter,filePath,true). The last parameter turns on indexing of all nodes. I've attached an image from the AWE product (based on uDig) that shows a large number of dynamic layers exported from the OSM model of Malmö, and you can see all nodes are visible. This is the output from the TestDynamicLayers unit test, run with the boolean set to true. If you are curious, the Neo Technology office is actually included in this image :-) I have just pushed the code to github, so you need to get the latest from there. Regards, Craig On Tue, Jan 18, 2011 at 9:27 PM, Craig Taverner cr...@amanzi.com wrote: Hi Bryce, Currently the OSM model in Neo4j Spatial only indexes the ways, which means that when you do spatial searches on the layers, you are always talking about the ways. Whether this suites you or not depends on your specific use case. You could search for the closest ways, and the use the SpatialTopologyUtils to snap to the point node on the way that is closest to your point of interest, which may give you the result you want. If you are specifically interested in only searching on nodes, not ways, then there are two options, both requiring some refactoring of the OSM model (ie. you need to make changes to Neo4j Spatial). - In the OSMImporter code, simply add all nodes as Point geometries to the index. This is simple to code and will get what you want, but might not lead to the most efficient index, since both the ways and the nodes they are composed of get indexed in the same index. - Enhance the OSM model to support multiple indexes, for nodes, ways and relationships. This requires much more code, but opens the possibility to enhance the performance by allowing the index to be optimized for the geometries and densities seen in the dataset. The reasons we have not done this yet are: - Existing use cases have not required it (but we are reaching cases that do now) - It is not clear yet which of the above options is the best overall (one index or several). Originally I believed in the multiple-index approach, mostly because points can be indexed with very different indexes than geometries (eg. we can even place lucene behind the point index, which we cannot do for geometries). But at the moment I am leaning back towards the single index approach (simpler code). So, I hope the approach of searching for close ways, and then using the topology utils to snap to the closest point is the right solution for you. Otherwise considering contributing to the updates necessary to complete the point index :-) Regards, Craig On Tue, Jan 18, 2011 at 6:08 PM, bryce hendrix brycehend...@gmail.com wrote: Peter, I think my problem is more of understanding how the graph is built from OSM data. The method of find the closest edge gives me the way geometry, but (please correct me if I am wrong) I need to find the closest OSM node (Node with lat and lon properties) in the geometry. Thanks, Bryce On Tue, Jan 18, 2011 at 2:05 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Bryce, are you looking for some kind of snapping to the nearest geometry? I was doing something like that, Craig did a test on that, see https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSpatialUtils.java#L53 , there is some basic support for that in SpatialTopologyUtils.java. Does that help to get started? Feel free to add more utilities and maybe a routing example, when you get it working. Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jan 18, 2011 at 5:50 AM, bryce hendrix brycehend...@gmail.com
Re: [Neo4j] Searching for ways in imported OSM data
Craig, I've ran into a bit of a stumbling block. I am attempting to do find a simple route using A* from 2 nodes on 2 different ways. What is the best way to find the points closest to 2 reference points which are on ways? Assuming I've got those nodes, is there anything special I have to do, other than setting up the estimate and cost evaluators? If I can figure this out, I'll submit my example back to you guys. I'm excited about this stuff, but I seem to be discovering the limits of the docs. Bryce On Sun, Jan 16, 2011 at 7:11 PM, Craig Taverner cr...@amanzi.com wrote: Great that they all work :-) Good luck with the rest of the project and keep us posted, we're interested in any feedback on the API. (getting lat/long mixed up is one of those things we all keep doing, I'm pretty sure I did it once as recently as last month ... ;-) On Sun, Jan 16, 2011 at 11:52 PM, bryce hendrix brycehend...@gmail.com wrote: Craig, Peter, Its useful if I get the latitude and longitude in the correct order for the Point. Ugh. I've found that SearchPointsWithinOrthodromicDistance, SearchClosest, and SpatialTopologyUtils.findClosestEdges all work for me. Looks like my project is well on its way now, thanks for the help. Bryce On Sun, Jan 16, 2011 at 3:16 PM, Craig Taverner cr...@amanzi.com wrote: The SearchPointsWithinOrthodromicDistance basically does a search on a rectangular bounding box, and then inside the result set filters by distance from the center. The filter probably works only on points as implied by the class name. The SpatialTopologyUtils class has a method findClosestEdge, which will do what you are looking for. If you call it without a distance value, it will take 1% of the total span of your layer as the search window, so if this does not make sense for your data (eg. your layer covers a small area, as you hinted at), then pass in the distance in units of the coordinate system of the layer (probably WGS84, degrees, if you are using only OSM data). Try it out and let us know. See: - https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java - https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSpatialUtils.java On Sun, Jan 16, 2011 at 11:19 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Bryce, I think (Craig, correct me if I'm wrong) you need to have a Point layer to be able to do that search. The default OSM layer is containing a lot of geometries, so I think you first should define a layer on top of the full imported one, then search. I did something like that in another spike, see https://github.com/popdevelop/snapplr/blob/master/server_java/src/main/java/com/geosnappr/TaginfoImporter.java#L312 The layer is defined with something like https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L26 on top of the imported full data layer. Does that help? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sat, Jan 15, 2011 at 10:58 PM, bryce hendrix brycehend...@gmail.com wrote: I'm pretty new to neo4j, so please excuse me if this is a FAQ. I exported OSM data for a city from the OSM site, then imported it using the OSMImporter. I can see the layer via the webserver, so I know if got imported okay. Now I would like to find the way nearest to a coordinate via the Java API, but I'm not really sure how to do that. I've tried using SearchPointsWithinOrthodromicDistance, but the results of the query are always empty. Can someone give me some tips, or provide a simple example? Thanks in advance, Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https
[Neo4j] Searching for ways in imported OSM data
I'm pretty new to neo4j, so please excuse me if this is a FAQ. I exported OSM data for a city from the OSM site, then imported it using the OSMImporter. I can see the layer via the webserver, so I know if got imported okay. Now I would like to find the way nearest to a coordinate via the Java API, but I'm not really sure how to do that. I've tried using SearchPointsWithinOrthodromicDistance, but the results of the query are always empty. Can someone give me some tips, or provide a simple example? Thanks in advance, Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user