Re: [Neo4j] Neo4jPHP
Thanks! I'll give it a try soon.

On Thu, Jul 7, 2011 at 1:50 AM, Josh Adell josh.ad...@gmail.com wrote:

Hey all,

I've been working on another PHP client for Neo4j. I think it's ready for some real-life testing, and I'm interested to see what you all think.

GitHub: https://github.com/jadell/Neo4jPHP
Download: https://github.com/jadell/Neo4jPHP/tarball/0.0.1-beta

Features:
- Developed against the Neo4j 1.4 milestone releases
- Simple, object-oriented API
- Almost complete REST API coverage
- Indexing of nodes and relationships, including exact match and query support
- Cypher queries (thanks to Jacob Hansson)
- Traversal support, including paged traversals
- Lazy-loading of node and relationship data

Hopefully coming soon:
- Client-side caching
- Batch operations

There are some usage examples included. It's a beta release, so please be gentle (on me, that is; be as rough as you want with the code.) If anyone finds any bugs or has feature requests, please use the GitHub issues page at https://github.com/jadell/Neo4jPHP/issues

Thanks and enjoy!

-- Josh Adell

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neo4j Spatial - Keep OSM imports
Robin,

the database is deleted after each run in Neo4jTestCase.java:

    @Override
    @After
    protected void tearDown() throws Exception {
        shutdownDatabase(true);
        super.tearDown();
    }

If you change this to shutdownDatabase(false), the database will not be deleted. In that case, make sure to run just that one test, so that several tests do not write to the same DB:

    mvn test -Dtest=TestDynamicLayers

Does that work for you?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote:

Hello,

First of all, I don't know any Java, and I'm trying to figure out whether neo4j could be useful for my projects. If it is, I will of course learn a bit of Java so that I can use neo4j in a decent way for my needs.

I'd like to use a neo4j spatial database together with GeoServer. For this, I'm following the tutorial here: http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer

But this paragraph is blocking me:
- One option for the database location is a database created using the unit tests in Neo4j Spatial. The rest of this wiki assumes that you ran the TestDynamicLayers unit test which loads an OSM dataset for the city of Malmö in Sweden, and then creates a number of Dynamic Layers (or views) on this data, which we can publish in GeoServer.
- If you do use the unit test for the sample database, then the location of the database will be in the target/var/neo4j-db directory of the Neo4j Source code.

My problem is that I can't manage to keep the neo4j spatial databases created by the tests: when I run TestDynamicLayers, it builds databases (in target/var/neo4j-db), but as soon as a database is successfully loaded, it deletes it and starts importing another database, and so on. My poor understanding of Java doesn't help a lot. I tried to edit the .java file in Netbeans + Maven, but so far it doesn't work: all the directories created during the tests are deleted when the test ends.

Any idea how I could keep those databases?

Thanks,
Robin
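The effect of the boolean passed to shutdownDatabase can be illustrated with a small stand-alone sketch. It does not use Neo4j or the real Neo4jTestCase; the class KeepDatabaseSketch, the temporary directory and the file name store.dat are all made up for illustration, and the only assumption taken from the thread is that the flag decides whether the database directory is deleted on teardown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical stand-in for a test fixture that owns an on-disk database directory. */
public class KeepDatabaseSketch {

    /** Creates a directory with one file in it, standing in for a populated store. */
    static Path createDatabase() {
        try {
            Path dir = Files.createTempDirectory("neo4j-db-sketch");
            Files.write(dir.resolve("store.dat"), new byte[] {1, 2, 3});
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mirrors the idea of shutdownDatabase(boolean): delete the directory only when asked to. */
    static void shutdownDatabase(Path dir, boolean deleteDb) {
        if (!deleteDb) {
            return; // keep the files so another tool can open them later
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path kept = createDatabase();
        shutdownDatabase(kept, false);
        System.out.println("kept still exists: " + Files.exists(kept));

        Path removed = createDatabase();
        shutdownDatabase(removed, true);
        System.out.println("removed still exists: " + Files.exists(removed));

        shutdownDatabase(kept, true); // clean up the demo directory
    }
}
```

With the flag set to false the directory survives the teardown, which is the behaviour needed before pointing GeoServer at target/var/neo4j-db.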
Re: [Neo4j] Indexed relationships
Finished the implementation of indexed relationships. The graph collections component now contains the package https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship, containing the IndexedRelationship class.

This class can be used instead of regular relationships when:
- relationships need to be stored in a particular sort order
- a unicity constraint needs to be guaranteed
- nodes become densely populated with relationships.

The implementation is traverser friendly. Given a start node, all end nodes can be found by following four relationship types in the outgoing direction. Given an end node, the start node can be found by following these four relationship types in the incoming direction. Of course this functionality is also covered in the API.

Niels

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Thu, 7 Jul 2011 02:36:29 +0200
Subject: Re: [Neo4j] Indexed relationships

Pushed SortedTree to Git after adding a unit test and doing some debugging.

TODO:
- Add API for indexed relationships using SortedTree as the implementation.
- Make SortedTree thread safe.

With regard to the latter issue, I am considering the following solution: acquire a lock (delete a non-existent property) on the node that points to the root of the tree at the start of AddNode, RemoveNode and Delete. No other node in the SortedTree is really stable: even the root node may be moved down, turning another node into the new root node, and after a couple of remove actions the original root node may even be deleted. Locking the node that points to the root node prevents all other threads/transactions from updating the tree.

This may seem restrictive, but a single new entry or a single removal may in fact affect much of the tree, due to balancing. More selective locking would require a pre-balancing tree walk that determines the affected subtrees and locks them; only once every affected subtree is locked could the actual balancing be performed.
Please let me know if there are any objections to locking the node pointing to the tree as a solution to make SortedTree thread safe.

Niels

Date: Tue, 5 Jul 2011 08:27:57 +0200
From: neubauer.pe...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Indexed relationships

Great work Nils! /peter

Sent from my phone.

On Jul 4, 2011 11:39 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Made some more changes to the SortedTree implementation. Previously SortedTree would throw an exception if a duplicate entry was being added. I changed SortedTree to allow a key to point to more than one node, unless the SortedTree is created as a unique index, in which case an exception is raised when an attempt is made to add a node to an existing key entry. A SortedTree once defined as unique cannot be changed to a non-unique index or vice versa.

SortedTrees now have a name, which is stored in a property of the TREE_ROOT relationship and in the KEY_VALUE relationship (a new relationship that points from the SortedTree to the Node inserted in the SortedTree). The name of a SortedTree cannot be changed.

SortedTrees now store the class of the Comparator, so a SortedTree, once created, cannot be used with a different Comparator.

SortedTree is now an Iterable, making it possible to use it in a foreach-loop.

Since there are as yet no unit tests for SortedTree, I will create those first before pushing my changes to Git. Preliminary results so far are good. I integrated the changes in my own application and it seems to work fine.

Todo:
- Decide on an API for indexed relationships. (Community input still welcome).
- Write unit tests.
- Make SortedTree thread safe (Community help still welcome).

Niels

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Mon, 4 Jul 2011 15:49:45 +0200
Subject: Re: [Neo4j] Indexed relationships

I forgot to add another recurrent issue that can be solved with indexed relationships: guaranteed unicity constraints.
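The coarse locking proposal and the unique/non-unique behaviour described in the messages above can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: CoarseLockedSortedIndex is a hypothetical class, it uses java.util.TreeMap rather than a tree stored in the graph, and a ReentrantLock stands in for the lock taken on the node that points to the tree root. It is not the SortedTree implementation from graph-collections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory stand-in for a sorted index guarded by one lock. */
public class CoarseLockedSortedIndex<K, V> {

    // One lock for the whole structure, playing the role of the lock on the
    // node that points to the root of the tree.
    private final ReentrantLock treeLock = new ReentrantLock();
    private final TreeMap<K, List<V>> entries;
    private final boolean unique; // fixed at construction, cannot be changed later

    public CoarseLockedSortedIndex(Comparator<? super K> comparator, boolean unique) {
        this.entries = new TreeMap<>(comparator);
        this.unique = unique;
    }

    /** Adds a value under a key; a unique index rejects a second value for the same key. */
    public void add(K key, V value) {
        treeLock.lock();
        try {
            List<V> values = entries.get(key);
            if (values == null) {
                values = new ArrayList<>();
                entries.put(key, values);
            } else if (unique) {
                throw new IllegalStateException("key already present in unique index: " + key);
            }
            values.add(value);
        } finally {
            treeLock.unlock();
        }
    }

    /** Removes one value from a key; returns false if it was not there. */
    public boolean remove(K key, V value) {
        treeLock.lock();
        try {
            List<V> values = entries.get(key);
            if (values == null || !values.remove(value)) {
                return false;
            }
            if (values.isEmpty()) {
                entries.remove(key);
            }
            return true;
        } finally {
            treeLock.unlock();
        }
    }

    /** Returns a snapshot of all values in key order. */
    public List<V> valuesInOrder() {
        treeLock.lock();
        try {
            List<V> result = new ArrayList<>();
            for (List<V> values : entries.values()) {
                result.addAll(values);
            }
            return result;
        } finally {
            treeLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CoarseLockedSortedIndex<Integer, String> index =
                new CoarseLockedSortedIndex<>(Comparator.<Integer>naturalOrder(), false);
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                index.add(i, Thread.currentThread().getName());
            }
        };
        Thread a = new Thread(writer, "a");
        Thread b = new Thread(writer, "b");
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(index.valuesInOrder().size()); // 2 writers x 1000 entries
    }
}
```

The trade-off is the one named in the thread: every writer waits for every other writer, but no writer ever sees a half-rebalanced structure, and no pre-balancing walk is needed to work out which subtrees to lock.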
From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Mon, 4 Jul 2011 01:55:08 +0200 Subject: [Neo4j] Indexed relationships In the thread [Neo4j] traversing densely populated nodes we discussed the problems arising when large numbers of relationships are added to the same node. Over the weekend, I have worked on a solution for the dense-relationship-nodes using SortedTree in the neo-graph-collections component. After some minor tweaks to the implementation of SortedTree, I have managed to get a workable solution, where two nodes are not directly linked by a relationship, but by means of a BTree (entirely stored in the graph). Before continuing this work, I'd like to have a discussion about features, since what we have now is not just a solution for the dense populated node issue, but is actually a
Re: [Neo4j] Indexed relationships
Good work, do you have an example ready (and/or some tests that show how it works/is used) ? In creation, manual traversal and automatic traversal (i.e. is there a RelationshipExpander that uses it). And in the constructor if there is no relationship to the treeNode, you create a new one, but that new treeNode is not connected to the actual node? I'm not sure if it should support the original relationship-traversal API / methods (getRelationships(Dir,type), etc). Perhaps that IndexedRelationship should rather be just a wrapper around a SuperNode ? So probably rename it to SuperNode(Wrapper) or HeavilyConnectedNode(Wrapper) ?)

Cheers Michael

On 07.07.2011 at 12:51, Niels Hoogeveen wrote: [...]
Re: [Neo4j] REST batch support - transaction support for java rest client?
Following up on the topic of transactions for the client API.

What is the current plan for some sort of client-side API supporting transactions? I'm playing around with some ideas here and the lack of transaction support in the client API is problematic. I know there's BATCH support in the REST API, which effectively is a transaction, but it doesn't always suit. For example, I have the following steps that I'd like to accomplish:
- create a reference node
- check if a node with a given domain id exists in an index; if it does, fail
- create an entity node for the given domain id
- add entity node to the index
- attach entity node to ref node
- create a node representing a specific version of the entity node
- attach the version node to the entity node, with some properties on the relationships signifying valid time

That should all be considered an atomic operation, all or nothing. Doing it step by step is very easy and natural with the REST API, but trying to roll back on error is flaky. I think I could batch it, but from a programming style it becomes pretty unnatural. Same thing with a plugin for doing the steps. The natural flow of code client side gets distorted by having to collect a lot of data upfront and then provide all that data to a method call. It's doable, just doesn't seem ideal.

Using an embedded db, exposing it as some sort of service etc. is also doable; it's just that my domain is graph related and I'm pretty happy with just the primitives and using a remote server (if I could have transactions). There are quite a few clients, they need to share their data, and they don't all run all the time, so I can't make the client API the embedded API.

I'd think it's not an uncommon situation, and that many people wish for support for a natural client-side transaction API (similar to the embedded API).

Patrik

On Tue, Jul 5, 2011 at 12:27 PM, Patrik Sundberg patrik.sundb...@gmail.com wrote:

yeah, harder problem than my first hunch.
sounds like plugins is the way to go for now, hopefully introduction of non-rest protocol with same interface as embedded API in 1.5 will simplify things in the future.

thanks

On Mon, Jul 4, 2011 at 11:07 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

Patrick,

I've already thought long and hard about that. The problem is that you can't implement that transparently, as you can never allow code in a second call to rely on data derived from a previous one.

The simplest form that I came up with is a BatchCommand that gets an API interface injected that allows requests but doesn't return data. The execution of this BatchCommand would then return a BatchResult with all the data acquired during the batch operation.

Another way would be to inject the normal GraphDatabaseService interface, record the invocations in a first phase and then execute the batch command again (this time ignoring the inputs but then returning the results), but this is bad from a usability perspective.

One critical issue is the creation of relationships, as they depend on the correct node ids of previously created nodes. Jacob already thought about some means of referring to previous output data, but I think kept away from that as we didn't want to make this batch interface a Turing-complete language.

So you see, it's not that simple.

Michael

On 27.06.2011 at 20:45, Patrik Sundberg wrote:

Hi,

Since it is now possible to send off batches of operations via the REST interface, I was wondering if anyone has started to look at implementing transactions in the Java REST client (https://github.com/jexp/neo4j-java-rest-binding)? It would seem possible, but I can also see it could involve some major reorganizing of the internals of the client to make everything aware of transactions and submit via batch command.
Patrik
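The BatchCommand idea from Michael's quoted message, where calls are recorded and hand back placeholders instead of data, can be sketched roughly as follows. Everything here is an assumption made for illustration: BatchSketch, Store, Step and Batch are hypothetical names, the "graph" is a few in-memory collections, and nothing is sent to a Neo4j server. The sketch only shows how later steps can refer to the results of earlier ones by position, and how running against a copy gives all-or-nothing behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical recorder for a batch of graph operations; not a Neo4j API. */
public class BatchSketch {

    /** Minimal in-memory "graph": nodes by id, relationships as id pairs, one index. */
    static final class Store {
        final List<String> nodes = new ArrayList<>();         // list position = node id
        final List<long[]> relationships = new ArrayList<>(); // {fromId, toId}
        final Map<String, Long> index = new HashMap<>();      // domain id -> node id

        Store copy() {
            Store s = new Store();
            s.nodes.addAll(nodes);
            s.relationships.addAll(relationships);
            s.index.putAll(index);
            return s;
        }
    }

    /** One recorded step; results holds the node id produced by each earlier step, or -1. */
    interface Step {
        long apply(Store store, List<Long> results);
    }

    /** Records steps and returns their positions as placeholders for ids not yet known. */
    static final class Batch {
        private final List<Step> steps = new ArrayList<>();

        int createNode(String label) {
            steps.add((store, results) -> {
                store.nodes.add(label);
                return (long) (store.nodes.size() - 1);
            });
            return steps.size() - 1;
        }

        int indexUnique(String domainId, int nodeRef) {
            steps.add((store, results) -> {
                if (store.index.containsKey(domainId)) {
                    throw new IllegalStateException("already indexed: " + domainId);
                }
                store.index.put(domainId, results.get(nodeRef));
                return -1L;
            });
            return steps.size() - 1;
        }

        int relate(int fromRef, int toRef) {
            steps.add((store, results) -> {
                store.relationships.add(new long[] {results.get(fromRef), results.get(toRef)});
                return -1L;
            });
            return steps.size() - 1;
        }

        /** Runs all steps against a copy; the original is untouched if any step throws. */
        Store executeOn(Store current) {
            Store working = current.copy();
            List<Long> results = new ArrayList<>();
            for (Step step : steps) {
                results.add(step.apply(working, results));
            }
            return working;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Batch batch = new Batch();
        int ref = batch.createNode("reference");
        int entity = batch.createNode("entity");
        batch.indexUnique("domain-1", entity);
        batch.relate(entity, ref);
        int version = batch.createNode("version");
        batch.relate(version, entity);
        store = batch.executeOn(store);
        System.out.println(store.nodes.size() + " nodes, " + store.relationships.size() + " relationships");
    }
}
```

The sketch also shows the limitation raised in the thread: the client code never sees a real node id between steps, so any decision that depends on returned data has to be expressed as a step, as with the uniqueness check here.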
Re: [Neo4j] REST batch support - transaction support for java rest client?
But then it would be possible to write a RequestFilter for the Neo4j-Server that does start and commit/rollbacks transactions. I.e. you create a tx object and put it in the session-context if there is none and return a tx-token that the filter uses (e.g. as header-field). then later you can pull it out again and attach it to the current thread (that's the tricky part). On commit or rollback you just do that with the tx (after attaching it to the thread). As the RestfulGraphDb and the Filter share the same execution thread this could/should work. I wouldn't want to support that in the neo4j server by default as this creates a lot of server-side state that has to be managed. But if it works out one could publish that as server-extension.

HTH Michael

On 07.07.2011 at 13:30, Patrik Sundberg wrote: [...]
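The server-side state that a tx-token scheme would have to manage can be sketched as a small registry. This is a hypothetical illustration, not part of the Neo4j server: TxTokenRegistry and its method names are made up, the transaction is represented by a generic type T, and the idle timeout is an assumption added here to show one way such state could be kept from growing without bound. Attaching the transaction to the request thread, which Michael calls the tricky part, is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry mapping client-held tokens to open server-side transactions. */
public class TxTokenRegistry<T> {

    /** An open transaction plus the time it was last touched. */
    private static final class OpenTx<T> {
        final T tx;
        volatile long lastUsedMillis;

        OpenTx(T tx, long nowMillis) {
            this.tx = tx;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, OpenTx<T>> open = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    public TxTokenRegistry(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Registers a transaction and returns the token the client sends back, e.g. in a header. */
    public String begin(T tx, long nowMillis) {
        String token = UUID.randomUUID().toString();
        open.put(token, new OpenTx<>(tx, nowMillis));
        return token;
    }

    /** Looks up the transaction for a token and refreshes its idle timer; null if unknown. */
    public T resume(String token, long nowMillis) {
        OpenTx<T> entry = open.get(token);
        if (entry == null) {
            return null;
        }
        entry.lastUsedMillis = nowMillis;
        return entry.tx;
    }

    /** Removes and returns the transaction so the caller can commit it or roll it back. */
    public T finish(String token) {
        OpenTx<T> entry = open.remove(token);
        return entry == null ? null : entry.tx;
    }

    /** Removes transactions idle for longer than the timeout; the caller should roll them back. */
    public List<T> cull(long nowMillis) {
        List<T> expired = new ArrayList<>();
        for (Map.Entry<String, OpenTx<T>> e : open.entrySet()) {
            boolean idleTooLong = nowMillis - e.getValue().lastUsedMillis > idleTimeoutMillis;
            if (idleTooLong && open.remove(e.getKey(), e.getValue())) {
                expired.add(e.getValue().tx);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        TxTokenRegistry<String> registry = new TxTokenRegistry<>(1000L);
        String token = registry.begin("tx-1", 0L);
        System.out.println("resumed: " + registry.resume(token, 500L));
        System.out.println("culled at 1600 ms: " + registry.cull(1600L));
        System.out.println("after cull: " + registry.resume(token, 1700L));
    }
}
```

Times are passed in explicitly so the behaviour can be checked without sleeping; a real filter would presumably read a clock and run cull on a schedule.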
Re: [Neo4j] Performance issue on nodes with lots of relationships
I use the shell as-is, but the messages.log is reporting...

Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all. In this particular case, I can't imagine taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it is building a set of Rs before applying `count` rather than making count accept an iterable stream.

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:

Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew Whiteli...@andrewewhite.net wrote:

Here are some interesting stats to consider. First, I split my nodes into two groups, one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 25724 ms

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 565448 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 337975 ms

Any ideas on this?
Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:

Andrew,

if you upgrade to 1.4.M06, your shell should be able to do Cypher in order to count the relationships of a node, not returning them:

start n=(1) match (n)-[r]-(x) return count(r)

or something along these lines. Try that several times to see if cold caches are initially slowing things down. In the LS and Neoclipse the output and visualization will be slow for that amount of data.

Cheers,

/peter neubauer

On Wed, Jul 6, 2011 at 4:15 PM, Andrew Whiteli...@andrewewhite.net wrote:

I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example, I may have a single node with 1M+ relationships. A good analogy is a population that has a lives-in relationship to a state.

Now the problem... Both neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd <node-id>` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see a delay of several seconds. Doing an `ls` takes so long that I usually have to just kill the process. In fact `ls` never outputs anything, which is odd since I would expect it to stream the output as it found it. I have very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.

Disclaimer: I am new to Neo4j.
Thanks, Andrew
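The 2.4x and 17x figures quoted in this thread can be recomputed from the shell timings. The sketch below uses only the counts and millisecond values shown in the quoted output; the class and method names are made up for illustration. It suggests the second (warmer) runs differ by roughly 17x, the first runs by roughly 22x, and the slow first run on the larger node works out to about 6k relationships per second.

```java
/** Recomputes the scaling figures from the shell timings quoted in the thread. */
public class ScalingCheck {

    /** How many times larger a is than b. */
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    /** Relationships counted per second for one run. */
    static double perSecond(long count, long millis) {
        return count / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long smallCount = 1_468_486L;
        long largeCount = 3_472_174L;
        long smallFirstMs = 25_724L;
        long smallSecondMs = 19_763L;
        long largeFirstMs = 565_448L;
        long largeSecondMs = 337_975L;

        System.out.printf("data ratio:         %.2fx%n", ratio(largeCount, smallCount));
        System.out.printf("first-run ratio:    %.2fx%n", ratio(largeFirstMs, smallFirstMs));
        System.out.printf("second-run ratio:   %.2fx%n", ratio(largeSecondMs, smallSecondMs));
        System.out.printf("small node, first:  %.0f rels/s%n", perSecond(smallCount, smallFirstMs));
        System.out.printf("large node, first:  %.0f rels/s%n", perSecond(largeCount, largeFirstMs));
        System.out.printf("large node, second: %.0f rels/s%n", perSecond(largeCount, largeSecondMs));
    }
}
```

If the traversal scaled linearly, the time ratios would sit near the data ratio of about 2.4, which is the gap Andrew is asking about.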
Re: [Neo4j] Indexed relationships
Hi Michael,

I haven't yet worked on an example. There are tests for the SortedTree implementation, but I didn't add any for the IndexedRelationship class, which is simply a wrapper around SortedTree. Having a test would have caught the error that no relationship to the treeNode was created (fixed that bug and pushed it to Git) (note to self: always create a unit test, especially when code seems trivial).

There is no relationship expander that uses this. The RelationshipExpander has a method Iterable<Relationship> expand(Node node), which cannot be supported, since there is no direct relationship from start node to end node. Instead there is a path through the index tree.

It's not possible to support the original relationship-traversal API, since the IndexedRelationship class is not a wrapper around a node, but a wrapper around the relationships of a certain RelationshipType in the OUTGOING direction.

As to the name of the class: it is essentially an indexed relationship, and not just a solution to the densely-connected-node problem. An indexed relationship can also be used to maintain a sorted set of relationships of any size, and can be used to guarantee unicity constraints.

Niels

From: michael.hun...@neotechnology.com
Date: Thu, 7 Jul 2011 13:27:00 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Indexed relationships

Good work, do you have an example ready (and/or some tests that show how it works/is used) ? In creation, manual traversal and automatic traversal (i.e. is there a RelationshipExpander that uses it). And in the constructor if there is no relationship to the treeNode, you create a new one, but that new treeNode is not connected to the actual node? I'm not sure if it should support the original relationship-traversal API / methods (getRelationships(Dir,type), etc). Perhaps that IndexedRelationship should rather be just a wrapper around a SuperNode ? So probably rename it to SuperNode(Wrapper) or HeavilyConnectedNode(Wrapper) ?)
Cheers Michael

On 07.07.2011 at 12:51, Niels Hoogeveen wrote: [...]
Re: [Neo4j] REST batch support - transaction support for java rest client?
good idea. i'll ponder it for a bit. but yes, we clearly need to keep state around, so for REST it'd be carried around in session. but on server side I guess you have issues with never ending transactions, how to cull them, etc. since it's a stateless req/response comm channel. on a permanent channel it's easy to detect disconnect and clean up, over http not as easy. thanks On Thu, Jul 7, 2011 at 12:42 PM, Michael Hunger michael.hun...@neotechnology.com wrote: But then it would be possible to write a RequestFilter for the Neo4j-Server that does start and commit/rollbacks transactions. I.e. you create a tx object and put it in the session-context if there is none and return a tx-token that the filter uses (e.g. as header-field). then later you can pull it out again and attach it to the current thread (that's the tricky part). On commit or rollback you just do that with the tx (after attaching it to the thread). As the RestfulGraphDb and the Filter share the same execution thread this could/should work. I wouldn't want to support that in the neo4j server by default as this creates a lot of server-side state that has to be managed. But if it works out one could publish that as server-extension. HTH Michael Am 07.07.2011 um 13:30 schrieb Patrik Sundberg: Following up on the topic of transactions for client API. What is the current plan for some sort of client side API supporting transactions? I'm playing around with some ideas here and the lack of transaction support in the client API is problematic. I know there's BATCH support in the REST API which effectively is a transaction, but it doesn't always suit. 
For example I have the following steps that I'd like to accomplish:

- create a reference node
- check if a node with a given domain id exists in an index; if it does, fail
- create an entity node for the given domain id
- add entity node to the index
- attach entity node to ref node
- create a node representing a specific version of the entity node
- attach the version node to the entity node, with some properties on the relationships signifying valid time

That should all be considered an atomic operation, all or nothing. Doing it step by step is very easy and natural with the REST API, but trying to roll back on error is flaky. I think I could batch it, but as a programming style it becomes pretty unnatural. Same thing with a plugin for doing the steps. The natural flow of code client side gets distorted by having to collect a lot of data upfront and then provide all that data to a method call. It's doable, just doesn't seem ideal.

Using an embedded db, exposing it as some sort of service etc. is also doable; it's just that my domain is graph related and I'm pretty happy with just the primitives and using a remote server (if I could have transactions). There are quite a few clients; they need to share their data and don't all run all the time, so I can't make the client API the embedded API.

I'd think it's not an uncommon situation, with many people wishing for support for a natural client-side transaction API (similar to the embedded API).

Patrik

On Tue, Jul 5, 2011 at 12:27 PM, Patrik Sundberg patrik.sundb...@gmail.com wrote:

yeah, harder problem than my first hunch. sounds like plugins is the way to go for now, hopefully introduction of non-rest protocol with same interface as embedded API in 1.5 will simplify things in the future. thanks

On Mon, Jul 4, 2011 at 11:07 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

Patrik, I've already thought long and hard about that.
The problem is you can't implement that transparently as you can never allow code in a second call rely on data derived from a previous one. The simplest form that I came up with is a BatchCommand that gets an API interface injected that allows requests but doesn't return data. The execution of this Batch command would then return a BatchResult with all the data acquired during the batch operation. Another way would be to inject the normal GraphDatabaseService interface, record the invocations in a first phase and then execute the batch command again (this time ignoring the inputs but then returning the results) but this is bad from a usability perspective. One critical issue is the creation of relationships as they depend on the correct node-ids of previously created nodes. Jacob already thought about some means of referring to previous output data but I think kept away from that as we didn't want to make this batch-interface a turing complete language. So you see, it's not that simple. Michael Am 27.06.2011 um 20:45 schrieb Patrik Sundberg: Hi, Since there is now possible to send off batches of operations via the REST interface, I was wondering if anyone has started to look at implementing transactions in the
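To make the batch discussion above more concrete, here is a minimal sketch of a recorded batch in which a later step refers to the output of an earlier one by its position in the batch. Everything in it is an assumption for illustration: `TinyGraph` and `RecordedBatch` are hypothetical in-memory stand-ins, not the Neo4j REST batch format and not the BatchCommand/BatchResult design Michael describes. It only shows the two properties the thread cares about: relationships can depend on ids of nodes created earlier in the same batch, and the whole batch is all-or-nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a graph store; NOT the Neo4j API. */
class TinyGraph {
    final Map<Long, String> nodes = new HashMap<>();
    final List<long[]> relationships = new ArrayList<>();
    private long nextId = 0;

    long createNode(String name) {
        long id = nextId++;
        nodes.put(id, name);
        return id;
    }

    void relate(long from, long to) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
            throw new IllegalArgumentException("unknown node");
        }
        relationships.add(new long[] {from, to});
    }

    TinyGraph copy() {
        TinyGraph c = new TinyGraph();
        c.nodes.putAll(nodes);
        c.relationships.addAll(relationships);
        c.nextId = nextId;
        return c;
    }

    void restore(TinyGraph snapshot) {
        nodes.clear();
        nodes.putAll(snapshot.nodes);
        relationships.clear();
        relationships.addAll(snapshot.relationships);
        nextId = snapshot.nextId;
    }
}

/** Records operations first, then applies them all-or-nothing. */
class RecordedBatch {
    /** One recorded step; it may read ids produced by earlier steps. */
    interface Step {
        long apply(TinyGraph g, List<Long> earlierResults);
    }

    private final List<Step> steps = new ArrayList<>();

    /** Records a node creation and returns its position in the batch. */
    int createNode(String name) {
        steps.add((g, results) -> g.createNode(name));
        return steps.size() - 1;
    }

    /** Records a relationship between the results of two earlier steps. */
    int relate(int fromStep, int toStep) {
        steps.add((g, results) -> {
            g.relate(results.get(fromStep), results.get(toStep));
            return -1L; // relationships yield no id in this sketch
        });
        return steps.size() - 1;
    }

    /** Applies every step; on any failure the graph is put back as it was. */
    List<Long> execute(TinyGraph g) {
        TinyGraph snapshot = g.copy();
        List<Long> results = new ArrayList<>();
        try {
            for (Step step : steps) {
                results.add(step.apply(g, results));
            }
            return results;
        } catch (RuntimeException e) {
            g.restore(snapshot);
            throw e;
        }
    }
}
```

The client never sees a node id while recording, only step positions, which is one way around the problem that code in a second call cannot rely on data derived from a previous one. The snapshot-and-restore rollback is a shortcut that only works for an in-memory toy.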
Re: [Neo4j] Performance issue on nodes with lots of relationships
I think it's the same problem pattern that has been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html). Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/). Ideally this would be abstracted and implemented in the core distribution so that all APIs (including Cypher and tinkerpop Pipes/Gremlin) can use it efficiently...

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White li...@andrewewhite.net wrote:

I use the shell as-is, but the messages.log is reporting...

    Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once and thus a cache shouldn't matter at all. In this particular case, I can't imagine taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher in which it is building a set of Rs before applying `count` rather than making count accept an iterable stream.

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:

Hi Andrew, How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White li...@andrewewhite.net wrote:

Here are some interesting stats to consider. First, I split my nodes into two groups, one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.

    neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 1468486  |
    +----------+
    1 rows, 25724 ms

    neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 1468486  |
    +----------+
    1 rows, 19763 ms

    neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 3472174  |
    +----------+
    1 rows, 565448 ms

    neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 3472174  |
    +----------+
    1 rows, 337975 ms

Any ideas on this?

Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:

Andrew, if you upgrade to 1.4.M06, your shell should be able to do Cypher in order to count the relationships of a node, not returning them:

    start n=(1) match (n)-[r]-(x) return count(r)

or something along these lines, and try that several times to see if cold caches are initially slowing down things. In `ls` and Neoclipse the output and visualization will be slow for that amount of data.

Cheers, /peter neubauer

On Wed, Jul 6, 2011 at 4:15 PM, Andrew White li...@andrewewhite.net wrote:

I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example I may have a single node with 1M+ relationships. A good analogy is a population that has a lives-in relationship to a state.

Now the problem... Both neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd <node-id>` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see several seconds delay. Doing a `ls` takes so long that I usually have to just kill the process. In fact `ls` never outputs anything, which is odd since I would expect it to stream the output as it found it. I have very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. Disclaimer, I am new to Neo4j.

Thanks, Andrew

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
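The ratios Andrew quotes can be rechecked directly from the shell timings in this thread. The small sketch below does only that arithmetic; the class and method names are made up for illustration, and the inputs are the counts and millisecond figures reported above (second run for each node for the time ratio, first run of the larger node for the throughput figure).

```java
/** Recomputes the scaling figures quoted in the thread from the reported timings. */
class ScalingCheck {
    /** How many times larger one measurement is than another. */
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    /** Relationships counted per second for a run of the given duration. */
    static double perSecond(long count, long millis) {
        return count / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long smallCount = 1_468_486L;   // count(r) for node 1
        long largeCount = 3_472_174L;   // count(r) for node 2
        long smallWarmMs = 19_763L;     // second run, node 1
        long largeWarmMs = 337_975L;    // second run, node 2
        long largeColdMs = 565_448L;    // first run, node 2

        System.out.printf("data ratio:       %.2f%n", ratio(largeCount, smallCount));
        System.out.printf("warm time ratio:  %.2f%n", ratio(largeWarmMs, smallWarmMs));
        System.out.printf("cold throughput:  %.0f per second%n", perSecond(largeCount, largeColdMs));
        System.out.printf("small throughput: %.0f per second%n", perSecond(smallCount, smallWarmMs));
    }
}
```

The data ratio comes out at roughly 2.4 and the warm time ratio at roughly 17, matching the figures in the message, so the per-relationship cost on the larger node is about seven times higher rather than constant.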
[Neo4j] Unique Constaint on Index
We are testing Neo4J and need to support unique emails across all users. Is this possible with the current API?
Re: [Neo4j] Unique Constaint on Index
Hi,

> We are testing Neo4J and need to support unique emails across all users. Is this possible with the current API?

You can add such a constraint when updating the indices:

    if (index.get("email", address).hasNext()) {
        throw new RuntimeException("There are two nodes that share the same email address.");
    } else {
        index.put("email", address, node);
    }

Marko.

http://markorodriguez.com
Re: [Neo4j] Unique Constaint on Index
How do I ensure another request is not performing the same operation on another node in the cluster?
Re: [Neo4j] Indexed relationships
Hi Michael,

I realize that the implementation of IndexedRelationship can in fact support returning relationships, and I have a preliminary version running locally now.

The returned relationships can support all methods of the Relationship interface, returning the node pointing to the treeRoot as the startNode, and returning the node set as the key_value as the endNode. All relationship properties will be stored on the KEY_VALUE relationship pointing to the endNode.

There is one caveat to this solution: the returned relationships cannot support the getId() method, and will throw an UnsupportedOperationException when it is called.

IndexedRelationship will implement Iterable<Relationship>. With these changes, it is possible to create an Expander, and I am working right now to implement that.

Niels

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 14:46:35 +0200 Subject: Re: [Neo4j] Indexed relationships

Hi Michael,

I haven't yet worked on an example. There are tests for the SortedTree implementation, but I didn't add any for the IndexedRelationship class, which is simply a wrapper around SortedTree. Having a test would have caught the error that no relationship to the treeNode was created (fixed that bug and pushed it to Git) (note to self: always create a unit test, especially when code seems trivial).

There is no relationship expander that uses this. The RelationshipExpander has a method Iterable<Relationship> expand(Node node) which cannot be supported, since there is no direct relationship from startnode to endnode. Instead there is a path through the index tree.

It's not possible to support the original relationship-traversal API since the IndexedRelationship class is not a wrapper around a node, but a wrapper around the relationships of a certain RelationshipType in the OUTGOING direction.

As to the name of the class: it is essentially an indexed relationship, and not just a solution to the densely-connected-node problem. An indexed relationship can also be used to maintain a sorted set of relationships of any size, and can be used to guarantee uniqueness constraints.

Niels

From: michael.hun...@neotechnology.com Date: Thu, 7 Jul 2011 13:27:00 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Indexed relationships

Good work, do you have an example ready (and/or some tests that show how it works/is used)? In creation, manual traversal and automatic traversal (i.e. is there a RelationshipExpander that uses it).

And in the constructor, if there is no relationship to the treeNode, you create a new one, but that new treeNode is not connected to the actual node?

I'm not sure if it should support the original relationship-traversal API / methods (getRelationships(Dir,type), etc). Perhaps that IndexedRelationship should rather be just a wrapper around a SuperNode? So probably rename it to SuperNode(Wrapper) or HeavilyConnectedNode(Wrapper)?

Cheers Michael

On 07.07.2011 at 12:51, Niels Hoogeveen wrote:

Finished the implementation of indexed relationships. The graph collections component now contains the package https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship, containing the IndexedRelationship class.

This class can be used instead of regular relationships when:

- relationships need to be stored in a particular sort order
- a uniqueness constraint needs to be guaranteed
- nodes become densely populated with relationships.

The implementation is traverser friendly. Given a start node, all end nodes can be found by following four relationship types in the outgoing direction. Given an end node, the start node can be found by following these four relationship types in the incoming direction. Of course this functionality is also covered in the API.

Niels

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 02:36:29 +0200 Subject: Re: [Neo4j] Indexed relationships

Pushed SortedTree to Git after adding a unit test and doing some debugging.

TODO:

- Add API for indexed relationships using SortedTree as the implementation.
- Make SortedTree thread safe.

With regard to the latter issue, I am considering the following solution. Acquire a lock (delete a non-existent property) on the node that points to the root of the tree at the start of AddNode, RemoveNode and Delete. No other node in the SortedTree is really stable: even the rootnode may be moved down, turning another node into the new rootnode, while after a couple of remove actions the original rootnode may even be deleted. Locking the node pointing to the rootnode prevents all other threads/transactions from updating the tree. This may seem restrictive, but a single new entry or a single removal may
Re: [Neo4j] Unique Constaint on Index
Hi,

the ability to acquire locks cluster-wide exists, albeit in an ad hoc fashion. Grabbing a write lock on the node you want to ensure is uniquely indexed will ensure that the operations are serialized across all cluster members. The most simple way to get that lock currently is the (somewhat hackish but entirely correct) removal of a non-existing property.

cheers, CG

On Thu, Jul 7, 2011 at 5:53 PM, etc3 e...@nextideapartners.com wrote:

> How do I ensure another request is not performing the same operation on another node in the cluster?
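A rough way to see why the lock matters: the check ("is this email already indexed?") and the insert have to happen as one step, otherwise two writers can both pass the check. The sketch below models that with plain Java. `GuardedEmailIndex` is a hypothetical in-memory stand-in, not the Neo4j index API, and the `ReentrantLock` only plays the role of the write lock that, per the message above, would be taken by removing a non-existing property on a node inside a transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory stand-in for a uniquely indexed key; NOT the Neo4j API. */
class GuardedEmailIndex {
    private final Map<String, Long> byEmail = new HashMap<>();
    // Plays the role of the write lock described in the thread.
    private final ReentrantLock writeLock = new ReentrantLock();

    /** Returns true if the email was free and is now bound to nodeId. */
    boolean addIfAbsent(String email, long nodeId) {
        writeLock.lock();
        try {
            if (byEmail.containsKey(email)) {
                return false; // duplicate: the caller would roll back here
            }
            byEmail.put(email, nodeId);
            return true;
        } finally {
            writeLock.unlock();
        }
    }
}

class UniqueEmailDemo {
    /** Starts several writers racing for the same email; returns how many succeeded. */
    static int race(GuardedEmailIndex index, String email, int writers) {
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            final long nodeId = i;
            threads[i] = new Thread(() -> {
                if (index.addIfAbsent(email, nodeId)) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println(race(new GuardedEmailIndex(), "a@example.org", 8));
    }
}
```

Because check and insert run under the same lock, exactly one of the racing writers can claim a given address. In a cluster the lock has to be one that every member honors, which is the point of the message above; a JVM-local lock like this one would not be enough there.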
Re: [Neo4j] Unique Constaint on Index
Marko's solution works, because you roll back the transaction once you find a duplicate entry. Another solution to this problem is to use the SortedTree index in graph-collections https://github.com/peterneubauer/graph-collections, which has a setting that makes an index unique. This component is relatively new and could use some proper testing, though.

Niels
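As a sketch of the behaviour described for SortedTree elsewhere in this archive (a key may point to several nodes unless the tree is created as a unique index, in which case adding to an existing key raises an exception), here is a small model built on `java.util.TreeMap`. It is an assumption-laden illustration: `SortedIndexSketch` is a hypothetical name, it is not the graph-collections implementation, and `synchronized` stands in for the single coarse lock on the node pointing to the tree root.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Sorted key-to-values index with an optional uniqueness rule (stand-in, not SortedTree). */
class SortedIndexSketch<K extends Comparable<K>, V> {
    private final TreeMap<K, List<V>> entries = new TreeMap<>();
    private final boolean unique;

    SortedIndexSketch(boolean unique) {
        this.unique = unique;
    }

    /** Adds value under key; a unique index rejects a second value for the same key. */
    synchronized void add(K key, V value) {
        List<V> bucket = entries.get(key);
        if (bucket == null) {
            bucket = new ArrayList<>();
            entries.put(key, bucket);
        } else if (unique) {
            throw new IllegalStateException("duplicate key: " + key);
        }
        bucket.add(value);
    }

    /** Returns all values in key order; values under one key keep insertion order. */
    synchronized List<V> valuesInOrder() {
        List<V> out = new ArrayList<>();
        for (List<V> bucket : entries.values()) {
            out.addAll(bucket);
        }
        return out;
    }
}
```

One lock for the whole structure is the simplest correct choice when rebalancing can touch any part of the tree; the cost is that writers are fully serialized.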
Re: [Neo4j] Unique Constaint on Index
You can use Transactions.

Marko.

> How do I ensure another request is not performing the same operation on another node in the cluster?
[Neo4j] typo in Expander interface
The interface of org.neo4j.graphdb.Expander contains a typo. The method addRelationsipFilter(Predicate<? super Relationship>) should be called addRelationshipFilter(Predicate<? super Relationship>).

Niels
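One way such a rename could be done without breaking existing callers is to add the correctly spelled method and keep the misspelled one as a deprecated delegate. The sketch below shows the pattern only. `FilteringExpander` and `ListExpander` are hypothetical, `String` stands in for `Relationship`, `java.util.function.Predicate` stands in for the Neo4j helper type, and default methods need a newer Java than was current when this thread was written; none of this is taken from the actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical interface illustrating a rename; not the real org.neo4j.graphdb.Expander. */
interface FilteringExpander {
    /** Correctly spelled method that implementations provide. */
    FilteringExpander addRelationshipFilter(Predicate<? super String> filter);

    /** Old misspelled name, kept so existing callers still compile. */
    @Deprecated
    default FilteringExpander addRelationsipFilter(Predicate<? super String> filter) {
        return addRelationshipFilter(filter);
    }
}

/** Minimal implementation that just collects the filters it is given. */
class ListExpander implements FilteringExpander {
    final List<Predicate<? super String>> filters = new ArrayList<>();

    @Override
    public FilteringExpander addRelationshipFilter(Predicate<? super String> filter) {
        filters.add(filter);
        return this;
    }

    /** True if every registered filter accepts the given relationship type name. */
    boolean accepts(String relationshipType) {
        for (Predicate<? super String> filter : filters) {
            if (!filter.test(relationshipType)) {
                return false;
            }
        }
        return true;
    }
}
```

Callers of the old name keep working and get a deprecation warning at compile time, which gives them a release or two to migrate before the misspelled method is removed.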
Re: [Neo4j] typo in Expander interface
I sent a Pull Request fixing this a long time ago, but it was to the old neo4j repository. Guess it wasn't merged. https://github.com/neo4j/graphdb/pull/2

I can send it again, if the guys want.

-- Adriano Almeida Caelum | Ensino e Inovação www.caelum.com.br
Re: [Neo4j] typo in Expander interface
Yes, please do, and send the CLA mail first, see http://wiki.neo4j.org/content/About_Contributor_License_Agreement

Cheers, /peter neubauer
Re: [Neo4j] Unique Constaint on Index
I'm new to this framework, is there an example that demonstrates removing a non-existent property and how it would be used in this context?

Thanks
Re: [Neo4j] Unique Constaint on Index
I'll strongly +1 that having a concept of unique index values should be built into Neo4j. It's just too common of a requirement.

Aseem
Re: [Neo4j] typo in Expander interface
I've made some traversal additions, and this fix is also in a branch... pushing soon!

-- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
Re: [Neo4j] Performance issue on nodes with lots of relationships
2011/7/7 Agelos Pikoulas agelos.pikou...@gmail.com

> I think it's the same problem pattern that has been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html). Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/). Ideally this would be abstracted and implemented in the core distribution so that all APIs (including Cypher and tinkerpop Pipes/Gremlin) can use it efficiently...

Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type regardless of the total number of relationships fast. In the foreseeable future hopefully.
[Neo4j] Add relationships dynamically
Is there any way I can add relationships on the fly or programmatically? Sometimes I might not know the relationships in advance, and I want to add them to the database.

Cheers,
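For readers with the same question: in the embedded Java API of this era a relationship type is just an object with a `name()`, so it does not have to be a compile-time enum, and as far as I know the API ships `DynamicRelationshipType.withName(...)` for creating one from a string at run time. The sketch below shows the idea with stand-in types so that it runs without Neo4j on the classpath; `RelType`, `KnownTypes` and `RuntimeRelType` are hypothetical names, not Neo4j classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in mirroring the shape of a relationship type: an object with a name. */
interface RelType {
    String name();
}

/** Types known at compile time can be an enum; enums already provide name(). */
enum KnownTypes implements RelType {
    KNOWS,
    LIVES_IN
}

/** Types discovered at run time are created from a string on demand. */
final class RuntimeRelType implements RelType {
    private static final Map<String, RuntimeRelType> CACHE = new ConcurrentHashMap<>();
    private final String name;

    private RuntimeRelType(String name) {
        this.name = name;
    }

    /** Returns the single instance for this name, creating it on first use. */
    static RuntimeRelType withName(String name) {
        return CACHE.computeIfAbsent(name, RuntimeRelType::new);
    }

    @Override
    public String name() {
        return name;
    }
}
```

Code that creates relationships only needs the interface, so it can accept `KnownTypes.KNOWS` and `RuntimeRelType.withName(userInput)` interchangeably.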
Re: [Neo4j] Indexed relationships
IndexedRelationship and IndexedRelationshipExpander are now in Git. See: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship

An example:

    class IdComparator implements java.util.Comparator<Node> {
        // Orders nodes ascending by their bit-reversed id.
        public int compare(Node n1, Node n2) {
            long l1 = Long.reverse(n1.getId());
            long l2 = Long.reverse(n2.getId());
            if (l1 == l2) return 0;
            else if (l1 < l2) return -1;
            else return 1;
        }
    }

    static enum RelTypes implements RelationshipType {
        DIRECT_RELATIONSHIP,
        INDEXED_RELATIONSHIP,
    }

    Node indexedNode = graphDb().createNode();
    IndexedRelationship ir = new IndexedRelationship(RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, new IdComparator(), true, indexedNode, graphDb());

    Node n1 = graphDb().createNode();
    n1.setProperty("name", "n1");
    Node n2 = graphDb().createNode();
    n2.setProperty("name", "n2");
    Node n3 = graphDb().createNode();
    n3.setProperty("name", "n3");
    Node n4 = graphDb().createNode();
    n4.setProperty("name", "n4");

    // n1 and n3 are attached with regular relationships, n2 and n4 through the index.
    indexedNode.createRelationshipTo(n1, RelTypes.DIRECT_RELATIONSHIP);
    indexedNode.createRelationshipTo(n3, RelTypes.DIRECT_RELATIONSHIP);
    ir.createRelationshipTo(n2);
    ir.createRelationshipTo(n4);

    IndexedRelationshipExpander re1 = new IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, RelTypes.DIRECT_RELATIONSHIP);
    IndexedRelationshipExpander re2 = new IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, RelTypes.INDEXED_RELATIONSHIP);

    for (Relationship rel : re1.expand(indexedNode)) {
        System.out.println(rel.getEndNode().getProperty("name"));
    }
    for (Relationship rel : re2.expand(indexedNode)) {
        System.out.println(rel.getEndNode().getProperty("name"));
    }

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 16:55:36 +0200 Subject: Re: [Neo4j] Indexed relationships

Hi Michael, I realize that the implementation of IndexedRelationship can in fact support returning relationships, and I have a preliminary version running locally now. The returned relationships can support all methods of the
Relationship interface, returning the node pointing to the treeRoot as the startNode, and returning the node set as the key_value as the endNode.All relationship properties will be stored on the KEY_VALUE relationship pointing to the endNode.There is one caveat to this solution, the returned relationships cannot support the getId() method,and will throw an UnsupportedOperationException when being called.IndexedRelationship will implement IterableRelationship.With these changes, it is possible to create an Expander and I am working right now to implement that.Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 14:46:35 +0200 Subject: Re: [Neo4j] Indexed relationships Hi Michael, I haven't yet worked on an example. There are tests for the SortedTree implementation, but didn't add those to the IndexedRelationship class, which is simply a wrapper around SortedTree. Having a test would have caught the error that no relationship to the treeNode was created (fixed that bug and pushed it to Git) (note to self: always create a unit test, especially when code seems trivial). There is no relationship expander that uses this. The RelationshipExpander has a method IterableRelationship expand(Node node) which cannot be supported, since there is no direct relationship from startnode to endnode. Instead there is a path through the index tree. It's not possible to support the original relationship-traversal API since the IndexedRelationship class is not a wrapper around a node, but a wrapper around the relationships of a certain RelationshipType in the OUTGOING direction. As to the name of the class. It is essentially an indexed relationship, and not just a solution to the densely-connected-node problem. An indexed relationship can also be used to maintain a sorted set of relationships of any size, and can be used to guarantee unicity constraints. 
Niels From: michael.hun...@neotechnology.com Date: Thu, 7 Jul 2011 13:27:00 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Indexed relationships Good work, do you have an example ready (and/or some tests that show how it works/is used) ? In creation, manual traversal and automatic traversal (i.e. is there a RelationshipExpander that uses it). And in the constructor if there is no relationship to the treeNode, you create a new one, but that new treeNode is not connected to the actual node? I'm not sure if it should support the original relationship-traversal API / methods (getRelationships(Dir,type), etc). Perhaps that IndexedRelationship should rather be just a wrapper around a SuperNode ? So probably rename it to SuperNode(Wrapper) or HeavilyConnectedNode(Wrapper) ?)
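Niels's remark that an indexed relationship keeps a sorted set and can guarantee uniqueness can be illustrated without a graph at all. The following is only an in-memory analogy, not the graph-collections implementation: it uses java.util.TreeSet on plain long ids, with the same ordering idea as IdComparator in the thread (compare the bit-reversed ids). The class name SortedUniqueIds and its method are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// In-memory analogy only: a comparator-ordered set of ids, standing in for
// the sorted tree that backs an indexed relationship.
final class SortedUniqueIds {

    // Same ordering idea as IdComparator in the thread: compare bit-reversed ids.
    static final Comparator<Long> BY_REVERSED_BITS =
            (a, b) -> Long.compare(Long.reverse(a), Long.reverse(b));

    // Returns the ids in comparator order, with duplicates dropped.
    static List<Long> sortedUnique(long... ids) {
        TreeSet<Long> set = new TreeSet<>(BY_REVERSED_BITS);
        for (long id : ids) {
            set.add(id); // a second add of an equal key leaves the set unchanged
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // The duplicate 2 is stored once; order follows the reversed bits.
        System.out.println(sortedUnique(1, 2, 3, 4, 2)); // prints [1, 3, 4, 2]
    }
}
```

The ordering and the uniqueness check both come from the comparator: two keys that compare as equal count as the same entry. That is presumably why the IndexedRelationship constructor takes a comparator, although the thread does not spell this out.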
Re: [Neo4j] Add relationships dynamically
Take a look at the RelationshipType interface. If you implement that (which is really simple - just a name() method), you can have your own class that can have relationships with any names you want. They do need to be unique, however.

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of noppanit [noppani...@gmail.com]
Sent: Thursday, July 07, 2011 4:01 PM
To: user@lists.neo4j.org
Subject: [Neo4j] Add relationships dynamically

Is there any way I can add relationships on-the-fly or programmatically? Because sometimes I might not know the relationships in advance and I want to add them to the database.

Cheers,
-- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Add-relationships-dynamically-tp3149437p3149437.html Sent from the Neo4J User List mailing list archive at Nabble.com.
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Add relationships dynamically
Also take a look at DynamicRelationshipType if you want to instantiate relationship types at runtime.

On Thu, Jul 7, 2011 at 9:16 PM, Rick Bullotta rick.bullo...@thingworx.com wrote:
Take a look at the RelationshipType interface. [...]
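A minimal sketch of the approach described in this thread, assuming only what Rick says: that a relationship type is an interface with a single name() method. To keep it runnable without Neo4j on the classpath, RelType below is a local stand-in for org.neo4j.graphdb.RelationshipType, and RuntimeRelTypes and NamedRelType are made-up names. In a real project, Neo4j's own DynamicRelationshipType.withName(...) plays this role.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in for org.neo4j.graphdb.RelationshipType
// (assumption: the interface has a single name() method).
interface RelType {
    String name();
}

// Hypothetical helper: relationship types built from strings at runtime,
// so they do not have to be listed in an enum beforehand.
final class RuntimeRelTypes {
    private static final Map<String, RelType> CACHE = new ConcurrentHashMap<>();

    private RuntimeRelTypes() {}

    // Returns the type for this name, creating it on first use.
    static RelType withName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("relationship type name must not be empty");
        }
        return CACHE.computeIfAbsent(name, n -> new NamedRelType(n));
    }

    private static final class NamedRelType implements RelType {
        private final String name;

        NamedRelType(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    public static void main(String[] args) {
        // A type name discovered in some input text, not known at compile time.
        RelType knows = withName("KNOWS");
        System.out.println(knows.name());
        System.out.println(knows == withName("KNOWS")); // one instance per name
    }
}
```

The cache is a convenience so that each name maps to one object; whether that matters depends on how the database compares types, which the thread does not go into.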
Re: [Neo4j] Performance issue on nodes with lots of relationships
I am glad to see a solution will be provided at the core level.

Today, I pushed IndexedRelationship and IndexedRelationshipExpander to Git, see: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship

This provides a solution to the issue, but is certainly not as fast as a solution in core would be. However, it does solve my issues and, as a bonus, indexed relationships can be traversed in sorted order. This is especially pleasant, since I usually want to know only the recent additions of dense relationships.

Niels

Date: Thu, 7 Jul 2011 21:37:26 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships

2011/7/7 Agelos Pikoulas agelos.pikou...@gmail.com

I think it's the same problem pattern that has been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html). Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/). Ideally this would be abstracted and implemented in the core distribution so that all APIs (including Cypher and TinkerPop Pipes/Gremlin) can use it efficiently...

Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type fast, regardless of the total number of relationships. In the foreseeable future, hopefully.

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White li...@andrewewhite.net wrote:

I use the shell as-is, but the messages.log is reporting...

Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data?
Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all. In this particular case, I can't imagine taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it is building a set of Rs before applying `count` rather than making count accept an iterable stream.

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:

Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White li...@andrewewhite.net wrote:

Here are some interesting stats to consider. First, I split my nodes into two groups, one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.

    neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 1468486  |
    +----------+
    1 rows, 25724 ms

    neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 1468486  |
    +----------+
    1 rows, 19763 ms

    neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 3472174  |
    +----------+
    1 rows, 565448 ms

    neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
    +----------+
    | count(r) |
    +----------+
    | 3472174  |
    +----------+
    1 rows, 337975 ms

Any ideas on this?

Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:

Andrew, if you upgrade to 1.4.M06, your shell should be able to do Cypher in order to count the relationships of a node, not returning them:

    start n=(1) match (n)-[r]-(x) return count(r)

or something along these lines, and try that several times to see if cold caches are initially slowing things down. In the LS and Neoclipse the output and visualization will be slow for that amount of data.
Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Wed, Jul 6, 2011 at 4:15 PM, Andrew White li...@andrewewhite.net wrote:

I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example I may have a single node
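The "2.4x more data, 17x longer" observation can be checked directly from the shell output Andrew posted. The sketch below only does that arithmetic; the class and constant names are made up, and the numbers are copied from the four runs in the thread.

```java
// Arithmetic on the figures Andrew posted; nothing here talks to a database.
final class SupernodeTimings {
    // Relationship counts reported by count(r) for the two super-nodes.
    static final long SMALL_RELS = 1_468_486L;
    static final long LARGE_RELS = 3_472_174L;

    // Reported timings in milliseconds: first and second run for each node.
    static final long SMALL_FIRST_MS = 25_724L;
    static final long SMALL_SECOND_MS = 19_763L;
    static final long LARGE_FIRST_MS = 565_448L;
    static final long LARGE_SECOND_MS = 337_975L;

    static double ratio(long larger, long smaller) {
        return (double) larger / smaller;
    }

    static double perSecond(long count, long millis) {
        return count * 1000.0 / millis;
    }

    public static void main(String[] args) {
        System.out.printf("size ratio:            %.2f%n", ratio(LARGE_RELS, SMALL_RELS));
        System.out.printf("time ratio, 1st runs:  %.2f%n", ratio(LARGE_FIRST_MS, SMALL_FIRST_MS));
        System.out.printf("time ratio, 2nd runs:  %.2f%n", ratio(LARGE_SECOND_MS, SMALL_SECOND_MS));
        System.out.printf("large node, 1st run:   %.0f rels/s%n", perSecond(LARGE_RELS, LARGE_FIRST_MS));
        System.out.printf("small node, 1st run:   %.0f rels/s%n", perSecond(SMALL_RELS, SMALL_FIRST_MS));
    }
}
```

The size ratio comes out at about 2.4. The time ratio is about 22 for the first runs and about 17 for the second runs, so the 17x figure matches the warmed-up runs. The "6k per sec" figure matches the first run on the larger node (roughly 6,100 relationships per second), while the smaller node was read almost ten times faster per relationship.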
Re: [Neo4j] Indexed relationships
Could you put these code examples into the README for the project or on a wiki page?

On 07.07.2011 at 22:11, Niels Hoogeveen wrote:
IndexedRelationship and IndexedRelationshipExpander are now in Git. [...]
Re: [Neo4j] Performance issue on nodes with lots of relationships
Niels, could you perhaps write up a blog post detailing the usage (also for your own scenario), and how that solution would compare to the naive supernodes with just millions of relationships? Also, I'd like to see a performance comparison of both approaches.

Thanks so much for your work.

Michael

On 07.07.2011 at 22:24, Niels Hoogeveen wrote:
I am glad to see a solution will be provided at the core level. [...]
Re: [Neo4j] REST batch support - transaction support for java rest client?
Right, those are some of the issues. So one way would be to specify a tx timeout upfront which automatically rolls back the tx (you can just add a kind of timer/TTL to the tx-session object) and clears it as well. Keeping state on the server is always a problem, but I don't see a different solution for that. But it might be worth a try, especially if it helps you with your concrete scenario.

Michael

On 07.07.2011 at 15:30, Patrik Sundberg wrote:

good idea. i'll ponder it for a bit. but yes, we clearly need to keep state around, so for REST it'd be carried around in session. but on server side I guess you have issues with never-ending transactions, how to cull them, etc., since it's a stateless req/response comm channel. on a permanent channel it's easy to detect disconnect and clean up, over http not as easy.

thanks

On Thu, Jul 7, 2011 at 12:42 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

But then it would be possible to write a RequestFilter for the Neo4j-Server that starts and commits/rolls back transactions. I.e. you create a tx object and put it in the session-context if there is none, and return a tx-token that the filter uses (e.g. as header-field). Then later you can pull it out again and attach it to the current thread (that's the tricky part). On commit or rollback you just do that with the tx (after attaching it to the thread).

As the RestfulGraphDb and the Filter share the same execution thread, this could/should work.

I wouldn't want to support that in the neo4j server by default, as this creates a lot of server-side state that has to be managed. But if it works out, one could publish that as a server-extension.

HTH
Michael

On 07.07.2011 at 13:30, Patrik Sundberg wrote:

Following up on the topic of transactions for the client API. What is the current plan for some sort of client-side API supporting transactions? I'm playing around with some ideas here and the lack of transaction support in the client API is problematic.

I know there's BATCH support in the REST API which effectively is a transaction, but it doesn't always suit. For example I have the following steps that I'd like to accomplish:

- create a reference node
- check if a node with a given domain id exists in an index; if it does, fail
- create an entity node for the given domain id
- add entity node to the index
- attach entity node to ref node
- create a node representing a specific version of the entity node
- attach the version node to the entity node, with some properties on the relationships signifying valid time

That should all be considered an atomic operation, all or nothing. Doing it step by step is very easy and natural with the REST API, but trying to roll back on error is flaky. I think I could batch it, but as a programming style it becomes pretty unnatural. Same thing with a plugin for doing the steps. The natural flow of code on the client side gets distorted by having to collect a lot of data upfront and then provide all that data to a method call. It's doable, it just doesn't seem ideal.

Using an embedded db, exposing it as some sort of service etc. is also doable; it's just that my domain is graph related and I'm pretty happy with just the primitives and using a remote server (if I could have transactions). There are quite a few clients, they need to share their data, and they don't all run all the time, so I can't make the client API the embedded API. I'd think it's not an uncommon situation, and that many people wish for support for a natural client-side transaction API (similar to the embedded API).

Patrik

On Tue, Jul 5, 2011 at 12:27 PM, Patrik Sundberg patrik.sundb...@gmail.com wrote:

yeah, harder problem than my first hunch. sounds like plugins is the way to go for now, hopefully introduction of a non-REST protocol with the same interface as the embedded API in 1.5 will simplify things in the future.

thanks

On Mon, Jul 4, 2011 at 11:07 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

Patrik,

I've already thought long and hard about that. The problem is you can't implement that transparently, as you can never allow code in a second call to rely on data derived from a previous one.

The simplest form that I came up with is a BatchCommand that gets an API interface injected that allows requests but doesn't return data. The execution of this batch command would then return a BatchResult with all the data acquired during the batch operation.

Another way would be to inject the normal GraphDatabaseService interface, record the invocations in a first phase and then execute the batch command again (this time ignoring the inputs but then returning the results), but this is bad from a usability perspective.

One critical issue is the creation of relationships, as they depend on the correct node-ids of previously created nodes. Jacob already thought about some means of referring to
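The token-plus-TTL idea from this thread could look roughly like the sketch below. Everything in it is hypothetical: TxTokenRegistry and its nested Tx interface are made-up names, Tx is a stand-in for a real transaction handle, and the part Michael calls tricky (attaching the transaction to the request thread) is left out. It only shows the bookkeeping: hand out a token, extend the deadline on each use, and roll back whatever is not touched in time. The current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side registry: one entry per open transaction, keyed by
// a token the client would send back with each request (e.g. a header field).
final class TxTokenRegistry {

    // Stand-in for a real transaction handle; only rollback matters for expiry.
    interface Tx {
        void rollback();
    }

    private static final class OpenTx {
        final Tx tx;
        volatile long deadlineMillis;

        OpenTx(Tx tx, long deadlineMillis) {
            this.tx = tx;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final Map<String, OpenTx> open = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TxTokenRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Stores the transaction and returns the token for later requests.
    String register(Tx tx, long nowMillis) {
        String token = UUID.randomUUID().toString();
        open.put(token, new OpenTx(tx, nowMillis + ttlMillis));
        return token;
    }

    // Returns the transaction and extends its deadline, or null if the token
    // is unknown or has expired (an expired transaction is rolled back).
    Tx lookup(String token, long nowMillis) {
        OpenTx entry = open.get(token);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.deadlineMillis) {
            if (open.remove(token, entry)) {
                entry.tx.rollback();
            }
            return null;
        }
        entry.deadlineMillis = nowMillis + ttlMillis;
        return entry.tx;
    }

    // Called on explicit commit or rollback: forget the token, hand back the tx.
    Tx remove(String token) {
        OpenTx entry = open.remove(token);
        return entry == null ? null : entry.tx;
    }

    // Periodic cleanup: rolls back and drops every expired transaction.
    int reapExpired(long nowMillis) {
        int reaped = 0;
        for (Map.Entry<String, OpenTx> e : open.entrySet()) {
            OpenTx entry = e.getValue();
            if (nowMillis >= entry.deadlineMillis && open.remove(e.getKey(), entry)) {
                entry.tx.rollback();
                reaped++;
            }
        }
        return reaped;
    }

    public static void main(String[] args) {
        TxTokenRegistry registry = new TxTokenRegistry(1000);
        String token = registry.register(() -> System.out.println("rolled back"), 0);
        System.out.println(registry.lookup(token, 500) != null); // still alive
        System.out.println(registry.reapExpired(1500));          // expired by now
    }
}
```

A real filter would also have to deal with a request arriving while the reaper runs, and with the state surviving (or not) a server restart, which is the server-side burden Michael is wary of.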
Re: [Neo4j] Add relationships dynamically
Thanks a lot. :) -- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Add-relationships-dynamically-tp3149437p3149791.html
Re: [Neo4j] Add relationships dynamically
I can have an enum class that implements RelationshipType, but doesn't that mean that I have to define each relationship beforehand? For example, if I have a text, "I know john", and the "know" relationship doesn't exist in MyRelationshipType (which already implements RelationshipType), how could I create it at runtime? I think that when I want to create a relationship I have to specify a RelationshipType?

Thank you very much,
-- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Add-relationships-dynamically-tp3149437p3149985.html
Re: [Neo4j] Indexed relationships
I created a wiki page for indexed relationships in the Git repo, see: https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships

From: michael.hun...@neotechnology.com
Date: Thu, 7 Jul 2011 22:53:05 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Indexed relationships

Could you put these code examples into the Readme for the project or on a wiki page? [...]
Re: [Neo4j] Indexed relationships
Thanks Michael Am 08.07.2011 um 01:19 schrieb Niels Hoogeveen: I created a wiki page for indexed relationships in the Git repo, see: https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships From: michael.hun...@neotechnology.com Date: Thu, 7 Jul 2011 22:53:05 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Indexed relationships Could you put these code examples into the Readme for the project or on a wiki page? Am 07.07.2011 um 22:11 schrieb Niels Hoogeveen: IndexedRelationship and IndexedRelationshipExpander are now in Git. See: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship An example: class IdComparator implements java.util.ComparatorNode{ public int compare(Node n1, Node n2){ long l1 = Long.reverse(n1.getId()); long l2 = Long.reverse(n2.getId()); if(l1 == l2) return 0; else if(l1 l2) return -1; else return 1; } }static enum RelTypes implements RelationshipType{ DIRECT_RELATIONSHIP, INDEXED_RELATIONSHIP, }; Node indexedNode = graphDb().createNode(); IndexedRelationship ir = new IndexedRelationship(RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, new IdComparator(), true, indexedNode, graphDb()); Node n1 = graphDb().createNode(); n1.setProperty(name, n1); Node n2 = graphDb().createNode(); n2.setProperty(name, n2); Node n3 = graphDb().createNode(); n3.setProperty(name, n3); Node n4 = graphDb().createNode(); n4.setProperty(name, n4); indexedNode.createRelationshipTo(n1, RelTypes.DIRECT_RELATIONSHIP); indexedNode.createRelationshipTo(n3, RelTypes.DIRECT_RELATIONSHIP); ir.createRelationshipTo(n2); ir.createRelationshipTo(n4); IndexedRelationshipExpander re1 = new IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, RelTypes.DIRECT_RELATIONSHIP); IndexedRelationshipExpander re2 = new IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, RelTypes.INDEXED_RELATIONSHIP); for(Relationship rel: re1.expand(indexedNode)){ 
System.out.println(rel.getEndNode().getProperty(name)); } for(Relationship rel: re2.expand(indexedNode)){ System.out.println(re2.getEndNode().getProperty(name)); } From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 16:55:36 +0200 Subject: Re: [Neo4j] Indexed relationships Hi Michael,I realize that the implementation of IndexedRelationship can in fact support returning relationships, and I have a preliminary version running locally now.The returned relationships can support all methods of the Relationship interface, returning the node pointing to the treeRoot as the startNode, and returning the node set as the key_value as the endNode.All relationship properties will be stored on the KEY_VALUE relationship pointing to the endNode.There is one caveat to this solution, the returned relationships cannot support the getId() method,and will throw an UnsupportedOperationException when being called.IndexedRelationship will implement IterableRelationship.With these changes, it is possible to create an Expander and I am working right now to implement that.Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 7 Jul 2011 14:46:35 +0200 Subject: Re: [Neo4j] Indexed relationships Hi Michael, I haven't yet worked on an example. There are tests for the SortedTree implementation, but didn't add those to the IndexedRelationship class, which is simply a wrapper around SortedTree. Having a test would have caught the error that no relationship to the treeNode was created (fixed that bug and pushed it to Git) (note to self: always create a unit test, especially when code seems trivial). There is no relationship expander that uses this. The RelationshipExpander has a method IterableRelationship expand(Node node) which cannot be supported, since there is no direct relationship from startnode to endnode. Instead there is a path through the index tree. 
It's not possible to support the original relationship-traversal API since the IndexedRelationship class is not a wrapper around a node, but a wrapper around the relationships of a certain RelationshipType in the OUTGOING direction.

As to the name of the class: it is essentially an indexed relationship, and not just a solution to the densely-connected-node problem. An indexed relationship can also be used to maintain a sorted set of relationships of any size, and can be used to guarantee uniqueness constraints.

Niels

From: michael.hun...@neotechnology.com
Date: Thu, 7 Jul 2011 13:27:00 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Indexed relationships

Good work, do you have an example ready (and/or some tests that show how it works/is used)? In creation, manual traversal and automatic traversal (i.e. is there a RelationshipExpander that uses it). And in the
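
For readers following the IdComparator example in this thread, here is a minimal, self-contained sketch of what comparing on Long.reverse(id) does to the sort order. It uses plain long ids and a java.util.TreeSet instead of Neo4j nodes. The class name ReversedIdOrder and the helper sorted() are made up for illustration and are not part of graph-collections; how IndexedRelationship stores its entries internally is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of graph-collections.
final class ReversedIdOrder {

    // Same ordering rule as the IdComparator in the thread, applied to raw ids:
    // compare the bit-reversed values using ordinary signed long comparison.
    static final Comparator<Long> BY_REVERSED_BITS =
            (a, b) -> Long.compare(Long.reverse(a), Long.reverse(b));

    // Puts the ids into a sorted set and returns them in comparator order.
    // Ids that compare equal to one already present are silently dropped.
    static List<Long> sorted(long... ids) {
        TreeSet<Long> set = new TreeSet<>(BY_REVERSED_BITS);
        for (long id : ids) {
            set.add(id);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // The id 2 is passed twice on purpose.
        System.out.println(sorted(1, 2, 3, 4, 2));
    }
}
```

The ids come back in bit-reversed order rather than numeric order, and the second 2 does not appear twice. A sorted set that rejects entries comparing equal to an existing one is one general way to get both sorted iteration and a uniqueness guarantee out of the same structure.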
Re: [Neo4j] Performance issue on nodes with lots of relationships
I did a write-up on indexed relationships in the Git repo: https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships

A performance comparison would indeed be great. Anecdotally, I have witnessed the difference when trying to load all entries of DBpedia. With 2.5 GB of heap space, loading becomes problematic after some 70,000 relationships have been added to the supernode. With the indexed relationship no such problems arise, and 1.6 million relationships are easily created without performance degradation. Having real performance figures would be nice though.

Niels

From: michael.hun...@neotechnology.com
Date: Thu, 7 Jul 2011 22:56:17 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships

Niels, could you perhaps write up a blog post detailing the usage (also for your own scenario) and how that solution would compare to the naive supernodes with just millions of relationships? Also, I'd like to see a performance comparison of both approaches.

Thanks so much for your work

Michael

On 07.07.2011 at 22:24, Niels Hoogeveen wrote:

I am glad to see a solution will be provided at the core level. Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, see: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship

This provides a solution to the issue, but is certainly not as fast as a solution in core would be. However, it does solve my issues and, as a bonus, indexed relationships can be traversed in sorted order. This is especially pleasant, since I usually want to know only the recent additions of dense relationships.

Niels

Date: Thu, 7 Jul 2011 21:37:26 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships

2011/7/7 Agelos Pikoulas agelos.pikou...@gmail.com

I think it's the same problem pattern that has been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html). Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/). Ideally this would be abstracted and implemented in the core distribution so that all APIs (including Cypher and TinkerPop Pipes/Gremlin) can use it efficiently...

Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type fast, regardless of the total number of relationships. In the foreseeable future hopefully.

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White li...@andrewewhite.net wrote:

I use the shell as-is, but the messages.log is reporting...

Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all. In this particular case, I can't imagine taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it is building a set of Rs before applying `count` rather than making count accept an iterable stream.

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:

Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White li...@andrewewhite.net wrote:

Here are some interesting stats to consider. First, I split my nodes into two groups, one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 25724 ms

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 565448 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 337975 ms

Any ideas on this?

Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:

Andrew, if you upgrade to 1.4.M06, your shell should be able to do Cypher in
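
The scaling figures discussed in this thread can be checked directly from the shell timings Andrew posted. Below is a small sketch that computes the size ratio, the time ratios, and the throughput from those numbers. The class and method names are made up for illustration; only the counts and millisecond values come from the thread.

```java
// Hypothetical helper for re-deriving the ratios quoted in the thread.
final class TraversalScaling {

    // Counts and timings copied from the neo4j-sh output in this thread.
    static final long SMALL_COUNT = 1_468_486L;   // start n=(1)
    static final long LARGE_COUNT = 3_472_174L;   // start n=(2)
    static final long SMALL_COLD_MS = 25_724L;    // first run, n=(1)
    static final long SMALL_WARM_MS = 19_763L;    // second run, n=(1)
    static final long LARGE_COLD_MS = 565_448L;   // first run, n=(2)
    static final long LARGE_WARM_MS = 337_975L;   // second run, n=(2)

    // Plain floating-point ratio a / b.
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    // Relationships counted per second for a run of the given duration.
    static double perSecond(long count, long millis) {
        return count / (millis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("size ratio (large/small):      %.2f%n", ratio(LARGE_COUNT, SMALL_COUNT));
        System.out.printf("time ratio, first runs:        %.2f%n", ratio(LARGE_COLD_MS, SMALL_COLD_MS));
        System.out.printf("time ratio, second runs:       %.2f%n", ratio(LARGE_WARM_MS, SMALL_WARM_MS));
        System.out.printf("rels/sec, large, first run:    %.0f%n", perSecond(LARGE_COUNT, LARGE_COLD_MS));
        System.out.printf("rels/sec, small, second run:   %.0f%n", perSecond(SMALL_COUNT, SMALL_WARM_MS));
    }
}
```

With these inputs the size ratio comes out near 2.4, the second-run time ratio near 17 and the first-run ratio near 22, and the first run on the larger node works out to roughly 6,100 relationships per second, which lines up with the "17x", "2.4x" and "6k per sec" figures in Andrew's message.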
Re: [Neo4j] Add relationships dynamically
That's such a fast reply, I'm sorry I was going to delete my previous post. I didn't read that well. I get it now. Thanks :)

--
View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Add-relationships-dynamically-tp3149437p3150037.html
Sent from the Neo4J User List mailing list archive at Nabble.com.
Re: [Neo4j] Indexed relationships
I don't know if this is the right place to ask this; but does it support a batch insert mode? When I am bulk loading data I don't have Node objects to pass around, only node ids.

Thanks,
Andrew

On 07/07/2011 06:19 PM, Niels Hoogeveen wrote:

I created a wiki page for indexed relationships in the Git repo, see: https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships
Re: [Neo4j] Performance issue on nodes with lots of relationships
Niels, that sounds fantastic, great work everyone so far!

Cheers,

/peter neubauer

On Fri, Jul 8, 2011 at 1:27 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I did a write up on indexed relationships in the Git repo: https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships

A performance comparison would indeed be great. Anecdotally, I have witnessed the difference when trying to load all entries of Dbpedia. With 2.5 G heap space, loading becomes problematic after some 70,000 relationships have been added to the supernode. With the indexed relationship no such problems arise and 1.6 million relationships are easily created without performance degradation. Having real performance figures would be nice though.

Niels