Re: [Neo4j] Issues with IndexedRelationship
Excellent... I did a code review and think this is a huge improvement over what we had.

Peter, can you pull these changes? I no longer have the privs to do so.

Niels

Date: Thu, 8 Sep 2011 17:24:44 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Issues with IndexedRelationship

I have made the changes to SortedTree regarding relationships vs nodes, and have got all the tests passing. The changes are pushed up to my github account (and a pull request has been raised).

The changes can be seen here: https://github.com/brycenz/graph-collections
Re: [Neo4j] Issues with IndexedRelationship
I like this idea

Date: Thu, 8 Sep 2011 15:41:52 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Issues with IndexedRelationship

Another thought: if there is going to be a larger refactor of the code, should the indexing mechanism be broken out as a strategy for the IndexedRelationship? At present it is tied to SortedTree, but if an interface was extracted that had addNode, removeNode, iterator, and isUniqueIndex, then other indexing implementations could be used in certain cases.

The particular other implementation I am currently thinking of that could be of use to me would be a paged linked list. That would be a linked list of pages, each holding between min and max KEY_VALUE (or equivalent) relationships. I think that could work quite well for the situation where the index is ordered by descending date, is generally just appended to at the most recent end, and results are generally retrieved in a paged manner from near the most recent end. But more to the point, there could be any number of implementations that would suit different situations.

That does bring up a question though: there was some discussion a while ago about functionality along the lines of IndexedRelationship being pulled into the core, so is this overkill for now if there is going to be another core offering later?

On Thu, Sep 8, 2011 at 2:38 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I think we don't have to worry about backwards compatibility much yet. There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers.

Indeed addNode should return the KEY_VALUE relationship, and I think we should change the signature of SortedTree to turn it into Iterable<Relationship>. No need to maintain a Node iterator; the node is always one getEndNode away.

Niels
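The strategy idea quoted in this message (an index interface with addNode, removeNode, iterator, and isUniqueIndex, plus a paged linked list as one implementation) could look roughly like the sketch below. This is not code from graph-collections: the names RelationshipIndex and PagedLinkedListIndex are hypothetical, the pages live in memory rather than in the graph, and only a maximum page size is modelled (the minimum mentioned in the thread is left out).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical strategy interface with the four operations named in the thread. */
interface RelationshipIndex<T> extends Iterable<T> {
    boolean addNode(T node);
    boolean removeNode(T node);
    boolean isUniqueIndex();
}

/** Hypothetical in-memory paged linked list: newest page first, at most pageSize entries per page. */
final class PagedLinkedListIndex<T> implements RelationshipIndex<T> {
    private final int pageSize;
    private final LinkedList<List<T>> pages = new LinkedList<>();

    PagedLinkedListIndex(int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        this.pageSize = pageSize;
    }

    @Override
    public boolean addNode(T node) {
        // Open a new head page when there is none yet or the current head page is full.
        if (pages.isEmpty() || pages.getFirst().size() >= pageSize) {
            pages.addFirst(new ArrayList<>(pageSize));
        }
        pages.getFirst().add(node);
        return true;
    }

    @Override
    public boolean removeNode(T node) {
        for (Iterator<List<T>> it = pages.iterator(); it.hasNext(); ) {
            List<T> page = it.next();
            if (page.remove(node)) {
                // Drop pages that became empty so iteration never visits them.
                if (page.isEmpty()) {
                    it.remove();
                }
                return true;
            }
        }
        return false;
    }

    @Override
    public boolean isUniqueIndex() {
        // This sketch does not reject duplicates.
        return false;
    }

    /** Iterates over a snapshot, most recently added entry first. */
    @Override
    public Iterator<T> iterator() {
        List<T> ordered = new ArrayList<>();
        for (List<T> page : pages) {
            for (int i = page.size() - 1; i >= 0; i--) {
                ordered.add(page.get(i));
            }
        }
        return ordered.iterator();
    }
}
```

Appending always touches only the head page, which matches the "append at the most recent end, read from near the most recent end" access pattern described in the proposal. A graph-backed version would presumably store each page as a node and link pages with relationships.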
Re: [Neo4j] Issues with IndexedRelationship
Niels, Bryce, great! Gave you access to the repo, please merge :)

/peter

On Thu, Sep 8, 2011 at 2:32 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Excellent... I did a code review and think this is a huge improvement over what we had.

Peter, can you pull these changes? I no longer have the privs to do so.

Niels
[Neo4j] Issues with IndexedRelationship
Hi,

As I mentioned a while ago, I am looking at using IndexedRelationships within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through to the root end. So I have been working on getting that support in there.

However, in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I was right into the extra support for expanding the relationships, it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together, hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships.

Quick rundown of the issues. Note: N -- IR(X) -- {A,B} below means there is an indexed relationship from N to A and B, of type X.

1) Exception thrown when more than one IR terminates at a given node, e.g.:

    N1 -- IR(X) -- {A,B,C,D}
    N2 -- IR(X) -- {A,X,Y,Z}

Will throw an exception when using the IndexedRelationshipExpander on either N1 or N2.

2) Start / End nodes are transposed when the IR has a direction of incoming, i.e. the IR is created against N but across a set of incoming relationships:

    N -- IR(Y) -- {A,B,C}

Will return 3 relationships N -- A, N -- B, N -- C.

I have written tests for each of these, as well as a couple of other tests.

I am still completing (1) and have a little question about it. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree.

So the question is: adding this ID would mean that anyone who is already using this won't have the ID, and therefore, without care, their data will be incompatible with the updated code. This could be managed by checking for the ID when accessing the tree and, if it isn't there, walking over the tree to populate all the places where it is required.

In general, in developing against this code, where do we sit on data compatibility and API compatibility?

Cheers
Bryce
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Issues with IndexedRelationship
Great work Bryce,

I do have a question though. What is the rationale for the restriction mentioned under 1)? Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option.

A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of the two long values.

IndexedRelationships are a relatively new development, so I think you are one of the first to actually try them out. Personally I have chosen to work directly with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods.

I don't think API changes will be an issue at the moment.

Niels

Date: Thu, 8 Sep 2011 10:22:11 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Issues with IndexedRelationship

Hi,

As I mentioned a while ago, I am looking at using IndexedRelationships within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through to the root end. So I have been working on getting that support in there.

However, in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on. Since I was right into the extra support for expanding the relationships, it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together, hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships.

Quick rundown of the issues. Note: N -- IR(X) -- {A,B} below means there is an indexed relationship from N to A and B, of type X.

1) Exception thrown when more than one IR terminates at a given node, e.g.:

    N1 -- IR(X) -- {A,B,C,D}
    N2 -- IR(X) -- {A,X,Y,Z}

Will throw an exception when using the IndexedRelationshipExpander on either N1 or N2.

2) Start / End nodes are transposed when the IR has a direction of incoming, i.e. the IR is created against N but across a set of incoming relationships:

    N -- IR(Y) -- {A,B,C}

Will return 3 relationships N -- A, N -- B, N -- C.

I have written tests for each of these, as well as a couple of other tests.

I am still completing (1) and have a little question about it. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree.

So the question is: adding this ID would mean that anyone who is already using this won't have the ID, and therefore, without care, their data will be incompatible with the updated code. This could be managed by checking for the ID when accessing the tree and, if it isn't there, walking over the tree to populate all the places where it is required.

In general, in developing against this code, where do we sit on data compatibility and API compatibility?

Cheers
Bryce
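The suggestion in this reply, to generate a UUID, store it as two long values, and return a String representation from a getId method, can be illustrated with plain java.util.UUID. This is a minimal sketch: the class name UuidAsLongs and its method names are hypothetical, and nothing here touches the Neo4j API.

```java
import java.util.UUID;

/** Hypothetical helper showing the UUID <-> two longs round trip. */
final class UuidAsLongs {
    private UuidAsLongs() {
    }

    /** Splits a UUID into its most and least significant 64 bits, in that order. */
    static long[] toLongs(UUID id) {
        return new long[] { id.getMostSignificantBits(), id.getLeastSignificantBits() };
    }

    /** Rebuilds the UUID from the two stored long values. */
    static UUID fromLongs(long mostSigBits, long leastSigBits) {
        return new UUID(mostSigBits, leastSigBits);
    }

    /** String form of the ID, as a getId method might return it. */
    static String idString(long mostSigBits, long leastSigBits) {
        return fromLongs(mostSigBits, leastSigBits).toString();
    }
}
```

In the graph, the two values would presumably be written as two separate long properties on the tree's node or relationship. The property keys are left out here because the thread does not name them.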
Re: [Neo4j] Issues with IndexedRelationship
Hi Niels,

Sorry, I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship:

    this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING );

So if the end node has more than one incoming KEY_VALUE relationship, a "more than one relationship" exception is thrown. Instead of the getSingleRelationship, I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type).

Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that, might need ...

Is storing the UUID as two longs much quicker than storing it as a string? Curious about this since in my current model all the domain objects have UUIDs, and these are all stored as strings. If it was going to help with either memory or performance then I would be keen to migrate this to two longs.

Cheers
Bryce

On Thu, Sep 8, 2011 at 11:07 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Great work Bryce,

I do have a question though. What is the rationale for the restriction mentioned under 1)? Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option.

A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of the two long values.

IndexedRelationships are a relatively new development, so I think you are one of the first to actually try them out. Personally I have chosen to work directly with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods.

I don't think API changes will be an issue at the moment.

Niels
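The plan described in this message, replacing getSingleRelationship with a loop over the incoming KEY_VALUE relationships that matches on the tree's unique ID, can be sketched without the Neo4j API as below. KeyValueRel and KeyValueLookup are hypothetical stand-ins, and the leaf is modelled as a plain String. Against a real graph, the loop would presumably run over the leaf node's incoming KEY_VALUE relationships and read the ID from relationship properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical stand-in for a KEY_VALUE relationship tagged with the ID of its index tree. */
final class KeyValueRel {
    final UUID treeId;
    final String leafNode;

    KeyValueRel(UUID treeId, String leafNode) {
        this.treeId = treeId;
        this.leafNode = leafNode;
    }
}

/** Hypothetical lookup that tolerates several incoming KEY_VALUE relationships on one leaf. */
final class KeyValueLookup {
    private KeyValueLookup() {
    }

    /** Returns every incoming relationship that belongs to the tree with the given ID. */
    static List<KeyValueRel> forTree(Iterable<KeyValueRel> incoming, UUID treeId) {
        List<KeyValueRel> matches = new ArrayList<>();
        for (KeyValueRel rel : incoming) {
            if (rel.treeId.equals(treeId)) {
                matches.add(rel);
            }
        }
        return matches;
    }
}
```

Returning a list rather than a single relationship leaves room for the open case raised in the note above, where the same IR has more than one relationship to the same leaf; how that case should be resolved is not settled in the thread.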
Re: [Neo4j] Issues with IndexedRelationship
Two longs are certainly cheaper than a string. Two longs take 128 bits and are stored in the main record of the PropertyContainer, while a String would require a 64-bit pointer in the main record of the PropertyContainer, plus an additional read in the String store, where the string representation will take up 256 bits. So both memory-wise and performance-wise, it is better to store a UUID as two long values.

The main issue is something that needs a deeper fix than adding IDs. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free.

Niels

Date: Thu, 8 Sep 2011 11:36:11 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Issues with IndexedRelationship

Hi Niels,

Sorry, I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship:

    this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING );

So if the end node has more than one incoming KEY_VALUE relationship, a "more than one relationship" exception is thrown. Instead of the getSingleRelationship, I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type).

Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that, might need ...

Is storing the UUID as two longs much quicker than storing it as a string? Curious about this since in my current model all the domain objects have UUIDs, and these are all stored as strings. If it was going to help with either memory or performance then I would be keen to migrate this to two longs.

Cheers
Bryce
Re: [Neo4j] Issues with IndexedRelationship
I will have to experiment with changing my IDs to be stored as longs; it does make perfect sense that it would be better. Thanks for the hint.

In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. It could end up being a significant change to SortedTree though, e.g.:

    sortedTree.addNode( node );

would need to return the KEY_VALUE relationship instead of a boolean. Not knowing where else SortedTree is used, could that be a large change?

Maybe SortedTree would have two iterators available: a KEY_VALUE relationship iterator and a node iterator. Having had a quick look at it now, it seems that it could work OK that way without introducing much code duplication.

On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Two longs are certainly cheaper than a string. Two longs take 128 bits and are stored in the main record of the PropertyContainer, while a String would require a 64-bit pointer in the main record of the PropertyContainer, plus an additional read in the String store, where the string representation will take up 256 bits. So both memory-wise and performance-wise, it is better to store a UUID as two long values.

The main issue is something that needs a deeper fix than adding IDs. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free.

Niels
Re: [Neo4j] Issues with IndexedRelationship
I think we don't have to worry about backwards compatibility much yet. There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers. Indeed addNode should return the KEY_VALUE relationship and I think we should change the signature of SortedTree to turn it into IterableRelationship. No need to maintain a Node iterator, the node is always one getEndNode away. Niels
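The suggestion made in this thread, to store a UUID as two long properties and expose it as a String through a getId method, can be sketched in plain Java. This is only an illustration under stated assumptions: the Map stands in for the properties of a Neo4j PropertyContainer, and the class name and the property names tree_id_most and tree_id_least are hypothetical, not taken from graph-collections.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper; a Map stands in for a PropertyContainer's properties.
class UuidAsLongs {
    // Assumed property names; the thread does not specify them.
    static final String MOST = "tree_id_most";
    static final String LEAST = "tree_id_least";

    // Split the 128-bit UUID into two long properties.
    static void store(Map<String, Object> properties, UUID id) {
        properties.put(MOST, id.getMostSignificantBits());
        properties.put(LEAST, id.getLeastSignificantBits());
    }

    // Rebuild the UUID from the two stored longs.
    static UUID load(Map<String, Object> properties) {
        return new UUID((Long) properties.get(MOST), (Long) properties.get(LEAST));
    }

    // String form of the id, as a getId method might return it.
    static String getId(Map<String, Object> properties) {
        return load(properties).toString();
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        UUID id = UUID.randomUUID();
        store(props, id);
        // The round trip preserves the id.
        System.out.println(getId(props).equals(id.toString()));
    }
}
```

With a real PropertyContainer the two put calls would presumably become setProperty calls, and the lookups getProperty calls.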
Re: [Neo4j] Issues with IndexedRelationship
Another thought if there is going to be a larger refactor of the code is whether the indexing mechanism should be broken out as a strategy for the IndexedRelationship. At present it is tied to SortedTree, but if an interface was extracted out that had addNode, removeNode, iterator, and isUniqueIndex then other indexing implementations could be used in certain cases. The particular other implementation I am currently thinking of that could be of use to me would be a paged linked list. So that would have a linked list of pages, each with min x max KEY_VALUE (or equivalent) relationships. I think that could work quite well for the situation where the index is descending date ordered, and generally just appended at the most recent end, and results are retrieved in a paged manner generally from near the most recent. But more to the point there could be any number of implementations that would be good for given different situations. That does bring up a question though, there was some discussion a while ago about some functionality along the lines of IndexedRelationship being pulled into the core, so is that overkill for now if there is going to be another core offering later?
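The strategy idea raised in this message (an interface with addNode, removeNode, iterator and isUniqueIndex, with a paged linked list as one alternative to SortedTree) might look roughly like the following. It is a plain-Java sketch, not the graph-collections API: the interface name, both implementations and the in-memory storage are assumptions, and addNode keeps a boolean return even though the thread proposes returning the KEY_VALUE relationship.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical strategy interface; only the four method names come from the thread.
interface RelationshipIndex<N> extends Iterable<N> {
    boolean addNode(N node);
    boolean removeNode(N node);
    boolean isUniqueIndex();
}

// Stand-in for a sorted, unique index (the role SortedTree plays today).
class SortedSetIndex<N extends Comparable<N>> implements RelationshipIndex<N> {
    private final TreeSet<N> nodes = new TreeSet<>();
    public boolean addNode(N node) { return nodes.add(node); }
    public boolean removeNode(N node) { return nodes.remove(node); }
    public boolean isUniqueIndex() { return true; }
    public Iterator<N> iterator() { return nodes.iterator(); }
}

// Stand-in for the paged linked list: fixed-size pages, newest page first.
class PagedListIndex<N> implements RelationshipIndex<N> {
    private final int pageSize;
    private final Deque<List<N>> pages = new ArrayDeque<>();

    PagedListIndex(int pageSize) { this.pageSize = pageSize; }

    public boolean addNode(N node) {
        // Start a new page at the most recent end when the current one is full.
        if (pages.isEmpty() || pages.peekFirst().size() >= pageSize) {
            pages.addFirst(new ArrayList<>());
        }
        pages.peekFirst().add(node);
        return true;
    }

    public boolean removeNode(N node) {
        for (List<N> page : pages) {
            if (page.remove(node)) return true;
        }
        return false;
    }

    public boolean isUniqueIndex() { return false; }

    // Iterates from the most recently added entry to the oldest.
    public Iterator<N> iterator() {
        List<N> newestFirst = new ArrayList<>();
        for (List<N> page : pages) {
            for (int i = page.size() - 1; i >= 0; i--) newestFirst.add(page.get(i));
        }
        return newestFirst.iterator();
    }
}
```

An IndexedRelationship written against such an interface could then be handed whichever implementation suits the access pattern, for example the paged list for append-mostly, newest-first reads.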
Re: [Neo4j] Issues with IndexedRelationship
I have made the changes in regards to SortedTree in regards to relationships vs nodes, and have got all the tests passing. The changes are pushed up to my github account (and pull request has been raised). The changes can be seen here: https://github.com/brycenz/graph-collections
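The fix discussed earlier in this thread, replacing getSingleRelationship with a scan over the incoming KEY_VALUE relationships that matches on the index's unique id, can be illustrated without Neo4j. The sketch below is hypothetical throughout: KeyValueRel is a stand-in for a KEY_VALUE relationship, and it assumes each such relationship can be tied back to the id of the index it belongs to (here simply held in a field).

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for a KEY_VALUE relationship pointing at an indexed node.
class KeyValueRel {
    final UUID indexId;   // id of the index (tree) this relationship belongs to
    final String endNode; // label for the indexed node, for illustration only

    KeyValueRel(UUID indexId, String endNode) {
        this.indexId = indexId;
        this.endNode = endNode;
    }
}

class DirectRelationshipLookup {
    // Scan all incoming KEY_VALUE relationships instead of assuming exactly one.
    // Returns the first match, or null if the node is not in the given index.
    static KeyValueRel findForIndex(Iterable<KeyValueRel> incoming, UUID indexId) {
        for (KeyValueRel rel : incoming) {
            if (rel.indexId.equals(indexId)) {
                return rel;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UUID indexA = UUID.randomUUID();
        UUID indexB = UUID.randomUUID();
        // The same node is indexed by two different indexed relationships.
        List<KeyValueRel> incoming = Arrays.asList(
                new KeyValueRel(indexA, "leaf"),
                new KeyValueRel(indexB, "leaf"));
        System.out.println(findForIndex(incoming, indexB).indexId.equals(indexB));
    }
}
```

As noted in the thread, this still does not distinguish several relationships from the same index to the same leaf node; the scan simply returns the first match.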