Re: [Neo4j] Modeling subrelationships in Neo4j
I think my explanation was not as clear as it should have been. I wasn't suggesting replacing the relationships with a node, but shadowing the relationship types with nodes.

Let's say we have two relationship types, KNOWS and FRIEND, and we want to state that friends form a subset of the people a person knows. Additionally we have a relationship type SUBRELATIONSHIP, indicating that one relationship type is a subtype of another.

For the two relationship types KNOWS and FRIEND, create nodes and store the name of the relationship type in a property on each node. These two nodes must somehow be indexed. You can do that with Lucene, though in my own application I have chosen to create a namespace node attached to the reference node, and to create a relationship from that namespace node to each relationship type node. This allows for a quick lookup of the relationship type nodes. Additionally, a relationship of type SUBRELATIONSHIP should be created from the FRIEND node to the KNOWS node.

Now methods for the retrieval of relationships should be written, so that you don't fetch just the relationships with a given relationship type, but also traverse all of its subrelationship types and fetch all relationships on a node with those types.

Example:

    pete -- FRIEND -- jake
    pete -- FRIEND -- ellen
    pete -- KNOWS -- patty

Suppose we want to fetch all the people pete knows. We traverse the hierarchy of relationship types under KNOWS, and get an Iterable with the two relationship type nodes associated with KNOWS and FRIEND. Then we iterate over these relationship type nodes, fetching the relationships on the pete node with the corresponding relationship type, thereby returning an Iterable with the nodes associated with jake, ellen and patty.

For faster lookups, I have decided to use the id of the relationship type node as the name of the relationships used, but this is not a requirement for this solution.

Niels

Date: Wed, 7 Dec 2011 18:16:15 +0530
From: sourajit.ba...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Modeling subrelationships in Neo4j

To Niels' approach: wouldn't it be a very dense graph? For example, there will be several people interconnected by KNOWS; if we model KNOWS as a node, there would be lots of edges originating from it.

On Wed, Dec 7, 2011 at 5:16 PM, Alistair Jones alistair.jo...@neotechnology.com wrote:

Qualifying the relationships with an additional property (or properties) sounds like a sensible approach. The simplest thing to do would be to have a boolean property to distinguish the two types, so they would both have relationship type KNOWS, and also a boolean property "well". You could use this in a Cypher query like this:

    start Alistair = node(1)
    match Alistair -[r:KNOWS]- friend
    where r.well = true
    return friend.name

Alternatively, as Rick suggests, if you wanted a sliding scale of knowing, you could have a numerical property, and then do more sophisticated traversals. This is analogous to a weighted graph that you might use for route planning, where each of the relationships is weighted with a property such as distance or time. In Cypher:

    start Alistair = node(1)
    match Alistair -[r:KNOWS]- friend
    where r.how_well > 50
    return friend.name

This property-based approach is less sophisticated than Niels' true relationship-type-hierarchy approach, but I guess it depends on your domain which will be most appropriate. I think using properties is probably simpler to implement if it meets your needs.

-Alistair

On 6 December 2011 14:14, Rick Otten rot...@manta.com wrote:

Can you do this with properties on the relationship? In your example a KNOWS relationship could have a "how well" property, with values 1 to 100. You could define KNOWS_BETTER as [ 50 < how well < 80 ] and KNOWS_BEST as [ 80 <= how well <= 100 ].

I'm not sure what the difference between a subrelationship and a relationship qualified with properties really is.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Sourajit Basak
Sent: Tuesday, December 06, 2011 6:09 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Modeling subrelationships in Neo4j

Is it possible to create subrelationships in neo4j? For example, a relationship called KNOWS_BETTER as a subrelationship of KNOWS. Do I need to explicitly connect the nodes using both relationships for the traversal to work?

Let's say I create this:

    neo4j -- KNOWS_BETTER -- graphDB

Does this entail the following?

    neo4j -- KNOWS -- graphDB

Such a scenario can be modeled in an OWL ontology; I am wondering if neo4j has any such capabilities.

Note: Under the hood, most OWL Ontology implementations do create these *extra* inferred links internally.

___
Neo4j mailing list
User@lists.neo4j.org
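The retrieval step Niels describes (collect KNOWS plus everything declared as a subtype of it, then fetch relationships of any of those types) can be sketched in plain Java. This is only an in-memory illustration, not the Neo4j API: the class `SubtypeSketch` and its methods `declareSubtype`, `relate`, `typeAndSubtypes` and `related` are hypothetical names, and type "nodes" are reduced to strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the graph (hypothetical; not the Neo4j API). */
class SubtypeSketch {
    /** Hypothetical edge: from --type-- to. */
    static final class Edge {
        final String from, type, to;
        Edge(String from, String type, String to) {
            this.from = from; this.type = type; this.to = to;
        }
    }

    // supertype -> direct subtypes (the SUBRELATIONSHIP links, followed downwards)
    private final Map<String, List<String>> subtypes = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    void declareSubtype(String sub, String sup) {
        subtypes.computeIfAbsent(sup, k -> new ArrayList<>()).add(sub);
    }

    void relate(String from, String type, String to) {
        edges.add(new Edge(from, type, to));
    }

    /** The type itself plus every type below it in the hierarchy. */
    Set<String> typeAndSubtypes(String type) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(type);
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (seen.add(t)) {
                for (String s : subtypes.getOrDefault(t, Collections.<String>emptyList())) {
                    todo.push(s);
                }
            }
        }
        return seen;
    }

    /** Targets reachable from 'from' via the given type or any of its subtypes. */
    List<String> related(String from, String type) {
        Set<String> types = typeAndSubtypes(type);
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(from) && types.contains(e.type)) result.add(e.to);
        }
        return result;
    }

    public static void main(String[] args) {
        SubtypeSketch g = new SubtypeSketch();
        g.declareSubtype("FRIEND", "KNOWS");
        g.relate("pete", "FRIEND", "jake");
        g.relate("pete", "FRIEND", "ellen");
        g.relate("pete", "KNOWS", "patty");
        System.out.println(g.related("pete", "KNOWS"));   // prints [jake, ellen, patty]
        System.out.println(g.related("pete", "FRIEND"));  // prints [jake, ellen]
    }
}
```

Note that the hierarchy holds one entry per relationship type, not one per relationship, so it stays small no matter how many people are connected by KNOWS.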
Re: [Neo4j] Modeling subrelationships in Neo4j
It cannot be done directly through the standard API, but of course it can be implemented. I do this myself in an application I am building.

For every RelationshipType, I create a Node, and between those Nodes there can be subtyping relationships. To make lookup fast, I use the node id of the RelationshipType node as the RelationshipType name, and give it a more meaningful name by means of a property on that node. This way the Node belonging to a RelationshipType can be fetched without overhead, and it allows me to change the name of the relationship type. The downside of this approach is that relationships have no meaningful name when displayed in Neoclipse.

Of course you need to write your own methods to fetch relationships from nodes, because you may want to fetch not only the ones with the RelationshipType you supply, but also those with a RelationshipType that is a subtype thereof.

Niels

Date: Tue, 6 Dec 2011 16:39:19 +0530
From: sourajit.ba...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Modeling subrelationships in Neo4j

Is it possible to create subrelationships in neo4j? For example, a relationship called KNOWS_BETTER as a subrelationship of KNOWS. Do I need to explicitly connect the nodes using both relationships for the traversal to work?

Let's say I create this:

    neo4j -- KNOWS_BETTER -- graphDB

Does this entail the following?

    neo4j -- KNOWS -- graphDB

Such a scenario can be modeled in an OWL ontology; I am wondering if neo4j has any such capabilities.

Note: Under the hood, most OWL Ontology implementations do create these *extra* inferred links internally.

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
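The naming trick in this message (the relationship type's name is the id of its type node, and the readable name lives in a property) might look roughly like the following. It is a plain-Java illustration with a map standing in for the node store; `TypeRegistrySketch`, `register`, `displayName` and `rename` are hypothetical names, not Neo4j calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Map-backed stand-in for the type nodes (hypothetical; not the Neo4j API). */
class TypeRegistrySketch {
    // type node id -> value of the "name" property on that type node
    private final Map<Long, String> displayNames = new HashMap<>();
    private long nextId = 1;

    /** Creates a type node and returns the string to use as the relationship type name. */
    String register(String displayName) {
        long id = nextId++;
        displayNames.put(id, displayName);
        return Long.toString(id);
    }

    /** The type name parses straight back to the node id, so no index lookup is needed. */
    String displayName(String typeName) {
        return displayNames.get(Long.parseLong(typeName));
    }

    /** Renaming touches only the property; existing relationships keep their type name. */
    void rename(String typeName, String newDisplayName) {
        displayNames.put(Long.parseLong(typeName), newDisplayName);
    }

    public static void main(String[] args) {
        TypeRegistrySketch types = new TypeRegistrySketch();
        String knows = types.register("KNOWS");
        System.out.println(knows + " -> " + types.displayName(knows));  // prints 1 -> KNOWS
        types.rename(knows, "IS_ACQUAINTED_WITH");
        System.out.println(knows + " -> " + types.displayName(knows));  // prints 1 -> IS_ACQUAINTED_WITH
    }
}
```

The trade-off Niels mentions is visible here: anything that shows only the raw type name would display "1" rather than "KNOWS".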
Re: [Neo4j] Moving to u...@neo4j.org
Good decision. Immediately signed up.

From: peter.neuba...@neotechnology.com
Date: Wed, 30 Nov 2011 13:55:44 +0100
To: user@lists.neo4j.org
Subject: [Neo4j] Moving to u...@neo4j.org

Hi all,

we are going to move from mailman to Google Groups (http://groups.google.com/a/neo4j.org/groups/dir) soon. I just wanted to give you a heads-up that I will invite all of the current members to http://groups.google.com/a/neo4j.org/group/user/topics?lnk when we are ready.

Just wanted to warn you that there might be a surprising welcome message from that group soonish, hope you don't mind! Let me know if you have any objections.

Happy hacking!

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

brew install neo4j && neo4j start
heroku addons:add neo4j
[Neo4j] collation and wild card queries
In order to get the proper sort order for Strings with diacritical characters, I started using Lucene's ICUCollationKeyAnalyzer. This indeed gives the proper sort order for queries, but for some reason wildcard queries no longer seem to work. This applies to both the normal CollationKeyAnalyzer and the ICU variant. Exact queries work, but as soon as a wildcard is added the query no longer returns any results.

Does anyone have an idea how to solve this? I'd like to have an index that supports both a diacritics-aware sort order and wildcards.

Niels
Re: [Neo4j] Neo4j upcoming features importance poll
I noticed work on supernodes being committed to GitHub. Looking forward to seeing this in 1.6-SNAPSHOT. I would like to test this sooner rather than later. The node#getDegree methods are a great addition.

Niels

From: peter.neuba...@neotechnology.com
Date: Tue, 22 Nov 2011 15:51:15 +0100
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Neo4j upcoming features importance poll

Uservoice seems great. If Rapportive uses it (http://feedback.rapportive.com/forums/42557-general), then it is good in my book. I think we should try it if we can integrate it with GitHub issues.

Pablo, impressive feedback on http://www.doodle.com/wg8k77vwq6b654bv ! I think Mattias will be delighted that the supernode support is on top ;)

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - NOSQL for the Enterprise.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.

On Sat, Nov 19, 2011 at 9:29 PM, Peter Bell li...@pbell.com wrote:

Uservoice might be a good fit. I used it for feature voting on an OSS project and it worked out pretty well...

Sent from my iPhone

On Nov 19, 2011, at 2:11 PM, Nigel Small ni...@nigelsmall.name wrote:

Actually sounds like we may have finally found a use for Google Wave! :-P

On 19 Nov 2011 13:09, Pablo Pareja ppar...@era7.com wrote:

Yeah, it'd be great to have something more wiki-like that everyone could edit. I have no idea how this could be done, though. Any ideas?

Pablo

On Sat, Nov 19, 2011 at 7:54 PM, Nigel Small ni...@nigelsmall.name wrote:

How about something like Wufoo? http://www.wufoo.com/

*Nigel Small*
Phone: +44 7814 638 246
Blog: http://nigelsmall.name/
GTalk: ni...@nigelsmall.name
MSN: nasm...@live.co.uk
Skype: technige
Twitter: @technige https://twitter.com/#!/technige
LinkedIn: http://uk.linkedin.com/in/nigelsmall

On 19 November 2011 18:45, Peter Neubauer peter.neuba...@neotechnology.com wrote:

I really like this. Is there any other transparent public method you could use for the poll, like a Google form that everyone can edit?

On Nov 19, 2011 7:19 PM, Pablo Pareja ppar...@era7.com wrote:

I just added a link for every possible upcoming feature and created an issue for those which didn't have one so far. Sorry to those who voted already, but since the options changed, their votes were lost. Could you please vote again?

From now on, every time we add a new feature to the poll we should create its respective issue before adding it. At least, whenever a new option is added, the votes for the rest of the options are preserved, so we should be able to update our votes just by adding our vote (or not) to the new ones.

Sorry for the inconvenience!

Pablo

On Sat, Nov 19, 2011 at 5:21 PM, Pablo Pareja ppar...@era7.com wrote:

Ok, I just did that for the first one. The bad thing about this is that every time I edit one of the options, all the votes cast for it get lost and you have to edit your vote again... So maybe from now on I'd better add new features to the poll only once their respective issues have been raised on GitHub. What do you think?

Pablo

On Sat, Nov 19, 2011 at 5:16 PM, Pablo Pareja ppar...@era7.com wrote:

Yeah, that'd be cool. If you give me the links I can put them in as part of the options themselves (with bit.ly or something like that).

Cheers,
Pablo

On Sat, Nov 19, 2011 at 1:30 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:

Guys,

This is great! Could you raise issues for these, so that we mitigate the missing voting on GitHub with this, linking back to GitHub for discussion?

On Nov 19, 2011 1:09 PM, Pablo Pareja ppar...@era7.com wrote:

@Linan: get_or_create feature added ;)

@Mattias: I mean being required to specify a node type at creation time (as things are right now with relationships).

On Sat, Nov 19, 2011 at 1:01 PM, Mattias Persson matt...@neotechnology.com wrote:

What exactly does mandatory node types mean?

2011/11/19 Pablo Pareja ppar...@era7.com

Hi all,

I was thinking it'd be cool to create a sort of poll in order to know which features (that are missing right now...) are the most important ones for the community. I just did a quick Google search for free online poll creation platforms and found the Doodle site (btw, do you know a better site to do this?). The address for the poll is: http://www.doodle.com/wg8k77vwq6b654bv

So far I have just added three features that came to my mind while I was creating it, so please say which features you're missing and I'll add them so that we can all vote for them or not.

What do you think about all this?

Cheers,
Re: [Neo4j] Lucene sort with diacritic characters
anyone?

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Thu, 10 Nov 2011 20:20:46 +0100
Subject: [Neo4j] Lucene sort with diacritic characters

When retrieving items from a Lucene index using the sort method, it seems the order doesn't follow the proper rules for sorting diacritic characters. For example, Århus comes later in the list than Zürich, and Ḩalab comes later than Žužemberk.

Can someone help me solve this?

Niels
Re: [Neo4j] Lucene sort with diacritic characters
Thanks Rick, I will try out your suggestions.

Niels

From: rick.bullo...@thingworx.com
To: user@lists.neo4j.org
Date: Fri, 11 Nov 2011 07:33:44 -0700
Subject: Re: [Neo4j] Lucene sort with diacritic characters

You probably need to create a custom analyzer using one of Lucene's collation filters (which you will provide as a parameter to the Neo4j index creation method). Unfortunately, you can't apply a new analyzer after the fact. I think you'll need to delete and regenerate the index.

Lucene has some built-in language-specific collation filters, but there is also a contributed package, ICUCollationKeyFilter, which may have some advantages in terms of performance.

Unfortunately, I do not have direct experience in using either, but hopefully this will help get you pointed in the right direction.

Rick

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Friday, November 11, 2011 9:27 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Lucene sort with diacritic characters

anyone? [...]
Re: [Neo4j] Lucene sort with diacritic characters
It works like a dream.

One note for others needing this functionality. The ICUCollationKeyAnalyzer has a constructor which takes a Collator (from icu4j) as argument. Neo4j's index requires a constructor without arguments, so it's necessary to wrap the ICUCollationKeyAnalyzer and provide it with the appropriate Collator in the constructor. For me Collator.SECONDARY was the best choice.

Niels
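To see what a SECONDARY-strength collator changes for the city names in this thread, here is a small sketch that runs on the JDK alone. It deliberately swaps in `java.text.Collator` for the icu4j Collator and leaves out Lucene and the analyzer wrapper, so it only illustrates the ordering, not the index setup. The class name `CollationSketch`, the extra name "Aachen", and the choice of `Locale.ENGLISH` are assumptions made for the illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** JDK-only illustration; the thread itself uses icu4j's Collator inside a Lucene analyzer. */
class CollationSketch {
    /** Plain String ordering: compares UTF-16 code units, so 'Å' (U+00C5) lands after 'Z'. */
    static List<String> naturalOrder(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    /** Collated ordering: SECONDARY strength distinguishes accents but ignores case. */
    static List<String> collatedOrder(List<String> names) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // locale is an assumption
        collator.setStrength(Collator.SECONDARY);
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only:
        // "Z\u00FCrich" is Zürich, "\u00C5rhus" is Århus.
        List<String> names = Arrays.asList("Z\u00FCrich", "\u00C5rhus", "Aachen");
        System.out.println(naturalOrder(names));   // Aachen, Zürich, Århus
        System.out.println(collatedOrder(names));  // Aachen, Århus, Zürich
    }
}
```

Which position Å should take is locale-dependent (some Scandinavian alphabets place it after Z), so the locale handed to the collator matters as much as the strength.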
Re: [Neo4j] Function to check whether two nodes are connected?
There is one caveat to this method: you'd have to know which node is most densely connected. Suppose one of the nodes has 100,000 relationships (incoming and outgoing) and the other node has only a few; then you'd want to iterate over the relationships of the second node. A solution could be to iterate over both sets of relationships at the same time:

    public boolean areConnected(Node n1, Node n2, RelationshipType relType, Direction dir) {
        Iterator<Relationship> rels1 = n1.getRelationships(relType, dir).iterator();
        Iterator<Relationship> rels2 = n2.getRelationships(relType, dir).iterator();
        // Advance both iterators in lockstep; the loop ends as soon as the sparser node runs out.
        while (rels1.hasNext() && rels2.hasNext()) {
            Relationship rel1 = rels1.next();
            Relationship rel2 = rels2.next();
            if (rel1.getEndNode().equals(n2)) return true;
            else if (rel2.getEndNode().equals(n1)) return true;
        }
        return false;
    }

Date: Thu, 27 Oct 2011 18:39:01 +0200
From: bplsi...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Function to check whether two nodes are connected?

Easy: just one.

For now, I've written this, but I'm still not sure it is the simplest way to write it:

    public boolean areConnected(Node n1, Node n2, Relationship rel, Direction dir) throws Exception {
        Iterable<Relationship> relationships = n1.getRelationships(dir);
        for (Relationship r : relationships) {
            // I am only working with Dynamic Relationships
            if (r.getType().equals(rel.getType())) {
                if (dir == Direction.OUTGOING) {
                    if (r.getEndNode().equals(n2)) {
                        return true;
                    }
                } else {
                    if (r.getStartNode().equals(n2)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

Bruno

On 27/10/2011 18:31, Peter Neubauer wrote:

Bruno,

There is no such function at the low level, but you can use a shortest-path algorithm to check this. What is the maximum length for a path between the nodes?

On Oct 27, 2011 6:14 PM, Bruno Paiva Lima da Silva bplsi...@gmail.com wrote:

Hello there!

First of all, thanks for the help with all my previous questions; all the answers have been helping me to use Neo4j with success.

I have a very simple question, but I haven't found the answer yet... I'd like to have a function whose signature would be more or less like this:

    public boolean areTheyConnected(Node n1, Node n2, Relationship rel, Direction dir)

which returns true iff there is an edge of type *rel* between *n1* and *n2*, in the *dir* direction (the direction has n1 as reference).

Example: In my graph, I have: Bob knows Tom, Tom knows Peter, Jack knows Tom.

    areTheyConnected(nodeBob, nodeTom, relKnows, Direction.OUTGOING)   returns true  (Bob knows Tom)
    areTheyConnected(nodeTom, nodeJack, relKnows, Direction.INCOMING)  returns true  (Jack knows Tom)
    areTheyConnected(nodeBob, nodeTom, relKnows, Direction.INCOMING)   returns false (Tom doesn't know Bob)

Is there an easy method (constant time, or close) for that?

Thank you very much,
Bruno
Re: [Neo4j] Function to check whether two nodes are connected?
I see I made a bit of a mistake with this one. The gist of the solution remains, but I made a mistake dealing with the direction of the relationships: the second node has to be scanned in the opposite direction. It should be something like this:

    public boolean areConnected(Node n1, Node n2, RelationshipType relType, Direction dir) {
        // Seen from n2, the relationship points the other way.
        Direction dir2 = null;
        if (dir.equals(Direction.INCOMING)) dir2 = Direction.OUTGOING;
        else if (dir.equals(Direction.OUTGOING)) dir2 = Direction.INCOMING;
        else dir2 = Direction.BOTH;
        Iterator<Relationship> rels1 = n1.getRelationships(relType, dir).iterator();
        Iterator<Relationship> rels2 = n2.getRelationships(relType, dir2).iterator();
        // Advance both iterators in lockstep; the loop ends as soon as the sparser node runs out.
        while (rels1.hasNext() && rels2.hasNext()) {
            Relationship rel1 = rels1.next();
            Relationship rel2 = rels2.next();
            // getOtherNode rather than getEndNode, so the check also holds for INCOMING and BOTH.
            if (rel1.getOtherNode(n1).equals(n2)) return true;
            else if (rel2.getOtherNode(n2).equals(n1)) return true;
        }
        return false;
    }

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Thu, 27 Oct 2011 19:05:16 +0200
Subject: Re: [Neo4j] Function to check whether two nodes are connected?

There is one caveat to this method: you'd have to know which node is most densely connected. [...]
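The lockstep idea from the corrected message can be tried without a database. The sketch below uses tiny in-memory stand-ins (`MiniNode`, `MiniRel` and `Dir` are hypothetical types, not Neo4j's `Node`, `Relationship` and `Direction`) and replays Bruno's Bob/Tom/Jack example. One caveat, stated up front: the stand-in builds each filtered list eagerly, so it demonstrates only the control flow; the benefit Niels is after depends on the relationships being iterated lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** In-memory stand-ins (hypothetical); only the loop mirrors the code in the thread. */
class AreConnectedSketch {
    enum Dir {
        OUTGOING, INCOMING, BOTH;
        Dir reverse() {
            return this == OUTGOING ? INCOMING : this == INCOMING ? OUTGOING : BOTH;
        }
    }

    static final class MiniRel {
        final MiniNode start, end;
        final String type;
        MiniRel(MiniNode start, MiniNode end, String type) {
            this.start = start; this.end = end; this.type = type;
        }
        MiniNode other(MiniNode n) { return n == start ? end : start; }
    }

    static final class MiniNode {
        final String name;
        final List<MiniRel> rels = new ArrayList<>();
        MiniNode(String name) { this.name = name; }

        /** Relationships of one type, filtered by direction relative to this node. */
        List<MiniRel> rels(String type, Dir dir) {
            List<MiniRel> out = new ArrayList<>();
            for (MiniRel r : rels) {
                boolean dirOk = dir == Dir.BOTH
                        || (dir == Dir.OUTGOING && r.start == this)
                        || (dir == Dir.INCOMING && r.end == this);
                if (dirOk && r.type.equals(type)) out.add(r);
            }
            return out;
        }
    }

    static void connect(MiniNode from, String type, MiniNode to) {
        MiniRel r = new MiniRel(from, to, type);
        from.rels.add(r);
        to.rels.add(r);
    }

    /** Scans both nodes in lockstep, so the loop length is bounded by the sparser node. */
    static boolean areConnected(MiniNode n1, MiniNode n2, String type, Dir dir) {
        Iterator<MiniRel> it1 = n1.rels(type, dir).iterator();
        Iterator<MiniRel> it2 = n2.rels(type, dir.reverse()).iterator();
        while (it1.hasNext() && it2.hasNext()) {
            if (it1.next().other(n1) == n2) return true;
            if (it2.next().other(n2) == n1) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MiniNode bob = new MiniNode("Bob"), tom = new MiniNode("Tom");
        MiniNode peter = new MiniNode("Peter"), jack = new MiniNode("Jack");
        connect(bob, "KNOWS", tom);
        connect(tom, "KNOWS", peter);
        connect(jack, "KNOWS", tom);
        System.out.println(areConnected(bob, tom, "KNOWS", Dir.OUTGOING));  // true
        System.out.println(areConnected(tom, jack, "KNOWS", Dir.INCOMING)); // true
        System.out.println(areConnected(bob, tom, "KNOWS", Dir.INCOMING));  // false
    }
}
```

Stopping when either iterator is exhausted is safe because a connecting relationship appears in both nodes' lists: once the shorter list has been read in full without a match, no match exists.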
Re: [Neo4j] Function to check whether two nodes are connected?
You know me and my obsession for densely connected nodes :-)

Date: Thu, 27 Oct 2011 17:37:07 +
From: peter.neuba...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Function to check whether two nodes are connected?

Good catch Niels, thanks - my brain is in jet lag mode :-\

On Oct 27, 2011 7:26 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I see I made a bit of a mistake with this one. The gist of the solution remains, but I made a mistake dealing with the directions of relationship. [...]
Re: [Neo4j] Article: The Coming SQL Collapse
Hijack alert (going completely off topic) I noticed the following statement: all reasoning is best with a linked list data structure. When looking at the underlying store we see that the RelationshipRecord indeed forms two linked lists, one for the incoming side of the relationship and one for the outgoing side of the relationship, yet the API provides more or less the methods of a Set. The insert mechanism for Relationships guarantees that the two linked lists cannot contain duplicates (hence they form a set). The lists are always prepended, with an entry point to the head of the list stored in the NodeRecord. In the past I have (ad nauseum) proposed a partitioning of the Relationship linked lists per Direction per RelationshipType. I am not going to repeat my arguments, they can be found here: http://lists.neo4j.org/pipermail/user/2011-August/011191.html and in other posts to the mailing list around that time. Partitioning the two linked lists per Direction per RelationshipType, I now realize, also makes it possible to treat the two linked lists as implementations of the LinkedList interface in a meaningful way. For many practical purposes an ordering of Relationships makes little sense when the Relationships of a Node are not grouped by some critieria, but once we apply such a grouping, ordering starts to make sense. The simplest example I can think of is a timeline, where all relationships are either appended or prepended to the linked lists (depending on the preferred timeline arrow), so each iteration over the Relationships of a certain node for a given Direction and RelationshipType will be returned in the insert order (or inverse insert order) of the Relationships. Supporting all methods of a linked list would also allow for constructs like createRelationshipTo(node, SOME_REL, 2, 4), where 2 and 4 represent the positions in the linked lists (throwing IndexOutOfBoundsExceptions when appropriate). 
Since linked list data structures are foundational to the Neo4j engine it would make sense to make these structures more explicit in the API, so application programmers can take advantage of the inherent ordering of the underlying storage. Many applications eventually present information in some default sort order, so it would be nice if it were possible to insert relationships according to some sort criterion. Niels From: okramma...@gmail.com Date: Fri, 14 Oct 2011 11:28:16 -0600 To: user@lists.neo4j.org Subject: Re: [Neo4j] Article: The Coming SQL Collapse Hi, This is not conducive to Baysian-based reasoning, evidential reasoning, other forms of logics (classical and non-classical) How would you model those to a suitable domain model? Can you give a good example? Michael Here is an article that argues for support of other data semantics in the Web of Data (RDF world) beyond description logics. In here, you will find examples of other forms of reasoning. http://arxiv.org/abs/0905.3378 Unlike the triple/quad-store world, graph databases provide a very generic data model with limited constraints on meaning. Unfortunately (in my opinion), graph databases like OrientDB and DEX employ typing at the graph database level. Neo4j provides it at the Spring Data Graph level -- a level above. This is good in that Neo4j is not pushing a world view to low into the stack. The world of RDF, on the other hand, and its strong bent towards OWL (description logic) makes it such that the entire technology stack is mixed up with this logic. And, while this logic is very cool, its not the only way to do things -- the only way to view the world. however, at some point, there is always an assumption, and the foundational assumption of graph databases is all reasoning is best with a linked list data structure. See ya, Marko. 
http://markorodriguez.com ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Article: The Coming SQL Collapse
I concur. In my opinion Neo4j is more a storage engine with certain storage features than a database management system. This is already exemplified by the absence of a query language as primary interface. The author is therefore wrong in his assessment that there is no separation of logical model and physical model. There is no logical model, so the separation is complete: any logical model can be bolted onto the physical model, or can be stored in a separate repository.

In general, I think, NOSQL databases are more storage engine than database management system. It's exactly the control over storage that forms the niche NOSQL databases operate in. Distributed key-value lookup and tree/graph traversals are typical application domains where SQL engines don't provide the hooks to efficiently or scalably process certain questions or actions.

Niels

Date: Sat, 15 Oct 2011 01:12:58 +0200
From: a...@morgner.de
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Article: The Coming SQL Collapse

My 2 cents: The Neo4j API is clean, open, and sort-of low level by intention. It is neither ugly nor smelly, nor does it violate anything. Neo4j in general is very stable. But, of course, if you try the latest snapshot, it may have bugs (as any software has). Since May 2010, we're developing a CMS based on Neo4j (structr) and do some graph-related projects. Coming from the Oracle world, I can only say that working with Neo4j is a revelation.

Axel

On 14.10.2011 14:48, Tobias Ivarsson wrote:

We had an interesting discussion about this internally at Neo Technology today. We thought it might be of interest to the broader community. I don't think the discussion is over, so it would be interesting to continue it on the public mailing list. It regards the initial paragraphs of an article posted to dzone recently: http://www.dzone.com/links/rss/the_coming_sql_collapse.html It mentions Neo4j and how the author dislikes a common way of using Neo4j for building applications.
It would be interesting to hear suggestions on how to improve this. Forwarded conversation follows:

On Fri, Oct 14, 2011 at 10:13 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:

I found this while reading feeds in bed last night: *The Coming SQL Collapse* http://www.dzone.com/links/rss/the_coming_sql_collapse.html (Sent from Flipboard http://flipboard.com)

The things he says about SQL vs NOSQL are not very interesting, but I'd like to raise what he says about Neo4j:

I looked at neo4j briefly the other day, and quite predictably thought 'wow, this looks like a serious tinkertoy: it's basically a bunch of nodes where you just blob your attributes.' Worse than that, to wrap objects around it, you have to have them explicitly incorporate their node class, which is ugly, smelly, violates every law of separation of concerns and logical vs. physical models. On the plus side, as I started to look at it more, I realized that it was the perfect way to implement a backend for a bayesian inference engine (more on that later). Why? Because inference doesn't care particularly about all the droll requirements that are settled for you by SQL, and there are no real set operations to speak of.

He attacks our pattern of building domain models with Neo4j, calling it ugly, smelly and in violation of every law of separation of concerns and logical vs. physical models. Is he right? My feeling is that he is brainwashed with too many so-called best practices, but since Neo4j has been my main model for a long time now, my perspective is likely skewed. I'd like to hear your thoughts.

-- Tobias Ivarsson tobias.ivars...@neotechnology.com

On Fri, Oct 14, 2011 at 10:32 AM, Rickard Öberg rickard.ob...@neotechnology.com wrote:

Well, I'd tend to agree with the author. Mixing persistence details with the domain model itself is really a bad idea. Infrastructure details should not pollute the domain logic as they do with the currently suggested usage of Neo4j.
But I think both Spring Data Graph and the Qi4j usage model fix this, as they allow you to keep many of those things outside of the domain code.

/Rickard

On Fri, Oct 14, 2011 at 11:45 AM, Tobias Ivarsson tobias.ivarsson@neotechnology.com wrote:

On Fri, Oct 14, 2011 at 11:21 AM, Rickard Öberg rickard.ob...@neotechnology.com wrote:

On 10/14/11 17:16, Tobias Ivarsson wrote:

I was hoping for a bit more elaboration of why it is a bad idea. Spring Data Neo4j operates mainly in the same way (at least it did when I was part of the design process), it just hides the details of it.

The model we suggest is not to mix infrastructure details (nodes, relationships, traversals) with the domain logic. We suggest the domain logic be a separate layer, acting on domain data objects (defined as a set of interfaces). What we do suggest though
Re: [Neo4j] HyperRelationship example
When I wrote the wiki page for Enhanced-API, I ended up using all the words I had spent on the hyperrelationship example, so I decided to keep the original page alive, but link it to the enhanced API page. Date: Sat, 24 Sep 2011 19:45:47 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] HyperRelationship example Here you go: https://github.com/neo4j/graph-collections/wiki/HyperRelationship-example Though that page just has a link to: https://github.com/neo4j/graph-collections/wiki/Enhanced-API Bryce On Sat, Sep 24, 2011 at 5:00 PM, loldrup lold...@gmail.com wrote: Niels Hoogeveen wrote: I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example This link now gives 404. Does it have a new address? If so, what is it? Jon
Re: [Neo4j] Modelling with neo4j
You raise interesting questions, most of them very much related to the work I did on Enhanced API.

Let me start with the distinction between Node and Relationship, which in my opinion too is a bit artificial. I understand that when creating a graph database it is helpful to have something like vertices and edges, but I see those more as modalities of the elements of the graph than as clearly separated types. This was one of the reasons to unify all elements of the graph under one underlying type. At the time, I saw two options:

a) make the graph bipartite, so that all relationships and properties become nodes, and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the API handle that transparently

I chose option b for performance reasons. There are likely many applications where most of the relationships are simple, i.e. they link two nodes while possibly having some properties. Using a bipartite layout for such relationships adds nothing, but it takes twice as many links to traverse. The shadow node solution only treats relationships and properties as special (having relationships to them) when that is needed.

Now to the typing issues. Neo4j has chosen not to add typing features to the database and I actually like that. It allows for optional type systems that can be used but are not enforced. Type systems are nice beasts, especially when dealing with large and complex applications, but they impose a development overhead, mostly felt in small quick-and-dirty applications. This is true for programming languages, where many people prefer to use a dynamically typed language such as JavaScript, Python, Ruby or PHP over a statically typed language such as Java, Scala, C# or Haskell, and I think it is also true for databases. I think one of the reasons NOSQL became so popular is that the type system of an RDBMS adds overhead to simple applications.
An RDBMS needs a type system because the storage layout requires it. Tables have a fixed number of columns, where each column has a designated type. While this is a great feature when processing massive amounts of similar data, it can also make the application brittle. The tight coupling between type system and storage layout makes rapid schema evolution hard.

Neo4j doesn't impose a type system like an RDBMS does, because its storage layout doesn't require it. Something is either a node, a relationship or a property, but the combinations don't need explicit modelling for the sake of storage. Because of this untyped nature of the database, it becomes possible to add a type system that is not only optional, but can in fact be made as strong or as weak as the application demands. Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, another reason why I started Enhanced API. It was not my intention with that API to provide a full-fledged type system for Neo4j, but to provide the necessary hooks so a type system can be created.

Of course there is some type-creep in Neo4j. Properties and relationships have names, which in almost every application are used as types. Say we have several nodes we use to store information about people, where each of those nodes has a property last_name. This property name is effectively used as a type. For all nodes the property name will denote the same fact: the last name of a person. This is not required by the Neo4j database. Different nodes may use the same property name to denote different things, even with different datatypes. It is possible to have nodes with property name last_name that for some nodes is a String while it is an Integer for other nodes. While this is possible, I venture this is not all that common.
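As an illustration of what an optional type layer on top of untyped properties could look like, here is a small sketch in plain Java. Nothing in it is Neo4j or Enhanced API code: PropertyTypeRegistry and TypedNode are hypothetical names, and a HashMap stands in for a node's property store. Property names that were never declared stay untyped, so the check is only as strong as the application chooses to make it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical optional type layer: a property name may be bound to one Java type.
final class PropertyTypeRegistry {
    private final Map<String, Class<?>> declared = new HashMap<>();

    // Declare that values stored under this property name must be of the given type.
    void declare(String propertyName, Class<?> type) {
        declared.put(propertyName, type);
    }

    // Undeclared property names are always accepted: typing stays optional.
    boolean accepts(String propertyName, Object value) {
        Class<?> type = declared.get(propertyName);
        return type == null || type.isInstance(value);
    }
}

// Stand-in for a node; the map plays the role of the untyped property store.
final class TypedNode {
    private final Map<String, Object> properties = new HashMap<>();
    private final PropertyTypeRegistry registry;

    TypedNode(PropertyTypeRegistry registry) {
        this.registry = registry;
    }

    // The "hook": every write passes through the registry before it is stored.
    void setProperty(String name, Object value) {
        if (!registry.accepts(name, value)) {
            throw new IllegalArgumentException("wrong datatype for property " + name);
        }
        properties.put(name, value);
    }

    Object getProperty(String name) {
        return properties.get(name);
    }

    public static void main(String[] args) {
        PropertyTypeRegistry registry = new PropertyTypeRegistry();
        registry.declare("last_name", String.class);
        TypedNode person = new TypedNode(registry);
        person.setProperty("last_name", "Smith");
        person.setProperty("shoe_size", 43); // undeclared name, accepted as-is
        System.out.println(person.getProperty("last_name"));
    }
}
```

With last_name declared as a String, an attempt to store an Integer under that name is rejected, which is exactly the String-versus-Integer mix the paragraph above describes as possible in the raw database.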
The same property name will likely be used to denote the same fact and have the same datatype across the graph, and will therefore in most common cases be used like a type. The same applies to relationships, where the name will in general be used to denote the same type of relationship. It is unlikely an application will use the FRIEND relationship to sometimes denote a friendship between two people while at other times using that relationship name to denote the address of a building.

This is as far as typing goes in Neo4j, but it is there, and that means we have to incorporate it into the API somehow. This is the reason why I decided to add subtyping of relationship-types and property-types to the API, a feature that may be of interest to the model you describe in your email.

"Joe is a janitor at the school." Here we see three elements: "Joe", "is janitor at", and "the school", which can indeed be modeled with two nodes and a relationship. There is however a more general statement here of the form: "person works with organization". Suppose we want to store the fact: "Jane is principal of the school." Again we can model this with two
Re: [Neo4j] Modelling with neo4j
to parse and traverse. - We often found that there were data structures in our application domain for which it was OK to be opaque - e.g. although the structures were deep and complex, they did not require searchability or traversability (e.g. they were kind like object blobs), so in our metamodel, they are not stored as nodes, relationships, and properties, but rather, as a JSON blob, serialized as a string to a node property. That has worked out really well. When we do need to filter/manipulate those, we do them at the domain level Just wanted to share some more examples. Rick
Re: [Neo4j] Modelling with neo4j
Subtyping works as follows in Enhanced API. When calling getRelationships(RelationshipType, Direction) or any of its alternatives, the API looks up all subtypes of that relationship type and then calls getRelationshipTypes(Direction, RelationshipType and its subtypes). All you need to do is create a RelationshipType IS_JANITOR_OF and a RelationshipType WORKS_FOR and state that the former is a subtype of the latter.

Haskell type classes are a great mechanism for ad-hoc polymorphism and in some ways are preferable to subtyping, though not necessarily in the context of a database. Type classes do indeed allow you to say there is a commonality between WORKS_AT and IS_JANITOR_OF, but they don't allow you to state that the relationships of type IS_JANITOR_OF are a subset of the relationships of type WORKS_AT. In a database context the subsumption rule is actually quite important and Haskell type classes don't offer that.

The combination of type classes and subtyping is as far as I know still an open research topic. It is not without reason that Scala (which has subtyping) doesn't have type classes, though it allows similar constructs through implicit conversions. Working in both disciplines at the same time (poor-man's type classes through implicit conversions in combination with subtyping) seems to be non-trivial.

Niels

Date: Sat, 24 Sep 2011 08:09:48 -0700
From: lold...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Modelling with neo4j

Subtyping of relationship types sounds like the cure to my problems. When creating a relationship IS_A_JANITOR_OF, will a corresponding relationship type IS_A_JANITOR_OF-relationship-type automatically be created? If I have a simple relationship, can I then ask which relationship types its type is a subtype of?

Regarding interfaces: I took the idea of interfaces from Haskell's type classes, which make great sense as interfaces. In Neo4j we could imagine that relationships with types WORKS_AT and REFERS_TO might have something in common (e.g.
they both have to specify a boss who gives them orders). For now I don't think my problem requires interfaces before it can be solved, but I only just started so who knows :) Jon
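The subtype lookup Niels describes at the start of this message (expand the requested relationship type into itself plus all of its subtypes, then fetch relationships of any of those types) can be sketched in plain Java as follows. This is not the Enhanced API implementation: RelationshipTypeHierarchy, SubtypedNode, the in-memory storage and the type name IS_PRINCIPAL_OF are assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical registry of "X is a subtype of Y" statements between relationship types.
final class RelationshipTypeHierarchy {
    private final Map<String, Set<String>> directSubtypes = new HashMap<>();

    void declareSubtype(String subtype, String supertype) {
        directSubtypes.computeIfAbsent(supertype, k -> new LinkedHashSet<String>()).add(subtype);
    }

    // Returns the type itself plus all of its direct and indirect subtypes.
    Set<String> withSubtypes(String type) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(type);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (result.add(current)) { // skipping already-seen types also guards against cycles
                pending.addAll(directSubtypes.getOrDefault(current, Collections.<String>emptySet()));
            }
        }
        return result;
    }
}

// Stand-in for a node holding typed relationships to named neighbours.
final class SubtypedNode {
    private final RelationshipTypeHierarchy hierarchy;
    private final List<String[]> relationships = new ArrayList<>(); // each entry: {type, other node}

    SubtypedNode(RelationshipTypeHierarchy hierarchy) {
        this.hierarchy = hierarchy;
    }

    void addRelationship(String type, String otherNode) {
        relationships.add(new String[] { type, otherNode });
    }

    // Fetches neighbours over the given type and every subtype of it.
    List<String> getRelated(String type) {
        Set<String> wanted = hierarchy.withSubtypes(type);
        List<String> related = new ArrayList<>();
        for (String[] rel : relationships) {
            if (wanted.contains(rel[0])) {
                related.add(rel[1]);
            }
        }
        return related;
    }

    public static void main(String[] args) {
        RelationshipTypeHierarchy types = new RelationshipTypeHierarchy();
        types.declareSubtype("IS_JANITOR_OF", "WORKS_FOR");
        types.declareSubtype("IS_PRINCIPAL_OF", "WORKS_FOR");
        SubtypedNode school = new SubtypedNode(types);
        school.addRelationship("IS_JANITOR_OF", "Joe");
        school.addRelationship("IS_PRINCIPAL_OF", "Jane");
        System.out.println(school.getRelated("WORKS_FOR"));
    }
}
```

Asking for WORKS_FOR returns both Joe and Jane, while asking for IS_JANITOR_OF returns only Joe, which is the subset (subsumption) behaviour that type classes alone would not give.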
Re: [Neo4j] Unrolled Linked List
A quick skim of the code shows me you have a baseNode which is an entry point for the ULL. This is a logical candidate node to use for the purpose of locking. What are the pros and cons of locking the baseNode on every read and write operation?

Niels

Date: Fri, 23 Sep 2011 09:39:38 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Unrolled Linked List

Good stuff. I am presently looking into concurrent use of a given UnrolledLinkedList, at least within the same graph database instance; it might be a little bit harder in an HA environment. It's hard enough writing test cases for this, maybe even harder than making it work properly! I am hoping that some utility code I am going to produce will help with testing concurrency of other data structures.

By concurrent use I mean concurrent use of the data within the graph, not of the given instantiation of the class, e.g. what happens when one thread gets an instance of ULL based off a given node and is iterating over it, then another thread gets an instance of a ULL and writes into it.

Cheers
Bryce

On Fri, Sep 23, 2011 at 4:57 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

It looks really cool. I always find it fun to create something and later find out it is an already known construction (something worth inventing). Anyway, I pulled your code and will remove the dependencies on the Enhanced API stuff this week. After that we can start adding some documentation.
Niels

Date: Thu, 22 Sep 2011 15:57:13 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Unrolled Linked List

Hi all,

I have added an in-graph representation of an unrolled linked list to the graph collections code, currently just in my github repo: https://github.com/brycenz/graph-collections

See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java

The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list

And it works roughly in the same manner, though I had the idea prior to reading the wiki article.

As the UnrolledLinkedList class implements the NodeCollection interface it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java

The main reason for me being interested in this, and an example of where this is (probably) really useful, is the following case:

- you have a number of tag (or category, folder etc.) nodes
- they each link to a large number of document (or article, comment, post etc.) nodes
- using a single relationship type
- you generally are only interested in showing the newest documents in descending date order (showing the head, in a paged UI)
- documents are generally added in ascending date order (added to the head)

The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore helps reduce garbage collection.
In my case I have a large number of tags with a large number of items against them. I might read the first 100-200 items out of a collection of 30,000, and therefore by not reading in the other 29,800 relationships / nodes (per tag) I should be saving 90% or more. Here's hoping.

From the javadoc: The structure is broken into pages of links to nodes where the size of the page can be controlled at initial construction time. Page size is not fixed but instead can float between a lower bound and an upper bound. The bounds are at a fixed margin from the page size of M. When a page drops below the lower bound it will be joined onto an adjacent page, and when the page goes above the upper bound it will be split in half.

I am about to do some tests with this based on my use case and will report back on the performance impacts.

Cheers
Bryce

P.S. I am still thinking about how to make this thread safe; any suggestions would be appreciated (presently only one thread will be able to write at a time; I am worried about a thread reading while another is writing, especially when it joins / splits pages or changes the head).
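For readers who have not met the structure before, the paging behaviour quoted from the javadoc can be sketched in memory with plain Java collections. This is not the graph-collections implementation (which stores pages as nodes in the graph): UnrolledListSketch, its constructor parameters and the use of long ids are assumptions for illustration, and only the split on overflow is shown, not the join on underflow.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical in-memory sketch: items live in pages, new items go to the head page,
// and a page that grows past pageSize + margin is split in half.
final class UnrolledListSketch {
    private final int upperBound;
    private final LinkedList<ArrayList<Long>> pages = new LinkedList<>();

    UnrolledListSketch(int pageSize, int margin) {
        if (pageSize < 1 || margin < 0) {
            throw new IllegalArgumentException("pageSize must be >= 1 and margin >= 0");
        }
        this.upperBound = pageSize + margin;
    }

    // Adds an item at the head of the list, splitting the head page when it overflows.
    void addFirst(long item) {
        if (pages.isEmpty()) {
            pages.addFirst(new ArrayList<Long>());
        }
        ArrayList<Long> head = pages.getFirst();
        head.add(0, item);
        if (head.size() > upperBound) {
            int mid = head.size() / 2;
            // The older half moves to a new page directly behind the head page.
            ArrayList<Long> olderHalf = new ArrayList<>(head.subList(mid, head.size()));
            head.subList(mid, head.size()).clear();
            pages.add(1, olderHalf);
        }
    }

    // Reads at most n items from the head, touching only as many pages as needed.
    List<Long> firstN(int n) {
        List<Long> out = new ArrayList<>();
        for (ArrayList<Long> page : pages) {
            for (Long item : page) {
                if (out.size() == n) {
                    return out;
                }
                out.add(item);
            }
        }
        return out;
    }

    int pageCount() {
        return pages.size();
    }

    public static void main(String[] args) {
        UnrolledListSketch list = new UnrolledListSketch(4, 1);
        for (long id = 1; id <= 12; id++) {
            list.addFirst(id); // newest item always ends up at the head
        }
        System.out.println(list.firstN(5) + " in " + list.pageCount() + " pages");
    }
}
```

Reading the newest few items only walks the first page or two, which is where the saving over "load everything, then sort" in the tag/document use case would come from.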
Re: [Neo4j] Unrolled Linked List
Read integrity is really a dog. We haven't even begun to address that in the other collections.

With regard to write locks (and this is something we should check in sortedtree too), there is code like:

page.setProperty( ITEM_COUNT, ((Integer) page.getProperty( ITEM_COUNT )) + 1 );

This is only thread safe if the value returned by page.getProperty( ITEM_COUNT ) is read from a locked node.

Niels

Date: Sat, 24 Sep 2011 09:14:07 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Unrolled Linked List

For writing that works well, and in fact for add node I am doing that (just realised I am not for remove node but I should be). The problems for reading are:

- it should allow multiple threads to read at the same time
- it shouldn't dictate that client code has a transaction in order to read

As a simple solution that's probably workable (and probably the safest), and it means that HA will just work, but restricting access to one thread at a time for a given node collection isn't the best. Maybe the client code should set whether it locks the data structure when reading, or fails with a ConcurrentModificationException when reading while data is changed.

On Sat, Sep 24, 2011 at 6:00 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

A quick skim of the code shows me you have a baseNode which is an entry point for the ULL. This is a logical candidate node to use for the purpose of locking. What are the pros and cons of locking the baseNode on every read and write operation?

Niels

Date: Fri, 23 Sep 2011 09:39:38 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Unrolled Linked List

Good stuff. I am presently looking into concurrent use of a given UnrolledLinkedList, at least within the same graph database instance; it might be a little bit harder in an HA environment. It's hard enough writing test cases for this, maybe even harder than making it work properly!
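The read-modify-write on ITEM_COUNT that Niels points out at the top of this message, together with the wish to let several readers in at once, can be illustrated outside Neo4j with a plain Java lock. This is only an analogy: LockedPageCounter is a hypothetical class, an int field stands in for the ITEM_COUNT property, and a ReentrantReadWriteLock stands in for whatever lock would be taken on the page or base node. The read and the write of the counter happen under the same write lock, so no increment can be lost, while plain reads share the read lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a page node with an ITEM_COUNT property.
final class LockedPageCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int itemCount; // plays the role of the ITEM_COUNT property

    // Read and write happen under one exclusive lock, so concurrent increments cannot interleave.
    void increment() {
        lock.writeLock().lock();
        try {
            itemCount = itemCount + 1;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Several readers may hold the read lock at once; they only wait while a writer is active.
    int read() {
        lock.readLock().lock();
        try {
            return itemCount;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs the given number of threads, each doing the given number of increments,
    // and returns the final count.
    static int runConcurrently(int threads, int incrementsPerThread) {
        LockedPageCounter page = new LockedPageCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    page.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return page.read();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10000));
    }
}
```

If increment() read the counter before taking the lock and only wrote under it, two threads could read the same value and one update would be lost, which is the hazard in the setProperty/getProperty line above.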
Hoping that some utility code I am going to produce will help with testing concurrency of other data structures.

By concurrent use I mean concurrent use of the data within the graph, not of the given instantiation of the class, e.g. what happens when one thread gets an instance of ULL based off a given node and is iterating over it, then another thread gets an instance of a ULL and writes into it.

Cheers
Bryce

On Fri, Sep 23, 2011 at 4:57 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

It looks really cool. I always find it fun to create something and later find out it is an already known construction (something worth inventing). Anyway, I pulled your code and will remove the dependencies on the Enhanced API stuff this week. After that we can start adding some documentation.

Niels

Date: Thu, 22 Sep 2011 15:57:13 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Unrolled Linked List

Hi all,

I have added an in-graph representation of an unrolled linked list to the graph collections code, currently just in my GitHub repo: https://github.com/brycenz/graph-collections

See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java

The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list

It works in roughly the same manner, though I had the idea prior to reading the wiki article.

As the UnrolledLinkedList class implements the NodeCollection interface, it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java

The main reason for my interest in this, and an example of where it is (probably) really useful, is the following case:
- you have a number of tag (or category, folder etc.) nodes
- they each link to a large number of document (or article, comment, post etc.) nodes
- using a single relationship type
- you are generally only interested in showing the newest documents in descending date order (showing the head, in a paged UI)
- documents are generally added in ascending date order (added to the head)

The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore helps reduce garbage collection.

In my case I have a large number of tags
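Niels' point in this thread about the ITEM_COUNT increment can be illustrated outside Neo4j. The following is only a sketch in plain Java: `Page`, its property map and `LockedCounterDemo` are hypothetical stand-ins, not graph-collections or Neo4j classes, and a `ReentrantLock` takes the place of whatever node locking the database would provide. What it shows is that the read and the write of the counter have to happen under the same lock for the increment to be safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a page node with properties; not a Neo4j class.
final class Page {
    static final String ITEM_COUNT = "item_count";
    private final Map<String, Object> properties = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    Page() {
        properties.put(ITEM_COUNT, 0);
    }

    // The read and the write both happen while the lock is held,
    // so no other writer can slip in between them.
    void incrementItemCount() {
        lock.lock();
        try {
            int current = (Integer) properties.get(ITEM_COUNT);
            properties.put(ITEM_COUNT, current + 1);
        } finally {
            lock.unlock();
        }
    }

    int itemCount() {
        lock.lock();
        try {
            return (Integer) properties.get(ITEM_COUNT);
        } finally {
            lock.unlock();
        }
    }
}

final class LockedCounterDemo {
    // Starts `threads` writers that each increment the counter `perThread` times.
    static int run(int threads, int perThread) {
        Page page = new Page();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    page.incrementItemCount();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return page.itemCount();
    }
}
```

With the lock in place, `LockedCounterDemo.run(4, 1000)` ends with a count of 4000. Dropping the lock from `incrementItemCount` would allow lost updates, which is the situation the unlocked `getProperty` / `setProperty` pair risks.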
Re: [Neo4j] Unrolled Linked List
It looks really cool. I always find it fun to create something and later find out it is an already known construction (something worth inventing). Anyway, I pulled your code and will remove the dependencies on the Enhanced API stuff this week. After that we can start adding some documentation.

Niels

Date: Thu, 22 Sep 2011 15:57:13 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Unrolled Linked List

Hi all,

I have added an in-graph representation of an unrolled linked list to the graph collections code, currently just in my GitHub repo: https://github.com/brycenz/graph-collections

See this in particular: https://github.com/brycenz/graph-collections/blob/master/src/main/java/org/neo4j/collections/list/UnrolledLinkedList.java

The name comes from: http://en.wikipedia.org/wiki/Unrolled_linked_list

It works in roughly the same manner, though I had the idea prior to reading the wiki article.

As the UnrolledLinkedList class implements the NodeCollection interface, it can be used as the backing of an IndexedRelationship, which is done in tests here: https://github.com/brycenz/graph-collections/blob/master/src/test/java/org/neo4j/collections/indexedrelationship/TestUnrolledLinkedListIndexedRelationship.java

The main reason for my interest in this, and an example of where it is (probably) really useful, is the following case:
- you have a number of tag (or category, folder etc.) nodes
- they each link to a large number of document (or article, comment, post etc.) nodes
- using a single relationship type
- you are generally only interested in showing the newest documents in descending date order (showing the head, in a paged UI)
- documents are generally added in ascending date order (added to the head)

The benefits come from being able to iterate over a small percentage of a collection of nodes in a fixed order without having to first load all the nodes and sort them. This reduces the amount of data read in from disk, reduces the turnover of data in memory, and therefore helps reduce garbage collection.

In my case I have a large number of tags, each with a large number of items against it. I might read the first 100-200 items out of a collection of 30,000, so by not reading in the other 29,800 relationships / nodes (per tag) I should be saving 90% or more. Here's hoping.

From the Javadoc:

The structure is broken into pages of links to nodes, where the size of the page can be controlled at initial construction time. Page size is not fixed but can float between a lower bound and an upper bound. The bounds are at a fixed margin from the page size of M. When a page drops below the lower bound it will be joined onto an adjacent page, and when the page goes above the upper bound it will be split in half.

I am about to do some tests with this based on my use case and will report back on the performance impacts.

Cheers
Bryce

P.S. Still thinking about how to make this thread safe; any suggestions would be appreciated. Presently only one thread will be able to write at a time. I am worried about a thread reading while another is writing, especially when it joins / splits pages or changes the head.
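The page behaviour quoted from the Javadoc can be sketched in memory. This is not the UnrolledLinkedList implementation, which stores its pages as nodes in the graph; `PagedList` and its method names are hypothetical, and plain `ArrayList`s stand in for page nodes. It assumes the bounds are page size minus margin and page size plus margin, splits an overfull head page in half, joins an undersized page onto a neighbour, and reads only as many pages as a head query needs.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch only; each inner list plays the role of one page.
final class PagedList<T> {
    private final int lower; // page size minus margin
    private final int upper; // page size plus margin
    private final List<List<T>> pages = new ArrayList<>();

    PagedList(int pageSize, int margin) {
        this.lower = pageSize - margin;
        this.upper = pageSize + margin;
    }

    // Adds an item at the head; splits the head page in half
    // once it grows past the upper bound.
    void addFirst(T item) {
        if (pages.isEmpty()) {
            pages.add(new ArrayList<>());
        }
        List<T> head = pages.get(0);
        head.add(0, item);
        if (head.size() > upper) {
            int mid = head.size() / 2;
            List<T> secondHalf = new ArrayList<>(head.subList(mid, head.size()));
            head.subList(mid, head.size()).clear();
            pages.add(1, secondHalf);
        }
    }

    // Removes an item; a page that falls below the lower bound is
    // joined onto an adjacent page. A fuller version would re-split
    // the joined page if it then exceeded the upper bound.
    boolean remove(T item) {
        for (int i = 0; i < pages.size(); i++) {
            List<T> page = pages.get(i);
            if (page.remove(item)) {
                if (page.size() < lower && pages.size() > 1) {
                    int neighbour = (i + 1 < pages.size()) ? i + 1 : i - 1;
                    int first = Math.min(i, neighbour);
                    pages.get(first).addAll(pages.remove(first + 1));
                }
                return true;
            }
        }
        return false;
    }

    // Returns at most `limit` items from the head, stopping as soon as
    // enough have been collected so later pages are never touched.
    List<T> head(int limit) {
        List<T> out = new ArrayList<>();
        for (List<T> page : pages) {
            for (T item : page) {
                if (out.size() == limit) {
                    return out;
                }
                out.add(item);
            }
        }
        return out;
    }

    int pageCount() {
        return pages.size();
    }
}
```

Reading the newest few items touches only the head page or pages, which is where the saving described in the message is expected to come from; how large it is in practice would depend on the graph-backed implementation.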
Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface
Hi Bryce, Sorry for the late response. I understand it's difficult to come up with a really good use-case for making NodeCollection more general in the context of IndexedRelationships, but I like to think of that interface as something we can eventually use for all sorts of collections, not just the ones derived from SortedTree. There is of course the issue that relationships can not attach to relationships, so collections of relationships will need to be addressed by Id. This is not necessarily a bad thing, because it decouples the container and the elements. In other words the container knows what elements it contains, but the elements don't know in what containers they are placed. Another option would be to create shadow nodes for contained relationships. Instead of adding a relationships to the collection, its shadow node is added and both the shadow node and the relationship contain pointers (properties with Id values) towards each other. I think it would be best if we do indeed create a GraphCollection interface parameterized by T extends PropertyContainer even if that type parameter for now is always a Node. It doesn't add much complexity now to do it, and later on we may regret it and by then it becomes harder to do because there is an installed base. Niels Date: Sat, 17 Sep 2011 14:19:04 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface Hi Niels, I had wondered about having a collection interface that covered both nodes and relationships. There were a couple of reasons I didn't go with that right now, though well worthwhile discussing it and going with a GraphCollection super interface if it fits properly. Firstly I wanted to get something out there so people could have a look, and having something that matched what IndexedRelationship currently required was easiest first step. 
Biggest thing specific in there to that functionality is the addNode method returning a relationship. The other issue was more wondering how a relationship collection would work. Say I have a relationship collection, and I have a relationship R1 between node A and B, how am I going to represent that relationship withing some graph based data structure that makes sense. There could be a node X that is part of the relationship collection data structure (e.g. tree) and that node could have an attribute that has the relationship id on it, but that doesn't seem like it would be very performant. There could be a relationship between X and A that also gave the relationship type of R1, so you could find the relationship based on that, but there isn't any guarantee of the relationship type being unique. What it would need to properly model it is the ability to have a relationship between X and R1, i.e. a relationship from a node to a relationship. If instead of being able to add any given relationship to the relationship collection you instead restrict it to being relationships matching a certain criteria from a given node then it is practically the same thing as a relationship expander. Or if you instead have a way through the relationship collection to create relationships from a given node to a set of other arbitrary nodes, with the relationship collection having a fixed relationship type and direction, then that is practically the current IndexedRelationship. I guess a way it could work is similar to IndexedRelationship, basically more general case of that class, where you have a method on the relationship collection createRelationship(startNode, endNode, relationshipType, direction) that was then stored in an internal data structure to create a pseudo relationship between the start and end, and then being able to iterate over this set of relationships. Not sure exactly what the use case of that would be. 
Maybe of more interest could be the same situation where the relationship type and direction are fixed. Then you may have a "friend of" set of relationships that you create between arbitrary nodes and then iterate over all of those.

I can't personally think of a good way of adding a set of arbitrary relationships into a collection stored in a graph data structure.

Thoughts?

Cheers
Bryce

P.S. Peter, I had thought to remove the passing in of the graph database and instead just get it from the node, or to pass in only the graph database and create the node internally.

On Sat, Sep 17, 2011 at 2:19 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Hi Bryce,

I really like what you are trying to achieve here. One question: instead of having NodeCollection, why not have GraphCollection<T extends PropertyContainer>? That way we can have collections of both Relationships and Nodes.

Niels

Date: Fri, 16 Sep 2011 17:37:29 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Neo4j graph
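Niels' suggestion of a GraphCollection parameterized by T extends PropertyContainer might look roughly like the sketch below. Everything here is an assumption for illustration: `PropertyContainer`, `Node` and `Relationship` are declared locally as minimal stand-ins for the Neo4j interfaces of the same names so the example compiles on its own, and the `add` / `remove` methods and `InMemoryCollection` are hypothetical rather than the actual NodeCollection API.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Local stand-ins for the Neo4j types of the same names (assumption:
// reduced to a single method so the sketch is self-contained).
interface PropertyContainer { long getId(); }
interface Node extends PropertyContainer {}
interface Relationship extends PropertyContainer {}

// Hypothetical super interface, parameterized by the element type.
interface GraphCollection<T extends PropertyContainer> extends Iterable<T> {
    boolean add(T element);
    boolean remove(T element);
}

// NodeCollection then becomes the Node-specific case.
interface NodeCollection extends GraphCollection<Node> {}

// In-memory implementation, used only to exercise the interface.
final class InMemoryCollection<T extends PropertyContainer> implements GraphCollection<T> {
    private final Set<T> elements = new LinkedHashSet<>();

    public boolean add(T element) { return elements.add(element); }
    public boolean remove(T element) { return elements.remove(element); }
    public Iterator<T> iterator() { return elements.iterator(); }
}

final class GraphCollectionDemo {
    // Adds two distinct nodes (one of them twice) and counts the elements.
    static int countNodes() {
        GraphCollection<Node> nodes = new InMemoryCollection<>();
        Node a = () -> 1L; // the stand-in Node has one abstract method, so a lambda works
        Node b = () -> 2L;
        nodes.add(a);
        nodes.add(b);
        nodes.add(a);
        int count = 0;
        for (Node ignored : nodes) {
            count++;
        }
        return count;
    }
}
```

The same interface could be instantiated as `GraphCollection<Relationship>`, although, as Bryce notes, how such a collection would be stored in the graph is the open question; the type parameter only keeps that option available.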
Re: [Neo4j] Radix tree
Peter, I agree, we need to work on documentation once the dust has settled around the changes Bryce has been working on.

Niels

From: peter.neuba...@neotechnology.com
Date: Fri, 16 Sep 2011 13:59:07 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Radix tree

Yes, great work Davide! I think we need to start enabling documentation for the graph collections as it is evolving pretty rapidly!

Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Thu, Sep 15, 2011 at 1:48 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Thanks to the good work of Davide, graph-collections now contains an implementation of Radix-tree. See: http://en.wikipedia.org/wiki/Radix_tree

This particular data structure can be used to store nodes sorted by a String value, very handy when you want to create associative arrays in Neo4j.

Niels
Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface
Bryce's point makes perfect sense. The argument graphDb().createNode() gives the constructor an instance of Node, which contains a reference to the database, so there is no real need to additionally supply the database instance. Of course his example would have been less confusing if he'd written:

Node indexedNode = graphDb().createNode();
SortedTree st = new SortedTree( graphDb(), indexedNode, new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() );

From: peter.neuba...@neotechnology.com
Date: Fri, 16 Sep 2011 15:22:27 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface

Also, since you can do node.getGraphDatabase(), I think we don't need to pass in the graphdb instance in

new SortedTree( graphDb(), graphDb().createNode(), new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() );

?

Cheers,
/peter neubauer

On Fri, Sep 16, 2011 at 7:37 AM, Bryce bryc...@gmail.com wrote:

Hi,

I had mentioned in a previous thread that I was working on introducing a NodeCollection interface to remove the dependency from IndexedRelationship to SortedTree. I have an initial cut of this up now in my GitHub repo: https://github.com/brycenz/graph-collections

It would be great to get community feedback on this, as I think that having a well designed and common NodeCollection interface would help for multiple use cases, e.g. sortedTreeNodeCollection.addAll(linkedListNodeCollection) doing exactly what you think it would.

IndexedRelationship now takes a node to index relationships from, a relationship type, and a direction, as well as a NodeCollection at creation time.
As in the unit tests this then leads to: Node indexedNode = graphDb().createNode(); SortedTree st = new SortedTree( graphDb(), graphDb().createNode(), new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() ); IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, st ); To create the IndexedRelationship. To later add nodes to the relationship you need to create an instance of IndexedRelationship without the NodeCollection: IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING ); What this means from a NodeCollection implementation point of view is that firstly it needs to use the NodeCollection.RelationshipType.VALUE relationship to connect from its internal data structure to the nodes being added to the collection, and it needs to be able to recreate itself from a base node that is passed into a constructor (that only takes the base node). A node collection also needs to store its class name on the base node for later construction purposes, as well as any other data required to recreate the NodeCollection instance (in the case of SortedTree this is the comparator class, the tree name, and whether it is a unique index. Niels, you may want to have a good look over SortedTree, I have made a few changes to it, mainly around introduction of a base node, and changing of the end value relationships. This could be cleaned up better, but I wanted to start with minimal changes. Both IndexedRelationship and IndexedRelationshipExpander have no dependencies on SortedTree now, and should work with any properly implemented NodeCollection. I will be putting together a paged linked list NodeCollection next to try this. Some future thoughts for NodeCollection, addition of as many of the java.util.Collection methods (e.g. 
addAll, removeAll, retainAll, contains, containsAll) as well as an abstract base NodeCollection to help provide non-optimised support for these methods. Cheers Bryce
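The requirement that a node collection stores its class name on the base node and can be recreated from a constructor taking only that base node can be sketched with reflection. This is an illustration under assumptions, not the graph-collections code: a `Map` stands in for the base node's properties, and `StoredCollection`, `SortedTreeLike` and the property keys are hypothetical names.

```java
import java.util.Map;

// A Map stands in for the base node's properties; none of these are Neo4j types.
abstract class StoredCollection {
    static final String CLASS_NAME = "collection_class"; // hypothetical property key
    protected final Map<String, Object> baseNode;

    // Every implementation records its own class name on the base node.
    protected StoredCollection(Map<String, Object> baseNode) {
        this.baseNode = baseNode;
        baseNode.put(CLASS_NAME, getClass().getName());
    }

    // Recreates the collection from nothing but its base node, using the
    // stored class name and the constructor that takes only the base node.
    static StoredCollection load(Map<String, Object> baseNode) {
        String className = (String) baseNode.get(CLASS_NAME);
        try {
            return (StoredCollection) Class.forName(className)
                    .getDeclaredConstructor(Map.class)
                    .newInstance(baseNode);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot recreate " + className, e);
        }
    }
}

// Hypothetical implementation that also stores the extra state it needs
// to be rebuilt, here just a uniqueness flag.
final class SortedTreeLike extends StoredCollection {
    static final String UNIQUE = "is_unique"; // hypothetical property key

    // Constructor used when recreating from an existing base node.
    SortedTreeLike(Map<String, Object> baseNode) {
        super(baseNode);
    }

    // Constructor used at creation time.
    SortedTreeLike(Map<String, Object> baseNode, boolean unique) {
        super(baseNode);
        baseNode.put(UNIQUE, unique);
    }

    boolean isUnique() {
        return (Boolean) baseNode.get(UNIQUE);
    }
}
```

Code that only holds the base node, as the two-step IndexedRelationship usage does, can then call `StoredCollection.load(baseNode)` without knowing which implementation was chosen at creation time.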
Re: [Neo4j] Neo4j graph collections introduction of NodeCollection interface
Hi Bryce,

I really like what you are trying to achieve here. One question: instead of having NodeCollection, why not have GraphCollection<T extends PropertyContainer>? That way we can have collections of both Relationships and Nodes.

Niels

Date: Fri, 16 Sep 2011 17:37:29 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Neo4j graph collections introduction of NodeCollection interface

Hi,

I had mentioned in a previous thread that I was working on introducing a NodeCollection interface to remove the dependency from IndexedRelationship to SortedTree. I have an initial cut of this up now in my GitHub repo: https://github.com/brycenz/graph-collections

It would be great to get community feedback on this, as I think that having a well designed and common NodeCollection interface would help for multiple use cases, e.g. sortedTreeNodeCollection.addAll(linkedListNodeCollection) doing exactly what you think it would.

IndexedRelationship now takes a node to index relationships from, a relationship type, and a direction, as well as a NodeCollection at creation time. As in the unit tests, this leads to the following to create the IndexedRelationship:

Node indexedNode = graphDb().createNode();
SortedTree st = new SortedTree( graphDb(), graphDb().createNode(), new IdComparator(), true, RelTypes.INDEXED_RELATIONSHIP.name() );
IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING, st );

To later add nodes to the relationship you need to create an instance of IndexedRelationship without the NodeCollection:

IndexedRelationship ir = new IndexedRelationship( indexedNode, RelTypes.INDEXED_RELATIONSHIP, Direction.OUTGOING );

What this means from a NodeCollection implementation point of view is that, firstly, it needs to use the NodeCollection.RelationshipType.VALUE relationship to connect from its internal data structure to the nodes being added to the collection, and secondly, it needs to be able to recreate itself from a base node that is passed into a constructor (one that takes only the base node). A node collection also needs to store its class name on the base node for later construction purposes, as well as any other data required to recreate the NodeCollection instance (in the case of SortedTree this is the comparator class, the tree name, and whether it is a unique index).

Niels, you may want to have a good look over SortedTree. I have made a few changes to it, mainly around the introduction of a base node and changing of the end value relationships. This could be cleaned up better, but I wanted to start with minimal changes.

Both IndexedRelationship and IndexedRelationshipExpander have no dependencies on SortedTree now, and should work with any properly implemented NodeCollection. I will be putting together a paged linked list NodeCollection next to try this.

Some future thoughts for NodeCollection: addition of as many of the java.util.Collection methods as possible (e.g. addAll, removeAll, retainAll, contains, containsAll), as well as an abstract base NodeCollection to help provide non-optimised support for these methods.

Cheers
Bryce
[Neo4j] Radix tree
Thanks to the good work of Davide, graph-collections now contains an implementation of Radix-tree. See: http://en.wikipedia.org/wiki/Radix_tree This particular data structure can be used to store nodes sorted by a String value, very handy when you want to create associative arrays in Neo4j. Niels
Re: [Neo4j] regarding supernode
Peter, I'd gladly put out a piece of code demonstrating the use of IndexedRelationships, using this LIVES_IN example, though I get the impression the question here relates to the normal relationship index. However, when supernodes (still don't like that term for densely connected nodes) come into play, the normal relationship index doesn't offer much help.

Niels

From: pe...@neubauer.se
Date: Fri, 9 Sep 2011 09:58:28 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] regarding supernode

Linan, I think your example would be great for doing a JUnit test showing this. Niels, could you do that, please? In that case, I can add a graph and explanations to it.

/peter

On Thu, Sep 8, 2011 at 11:25 PM, Linan Wang tali.w...@gmail.com wrote:

On Wed, Sep 7, 2011 at 5:21 PM, Linan Wang tali.w...@gmail.com wrote:

Hi,

I don't quite understand RelationshipIndex and RelationshipExpander. Say I have a supernode city (beijing), and 10 million users link to it through the relationship LIVES_IN. How should I index this? Should it be something like:

RelationshipIndex idx = db.index().forRelationships(CITY_LIVES_IN);
idx.add(rel, LIVES_IN, Beijing);

If so, what's the advantage over this?

Index<Node> idx = db.index().forNodes(CITY_LIVES_IN);
idx.add(user, LIVES_IN, beijing);

(I read the source code of LuceneIndex.java and found that the implementation of the add method is shared between Index<Node> and RelationshipIndex.)

OK, I answer my own question: RelationshipIndex has the function query, which takes startNode and endNode as extra parameters. So if traversing only to depth 1, it could be faster than using a Traverser. Am I right here? (Please say yes!) Then the question is how to take advantage of it for depths greater than 1.

About RelationshipExpander: I don't see how RelationshipIndex could help in combination with RelationshipExpander when using GraphAlgoFactory.shortestPath(RelationshipExpander expander, int maxDepth).

Thanks for the help!

-- Best regards Linan Wang
Re: [Neo4j] Issues with IndexedRelationship
Excellent... I did a code review and think this is a huge improvement over what we had. Peter, can you pull these changes, I no longer have the privs to do so. Niels Date: Thu, 8 Sep 2011 17:24:44 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship I have made the changes in regards to SortedTree in regards to relationships vs nodes, and have got all the tests passing. The changes are pushed up to my github account (and pull request has been raised). The changes can be seen here: https://github.com/brycenz/graph-collections On Thu, Sep 8, 2011 at 3:41 PM, Bryce bryc...@gmail.com wrote: Another thought if there is going to be a larger refactor of the code is whether the indexing mechanism should be broken out as a strategy for the IndexedRelationship. At present it is tied to SortedTree, but if an interface was extracted out that had addNode, removeNode, iterator, and isUniqueIndex then other indexing implementations could be used in certain cases. The particular other implementation I am currently thinking of that could be of use to me would be a paged linked list. So that would have a linked list of pages, each with min x max KEY_VALUE (or equivalent) relationships. I think that could work quite well for the situation where the index is descending date ordered, and generally just appended at the most recent end, and results are retrieved in a paged manner generally from near the most recent. But more to the point there could be any number of implementations that would be good for given different situations. That does bring up a question though, there was some discussion a while ago about some functionality along the lines of IndexedRelationship being pulled into the core, so is that overkill for now if there is going to be another core offering later? On Thu, Sep 8, 2011 at 2:38 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I think we don't have to worry about backwards compatibility much yet. 
There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers. Indeed addNode should return the KEY_VALUE relationship and I think we should change the signature of SortedTree to turn it into IterableRelationship. No need to maintain a Node iterator, the node is always one getEndNode away. Niels Date: Thu, 8 Sep 2011 14:17:59 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Will have to experiment with changing my id's to be stored as longs, it does make perfect sense really that it would be better. Thanks for the hint. In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. Could end up being a significant change to SortedTree though, e.g.: sortedTree.addNode( node ); Would need to return the KEY_VALUE relationship instead of a boolean. Which not knowing where else SortedTree is used could be a large change? Maybe SortedTree would have two iterator's available a key_value relationship iterator, and a node iterator. Having a quick look at it now it seems that it could work ok that way without introducing much code duplication. On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Two longs is certainly cheaper than a string. Two longs take 128 bit and are stored in the main record of the PropertyContainer, while a String would require a 64 bit pointer in the main record of the PropertyContainer, and an additional read in the String store where the string representation will take up 256 bits. So both memory-wise, as perfomance wise, it is better to store a UUID as two long values. The main issue is something that needs a deeper fix than adding ID's. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. 
Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free. Niels Date: Thu, 8 Sep 2011 11:36:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Hi Niels, Sorry I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship: this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING ); So if the end node has more than one incoming KEY_VALUE relationship a more
Re: [Neo4j] Issues with IndexedRelationship
I like this idea Date: Thu, 8 Sep 2011 15:41:52 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Another thought if there is going to be a larger refactor of the code is whether the indexing mechanism should be broken out as a strategy for the IndexedRelationship. At present it is tied to SortedTree, but if an interface was extracted out that had addNode, removeNode, iterator, and isUniqueIndex then other indexing implementations could be used in certain cases. The particular other implementation I am currently thinking of that could be of use to me would be a paged linked list. So that would have a linked list of pages, each with min x max KEY_VALUE (or equivalent) relationships. I think that could work quite well for the situation where the index is descending date ordered, and generally just appended at the most recent end, and results are retrieved in a paged manner generally from near the most recent. But more to the point there could be any number of implementations that would be good for given different situations. That does bring up a question though, there was some discussion a while ago about some functionality along the lines of IndexedRelationship being pulled into the core, so is that overkill for now if there is going to be another core offering later? On Thu, Sep 8, 2011 at 2:38 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: I think we don't have to worry about backwards compatibility much yet. There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers. Indeed addNode should return the KEY_VALUE relationship and I think we should change the signature of SortedTree to turn it into IterableRelationship. No need to maintain a Node iterator, the node is always one getEndNode away. 
Niels Date: Thu, 8 Sep 2011 14:17:59 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Will have to experiment with changing my id's to be stored as longs, it does make perfect sense really that it would be better. Thanks for the hint. In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. Could end up being a significant change to SortedTree though, e.g.: sortedTree.addNode( node ); Would need to return the KEY_VALUE relationship instead of a boolean. Which not knowing where else SortedTree is used could be a large change? Maybe SortedTree would have two iterator's available a key_value relationship iterator, and a node iterator. Having a quick look at it now it seems that it could work ok that way without introducing much code duplication. On Thu, Sep 8, 2011 at 12:46 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Two longs is certainly cheaper than a string. Two longs take 128 bit and are stored in the main record of the PropertyContainer, while a String would require a 64 bit pointer in the main record of the PropertyContainer, and an additional read in the String store where the string representation will take up 256 bits. So both memory-wise, as perfomance wise, it is better to store a UUID as two long values. The main issue is something that needs a deeper fix than adding ID's. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free. Niels Date: Thu, 8 Sep 2011 11:36:11 +1200 From: bryc...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Issues with IndexedRelationship Hi Niels, Sorry I didn't quite write the bit about (1) clearly enough. 
The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship: this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING ); So if the end node has more than one incoming KEY_VALUE relationship a more than one relationship exception is thrown. Instead of the getSingleRelationship I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type). Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that might need . Is storing the UUID as two
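The fix Bryce describes for issue (1), replacing getSingleRelationship with a search over the incoming KEY_VALUE relationships for the one whose stored id matches the tree at the root end, can be sketched without Neo4j. The classes below are hypothetical: `KeyValueRel` is a plain object standing in for a KEY_VALUE relationship carrying the id of the tree it belongs to, and `KeyValueLookup.forTree` is an illustrative helper, not part of IndexedRelationship.

```java
import java.util.List;
import java.util.Optional;

// Stand-in for one incoming KEY_VALUE relationship on a leaf node, tagged
// with the id of the indexed relationship tree it belongs to (hypothetical).
final class KeyValueRel {
    final String treeId;
    final String rootName;

    KeyValueRel(String treeId, String rootName) {
        this.treeId = treeId;
        this.rootName = rootName;
    }
}

final class KeyValueLookup {
    // Instead of insisting on a single incoming relationship, scan all of
    // them and pick the one that belongs to the requested tree.
    static Optional<KeyValueRel> forTree(List<KeyValueRel> incoming, String treeId) {
        return incoming.stream()
                .filter(rel -> rel.treeId.equals(treeId))
                .findFirst();
    }
}
```

For a leaf such as A that is indexed from both N1 and N2, the list holds two entries and the lookup still resolves each tree separately. It returns only the first match, so the case Bryce flags, where the same tree links to the same leaf more than once, is not handled here.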
Re: [Neo4j] Issues with IndexedRelationship
Great work Bryce,

I do have a question though. What is the rationale for the restriction mentioned under 1)? Do you need this for the general case (to make IndexedRelationshipExpander work correctly), or do you need it for your own application to throw that exception? If the latter is the case, I think it would be important to tease out the general case and offer this new behaviour as an option.

A unique key for the index is a good idea anyway and can be added to SortedTree. Generate a UUID and store it in two long properties. That way the two values will always be read in the first fetch of the underlying PropertyContainer. A getId method on the TreeNodes can then return a String representation of the two long values.

IndexedRelationships are a relatively new development, so I think you are one of the first to actually try them out. Personally I have chosen to work directly with SortedTree, because I am working within the framework of a wrapper API, so I can integrate the functionality behind the regular createRelationshipTo and getRelationships methods. I don't think API changes will be an issue at the moment.

Niels

Date: Thu, 8 Sep 2011 10:22:11 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] Issues with IndexedRelationship

Hi,

As I mentioned a while ago, I am looking at using IndexedRelationships within my application. The major thing that was missing for me to be able to do this was IndexedRelationshipExpander being able to provide all the relationships from the leaf end of indexed relationships through to the root end. So I have been working on getting that support in there. However, in writing this I have discovered a number of other issues that I have also fixed, and at least one I am still working on.

Since I was right into the extra support for expanding the relationships, it is hard to break out these fixes as a separate commit (which I think would be ideal), so it will most likely all come in together, hopefully later today (NZ time). Just letting everyone know in case someone else is doing development against indexed relationships.

Quick rundown of the issues. Note: N -- IR(X) -- {A,B} below means there is an indexed relationship from N to A and B, of type X.

1) Exception thrown when more than one IR terminates at a given node, e.g.:

    N1 -- IR(X) -- {A,B,C,D}
    N2 -- IR(X) -- {A,X,Y,Z}

Will throw an exception when using the IndexedRelationshipExpander on either N1 or N2.

2) Start / end nodes are transposed when the IR has a direction of incoming, i.e. the IR is created against N but across a set of incoming relationships:

    N -- IR(Y) -- {A,B,C}

Will return 3 relationships N -- A, N -- B, N -- C.

I have written tests for each of these, as well as a couple of other tests.

I am still completing (1) and have a little question about it. In order to fix this I may need to introduce a unique ID stored against the IR both at the root and at the leaves. Currently the relationship type is used to name the IR at both root and leaves, but in the case above that means you can't tell from node A which KEY_VALUE relationship belongs to which IR tree without traversing the tree.

So the question is: adding this ID would mean that anyone who is already using this won't have the ID, and therefore, without care, their data will be incompatible with the updated code. This could be managed via a check for the ID when accessing the tree and, if it isn't there, doing a walk over the tree to populate all the places where it is required.

In general, in developing against this code, where do we sit on data compatibility and API compatibility?

Cheers
Bryce

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
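The fix Bryce describes for issue (1), telling apart several incoming KEY_VALUE relationships by a unique tree ID, can be sketched roughly as follows. This is only an illustration in plain Java, not the IndexedRelationship code: the `KeyValueRel` class, its `treeId` field and the `findForTree` helper are hypothetical stand-ins for Neo4j relationships carrying an ID property.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

final class TreeIdMatch {

    /** Toy stand-in for an incoming KEY_VALUE relationship (not the Neo4j type). */
    static final class KeyValueRel {
        final UUID treeId;     // hypothetical ID of the tree this relationship belongs to
        final String leafName; // name of the leaf node the relationship points at

        KeyValueRel(UUID treeId, String leafName) {
            this.treeId = treeId;
            this.leafName = leafName;
        }
    }

    /** Picks the incoming relationship that belongs to the given tree, if there is one. */
    static Optional<KeyValueRel> findForTree(List<KeyValueRel> incoming, UUID treeId) {
        for (KeyValueRel rel : incoming) {
            if (rel.treeId.equals(treeId)) {
                return Optional.of(rel);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        UUID treeOfN1 = UUID.randomUUID();
        UUID treeOfN2 = UUID.randomUUID();
        // Node A is a leaf of both trees, so it has two incoming KEY_VALUE relationships.
        List<KeyValueRel> incomingAtA = Arrays.asList(
                new KeyValueRel(treeOfN1, "A"),
                new KeyValueRel(treeOfN2, "A"));
        System.out.println(findForTree(incomingAtA, treeOfN2).isPresent());
    }
}
```

The sketch returns the first match only, so it does not address a single tree having several relationships to the same leaf node.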
Re: [Neo4j] Issues with IndexedRelationship
Two longs are certainly cheaper than a string. Two longs take 128 bits and are stored in the main record of the PropertyContainer, while a String would require a 64-bit pointer in the main record of the PropertyContainer, and an additional read in the String store, where the string representation will take up 256 bits. So both memory-wise and performance-wise, it is better to store a UUID as two long values.

The main issue is something that needs a deeper fix than adding IDs. SortedTree now returns Nodes when traversing the tree. We should however return the KEY_VALUE Relationship to the indexed Node. Then IndexedRelationship.DirectRelationship can be created with that relationship as an argument. We get the Direction and the RelationshipType for free.

Niels

Date: Thu, 8 Sep 2011 11:36:11 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Issues with IndexedRelationship

Hi Niels,

Sorry I didn't quite write the bit about (1) clearly enough. The problem is that it presently throws an Exception where it shouldn't. This stems from IndexedRelationship.DirectRelationship:

    this.endRelationship = endNode.getSingleRelationship( SortedTree.RelTypes.KEY_VALUE, Direction.INCOMING );

So if the end node has more than one incoming KEY_VALUE relationship, a "more than one relationship" exception is thrown. Instead of the getSingleRelationship I was planning on iterating over the relationships and matching the UUID stored at the root end of the IR with one of the KEY_VALUE relationships (which is why using a unique id is necessary rather than the relationship type).

Note: there will actually still be an issue if the same IR has multiple relationships to the same leaf node - still thinking about that might need .

Is storing the UUID as two longs much quicker than storing it as a string? Curious about this since in my current model I have all the domain objects with UUIDs, and these are all stored as strings. If it was going to help with either memory or performance then I would be keen to migrate this to two longs.

Cheers
Bryce
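Niels's suggestion to store a UUID as two long values maps directly onto `java.util.UUID`, which exposes its two 64-bit halves. A minimal sketch, assuming the two longs would end up in two separate properties (the class and method names here are made up for illustration):

```java
import java.util.UUID;

final class UuidAsLongs {

    /** Splits a UUID into the two long values that would be stored as properties. */
    static long[] toLongs(UUID id) {
        return new long[] { id.getMostSignificantBits(), id.getLeastSignificantBits() };
    }

    /** Rebuilds the UUID from the two stored long values. */
    static UUID fromLongs(long mostSignificant, long leastSignificant) {
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        long[] stored = toLongs(original);
        UUID restored = fromLongs(stored[0], stored[1]);
        // A getId-style method could hand out the usual string form when needed.
        System.out.println(restored.equals(original) + " " + restored);
    }
}
```

The string form is then only produced on demand, for instance by a getId-style method, rather than being stored.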
Re: [Neo4j] Issues with IndexedRelationship
I think we don't have to worry about backwards compatibility much yet. There has not been a formal release of the component, so if there are people using the software, they will accept that they are bleeding edgers.

Indeed addNode should return the KEY_VALUE relationship, and I think we should change the signature of SortedTree to turn it into Iterable<Relationship>. No need to maintain a Node iterator; the node is always one getEndNode away.

Niels

Date: Thu, 8 Sep 2011 14:17:59 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Issues with IndexedRelationship

Will have to experiment with changing my ids to be stored as longs; it does make perfect sense really that it would be better. Thanks for the hint.

In regards to SortedTree returning the KEY_VALUE relationship instead of the end Node, I had thought of that too, and it would definitely help. It could end up being a significant change to SortedTree though, e.g.:

    sortedTree.addNode( node );

would need to return the KEY_VALUE relationship instead of a boolean. Not knowing where else SortedTree is used, that could be a large change? Maybe SortedTree would have two iterators available: a KEY_VALUE relationship iterator and a node iterator. Having a quick look at it now, it seems that it could work OK that way without introducing much code duplication.
Re: [Neo4j] IndexedRelationship some observations and questions
Hi Bryce,

Sorry for my belated response. I have been away for a couple of days and wasn't able to check my emails.

I am glad you took the time to look into the IndexedRelationship module. It certainly could use some scrutiny.

Remarks:

1) Good catch... Something the unit test didn't catch because it runs in the same namespace as IndexedRelationship itself. Didn't catch it in user code either, because personally I prefer to directly call SortedTree.

2) Agreed. It should be possible to define more than one IndexedRelationship per node.

3) I haven't tried out an anonymous inner class as Comparator. As far as I can tell any object implementing Comparator<Node> should be able to work as a comparator.

Questions:

1) That is certainly an option. IndexedRelationships however offer you the possibility to sort your Relationships based on some value associated with a node (for example the creation/edit date of the document). This may be a reason to use IndexedRelationships even in the situation where you have fewer than 500 entries per tag (though it would be possible to do that sorting in memory too).

2) The end node of an IndexedRelationship is always referred to by a Relationship with RelationshipType KEY_VALUE, which has a property tree_name (both are defined in SortedTree). The tree_name property has the same value as the RelationshipType.name used in IndexedRelationship. To traverse from a leaf node to the tree root, keep following the incoming relationships: KEY_VALUE (there is only one), KEY_ENTRY (there can be many), SUB_TREE (there can be many), TREE_ROOT (there is only one).

Niels

Date: Fri, 2 Sep 2011 11:44:40 +1200
From: bryc...@gmail.com
To: user@lists.neo4j.org
Subject: [Neo4j] IndexedRelationship some observations and questions

Hi,

I have been looking at performance options for Neo4j, as presently I have been observing a number of performance issues. I am still investigating the way to get the best performance out of what I am doing, and one thing it might be is longer-running transactions stopping other work from going on (but that's an aside to what this message is about).

One of the things that I investigated using was the IndexedRelationship work by Niels. Thought I would give a bit of feedback, although I haven't quite got this implemented at present.

1) I had to change the IndexedRelationshipExpander to be a public class in order to use it outside the package it's in.

2) IndexedRelationship assumes only one tree root per node, whereas the expander allows for multiple (IndexedRelationship uses getSingleRelationship vs. the expander using getRelationships and then matching on tree name). Having multiple would obviously be good, as it means you could have two types of relationships covered by IndexedRelationships.

3) Might pay to make it clear in the Javadocs for IndexedRelationship that the comparator can't be an anonymous inner class.

Then I have some questions about usage of this. First a little background on the model I have; from reading a few things it seems quite standard. There are a lot of document nodes, each of which has a relationship with multiple tag nodes. Documents generally have in the order of 10-20 tags, and tags can have as few as 1 document and sometimes tens of thousands. When tags are viewed through the UI they are almost always displayed with a descending date-ordered list of documents. Seemed to me to fit quite well with IndexedRelationship.

1) I was thinking of having a switch-over point at, say, around 500 documents for a given node, where I will switch from using normal relationships to an IndexedRelationship, as I was thinking that at small numbers of relationships normal relationships would be quicker. Would that be correct, or not worth it?

2) On the tag end (which is the incoming end of the document-tag relationship) I was going to use an IndexedRelationshipExpander, which would cover the case of whether the relationship was done through normal relationships or through an IndexedRelationship. I also need to get a set of tags from the document end, where there may be both normal relationships and relationships coming from multiple IndexedRelationships. From looking at it, IndexedRelationshipExpander doesn't cover the reverse direction, but I would imagine using a relationship expander here would be correct. What would the best way of doing this be?

As an aside, it may be a good idea to note in the configuration settings page:
http://wiki.neo4j.org/content/Configuration_Settings#Optimizing_for_traversals_example
that -XX:+UseNUMA only works when using the Parallel Scavenger garbage collector (default or -XX:+UseParallelGC), not the concurrent mark and sweep one. Based on

Cheers
Bryce
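For the descending date ordering Bryce describes, and given his report that an anonymous inner class did not work as the comparator, a named comparator class is one way to sidestep the question. The following is a plain-Java sketch only: `Doc`, `NewestFirst` and `titlesNewestFirst` are hypothetical names, and a `Comparator<Doc>` over a toy class stands in for a comparator over Neo4j nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class DescendingDateOrder {

    /** Toy document; in the real model this would be a node with a date property. */
    static final class Doc {
        final String title;
        final long created; // creation time, e.g. milliseconds since the epoch

        Doc(String title, long created) {
            this.title = title;
            this.created = created;
        }
    }

    /** Named (non-anonymous) comparator that puts the newest document first. */
    static final class NewestFirst implements Comparator<Doc> {
        @Override
        public int compare(Doc a, Doc b) {
            return Long.compare(b.created, a.created); // arguments reversed: descending
        }
    }

    /** Returns the titles of the documents, newest first, without modifying the input. */
    static List<String> titlesNewestFirst(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        Collections.sort(sorted, new NewestFirst());
        List<String> titles = new ArrayList<>();
        for (Doc d : sorted) {
            titles.add(d.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("old", 1L), new Doc("new", 3L), new Doc("mid", 2L));
        System.out.println(titlesNewestFirst(docs));
    }
}
```

This is the in-memory sorting Niels mentions as the alternative for small tags; an index keyed with the same comparator would keep the order at write time instead.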
Re: [Neo4j] Hyperedges in Neo4j
Correct, Turing completeness is not the lower bound for non-guaranteed termination. It is however possible to have some forms of recursion without sacrificing guaranteed termination. Neo4j traversals that memorize visited paths, relationships or nodes are an example. (Note: it would be nice to have an option to memorize visited (Node, RelationshipType, Direction) triples.) This limited form of recursion is useful as a query language.

Doing so of course eliminates some correct statements. When memorizing nodes, the statement

    john (FRIEND_OF, OUTGOING) pete (FRIEND_OF, OUTGOING) john

can no longer be true, but we could memorize relationships instead of nodes. This makes the former statement possible, but makes it impossible to return the statement

    john (FRIEND_OF, OUTGOING) pete (FRIEND_OF, OUTGOING) john (FRIEND_OF, OUTGOING) pete

unless john has more than one outgoing FRIEND_OF relationship with pete. (Memorizing (Node, RelationshipType, Direction) would make that statement impossible even in the presence of more than one FRIEND_OF relationship from john to pete.)

While repeated paths in the graph are in principle true statements of the graph grammar, in many practical programming tasks we are not interested in such statements and in fact like to see them eliminated.

Niels

From: okramma...@gmail.com
Date: Thu, 1 Sep 2011 08:17:27 -0600
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Hyperedges in Neo4j

Hey,

I think a traversal should in principle be performed with a query language that is not Turing complete so we can guarantee termination.

Turing completeness is not the lower bound for non-guaranteed termination. You can't guarantee completion in a regular language when your String (data structure) is a graph. E.g. a*

The only languages guaranteed to complete are star-free languages. That is, those that don't allow for recursion.

See ya,
Marko.
http://markorodriguez.com
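The difference between memorizing nodes and memorizing relationships can be made concrete with a small path enumeration. This is a toy sketch in plain Java, not the Neo4j traversal framework: the `Rel` class, the `Uniqueness` enum and the choice to track uniqueness per path are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniquenessTraversal {

    enum Uniqueness { NODE, RELATIONSHIP }

    /** Toy directed relationship with its own identity, so parallel edges stay distinct. */
    static final class Rel {
        final int id;
        final String from;
        final String to;

        Rel(int id, String from, String to) {
            this.id = id;
            this.from = from;
            this.to = to;
        }
    }

    /** Lists every path from start that never repeats a node (or a relationship). */
    static List<String> paths(List<Rel> rels, String start, Uniqueness mode) {
        List<String> out = new ArrayList<>();
        Set<String> seenNodes = new HashSet<>();
        seenNodes.add(start);
        walk(rels, start, start, seenNodes, new HashSet<Integer>(), mode, out);
        return out;
    }

    private static void walk(List<Rel> rels, String node, String path,
                             Set<String> seenNodes, Set<Integer> seenRels,
                             Uniqueness mode, List<String> out) {
        out.add(path);
        for (Rel r : rels) {
            if (!r.from.equals(node)) {
                continue; // only follow outgoing relationships of the current node
            }
            if (mode == Uniqueness.NODE) {
                if (!seenNodes.add(r.to)) {
                    continue; // node is already on this path
                }
                walk(rels, r.to, path + " -> " + r.to, seenNodes, seenRels, mode, out);
                seenNodes.remove(r.to);
            } else {
                if (!seenRels.add(r.id)) {
                    continue; // relationship is already on this path
                }
                walk(rels, r.to, path + " -> " + r.to, seenNodes, seenRels, mode, out);
                seenRels.remove(r.id);
            }
        }
    }

    public static void main(String[] args) {
        List<Rel> friendOf = Arrays.asList(new Rel(1, "john", "pete"), new Rel(2, "pete", "john"));
        System.out.println(paths(friendOf, "john", Uniqueness.NODE));
        System.out.println(paths(friendOf, "john", Uniqueness.RELATIONSHIP));
    }
}
```

With one FRIEND_OF relationship in each direction, node uniqueness stops at john -> pete, while relationship uniqueness also yields john -> pete -> john but nothing longer. Either way the enumeration terminates, because a finite graph only has finitely many nodes and relationships to use up.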
Re: [Neo4j] API adventures in Scalaland
Peter,

I haven't put this code out yet. It has been too much in flux to share.

I use Neoclipse for visualization, which helps to check the layout of the test graphs I am using. I would need something more programmable for the visual output, since I use the node ids of the types as property names and relationshiptypes. This allows for renaming of types and makes it possible to put types in namespaces and move them from one namespace to another. Using real names for properties and relationshiptypes is not flexible enough. As a result the visualization in Neoclipse looks pretty cryptic, having only numbers as labels.

I will look into neo4j/neoviz to see if I can export the graph with proper names for the relationships and properties; otherwise I can always roll my own output program. Dot is not the most complex file format to generate.

Niels

Date: Mon, 29 Aug 2011 07:19:35 +0200
From: peter.neuba...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] API adventures in Scalaland

Niels,

Is that Scala code in the graph collections? If you want, you could use the neo4j/neoviz project to output .dot graphs at any point and thus visualize what's happening in the graph to illustrate :)

/Peter
[Neo4j] API adventures in Scalaland
In the last week I have been working on a Neo4j API in Scala, taking navigation in the graph as primary. Just like the Enhanced API written in Java, the Scala API generalizes each element (Node, Relationship, RelationshipType, property name and property value) of the Neo4j database as being a Vertex.

Before digging into the details of the Scala API, let's start with some example code.

    val name = Db(String("name"))
    val friend = Db(VertexOut("FRIEND"))
    val john = Db(NewVertex).put(name, "John")
    val pete = Db(NewVertex).put(name, "Pete").put(friend, john)

This piece of code defines the PropertyType name and the EdgeType FRIEND, creates two vertices for the persons John and Pete, and states that John is a friend of Pete.

In the standard Neo4j API this could have been written as:

    Node john = db.createNode();
    Node pete = db.createNode();
    john.setProperty("name", "John");
    pete.setProperty("name", "Pete");
    pete.createRelationshipTo(john, DynamicRelationshipType.withName("FRIEND"));

Apart from an obvious style difference, there is one immediately noticeable difference between the two APIs. In the Neo4j API it is possible to write:

    john.setProperty("name", "John");
    pete.setProperty("name", 99);

While the following Scala program won't typecheck:

    pete.put(name, 99) //ERROR

The name property is defined as a String and the API enforces that type. This also applies when fetching a property value.

In the Neo4j API we write:

    john.getProperty("name") // returns java.lang.Object

In the Scala API we write:

    john(name) // returns java.lang.String

It is also possible to ask for the name of pete's friend as follows:

    pete(friend andThen name)

This is equal to the Neo4j call:

    (String)pete.getSingleRelationship(DynamicRelationshipType.withName("FRIEND"), Direction.OUTGOING).getEndNode().getProperty("name")

If pete has more than one friend, we have to define a different key to fetch them (the VertexOut key refers to one single relationship, either the one to be created, or a singular existing relationship):

    val friends = Db(Vertices("FRIEND"))

We now have a key to all FRIEND relationships, so we can ask:

    pete(friends andThen name)

This returns an Iterable[String] with the names of all Pete's friends.

We can even do:

    pete(Rec(friends) andThen name)

This returns an Iterable[String] with the names of all Pete's friends of friends (to the n-th degree). The Rec object recursively applies friend to all vertices it traverses, remembering already taken paths, traversed Relationships or traversed Nodes (settings are optional with sensible defaults).

We can also write:

    pete(Rec(friend, 2) andThen name)

This returns an Iterable[String] with the names of all Pete's friends of friends (to the 2nd degree).

It is even possible to write:

    pete(Rec(friend andThen friend) andThen name)

This returns an Iterable[String] with the names of all Pete's friends of friends (to the n-th degree where n is even).

Instead of having get methods for properties and relationships and traversal methods on nodes, the Scala API uses one calling pattern for all database-related objects:

    object(traverser)

So a call like Db(String("name")) is not just a call on the database to return a PropertyType with name "name" and datatype String, it is a traversal from the database to that PropertyType. What is being returned by that call is a traverser itself.

Traversers can be composed with andThen, so the output of one traverser is used as input for the next traverser. All traversers are typed, so the andThen connective can only be applied when the output type of the left-hand-side traverser is equal to the input type of the right-hand-side traverser. This is checked at compile time.

Traversals not only work on Vertex objects and its subtypes (Property, PropertyType, Edge, EdgeType...), they also work on Iterable[Vertex].

Instead of fetching just pete's friends, as in:

    pete(friends)

we can also fetch the friends of pete and john:

    val frnds = List(pete, john)
    frnds(friends)

or if we don't need the frnds object later on, we simply state:

    List(pete, john)(friends)

and if we want the names of those friends:

    List(pete, john)(friends andThen name)

It is even possible to set properties or create relationships on Iterable[Vertex]:

    val age = Db(Int("age"))
    val nationality = Db(String("nationality"))
    List(pete, john).put(age, 40).put(nationality, "Irish")

This sets the age property to 40 and the nationality property to Irish on both pete and john. It is also possible to write this as a traversal:

    List(pete, john)(Put(age, 40) andThen Put(nationality, "Irish"))

All traversers are function objects, so they can both be called as a function and be treated as an object. This makes it possible to create traversers programmatically, allowing for the storage of traversers in the database, and many more nifty tricks. Using the Put object, we could for example create a list of such actions/traversals and perform a validation on the
Re: [Neo4j] partitioning the relationship store
Jim,

Can you tell me how to add my suggestions for a solution to this problem to your issue tracker?

Niels

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Tue, 16 Aug 2011 16:33:04 +0200
Subject: Re: [Neo4j] partitioning the relationship store

The partitioning is a solution to the densely-connected node problem, but would also allow for the iteration over RelationshipTypes/Directions, another feature I would very much like to see.

I have posted suggestions on how to approach this problem and would like to add those suggestions to the issue tracker so they will be taken into consideration when addressing this issue. Yet I can't find an issue in Lighthouse.

I am glad to hear it is #5 in priority order. I would be extra pleased if the devteam, when picking up this issue, would stay in touch, because Enhanced API could greatly benefit depending on the approach taken.

Niels

From: j...@neotechnology.com
Date: Tue, 16 Aug 2011 14:40:21 +0100
To: user@lists.neo4j.org
Subject: Re: [Neo4j] partitioning the relationship store

Hi Niels,

Is this partitioning an aspect of the supernode problem? If so, there is a feature request* in the devteam backlog for that.

Jim

* It is currently 5th in priority order.
[Neo4j] Subtyping
Yesterday, I added subtyping to Enhanced API.

Suppose an application has UserGroups, Users and Roles, where both UserGroups and Users are Vertices and Roles are BinaryEdges. There can be different predefined Roles, such as ADMINISTRATOR, EDITOR, MEMBER, GUEST. With subtyping it is possible to say that each of the types ADMINISTRATOR, EDITOR, MEMBER, GUEST is a subtype of ROLE.

We can now call the method user.getAllBinaryEdges(ROLE, Direction.OUTGOING), and all roles of that user will be returned. It is also possible to ask if a user has any role by calling user.hasAnyBinaryEdge(ROLE, Direction.OUTGOING).

The same applies for Properties. Suppose a user has the properties UserName, FullName, NickName. With subtyping it is possible to say that each of the types UserName, FullName, NickName is a subtype of Name.

We can now call the method user.getAllProperties(Name) and all names of that user will be returned. It is also possible to ask if a user has any name by calling user.hasAnyProperty(Name).

Niels
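The idea behind a subtype-aware lookup such as getAllBinaryEdges(ROLE, ...) can be sketched in plain Java as follows. This is not the Enhanced API implementation: the `SubtypeLookup` class, its methods, and the assumptions that every type has at most one direct supertype and that declarations are acyclic are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubtypeLookup {

    // Maps each type name to its direct supertype (assumed: at most one, no cycles).
    private final Map<String, String> superTypeOf = new HashMap<>();

    void declareSubtype(String subType, String superType) {
        superTypeOf.put(subType, superType);
    }

    /** True if type equals ancestor or reaches it by following supertype links. */
    boolean isSubtypeOf(String type, String ancestor) {
        String current = type;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = superTypeOf.get(current);
        }
        return false;
    }

    /** Keeps the edge types that are the requested type or one of its subtypes. */
    List<String> select(List<String> edgeTypes, String requested) {
        List<String> result = new ArrayList<>();
        for (String type : edgeTypes) {
            if (isSubtypeOf(type, requested)) {
                result.add(type);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SubtypeLookup types = new SubtypeLookup();
        types.declareSubtype("ADMINISTRATOR", "ROLE");
        types.declareSubtype("EDITOR", "ROLE");
        // The edge types found on some user; KNOWS is unrelated to ROLE.
        List<String> edgesOfUser = Arrays.asList("ADMINISTRATOR", "KNOWS", "EDITOR");
        System.out.println(types.select(edgesOfUser, "ROLE"));
    }
}
```

A hasAny-style check then amounts to asking whether the selection is non-empty, and the same lookup would apply to property types such as UserName and Name.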
[Neo4j] partitioning the relationship store
At the risk of coming off as an utter bore, I would like once more to raise awareness for the fact that the relationships of a node are currently stored as one linked list. The downside of this has been discussed in many posts, so I shan't rehash the points. It's just that whatever I try to implement, this one issue keeps me from making the progress I would want to make. I know that the issue will be addressed some day, I would just want to ask the Neo team to give it priority. I am almost inclined to fork the kernel and do it myself, but I don't want to do that for obvious reasons. Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Subtyping
Later today I will push the changes to Git, including tests. Date: Tue, 16 Aug 2011 14:51:42 +0200 From: peter.neuba...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Subtyping Very cool. Is there a test demonstrating it? /peter Sent from my phone. On Aug 16, 2011 1:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Yesterday, I added subtyping to Enhanced API. Suppose an application has UserGroups, Users and Roles, where both UserGroups and Users are Vertices and Roles are BinaryEdges. There can be different predefined Roles, such as ADMINISTRATOR, EDITOR, MEMBER, GUEST. With subtyping it is possible to say that each of the types ADMINISTRATOR, EDITOR, MEMBER, GUEST is a subtype of ROLE. We can now call the method user.getAllBinaryEdges(ROLE, Direction.OUTGOING), and all roles of that user will be returned. It is also possible to ask if a user has any role by calling user.hasAnyBinaryEdge(ROLE, Direction.OUTGOING). The same applies for Properties. Suppose a user has the properties: UserName, FullName, NickName. With subtyping it is possible to say that each of the types UserName, FullName, NickName is a subtype of Name. We can now call the method user.getAllProperties(Name) and all names of that user will be returned. It is also possible to ask if a user has any name by calling user.hasAnyProperty(Name). Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] partitioning the relationship store
The partitioning is a solution to the densely-connected node problem, but would also allow for the iteration over RelationshipTypes/Directions, another feature I would very much like to see. I have posted suggestions on how to approach this problem and would like to add those suggestions to the issue tracker so they will be taken into consideration when addressing this issue. Yet I can't find an issue in Lighthouse. I am glad to hear it is #5 in priority order. I would be extra pleased if devteam when picking up this issue would stay in touch, because Enhanced API could greatly benefit depending on the approach taken. Niels From: j...@neotechnology.com Date: Tue, 16 Aug 2011 14:40:21 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] partitioning the relationship store Hi Niels, Is this partitioning an aspect of the supernode problem? If so, there is a feature request* in the devteam backlog for that. Jim * It is currently 5th in priority order. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] n-ary relationships
Hi Emerson, Over the last couple of weeks, I have been working on an implementation of n-ary relationships on top of Neo4j. I also detailed how n-ary relationships could in principle be implemented in the database kernel (see: http://lists.neo4j.org/pipermail/user/2011-August/011191.html). Right now I am working on traversals for n-ary relationships, in an attempt to remove the unnaturalness you describe. If we look at your example and use the nomenclature of Enhanced API, you'd have an EdgeType REFERS with three ConnectorTypes: Referrer, Referree, Course, such that we can create the following Edge:

REFERS
  Referrer -- paul
  Referree -- john
  Course -- history

A traversal takes as input a Vertex (strictly speaking a Traversal, of which a Vertex is a subclass), and takes two ConnectorTypes to traverse from a Vertex to an Edge to a Vertex. So if you want to know the Referrers for the course history, the traversal would be defined like:

  //create a traversal description
  TraversalDescription descr = TraversalDescription.add(Course, Referrer);
  //traverse the graph based on the description, starting from the vertex history
  descr.traverse(history);

The traverse method returns an Iterable<Path>, which in this case contains only one Path. The path consists of two Connections, (history, Course) and (paul, Referrer). I hope this somehow answers your question. Niels Date: Sun, 14 Aug 2011 18:57:22 +0200 From: emerson.farru...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] n-ary relationships Hi, I started looking into Neo4j this morning, and played with some domain models to see whether it really passes a whiteboard friendliness test. I'm really after a persistence solution that makes it straightforward to persist a domain model designed from a DDD perspective, and Neo4j is looking promising so far. The one aspect that's not too clear is how to model n-ary relationships, and I'm curious as to how those of you with experience with Neo4j and graph databases would approach it. 
An edge connects two vertices, so in a graph database, a relationship connects two nodes. But when modeling, there are frequently relationships between multiple entities. For example, student John attends a History course at university, and John was referred to the History course by Paul. This relationship relates John, Paul, and the History course. There are a few ways I can think of to model this. 1) A node John has an ATTENDS relationship with node History, and the relationship has a referrer property with Paul's ID. Simple, but keeping IDs as properties seems like an anti-pattern. 2) A node Referral has a CREATED_BY relationship with node Paul, a FOR_COURSE relationship with node History, and a TO_STUDENT relationship with node John. It's effectively a three-way join. 3) Same as 2, but with an additional ATTENDS relationship between John and History. This is particularly useful if a course attendant may attend a course without being referred. This might not be the best example in the world, but it should drive my point home: when relationships have a degree higher than 2, relationships need to be modelled as vertices to overcome the binary nature of edges. Is this expected behavior that's part and parcel of graph databases, or am I approaching the modeling incorrectly somehow? My concern is that traversals may become unnatural when this happens. Say I want to iterate over the attendants of a course, and show the name of who referred them when I do so. Will I have the graph database equivalent of n+1 selects because the data I want to extract (referrer name) is in a different node (Paul) to my node of interest (John), instead of in the relationship to it (attends)? Any tips and opinions would be appreciated. Cheers, Emerson ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
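The connector-pair traversal from the reply at the top of this message can be sketched with a small in-memory model. Everything here is an assumption made for illustration: the class NaryTraversalSketch and its Edge type are hypothetical, each connector holds a single vertex, and the real Enhanced API returns paths rather than bare vertex names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an n-ary edge; NOT the Enhanced API classes.
final class NaryTraversalSketch {

    // An edge of some type with one vertex attached per connector type (a simplification).
    static final class Edge {
        final String type;
        final Map<String, String> connections = new LinkedHashMap<>();

        Edge(String type) {
            this.type = type;
        }

        Edge connect(String connectorType, String vertex) {
            connections.put(connectorType, vertex);
            return this;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void add(Edge edge) {
        edges.add(edge);
    }

    // Enters every edge that has `start` attached to connector `from`,
    // and leaves it through connector `to`, collecting the vertex found there.
    List<String> traverse(String start, String from, String to) {
        List<String> reached = new ArrayList<>();
        for (Edge edge : edges) {
            if (start.equals(edge.connections.get(from)) && edge.connections.containsKey(to)) {
                reached.add(edge.connections.get(to));
            }
        }
        return reached;
    }

    // The REFERS example from the thread: paul referred john to the history course.
    static NaryTraversalSketch example() {
        NaryTraversalSketch graph = new NaryTraversalSketch();
        graph.add(new Edge("REFERS")
                .connect("Referrer", "paul")
                .connect("Referree", "john")
                .connect("Course", "history"));
        return graph;
    }

    public static void main(String[] args) {
        NaryTraversalSketch graph = example();
        System.out.println(graph.traverse("history", "Course", "Referrer"));
        System.out.println(graph.traverse("history", "Course", "Referree"));
    }
}
```

In this model the referrer and the referree sit on the same edge, so listing the attendants of a course together with their referrers is one pass over the edges attached to the course rather than a separate lookup per attendant.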
Re: [Neo4j] Is data lost if the object graph and relationships are changed?
All your existing relationships will remain the same, unless you remove them yourself. If you make your hypothetical changes, all Persons will keep a relationship to Address through the RESIDES_AT relationship, even though you now create a new ContactInfo entity that connects to Address too. So unless you remove RESIDES_AT relationships, there will be two paths from a Person to an Address: from Person via RESIDES_AT to Address and from Person via CONTACT_BY, ContactInfo, BY_ADDRESS. Niels From: e...@nextideapartners.com To: user@lists.neo4j.org Date: Mon, 15 Aug 2011 18:41:53 -0400 Subject: [Neo4j] Is data lost if the object graph and relationships are changed? Hypothetical example, let's say I'm building a system and I want to capture Person and Address entities, I might model it like this Person ---(RESIDES_AT)--- Address Assume that the relationship is bi-directionally, so whether I have a person or address entity, I can always find the other. After 6 months of running in production, we now need to capture phone numbers and email addresses, so we decide to create a new entity, ContactInfo ---(BY_ADDRESS)--- Address Person ---(CONTACT_BY)--- ContactInfo ---(BY_PHONE) ---Phone ---(BY_EMAIL) --- Email So we introduced a new entity, ContactInfo, which has relationships to Address, Phone, and Email entities. My question is, since Address was originally related to Person but is now related to ContactInfo via Person, does neo4j automatically pick up the address details from the ContactInfo relationship for all Persons who used the prior relationship? This is important because change is inevitable, so I want to make sure existing data is not lost simply because a relationship was re-mapped in the java object hierarchy. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
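The point made in the reply above, that adding the ContactInfo relationships leaves the old RESIDES_AT relationships untouched, can be illustrated with a toy graph. This is only a sketch under stated assumptions: SchemaChangeSketch and its methods are hypothetical and do not use the Neo4j API; the relationship type names are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy graph; it only mimics typed, directed relationships.
final class SchemaChangeSketch {

    // Each relationship is stored as {from, type, to}.
    private final List<String[]> relationships = new ArrayList<>();

    void relate(String from, String type, String to) {
        relationships.add(new String[] {from, type, to});
    }

    // Follows a chain of relationship types from `start` and returns the nodes reached.
    List<String> follow(String start, String... types) {
        List<String> frontier = List.of(start);
        for (String type : types) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String[] rel : relationships) {
                    if (rel[0].equals(node) && rel[1].equals(type)) {
                        next.add(rel[2]);
                    }
                }
            }
            frontier = next;
        }
        return frontier;
    }

    static SchemaChangeSketch example() {
        SchemaChangeSketch graph = new SchemaChangeSketch();
        graph.relate("person", "RESIDES_AT", "address");      // original model
        graph.relate("person", "CONTACT_BY", "contactInfo");  // added later
        graph.relate("contactInfo", "BY_ADDRESS", "address"); // added later
        return graph;
    }

    public static void main(String[] args) {
        SchemaChangeSketch graph = example();
        System.out.println(graph.follow("person", "RESIDES_AT"));
        System.out.println(graph.follow("person", "CONTACT_BY", "BY_ADDRESS"));
    }
}
```

Both calls reach the same address node, which is the "two paths" situation described in the reply; nothing is migrated automatically, and the old path disappears only if the RESIDES_AT relationships are removed explicitly.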
Re: [Neo4j] Is data lost if the object graph and relationships are changed?
Relationships can't be changed. They are created from one Node to another Node with a certain RelationshipType, and can only be removed. All Relationships you create can be navigated. If your original code did something like: person.getSingleRelationship(RESIDES_AT, Direction.OUTGOING).getEndNode(), you will now have to do something like:

  Node addressNode = null;
  // visit every ContactInfo node attached to the person
  for (Relationship rel : person.getRelationships(CONTACT_BY, Direction.OUTGOING)) {
    Node contactInfo = rel.getEndNode();
    // only a ContactInfo that has an address leads to an Address node
    if (contactInfo.hasRelationship(BY_ADDRESS, Direction.OUTGOING)) {
      addressNode = contactInfo.getSingleRelationship(BY_ADDRESS, Direction.OUTGOING).getEndNode();
    }
  }

Niels From: e...@nextideapartners.com To: user@lists.neo4j.org Date: Mon, 15 Aug 2011 22:14:05 -0400 Subject: Re: [Neo4j] Is data lost if the object graph and relationships are changed? OK, but after making changes to the relationships, does the graph service automatically allow me to navigate from Person to ContactInfo to Address? -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen Sent: Monday, August 15, 2011 10:08 PM To: user@lists.neo4j.org Subject: Re: [Neo4j] Is data lost if the object graph and relationships are changed? All your existing relationships will remain the same, unless you remove them yourself. If you make your hypothetical changes, all Persons will keep a relationship to Address through the RESIDES_AT relationship, even though you now create a new ContactInfo entity that connects to Address too. So unless you remove RESIDES_AT relationships, there will be two paths from a Person to an Address: from Person via RESIDES_AT to Address and from Person via CONTACT_BY, ContactInfo, BY_ADDRESS. Niels From: e...@nextideapartners.com To: user@lists.neo4j.org Date: Mon, 15 Aug 2011 18:41:53 -0400 Subject: [Neo4j] Is data lost if the object graph and relationships are changed? 
Hypothetical example, let's say I'm building a system and I want to capture Person and Address entities, I might model it like this Person ---(RESIDES_AT)--- Address Assume that the relationship is bi-directionally, so whether I have a person or address entity, I can always find the other. After 6 months of running in production, we now need to capture phone numbers and email addresses, so we decide to create a new entity, ContactInfo ---(BY_ADDRESS)--- Address Person ---(CONTACT_BY)--- ContactInfo ---(BY_PHONE) ---Phone ---(BY_EMAIL) --- Email So we introduced a new entity, ContactInfo, which has relationships to Address, Phone, and Email entities. My question is, since Address was originally related to Person but is now related to ContactInfo via Person, does neo4j automatically pick up the address details from the ContactInfo relationship for all Persons who used the prior relationship? This is important because change is inevitable, so I want to make sure existing data is not lost simply because a relationship was re-mapped in the java object hierarchy. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Enhanced API wiki page
http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Aug 10, 2011 at 1:19 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Today I updated the wiki page for Enhanced API. Since the last edit many changes have taken place, so it was to to reflect those changes on the wiki page. See: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API I also changed what was previously called an EdgeRole into a Connector. Every Edge has a number of Connectors to which Vertices connect. The EdgeType of an Edge defines the ConnectorTypes of the Connectors of an Edge. Each ConnectorType and with that a Connector, has a ConnectionMode, which can be one of these four: Unrestricted: An Edge can connect to an unlimited number of Vertices through a Connector with an unrestricted mode, and a Vertex can have an unlimited number of connected Edges with a ConnectorType with an unrestricted ConnectionMode. Injective: An Edge can connect to only one Vertex through a Connector with injective mode, but a Vertex can have an unlimited number of connected Edges with a ConnectorType with an injective ConnectionMode. Surjective: An Edge can connect to an unlimited number of Vertices through a Connector with a surjective mode, but a Vertex can only have one Edge connected to it with a ConnectorType with a surjective ConnectionMode. Bijective: An Edge can connect to only one Vertex through a Connector with bijective mode, and a Vertex can only have one Edge connected to it with a ConnectorType with a bijective ConnectionMode. All ConnectionModes have been implemented. The switch from EdgeRole to Connector with ConnectionModes has eliminated some of the more annoying type parameters found in the previous incarnation of Enhanced API. 
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Enhanced API wiki page
Hi Peter, The API is indeed a bit heavy to grasp, if you want to use N-ary edges. I don't know how to make that simpler without sacrificing functionality. For binary edges and properties, the API is very similar to the standard Neo4j API, give or take some details. I have given considerable thought to how to traverse these hyperedges, and the answer is stunningly simple: the same way we would traverse a binary edge. Right now we traverse from a Node to another Node by means of a RelationshipType (given a Direction). We could also say in Enhanced API parlance that we traverse from a Vertex to a BinaryEdge following the StartConnector, then use the EndConnector to reach another Vertex. So traversing the graph requires that we provide a pair of Connectors. This works the same for N-ary edges: we still provide a pair of Connectors, helping to build the path we want to return. Example: Suppose we have stored the fact that Tom, Dick and Harry give Flo and Eddie a Book and a Bicycle, as explained on the Wiki page. Suppose all people in the database can also be FRIENDs to other people. Now suppose we want to know the people who are friends of the people that Tom has given a gift to. We provide the traverser with (Giver, Recipient, GIFT) and with (StartConnector, EndConnector, FRIEND). Now we can of course further simplify this by making each step in the traversal follow only one connector: (Giver, GIFT) (Recipient, GIFT) (StartConnector, FRIEND) (EndConnector, FRIEND). This way we can traverse not only from Vertex to Vertex (via an Edge), but also from a Vertex to an Edge, to a property on that Edge. Since we want to return a path through the graph, we need to provide a list of Connectors describing how to get there. Interestingly enough the arity of an Edge has no impact on how the graph is traversed. It takes one connector to get to an Edge, and it takes one connector to get away from an Edge. 
Niels From: peter.neuba...@neotechnology.com Date: Thu, 11 Aug 2011 20:46:59 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API wiki page Nils, interesting approaches! However, IMHO the API is still too heavy to grasp with ConnectorType, EdgeElement, EdgeType and Edge being involved in creating connections between facts. Is anyone seeing a more fluent/concise approach to this? Also, did you have some ideas about how to traverse or query these hyperedges? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Aug 10, 2011 at 1:19 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Today I updated the wiki page for Enhanced API. Since the last edit many changes have taken place, so it was to to reflect those changes on the wiki page. See: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API I also changed what was previously called an EdgeRole into a Connector. Every Edge has a number of Connectors to which Vertices connect. The EdgeType of an Edge defines the ConnectorTypes of the Connectors of an Edge. Each ConnectorType and with that a Connector, has a ConnectionMode, which can be one of these four: Unrestricted: An Edge can connect to an unlimited number of Vertices through a Connector with an unrestricted mode, and a Vertex can have an unlimited number of connected Edges with a ConnectorType with an unrestricted ConnectionMode. Injective: An Edge can connect to only one Vertex through a Connector with injective mode, but a Vertex can have an unlimited number of connected Edges with a ConnectorType with an injective ConnectionMode. 
Surjective: An Edge can connect to an unlimited number of Vertices through a Connector with a surjective mode, but a Vertex can only have one Edge connected to it with a ConnectorType with a surjective ConnectionMode. Bijective: An Edge can connect to only one Vertex through a Connector with bijective mode, and a Vertex can only have one Edge connected to it with a ConnectorType with a bijective ConnectionMode. All ConnectionModes have been implemented. The switch from EdgeRole to Connector with ConnectionModes has eliminated some of the more annoying type parameters found in the previous incarnation of Enhanced API. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] length of property names
I find myself using some pretty long property names, like org.neo4j.collections.graphdb.node_id and wonder if this has an impact on performance. Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Subject: length of property names Date: Mon, 8 Aug 2011 15:44:20 +0200 Quick question: what is the performance impact of the length of a property name? Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] length of property names
Thanks Mattias Date: Wed, 10 Aug 2011 15:25:24 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] length of property names No, none whatsoever (if you don't count the potentially slightly longer for-loop in String#equals which maps from String to internal ID (integer) used in Neo4j). 2011/8/10 Niels Hoogeveen pd_aficion...@hotmail.com I find myself using some pretty long property names, like org.neo4j.collections.graphdb.node_id and wonder if this has an impact on performance. Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Subject: length of property names Date: Mon, 8 Aug 2011 15:44:20 +0200 Quick question: what is the performance impact of the length of a property name? Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Enhanced API wiki page
Today I updated the wiki page for Enhanced API. Since the last edit many changes have taken place, so it was time to reflect those changes on the wiki page. See: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API I also changed what was previously called an EdgeRole into a Connector. Every Edge has a number of Connectors to which Vertices connect. The EdgeType of an Edge defines the ConnectorTypes of the Connectors of an Edge. Each ConnectorType, and with that each Connector, has a ConnectionMode, which can be one of these four:

Unrestricted: An Edge can connect to an unlimited number of Vertices through a Connector with an unrestricted mode, and a Vertex can have an unlimited number of connected Edges with a ConnectorType with an unrestricted ConnectionMode.

Injective: An Edge can connect to only one Vertex through a Connector with injective mode, but a Vertex can have an unlimited number of connected Edges with a ConnectorType with an injective ConnectionMode.

Surjective: An Edge can connect to an unlimited number of Vertices through a Connector with a surjective mode, but a Vertex can only have one Edge connected to it with a ConnectorType with a surjective ConnectionMode.

Bijective: An Edge can connect to only one Vertex through a Connector with bijective mode, and a Vertex can only have one Edge connected to it with a ConnectorType with a bijective ConnectionMode.

All ConnectionModes have been implemented. The switch from EdgeRole to Connector with ConnectionModes has eliminated some of the more annoying type parameters found in the previous incarnation of Enhanced API. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
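The four ConnectionModes in the message above boil down to two independent limits, which the following sketch encodes as an enum. The enum constants mirror the mode names from the message, but the two flags and the allows method are hypothetical names chosen for illustration and are not taken from the Enhanced API source.

```java
// Hypothetical encoding of the four ConnectionModes as two cardinality limits.
enum ConnectionMode {
    UNRESTRICTED(false, false),
    INJECTIVE(true, false),
    SURJECTIVE(false, true),
    BIJECTIVE(true, true);

    // True if an Edge may connect to only one Vertex through the Connector.
    final boolean oneVertexPerEdge;

    // True if a Vertex may have only one Edge connected with this ConnectorType.
    final boolean oneEdgePerVertex;

    ConnectionMode(boolean oneVertexPerEdge, boolean oneEdgePerVertex) {
        this.oneVertexPerEdge = oneVertexPerEdge;
        this.oneEdgePerVertex = oneEdgePerVertex;
    }

    // Decides whether one more connection may be added, given how many vertices the
    // edge's connector already holds and how many edges of this connector type the
    // vertex already has.
    boolean allows(int verticesOnConnector, int edgesOnVertex) {
        if (oneVertexPerEdge && verticesOnConnector >= 1) {
            return false;
        }
        if (oneEdgePerVertex && edgesOnVertex >= 1) {
            return false;
        }
        return true;
    }
}
```

For example, ConnectionMode.INJECTIVE.allows(1, 0) is false because the connector already holds a vertex, while ConnectionMode.SURJECTIVE.allows(3, 0) is true because only the per-vertex limit applies in that mode.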
Re: [Neo4j] Enhanced API wiki page
I should of course market this work better. So hereby the statement: NOW with nice and handy images, free of charge!!! Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Wed, 10 Aug 2011 01:19:42 +0200 Subject: [Neo4j] Enhanced API wiki page Today I updated the wiki page for Enhanced API. Since the last edit many changes have taken place, so it was to to reflect those changes on the wiki page. See: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API I also changed what was previously called an EdgeRole into a Connector. Every Edge has a number of Connectors to which Vertices connect. The EdgeType of an Edge defines the ConnectorTypes of the Connectors of an Edge. Each ConnectorType and with that a Connector, has a ConnectionMode, which can be one of these four: Unrestricted: An Edge can connect to an unlimited number of Vertices through a Connector with an unrestricted mode, and a Vertex can have an unlimited number of connected Edges with a ConnectorType with an unrestricted ConnectionMode. Injective: An Edge can connect to only one Vertex through a Connector with injective mode, but a Vertex can have an unlimited number of connected Edges with a ConnectorType with an injective ConnectionMode. Surjective: An Edge can connect to an unlimited number of Vertices through a Connector with a surjective mode, but a Vertex can only have one Edge connected to it with a ConnectorType with a surjective ConnectionMode. Bijective: An Edge can connect to only one Vertex through a Connector with bijective mode, and a Vertex can only have one Edge connected to it with a ConnectorType with a bijective ConnectionMode. All ConnectionModes have been implemented. The switch from EdgeRole to Connector with ConnectionModes has eliminated some of the more annoying type parameters found in the previous incarnation of Enhanced API. 
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Enhanced API rewrite
I can probably find the time for that. It would be fun working on these ideas in collaboration. I don't mind producing my usual brain-dumps and write some of the code, but quality will certainly improve when it is more than just me paying attention to this. Niels From: peter.neuba...@neotechnology.com Date: Mon, 8 Aug 2011 11:50:35 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API rewrite Very interesting thoughts! I would love to have a bootcamp and explore a spike on how this would work out in practice. Got anything to do this autumn? ;) Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sun, Aug 7, 2011 at 4:30 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Hi Peter, Thanks for showing an interest. A Property is indeed a unary edge in the Enhanced API and therefore (potentially) backed by a Node, but that Node doesn't contain the value. All property values are still stored the way they are stored in the standard API. If someone however decides to add a Property to a Property or create an Edge containing that Property, a Node will be created to store those properties and connect those Edges to. When the associated Node of a Property is created, the ID of that Node will be stored in the PropertyContainer of that property. Example: Suppose we have a property on a Person Vertex that denotes a personal identity number, and the user of the application want to annually check that identity number against some other database and state when it was last verified and who verified it. 
A Vertex (backed by a Node) for a particular Person is created and the property is set (in that Node's PropertyContainer), just like it would be the case in the standard API. When the verification is done, an additional property is created on the PropertyContainer of that Person with the name org.neo4j.collections.graphdb.[propertyname].node_id This property contains the node ID of the associated property. On that node the verification date will be set and the BinaryEdge (in principle nothing but a classic Relationship) will be created to the Person Vertex of the one who verified the personal identity code. It is certainly true that everything being a Vertex makes the Node implementation more important than ever before, but it goes even further, apart from a standard Vertex and the various VertexTypes, almost everything is an Edge. So I would say the Relationship implementation is becoming eminently important. There are certainly several tweaks to the storage layer I would love to see incorporated, mostly to hide the implementation for the user and to make sure that the maintenance of IDs takes place in core and not in a layer on top of core. In fact all of Enhanced API could much better be maintained in core, something that can actually quite easily be implemented. One of my ulterior motives with the development of Enhanced API is to tease out the technical requirements to push this functionality into core (whether Neo Tech decides to do so, is another question of course). Since the Neo4j database consists mostly of records and linked lists, the technical requirements to push things into core, are mostly a question of adding entry-points to linked lists in some records and partitioning some existing linked lists. I will write down those requirements in a separate post. This will include support for N-ary edges, since that is actually not all that difficult to implement and adds very little complexity to the database. 
Yes, traversals will become much more generalized in the Enhanced API, especially when we make them composable. In fact composable traversal descriptions can easily be seen as a query language giving access to all parts of the database. Niels From: peter.neuba...@neotechnology.com Date: Sun, 7 Aug 2011 09:10:02 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API rewrite Niels, this sounds very interesting. Given the role of properties being unary edges, that would mean that any classic Neo4j property would now be a Node with one Property in the new Vertex sense? Having Vertices for EVERYTHING will of course make the node-implementation much more important than anything else, since every element is backed by a node, possibly with some property. I wonder how this would reflect in the storage layer that might need to be tweaked. Also, as you point out, traversals will become quite
Re: [Neo4j] Enhanced API rewrite
Hi Dmitri, I would very much appreciate it if you tried out Enhanced API and gave me feedback about your findings. Apart from traversals it is more or less feature complete, but it could use some thorough testing. Niels Date: Mon, 8 Aug 2011 20:20:14 +0500 From: shaban...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API rewrite I ready to jump in too ;-) On Mon, Aug 8, 2011 at 3:37 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I can probably find the time for that. It would be fun working on these ideas in collaboration. I don't mind producing my usual brain-dumps and write some of the code, but quality will certainly improve when it is more than just me paying attention to this. Niels -- Dmitriy Shabanov ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Enhanced API rewrite
Hi Peter, Thanks for showing an interest. A Property is indeed a unary edge in the Enhanced API and therefore (potentially) backed by a Node, but that Node doesn't contain the value. All property values are still stored the way they are stored in the standard API. If someone however decides to add a Property to a Property or create an Edge containing that Property, a Node will be created to store those properties and connect those Edges to. When the associated Node of a Property is created, the ID of that Node will be stored in the PropertyContainer of that property. Example: Suppose we have a property on a Person Vertex that denotes a personal identity number, and the user of the application want to annually check that identity number against some other database and state when it was last verified and who verified it. A Vertex (backed by a Node) for a particular Person is created and the property is set (in that Node's PropertyContainer), just like it would be the case in the standard API. When the verification is done, an additional property is created on the PropertyContainer of that Person with the name org.neo4j.collections.graphdb.[propertyname].node_id This property contains the node ID of the associated property. On that node the verification date will be set and the BinaryEdge (in principle nothing but a classic Relationship) will be created to the Person Vertex of the one who verified the personal identity code. It is certainly true that everything being a Vertex makes the Node implementation more important than ever before, but it goes even further, apart from a standard Vertex and the various VertexTypes, almost everything is an Edge. So I would say the Relationship implementation is becoming eminently important. There are certainly several tweaks to the storage layer I would love to see incorporated, mostly to hide the implementation for the user and to make sure that the maintenance of IDs takes place in core and not in a layer on top of core. 
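The lazily created property node from the example above can be sketched as follows. This is a rough illustration under stated assumptions: PropertyNodeSketch, its maps and the property names idNumber and verifiedOn are hypothetical, node IDs are simulated with a counter, and only the key pattern org.neo4j.collections.graphdb.[propertyname].node_id comes from the message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a property that gets a backing node only when it is needed.
final class PropertyNodeSketch {

    // Stands in for the PropertyContainer of the Person node.
    final Map<String, Object> personProperties = new HashMap<>();

    // Stands in for the node store: node ID -> properties of that node.
    final Map<Long, Map<String, Object>> nodes = new HashMap<>();

    private long nextNodeId = 1;

    // Key under which the ID of a property's backing node is kept (pattern from the message).
    static String nodeIdKey(String propertyName) {
        return "org.neo4j.collections.graphdb." + propertyName + ".node_id";
    }

    // Returns the backing node of a property, creating it on first use.
    long propertyNode(String propertyName) {
        String key = nodeIdKey(propertyName);
        Object existing = personProperties.get(key);
        if (existing != null) {
            return (Long) existing;
        }
        long id = nextNodeId++;
        nodes.put(id, new HashMap<>());
        personProperties.put(key, id);
        return id;
    }

    public static void main(String[] args) {
        PropertyNodeSketch sketch = new PropertyNodeSketch();
        // The value itself stays in the person's property container.
        sketch.personProperties.put("idNumber", "123-45");
        // Only the verification data goes onto the property's backing node.
        long nodeId = sketch.propertyNode("idNumber");
        sketch.nodes.get(nodeId).put("verifiedOn", "2011-08-07");
        System.out.println(sketch.personProperties.keySet());
        System.out.println(sketch.nodes.get(nodeId));
    }
}
```

A property that never receives properties or edges of its own never triggers propertyNode, so it costs nothing beyond the ordinary property entry.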
In fact, all of Enhanced API could be maintained much better in core, something that can actually be implemented quite easily. One of my ulterior motives with the development of Enhanced API is to tease out the technical requirements for pushing this functionality into core (whether Neo Tech decides to do so is another question, of course). Since the Neo4j database consists mostly of records and linked lists, the technical requirements for pushing things into core are mostly a question of adding entry points to linked lists in some records and partitioning some existing linked lists. I will write down those requirements in a separate post. This will include support for N-ary edges, since that is actually not all that difficult to implement and adds very little complexity to the database.

Yes, traversals will become much more generalized in the Enhanced API, especially when we make them composable. In fact, composable traversal descriptions can easily be seen as a query language giving access to all parts of the database.

Niels

From: peter.neuba...@neotechnology.com
Date: Sun, 7 Aug 2011 09:10:02 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Enhanced API rewrite

Niels, this sounds very interesting. Given the role of properties being unary edges, would that mean that any classic Neo4j property would now be a Node with one Property in the new Vertex sense? Having Vertices for EVERYTHING will of course make the node implementation much more important than anything else, since every element is backed by a node, possibly with some property. I wonder how this would be reflected in the storage layer, which might need to be tweaked. Also, as you point out, traversals will become quite different with this API, but let's see what the weekend brings ;)

Cheers,
/peter neubauer
Re: [Neo4j] Node#getRelationshipTypes
Yes, let's not argue about something as elusive as the definition of low hanging fruit. In the mean time I wrote down my suggestions for store refactoring more succinctly and added some more suggestions.

Niels

Date: Sun, 7 Aug 2011 22:09:48 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Node#getRelationshipTypes

2011/8/6 Niels Hoogeveen pd_aficion...@hotmail.com

This is the thread about store layer changes for type/direction, and in my opinion this is still quite low hanging fruit. Sure, the impact needs to be tested rigorously, which may take considerable time, but the implementation is quite straight-forward and the potential gains are large.

Agreeing to disagree. Implementing it shouldn't be very hard, but that's only a small part of it. It would require quite hefty amounts of testing to be considered production quality... not even mentioning writing and testing migration of existing databases. Or we just have different views of what kind of fruit to consider low hanging.
[Neo4j] sub-graphs
While I am at it, let's post another brain dump.

A couple of weeks ago, I worked on SortedTree/IndexRelationships in an attempt to solve the densely-connected-node problem. SortedTree is a B-tree laid out in the graph, sorted by some function on a node (e.g. the nodeId, or a property value). This approach worked, to a degree, but at some point load performance degrades because of reorganizations of the tree. Too much memory is needed to keep the entire tree in memory, and standard nodes and relationships are simply too fine-grained for the job. Instead of loading each individual node and each individual relationship in an index block, it would be nice to be able to load the entire block with one read operation, and to swap out an entire memory block when memory is needed.

This brought me to the idea of sub-graphs. Let's say every node (and possibly relationship) is a graph, containing nodes and relationships. Each graph has its own store (if the contained graph is not empty). Relationships are lightweight (offset-based) when associated with Nodes (and possibly Relationships) within the same graph, but require an extra store_id when associating with Nodes (and possibly Relationships) outside that graph. This gives control over where things are stored, and what is stored together.

Using RelationshipRoles, as I described in another post, we can state which association of a Relationship is certainly stored locally, which is certainly stored in another Graph, and which is stored either locally or in another Graph. This way we can have full control over the locality of the associations of a Relationship.

If we make each index block of SortedTree its own graph, we can make sure that all Relationship associations are local, except the ones eventually pointing to the Nodes we want indexed; those are certainly stored in another Graph.
This way the store will only contain Nodes and Relationships belonging to that index block, so we can load the entire store into a set of buffers and flush those buffers when they are no longer needed.

This approach could also be used for sharding the database. Since each node in the graph can be a store of its own, we have a natural means to distribute graphs over different shards. Let's define a shard as a set of graphs whose membership is decided by some rules defined on the RelationshipTypes used in the shard. We could add the following options to the RelationshipRoles: must be in shard, may be in shard and must not be in shard. This way the RelationshipRoles used in a store determine the dependencies of that store.

RelationshipRoles can form 9 possible combinations of settings over the locality of each Relationship association, one of which is impossible and some of which are tautological or inconsequential:

Must be in store and Must be in shard (is tautological)
Must be in store and May be in shard (is inconsequential)
Must be in store and Must not be in shard (is impossible)
May be in store and Must be in shard
May be in store and May be in shard (is inconsequential)
May be in store and Must not be in shard (is inconsequential)
Must not be in store and Must be in shard
Must not be in store and May be in shard (is inconsequential)
Must not be in store and Must not be in shard (is tautological)

So that leaves the following RelationshipRole options:

Must be in store
May be in store and Must be in shard
May be in shard
Must not be in store and Must be in shard
Must not be in store
Must not be in shard

The default RelationshipRoles of a standard binary relationship are StartNode and EndNode, both of which will have Must be in store as their default setting. This way an implementation of such an approach remains backwards compatible.
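As a small illustration of the nine combinations, the sketch below enumerates them and flags the one that cannot be satisfied. The names Locality and LocalityRules are hypothetical, and the check rests on the assumption (implied by the list above) that a store always lies inside its shard, so being in the store implies being in the shard.

```java
import java.util.ArrayList;
import java.util.List;

// The three settings a RelationshipRole can carry for a store or for a shard.
enum Locality { MUST, MAY, MUST_NOT }

final class LocalityRules {
    // Assumption: "in store" implies "in shard", so the only contradiction is
    // "must be in store" combined with "must not be in shard".
    static boolean isSatisfiable(Locality store, Locality shard) {
        return !(store == Locality.MUST && shard == Locality.MUST_NOT);
    }

    // Walks all store/shard combinations and collects the contradictory ones.
    static List<String> impossibleCombinations() {
        List<String> result = new ArrayList<>();
        for (Locality store : Locality.values()) {
            for (Locality shard : Locality.values()) {
                if (!isSatisfiable(store, shard)) {
                    result.add(store + "/" + shard);
                }
            }
        }
        return result;
    }
}
```

The sketch deliberately does not try to classify the tautological and inconsequential cases; those follow the list in the message.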
When combining RelationshipRoles into a RelationshipType, at least one RelationshipRole in the set must not have the setting Must not be in store (which is implied by Must not be in shard). A combination in which every role has that setting cannot be stored, since no store can contain any of the associated Nodes. When creating a Relationship, a store adds that Relationship when at least one associated Node is actually present in the database.

When adding NodeTypes to the mix, the distribution of Nodes and Relationships over the various stores can be controlled even further. If we know for each created Node whether it must have an associated RelationshipRole, may have an associated RelationshipRole, or must not have an associated RelationshipRole, it becomes possible to decide whether a Node must be added to a store, may be added to a store, or must not be added to a store. For the "may be added to a store" cases, a Coordinator can decide where to store those particular Nodes.

Finally, this approach allows for distributed traversals. Traversals are always local; when a traversal branch hits upon a relationship association that is external to the store, that traversal will be continued asynchronously on that other store. When the traversal ends its
Re: [Neo4j] Keeping context information in the Graph
What you describe here is a ternary edge, something I try to cover in the Enhanced API. Your film example can be modeled as follows:

There is an Edge STARS with the EdgeRoles: Actor, Film, Role.

We can now state:

STARS -- Actor -- Brad Pitt
      -- Film -- Fight club
      -- Role -- Tyler Durden

STARS -- Actor -- Edward Norton
      -- Film -- Fight club
      -- Role -- Fight club narrator, Tyler Durden

or we can state

STARS -- Actor -- Brad Pitt, Edward Norton
      -- Film -- Fight club
      -- Role -- Tyler Durden

STARS -- Actor -- Edward Norton
      -- Film -- Fight club
      -- Role -- Fight club narrator

or we can state

STARS -- Actor -- Brad Pitt
      -- Film -- Fight club
      -- Role -- Tyler Durden

STARS -- Actor -- Edward Norton
      -- Film -- Fight club
      -- Role -- Fight club narrator

STARS -- Actor -- Edward Norton
      -- Film -- Fight club
      -- Role -- Tyler Durden

Niels

Date: Sat, 6 Aug 2011 15:51:51 +0800
From: asianf...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Keeping context information in the Graph

This may be the same solution suggested by Dmitriy, but I had to visualise it to understand the problem. The problematic solution on top, if I understand it correctly; the proposed solution beneath it:

http://s3.amazonaws.com/neo4j/node_example.png

It's a more verbose graph, but it does model the semantics. This is all very abstract, so let's make your example more concrete by naming the nodes something other than letters that match to a real world example.

1. (A) Brad Pitt stars in (B) Fight Club in the role of (C) Tyler Durden.
2. (D) Edward Norton stars in (B) Fight Club in the roles of both (E) The Narrator and [spoiler alert] (C) Tyler Durden

The creation of casting nodes F and G in the diagram may serve a practical purpose later, for example if one was also modelling Pitt and Norton's contract for accounting purposes, tracking media coverage of the casting news, etc.

Stephen

On 6 August 2011 06:11, pankaj pankaj@gmail.com wrote:

Hi, I have following data modeling problem.
Node A is related to Node B with complex property C. I modeled it like A-B-C. Now I have another node D related to B with complex properties C and E. Now my graph looks like D-B-C, A-B-C, and D-B-E. Storing it like this, I lost the information that A was never related to B in the context of complex property E. How do I model it?

Thanks
Pankaj

-- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Keeping-context-information-in-the-Graph-tp3229955p3229955.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
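To make the ternary STARS edge from the top of this thread concrete, here is a minimal in-memory sketch. It is not the Enhanced API: NaryEdge, with() and actorsInRole() are hypothetical names, and vertices are represented by plain strings. It shows that when each edge keeps its own Actor, Film and Role elements, the context asked about in the original question is not lost.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// An n-ary edge: for every role name it holds the set of connected vertices.
final class NaryEdge {
    private final String type;
    private final Map<String, Set<String>> elements = new LinkedHashMap<>();

    NaryEdge(String type) {
        this.type = type;
    }

    // Connects one or more vertices to this edge in the given role.
    NaryEdge with(String role, String... vertices) {
        elements.computeIfAbsent(role, r -> new LinkedHashSet<>())
                .addAll(Arrays.asList(vertices));
        return this;
    }

    String getType() {
        return type;
    }

    Set<String> getElements(String role) {
        return elements.getOrDefault(role, Collections.<String>emptySet());
    }

    // Finds the actors that played a given role in a given film.
    static Set<String> actorsInRole(List<NaryEdge> edges, String film, String role) {
        Set<String> actors = new LinkedHashSet<>();
        for (NaryEdge edge : edges) {
            if (edge.getElements("Film").contains(film)
                    && edge.getElements("Role").contains(role)) {
                actors.addAll(edge.getElements("Actor"));
            }
        }
        return actors;
    }
}
```

Built from the first of the three STARS variants, a lookup for the narrator role only finds Edward Norton, because Brad Pitt's edge never mentions that role.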
Re: [Neo4j] Node#getRelationshipTypes
This is the thread about store layer changes for type/direction, and in my opinion this is still quite low hanging fruit. Sure, the impact needs to be tested rigorously, which may take considerable time, but the implementation is quite straight-forward and the potential gains are large.

Niels

Date: Sat, 6 Aug 2011 22:16:15 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Node#getRelationshipTypes

Oh, confused this thread with store layer changes for type/direction of relationships. This fruit in this thread is pretty low hanging.

On Saturday, 6 August 2011, Mattias Persson matt...@neotechnology.com wrote:

I would not consider this low hanging fruit btw
Re: [Neo4j] Enhanced API rewrite
Today I added fluency to the API design. It is now possible to write:

Db().createVertex()
    .setProperty(Name, John)
    .setProperty(Age, 29)
    .addEdgeTo(june, WIFE)

I also added support for VertexTypes. A VertexType is nothing more and nothing less than a Vertex with a unique name and a class name used to initialize the VertexType. Application programmers can decide for themselves how to implement VertexTypes.

VertexTypes can be retrieved from a Vertex with the method Vertex#getTypes(). There are no facilities to retrieve the Vertices defined with a certain VertexType. The connection between Vertex and VertexType is not stored as a Relationship, but as a Long[] property on the Vertex containing the IDs of the VertexTypes. This is to prevent the densely-connected-node problem: each Vertex will likely have few types, but each VertexType will likely have lots of associated Vertices. If users want to know the Vertices of a VertexType, they can create an index for that (something that is outside the scope of Enhanced API).

Edges all have at least one associated VertexType, which is used for traversals. An Edge can have more than one VertexType, but only the one added as EdgeType (which extends VertexType) will be used for traversals.

Niels

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Sat, 6 Aug 2011 02:51:23 +0200
Subject: [Neo4j] Enhanced API rewrite

Today I pushed a major rewrite of the Enhanced API. See: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/graphdb
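The fluent style and the array-based type storage described in this message can be sketched as follows. FluentVertex is a hypothetical in-memory class, not the Enhanced API's Vertex; in particular addType() and getTypeIds() are illustrative names, and only the ideas (setters that return the vertex, type IDs kept in an array on the vertex rather than as relationships) come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FluentVertex {
    // A plain binary edge from the owning vertex to an end vertex.
    static final class Edge {
        final String type;
        final FluentVertex end;

        Edge(String type, FluentVertex end) {
            this.type = type;
            this.end = end;
        }
    }

    private final Map<String, Object> properties = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();
    // Type IDs live on the vertex itself, so a popular type never becomes a dense node.
    private long[] typeIds = new long[0];

    // Every mutator returns this, which is what allows call chaining.
    FluentVertex setProperty(String name, Object value) {
        properties.put(name, value);
        return this;
    }

    Object getProperty(String name) {
        return properties.get(name);
    }

    FluentVertex addEdgeTo(FluentVertex other, String edgeType) {
        edges.add(new Edge(edgeType, other));
        return this;
    }

    List<Edge> getEdges() {
        return edges;
    }

    FluentVertex addType(long vertexTypeId) {
        typeIds = Arrays.copyOf(typeIds, typeIds.length + 1);
        typeIds[typeIds.length - 1] = vertexTypeId;
        return this;
    }

    long[] getTypeIds() {
        return typeIds.clone();
    }
}
```

The trade-off is the one named in the message: finding the types of a vertex is a single property read, while finding all vertices of a type needs a separate index.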
[Neo4j] Enhanced API rewrite
Today I pushed a major rewrite of the Enhanced API. See: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/graphdb

Originally the Enhanced API was a drop-in replacement of the standard Neo4j API. This resulted in lots of wrapper classes that needed to be maintained. The rewrite of Enhanced API is no longer a drop-in replacement and contains no interface/class names that can be found in the standard API. Enhanced API no longer speaks of Nodes but of Vertices, and doesn't speak of Relationships but of Edges. This helps to prevent name clashes at the expense of somewhat less recognizable names (Relationship is after all a more common word than Edge).

This rewrite is not merely a renaming of classes and interfaces, but is for the most part a complete rewrite and also a rethinking of the API on my part.

Enhanced API consists of two basic elements: Vertex and EdgeRole. Most elements are a subclass of Vertex, though there are some specialized versions of EdgeRole.

Let me start with an example: Suppose we have two vertices denoting the persons Tom and Paula, and we want to state that Tom is the father of Paula.

For standard Neo4j we tend to write such a fact as:

Tom --Father-- Paula

For Enhanced API we can conceptually write this fact as follows:

        --StartRole--Tom
Father
        --EndRole--Paula

This should be read as follows: We have two Vertices, Tom and Paula, and we have a BinaryEdge (similar to a Relationship in the standard API) of type Father, where Tom has the StartRole for that edge and Paula has the EndRole for that edge. So instead of a directed graph, we conceptually have an undirected bipartite graph.

For binary edges (edges between two vertices), this is mostly conceptually the case, because the API will simply allow you to write: tom.createEdgeTo(paula, FATHER) (similar to tom.createRelationshipTo(paula, FATHER) as we would have in the standard API).
It is also possible to fetch the start vertex of the binary relationship with the method edge.getStartVertex() (similar to relationship.getStartNode()), although it is also possible to treat the binary edge as a generic edge and fetch that Vertex as edge.getElement(db.getStartRole()). BinaryEdges are a special case and have special methods which cover the same functionality as can be found in the standard Neo4j API.

In general, we can say that Vertices are connected to Edges by means of EdgeRoles. In the binary case there are two predefined EdgeRoles: StartRole and EndRole.

Before we get deeper into the general case of n-ary edges, let's first look at another special case: Properties. Properties can be thought of as unary edges: edges that connect to only one Vertex (as opposed to two in the binary case).

Suppose we want to state that Tom is 49 years old. We can write that as:

age(49)--PropertyRole--Tom

We have an edge of type age that is connected to the vertex Tom in the role of a property. Again this is mostly conceptually true, because there are lots of methods in Enhanced API that are very similar to the ones found in the standard API: getProperty, hasProperty, setProperty.

Alternatively, we can call methods on the property itself; after all, the age property connected to the Vertex Tom is an object in its own right. More precisely, it is a Property, and with that it is a UnaryEdge, which is an Edge, which is a Vertex.

From the age property we can fetch the PropertyType, but we can also ask for the Vertex it is connected to: getVertex(). Since a Property is an Edge, we can also fetch the connected vertex (Tom) as follows: age.getElement(db.getPropertyRole()).

So we have seen the two special cases, unary edges and binary edges, which work very much the same as properties and Relationships in the standard Neo4j API, though we have given them a conceptually different perspective that unifies the two and fits them neatly into the general case of N-ary edges.
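The unification of properties and binary edges described above can be sketched with a tiny class hierarchy. This is a simplified illustration under my own assumptions, not the actual Enhanced API: the constructors, the connect() helper and the static EdgeRole constants are hypothetical, and the UnaryEdge level between Property and Edge is left out for brevity. What it does take from the text is that an Edge is a Vertex, and that getStartVertex() and getVertex() are shortcuts for getElement(role).

```java
import java.util.HashMap;
import java.util.Map;

class Vertex {
    final String name;

    Vertex(String name) {
        this.name = name;
    }
}

// The role a Vertex plays in an Edge.
final class EdgeRole {
    static final EdgeRole START = new EdgeRole("StartRole");
    static final EdgeRole END = new EdgeRole("EndRole");
    static final EdgeRole PROPERTY = new EdgeRole("PropertyRole");

    final String name;

    private EdgeRole(String name) {
        this.name = name;
    }
}

// An Edge is itself a Vertex that connects other Vertices through EdgeRoles.
class Edge extends Vertex {
    private final Map<EdgeRole, Vertex> elements = new HashMap<>();

    Edge(String type) {
        super(type);
    }

    protected void connect(EdgeRole role, Vertex vertex) {
        elements.put(role, vertex);
    }

    Vertex getElement(EdgeRole role) {
        return elements.get(role);
    }
}

// Binary case: two predefined roles, plus convenience accessors.
final class BinaryEdge extends Edge {
    BinaryEdge(String type, Vertex start, Vertex end) {
        super(type);
        connect(EdgeRole.START, start);
        connect(EdgeRole.END, end);
    }

    Vertex getStartVertex() {
        return getElement(EdgeRole.START);
    }

    Vertex getEndVertex() {
        return getElement(EdgeRole.END);
    }
}

// Unary case: a property is an edge connected to exactly one Vertex.
final class Property extends Edge {
    final Object value;

    Property(String type, Object value, Vertex owner) {
        super(type);
        this.value = value;
        connect(EdgeRole.PROPERTY, owner);
    }

    Vertex getVertex() {
        return getElement(EdgeRole.PROPERTY);
    }
}
```

Because Property and BinaryEdge both extend Edge, which extends Vertex, either of them could in turn be connected to another Edge.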
As said before, an Edge is a Vertex that connects other Vertices by means of EdgeRoles. Since Edges are Vertices, they can have other Edges connected to them. Or in standard API talk: relationships can be connected to other relationships and they can have properties.

The concept of EdgeRoles separates Edges from Vertices, so we will effectively have a bipartite graph where Vertices can only connect to Edges and Edges can only connect to Vertices. Given the fact that Edges are also Vertices, Edges can be connected to Edges, but in such a case it is unambiguous which plays the role of Edge and which plays the role of Vertex in that connection.

Let's look at an example of an N-ary edge: Suppose we want to state the fact that Tom gives Paula a Bicycle (no golden helicopters in stock today). We can write that as follows:

      --Giver--Tom
GIVES --Recipient -- Paula
      --Gift -- Bicycle

There is an EdgeType GIVES which defines three EdgeRoles: Giver, Recipient and Gift, which
Re: [Neo4j] Batch find
The batch insert is intended to push data into the database with having to do any look ups. You could preprocess your input data, such that it can be loaded in one go. You could for example read your input file against an existing database, fetch the IDs of nodes and relationships that contain the information you need to update, and create two new input files: one containing data that can be inserted using the batch inserter, and one containing the information that needs to be updated (including the IDs of the PropertyContainers that need to be updated).

Niels

Date: Wed, 3 Aug 2011 04:14:44 -0700
From: ahmed.elshark...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Batch find

I am trying to insert a document containing a list of words, and I want to check whether some of these words are already in my graph; in that case I will update their properties, otherwise I will create new nodes with the new words.

-- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Batch-find-tp3221634p3221964.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
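The preprocessing step suggested in this thread could look roughly like the sketch below. It does not use the Neo4j batch inserter at all: BatchSplitter is a hypothetical helper, and existingIds stands in for whatever lookup (for example an index query against the existing database) yields the node ID of a word that is already stored.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Splits input words into "insert as new node" and "update existing node".
final class BatchSplitter {
    // Words not yet in the graph; a set, so a repeated new word is inserted once.
    final Set<String> toInsert = new LinkedHashSet<>();
    // Existing node ID -> word whose properties need updating.
    final Map<Long, String> toUpdate = new LinkedHashMap<>();

    // existingIds is an assumed word -> node ID lookup built before the batch run.
    BatchSplitter(List<String> words, Map<String, Long> existingIds) {
        for (String word : words) {
            Long id = existingIds.get(word);
            if (id == null) {
                toInsert.add(word);
            } else {
                toUpdate.put(id, word);
            }
        }
    }
}
```

The two collections correspond to the two input files mentioned above: one can be fed to the batch inserter without any lookups, the other carries the IDs needed for a separate update pass.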
Re: [Neo4j] Batch find
That should be "without having to do any lookups".

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Wed, 3 Aug 2011 13:37:44 +0200
Subject: Re: [Neo4j] Batch find

The batch insert is intended to push data into the database with having to do any look ups.
Re: [Neo4j] Node#getRelationshipTypes
Hmmm... Does that require the inclusion of golden parachutes as well?

Anyway, addressing the readers of this message who have time allocation authority: I hope my suggestion, or another technical solution that solves the same issues, will be picked up for 1.5. This is, as far as I can tell, pretty much low hanging fruit. There are probably all sorts of tweaks that can improve the performance of Neo4j, but this one can improve the performance of Neo4j big time (under certain conditions). As a user who is confronted with several very densely connected nodes, I have tried all sorts of means to solve my issues, but none as rewarding as a solution in core would be.

Niels

Date: Wed, 3 Aug 2011 16:31:04 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Node#getRelationshipTypes

A golden helicopter might do the trick :)

2011/8/3 Niels Hoogeveen pd_aficion...@hotmail.com

How does one persuade the time allocation authorities?

Niels

Date: Wed, 3 Aug 2011 09:28:45 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Node#getRelationshipTypes

Yup, it's a pretty sane approach and somewhat along the lines of how I feel it would be done. It's been said for a long time that this functionality will be implemented some day, and it's just that a significant amount of time has to be invested... maybe not for implementing it, but for discovering all bugs and inconveniences to have it on par with production quality. And that kind of time hasn't been allocated yet.

I appreciate your thoughts and time on all this!

Best,
Mattias

2011/8/3 Niels Hoogeveen pd_aficion...@hotmail.com

I would like to make a suggestion that would both address my feature request and increase performance of the database.

Right now the NodeRecord (org.neo4j.kernel.impl.nioneo.store.NodeRecord) contains the ID of the first Relationship, while the RelationshipRecord contains the IDs of the previous and next relationship for both sides of the relationship.
My suggestion is as follows: Create a new store: noderelationshiptypestore.db. The layout of this store is given by the NodeRelationshipTypeRecord: id, previousrelationshiptype, nextrelationshiptype, firstrelationship. The NodeRecord would now need to point to the first outgoing NodeRelationshipType and to the first incoming NodeRelationshipType instead of to the first Relationship. On insert of a Relationship, one side of the relationship will update the store for the outgoing side, the other side will update the store for the incoming side. I will list the steps to take here for the outgoing side (the incoming side is almost identical).
From the NodeRecord getFirstNodeRelationshipType (outgoing). Keep following NextRelationshipType until the desired record is found. If no record exists, create one, make the current FirstNodeRelationshipType in the NodeRecord (if it exists) the NextRelationshipType of the created NodeRelationshipType (and make the created one the previous of the current one) and make the created NodeRelationshipType the FirstNodeRelationshipType in the NodeRecord. In other words: find the NodeRelationshipTypeRecord in the linked list. If none exists, create a NodeRelationshipTypeRecord, prepend it to the existing list and change the entry point in the NodeRecord.
We now have found the requested NodeRelationshipTypeRecord. From NodeRelationshipTypeRecord getFirstRelationship. Create a new RelationshipRecord and make it the FirstRelationship in the NodeRelationshipTypeRecord. Make the old first RelationshipRecord (if it exists) the nextRelationship of the new first RelationshipRecord and make the new first RelationshipRecord the previous of the old first RelationshipRecord. In other words: prepend a new RelationshipRecord to the existing list of Relationships and change the entry point in the NodeRelationshipTypeRecord.
Do the same for the incoming side (except for the creation of the RelationshipRecord, we only need one of those).
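The insertion steps above can be sketched as a small in-memory model. This is only an illustration of the proposed linked-list layout, not Neo4j kernel code: the class names (TypedRelationshipChains, NodeRecord, TypeRecord, RelRecord) and the use of object references instead of record IDs are assumptions made for the example, and only the outgoing side is modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of the proposed layout; every name here is hypothetical. */
final class TypedRelationshipChains {

    /** Stand-in for a RelationshipRecord: one link in a per-type chain. */
    static final class RelRecord {
        final long id;
        RelRecord prev, next;
        RelRecord(long id) { this.id = id; }
    }

    /** Stand-in for the proposed NodeRelationshipTypeRecord. */
    static final class TypeRecord {
        final String type;
        TypeRecord prev, next;
        RelRecord firstRelationship;
        TypeRecord(String type) { this.type = type; }
    }

    /** Stand-in for a NodeRecord (outgoing side only). */
    static final class NodeRecord {
        TypeRecord firstOutgoingType;
    }

    /** Walk the type list; if the type is missing, prepend a new record. */
    static TypeRecord findOrCreateType(NodeRecord node, String type) {
        for (TypeRecord t = node.firstOutgoingType; t != null; t = t.next) {
            if (t.type.equals(type)) return t;
        }
        TypeRecord created = new TypeRecord(type);
        created.next = node.firstOutgoingType;
        if (node.firstOutgoingType != null) node.firstOutgoingType.prev = created;
        node.firstOutgoingType = created; // new entry point in the node record
        return created;
    }

    /** Prepend a relationship to the chain that belongs to its type. */
    static void addOutgoing(NodeRecord node, String type, RelRecord rel) {
        TypeRecord t = findOrCreateType(node, type);
        rel.next = t.firstRelationship;
        if (t.firstRelationship != null) t.firstRelationship.prev = rel;
        t.firstRelationship = rel; // new entry point in the type record
    }

    /** Read only the chain of one type; chains of other types are never followed. */
    static List<Long> outgoingIds(NodeRecord node, String type) {
        List<Long> ids = new ArrayList<>();
        for (TypeRecord t = node.firstOutgoingType; t != null; t = t.next) {
            if (!t.type.equals(type)) continue;
            for (RelRecord r = t.firstRelationship; r != null; r = r.next) ids.add(r.id);
        }
        return ids;
    }

    /** The requested meta information: which types are present on the node. */
    static List<String> outgoingTypes(NodeRecord node) {
        List<String> types = new ArrayList<>();
        for (TypeRecord t = node.firstOutgoingType; t != null; t = t.next) types.add(t.type);
        return types;
    }
}
```

In this model, fetching the KNOWS relationships of a node costs one pass over the (usually short) type list plus the KNOWS chain itself, which is the saving the proposal aims for on densely connected nodes.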
Instead of a linked list of Relationships per Node we now have two linked lists of RelationshipTypes per Node (one incoming, one outgoing), with a linked list of Relationships per NodeRelationshipType. With this approach only those Relationships need to be read that match the RelationshipType and Direction given. In the worst case this approach leads to an extra read operation per RelationshipType. Worst case example 1: Retrieve all Relationships, regardless of RelationshipType or Direction. Here we have extra reads for all NodeRelationshipType records. If the number of Relationships per
Re: [Neo4j] Memory overflow while creating big graph
Is it possible for you to use the batch inserter, or does the data you are loading require a lot of lookups? Niels
From: jvcole...@gmail.com Date: Wed, 3 Aug 2011 17:57:20 -0300 To: user@lists.neo4j.org Subject: [Neo4j] Memory overflow while creating big graph
Hi, I'm trying to create a graph with 15M nodes and 12M relationships, but after inserting 400K relationships the following exception is thrown: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. I'm using -Xmx3g and the following configuration file for the graph:
neostore.nodestore.db.mapped_memory = 256M
neostore.relationshipstore.db.mapped_memory = 1G
neostore.propertystore.db.mapped_memory = 90M
neostore.propertystore.db.index.mapped_memory = 1M
neostore.propertystore.db.index.keys.mapped_memory = 1M
neostore.propertystore.db.strings.mapped_memory = 768M
neostore.propertystore.db.arrays.mapped_memory = 130M
cache_type = weak
Can anyone help me? -- Jose Vinicius Pimenta Coletto
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] graph weight scheme design advice
Hi Boris, What will be your decision procedure to determine which edges will be marked as heavy and which will be marked as light? Even if you establish a fixed ratio, you will still need to decide which relationships belong in one category and which belong in the other. Could you elaborate a little more on your problem domain? Niels
From: bo...@popcha.com Date: Mon, 1 Aug 2011 23:31:08 -0400 To: user@lists.neo4j.org Subject: [Neo4j] graph weight scheme design advice
Howdy Graphistas! I hope someone with graph modeling experience can help me with a pattern I'm working on. I have two kinds of edges that may connect nodes. One is very heavy, meaning that it has a high weight, and if two nodes are connected by this relationship it is very important, but there are few of these. The other is the opposite: it is very light, but plentiful. Since there will always be many more of the light relationships than the heavy ones, what is the best way to represent these in the graph? I can set up a fixed ratio, like 5:1, so each of the light ones is .2 and each of the heavy ones is 1, but at this time I have no idea what that ratio should be because I don't know how large the data set is and how it is configured, so I was wondering if this is a known pattern and has some elegant representation. If this isn't a message board answer, maybe someone can point me at a paper? Many thanks!
Re: [Neo4j] Composable traversals
It looks like this does the same as I suggested. It's a bit clunkier, but I understand you don't want to change the Node interface. OTOH is there any reason not to extend the Node interface, after all it is only one extends more? Since Nodes are all created in the neo4j-kernel component, there is no real reason to maintain strict binary backwards compatibility between versions, or do you expect people to have projects with two separate neo4j-kernel jars of different versions? Niels
Date: Tue, 2 Aug 2011 23:05:17 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Composable traversals
Cool. To not mess around with interfaces too much I'm thinking of having:
TraversalDescription#traverse( Node startNode, Node... additionalStartNodes );
TraversalDescription#traverse( Path startPath, Path... additionalStartPaths );
TraversalDescription#traverse( Iterable<Path> startPaths );
that would be rather similar, wouldn't it?
2011/7/30 Niels Hoogeveen pd_aficion...@hotmail.com
I would be all for it if this could become part of 1.5. I am willing to put time into this.
Date: Sat, 30 Jul 2011 11:33:01 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Composable traversals
Yes, FYI that's the exact thing we've been discussing :)
2011/7/29 Niels Hoogeveen pd_aficion...@hotmail.com
Great, I would much rather see this become part of the core API than have this as part of the Enhanced API. To make things work correctly, one important change to core is needed: The Node interface needs to extend Traverser (the interface in org.neo4j.graphdb.traversal, not the one in org.neo4j.graphdb). This is actually not a big deal.
The Traverser interface supports three methods:
Iterator<Path> iterator() [return 1 path with 1 element in the path, being the node itself]
Iterable<Node> nodes() [return an iterable over the node itself]
Iterable<Relationship> relationships() [return an empty iterable]
With that addition, it's not all too difficult to enhance the current implementation of Traverser. It only adds one more iteration level over the current implementation. Instead of having one start node, we now have multiple start paths. When returning values from the Traverser, the start paths and the result paths need to be concatenated. In the new scenario, all old traverse() methods can remain the same, since Node becomes a Traverser, so those methods are just special cases where the Iterable<Path> consists of 1 path, with just 1 element. Niels
Date: Fri, 29 Jul 2011 18:36:28 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Composable traversals
There have been thoughts for a long while to make something like this with the traversal framework, but time has never been allocated to evolve it. I'm adding stuff to the framework in a side track and will surely add some aspect of composable traversers also.
2011/7/29 Niels Hoogeveen pd_aficion...@hotmail.com
I'd like to take a stab at implementing traversals in the Enhanced API. One of the things I'd like to do is to make traversals composable. Right now a Traverser is created by either calling the traverse method on Node, or by calling the traverse(Node) method on TraversalDescription. This makes traversals inherently non-composable, so we can't define a single traversal that returns the parents of all our friends. To make Traversers composable we need a function: Traverser traverse(Traverser, TraversalDescription). My take on it is to make Element (which is a superinterface of Node) into a Traverser. Traverser is basically another name for Iterable<Path>.
Every Node (or more generally every Element) can be seen as an Iterable<Path>, returning a single Path, which contains a single path-element, the Node/Element itself. Composing traversals would entail the concatenation of the paths returned with the paths supplied, so when we ask for the parents of all our friends, the returned paths would take the form: Node --FRIEND-- Node -- PARENT -- Node Niels
-- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
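The idea discussed in this thread, a node treated as a one-element path and traversals composed by concatenating paths, can be sketched without the Neo4j API. The following is a minimal model under stated assumptions: paths are plain lists of strings alternating node name and relationship type, and the class ComposableTraversalSketch with its methods relate, asPaths and traverse is hypothetical, not part of any Neo4j release.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy graph in which every traversal takes paths in and returns paths out. */
final class ComposableTraversalSketch {

    /** Adjacency per relationship type: type -> (node -> neighbours). */
    private final Map<String, Map<String, List<String>>> edges = new HashMap<>();

    void relate(String from, String type, String to) {
        edges.computeIfAbsent(type, k -> new HashMap<>())
             .computeIfAbsent(from, k -> new ArrayList<>())
             .add(to);
    }

    /** A node seen as a traverser: exactly one path holding just the node itself. */
    static List<List<String>> asPaths(String node) {
        return Collections.singletonList(Collections.singletonList(node));
    }

    /** Extend every start path by one step over the given relationship type. */
    List<List<String>> traverse(List<List<String>> startPaths, String type) {
        List<List<String>> result = new ArrayList<>();
        Map<String, List<String>> byNode = edges.getOrDefault(type, Collections.emptyMap());
        for (List<String> path : startPaths) {
            String end = path.get(path.size() - 1); // last element is always a node
            for (String next : byNode.getOrDefault(end, Collections.emptyList())) {
                List<String> extended = new ArrayList<>(path); // start path + new step
                extended.add(type);
                extended.add(next);
                result.add(extended);
            }
        }
        return result;
    }
}
```

Because traverse accepts the output of another traverse call, "the parents of all our friends" becomes traverse(traverse(asPaths("me"), "FRIEND"), "PARENT"), and each returned path has the shape Node, FRIEND, Node, PARENT, Node.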
Re: [Neo4j] Node#getRelationshipTypes
the performance may decrease by at most a factor 2, while the performance may increase by orders of magnitude in some quite common use cases. On top of that, we can also present the meta information I requested, because we can simply iterate over the NodeRelationshipType list and return the entries to the user. Finally, this proposal makes it possible to guarantee functional, surjective and one-to-one Relationships. Due to the partitioning we will know if there already is a relationship of a certain type. If a relationship is stated to be functional, surjective, or one-to-one, we can raise an exception when a second relationship is about to be created for that particular NodeRelationshipType. Niels From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Tue, 2 Aug 2011 23:03:41 +0200 Subject: Re: [Neo4j] Node#getRelationshipTypes Building an API on top of Neo4j of course pushes the standard API to its limits. So for that matter it is already a good exercise. Any chance this feature request will find its way into 1.5? Niels Date: Tue, 2 Aug 2011 22:33:03 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes Those methods will of course be more efficient if implemented in the kernel compared to iterating through all relationships if the whole relationship chain have already been loaded for that node, otherwise it will require a full iteration (or at least making sure the whole chain have been loaded). I've never found a use case for it myself and this is the first I've heard. 2011/8/1 Niels Hoogeveen pd_aficion...@hotmail.com I have two specific use cases for these methods: I'd like to present a node with the property types (names) it has content for and with the relationship types it has relationships for, while loading those properties/relationships on demand (ie. click here to see details). 
This can be done for properties: there is a getPropertyKeys() method, but there is no getRelationshipTypes() method. The other use case has to do with the Enhanced API. There I want to have pluggable relationships and properties. With respect to relationships there are already three implementations: the regular Relationship, SortedRelations (which use an in-graph Btree for storage) and HyperRelationships which allow n-ary relationships. Every Element in Enhanced API has a getRelationships() method, much like the getRelationships() method in Node, which should return every relationship attached to an Element, irrespective of its implementation. Right now the Element implementation has to perform the logic to distinguish which relationship is used for what implementation (under the hood it all works using normal Relationships). It would be much more elegant to iterate over the RelationshipTypes and dispatch the getRelationships() method to the appropriate RelationshipType implementations. That way the logic for SortedRelationships, HyperRelationships remains in their associated classes and is not spread around the implementation. Niels From: michael.hun...@neotechnology.com Date: Sun, 31 Jul 2011 23:20:50 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes Imho it would have to iterate as well. As the type is stored with the relationship record and so can only be accessed after having read it. It might be to have some minimal performance improvements that relationships would not have to be fully loaded, nor put into the cache for that. But this is always a question of the use-case. What will be done next with those rel-types. What was the use-case for this operation again? Cheers Michael Am 31.07.2011 um 18:59 schrieb Niels Hoogeveen: Good point. 
It could for all practical purposes even be Iterable<RelationshipType> so they can be lazily fetched, as long as the underlying implementation makes certain that any iteration of the RelationshipTypes forms a set (no duplicates). There is no need to have RelationshipTypes in any particular order, and if that is needed in the application, they can usually be sorted locally since Nodes will generally have associated Relationships of only a handful of RelationshipTypes. That said, the more important question is, if the Neo4j store can produce this meta-information. For sparsely connected nodes, it is possible to iterate over the relationships and return the set of RelationshipTypes, but this is not a proper solution when nodes are densely connected. So there is no general solution for this question yet. Niels From: j...@neotechnology.com Date: Sun, 31 Jul 2011 17:29:29 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes
[Neo4j] HyperEdges unify Relationships and Properties
Over the last couple of days I have worked on improving the Enhanced API and made some progress unifying Properties and Relationships. For some time, I have wanted to have a traverser which I can set up so that it returns a collection of properties. After all, what we want to present in an application is not a node, but a set of properties on a node (or relationship). Both the node and the relationship are ultimately containers and only interesting for computational reasons. Of course it is possible to unify Relationships and Properties; after all, they are both addressed by name (albeit in the Relationship case dressed up as a RelationshipType). Introducing HyperEdges (formerly named HyperRelationships) creates the right framework to unify Relationships and Properties. The standard API provides support for binary edges, relating 2 Nodes by means of a RelationshipType (label). The constructor of such a binary relationship can be thought of as: Edge(EdgeType, Node, Node). The Enhanced API generalizes this to n-ary Edges, where the binary (2-ary) edge is just a special case. So for the n-ary case we get the constructor: Edge(EdgeType, Node...). This is implemented and works well for all n > 2. It also works for n = 2, because then we simply wrap the standard API and make direct calls to normal Relationships. This leaves us with two special cases, n = 0 and n = 1. The n = 0 case could be thought of as the EdgeType itself; after all, for case n = 0 the constructor of the edge reduces to: Edge(EdgeType). The n = 1 case brings us to the reason for this post, because for that case the constructor of the edge is: Edge(EdgeType, Node). This looks strikingly similar to the constructor of a property, which would take the form: Property(PropertyType, Node). All we need to do to unify Properties and Relationships within the HyperEdge framework is to state that PropertyType is a subtype of EdgeType and that Property is a subtype of Edge.
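The subtype relation described here can be written down as a small type sketch. It is not the Enhanced API itself: the class HyperEdgeSketch and its nested classes are hypothetical stand-ins, and the value field on Property is an assumption added for the example (the post only gives the constructor shape Property(PropertyType, Node)).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Type-level sketch: a property is an edge of arity 1. All names are hypothetical. */
final class HyperEdgeSketch {

    static class EdgeType {
        final String name;
        EdgeType(String name) { this.name = name; }
    }

    /** PropertyType is declared a subtype of EdgeType. */
    static final class PropertyType extends EdgeType {
        PropertyType(String name) { super(name); }
    }

    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** n-ary edge: Edge(EdgeType, Node...). */
    static class Edge {
        final EdgeType type;
        final List<Node> nodes;
        Edge(EdgeType type, Node... nodes) {
            this.type = type;
            this.nodes = Collections.unmodifiableList(Arrays.asList(nodes.clone()));
        }
        int arity() { return nodes.size(); }
    }

    /** Property is declared a subtype of Edge, fixed at n = 1. */
    static final class Property extends Edge {
        final Object value; // assumption: the post does not say where the value lives
        Property(PropertyType type, Node node, Object value) {
            super(type, node);
            this.value = value;
        }
    }
}
```

With these declarations, code that iterates over the edges of a node would also see its properties, since a Property is simply an Edge whose arity is 1.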
How all this relates to the transformation of a directed graph into an undirected bipartite graph will likely be subject of another post. Niels
Re: [Neo4j] Brainstorming on my project: neo4john
Hi John, I think when approaching a project there are two distinct issues at play: one is the tooling level, another is the actual solution you are trying to create for an actual problem. When looking at the tooling level it is great to have as much covered as possible. Neo4j offers a graph database and pretty good integration with Lucene. This overall is a good choice of tools, because there is hardly any overlapping functionality. Neo4j offers storage and navigation, while Lucene provides indexing. So the tools are pretty much orthogonal to each other. When adding BDB to the mix, things become a bit messier. BDB offers indexing and storage, so now you have to decide what to use BDB for. If you choose to only use it for indexing, like an alternative to Lucene, things remain pretty much orthogonal. When you decide to use BDB for storage, the question becomes: what to store in Neo4j and what to store in BDB. When it comes to storing and retrieving properties of entities both seem to be pretty fast, and unless you have serious performance issues with the storage of properties, either Neo4j or BDB is suitable for the task. When it comes to storing relationships between entities, Neo4j is by far the better solution. Fetching a relationship is a really cheap action, since it only involves moving a file pointer to a certain position (id * record length) and reading the record (i.e. if that data is not already available in the cache). Once you have a relationship, it is also cheap to fetch the associated nodes (again moving a file pointer to a position, or reading it from the cache). And while we are at it, once you have a node or a relationship, it is again cheap to fetch the properties associated with it. The motto of Neo4j seems to be: keep it local, stupid. This works great, unless things are not local, and this is where indexing comes into play.
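The "id * record length" lookup mentioned above can be illustrated with a fixed-size record store over a byte buffer. This is a sketch of the general technique only: the 16-byte record made of two long fields is an assumption for the example and does not reflect the actual Neo4j record layouts, and the class FixedRecordStoreSketch is hypothetical.

```java
import java.nio.ByteBuffer;

/** Fixed-size records addressed by id; the record layout is made up for illustration. */
final class FixedRecordStoreSketch {

    static final int RECORD_LENGTH = 16; // assumption: two 8-byte fields per record

    private final ByteBuffer store;

    FixedRecordStoreSketch(int capacity) {
        this.store = ByteBuffer.allocate(capacity * RECORD_LENGTH);
    }

    /** The position of a record follows directly from its id; no index is involved. */
    static int offsetOf(int id) {
        return id * RECORD_LENGTH;
    }

    void write(int id, long first, long second) {
        int pos = offsetOf(id);
        store.putLong(pos, first);
        store.putLong(pos + 8, second);
    }

    long[] read(int id) {
        int pos = offsetOf(id);
        return new long[] { store.getLong(pos), store.getLong(pos + 8) };
    }
}
```

A lookup by id is a single multiplication followed by one read, which is why following relationships is cheap. Finding a record by value has no such shortcut, which is the gap an index fills.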
Suppose we know a name or a certain value and want to know what nodes or relationships it is associated with; then doing a local search becomes ineffective. We could iterate over all nodes (and/or all relationships) and check for that particular value, but that doesn't scale beyond a couple of thousand nodes or relationships. One option could be to do the indexing in the graph. We could create a node that can easily be addressed through the reference node and that functions as a tree root, and traverse over the index to find a particular node or relationship. It works, but is not as fast as dedicated indexing. A dedicated index will fetch index blocks in one read operation and manipulate those index blocks in memory, whereas an index built in Neo4j would model an index block as a set of nodes that need to be read one after another (and likely from very different places in the store). So a dedicated index is more local than Neo4j can be when manipulating the index trees. A dedicated index will beat Neo4j hands down when it comes to raw speed of an index lookup/manipulation and will likely consume less memory doing so. Neo4j already supports Lucene, which is great for certain jobs (full text indexing, composite queries), but is probably (I would have to run tests to verify this assumption) slower than BDB when it comes to simple key-value mappings. Lucene is also not very good at handling unicity constraints, an area where a more regular key-value store like BDB has an advantage too. All this is just about the tooling level of an application (fun in its own right, but it doesn't solve any real problems). Things become more interesting when we start looking at an actual application. So my question is, what use cases do you want to solve with your neo4john project? Your example with buttons on a screen is a bit too high level, because it contains a lot more tooling than just neo4j and/or BDB.
You would need presentation (GUI or HTML) and reactiveness (how to respond to input) and you would need to somehow model your domain. So my suggestion would be to first list a couple of real world scenarios you want to solve with your neo4john project and then look at your tooling to see what trade-offs you need to make to implement it. You may need a mix of Neo4j, Lucene and BDB, but maybe you don't need all three to solve your particular problem. In any case, it's important to rise above the tooling level, because that is only a means to a goal. Even if your project provides additional tooling, there is still an application level to it. Focusing on the application level is good practice, because only there do you actually provide solutions. Niels Date: Sun, 31 Jul 2011 15:09:20 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] Brainstorming on my project: neo4john Hey guys, I've been thinking that I would like to have a topic (like this current one) where I would be allowed to post anything related to brainstorming on my project which is currently a mix of neo4j and berkeleydb java
Re: [Neo4j] Node#getRelationshipTypes
Good point. It could for all practical purposes even be Iterable<RelationshipType> so they can be lazily fetched, as long as the underlying implementation makes certain that any iteration of the RelationshipTypes forms a set (no duplicates). There is no need to have RelationshipTypes in any particular order, and if that is needed in the application, they can usually be sorted locally since Nodes will generally have associated Relationships of only a handful of RelationshipTypes. That said, the more important question is whether the Neo4j store can produce this meta-information. For sparsely connected nodes, it is possible to iterate over the relationships and return the set of RelationshipTypes, but this is not a proper solution when nodes are densely connected. So there is no general solution for this question yet. Niels
From: j...@neotechnology.com Date: Sun, 31 Jul 2011 17:29:29 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes
Hi Niels, Ignoring the operational use for getting relationship types, I do think these should be generalised from:
RelationshipType[] getRelationshipTypes();
RelationshipType[] getRelationshipTypes(Direction);
to:
Set<RelationshipType> getRelationshipTypes();
Set<RelationshipType> getRelationshipTypes(Direction);
Unless you need the ordering and you think the overhead of creating some kind of Set is too onerous from a performance point of view. Jim
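The lazily fetched, duplicate-free iteration suggested above could look roughly like the sketch below. It works on plain type names rather than Neo4j's RelationshipType, the class and method names (DistinctTypesSketch, distinctTypes) are hypothetical, and it assumes type names are never null. Note that it still reads every relationship of the node, which is exactly the cost the thread identifies for densely connected nodes.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/** Lazily yields each relationship type name once, in first-seen order. */
final class DistinctTypesSketch {

    /** Input: the type name of every relationship on a node, duplicates included. */
    static Iterable<String> distinctTypes(final Iterable<String> typesOfRelationships) {
        return () -> new Iterator<String>() {
            private final Iterator<String> source = typesOfRelationships.iterator();
            private final Set<String> seen = new HashSet<>();
            private String pending; // next unseen type, or null if not yet looked up

            @Override
            public boolean hasNext() {
                // Advance only as far as needed to find the next unseen type.
                while (pending == null && source.hasNext()) {
                    String candidate = source.next();
                    if (seen.add(candidate)) pending = candidate;
                }
                return pending != null;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String result = pending;
                pending = null;
                return result;
            }
        };
    }
}
```

A caller that only needs to know whether a node has any FRIEND relationship can stop iterating as soon as that type appears, which is the benefit of returning an Iterable rather than a fully built Set.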
Re: [Neo4j] Brainstorming on my project: neo4john
Interesting thought, and it is certainly true that indexing is much less of a concern in a graph database than in a normal RDBMS where generally every table needs to have a primary key and where you need to have an index on the primary key to be able to do joins (at least to do them somewhat quickly). In a graph database relationships are explicit and static, while in an RDBMS inter-table relationships are implicit and dynamic. This distinction makes that an RDBMS can answer some ad-hoc relationship questions where this would be unpractical in a graph database. For example, in an RDBMS I can ask for a join over the Persons and over the Country table and return the Person_ID and the Country_ID if the country code is contained in the last name of the person. In a graph database asking that same question is not that easy, unless of course we have explicitly created relationships from Person nodes to Country nodes if the country code is contained in the last name of the person (unlikely). Being able to find relationships in an implicit and dynamic way has of course a performance penalty. After all it's much cheaper to follow a file pointer than having to lookup a value in an index (or worse do a full table scan). That said, there are situations where we need to jump to another position in the graph. One way is through the use of id's, which is a very cheap non-local jump. The other is through indexes, which can come in two variations, in-graph (using a traversal to mimic a non-local jump), or through an external index service. In-graph indexes can work really well, but are not as optimized to the task as dedicated index services are. The main reason is that dedicated index services can map index blocks to memory, while neo4j is much more fine grained, having to load the content of an index block node for node and relationship for relationship. This makes that in-graph indexes don't really scale all that well, especially when getting bigger than memory allocated. 
On a cache miss, a dedicated index service can swap out a couple of index blocks where neo4j needs to swap out individual nodes and relationships. If index blocks are needed again, a dedicated index service can simply load those blocks in one read operation, while an in-graph index would have to reload those individual nodes and relationships one at a time. Niels
From: j...@neotechnology.com Date: Sun, 31 Jul 2011 17:27:33 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] Brainstorming on my project: neo4john
Hi John, Niels, I think of indexes in Neo4j as long-lived names. Not quite the "keep it local" that Niels mentioned, but not entirely dissimilar either. Those long-lived names tend to give you starting points in the graph from where you perform graph operations. Indexing therefore constitutes less of your database design than it would in an RDBMS. Marko had a good line about this: graphs are adjacency free indexes (or words to that effect). Jim
Re: [Neo4j] Brainstorming on my project: neo4john
Aiming to be as generic as possible can be good, but at some point you need to be specific too. You mention Java and Eclipse as being generic, but they are only to a point. When Java was introduced some of its main features were platform independence, static type checking, garbage collection/managed memory and checked exceptions. Those were deliberate design decisions making Java a very specific sort of language, making it suitable for certain types of applications and less suitable for other types of applications. Java is very suitable for large applications that need modularisation, but it's not that great for ad-hoc scripting. The same is true for Eclipse. It is a great platform to build an IDE in, but would be overkill for a simple game of tic-tac-toe. Creating something that is completely generic has the downside that it actually becomes bad at doing something specific. Another downside to being completely generic is that it doesn't provide people with clues as to what it can do. This is most noticeable in the programming language LISP, which is so generic that every construct looks like every other construct, giving no visual clues to what the program is actually doing. It's a wonderful language where you can do the most amazing dynamic magic in only a few lines of code (with lots of parentheses), but it has always been a niche language, because it doesn't offer programmers concrete clues about what you can do with it. Another language from that same era, COBOL, took the opposite approach and very explicitly made every feature available at the language level. This made COBOL a very special purpose language. At the time, for application programmers COBOL was an easy choice, because it offered many of the features needed for the applications. Much of that could be achieved in LISP too, but it never made those features explicit, so no one ever considered writing a business app in LISP.
That said, the demise of COBOL came because of a changing environment, and its specific strengths eventually became weaknesses. Still, the changing environment didn't make LISP a winner; instead a language like PHP became hugely successful, because it focused on doing one thing well: the creation of HTML pages. So my point is, when you want to create something, try to have a concrete vision of what you want it to do. Now as to the discussion of what is tooling level and what is application level. I think that as a rule of thumb you can say that tools can be replaced by something else without functionally changing the application, while you cannot replace part of the application with something else without functionally changing the application. Let's look at Neo4j. For me as an application programmer, it is a tool. I could in principle swap Neo4j out and replace it with another storage engine. I would probably take a performance hit in some areas doing so, but functionally my application could very much remain the same. For Neo Tech on the other hand, Neo4j is an application. There the tooling level consists of things like Maven and the Java NIO API. In principle the tooling could be replaced. Instead of Maven, ANT scripts could be used to do the build, and instead of the NIO API, the old-fashioned IO API could be used. There would be a huge performance penalty swapping out NIO for IO, but functionally Neo4j could remain the same, only much slower. Yet it is not possible to remove the Node API and replace it with something else without changing the functionality of the application. So the question remains what functionality you want to provide with your neo4john project. You could think of a storage API that is independent of the storage engine used. So you could swap out Neo4j and replace it with BDB, and vice versa. If you do that, ask yourself who would be interested in that and what purpose does it serve?
What are the benefits of replacing one storage engine with the other? When I started working on the Enhanced API, I had some concrete goals in mind which I wanted to solve:
1.) Make every element of the database reifiable, so they can all be used as first-class citizens.
2.) Provide a pluggable architecture for properties and relationships.
Both these goals make the Enhanced API more general than the standard API, but this is a result of the goals and not a goal in and of itself.
Date: Sun, 31 Jul 2011 19:45:50 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Brainstorming on my project: neo4john
Hey Niels, thanks for the concise reply.
On Sun, Jul 31, 2011 at 5:10 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:
Hi John, I think when approaching a project there are two distinct issues at play, one is the tooling level, another is the actual solution you are trying to create for an actual problem.
I seem to want a generic solution for multiple problems. Something generic enough that it can be applied
Re: [Neo4j] Node#getRelationshipTypes
I have two specific use cases for these methods: I'd like to present a node with the property types (names) it has content for and with the relationship types it has relationships for, while loading those properties/relationships on demand (ie. click here to see details). This can be done for properties: there is a getPropertyKeys() method, but there is no getRelationshipTypes() method. The other use case has to do with the Enhanced API. There I want to have pluggable relationships and properties. With respect to relationships there are already three implementations: the regular Relationship, SortedRelations (which use an in-graph Btree for storage) and HyperRelationships which allow n-ary relationships. Every Element in Enhanced API has a getRelationships() method, much like the getRelationships() method in Node, which should return every relationship attached to an Element, irrespective of its implementation. Right now the Element implementation has to perform the logic to distinguish which relationship is used for what implementation (under the hood it all works using normal Relationships). It would be much more elegant to iterate over the RelationshipTypes and dispatch the getRelationships() method to the appropriate RelationshipType implementations. That way the logic for SortedRelationships, HyperRelationships remains in their associated classes and is not spread around the implementation. Niels From: michael.hun...@neotechnology.com Date: Sun, 31 Jul 2011 23:20:50 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes Imho it would have to iterate as well. As the type is stored with the relationship record and so can only be accessed after having read it. It might be to have some minimal performance improvements that relationships would not have to be fully loaded, nor put into the cache for that. But this is always a question of the use-case. What will be done next with those rel-types. What was the use-case for this operation again? 
Cheers Michael On 31.07.2011 at 18:59, Niels Hoogeveen wrote: Good point. It could for all practical purposes even be Iterable<RelationshipType> so they can be lazily fetched, as long as the underlying implementation makes certain that any iteration of the RelationshipTypes forms a set (no duplicates). There is no need to have RelationshipTypes in any particular order, and if that is needed in the application, they can usually be sorted locally since Nodes will generally have associated Relationships of only a handful of RelationshipTypes. That said, the more important question is whether the Neo4j store can produce this meta-information. For sparsely connected nodes, it is possible to iterate over the relationships and return the set of RelationshipTypes, but this is not a proper solution when nodes are densely connected. So there is no general solution for this question yet. Niels From: j...@neotechnology.com Date: Sun, 31 Jul 2011 17:29:29 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] Node#getRelationshipTypes Hi Niels, Ignoring the operational use for getting relationship types, I do think these should be generalised from: RelationshipType[] getRelationshipTypes(); RelationshipType[] getRelationshipTypes(Direction); to: Set<RelationshipType> getRelationshipTypes(); Set<RelationshipType> getRelationshipTypes(Direction); Unless you need the ordering and you think the overhead of creating some kind of Set is too onerous from a performance point of view. Jim ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
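The Iterable<RelationshipType> idea above (lazy fetching, but every iteration still forms a set) can be sketched roughly as follows. This is only an illustration under stated assumptions: Rel and RelTypes are hypothetical stand-ins written for this example, not Neo4j classes, and relationship types are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for a stored relationship; NOT the Neo4j Relationship interface.
final class Rel {
    final String type;
    Rel(String type) { this.type = type; }
}

final class RelTypes {
    // Lazily yields each relationship type once, in first-seen order.
    // Relationships are only read from the source as the caller advances.
    static Iterable<String> distinctTypes(final Iterable<Rel> rels) {
        return new Iterable<String>() {
            public Iterator<String> iterator() {
                final Iterator<Rel> source = rels.iterator();
                final Set<String> seen = new HashSet<>();
                return new Iterator<String>() {
                    private String pending; // next unseen type, or null if not yet located

                    public boolean hasNext() {
                        while (pending == null && source.hasNext()) {
                            String candidate = source.next().type;
                            if (seen.add(candidate)) {
                                pending = candidate;
                            }
                        }
                        return pending != null;
                    }

                    public String next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        String result = pending;
                        pending = null;
                        return result;
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Note that this still has to read every relationship to be sure no further type exists, which is exactly the cost for densely connected nodes raised in the message above.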
Re: [Neo4j] bdb-index
I use the download option on Github expand the zip in a directory and run mvn install in that directory without any problems. Niels Date: Sat, 30 Jul 2011 13:39:15 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index When running the mvn install, both tests are ran after another. Since I didn't use mvn (xD) I ran the tests manually one by one, but what you say makes sense, it's likely the tests fail when ran one after the other, I'll see what happens with an @Suite since there are only 2 junit tests, with @Suite they work Let's see if I could run mvn install (btw, avoided mvn so far because I cannot install the git plugin for some reason and that other error I get) Looks like I still need to find out how to fix this error: [ERROR] The project org.neo4j:neo4j-berkeleydb-je-index:0.1-SNAPSHOT (E:\wrkspc\bdb-index-fork\pom.xml) has 1 error [ERROR] Non-resolvable parent POM: The repository system is offline but the artifact org.neo4j:parent-central:pom:18 is not available in the local repository. and 'parent.relativePath' points at wrong local POM @ line 3, column 11 - [Help 2] before I could do anything with maven... I'll skip trying to make maven to work for me for now, don't feel like it :) *I'm not qualified to fix this with maven, sorry* John On Fri, Jul 29, 2011 at 5:16 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Hi John, Thanks for looking into this. I am still seeing the same error I had before. When running the mvn install, both tests are ran after another. For some reason the transaction log sees an unclean shutdown and tries to commit pending transactions. During that process the index names of the bdb indexes are being retrieved from binary storage. Here something goes wrong, because the index name returned is garbage, so the recovery process fails because it can't find the right index files. 
Niels Date: Fri, 29 Jul 2011 07:48:43 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index I forked and fixed, the tests are all working now: https://github.com/13th-floor/bdb-index Let me know if you want me to do a pull request, ... sadly I applied formatting on RawBDBSpeed and the diff doesn't look pretty if you're trying to see what changed John. On Thu, Jul 28, 2011 at 7:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Trying to find something useful to hide the implementation bookkeeping of Enhanced API, I tried out bdb-index as can be found here: https://github.com/peterneubauer/bdb-index It looks interesting, but fails its tests. When recovering it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is garbage. I would like to help make this component workable, but this area of the database is a bit beyond the scope of what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue? Niels
Re: [Neo4j] bdb-index
licenses... [INFO] [INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ neo4j-berkeleydb-je-index --- [WARNING] The POM for org.apache.maven:maven-plugin-api:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-project:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-core:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-artifact:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-settings:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-model:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven:maven-monitor:jar:2.0.6 is missing, no dependency information available [WARNING] The POM for org.apache.maven.shared:maven-filtering:jar:1.0-beta-4 is missing, no dependency information available [WARNING] The POM for org.codehaus.plexus:plexus-interpolation:jar:1.13 is missing, no dependency information available [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 1.780s [INFO] Finished at: Sat Jul 30 19:32:06 CEST 2011 [INFO] Final Memory: 16M/154M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:2.4.3:resources (default-resources) on project neo4j-berkeleydb-je-index: Execution default-resources of goal org.apache.maven.plugins:maven-resources-plugin:2.4.3:resources failed: Plugin org.apache.maven.plugins:maven-resources-plugin:2.4.3 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven.shared:maven-filtering:jar:1.0-beta-4, org.codehaus.plexus:plexus-interpolation:jar:1.13: The repository system is offline but the artifact org.apache.maven.shared:maven-filtering:jar:1.0-beta-4 is not available in the local repository. 
- [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException *BOLD *part :) Running from command line, but on a just now downloaded zip file(as you said), it works (thus my eclipse maven still needs some work, ie. maybe allow it internet access even though it's on ask in firewall) I mean I do see those errors that you said you're seeing... can't really paste them here from terminal they will be broken with 80 chars per line In eclipse without maven, running AllTests, although the tests do pass, I failed to see that (possibly) the same exception(s) thrown by maven install, are happening on console. But not when tests are run each individually. So, your errors happen both with mvn install and AllTests (which runs them both one after the other, too). So that was a failure to notice on my part :) that counting from 0 to 100 on console must've moved up the exceptions and since tests were all success, I didn't scroll up. Trying to fix, On Sat, Jul 30, 2011 at 3:28 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I use the download option on Github expand the zip in a directory and run mvn install in that directory without any problems. Niels Date: Sat, 30 Jul 2011 13:39:15 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index When running the mvn install, both tests are ran after another. 
[Neo4j] Node#getRelationshipTypes
While working on Enhanced API, I realized two crucial methods are missing on the Node interface of the standard API: RelationshipType[] getRelationshipTypes(); RelationshipType[] getRelationshipTypes(Direction); For Enhanced API, I'd like to be able to plug in different Relationship implementations (e.g. SortedRelations and HyperRelations). Doing so is sort of possible, but only by cluttering a class called ElementImpl with all sorts of logic related to those different Relationship implementations (not the place where it belongs). The neat way would be to dispatch on RelationshipType and have the different Relationship implementations handle that logic. The API for PropertyContainer on the other hand does provide a method similar to what I am asking for: PropertyContainer#getPropertyKeys(). I realize this request cannot be honored without changing the record layout of the neo4j store, which has a major impact. However, there is already good reason to reconsider the record layout of the relationship store to solve the issue of densely connected nodes. To properly solve the densely connected node issue, relationships should be partitioned by relationship type and by direction. That way only those Relationships belonging to a RelationshipType that contributes to the dense connectedness will take time to load, while other Relationships can be fetched fast. Such partitioning immediately provides the meta information I am asking for. Such meta information (as exists for properties) has value beyond its use in the Enhanced API. I would e.g. like to be able to present a form with the property keys and RelationshipTypes associated with a particular Node, and on request load the content belonging to a property key or a RelationshipType. For property keys this is possible, but for RelationshipTypes all relationships need to be fetched to know which RelationshipTypes are associated with a particular node. 
Especially for RelationshipTypes with many instances connected to one Node this is not a suitable solution and runs counter to the need to load those relationships on request. Niels
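The partitioning argued for in this message can be illustrated with a small in-memory model. This is a sketch under loud assumptions: PartitionedRels and Dir are hypothetical names invented for the example, relationship types are plain strings, relationships are bare ids, and nothing here reflects the actual Neo4j record layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a relationship direction; NOT org.neo4j.graphdb.Direction.
enum Dir { OUTGOING, INCOMING }

// Hypothetical per-node container that keeps relationship ids partitioned
// by relationship type and then by direction.
final class PartitionedRels {
    private final Map<String, Map<Dir, List<Long>>> byType = new LinkedHashMap<>();

    void add(String type, Dir dir, long relId) {
        Map<Dir, List<Long>> byDir = byType.get(type);
        if (byDir == null) {
            byDir = new EnumMap<>(Dir.class);
            byType.put(type, byDir);
        }
        List<Long> ids = byDir.get(dir);
        if (ids == null) {
            ids = new ArrayList<>();
            byDir.put(dir, ids);
        }
        ids.add(relId);
    }

    // Meta information: only the partition keys are read, no relationship is touched.
    Set<String> relationshipTypes() {
        return Collections.unmodifiableSet(byType.keySet());
    }

    // Same, restricted to types that have at least one relationship in the given direction.
    Set<String> relationshipTypes(Dir dir) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Map<Dir, List<Long>>> entry : byType.entrySet()) {
            if (entry.getValue().containsKey(dir)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Loads one partition on request; other (possibly dense) partitions are left alone.
    List<Long> relationships(String type, Dir dir) {
        Map<Dir, List<Long>> byDir = byType.get(type);
        List<Long> ids = (byDir == null) ? null : byDir.get(dir);
        if (ids == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(ids);
    }
}
```

In this model the cost of relationshipTypes() depends on the number of distinct types on the node, not on the number of relationships, which is the property the form-style use case needs.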
Re: [Neo4j] bdb-index
Yes, you are right. I had looked at the code too superficially. Still, something goes wrong reading the indexName: when I print that name it looks like garbage (upon recovery), while it should produce a readable index name. I didn't check if the value written to the record is actually a readable String. Niels Date: Sat, 30 Jul 2011 23:23:49 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index I did a quick check of what you said org.neo4j.index.bdbje.BerkeleyDbCommand.writeToFile(LogBuffer) char[] indexName = indexId.indexName.toCharArray(); buffer.putInt( indexName.length ); buffer.put( indexName ); I'm probably missing something but on my side it looks like it writes length then indexName (and I didn't update from github, just in case you've already fixed this) Either way, my impression of what was happening is that some files got deleted, except some, i.e. the log, which were still open/in use, and maybe when recovery was tried, either it couldn't be opened, or due to being opened contained partial data, or all was well but recovery couldn't happen because the log needed some other files or a previous database snapshot upon which to apply the recovered transactions. I only see messages.log failing to be deleted when I allow the test testFindCreatedIndex() to run; I cannot yet figure out who creates that file or how to make sure it's being closed John. On Sat, Jul 30, 2011 at 11:09 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: The problem is indeed related to not properly closing the bdb database, and that triggers another problem. In BerkeleyDbCommand data is being stored into the transaction log and read from the transaction log later on. Something goes wrong, making the indexName retrieved from the transaction log look like garbage. I think I have located the problem. 
In the method BerkeleyDbCommand#writeToFile the sequence of elements written to the buffer is different from the order in which the method BerkeleyDbCommand#readCommand reads those elements. The BerkeleyDbCommand#writeToFile method cannot be correct, because it first writes the indexName and then its length. It should of course first write the length and then the indexName. Niels Date: Sat, 30 Jul 2011 22:51:40 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index found out that I don't need to call index.delete() all the time, instead BerkeleyDbDataSource.close() aka XaDataSource.close() should do what index.delete() does, namely closing all databases (related to this datasource) and their bdb environment; so I do just that. Therefore I answer some parts I asked before. And that logical.log.1 seems to be a part of XA Transactions and I must find a way to see that it's closed or something On Sat, Jul 30, 2011 at 10:15 PM, John cyuczieekc cyuczie...@gmail.com wrote: in TestBerkeley.java So far I've found that, bdb environment(and relevant databases) is(are) only closed when index.delete() is called and that can only be called when the current transaction is finished (else it will complain that some bdb databases are not opened on txn commit) Applying all those changes, the following file is still in use (due to cannot be deleted): E:\wrkspc\bdb-index-fork\target\var\neo4j-db\logical.log.1 This seems to be part of neo4j, though I am not sure why would it still be in use even after graphDb.shutdown() Any ideas why that would be still in use? Is graphDb.shutdown() blocking until everything is closed? or are there still threads left keeping files locked? or shutdown is delegated to other threads which may still be doing their work when .shutdown() returns ? By looking at some testcases in neo4j, I see that *index.delete() can be called before transaction finished, is this correct* ? anyone? ie. 
beginTx(); index = graphDb.index().forNodes( INDEX_NAME ); index.delete(); restartTx(); where void restartTx() { finishTx( true ); beginTx(); } in this case, if that's true that index.delete() should not cause the txn commit to fail, then this needs to be fixed in bdb-index Also,* is neo4j closing the indexes* somehow when graphDb.shutdown() ? it seems to me the only close would be index.delete() and neo4j isn't closing them, thus leaving the bdb Environment still open, thus tests that require shutdown and reopen of graphdb will fail since bdb wasn't itself shutdown and reopened but was left still open. Maybe closing the indexes is left to the user then? it's fine with me, just so long as I know disorganized John :) On Sat, Jul 30, 2011 at 9:06 PM, John cyuczieekc
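The write/read ordering question discussed in this message (length first, then the characters, with the reader consuming them in the same order) can be shown in isolation. This is a minimal sketch that uses java.nio.ByteBuffer as a stand-in for the log buffer; IndexNameCodec is a hypothetical helper written for this example and is not the bdb-index BerkeleyDbCommand code.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating a length-prefixed string record.
final class IndexNameCodec {
    // Writes the character count first, then the characters themselves.
    static void write(ByteBuffer buffer, String indexName) {
        char[] chars = indexName.toCharArray();
        buffer.putInt(chars.length);
        for (char c : chars) {
            buffer.putChar(c);
        }
    }

    // Reads in exactly the same order: count first, then that many characters.
    // Reading the fields in a different order than they were written would
    // interpret character data as a length (or vice versa) and yield garbage.
    static String read(ByteBuffer buffer) {
        int length = buffer.getInt();
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = buffer.getChar();
        }
        return new String(chars);
    }
}
```

A round trip through a buffer (write, flip, read) returning the original name is a quick way to check that both sides agree on the layout before looking for other causes such as a log that was not closed cleanly.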
Re: [Neo4j] bdb-index
It looks as if you have modified the file header of the source files. Maven checks the license (the file header) and returns an error message when the license required is different from the license provided. When looking at the diff of one of your edits I noticed there are extra spaces in the license. See: https://github.com/13th-floor/bdb-index/commit/7c6b59fbdc445a122aa247b391c15a23dd64cac9#src/main/java/org/neo4j/index/bdbje/BerkeleyDbBatchInserterIndexProvider.java These extra spaces cause the Maven install to fail. Niels Date: Sun, 31 Jul 2011 00:00:42 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index related to this I've created: https://trac.neo4j.org/ticket/358 also committed on my fork, now AllTests.java works https://github.com/13th-floor/bdb-index for some reason I cannot mvn install: [INFO] [enforcer:enforce {execution: enforce-maven}] [INFO] [license:check {execution: check-licenses}] [INFO] Checking licenses... [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbIndex.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\Neo4jTestCase.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\TestBerkeley.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndex.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\TestBerkeleyBatchInsert.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\AllTests.java [ERROR] BUILD ERROR [INFO] - [INFO] 
Some files do not have the expected license header [INFO] - But it should work, I say; maybe let me know if it doesn't On Sat, Jul 30, 2011 at 11:41 PM, John cyuczieekc cyuczie...@gmail.comwrote: org.neo4j.kernel.impl.batchinsert.BatchInserterImpl keeps StringLogger msgLog still open even after shutdown() public void shutdown() { graphDbService.clearCaches(); neoStore.close(); msgLog.logMessage( Thread.currentThread() + Clean shutdown on BatchInserter( + this + ), true ); } we'd need a msgLog.close(storeDir) and storeDir is the same param given to the constructor of BatchInserterImpl maybe someone from neo4j could do that? meanwhile I will ignore the failure to delete that file On Sat, Jul 30, 2011 at 11:34 PM, John cyuczieekc cyuczie...@gmail.comwrote: testFindCreatedIndex() is the method that fails (due to unable to delete the file, else it works fine) but it only fails when testInsertionSpeed() is allowed to execute (ie. not @Ignore) messages.log contents: Sat Jul 30 23:31:23 CEST 2011: Thread[main,5,main] Starting BatchInserter(EmbeddedBatchInserter[target/var/batch]) Sat Jul 30 23:31:42 CEST 2011: Thread[main,5,main] Clean shutdown on BatchInserter(EmbeddedBatchInserter[target/var/batch]) On Sat, Jul 30, 2011 at 11:26 PM, John cyuczieekc cyuczie...@gmail.comwrote: On Sat, Jul 30, 2011 at 11:23 PM, John cyuczieekc cyuczie...@gmail.comwrote: I did a quick check of what you said org.neo4j.index.bdbje.BerkeleyDbCommand.writeToFile(LogBuffer) char[] indexName = indexId.indexName.toCharArray(); buffer.putInt( indexName.length ); buffer.put( indexName ); I'm probably missing something but on my side it looks like it writes length then indexName (and I didn't update from github, just in case you've already fixed this) Either way, my impression of what was happening is that some files got deleted, except some ie. 
the log, which were still open/in use, and maybe when recovery was tried, either it couldn't be opened, or due to being opened contained impartial data, or all was well but recovery couldn't happen because the log needed some other files or a previous database snapshot upon which to apply the recovered transactions I only get that messages.log being unable to delete when I allow the test testFindCreatedIndex() to run, I cannot yet figure out who creates that file and to make sure it's being closed correction testInsertionSpeed() John. On Sat, Jul 30, 2011 at 11:09 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: The problem is indeed related to not properly closing the bdb database, and that is triggers another problem. In BerkeleyDbCommand data is being stored into the transaction
Re: [Neo4j] bdb-index
Could you check if the neo4j kernel jar file maven adds to class path is correct and complete. You can find it in your user directory in the .m2 subdirectory. Date: Sun, 31 Jul 2011 00:40:51 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index I fixed those licenses, but to my amazement I'm getting new errors which didn't happen before, I am puzzled as to why would this happen e:\down\13th-floor-bdb-index-f9a3155>mvn install [INFO] Scanning for projects... [INFO] [INFO] Building Unnamed - org.neo4j:neo4j-berkeleydb-je-index:jar:0.1-SNAPSHOT [INFO] task-segment: [install] [INFO] [INFO] [enforcer:enforce {execution: enforce-maven}] [INFO] [license:check {execution: check-licenses}] [INFO] Checking licenses... [INFO] [resources:resources {execution: default-resources}] [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] Copying 1 resource [INFO] Copying 0 resource to META-INF [INFO] [compiler:compile {execution: default-compile}] [INFO] Compiling 14 source files to e:\down\13th-floor-bdb-index-f9a3155\target\classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java:[31,29] package org.neo4j.index.lucene does not exist [ERROR] \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist [ERROR] \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java:[31,29] package org.neo4j.index.lucene does not exist [ERROR] \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist [INFO] 4 errors [INFO] - [INFO] [ERROR] BUILD FAILURE [INFO] [INFO] Compilation failure \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java:[31,29] package 
org.neo4j.index.lucene does not exist \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyD bBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyD bDataSource.java:[31,29] package org.neo4j.index.lucene does not exist \down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyD bBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist [INFO] [INFO] For more information, run Maven with the -e switch [INFO] [INFO] Total time: 2 seconds [INFO] Finished at: Sun Jul 31 00:37:19 CEST 2011 [INFO] Final Memory: 38M/359M [INFO] e:\down\13th-floor-bdb-index-f9a3155 On Sun, Jul 31, 2011 at 12:26 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: It looks as if you have modified the file header of the source files. Maven checks the license (the file header) and returns an error message when the license required is different from the license provided. When looking at the diff of one of your edits I noticed there are extra spaces in the license. See: https://github.com/13th-floor/bdb-index/commit/7c6b59fbdc445a122aa247b391c15a23dd64cac9#src/main/java/org/neo4j/index/bdbje/BerkeleyDbBatchInserterIndexProvider.java These extra spaces make that maven does not install. Niels Date: Sun, 31 Jul 2011 00:00:42 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index related to this I've created: https://trac.neo4j.org/ticket/358 also committed on my fork, now AllTests.java works https://github.com/13th-floor/bdb-index for some reason I cannot mvn install: [INFO] [enforcer:enforce {execution: enforce-maven}] [INFO] [license:check {execution: check-licenses}] [INFO] Checking licenses... 
Re: [Neo4j] bdb-index
I see in your edit of is the following import: import org.neo4j.index.lucene.LuceneIndexProvider; This is an interface defined in the legacy-index component, which is not in the POM ( and shouldn't be). The import is nowhere used in the file, except as links in header of the class where it doesn't belong. I guess an organize imports in Eclipse has added that import based on an incorrect header. It's best to remove the legacy-index component from your build path in eclips. In fact, it's best to let maven manage the project for you, so only jars listed as dependencies in maven are put in the build path. To work on bdb-index you need nothing more than the neo4j-kernel on your build path. Niels Date: Sun, 31 Jul 2011 01:19:17 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index I'm not sure how complete it is (ie. there's no org\neo4j\index folder inside it), but its sha1 matches, but also worth mentioning that I noticed it got updated a few minutes before I tried to mvn install, so it could be that it worked before because it was a different .jar (ie. prev version) Also, unpacking the jar and searching for any file named lucene* yields no results Searching for lucene* in all archives under that .m2 folder, still nothing. trying with 1.3 still doesn't work, not found. neo4j-kernel-1.4-SNAPSHOT.jar 817,935 bytes sha1: a20720ece824b372520b7afde080cdc83abb5501 Thanks for the hints! All this maven knowledge will prove useful. John. On Sun, Jul 31, 2011 at 12:57 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Could you check if the neo4j kernel jar file maven adds to class path is correct and complete. You can find it in your user directory in the .m2 subdirectory. 
Re: [Neo4j] bdb-index
Forgot the filename in the first sentence: BerkeleyDbBatchInserterIndexProvider.java From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Sun, 31 Jul 2011 01:47:20 +0200 Subject: Re: [Neo4j] bdb-index I see in your edit of is the following import: import org.neo4j.index.lucene.LuceneIndexProvider; This is an interface defined in the legacy-index component, which is not in the POM ( and shouldn't be). The import is nowhere used in the file, except as links in header of the class where it doesn't belong. I guess an organize imports in Eclipse has added that import based on an incorrect header. It's best to remove the legacy-index component from your build path in eclips. In fact, it's best to let maven manage the project for you, so only jars listed as dependencies in maven are put in the build path. To work on bdb-index you need nothing more than the neo4j-kernel on your build path. Niels Date: Sun, 31 Jul 2011 01:19:17 +0200 From: cyuczie...@gmail.com To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index I'm not sure how complete it is (ie. there's no org\neo4j\index folder inside it), but its sha1 matches, but also worth mentioning that I noticed it got updated a few minutes before I tried to mvn install, so it could be that it worked before because it was a different .jar (ie. prev version) Also, unpacking the jar and searching for any file named lucene* yields no results Searching for lucene* in all archives under that .m2 folder, still nothing. trying with 1.3 still doesn't work, not found. neo4j-kernel-1.4-SNAPSHOT.jar 817,935 bytes sha1: a20720ece824b372520b7afde080cdc83abb5501 Thanks for the hints! All this maven knowledge will prove useful. John. On Sun, Jul 31, 2011 at 12:57 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Could you check if the neo4j kernel jar file maven adds to class path is correct and complete. You can find it in your user directory in the .m2 subdirectory. 
Re: [Neo4j] Composable traversals
I am going to stick as closely as possible to the current implementation of traversers and, where possible, reuse its code. As far as I can see, the current UniquenessFilter works well, so I am going to keep that setup for the new implementation.

Indeed it should be: Node --FRIEND-- Node --PARENT-- Node. Both FRIEND and PARENT are just RelationshipTypes; nothing fancy with intermediate nodes is going on.

The iterator that returns the paths of the traversal should check, when returning a path, that none of the nodes/relationships in it have been deleted. Locking things in the traverser is probably not a good idea, since it can easily lock large parts of the graph for an unknown amount of time. The traverser works lazily, so we cannot know in advance when, or even if, the iterator will be advanced. Keeping nodes locked (potentially indefinitely) is not a good idea.

A traversal can never return more than temporary snapshots of the database. Paths that were already returned may well have been deleted by the time the traversal ends, and new paths can be created which the traverser will not see, because that part of the graph has already been examined.

I don't see how the isolation levels found in an RDBMS can be implemented in a graph database. There is no notion of range locks without a schema, so phantom reads may always occur in traversals.

Niels

Date: Fri, 29 Jul 2011 07:04:41 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Composable traversals

Hey Niels,

As they are composable, is java going to keep track of things, like if recursive, in the stack? or in arrays/variables? or could the graph keep track of what's been parsed so far, in-graph? (I mean, this question applies to non-composable too; personally I like the idea of keeping track of those in-graph, but maybe that would be implemented later at a higher level, so I guess for now it will be in arrays/variables.)

Just making sure, in here: Node --FRIEND-- Node -- PARENT -- Node, FRIEND and PARENT are both relationship types? They are thus not intermediary nodes acting like they are relationships? (Which is actually what I do with bdb, where the only elemental thing is the Node; rels cannot be addressed, ie. by ID.)

What happens while the traversers are executing and some other thread/process deletes something which the traverser added to itself as a valid node/path? For example, the first Node in Node --FRIEND-- Node, assuming that's where the traverser is currently at, is deleted... Is there some notification/event, or were they locked by the traverser? Or will this kind of issue be dealt with later, after the traverser is implemented?

Are these locks kept in-graph so they can be seen by other threads/processes (mainly thinking of processes that cannot access the same java resource, ie. in another jvm or computer, though accessing the same database - I guess this rules out embedded?)? If there are any locks...

On Fri, Jul 29, 2011 at 1:30 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I'd like to take a stab at implementing traversals in the Enhanced API. One of the things I'd like to do is to make traversals composable. Right now a Traverser is created either by calling the traverse method on Node or by calling the traverse(Node) method on TraversalDescription. This makes traversals inherently non-composable, so we can't define a single traversal that returns the parents of all our friends.

To make Traversers composable we need a function:

Traverser traverse(Traverser, TraversalDescription)

My take on it is to make Element (which is a superinterface of Node) into a Traverser. Traverser is basically another name for Iterable<Path>. Every Node (or more generally every Element) can be seen as an Iterable<Path>, returning a single Path, which contains a single path-element, the Node/Element itself.

Composing traversals would entail the concatenation of the paths returned with the paths supplied, so when we ask for the parents of all our friends, the returned paths would take the form: Node --FRIEND-- Node -- PARENT -- Node

Niels

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
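The "check on return, don't lock" idea from the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions: a path is modelled as a plain list of ids, and the set deletedIds is a hypothetical stand-in for asking the database whether an element still exists. None of the names below are Neo4j API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical stand-in: a "path" is just a list of element ids, and
 * deletedIds plays the role of "has this node/relationship been deleted?".
 * None of these types are Neo4j API.
 */
final class LivePathIterator implements Iterator<List<Long>> {
    private final Iterator<List<Long>> candidates; // lazily produced paths
    private final Set<Long> deletedIds;            // may change while we iterate
    private List<Long> buffered;                   // next live path, if already found

    LivePathIterator(Iterator<List<Long>> candidates, Set<Long> deletedIds) {
        this.candidates = candidates;
        this.deletedIds = deletedIds;
    }

    @Override
    public boolean hasNext() {
        // Pull candidates only on demand; drop those touching a deleted element.
        while (buffered == null && candidates.hasNext()) {
            List<Long> path = candidates.next();
            if (isLive(path)) {
                buffered = path;
            }
        }
        return buffered != null;
    }

    @Override
    public List<Long> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<Long> result = buffered;
        buffered = null;
        return result; // only a snapshot: it can still be deleted right after this
    }

    private boolean isLive(List<Long> path) {
        for (Long id : path) {
            if (deletedIds.contains(id)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Long> deleted = ConcurrentHashMap.newKeySet();
        List<List<Long>> paths = Arrays.asList(
                Arrays.asList(1L, 2L), Arrays.asList(1L, 3L), Arrays.asList(1L, 4L));
        Iterator<List<Long>> it = new LivePathIterator(paths.iterator(), deleted);
        System.out.println(it.next()); // returned before any deletion happens
        deleted.add(3L);               // "another thread" deletes element 3
        while (it.hasNext()) {
            System.out.println(it.next()); // the path through 3 is skipped
        }
    }
}
```

No lock is held between calls, so a path already handed out can still become stale, which matches the "temporary snapshot" caveat in the reply.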
Re: [Neo4j] bdb-index
Hi John,

Thanks for looking into this. I am still seeing the same error I had before. When running mvn install, both tests are run one after another. For some reason the transaction log sees an unclean shutdown and tries to commit pending transactions. During that process the index names of the bdb indexes are retrieved from binary storage. Here something goes wrong: the index name returned is garbage, so the recovery process fails because it can't find the right index files.

Niels

Date: Fri, 29 Jul 2011 07:48:43 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] bdb-index

I forked and fixed; the tests are all working now: https://github.com/13th-floor/bdb-index

Let me know if you want me to do a pull request... Sadly I applied formatting on RawBDBSpeed, and the diff doesn't look pretty if you're trying to see what changed.

John.

On Thu, Jul 28, 2011 at 7:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Trying to find something useful to hide the implementation bookkeeping of the Enhanced API, I tried out bdb-index, as can be found here: https://github.com/peterneubauer/bdb-index

It looks interesting, but fails its tests. When recovering, it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?

Niels
Re: [Neo4j] bdb-index
What I need to store in an index depends on the type of element that needs to be reified.

Relationship:
  To associated Node: RelId -> NodeId
  From associated Node: NodeId -> RelId
RelationshipType:
  To associated Node: RelationshipType.name -> NodeId
  From associated Node: NodeId -> RelationshipType.name
RelationshipRole:
  To associated Node: RelationshipRole.name -> NodeId
  From associated Node: NodeId -> RelationshipRole.name
PropertyType:
  To associated Node: PropertyType.name -> NodeId
  From associated Node: NodeId -> PropertyType.name
Property:
  To associated Node: Node, PropertyType.name -> NodeId
  From associated Node: NodeId -> Node, PropertyType.name

Niels

Date: Fri, 29 Jul 2011 06:49:31 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] bdb-index

Hi xD

I'm not clear what you need to store here. If I understand correctly, you could store in 2 primary bdb databases the nodeID (ie. long) of each node in a relationship, ie. key->value:

dbForward: A->B A->C X->D X->B
dbBackward: B->A B->X C->A D->X

A, B, C, D, X are all nodeIDs, ie. longs. This way you could check if A->B exists, or get all of A's endNodes, or find what startNodes are pointing to the endNode B. These would be stored sorted and in a BTree, so lookup would be fast. You can consider ie. A as being a set of B and C, and X as being a set of B and D (that is, you cannot set the order as in a list; they are sorted by bdb for fast retrievals). (But upon these sets you can build lists, np - that is using only bdb; tho you won't need that using neo4j.)

So, if this is the kind of index you wanted... (I am not aware of specific indexes with bdb, though that doesn't mean they don't exist.)

Insertions would require transaction protection, so that both A->B in dbForward and B->A in dbBackward are inserted atomically. Parsing A then X of B-> in dbBackward, for example, can only be done with a cursor...

Either way, I'm taking a look at that bdb-index thingy; will report back if I have any ideas heh

John.

On Thu, Jul 28, 2011 at 9:42 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Thank you, Peter,

There is no rush here. It would be nice to investigate this option, but it can wait until Mattias has returned and sifted through urgent matters.

The question is even whether it would be a good idea to use an index to do the bookkeeping for the Enhanced API. As it is now, the reification of e.g. a Relationship requires one property to be set on the relationship, containing the node ID of the associated node. On the associated node is a property containing the ID of the relationship, so there is a bidirectional lookup. Introducing an index would remove the need for these additional properties, but would lead to slower lookup times (no matter how fast the index). So it's a trade-off between speed and cleanliness of the namespace.

Using the Enhanced API disallows certain property names in user applications. The property names used in the Enhanced API all start with org.neo4j.collections.graphdb., so there is little chance a user application would want to use those property names, but it is a restriction not found in the standard API, so ultimately something to consider.

Niels

From: peter.neuba...@neotechnology.com
Date: Thu, 28 Jul 2011 10:39:47 -0700
To: user@lists.neo4j.org
Subject: Re: [Neo4j] bdb-index

niels,

in this spike, I just concentrated on getting _something_ working in order to test insertion speed. This is not up to real indexing standards, so some love is needed here. I think Mattias is the best person to ask about pointers, let's wait until he is back next week if that is ok? Maybe some other (like the standard Lucene) index can suffice for the time being to test out things?

Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Thu, Jul 28, 2011 at 10:36 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Trying to find something useful to hide the implementation bookkeeping of the Enhanced API, I tried out bdb-index, as can be found here: https://github.com/peterneubauer/bdb-index

It looks interesting, but fails its tests. When recovering, it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?

Niels
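John's dbForward/dbBackward layout from this thread could look roughly like the sketch below. It is a minimal in-memory illustration only: two sorted maps stand in for the two BDB databases, a synchronized method stands in for the transaction that keeps both directions consistent, and the class and method names are hypothetical. It does not use the Berkeley DB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for the two key->value stores; not Berkeley DB code. */
final class BidirectionalIndexSketch {
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();  // start -> ends
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>(); // end -> starts

    /** Writes both directions together; synchronized stands in for a transaction. */
    synchronized void add(long start, long end) {
        forward.computeIfAbsent(start, k -> new TreeSet<>()).add(end);
        backward.computeIfAbsent(end, k -> new TreeSet<>()).add(start);
    }

    /** Does start -> end exist? One keyed lookup plus one sorted-set lookup. */
    synchronized boolean contains(long start, long end) {
        TreeSet<Long> ends = forward.get(start);
        return ends != null && ends.contains(end);
    }

    /** All end nodes of start, in sorted order (a set, not a user-ordered list). */
    synchronized List<Long> endNodes(long start) {
        TreeSet<Long> ends = forward.get(start);
        if (ends == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ends);
    }

    /** All start nodes pointing at end, in sorted order. */
    synchronized List<Long> startNodes(long end) {
        TreeSet<Long> starts = backward.get(end);
        if (starts == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(starts);
    }

    public static void main(String[] args) {
        // Ids chosen for illustration: A=1, B=2, C=3, D=4, X=5
        BidirectionalIndexSketch index = new BidirectionalIndexSketch();
        index.add(1, 2); // A->B
        index.add(1, 3); // A->C
        index.add(5, 4); // X->D
        index.add(5, 2); // X->B
        System.out.println(index.contains(1, 2));
        System.out.println(index.endNodes(1));
        System.out.println(index.startNodes(2));
    }
}
```

The same two-map shape would also fit the RelId -> NodeId and NodeId -> RelId pairs listed at the top of this message, with the caveat Niels raises: every lookup goes through the index instead of a property on the element itself.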
Re: [Neo4j] Composable traversals
Great, I would much rather see this become part of the core API than have it as part of the Enhanced API.

To make things work correctly, one important change to core is needed: the Node interface needs to extend Traverser (the interface in org.neo4j.graphdb.traversal, not the one in org.neo4j.graphdb). This is actually not a big deal. The Traverser interface supports three methods:

Iterator<Path> iterator() [return 1 path with 1 element in the path, being the node itself]
Iterable<Node> nodes() [return an iterable over the node itself]
Iterable<Relationship> relationships() [return an empty iterable]

With that addition, it's not all too difficult to enhance the current implementation of Traverser. It only adds one more iteration level over the current implementation. Instead of having one start node, we now have multiple start paths. When returning values from the Traverser, the start paths and the result paths need to be concatenated.

In the new scenario, all old traverse() methods can remain the same, since Node becomes a Traverser, so those methods are just special cases where the Iterable<Path> consists of 1 path, with just 1 element.

Niels

Date: Fri, 29 Jul 2011 18:36:28 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Composable traversals

There have been thoughts for a long while about making something like this with the traversal framework, but time has never been allocated to evolve it. I'm adding stuff to the framework in a side track and will surely add some aspect of composable traversers also.

2011/7/29 Niels Hoogeveen pd_aficion...@hotmail.com

I'd like to take a stab at implementing traversals in the Enhanced API. One of the things I'd like to do is to make traversals composable. Right now a Traverser is created either by calling the traverse method on Node or by calling the traverse(Node) method on TraversalDescription. This makes traversals inherently non-composable, so we can't define a single traversal that returns the parents of all our friends.

To make Traversers composable we need a function:

Traverser traverse(Traverser, TraversalDescription)

My take on it is to make Element (which is a superinterface of Node) into a Traverser. Traverser is basically another name for Iterable<Path>. Every Node (or more generally every Element) can be seen as an Iterable<Path>, returning a single Path, which contains a single path-element, the Node/Element itself.

Composing traversals would entail the concatenation of the paths returned with the paths supplied, so when we ask for the parents of all our friends, the returned paths would take the form: Node --FRIEND-- Node -- PARENT -- Node

Niels

-- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
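The composition described in this thread (a node seen as a single one-element path, and each traversal step concatenating its results onto the start paths) can be sketched as below. This is a toy model under stated assumptions, not the proposed implementation: paths are plain lists of strings, a bare relationship-type name stands in for a TraversalDescription, results are built eagerly rather than lazily, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy graph: node name -> relationship type -> end node names. Not Neo4j API. */
final class ComposableTraversalSketch {
    private final Map<String, Map<String, List<String>>> graph = new HashMap<>();

    void relate(String from, String type, String to) {
        graph.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(type, k -> new ArrayList<>())
             .add(to);
    }

    /** A node viewed as a "traverser": one path containing just the node itself. */
    static List<List<String>> single(String node) {
        return Collections.singletonList(Collections.singletonList(node));
    }

    /**
     * Follows one relationship type from the end of every start path and
     * returns each start path concatenated with the step that was taken.
     */
    List<List<String>> traverse(List<List<String>> startPaths, String type) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> start : startPaths) {
            String end = start.get(start.size() - 1);
            List<String> targets = graph.getOrDefault(end, Collections.emptyMap())
                                        .getOrDefault(type, Collections.emptyList());
            for (String next : targets) {
                List<String> extended = new ArrayList<>(start); // keep the supplied path
                extended.add(type);
                extended.add(next);
                result.add(extended);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ComposableTraversalSketch g = new ComposableTraversalSketch();
        g.relate("me", "FRIEND", "alice");
        g.relate("me", "FRIEND", "bob");
        g.relate("alice", "PARENT", "carol");
        g.relate("bob", "PARENT", "dave");
        // "Parents of all our friends": the output of one traversal feeds the next.
        List<List<String>> friends = g.traverse(single("me"), "FRIEND");
        System.out.println(g.traverse(friends, "PARENT"));
    }
}
```

Because traverse takes and returns the same shape, the old single-node case is just the call with single(node) as input, which mirrors the point that the existing traverse() methods become special cases.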
Re: [Neo4j] HyperRelationship example
Hi John,

Thanks for showing an interest. The compile error you got was due to the fact that a removed class was still hanging around in the Git repo. I renamed BinaryRelationshipRoles to BinaryRelationshipRole, but the original file was still active in the Git repo. I fixed that.

I have been thinking about BDB too for this situation, because the graph database now stores some information about the associated nodes and their reverse lookup. This of course pollutes the name/node space. It would be neat to offload this bookkeeping information to some persistent hashmap, so the implementation is completely transparent to the user.

I don't know how nicely BDB plays with Neo4j transactions. Does anyone have experience with this? Another aspect is licensing. I am no legal buff, so maybe someone else can jump in and answer this. Personally, I don't mind adding BDB as a dependency, but it has to work well at the transaction level and licence-wise, otherwise it's a no-go for me.

I would recommend you start using Maven. There is an Eclipse plugin, m2eclipse, which allows you to use/maintain Maven projects from within Eclipse.

Niels

Date: Thu, 28 Jul 2011 05:09:54 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] HyperRelationship example

Hey Niels,

I like xD this seems like a lot of work and professionally done; ie. something I could not have done (I don't have that kind of experience and focus). Gratz on that, I really appreciate seeing this.

I cloned the repo from git, manually, with eclipse (not using maven - don't know how with eclipse). I am getting only about 3 compile errors, like:

1) The type BinaryRelationshipRoles<T> must implement the inherited abstract method PropertyContainer.getId()
2) The constructor PropertyType<T>(String, GraphDatabaseService) is not visible
3) The return type is incompatible with RelationshipContainer.getRelationships() for org.neo4j.collections.graphdb.impl.RelationshipIterable.RelationshipIterable(Iterable<Relationship> rels)

Also, I am thinking to try and implement this on top of berkeleydb just for fun/benchmarking (so to speak), to compare between that and neo4j, since I am currently unsure which one to use for my hobby project (I like that berkeleydb's searches are 0-1ms instead of a few seconds).

Btw, would it be of any interest to you if I were to fork your repo and add ie. AllTests.java for junit and the .project and related files for the eclipse project in a pull or two? As long as it doesn't seem useless or cluttering... (note however I never actually, yet, used fork & pull but only read about it on github xD)

Thanks to all, for wasting some time reading this,
Greeting and salutations,
John

On Wed, Jul 27, 2011 at 8:48 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example

There is now a proper test for HyperRelationships, so I hereby push the software to Beta status. Please try out the Enhanced API and HyperRelationships and let me know what needs improvement.

Niels
Re: [Neo4j] strange problem while getting a node property
When iterating over all nodes, you also pull the reference node (with id = 0), which probably doesn't have the requested property. If you want to list all properties of a node, it's better to use a construct like:

    for (String key : node.getPropertyKeys()) {
        System.out.println(node.getProperty(key));
    }

Date: Thu, 28 Jul 2011 13:18:50 +0200
From: c-...@jsnet.be
To: user@lists.neo4j.org
Subject: [Neo4j] strange problem while getting a node property

Hi,

I have this strange problem when I try to collect data from the graph with the Java API in Groovy:

    db.allNodes.each { node ->
        cpt = 0
        node.getRelationships().each { rel -> cpt++ }
        println("${node} ${cpt}")
        println node.getPropertyKeys()
    }

The iteration over the nodes works. The iteration to count the relationships on each node works too. The call node.getPropertyKeys() gives me the list of the properties, like this:

    [nbrel, version, maintainer, section, architecture, package, priority, dataset, installedSize]

But if I call node.getProperty("package") I receive this error:

    Caught: org.neo4j.graphdb.NotFoundException: package property not found for NodeImpl#0

And if I set the value just before, as a test, like this:

    node.setProperty("package", "test")
    println node.getProperty("package")

I get the value. So I can't get a property which was not set by the node.setProperty method. The initial data are copied into the graph with a perl script using the Neo4j REST interface.

Maybe I am doing something wrong; I'm a newbie in both Neo4j and Groovy.

Regards,
Jean-Sébastien Stoffen
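To illustrate the answer above, here is a small sketch of why one property-less node (such as the reference node) breaks a loop that calls getProperty unconditionally, and two ways to guard against it. FakeNode is a hypothetical stand-in backed by a map; it only mirrors the names hasProperty and the one- and two-argument getProperty, and nothing here touches Neo4j itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class PropertyLookupSketch {
    /** Hypothetical stand-in for a node: an id plus a property map. */
    static final class FakeNode {
        final long id;
        final Map<String, Object> props = new LinkedHashMap<>();

        FakeNode(long id) {
            this.id = id;
        }

        boolean hasProperty(String key) {
            return props.containsKey(key);
        }

        /** Throws when the key is missing, like the error in the question. */
        Object getProperty(String key) {
            if (!props.containsKey(key)) {
                throw new NoSuchElementException(key + " property not found for node " + id);
            }
            return props.get(key);
        }

        /** Variant that falls back to a default instead of throwing. */
        Object getProperty(String key, Object defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    /** Collects a property from every node that actually has it. */
    static List<Object> collect(List<FakeNode> nodes, String key) {
        List<Object> values = new ArrayList<>();
        for (FakeNode node : nodes) {
            if (node.hasProperty(key)) { // skips e.g. the empty reference node
                values.add(node.getProperty(key));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        FakeNode reference = new FakeNode(0); // no properties at all
        FakeNode pkg = new FakeNode(1);
        pkg.props.put("package", "bash");
        System.out.println(collect(Arrays.asList(reference, pkg), "package"));
        System.out.println(reference.getProperty("package", "n/a"));
    }
}
```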
Re: [Neo4j] HyperRelationship example
It's a trick to lock a node. When removing a property that does not exist, the node gets locked.

Date: Thu, 28 Jul 2011 15:51:15 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] HyperRelationship example

Hey Niels,

what is acquireLock() doing in SortedTree? Is removeProperty causing neo4j to acquire a lock on the Node? Or on its properties? Also, does that property need to exist? Seems like not. Interesting :)

On Wed, Jul 27, 2011 at 8:48 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example

There is now a proper test for HyperRelationships, so I hereby push the software to Beta status. Please try out the Enhanced API and HyperRelationships and let me know what needs improvement.

Niels
Re: [Neo4j] HyperRelationship example
meant if you could make a wrapper such that you could use the same format/interface neo4j uses for their transactions, then you can, I did some attempt to that it works for me, also BDB Java Edition doesn't support nested transactions either (the C++ version does), but emulating them to use the same root/parent transaction is easy, my attempt is here: https://github.com/13th-floor/neo4john/blob/6c0371e82b7fc5b5f45d7c0ea9fb03ee4d241df9/src.obsolete/org/bdb/BETransaction.java probably not much relevant though. But this file here: https://github.com/13th-floor/neo4john/blob/master/src/org/benchtests/neo4j/TestLinkage.java I made to use both neo4j and bdb to do the same thing, that is: create nodes(uppercase named ones) with these rels: ROOT_LIST -- START ROOT_LIST -- half a million unique nodes ROOT_LIST -- MIDDLE ROOT_LIST -- another half a million unique nodes ROOT_LIST -- END then make both bdb and neo4j check if the following rels exist: ROOT_LIST -- START ROOT_LIST -- MIDDLE ROOT_LIST -- END (you probably saw this already in another post) But both bdb and neo4j now use transactions... that is, in my test file. About licensing, I'm not much into that but here's the license for Berkeley DB Java Edition: http://www.oracle.com/technetwork/database/berkeleydb/downloads/jeoslicense-086837.html Looks like New(or normal?) BSD license or something ... also Licensing Berkeley DB is available under dual license: - Public license that requires that software that uses the Berkeley DB code be free/open source software; and - Closed source license for non-open source software. If your code is not redistributed, no license is required (free for in-house use). from http://www.orafaq.com/wiki/Berkeley_DB#Licensing I would totally use neo4j, if it would be as fast at searches :/ ie. BTree storage of nodes/rels? 
(guessing) But having 10 million rels, and seeing BDB check if A--B exists in 0ms, and neo4j in like 0 to 66 to 310 seconds (depending on its position), is a show stopper for me, especially because I want to base everything on just nodes (without properties) and their relationships. ie. make a set or list of things, without having A ---[ENTRY]-- e ---[NEXT] --- e2 but instead A-b-e-c-e2 where b and c are just nodes, and also AllEntries-b and AllNexts-c (silly example with so little info tho).

Point is, I would do lots of searches (imagine a real time program running on top of nodes/rels, that is, it's defined in and can access only nodes); this would likely cause those ms to add up to seconds...

I installed maven (m2e) again. I guess I could use it, but it seems it creates a .jar, and I'm not sure that's useful to me while I am coding... seems better to use the project/sources, no? And maven only when ready to publish/get the jar. Anyway, I need to learn how to use it, otherwise I'm getting errors like this when trying to build:

[ERROR] The project org.neo4j:neo4j-graph-collections:1.5-SNAPSHOT (E:\wrkspc\graph-collections\pom.xml) has 1 error
[ERROR] Non-resolvable parent POM: The repository system is offline but the artifact org.neo4j:parent-central:pom:21 is not available in the local repository. and 'parent.relativePath' points at wrong local POM @ line 4, column 11 -> [Help 2]

Anyway, with normal eclipse, I'm still seeing 2 different errors:

1) in org.neo4j.collections.graphdb.ComparablePropertyType<T> line 29: super(name, graphDb);
   The constructor PropertyType<T>(String, GraphDatabaseService) is not visible
2) org.neo4j.collections.graphdb.impl.NodeLikeImpl.getRelationships()
   The return type is incompatible with RelationshipContainer.getRelationships()
3) org.neo4j.collections.graphdb.impl.NodeLikeImpl.getRelationships(RelationshipType...)
   The return type is incompatible with RelationshipContainer.getRelationships(RelationshipType[])

John.
On Thu, Jul 28, 2011 at 12:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Hi John, Thanks for showing an interest. The compile error you got was due to the fact that a removed class was still hanging around in the Git repo. I renamed BinaryRelationshipRoles into BinaryRelationshipRole, but the original file was still active in the Git repo. I fixed that. I have been thinking about BDB too for this situation, because the graph database now stores some information about the associated nodes and their reverse lookup. This of course polutes the name/node space. It would be neat to offload this book keeping information to some persistent hashmap, so the implementation is completely
[Neo4j] bdb-index
Trying to find something useful to hide the implementation bookkeeping of the Enhanced API, I tried out bdb-index, as can be found here: https://github.com/peterneubauer/bdb-index

It looks interesting, but fails its tests. When recovering, it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?

Niels
Re: [Neo4j] bdb-index
Should read: The retrieved indexName is actually garbage.

From: pd_aficion...@hotmail.com
To: user@lists.neo4j.org
Date: Thu, 28 Jul 2011 19:36:21 +0200
Subject: [Neo4j] bdb-index

Trying to find something useful to hide the implementation bookkeeping of the Enhanced API, I tried out bdb-index, as can be found here: https://github.com/peterneubauer/bdb-index

It looks interesting, but fails its tests. When recovering, it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?

Niels
Re: [Neo4j] bdb-index
Thank you, Peter,There is no rush here. It would be nice to investigate this option, but it can wait until Mattias has returned and sifted through urgent matters. The question is even, if it would be a good idea to use an index to do the book keeping for Enhanced API.As it is now, the Reification of eg. a Relationship, requires one property to be set on a relationship, containing the node ID of the associated node. On the associated node is a property containing the ID of the relationship, so there is a bidirectional look up. Introducing an index would remove the need to have these additional properties, but would lead to slower look-up times (no matter how fast the index).So it's a trade-off between speed and cleanliness of namespace. Using the Enhanced API disallows certain property names to be used in user applications.The property names used in Enhanced API all start with org.neo4j.collections.graphbd., so there is little chance a user application would want to use those property names, but it is a restriction not found in the standard API, so ultimately something to consider.Niels From: peter.neuba...@neotechnology.com Date: Thu, 28 Jul 2011 10:39:47 -0700 To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index niels, in this spike, I just concentrated on getting _something_ working in order to test insertion speed. This is not up to real indexing standards, so some love is needed here. I think Mattias is the best person to ask about pointers, let's wait until he is back next week if that is ok? Maybe some other (like the standard Lucene) index can suffice for the time being to test out things? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. 
On Thu, Jul 28, 2011 at 10:36 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Trying to find something useful to hide the implementation bookkeeping of the Enhanced API, I tried out bdb-index, which can be found here: https://github.com/peterneubauer/bdb-index

It looks interesting, but fails its tests. When recovering, it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond the scope of what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?

Niels

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
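The property-based bookkeeping described in the first message of this thread could be sketched roughly as follows. This is an in-memory illustration only, not the Enhanced API implementation: the classes `Entity` and `Reifier`, the two property suffixes, and the choice to throw on a reserved key are all assumptions made for the example. Only the `org.neo4j.collections.graphdb.` prefix comes from the message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a node or relationship: an id plus properties.
class Entity {
    final long id;
    final Map<String, Object> properties = new HashMap<>();

    Entity(long id) {
        this.id = id;
    }
}

// Hypothetical helper showing the two bookkeeping properties and the reserved namespace.
class Reifier {
    // Prefix mentioned in the thread; the suffixes below are invented for this sketch.
    static final String PREFIX = "org.neo4j.collections.graphdb.";
    static final String NODE_ID = PREFIX + "node_id";
    static final String REL_ID = PREFIX + "relationship_id";

    // Link a relationship and its associated node in both directions.
    static void reify(Entity relationship, Entity node) {
        relationship.properties.put(NODE_ID, node.id);
        node.properties.put(REL_ID, relationship.id);
    }

    // Assumption: user code is rejected when it tries to use a reserved key.
    static void setUserProperty(Entity entity, String key, Object value) {
        if (key.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Reserved property name: " + key);
        }
        entity.properties.put(key, value);
    }
}
```

Both lookups are a single property read, which is the speed side of the trade-off; the cost is that keys under the prefix are no longer available to user code.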
[Neo4j] Composable traversals
I'd like to take a stab at implementing traversals in the Enhanced API. One of the things I'd like to do is to make traversals composable.

Right now a Traverser is created either by calling the traverse method on Node, or by calling the traverse(Node) method on TraversalDescription. This makes traversals inherently non-composable, so we can't define a single traversal that returns the parents of all our friends.

To make Traversers composable we need a function:

Traverser traverse(Traverser, TraversalDescription)

My take on it is to make Element (which is a superinterface of Node) into a Traverser. Traverser is basically another name for Iterable<Path>. Every Node (or more generally every Element) can be seen as an Iterable<Path> returning a single Path, which contains a single path element: the Node/Element itself.

Composing traversals would entail concatenating the paths returned with the paths supplied, so when we ask for the parents of all our friends, the returned paths would take the form:

Node --FRIEND-- Node --PARENT-- Node

Niels
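To make the proposal concrete, here is a minimal sketch of path concatenation over a toy in-memory graph. Nothing in it is Neo4j or Enhanced API code: `TinyGraph`, `relate`, `start` and the use of a plain relationship-type name in place of a TraversalDescription are assumptions for illustration, and a path is simply a list alternating node names and relationship types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory graph; not the Neo4j or Enhanced API types.
class TinyGraph {
    // node -> relationship type -> neighbouring nodes
    private final Map<String, Map<String, List<String>>> edges = new HashMap<>();

    void relate(String from, String type, String to) {
        edges.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(type, k -> new ArrayList<>())
             .add(to);
    }

    // A single node seen as a traverser: one path holding just that node.
    static List<List<String>> start(String node) {
        List<List<String>> paths = new ArrayList<>();
        paths.add(Collections.singletonList(node));
        return paths;
    }

    // Takes paths and returns paths, so calls can be chained.
    // Each supplied path is extended by one hop of the given type.
    List<List<String>> traverse(List<List<String>> paths, String type) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> path : paths) {
            String end = path.get(path.size() - 1);
            Map<String, List<String>> byType = edges.get(end);
            if (byType == null || !byType.containsKey(type)) {
                continue; // dead end: this path contributes nothing
            }
            for (String next : byType.get(type)) {
                List<String> extended = new ArrayList<>(path);
                extended.add(type);
                extended.add(next);
                result.add(extended);
            }
        }
        return result;
    }
}
```

With this shape, "the parents of all our friends" is `g.traverse(g.traverse(TinyGraph.start("me"), "FRIEND"), "PARENT")`, and every returned path has the form node, FRIEND, node, PARENT, node.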
Re: [Neo4j] Enhanced API and HyperRelationships
Added a test for Enhanced API and HyperRelationships. Reification works correctly, HyperRelationships works correctly for binary relationships. Still need to add tests for HyperRelationships with higher arity (will do so later today).

Niels

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Wed, 27 Jul 2011 00:04:32 +0200 Subject: Re: [Neo4j] Enhanced API and HyperRelationships

Hi Peter, I will start writing test-code first. Some nice creation code will definitely be part of that, which I will post on the graph-collections Wiki, together with the resulting data (reminder to self: install neoclipse to make neat images of the graph). The Enhanced stuff still needs thorough testing. My Scala app only uses the non-enhanced features, so it was basically a test proving the wrappers all work properly. In two or three days, I am confident the software is ready for others to try out.

Niels

From: peter.neuba...@neotechnology.com Date: Tue, 26 Jul 2011 14:48:36 -0700 To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API and HyperRelationships

That is cool Niels, I am looking forward to you testing it out, maybe some other people as well? Also, I would love to see how to query such a structure at the API level. Could you post some nice creation code and the resulting graph so we can see how it looks?

Cheers, /peter neubauer
[Neo4j] HyperRelationship example
I just posted an example of how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example

There is now a proper test for HyperRelationships, so I hereby push the software to Beta status. Please try out the Enhanced API and HyperRelationships and let me know what needs improvement.

Niels
Re: [Neo4j] how to scale and view or generate reports for complex graphs?
Hi Sambodhi,

One way to organize complexity is to add meta information to your database. First of all, this helps you organize which relationships and properties belong to which sort of node. It may also help answer questions such as: which nodes belong to which type/class?

Niels

Date: Wed, 27 Jul 2011 23:23:45 +0100 From: sambodhi.s...@gmail.com To: user@lists.neo4j.org Subject: [Neo4j] how to scale and view or generate reports for complex graphs?

Hi Guys! I am a bit new to graph databases. I really liked the concept; a graph makes managing relationships between entities relatively easy. I therefore chose to use it in my new project. I started the development two weeks ago and my graph has already grown quite complex with static data. I am wondering how we would manage it when it goes to production with thousands of users. What really bothers me is:

a. How do I view such a complex graph? I use Neoclipse, but I am not sure it would be able to accommodate thousands of nodes and at the same time make it easy on the eyes to find a particular node.
b. Is there any kind of report generation tool?
c. How do I scale the graph? I read a few articles on it, but they left me more confused. It would be really helpful if you could provide a link to a relevant document.

Many Thanks! Sambodhi
Re: [Neo4j] Enhanced API and HyperRelationships
Integrated IndexedRelationships functionality into the Enhanced API, so relationships of a certain type are maintained in a Btree, while they can be manipulated through the API just like any other relationship. Still need to test this one. As mentioned earlier today, HyperRelationships and Enhanced API now have a set of tests which they pass.

Niels
Re: [Neo4j] Enhanced API and HyperRelationships
A first stab at implementing the Enhanced API and HyperRelationships is finished. It still needs thorough testing, so this is PRE-ALPHA quality. It also still lacks proper documentation (Javadoc).

The source code can be found at: https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/graphdb

A description can be found at: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API

Niels

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Tue, 26 Jul 2011 01:02:20 +0200 Subject: Re: [Neo4j] Enhanced API and HyperRelationships

The implementation of HyperRelationships needs another day of work, though the hard parts are finished now. Time to explain the inner workings of HyperRelationships.

HyperRelationships are a generalization of the binary relationships found in Neo4j. Instead of creating a relationship from one node to another node, we define a HyperRelationship as a set of Nodes, each having a RelationshipRole within the HyperRelationship. For the binary case the RelationshipRoles are StartNode and EndNode. For HyperRelationships with an arity higher than 2, the Roles need to be defined for each HyperRelationshipType.

In the binary case, a HyperRelationship is laid out in the database as a regular relationship. For a HyperRelationship with an arity higher than 2, a Node is created that assumes the role of the Relationship. From this Node, binary relationships (regular Neo4j relationships) are created for each Element of the relationship. The RelationshipTypes of these binary relationships are a concatenation of the name of the HyperRelationshipType used and the RelationshipRole of the attached Element.

Example: Suppose we want to store the fact that Flo and Eddie give Tom, Dick and Harry a Book. This is a ternary relationship, with the following RelationshipRoles:

Giver: Flo and Eddie
Recipient: Tom, Dick and Harry
Gift: Book

The GIVE relationship is first created with a Set of Roles (Giver, Recipient and Gift). When the example relation is created, the following binary relationships will be created:

HyperRelationshipNode --GIVE/#/Giver-- Flo
HyperRelationshipNode --GIVE/#/Giver-- Eddie
HyperRelationshipNode --GIVE/#/Recipient-- Tom
HyperRelationshipNode --GIVE/#/Recipient-- Dick
HyperRelationshipNode --GIVE/#/Recipient-- Harry
HyperRelationshipNode --GIVE/#/Gift-- Book

We can now retrieve all Relationships where Flo is the Giver in a GIVE relationship, simply by concatenating GIVE and Giver into GIVE/#/Giver, and then asking Flo for all incoming Relationships with that RelationshipType. This fetches the HyperRelationship nodes, from which the other attached Elements of the HyperRelationship can be loaded.

I added an extra interface, FunctionalRelationshipRole, which restricts the number of Elements attached to a RelationshipRole within a HyperRelationship to one. Using it amounts to something similar to having a getSingleRelationship method that cannot throw an Exception, because multiple entries with the same RelationshipType cannot be created by design.

Niels

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Mon, 25 Jul 2011 02:03:54 +0200 Subject: [Neo4j] Enhanced API and HyperRelationships

Today I wrote a piece about the Enhanced API and about HyperRelationships, which I have been working on over the last couple of days. See: https://github.com/peterneubauer/graph-collections/wiki/Enhanced-API

The API as presented in the graph-collections repo on GitHub is not feature complete yet with respect to HyperRelationships. The interfaces are there, but the implementation only works for binary relationships at present. Need one more day for the implementation. I posted the Wiki page and the source code to open the discussion about these new features.

Niels
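The GIVE example from the quoted explanation can be sketched in a few lines. This is a hedged, in-memory illustration of the layout idea only: `HyperGraphSketch`, `Edge`, `create`, `find` and `elements` are hypothetical names, not the Enhanced API. The only details taken from the message are the `/#/` separator and the rule that the binary relationship type is the HyperRelationshipType name concatenated with the role name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the layout; not the Enhanced API.
class HyperGraphSketch {
    // One regular binary relationship from a hyper-relationship node to an element.
    static final class Edge {
        final long hyperNode;
        final String type;
        final String element;

        Edge(long hyperNode, String type, String element) {
            this.hyperNode = hyperNode;
            this.type = type;
            this.element = element;
        }
    }

    static final String SEPARATOR = "/#/";
    private final List<Edge> edges = new ArrayList<>();
    private long nextId = 0;

    // GIVE + Giver -> GIVE/#/Giver
    static String typeName(String hyperType, String role) {
        return hyperType + SEPARATOR + role;
    }

    // Create the node standing in for the relationship and one edge per attached element.
    long create(String hyperType, Map<String, List<String>> elementsByRole) {
        long hyperNode = nextId++;
        for (Map.Entry<String, List<String>> entry : elementsByRole.entrySet()) {
            for (String element : entry.getValue()) {
                edges.add(new Edge(hyperNode, typeName(hyperType, entry.getKey()), element));
            }
        }
        return hyperNode;
    }

    // Hyper-relationship nodes in which `element` plays `role`.
    Set<Long> find(String element, String hyperType, String role) {
        String wanted = typeName(hyperType, role);
        Set<Long> result = new LinkedHashSet<>();
        for (Edge edge : edges) {
            if (edge.element.equals(element) && edge.type.equals(wanted)) {
                result.add(edge.hyperNode);
            }
        }
        return result;
    }

    // Elements attached to one hyper-relationship node under a given role.
    List<String> elements(long hyperNode, String hyperType, String role) {
        String wanted = typeName(hyperType, role);
        List<String> result = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.hyperNode == hyperNode && edge.type.equals(wanted)) {
                result.add(edge.element);
            }
        }
        return result;
    }
}
```

Looking up the gifts Flo gave then takes two steps: `find("Flo", "GIVE", "Giver")` returns the hyper-relationship nodes, and `elements(node, "GIVE", "Gift")` loads the attached Book. A real implementation would follow relationships stored on the nodes rather than scan a list; the scan just keeps the sketch short.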
Re: [Neo4j] Enhanced API and HyperRelationships
I just ported my own application (12 kloc of Scala code) to use the Enhanced API and got it working. Of course more thorough testing needs to be done, but it proves that at least in the case of my own application the Enhanced API can work as a drop-in replacement.

Niels
Re: [Neo4j] Enhanced API and HyperRelationships
Hi Emil,

My Scala app is in-house software that I am not sharing. I do intend to adapt it to use all or at least most of the features of the Enhanced API, and where I can spin off generic pieces I will move them to the Enhanced API (after porting them to Java first). The impact so far is minimal, since my app only uses non-enhanced methods, so it actually only calls a wrapper. It becomes more interesting once the enhanced features are being used.

Niels

From: e...@neotechnology.com Date: Tue, 26 Jul 2011 21:49:24 + To: user@lists.neo4j.org Subject: Re: [Neo4j] Enhanced API and HyperRelationships

Hi Niels -- Very interesting stuff you're doing. Any chance that Scala app of yours is open source? Would love to see the impact of using your enhanced API vs not using it. Cheers, -EE

From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Mon, 25 Jul 2011 02:03:54 +0200 Subject: [Neo4j] Enhanced API and HyperRelationships

Today I wrote a piece about the Enhanced API and about HyperRelationships, I have been working on over the last couple of days. See