Re: [Neo4j] neo4j maven config
Something's odd now: last time I exited Eclipse no projects had errors, now after starting it I've noticed that neo4j-cypher, neo4j-server and neo4j-shell have errors. In the latter there are only 2 errors:

  Query cannot be resolved to a type
  SyntaxException cannot be resolved to a type

These are supposed to be in the first project:

  import org.neo4j.cypher.SyntaxException;
  import org.neo4j.cypher.commands.Query;

I don't understand :) but something does get updated when I start Eclipse, like the Maven index or something.

  The import org.neo4j.cypher.commands cannot be resolved

Where is this import and why is it gone? Did something get updated meanwhile? wicked O_o

On Tue, Aug 2, 2011 at 7:49 AM, John cyuczieekc <cyuczie...@gmail.com> wrote:
> ok I finally fixed them all, no errors anymore
> the most important error was:
>   Project configuration is not up-to-date with pom.xml. Run project configuration update
> seems it's all good now, no need to reply
> Thanks,
> John
>
> On Tue, Aug 2, 2011 at 6:22 AM, John cyuczieekc <cyuczie...@gmail.com> wrote:
>> I should mention that I've already imported neo4j-community from github as a Maven project, and I have a lot of neo4j-* projects in my workspace, but most of them are red/errors and I am getting this error in pom.xml:
>>   GroupId is duplicate of parent groupId
>> I ran a maven test, seems to be successful, but e.g. neo4j-cypher has errors like:
>>   The import org.neo4j.cypher.SyntaxException cannot be resolved
>> and something like this in pom.xml:
>>   wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) (org.scala-tools:maven-scala-plugin:2.15.2:testCompile:test-compile:test-compile)
>>
>> On Tue, Aug 2, 2011 at 6:12 AM, John cyuczieekc <cyuczie...@gmail.com> wrote:
>>> Hi, in this wiki here: http://wiki.neo4j.org/content/Getting_Started_With_Java there is a separate snapshot repository: http://m2.neo4j.org/snapshots/
>>> Since I kind of want to use the latest neo4j, how do I use this repository when using m2e and Eclipse? I am new to Maven, maybe this is trivial to configure... is this done somewhere globally or in my own project's pom.xml?
>>> In my own pom.xml I've specified this:
>>>   <dependency>
>>>     <groupId>org.neo4j</groupId>
>>>     <artifactId>neo4j</artifactId>
>>>     <version>1.5-SNAPSHOT</version>
>>>   </dependency>
>>> but Maven cannot find it (on Maven Central):
>>>   [ERROR] Failed to execute goal on project neo4john: Could not resolve dependencies for project neo4john:neo4john:jar:0.0.1-SNAPSHOT: Could not find artifact org.neo4j:neo4j:jar:1.5-SNAPSHOT -> [Help 1]
>>> I need to tell Maven to use your own repository: http://m2.neo4j.org/snapshots/
>>> Thanks in advance,
>>> John
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j maven config
mvn test

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.neo4j:neo4j-cypher:jar:1.5-SNAPSHOT
[WARNING] 'parent.relativePath' points at org.neo4j.build:community-build instead of org.neo4j:parent-central, please verify your project structure @ line 3, column 11
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO]
[INFO] Building Neo4j - Cypher 1.5-SNAPSHOT
[INFO]

but that seems to be right:

  <parent>
    <groupId>org.neo4j</groupId>
    <artifactId>parent-central</artifactId>
    <version>22</version>
  </parent>

pretty much wherever I look it's parent-central, tho I didn't look everywhere/on each project
am I in the twilight-zone again?
Re: [Neo4j] neo4j maven config
I even installed the Scala plugin for Eclipse, and switched from the embedded Maven 3.0.3 to an external one.. neo4j-cypher still has errors
I failed and I'm giving up... everything good
byez
fml
Re: [Neo4j] bdb-index
Looking over these

On Sat, Jul 30, 2011 at 1:42 PM, John cyuczieekc <cyuczie...@gmail.com> wrote:
> those got concatenated for some reason, I'll repost them here so I can see them
>
> Relationship:
>   To associated Node: RelId -> NodeId
>   From associated Node: NodeId -> RelId
>
> RelationshipType:
>   To associated Node: RelationshipType.name -> NodeId
>   From associated Node: NodeId -> RelationshipType.name;

here, this may indeed be faster as you said ('cause it doesn't require a double lookup in both Relationship: and RelationshipType: above); but I'm wondering if it would be neater if RelationshipType: here were:

  Relid -> RelationshipType

expanding this notation as:

  To associated Relid: RelationshipType -> Relid
  From associated Relid: Relid -> RelationshipType

The logic would be: if you have the start node, you can get the Relid (via Relationship: above) and then you can get the RelationshipType; and if you have the RelationshipType you can get all Relids of that type, and by having those, you can look up the nodes that form them in Relationship: above.
But of course the way you said it is faster I guess. I don't suppose lookup by name would be slower by much compared to lookup by long/id, and XA transactions, or any transactions, would make sure Relationship: is consistent with RelationshipType: anyway.
I should probably be thinking about/doing other things, this seemed useless

> RelationshipRole:
>   To associated Node: RelationshipRole.name -> NodeId
>   From associated Node: NodeId -> RelationshipRole.name;
>
> PropertyType:
>   To associated Node: PropertyType.name -> NodeId
>   From associated Node: NodeId -> PropertyType.name;
>
> Property:
>   To associated Node: Node, PropertyType.name -> NodeId
>   From associated Node: NodeId -> Node, PropertyType.name
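As an illustration of the Relationship mapping discussed in this message (RelId -> NodeId one way, NodeId -> RelId the other), here is a minimal Java sketch. It is only an assumption-laden stand-in: the class `ReificationIndex` and its methods are hypothetical names, plain in-memory maps take the place of whatever index (bdb, Lucene) would really back it, and nothing here is taken from the Enhanced API or bdb-index code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical two-way lookup between a relationship id and the id of the
 * node that reifies it. In-memory maps stand in for a real index.
 */
final class ReificationIndex {
    // "To associated Node": RelId -> NodeId
    private final Map<Long, Long> relToNode = new HashMap<>();
    // "From associated Node": NodeId -> RelId
    private final Map<Long, Long> nodeToRel = new HashMap<>();

    /** Stores both directions together so the two maps cannot drift apart. */
    void associate(long relId, long nodeId) {
        if (relToNode.containsKey(relId) || nodeToRel.containsKey(nodeId)) {
            throw new IllegalStateException("relationship or node is already associated");
        }
        relToNode.put(relId, nodeId);
        nodeToRel.put(nodeId, relId);
    }

    /** Returns the associated node id, or null if the relationship is not reified. */
    Long nodeFor(long relId) {
        return relToNode.get(relId);
    }

    /** Returns the relationship id this node reifies, or null if there is none. */
    Long relationshipFor(long nodeId) {
        return nodeToRel.get(nodeId);
    }

    public static void main(String[] args) {
        ReificationIndex index = new ReificationIndex();
        index.associate(7L, 42L);
        System.out.println("node for rel 7: " + index.nodeFor(7L));
        System.out.println("rel for node 42: " + index.relationshipFor(42L));
    }
}
```

With a real index the two writes in `associate` would have to happen inside one transaction, which is the consistency point raised in the message.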
[Neo4j] Brainstorming on my project: neo4john
Hey guys,

I've been thinking that I would like to have a topic (like this current one) where I would be allowed to post anything related to brainstorming on my project, which is currently a mix of neo4j and berkeleydb java edition. That is, I would like to start from scratch and explain and explore ideas, where anyone could step in and say what's on their mind, especially with notes on how that would be better with neo4j rather than berkeleydb.

But I'd like to know if this is a good idea to do here, and if any neo4j people would allow me to do this here. This would probably mean you'd receive lots of emails with this subject and Re: this subject, which you may not want to receive, in which case I would suggest a filter to ignore such emails (easily done within gmail for example) - but be sure not to ignore the sender, which is always user@lists.neo4j.org for any topic/subject, not just this one. So, anyone could potentially ignore the emails that I send here, should they be annoyed or should there be too many too soon.

Still, I would not do this unless most (if not all) of you (mainly neo4j devs I'd say) agree to allow me to post here. I would post replies only to this topic... well you get the idea :) Though you should reserve the right at any time to say stop if you don't want me to post anymore (due to e.g. too frequent posts, too dumb content, content that seems like noise and doesn't help anyone) - that is, in the case you allow me to post :) - so if allowed, please reply and say so; otherwise, if there are no replies saying allowed or not, it will default to `not allowed`, so I won't try to post anymore :) - be kind lol

If I know Peter, and I don't lol, he'd be happy with some brainstorming I think, right? :) Then, what about the others? Btw, if you feel like saying that I'm allowed would be too much of a responsibility, or taking it from others, then maybe say that you wouldn't mind if I posted or not, or that it would make no difference to you. Though the neo4j guys/girls (i.e. devs) would probably know if `me posting on this topic would be a good idea, for them and the users using this mailing list`.

If you're wondering why I would do this: most importantly because it helps me to type my thoughts rather than just think them in my head. If I don't type them I get easily distracted by other things and they end up being postponed/abandoned. Expressing my thoughts by typing them seems to be bridging both the physical and the mental in a way that they're both happy to do this heh. And also this might be helpful to others reading this, unless they get annoyed by my way of writing (which means both me and him are at fault, or rather the cause of his annoyance) or they get annoyed for other reasons but still triggered by them reading what I write.

I am not good at writing, or at programming for that matter, and I'm aware of this, but I believe that expressing myself in this written form might help (at least) me (and I hope not at the expense of others, e.g. like spam) and will likely, along the way, trigger some progress in me, which if you ask me is in everyone's interest: the more people evolve the better it is for everyone, no? yes, good :)

No one is required (or expected by me) to read what I write, btw; but you should know that my subconscious, for some reason, likes knowing that someone did read and got beneficial results from it, i.e. got something positive rather than negative (though any change is progress, except e.g. if you make a system on top of that saying that `counter` must increase for it to be considered progressing; then, while any change is progress at the lower level even if counter decreases, at this higher level counter decreasing is not considered progress anymore; but then again, at an even higher level, over time counter could be increasing by 10 then decreasing by 10 such that it would seem to be oscillating, and this would be considered no progress, rather it would be considered constant, unless the oscillation amplitude would change or increase, e.g. counter would increase by 15 and then decrease by 15 over time, and this would be progress as considered at this level).

So while I am sort of waiting for a good enough reason to hack my own subconscious and change it (assuming that it's possible, hey neuroplasticity would say so heh) such that I wouldn't require expressing my thoughts in writing or feeling empowered knowing that others are reading that, (while that) I am going along with what seems to be the next feel-empowered step... (kinda forgot what I wanted to say here xD)

Also the subconscious (not just mine I'd say) likes to know that it did something, sort of like having a foundation for allowing itself to feel empowered: having something done in the physical world that it is proud of can be used by it as a permission slip to allow itself to feel happy about it, or rather empowered; so in this respect this me-writing-my-thoughts-here stuff also helps with that. :)

There's also some inherent desire to
Re: [Neo4j] Brainstorming on my project: neo4john
Hey Niels, thanks for the concise reply.

On Sun, Jul 31, 2011 at 5:10 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
> Hi John,
> I think when approaching a project there are two distinct issues at play, one is the tooling level, another is the actual solution you are trying to create for an actual problem.

I seem to want a generic solution for multiple problems. Something generic enough that it can be applied specifically. The tooling (if I understand this right) should be able to be used by the user to mould to his own needs, sort of like java and/or eclipse can be used to build whatever program the user wants to code. I want this to be a foundation, such that supposedly I could spend 99% of the time inside this doing my work, rather than using the OS and its applications...

> When looking at the tooling level it is great to have as much covered as possible. Neo4j offers a graph database and pretty good integration with Lucene. This overall is a good choice of tools, because there is hardly any overlapping functionality. Neo4j offers storage and navigation, while Lucene provides indexing. So the tools are pretty much orthogonal to each other.
> When adding BDB to the mix, things become a bit messier. BDB offers indexing and storage, so now you have to decide what to use BDB for. If you choose to only use it for indexing, like an alternative to Lucene, things remain pretty much orthogonal.

as I understand it, that's exactly what bdb-index is supposed to do for you (replace, or be similar in interface/usage to, the Lucene index) for your graph-collections

> When you decide to use BDB for storage, the question becomes: what to store in Neo4j and what to store in BDB.

the way it is right now, I could implement what I've got in either bdb or neo4j; I don't need both; btw, the lucene index seems as fast as bdb, possibly 1ms slower, from what I've tested (granted that it was superficially tested)

> When it comes to storing and retrieving properties of entities both seem to be pretty fast, and unless you have serious performance issues with the storage of properties, either Neo4j or BDB is suitable for the task.
> When it comes to storing relationships between entities, Neo4j is by far the better solution. Fetching a relationship is a really cheap action, since it only involves moving a file pointer to a certain position (id * record length) and reading the record (i.e. if that data is not available in the cache already).

as I've seen it though, I need to use an index (e.g. lucene) so that I can check with neo4j if A->B exists where A has 1 million outgoing relationships to 1 million different nodes, of which B is one; otherwise it's over 700ms to a few seconds by using findSinglePath (of course I might've missed something), whereas with an index it's ~1ms

> When having a relationship it is also cheap to fetch the associated nodes (again moving a file pointer to a position, or reading it from the cache). And while we are at it, when having a node or a relationship, it is again cheap to fetch the properties associated with that node.
> The motto of Neo4j seems to be, keep it local stupid. This works great, unless things are not local, and this is where indexing comes into play. Suppose we know a name or a certain value and want to know what nodes or relationships it is associated with; doing a local search becomes ineffective. We could iterate over all nodes (and/or all relationships) and check for that particular value, but that doesn't scale beyond a couple of thousand nodes or relationships.

that is one use case that I need, but this search is done by bdb in ~0ms instead of me doing any iterations via java code, though in my case the value is either the string name of a Node or just another node id

> One option could be to do the indexing in the graph. We could create a node that can easily be addressed through the reference node, that functions as a tree root, and traverse over the index to find a particular node or relationship.

did you do this btw, with SortedTree or similar, within graph-collections? I admit I only superficially skimmed it at some point and noticed some acquireLock() method that attracted my attention - unrelated

> It works, but is not as fast as dedicated indexing. A dedicated index will fetch index blocks in one read operation and manipulate those index blocks in memory, where an index built in Neo4j would model an index block as a set of nodes that need to be read one after another (and likely from very different places in the store). So a dedicated index is more local than Neo4j can be when manipulating the index trees.

that lucene index, i.e. RelationshipIndex, works rather well, 0 to 1ms results with it, similar to bdb, so using it would be a must for me, assuming I have millions of relationships :)

> A dedicated index will win hands down from Neo4j when it comes to raw speed of an index lookup/manipulation and likely consume less memory doing so.
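Niels's remark above, that fetching a relationship only means moving a file pointer to id * record length and reading the record, can be sketched in Java as follows. This is a sketch under stated assumptions, not Neo4j's actual store format: the class `FixedRecordStore`, the 20-byte record layout and the in-memory `ByteBuffer` standing in for the store file are all hypothetical.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-length record store. A ByteBuffer stands in for the
 * store file; the record layout below is invented for illustration only.
 */
final class FixedRecordStore {
    // Assumed layout: startNodeId (8 bytes) + endNodeId (8 bytes) + typeId (4 bytes)
    static final int RECORD_LENGTH = 8 + 8 + 4;

    private final ByteBuffer store;

    FixedRecordStore(int capacity) {
        this.store = ByteBuffer.allocate(capacity * RECORD_LENGTH);
    }

    /** The record position is pure arithmetic on the id, no search involved. */
    static long offsetOf(long id) {
        return id * RECORD_LENGTH;
    }

    void write(int id, long startNode, long endNode, int typeId) {
        int pos = (int) offsetOf(id);
        store.putLong(pos, startNode);
        store.putLong(pos + 8, endNode);
        store.putInt(pos + 16, typeId);
    }

    /** Returns {startNodeId, endNodeId, typeId} for the given record id. */
    long[] read(int id) {
        int pos = (int) offsetOf(id);
        return new long[] {
            store.getLong(pos), store.getLong(pos + 8), store.getInt(pos + 16)
        };
    }

    public static void main(String[] args) {
        FixedRecordStore records = new FixedRecordStore(10);
        records.write(3, 100L, 200L, 7);
        long[] rel = records.read(3);
        System.out.println("offset=" + offsetOf(3) + " start=" + rel[0] + " end=" + rel[1]);
    }
}
```

The lookup cost is independent of how many records exist, which is why it stays cheap; it also shows why checking whether a particular A->B exists among a million outgoing relationships is a different problem that needs an index, as John notes.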
Re: [Neo4j] bdb-index
> When running the mvn install, both tests are run one after another.

Since I didn't use mvn (xD) I ran the tests manually one by one, but what you say makes sense, it's likely the tests fail when run one after the other. I'll see what happens with an @Suite since there are only 2 junit tests.
with @Suite they work

Let's see if I can run mvn install (btw, I avoided mvn so far because I cannot install the git plugin for some reason, and because of that other error I get).
Looks like I still need to find out how to fix this error:

  [ERROR] The project org.neo4j:neo4j-berkeleydb-je-index:0.1-SNAPSHOT (E:\wrkspc\bdb-index-fork\pom.xml) has 1 error
  [ERROR] Non-resolvable parent POM: The repository system is offline but the artifact org.neo4j:parent-central:pom:18 is not available in the local repository. and 'parent.relativePath' points at wrong local POM @ line 3, column 11 -> [Help 2]

before I can do anything with maven...
I'll skip trying to make maven work for me for now, don't feel like it :)
*I'm not qualified to fix this with maven, sorry*
John

On Fri, Jul 29, 2011 at 5:16 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
> Hi John,
> Thanks for looking into this. I am still seeing the same error I had before. When running the mvn install, both tests are run one after another. For some reason the transaction log sees an unclean shutdown and tries to commit pending transactions. During that process the index names of the bdb indexes are being retrieved from binary storage. Here something goes wrong, because the index name returned is garbage, so the recovery process fails because it can't find the right index files.
> Niels
>
> Date: Fri, 29 Jul 2011 07:48:43 +0200
> From: cyuczie...@gmail.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] bdb-index
>
>> I forked and fixed, the tests are all working now:
>> https://github.com/13th-floor/bdb-index
>> Let me know if you want me to do a pull request, ... sadly I applied formatting on RawBDBSpeed and the diff doesn't look pretty if you're trying to see what changed
>> John.
>>
>> On Thu, Jul 28, 2011 at 7:36 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
>>> Trying to find something useful to hide the implementation book keeping of Enhanced API, I tried out bdb-index as can be found here: https://github.com/peterneubauer/bdb-index
>>> It looks interesting, but fails its tests. When recovering it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is actually garbage.
>>> I would like to help make this component workable, but this area of the database is a bit beyond the scope that I know.
>>> I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue?
>>> Niels
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] bdb-index
those got concatenated for some reason, I'll repost them here so I can see them

Relationship:
  To associated Node: RelId -> NodeId
  From associated Node: NodeId -> RelId

RelationshipType:
  To associated Node: RelationshipType.name -> NodeId
  From associated Node: NodeId -> RelationshipType.name;

RelationshipRole:
  To associated Node: RelationshipRole.name -> NodeId
  From associated Node: NodeId -> RelationshipRole.name;

PropertyType:
  To associated Node: PropertyType.name -> NodeId
  From associated Node: NodeId -> PropertyType.name;

Property:
  To associated Node: Node, PropertyType.name -> NodeId
  From associated Node: NodeId -> Node, PropertyType.name

On Fri, Jul 29, 2011 at 5:27 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
> What I need to store in an index depends on the type of element that needs to be reified.
> Relationship: To associated Node: RelId -> NodeId; From associated Node: NodeId -> RelId
> RelationshipType: To associated Node: RelationshipType.name -> NodeId; From associated Node: NodeId -> RelationshipType.name;
> RelationshipRole: To associated Node: RelationshipRole.name -> NodeId; From associated Node: NodeId -> RelationshipRole.name;
> PropertyType: To associated Node: PropertyType.name -> NodeId; From associated Node: NodeId -> PropertyType.name;
> Property: To associated Node: Node, PropertyType.name -> NodeId; From associated Node: NodeId -> Node, PropertyType.name
> Niels
>
> Date: Fri, 29 Jul 2011 06:49:31 +0200
> From: cyuczie...@gmail.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] bdb-index
>
>> Hi xD
>> I'm not clear what you need to store here. If I understand correctly, you could store in 2 primary bdb databases the nodeID (i.e. long) of each node in a relationship, i.e. key->value:
>>   dbForward:  A->B  A->C  X->D  X->B
>>   dbBackward: B->A  B->X  C->A  D->X
>> A, B, C, D, X are all nodeIDs, i.e. longs.
>> This way you could check if A->B exists, or get all of A's endNodes, or find what startNodes are pointing to the endNode B.
>> The storing of these would be sorted and in a BTree, so lookup would be fast; you can consider e.g. A as being a set of B and C, and X as being a set of B and D (that is, you cannot set the order as in a list, they are sorted by bdb for fast retrievals). (But upon these sets you can build lists np - that is, using only bdb; tho you won't need that using neo4j)
>> So, if this is the kind of index you wanted... (I am not aware of specific indexes with bdb, though that doesn't mean they don't exist)
>> Insertions would require transaction protection so that both A->B in dbForward and B->A in dbBackward are inserted atomically.
>> Parsing A then X of B-> in dbBackward, for example, can only be done with a cursor...
>> Either way, I'm taking a look at that bdb-index thingy; will report back if I have any ideas heh
>> John.
>>
>> On Thu, Jul 28, 2011 at 9:42 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
>>> Thank you, Peter,
>>> There is no rush here. It would be nice to investigate this option, but it can wait until Mattias has returned and sifted through urgent matters.
>>> The question is even whether it would be a good idea to use an index to do the book keeping for Enhanced API.
>>> As it is now, the reification of e.g. a Relationship requires one property to be set on the relationship, containing the node ID of the associated node. On the associated node is a property containing the ID of the relationship, so there is a bidirectional look-up. Introducing an index would remove the need to have these additional properties, but would lead to slower look-up times (no matter how fast the index).
>>> So it's a trade-off between speed and cleanliness of namespace. Using the Enhanced API disallows certain property names to be used in user applications.
>>> The property names used in Enhanced API all start with org.neo4j.collections.graphdb., so there is little chance a user application would want to use those property names, but it is a restriction not found in the standard API, so ultimately something to consider.
>>> Niels
>>>
>>> From: peter.neuba...@neotechnology.com
>>> Date: Thu, 28 Jul 2011 10:39:47 -0700
>>> To: user@lists.neo4j.org
>>> Subject: Re: [Neo4j] bdb-index
>>>
>>>> niels,
>>>> in this spike, I just concentrated on getting _something_ working in order to test insertion speed. This is not up to real indexing standards, so some love is needed here. I think Mattias is the best person to ask about pointers, let's wait until he is back next week if that is ok? Maybe some other (like the standard Lucene) index can suffice for the time being to test out things?
>>>> Cheers,
>>>> /peter neubauer
>>>> GTalk: neubauer.peter
>>>> Skype peter.neubauer
>>>> Phone +46 704 106975
>>>> LinkedIn http://www.linkedin.com/in/neubauer
>>>> Twitter http://twitter.com/peterneubauer
>>>> http://www.neo4j.org - Your high performance graph database.
>>>> http://startupbootcamp.org/ - Öresund -
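The dbForward/dbBackward layout John describes in this thread can be sketched in Java as below. This is a minimal in-memory stand-in and not Berkeley DB code: the class `ForwardBackwardIndex` and its methods are hypothetical, `TreeMap`/`TreeSet` only imitate bdb's sorted B-tree storage, and `synchronized` is a crude substitute for the transaction that would keep both databases consistent.

```java
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of two key-value databases:
 * dbForward (startNode -> endNodes) and dbBackward (endNode -> startNodes).
 */
final class ForwardBackwardIndex {
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>();

    /** Inserts start->end into both maps; a real store would do this in one transaction. */
    synchronized void add(long start, long end) {
        forward.computeIfAbsent(start, k -> new TreeSet<>()).add(end);
        backward.computeIfAbsent(end, k -> new TreeSet<>()).add(start);
    }

    /** Checks whether start->end exists without iterating over all of start's relationships. */
    synchronized boolean exists(long start, long end) {
        TreeSet<Long> ends = forward.get(start);
        return ends != null && ends.contains(end);
    }

    /** All end nodes of the given start node, in sorted order. */
    synchronized SortedSet<Long> endNodesOf(long start) {
        return copyOf(forward.get(start));
    }

    /** All start nodes pointing at the given end node, in sorted order. */
    synchronized SortedSet<Long> startNodesOf(long end) {
        return copyOf(backward.get(end));
    }

    private static SortedSet<Long> copyOf(TreeSet<Long> set) {
        return set == null ? new TreeSet<Long>() : new TreeSet<Long>(set);
    }

    public static void main(String[] args) {
        long a = 1, b = 2, c = 3, d = 4, x = 5; // node ids standing in for A, B, C, D, X
        ForwardBackwardIndex index = new ForwardBackwardIndex();
        index.add(a, b);
        index.add(a, c);
        index.add(x, d);
        index.add(x, b);
        System.out.println("A->B exists: " + index.exists(a, b));
        System.out.println("end nodes of A: " + index.endNodesOf(a));
        System.out.println("start nodes of B: " + index.startNodesOf(b));
    }
}
```

As in the description, each node behaves like a sorted set of its neighbours: membership checks are fast, but insertion order is not preserved.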
Re: [Neo4j] bdb-index
(ignore these, skip to the bold part: ie. search BOLD)

Thanks Niels, I just tried what you said. With maven 3.0.3 it seemed to do some downloading work for a while, then eventually I got this:

[ERROR] Failed to execute goal on project neo4j-berkeleydb-je-index: Could not resolve dependencies for project org.neo4j:neo4j-berkeleydb-je-index:jar:0.1-SNAPSHOT: Could not find artifact org.neo4j:neo4j-kernel:jar:1.3-SNAPSHOT in oracleReleases (http://download.oracle.com/maven) - [Help 1]

I'll try to view/edit that pom, maybe I need 1.4.

That seems to have worked, but now I get this:

[INFO] Checking licenses...
[INFO] Missing header in: e:\down\13th-floor-bdb-index-fb11e38\13th-floor-bdb-index-fb11e38\src\test\java\org\neo4j\index\bdbje\RawBDBSpeed.java
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 1:10.949s
[INFO] Finished at: Sat Jul 30 19:11:16 CEST 2011
[INFO] Final Memory: 10M/156M
[INFO]
[ERROR] Failed to execute goal com.mycila.maven-license-plugin:maven-license-plugin:1.9.0:check (check-licenses) on project neo4j-berkeleydb-je-index: Some files do not have the expected license header - [Help 1]

How were you able to run mvn install? Did you have a different config, ie. auto ignore licenses? What about the 1.3 to 1.4 transformation, did you have to do it manually?

So far, using maven is more of a pain than simply using eclipse and adding dependencies manually, heh.

Maybe I should try maven 2, let's see...

mvn -version
Apache Maven 2.2.1 (r801777; 2009-08-06 21:16:01+0200)
Java version: 1.6.0_26
Java home: C:\Program Files\Java\jdk1.6.0_26\jre
Default locale: en_US, platform encoding: Cp1252
OS name: windows 7 version: 6.1 arch: amd64 Family: windows

With the original pom with the 1.3 neo4j requirement I still got the error; changed to 1.4, then it works, but I get the licenses issue again:

[INFO] [enforcer:enforce {execution: enforce-maven}]
[INFO] [license:check {execution: check-licenses}]
[INFO] Checking licenses...
[INFO] Missing header in: e:\down\13th-floor-bdb-index-fb11e38\13th-floor-bdb-index-fb11e38\src\test\java\org\neo4j\index\bdbje\RawBDBSpeed.java
[INFO]
[ERROR] BUILD ERROR
[INFO]
[INFO] Some files do not have the expected license header
[INFO]
[INFO] For more information, run Maven with the -e switch
[INFO]
[INFO] Total time: 23 seconds
[INFO] Finished at: Sat Jul 30 19:19:23 CEST 2011
[INFO] Final Memory: 34M/350M
[INFO]

ok then, trying to fix the license for that file...

also I've seen that it now works from eclipse when on pom.xml Run As-Maven install:

[INFO] Checking licenses...
[INFO] Missing header in: E:\wrkspc\bdb-index-fork\src\test\java\AllTests.java
[INFO] Missing header in: E:\wrkspc\bdb-index-fork\src\test\java\org\neo4j\index\bdbje\RawBDBSpeed.java
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 1.578s
[INFO] Finished at: Sat Jul 30 19:25:02 CEST 2011
[INFO] Final Memory: 16M/154M
[INFO]
[ERROR] Failed to execute goal com.mycila.maven-license-plugin:maven-license-plugin:1.9.0:check (check-licenses) on project neo4j-berkeleydb-je-index: Some files do not have the expected license header - [Help 1]

Ok, after fixing the licenses for those 2 files, running from eclipse yields:

[INFO] Scanning for projects...
[INFO]
[INFO]
[INFO] Building neo4j-berkeleydb-je-index 0.1-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-enforcer-plugin:1.0-beta-1:enforce (enforce-maven) @ neo4j-berkeleydb-je-index ---
[INFO]
[INFO] --- maven-license-plugin:1.9.0:check (check-licenses) @ neo4j-berkeleydb-je-index ---
[INFO] Checking licenses...
[INFO]
[INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ neo4j-berkeleydb-je-index ---
[WARNING] The POM for org.apache.maven:maven-plugin-api:jar:2.0.6 is missing, no dependency information available
[WARNING] The POM for org.apache.maven:maven-project:jar:2.0.6 is missing, no dependency information available
[WARNING] The POM for org.apache.maven:maven-core:jar:2.0.6 is missing, no dependency information available
[WARNING] The POM for org.apache.maven:maven-artifact:jar:2.0.6 is missing, no dependency information
Re: [Neo4j] bdb-index
From my experience, this kind of behaviour happens mostly due to static fields that are expected to be in an initialized state for each test or test class.

I also need to mention that I get this error:

Jul 30, 2011 8:18:54 PM org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog doInternalRecovery
INFO: Non clean shutdown detected on log [E:\wrkspc\bdb-index-fork\target\var\batch/logical.log.1]. Recovery started ...

when running TestBerkeleyBatchInsert.java (all 3 tests in it), but not when running only the test in which it appears, namely the method testFindCreatedIndex(). Some state is carried over from the previous tests, even if this is just the database not being deleted.

I'll check some more, ofc.
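One way to rule out a leftover database directory as the carried-over state is to wipe the test directory before each test and fail loudly when something cannot be removed. The following is a minimal sketch, not the project's own helper: TestDirCleaner and deleteRecursively are hypothetical names, and only plain java.io.File calls are used.

```java
import java.io.File;

// Hypothetical test helper; not part of bdb-index or neo4j.
final class TestDirCleaner {
    /**
     * Deletes the given file or directory tree. Throws on the first path
     * that cannot be removed, which usually means a handle is still open.
     */
    static void deleteRecursively(File path) {
        if (!path.exists()) {
            return; // nothing left over from an earlier test
        }
        File[] children = path.listFiles(); // null when path is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!path.delete()) {
            throw new IllegalStateException("Still in use or not deletable: " + path);
        }
    }
}
```

Called from a per-test setup method, the exception message names the exact file that is still held open, instead of the failure surfacing later as a recovery error in another test.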
Re: [Neo4j] bdb-index
Ok, up until now I've had almost no idea what is happening or what those tests are doing, so I was blindly trying to fix things. It sort of worked until now; looks like I have to begin to understand what is going on, so I will delve deeper into this, understand what is going on exactly, and then see what isn't happening right.

Btw, Niels, if you want me to do anything, ie. delegate to me: I have all the time in the world, ie. all I need is sleep+food, the rest of the time I could be coding. No payments or anything needed; I'm doing it for free in the hope this will advance me and the code I'm part of.

For now I will try to fix this by first understanding it... let me know if there's anything else.

Peace off :)
John
Re: [Neo4j] bdb-index
)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 4.1.10) Problem creating output files in: E:\wrkspc\bdb-index-fork\target\var\neo4j-db\index\bdb\Node\
Re: [Neo4j] bdb-index
in TestBerkeley.java

So far I've found that the bdb environment (and relevant databases) are only closed when index.delete() is called, and that can only be called when the current transaction is finished (else it will complain that some bdb databases are not opened on txn commit).

After applying all those changes, the following file is still in use (it cannot be deleted):
E:\wrkspc\bdb-index-fork\target\var\neo4j-db\logical.log.1

This seems to be part of neo4j, though I am not sure why it would still be in use even after graphDb.shutdown(). Any ideas why that would still be in use? Is graphDb.shutdown() blocking until everything is closed? Or are there still threads left keeping files locked? Or is shutdown delegated to other threads which may still be doing their work when .shutdown() returns?

By looking at some testcases in neo4j, I see that *index.delete() can be called before the transaction is finished; is this correct*? anyone? ie.

beginTx();
index = graphDb.index().forNodes( INDEX_NAME );
index.delete();
restartTx();

where

void restartTx() { finishTx( true ); beginTx(); }

In this case, if it's true that index.delete() should not cause the txn commit to fail, then this needs to be fixed in bdb-index.

Also, *is neo4j closing the indexes* somehow on graphDb.shutdown()? It seems to me the only close would be index.delete(), and neo4j isn't closing them, thus leaving the bdb Environment still open; tests that require a shutdown and reopen of graphdb will therefore fail, since bdb wasn't itself shut down and reopened but was left open. Maybe closing the indexes is left to the user then? It's fine with me, just so long as I know.

disorganized John :)

On Sat, Jul 30, 2011 at 9:06 PM, John cyuczieekc cyuczie...@gmail.com wrote:

looks like before delving too deep, I found that attempting to delete the dbPath, ie.
deleteFileOrDirectory( dbPath ); fails right after graphDB.shutdown(); - I'm excluding the possibility that that method is deferring the shutdown to another thread and thus is non-blocking (judging by my timing of it from previous tests, it looks like it takes at most 3 sec). Ie. this file cannot be deleted (likely it's already in use):
E:\wrkspc\bdb-index-fork\target\var\neo4j-db\index\bdb\Node\fast\name\je.info.0

I need to check if and how bdb gets shut down also; it kind of looks like it doesn't.

So since that file doesn't get deleted, but probably others do, maybe that is why we get those weird errors:

Jul 30, 2011 9:04:11 PM org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog doInternalRecovery
INFO: Non clean shutdown detected on log [E:\wrkspc\bdb-index-fork\target\var\neo4j-db/logical.log.1]. Recovery started ...
java.lang.RuntimeException: com.sleepycat.je.EnvironmentFailureException: (JE 4.1.10) Problem creating output files in: E:\wrkspc\bdb-index-fork\target\var\neo4j-db\index\bdb\Node\ \name/ je.info UNEXPECTED_EXCEPTION: Unexpected internal Exception, may have side effects.
    at org.neo4j.index.bdbje.BerkeleyDbDataSource.createDB(BerkeleyDbDataSource.java:377)
    at org.neo4j.index.bdbje.BerkeleyDbDataSource.getDatabase(BerkeleyDbDataSource.java:278)
    at org.neo4j.index.bdbje.BerkeleydbTransaction.doCommit(BerkeleyDbTransaction.java:191)
    at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:319)
    at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.injectOnePhaseCommit(XaResourceManager.java:366)
    at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.applyOnePhaseCommitEntry(XaLogicalLog.java:514)
    at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.applyEntry(XaLogicalLog.java:445)
    at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.doInternalRecovery(XaLogicalLog.java:768)
    at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.open(XaLogicalLog.java:253)
    at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.open(XaLogicalLog.java:134)
    at org.neo4j.kernel.impl.transaction.xaframework.XaContainer.openLogicalLog(XaContainer.java:97)
    at org.neo4j.index.bdbje.BerkeleyDbDataSource.init(BerkeleyDbDataSource.java:96)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.neo4j.kernel.impl.transaction.XaDataSourceManager.create(XaDataSourceManager.java:76)
    at org.neo4j.kernel.impl.transaction.TxModule.registerDataSource(TxModule.java:175)
    at org.neo4j.index.bdbje.BerkeleyDbIndexImplementation.init(BerkeleyDbIndexImplementation.java:67)
    at org.neo4j.index.bdbje.BerkeleyDbIndexImplementation.init(BerkeleyDbIndexImplementation.java:58
Re: [Neo4j] bdb-index
Found out that I don't need to call index.delete() all the time; instead, BerkeleyDbDataSource.close() aka XaDataSource.close() should do what index.delete() does, namely close all databases (related to this datasource) and their bdb environment, so I do just that. That answers some of what I asked before.

And that logical.log.1 seems to be a part of XA Transactions, and I must find a way to make sure that it's closed or something.
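The idea of letting the data source's close() release every database and then the environment can be sketched as below. This is a hedged illustration only, not the real BerkeleyDbDataSource: ClosingDataSource and its nested Resource interface are hypothetical names, and the databases-before-environment order is an assumption based on how JE handles are normally expected to be closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data source that owns bdb handles.
final class ClosingDataSource {
    // Hypothetical minimal handle type; real code would wrap Database/Environment.
    interface Resource {
        void close();
    }

    private final List<Resource> databases = new ArrayList<Resource>();
    private final Resource environment;
    private boolean closed;

    ClosingDataSource(Resource environment) {
        this.environment = environment;
    }

    // Register each database handle at the moment it is opened.
    synchronized void track(Resource database) {
        databases.add(database);
    }

    // Closes databases in reverse open order, then the environment.
    // A second call does nothing, so shutdown paths may overlap safely.
    synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        for (int i = databases.size() - 1; i >= 0; i--) {
            databases.get(i).close();
        }
        databases.clear();
        environment.close();
    }
}
```

Because nothing depends on index.delete() being called, a test that shuts the graph database down and reopens it gets a fresh environment each time.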
Re: [Neo4j] bdb-index
I didn't reach that part; I'm sure you're right though. Meanwhile, there was a need to add xaContainer.close(); to org.neo4j.index.bdbje.BerkeleyDbDataSource.close() so that the logical.log.1 file isn't kept open anymore.

Now there's a messages.log still open, working on that xD

On Sat, Jul 30, 2011 at 11:09 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

The problem is indeed related to not properly closing the bdb database, and that triggers another problem. In BerkeleyDbCommand, data is stored into the transaction log and read back from the transaction log later on. Something goes wrong, making the indexName retrieved from the transaction log look like garbage.

I think I have located the problem. In the method BerkeleyDbCommand#writeToFile, the sequence of elements written to the buffer is different from the order in which the method BerkeleyDbCommand#readCommand reads those elements. The BerkeleyDbCommand#writeToFile method cannot be correct, because it first writes the indexName and then its length. It should of course first write the length and then the indexName.

Niels
Re: [Neo4j] bdb-index
I did a quick check of what you said org.neo4j.index.bdbje.BerkeleyDbCommand.writeToFile(LogBuffer) char[] indexName = indexId.indexName.toCharArray(); buffer.putInt( indexName.length ); buffer.put( indexName ); I'm probably missing something but on my side it looks like it writes length then indexName (and I didn't update from github, just in case you've already fixed this) Either way, my impression of what was happening is that some files got deleted, except some ie. the log, which were still open/in use, and maybe when recovery was tried, either it couldn't be opened, or due to being opened contained impartial data, or all was well but recovery couldn't happen because the log needed some other files or a previous database snapshot upon which to apply the recovered transactions I only get that messages.log being unable to delete when I allow the test testFindCreatedIndex() to run, I cannot yet figure out who creates that file and to make sure it's being closed John. On Sat, Jul 30, 2011 at 11:09 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: The problem is indeed related to not properly closing the bdb database, and that is triggers another problem. In BerkeleyDbCommand data is being stored into the transaction log and been read from the transaction log later on. Something goes wrong making the indexName being retrieved from the transaction log look like garbage. I think I have located the problem. In the method BerkeleyDbCommand#writeToFile the sequence of elements written to the buffer is different from the order in which the method BerkeleyDbCommand#readCommand reads those elements. The BerkeleyDbCommand#writeToFile method cannot be correct, because it first writes the indexName and then its length. It should of course first write the length and then the indexName. 
Niels

Date: Sat, 30 Jul 2011 22:51:40 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] bdb-index

Found out that I don't need to call index.delete() all the time; instead BerkeleyDbDataSource.close() aka XaDataSource.close() should do what index.delete() does, namely closing all databases (related to this datasource) and their bdb environment, so I do just that. That answers some of what I asked before. And that logical.log.1 seems to be part of XA Transactions, and I must find a way to see that it's closed or something.

On Sat, Jul 30, 2011 at 10:15 PM, John cyuczieekc cyuczie...@gmail.com wrote:

In TestBerkeley.java: so far I've found that the bdb environment (and relevant databases) is only closed when index.delete() is called, and that can only be called when the current transaction is finished (else it will complain on txn commit that some bdb databases are not open).

With all those changes applied, the following file is still in use (it cannot be deleted):

E:\wrkspc\bdb-index-fork\target\var\neo4j-db\logical.log.1

This seems to be part of neo4j, though I am not sure why it would still be in use even after graphDb.shutdown(). Any ideas why that would be still in use? Is graphDb.shutdown() blocking until everything is closed? Or are there still threads left keeping files locked? Or is shutdown delegated to other threads which may still be doing their work when .shutdown() returns?

By looking at some testcases in neo4j, I see that *index.delete() can be called before the transaction is finished, is this correct*? Anyone? i.e.

beginTx();
index = graphDb.index().forNodes( INDEX_NAME );
index.delete();
restartTx();

where

void restartTx() { finishTx( true ); beginTx(); }

In this case, if it's true that index.delete() should not cause the txn commit to fail, then this needs to be fixed in bdb-index.

Also, *is neo4j closing the indexes* somehow on graphDb.shutdown()? It seems to me the only close would be index.delete() and neo4j isn't closing them, thus leaving the bdb Environment still open, so tests that require a shutdown and reopen of graphdb will fail, since bdb wasn't itself shut down and reopened but was left open. Maybe closing the indexes is left to the user then? It's fine with me, just so long as I know.

disorganized John :)
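The write/read ordering question discussed above can be illustrated with a small sketch. It is not the bdb-index code: it uses java.nio.ByteBuffer as a stand-in for Neo4j's LogBuffer, and the class name IndexNameCodec is made up for this example. It only shows that a reader recovers the name as long as it consumes the length first and then exactly that many characters, in the same order the writer produced them.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of bdb-index): ByteBuffer stands in for LogBuffer.
final class IndexNameCodec {
    private IndexNameCodec() {}

    /** Writes a 4-byte length followed by the characters of the name. */
    static void write(ByteBuffer buffer, String indexName) {
        char[] chars = indexName.toCharArray();
        buffer.putInt(chars.length);
        for (char c : chars) {
            buffer.putChar(c);
        }
    }

    /** Reads the fields back in the same order write() produced them. */
    static String read(ByteBuffer buffer) {
        int length = buffer.getInt();
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = buffer.getChar();
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        write(buffer, "fast");
        buffer.flip(); // switch the buffer from writing to reading
        System.out.println(read(buffer));
    }
}
```

If the reader took the characters before the length, or if the file were only partially written, the decoded name would come out as garbage, which is consistent with either explanation raised in the thread.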
Re: [Neo4j] bdb-index
On Sat, Jul 30, 2011 at 11:23 PM, John cyuczieekc cyuczie...@gmail.com wrote:

I only get that messages.log being unable to delete when I allow the test testFindCreatedIndex() to run, I cannot yet figure out who creates that file and to make sure it's being closed

Correction: that should be testInsertionSpeed().

John.
Re: [Neo4j] bdb-index
testFindCreatedIndex() is the method that fails (because the file cannot be deleted; otherwise it works fine), but it only fails when testInsertionSpeed() is allowed to execute (i.e. is not marked @Ignore).

messages.log contents:

Sat Jul 30 23:31:23 CEST 2011: Thread[main,5,main] Starting BatchInserter(EmbeddedBatchInserter[target/var/batch])
Sat Jul 30 23:31:42 CEST 2011: Thread[main,5,main] Clean shutdown on BatchInserter(EmbeddedBatchInserter[target/var/batch])
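When a test fails only because a directory cannot be cleaned up, it helps to know exactly which files refused to go away. The following is a minimal sketch of that idea, not the deleteFileOrDirectory() helper from the test suite; the class name DeleteReport and its method are invented for illustration. It deletes a tree deepest-first and returns every path that could not be removed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not part of Neo4j or bdb-index.
final class DeleteReport {
    private DeleteReport() {}

    /** Deletes root recursively and returns the paths that could not be deleted. */
    static List<Path> deleteTree(Path root) {
        List<Path> failed = new ArrayList<>();
        if (!Files.exists(root)) {
            return failed;
        }
        List<Path> deepestFirst;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            deepestFirst = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path path : deepestFirst) {
            try {
                Files.delete(path);
            } catch (IOException e) {
                // A locked file also keeps its parent directories from being removed,
                // so those show up in the list as well.
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: DeleteReport <directory>");
            return;
        }
        for (Path path : deleteTree(Paths.get(args[0]))) {
            System.out.println("could not delete: " + path);
        }
    }
}
```

Run against a store directory right after shutdown, the returned list would point at files such as messages.log or logical.log.1 if something still holds them open.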
Re: [Neo4j] bdb-index
org.neo4j.kernel.impl.batchinsert.BatchInserterImpl keeps StringLogger msgLog still open even after shutdown():

public void shutdown() { graphDbService.clearCaches(); neoStore.close(); msgLog.logMessage( Thread.currentThread() + " Clean shutdown on BatchInserter(" + this + ")", true ); }

We'd need a msgLog.close(storeDir), where storeDir is the same param given to the constructor of BatchInserterImpl. Maybe someone from neo4j could do that? Meanwhile I will ignore the failure to delete that file.
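The fix proposed above, closing the message log as the last step of shutdown(), can be sketched independently of Neo4j. Everything in this example is a stand-in: SketchInserter and its nested MessageLog are invented names and do not mirror the real BatchInserterImpl or StringLogger APIs. On Windows, as in this thread, a file with an open handle typically cannot be deleted, so releasing the writer is what lets the later cleanup succeed.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a batch inserter that owns a message log.
final class SketchInserter {

    /** Hypothetical append-only log that keeps its file open until close(). */
    static final class MessageLog implements AutoCloseable {
        private final PrintWriter out;

        MessageLog(File file) {
            try {
                out = new PrintWriter(new FileWriter(file, true));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void logMessage(String message) {
            out.println(message);
            out.flush();
        }

        @Override
        public void close() {
            out.close(); // releases the file handle
        }
    }

    private final File logFile;
    private final MessageLog msgLog;

    SketchInserter(File storeDir) {
        storeDir.mkdirs();
        logFile = new File(storeDir, "messages.log");
        msgLog = new MessageLog(logFile);
        msgLog.logMessage(Thread.currentThread() + " Starting SketchInserter");
    }

    File logFile() {
        return logFile;
    }

    void shutdown() {
        msgLog.logMessage(Thread.currentThread() + " Clean shutdown on SketchInserter");
        msgLog.close(); // without this, the log file may stay locked after shutdown
    }

    public static void main(String[] args) {
        File storeDir = new File(System.getProperty("java.io.tmpdir"), "sketch-inserter-demo");
        SketchInserter inserter = new SketchInserter(storeDir);
        inserter.shutdown();
        System.out.println("log deleted: " + inserter.logFile().delete());
        storeDir.delete();
    }
}
```

The design choice is simply that whoever opens the log is responsible for closing it, so callers that delete the store directory after shutdown() do not need to know the log exists.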
Re: [Neo4j] bdb-index
Related to this I've created: https://trac.neo4j.org/ticket/358
Also committed on my fork, now AllTests.java works: https://github.com/13th-floor/bdb-index

For some reason I cannot mvn install:

[INFO] [enforcer:enforce {execution: enforce-maven}]
[INFO] [license:check {execution: check-licenses}]
[INFO] Checking licenses...
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbIndex.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\Neo4jTestCase.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\TestBerkeley.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndex.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org\neo4j\index\bdbje\TestBerkeleyBatchInsert.java
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\AllTests.java
[ERROR] BUILD ERROR
[INFO] -
[INFO] Some files do not have the expected license header
[INFO] -

But it should work, I say; maybe let me know if it doesn't.
Re: [Neo4j] bdb-index
hey np, it's all about depth - I should know, I'm always at the superficial level. I'll take a closer look at that indexName. Did you check yet whether the tests work? They should work now (except the part that you say is still broken with the recovery).

John

Reading your newest msg as I type this: eclipse is autoformatting all lines, and I have it do this on save. Thank you for pointing that out, I shall see about it.

On Sun, Jul 31, 2011 at 12:13 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

Yes, you are right. I had looked at the code too superficially. Still, something goes wrong reading the indexName: when I print that name it looks like garbage (upon recovery), while it should produce a readable index name. I didn't check if the value written to the record is actually a readable String.

Niels
Re: [Neo4j] bdb-index
It looks as if you have modified the file header of the source files. Maven checks the license (the file header) and returns an error message when the license required is different from the license provided. When looking at the diff of one of your edits I noticed there are extra spaces in the license. See: https://github.com/13th-floor/bdb-index/commit/7c6b59fbdc445a122aa247b391c15a23dd64cac9#src/main/java/org/neo4j/index/bdbje/BerkeleyDbBatchInserterIndexProvider.java

These extra spaces are why maven does not install.

Niels
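To make the license-header explanation concrete, here is a small sketch of how an exact header comparison differs from a whitespace-insensitive one. It does not reproduce the Maven license plugin's actual algorithm, which this thread does not describe; the class HeaderCheck, its methods, and the sample header text are all assumptions made for illustration. It only shows how an auto-formatter that adds a space inside the header can turn a matching file into a "Missing header" report under a strict comparison.

```java
// Hypothetical checker; NOT the Maven license plugin's implementation.
final class HeaderCheck {
    private HeaderCheck() {}

    /** Strict check: the source must begin with the header, character for character. */
    static boolean hasExactHeader(String source, String header) {
        return source.startsWith(header);
    }

    /** Collapses every run of whitespace into one space and trims the ends. */
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** True when the strict check fails but the header matches once whitespace is normalized. */
    static boolean differsOnlyInWhitespace(String source, String header) {
        return !hasExactHeader(source, header)
                && collapseWhitespace(source).startsWith(collapseWhitespace(header));
    }

    public static void main(String[] args) {
        // Made-up header text, used only for the demonstration.
        String header = "/**\n * Example license text\n */";
        String reformatted = "/**\n *  Example license text\n */\npackage demo;";
        System.out.println("exact match: " + hasExactHeader(reformatted, header));
        System.out.println("whitespace-only difference: " + differsOnlyInWhitespace(reformatted, header));
    }
}
```

A check like differsOnlyInWhitespace() could be used to spot files whose header was merely reindented before deciding whether to restore the original header or regenerate it.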
Re: [Neo4j] bdb-index
btw, those diffs look ugly. I wanted to mention that in eclipse, in team-History, you can ignore whitespace and see the differences in a better way, rather than one big red chunk of removed data followed by one big green chunk of added data just because the indentation also changed. I have now disabled auto formatting on save, so future edits should be less painful to diff.

On Sun, Jul 31, 2011 at 12:26 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

It looks as if you have modified the file header of the source files. Maven checks the license (the file header) and returns an error message when the license required is different from the license provided. When looking at the diff of one of your edits I noticed there are extra spaces in the license. See: https://github.com/13th-floor/bdb-index/commit/7c6b59fbdc445a122aa247b391c15a23dd64cac9#src/main/java/org/neo4j/index/bdbje/BerkeleyDbBatchInserterIndexProvider.java

These extra spaces are why maven does not install.

Niels

Date: Sun, 31 Jul 2011 00:00:42 +0200
From: cyuczie...@gmail.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] bdb-index

related to this I've created: https://trac.neo4j.org/ticket/358 also committed on my fork, now AllTests.java works https://github.com/13th-floor/bdb-index for some reason I cannot mvn install:

[INFO] [enforcer:enforce {execution: enforce-maven}]
[INFO] [license:check {execution: check-licenses}]
[INFO] Checking licenses...
[INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org \neo4j\index\bdbje\BerkeleyDbIndex.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org \neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org \neo4j\index\bdbje\Neo4jTestCase.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org \neo4j\index\bdbje\TestBerkeley.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org \neo4j\index\bdbje\BerkeleyDbBatchInserterIndex.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\main\java\org \neo4j\index\bdbje\BerkeleyDbDataSource.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\org \neo4j\index\bdbje\TestBerkeleyBatchInsert.java [INFO] Missing header in: e:\down\13th-floor-bdb-index-f9a3155\src\test\java\All Tests.java [ERROR] BUILD ERROR [INFO] - [INFO] Some files do not have the expected license header [INFO] - But it should work, I say; maybe let me know if it doesn't On Sat, Jul 30, 2011 at 11:41 PM, John cyuczieekc cyuczie...@gmail.com wrote: org.neo4j.kernel.impl.batchinsert.BatchInserterImpl keeps StringLogger msgLog still open even after shutdown() public void shutdown() { graphDbService.clearCaches(); neoStore.close(); msgLog.logMessage( Thread.currentThread() + Clean shutdown on BatchInserter( + this + ), true ); } we'd need a msgLog.close(storeDir) and storeDir is the same param given to the constructor of BatchInserterImpl maybe someone from neo4j could do that? meanwhile I will ignore the failure to delete that file On Sat, Jul 30, 2011 at 11:34 PM, John cyuczieekc cyuczie...@gmail.comwrote: testFindCreatedIndex() is the method that fails (due to unable to delete the file, else it works fine) but it only fails when testInsertionSpeed() is allowed to execute (ie. 
not @Ignore) messages.log contents: Sat Jul 30 23:31:23 CEST 2011: Thread[main,5,main] Starting BatchInserter(EmbeddedBatchInserter[target/var/batch]) Sat Jul 30 23:31:42 CEST 2011: Thread[main,5,main] Clean shutdown on BatchInserter(EmbeddedBatchInserter[target/var/batch]) On Sat, Jul 30, 2011 at 11:26 PM, John cyuczieekc cyuczie...@gmail.comwrote: On Sat, Jul 30, 2011 at 11:23 PM, John cyuczieekc cyuczie...@gmail.comwrote: I did a quick check of what you said org.neo4j.index.bdbje.BerkeleyDbCommand.writeToFile(LogBuffer) char[] indexName = indexId.indexName.toCharArray(); buffer.putInt( indexName.length ); buffer.put( indexName ); I'm probably missing something but on my side it looks like it writes length then indexName (and I didn't update from github, just in case you've already fixed this) Either way, my impression of what was happening is that some files got deleted, except some ie. the log, which were still open/in use, and maybe when recovery was tried, either it couldn't be opened, or due to being opened contained impartial data, or all was well but recovery couldn't happen because the log needed some other files or a previous database snapshot upon
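The length-then-name layout quoted above from BerkeleyDbCommand.writeToFile only works if the reader consumes the same fields in the same order and width. A minimal sketch of that round trip, assuming java.nio.ByteBuffer as a stand-in for Neo4j's LogBuffer (the class and method names here are hypothetical, not part of bdb-index):

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of bdb-index: shows the length-then-chars layout.
final class LengthPrefixedName {
    // Writes the char count as an int, then each char (2 bytes each).
    static void write(ByteBuffer buffer, String indexName) {
        char[] chars = indexName.toCharArray();
        buffer.putInt(chars.length);
        for (char c : chars) {
            buffer.putChar(c);
        }
    }

    // Reads the count first, then exactly that many chars.
    static String read(ByteBuffer buffer) {
        int length = buffer.getInt();
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = buffer.getChar();
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        write(buffer, "myIndex");
        buffer.flip(); // switch the buffer from writing to reading
        System.out.println(read(buffer));
    }
}
```

If the reader skipped the length, or read it with a different width, every following field would be misaligned, which is one plausible way to end up with a garbage index name during recovery.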
Re: [Neo4j] bdb-index
-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist
\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbDataSource.java:[31,29] package org.neo4j.index.lucene does not exist
\down\13th-floor-bdb-index-f9a3155\src\main\java\org\neo4j\index\bdbje\BerkeleyDbBatchInserterIndexProvider.java:[32,29] package org.neo4j.index.lucene does not exist
[INFO] For more information, run Maven with the -e switch
[INFO] Total time: 2 seconds
[INFO] Finished at: Sun Jul 31 00:37:19 CEST 2011
[INFO] Final Memory: 38M/359M
e:\down\13th-floor-bdb-index-f9a3155
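On the messages.log problem raised in this thread (the log stays open after shutdown(), so the test cannot delete it): a minimal sketch of the close-on-shutdown idea, using a plain java.io.Writer. LoggingInserter is a hypothetical stand-in, not the Neo4j BatchInserterImpl, and the real StringLogger API may look different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a batch inserter that owns a message log.
// Only the close-on-shutdown idea is shown here.
final class LoggingInserter {
    private final Path logFile;
    private final Writer msgLog;
    private boolean closed;

    LoggingInserter(Path storeDir) {
        this.logFile = storeDir.resolve("messages.log");
        this.msgLog = open(logFile);
    }

    private static Writer open(Path file) {
        try {
            Files.createDirectories(file.getParent());
            return Files.newBufferedWriter(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Logs the shutdown message, then releases the file handle.
    void shutdown() {
        if (closed) {
            return; // calling shutdown twice is harmless
        }
        try {
            msgLog.write("Clean shutdown\n");
            msgLog.close(); // flushes and closes the underlying file
            closed = true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp store, shuts down, and reports whether the log could be deleted.
    static boolean logDeletableAfterShutdown() {
        try {
            Path dir = Files.createTempDirectory("batch");
            LoggingInserter inserter = new LoggingInserter(dir);
            inserter.shutdown();
            Files.delete(inserter.logFile);
            Files.delete(dir);
            return !Files.exists(inserter.logFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Windows an open handle typically blocks deletion of the file, which would match the symptom described for the e:\ workspace; on other platforms the delete may succeed even without the close.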
Re: [Neo4j] bdb-index
err, lol, actually I failed here, it was actually two primary databases. I only used primary+secondary databases when storing a String-longID 1-to-1 mapping, that is, when emulating a HashMap by using berkeleydb where each side (key or value) can be connected to only one other thing (value or key); but when allowing 1-to-many, you must use two primary databases. ok, I stand corrected ;) sorry John.
On Fri, Jul 29, 2011 at 6:53 AM, John cyuczieekc cyuczie...@gmail.com wrote: small obvious correction btw, it's not 2 primary databases, it's 1 primary and 1 secondary ;) my bad
On Fri, Jul 29, 2011 at 6:49 AM, John cyuczieekc cyuczie...@gmail.com wrote: Hi xD I'm not clear what you need to store here, if I understand correctly you could store in 2 primary bdb databases the nodeID (ie. long) of each node in a relationship ie. key-value dbForward: A-B A-C X-D X-B dbBackward: B-A B-X C-A D-X A,B,C,D,X are all nodeIDs ie. longs. This way you could check if A-B exists, or all of A's endNodes, or what startNodes are pointing to the endNode B. The storing of these would be sorted and in a BTree, so lookup would be fast; you can consider ie. A as being a set of B and C, and X being a set of B and D (that is, you cannot set the order as in a list, they are sorted by bdb for fast retrievals). (But upon these sets you can build lists np, that is using only bdb; tho you won't need that using neo4j.) So, if this is the kind of index you wanted... (I am not aware of specific indexes with bdb, though that doesn't mean they don't exist.) Insertions would require transaction protection so both A-B in dbForward and B-A in dbBackward are inserted atomically. Parsing A then X of B- in dbBackward for example can only be done with a cursor... Either way, I'm taking a look at that bdb-index thingy; will report back if I have any ideas heh John.
On Thu, Jul 28, 2011 at 9:42 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Thank you, Peter. There is no rush here. 
It would be nice to investigate this option, but it can wait until Mattias has returned and sifted through urgent matters. The question is even whether it would be a good idea to use an index to do the book keeping for Enhanced API. As it is now, the reification of eg. a Relationship requires one property to be set on a relationship, containing the node ID of the associated node. On the associated node is a property containing the ID of the relationship, so there is a bidirectional look up. Introducing an index would remove the need to have these additional properties, but would lead to slower look-up times (no matter how fast the index). So it's a trade-off between speed and cleanliness of namespace. Using the Enhanced API disallows certain property names to be used in user applications. The property names used in Enhanced API all start with org.neo4j.collections.graphdb., so there is little chance a user application would want to use those property names, but it is a restriction not found in the standard API, so ultimately something to consider. Niels
From: peter.neuba...@neotechnology.com Date: Thu, 28 Jul 2011 10:39:47 -0700 To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index niels, in this spike, I just concentrated on getting _something_ working in order to test insertion speed. This is not up to real indexing standards, so some love is needed here. I think Mattias is the best person to ask about pointers, let's wait until he is back next week if that is ok? Maybe some other (like the standard Lucene) index can suffice for the time being to test out things? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. 
On Thu, Jul 28, 2011 at 10:36 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Trying to find something useful to hide the implementation book keeping of Enhanced API, I tried out bdb-index as can be found here: https://github.com/peterneubauer/bdb-index It looks interesting, but fails its tests. When recovering it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond what I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue? Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
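The dbForward/dbBackward layout John describes earlier in this thread can be sketched in memory with two sorted maps. This is only an illustration of the data layout, assuming java.util.TreeMap as a stand-in for the two BDB databases; LinkIndex and its methods are hypothetical names, and no Berkeley DB API is used.

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for the two sorted key-value databases (not the BDB API).
final class LinkIndex {
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>();

    // Both directions are updated together; with BDB these two writes
    // would share one transaction so they are inserted atomically.
    synchronized void link(long start, long end) {
        forward.computeIfAbsent(start, k -> new TreeSet<>()).add(end);
        backward.computeIfAbsent(end, k -> new TreeSet<>()).add(start);
    }

    // Checks whether start-end exists using only the forward map.
    synchronized boolean exists(long start, long end) {
        TreeSet<Long> ends = forward.get(start);
        return ends != null && ends.contains(end);
    }

    // All end nodes of a start node, in sorted order (a copy).
    synchronized NavigableSet<Long> endNodesOf(long start) {
        TreeSet<Long> ends = forward.get(start);
        return ends == null ? new TreeSet<>() : new TreeSet<>(ends);
    }

    // All start nodes pointing at an end node, in sorted order (a copy).
    synchronized NavigableSet<Long> startNodesOf(long end) {
        TreeSet<Long> starts = backward.get(end);
        return starts == null ? new TreeSet<>() : new TreeSet<>(starts);
    }
}
```

With A=1, B=2, C=3, D=4, X=5, linking A-B, A-C, X-D and X-B lets you ask for startNodesOf(B) without scanning the forward map, which is the reason for keeping the second map at all.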
Re: [Neo4j] HyperRelationship example
I don't know what you mean by this: I don't know how nicely BDB plays with Neo4J transactions. I have some small experience with bdb java edition that is, but I'm not sure what would their transaction have to do with neo4j transactions... if you meant if you could make a wrapper such that you could use the same format/interface neo4j uses for their transactions, then you can, I did some attempt to that it works for me, also BDB Java Edition doesn't support nested transactions either (the C++ version does), but emulating them to use the same root/parent transaction is easy, my attempt is here: https://github.com/13th-floor/neo4john/blob/6c0371e82b7fc5b5f45d7c0ea9fb03ee4d241df9/src.obsolete/org/bdb/BETransaction.java probably not much relevant though. But this file here: https://github.com/13th-floor/neo4john/blob/master/src/org/benchtests/neo4j/TestLinkage.java I made to use both neo4j and bdb to do the same thing, that is: create nodes(uppercase named ones) with these rels: ROOT_LIST -- START ROOT_LIST -- half a million unique nodes ROOT_LIST -- MIDDLE ROOT_LIST -- another half a million unique nodes ROOT_LIST -- END then make both bdb and neo4j check if the following rels exist: ROOT_LIST -- START ROOT_LIST -- MIDDLE ROOT_LIST -- END (you probably saw this already in another post) But both bdb and neo4j now use transactions... that is, in my test file. About licensing, I'm not much into that but here's the license for Berkeley DB Java Edition: http://www.oracle.com/technetwork/database/berkeleydb/downloads/jeoslicense-086837.html Looks like New(or normal?) BSD license or something ... also Licensing Berkeley DB is available under dual license: - Public license that requires that software that uses the Berkeley DB code be free/open source software; and - Closed source license for non-open source software. If your code is not redistributed, no license is required (free for in-house use). 
from http://www.orafaq.com/wiki/Berkeley_DB#Licensing I would totally use neo4j, if it would be as fast at searches :/ ie. BTree storage of nodes/rels? (guessing) But having 10mil rels, and seeing BDB checking if A--B in 0ms, and neo4j in like 0 to 66 to 310 seconds (depending on its position) is a show stopper for me, especially because I want to base everything on just nodes (without properties) and their relationships. ie. make a set or list of things, without having A ---[ENTRY]-- e ---[NEXT] --- e2 but instead A-b-e-c-e2 where b and c are just nodes, and also AllEntries-b and AllNexts-c (silly example with such less info tho) Point is, I would do lots of searches a lot (imagine a real time program running on top of nodes/rels, that is it's defined in and can access only nodes), this would likely cause those ms to add up to seconds... I installed maven (m2e) again, I guess I could use it, but it seems it creates .jar , not sure if that's useful to me while I am coding... seems better to use project/sources no? and maven only when ready to publish/get the jar ; anyway I need to learn how to use it otherwise I'm getting errors like this , when trying to build: [ERROR] The project org.neo4j:neo4j-graph-collections:1.5-SNAPSHOT (E:\wrkspc\graph-collections\pom.xml) has 1 error [ERROR] Non-resolvable parent POM: The repository system is offline but the artifact org.neo4j:parent-central:pom:21 is not available in the local repositor y. 
and 'parent.relativePath' points at wrong local POM @ line 4, column 11 - [Help 2] Anyway, with normal eclipse, I'm still showing 2 different errors: 1) in org.neo4j.collections.graphdb.ComparablePropertyTypeT line 29: super(name, graphDb); The constructor PropertyTypeT(String, GraphDatabaseService) is not visible 2) org.neo4j.collections.graphdb.impl.NodeLikeImpl.getRelationships() The return type is incompatible with RelationshipContainer.getRelationships() 3) org.neo4j.collections.graphdb.impl.NodeLikeImpl.getRelationships(RelationshipType...) The return type is incompatible with RelationshipContainer.getRelationships(RelationshipType[]) John. On Thu, Jul 28, 2011 at 12:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Hi John, Thanks for showing an interest. The compile error you got was due to the fact that a removed class was still hanging around in the Git repo. I renamed BinaryRelationshipRoles into BinaryRelationshipRole, but the original file was still active in the Git repo. I fixed that. I have been thinking about BDB too for this situation, because the graph database now stores some information about the associated nodes and their reverse lookup. This of course polutes the name/node space. It would be neat to offload this book keeping information to some persistent hashmap, so the implementation is completely transparent to the user. I don't know how nicely BDB plays with Neo4J transactions. Does anyone have experience with this? Another aspect is licencing. I am no legal buff, so maybe someone else can jump
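John mentions above that nested transactions can be emulated by having inner levels reuse the same root transaction. A minimal sketch of that idea, assuming a simple depth counter; JoinedTransaction is a hypothetical class, not the linked BETransaction, and the counters stand in for the real commit and abort calls.

```java
// Hypothetical wrapper: nested begin() calls join one root transaction.
final class JoinedTransaction {
    private int depth;      // how many begin() calls are currently open
    private boolean failed; // set when any level reports failure
    private int commits;    // stands in for real commit calls
    private int rollbacks;  // stands in for real abort calls

    // Opens a level; only the outermost call starts a fresh root.
    void begin() {
        if (depth == 0) {
            failed = false;
        }
        depth++;
    }

    // Closes a level; the root commits only if no level failed.
    void finish(boolean success) {
        if (depth == 0) {
            throw new IllegalStateException("no active transaction");
        }
        if (!success) {
            failed = true;
        }
        depth--;
        if (depth == 0) {
            if (failed) {
                rollbacks++;
            } else {
                commits++;
            }
        }
    }

    int commits() { return commits; }

    int rollbacks() { return rollbacks; }
}
```

The design choice here is that a failure at any inner level dooms the whole root, since there is no real savepoint to roll back to; that is weaker than true nested transactions but keeps the calling code uniform.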
Re: [Neo4j] HyperRelationship example
Hey Nawroth, I attempted to try that at one time and for some reason I cannot remember I concluded that it doesn't work for what I wanted, I will see what I can do again, thanks! findSinglePath is what I was using before.
On Thu, Jul 28, 2011 at 2:05 PM, Anders Nawroth and...@neotechnology.com wrote: Hi! I think the hard part about transactions is recovering after crashes and such. Regarding finding A--B, have you tried using a relationship index? See: http://components.neo4j.org/neo4j/1.4/apidocs/org/neo4j/graphdb/index/ReadableRelationshipIndex.html /anders
Re: [Neo4j] HyperRelationship example
With RelationshipIndex it seems to be working as fast, though I am not sure if I am using it right, ie. doing this the first time: RelationshipIndex ri = graphDB.index().forRelationships( relsIndex ); and on each relationship created between sNode--eNode, where eNode is any random node and sNode is the same on each call (in my case): Relationship rel = sNode.createRelationshipTo( eNode, linkedRelType ); ri.add( rel, key, value ); and when checking if sNode--eNode exists: final Relationship rel = ri.query( key, value, sNode, eNode ).getSingle(); if ( null == rel ) { return false; } else { return true; } It seems to me that those `key` and `value` are useless here, unless I'm missing something; I'm probably using them wrongly, but in my case I only have one type of relationship. In either case, the timings are good (~1ms), with no memory increase, so this would seem like a good workaround; with findSinglePath the memory would increase by 1 gig (for my test). Thanks for suggesting to revisit RelationshipIndex; last time I dropped it, I think because I didn't know what to put in key/value. Also, I get now what Niels meant by "play nice with transactions": whether both neo4j and bdb recover the same things after crash/recovery or not...
Re: [Neo4j] HyperRelationship example
nice, no errors now, thanks! I've been postponing checking stuff like SortedTree or anything until the errors were gone... I guess I could try SortedTree, but it's based on Nodes, and that would add an extra unnecessary layer maybe? still good to know I have this option and the RelationshipIndex option thanks to Anders too. I'll have to see what to do with both... Thanks so far, Good luck! John
On Thu, Jul 28, 2011 at 3:04 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I checked the Git repo, and found I had left in two more files that I had removed in my local project. (Note to self: When deleting or renaming files always update repo). The repo is now up-to-date, and the most recent download installed perfectly using Maven. What I meant by playing nice with Neo4j transactions is for the scenario where BDB would act as an IndexProvider for Neo4j. In such a scenario I would want to be able to commit, rollback in both Neo4j and BDB and make sure they both close properly and recover properly from error situations. I found an implementation of a BDB IndexProvider in Peter Neubauer's Git repo, but I get an error when trying to build that project, so I would have to look into that. With respect to your 10mil relations needing to check if A -- B, have you tried SortedTree in the collections component? This class has a containsNode method allowing you to check if A -- B. SortedTree is a Btree so the lookup should be much faster. Niels
Re: [Neo4j] HyperRelationship example
Roger that. don't read the following, it's irrelevant (don't even know why I sent it): Btw, seems to me that (since the underlying index storage is BTree - just guessing from the speed) I could store the ID of the nodes in two indexes and use only those as a base for creating node to node relationships :) that is, kind of emulating a key-value database, tho it seems kind of useless at this time :) indexForward: A--B indexBackward: B--A this way, I could check if A--B exists, but could also check all nodes (ie. A) that point to B, by using the Backward index; else just for checking if A--B exists, only one index would be needed; though the add() does take an entity, I would not need that; http://components.neo4j.org/neo4j/1.4/apidocs/org/neo4j/graphdb/index/Index.html
On Thu, Jul 28, 2011 at 3:32 PM, Anders Nawroth and...@neotechnology.com wrote: Hi! Seems right to me. And yes, key/value would typically be storing the relationship type. We should bring this up again next week, when Mattias who wrote the indexing stuff is back from vacation! /anders
Re: [Neo4j] HyperRelationship example
Hey Niels, what is acquireLock() doing in SortedTree? is removeProperty causing neo4j to acquire a lock on the Node? or its properties? also does that property need to exist? seems like not interesting :) On Wed, Jul 27, 2011 at 8:48 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example There is now a proper test for HyperRelationships, so I hereby push the software to Beta status. Please try out the Enhanced API and HyperRelationships and let me know what needs improvement. Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] HyperRelationship example
well if I think about it, maybe Niels could use this index(that neo4j uses) instead of berkeleydb, that is, unless I'm missing something (other than add() requiring an entity which is something unneeded with bdb when using key-value). (but likely he's already making use of it and I didn't really understand why bdb would be better then) On Thu, Jul 28, 2011 at 3:45 PM, John cyuczieekc cyuczie...@gmail.comwrote: Roger that. don't read the following it's irrelevant(don't even know why I sent it): Btw, seems to me that (since the underlaying index storage is BTree - just guessing from the speed) I could store the ID of the nodes in two indexes and use only those as a base for creating node to node relationships :) that is, kind of emulating a key-value database, tho it seems kind of useless at this time :) indexForward: A--B indexBackward: B--A this way, I could check if A--B exists, but could also check all modes (ie. A) that point to B, by using Backward index; else just for checking if A--B exists, only one index would be needed; though the add() does take an entity, would not need that; http://components.neo4j.org/neo4j/1.4/apidocs/org/neo4j/graphdb/index/Index.html On Thu, Jul 28, 2011 at 3:32 PM, Anders Nawroth and...@neotechnology.comwrote: Hi! Seems right to me. And yes, key/value would typically be storing the relationship type. We should bring this up again next week, when Mattias who wrote the indexing stuff is back from vacation! /anders On 07/28/2011 03:05 PM, John cyuczieekc wrote: with relationshipindex seems to be working as fast, though I am not sure if I am using it right ie. 
doing this first time: RelationshipIndex ri = graphDB.index().forRelationships( relsIndex ); and on each relationship created between sNode--eNode where eNode is any random node, and sNode is the same on each call (in my case) Relationship rel = sNode.createRelationshipTo( eNode, linkedRelType ); ri.add( rel, key, value ); and when checking if sNode--eNode exists: final Relationship rel = ri.query( key, value, sNode, eNode ).getSingle(); if ( null == rel ) { return false; } else { return true; } seems to me that using those `key` and `value` are useless, unless I'm missing something; I'm probably using them wrongly but in my case I only have one type of relationship. In either case, the timings as good ~1ms, and no memory increase, so this would seem like a good workaround; with findSinglePath the memory would increase by 1 gig (for my test) Thanks for suggesting to revisit RelationshipIndex, last time I dropped it I think because I didn't know what to put on key/value. Also, I get what Niels meant now by that play nice with transactions, that if both neo4j and bdb recover the same things after crash/recovery or not... On Thu, Jul 28, 2011 at 2:05 PM, Anders Nawroth and...@neotechnology.comwrote: Hi! I think the hard part about transactions is recovering after crashes and such. Regarding finding A--B, have you tried using a relationship index? See: http://components.neo4j.org/neo4j/1.4/apidocs/org/neo4j/graphdb/index/ReadableRelationshipIndex.html /anders On 07/28/2011 01:35 PM, John cyuczieekc wrote: I don't know what you mean by this: I don't know how nicely BDB plays with Neo4J transactions. I have some small experience with bdb java edition that is, but I'm not sure what would their transaction have to do with neo4j transactions... 
if you meant if you could make a wrapper such that you could use the same format/interface neo4j uses for their transactions, then you can, I did some attempt to that it works for me, also BDB Java Edition doesn't support nested transactions either (the C++ version does), but emulating them to use the same root/parent transaction is easy, my attempt is here: https://github.com/13th-floor/neo4john/blob/6c0371e82b7fc5b5f45d7c0ea9fb03ee4d241df9/src.obsolete/org/bdb/BETransaction.java probably not much relevant though. But this file here: https://github.com/13th-floor/neo4john/blob/master/src/org/benchtests/neo4j/TestLinkage.java I made to use both neo4j and bdb to do the same thing, that is: create nodes(uppercase named ones) with these rels: ROOT_LIST -- START ROOT_LIST -- half a million unique nodes ROOT_LIST -- MIDDLE ROOT_LIST -- another half a million unique nodes ROOT_LIST -- END then make both bdb and neo4j check if the following rels exist: ROOT_LIST -- START ROOT_LIST -- MIDDLE ROOT_LIST -- END (you probably saw this already in another post) But both bdb and neo4j now use transactions... that is, in my test file. About licensing, I'm not much into that but here's the license for Berkeley DB Java Edition: http://www.oracle.com/technetwork/database/berkeleydb/downloads/jeoslicense-086837.html Looks
Re: [Neo4j] Events this Week
btw, just making sure, the Webinar is in 3 hours from now, right? (otherwise I miscalculated) On Wed, Jul 27, 2011 at 8:28 PM, Allison Sparrow allison.spar...@neotechnology.com wrote: Hi all, Just a reminder on three events we have to close off the week: *TONIGHT at 18:00 PDT* Vancouver Meetup | Reference Node: Creating a Graph: www.meetup.com/graphdb-vancouver/events/24143031/ *TONIGHT at 19:00 PDT* Seattle Meetup | Discussions with Andreas Kollegger: www.meetup.com/graphdb-seattle/events/21044691/ *TOMORROW at 10:00 PDT* Webinar | Getting Started with Neo4j: https://www1.gotomeeting.com/register/855127096 See you there, *Allison Sparrow* *Marketing Manager | Neo Technology* +19499036091 | @ayeeson http://twitter.com/#%21/ayeeson
Re: [Neo4j] Events this Week
I chose not to attend the webinar due to the fact that it requires java and runs *unrestricted*. So for anyone else: Enjoy! the webinar is supposedly still going at this time (35mins into it) On Thu, Jul 28, 2011 at 4:04 PM, John cyuczieekc cyuczie...@gmail.com wrote: btw, just making sure, the Webinar is in 3 hours from now right? (otherwise I miscalculated) On Wed, Jul 27, 2011 at 8:28 PM, Allison Sparrow allison.spar...@neotechnology.com wrote: Hi all, Just a reminder on three events we have to close off the week: *TONIGHT at 18:00 PDT* Vancouver Meetup | Reference Node: Creating a Graph: www.meetup.com/graphdb-vancouver/events/24143031/ *TONIGHT at 19:00 PDT* Seattle Meetup | Discussions with Andreas Kollegger: www.meetup.com/graphdb-seattle/events/21044691/ *TOMORROW at 10:00 PDT* Webinar | Getting Started with Neo4j: https://www1.gotomeeting.com/register/855127096 See you there, *Allison Sparrow* *Marketing Manager | Neo Technology* +19499036091 | @ayeeson http://twitter.com/#%21/ayeeson
Re: [Neo4j] bdb-index
Hi xD I'm not clear what you need to store here. If I understand correctly, you could store in 2 primary bdb databases the nodeID (ie. long) of each node in a relationship, ie. as key-value: dbForward: A-B A-C X-D X-B; dbBackward: B-A B-X C-A D-X. A,B,C,D,X are all nodeIDs, ie. longs. This way you could check if A-B exists, or get all of A's endNodes, or find what startNodes are pointing to the endNode B. The storing of these would be sorted and in a BTree, so lookup would be fast, and you can consider ie. A as being a set of B and C, and X being a set of B and D (that is, you cannot set the order as in a list, they are sorted by bdb for fast retrievals). (But upon these sets one can build lists np - that is, using only bdb; tho you won't need that using neo4j) So, if this is the kind of index you wanted... (I am not aware of specific indexes with bdb, though that doesn't mean they don't exist) Insertions would require transaction protection so both A-B in dbForward and B-A in dbBackward are inserted atomically. Parsing A then X of B- in dbBackward for example can only be done with a cursor... Either way, I'm taking a look on that bdb-index thingy; will report back if I have any ideas heh John. On Thu, Jul 28, 2011 at 9:42 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Thank you, Peter. There is no rush here. It would be nice to investigate this option, but it can wait until Mattias has returned and sifted through urgent matters. The question is even, if it would be a good idea to use an index to do the book keeping for Enhanced API. As it is now, the Reification of eg. a Relationship, requires one property to be set on a relationship, containing the node ID of the associated node. On the associated node is a property containing the ID of the relationship, so there is a bidirectional look up.
Introducing an index would remove the need to have these additional properties, but would lead to slower look-up times (no matter how fast the index). So it's a trade-off between speed and cleanliness of namespace. Using the Enhanced API disallows certain property names to be used in user applications. The property names used in Enhanced API all start with org.neo4j.collections.graphbd., so there is little chance a user application would want to use those property names, but it is a restriction not found in the standard API, so ultimately something to consider. Niels From: peter.neuba...@neotechnology.com Date: Thu, 28 Jul 2011 10:39:47 -0700 To: user@lists.neo4j.org Subject: Re: [Neo4j] bdb-index niels, in this spike, I just concentrated on getting _something_ working in order to test insertion speed. This is not up to real indexing standards, so some love is needed here. I think Mattias is the best person to ask about pointers, let's wait until he is back next week if that is ok? Maybe some other (like the standard Lucene) index can suffice for the time being to test out things? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Thu, Jul 28, 2011 at 10:36 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Trying to find something useful to hide the implementation book keeping of Enhanced API, I tried out bdb-index as can be found here: https://github.com/peterneubauer/bdb-index It looks interesting, but fails its tests. When recovering it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage.
I would like to help make this component workable, but this area of the database is a bit beyond the scope that I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue? Niels
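For anyone following the dbForward/dbBackward idea from this thread, here is a minimal sketch of it, assuming plain in-memory sorted maps instead of Berkeley DB. The class and method names (TwoWayEdgeIndex, add, exists, endNodesOf, startNodesOf) are made up for illustration and are not part of bdb-index, Berkeley DB or Neo4j. The synchronized methods only stand in for the transaction that would keep the two inserts atomic in a real store.

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the two sorted key-value stores. */
class TwoWayEdgeIndex {
    // start node ID -> sorted end node IDs (plays the role of dbForward)
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();
    // end node ID -> sorted start node IDs (plays the role of dbBackward)
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>();

    /** Records start->end in both maps under one lock, so no reader sees only one half. */
    synchronized void add(long start, long end) {
        forward.computeIfAbsent(start, k -> new TreeSet<>()).add(end);
        backward.computeIfAbsent(end, k -> new TreeSet<>()).add(start);
    }

    /** Only the forward map is needed to answer "does start->end exist?". */
    synchronized boolean exists(long start, long end) {
        TreeSet<Long> ends = forward.get(start);
        return ends != null && ends.contains(end);
    }

    /** All end nodes of the given start node, in sorted order (a copy). */
    synchronized NavigableSet<Long> endNodesOf(long start) {
        TreeSet<Long> ends = forward.get(start);
        return ends == null ? new TreeSet<Long>() : new TreeSet<Long>(ends);
    }

    /** All start nodes pointing to the given end node, in sorted order (a copy). */
    synchronized NavigableSet<Long> startNodesOf(long end) {
        TreeSet<Long> starts = backward.get(end);
        return starts == null ? new TreeSet<Long>() : new TreeSet<Long>(starts);
    }
}
```

As in the mail above, each node behaves like a sorted set of its neighbours, so insertion order is not preserved; the backward map is only needed for "who points to B" queries.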
Re: [Neo4j] Composable traversals
Hey Niels, As they are composable, is java going to keep track of things, like if recursive, in stack ? or in array/variables ? or the graph could keep track of what's beep parsed so far, in-graph ? (I mean, this question applies for non-composable too; personally i like the idea of in-graph keeping track of those but maybe that would be implemented later at a higher level, so I guess for now it will be in array/variables) Just making sure, in here: Node --FRIEND-- Node -- PARENT -- Node FRIEND and PARENT are both relationship types? they are thus not intermediary nodes acting like they are relationships? (which is actually what I do with bdb where the only elemental thing is the Node, rels cannot be addressed ie. by ID) What happens while the traversers are executing and some other thread/process is deleting something which the traverser added to to itself as a valid node/path ? For example the first Node in Node --FRIEND-- Node assuming that's where the traverser's currently at, is deleted... Is there some notification/event or were they locked by traverser? or this kind of issue will be dealt with later after traverser is implemented? Are thee locks kept in-graph so they can be seen by other threads/processes (mainly thinking processes that cannot access the same java resource ie. in another jvm or computer tho accessing the same database - I guess this rules out embedded?) ? if any locks... On Fri, Jul 29, 2011 at 1:30 AM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: I'd like to take a stab at implementing traversals in the Enhanced API. One of the things I'd like to do, is to make traversals composable. Right now a Traverser is created by either calling the traverse method on Node, or to call the traverse(Node) method on TraversalDescription. This makes traversals inherently non-composable, so we can't define a single traversal that returns the parents of all our friends. 
To make Traversers composable we need a function: Traverser traverse(Traverser, TraversalDescription) My take on it is to make Element (which is a superinterface of Node) into a Traverser. Traverser is basically another name for Iterable<Path>. Every Node (or more generally every Element) can be seen as an Iterable<Path>, returning a single Path, which contains a single path-element, the Node/Element itself. Composing traversals would entail the concatenation of the paths returned with the paths supplied, so when we ask for the parents of all our friends, the returned paths would take the form: Node --FRIEND-- Node -- PARENT -- Node Niels
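The composition idea in this thread can be tried out without any Neo4j classes. The following is only a toy sketch under loud assumptions: a path is a plain list of node names, a "traversal description" is a function from a node name to its neighbours, and ComposableTraversal, start, follow and traverse are hypothetical names, not the Enhanced API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model: a traverser is just a list of paths, a path a list of node names. */
class ComposableTraversal {

    /** A single node seen as a traverser: one path holding one element. */
    static List<List<String>> start(String node) {
        List<String> single = new ArrayList<>();
        single.add(node);
        List<List<String>> paths = new ArrayList<>();
        paths.add(single);
        return paths;
    }

    /** Turns an adjacency map for one relationship type into a one-hop step. */
    static Function<String, List<String>> follow(Map<String, List<String>> edges) {
        return node -> edges.getOrDefault(node, Collections.<String>emptyList());
    }

    /** Extends every supplied path by one step taken from that path's last node. */
    static List<List<String>> traverse(List<List<String>> paths,
                                       Function<String, List<String>> step) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> path : paths) {
            String end = path.get(path.size() - 1);
            for (String next : step.apply(end)) {
                List<String> extended = new ArrayList<>(path); // supplied path + new element
                extended.add(next);
                result.add(extended);
            }
        }
        return result;
    }
}
```

Because traverse takes paths and returns paths, calls nest: traverse(traverse(start("me"), follow(friends)), follow(parents)) yields three-element paths of the form me, friend, parent of that friend.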
Re: [Neo4j] bdb-index
I forked and fixed, the tests are all working now: https://github.com/13th-floor/bdb-index Let me know if you want me to do a pull request, ... sadly I applied formatting on RawBDBSpeed and the diff doesn't look pretty if you're trying to see what changed John. On Thu, Jul 28, 2011 at 7:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Trying to find something useful to hide the implementation book keeping of Enhanced API, I tried out bdb-index as can be found here: https://github.com/peterneubauer/bdb-index It looks interesting, but fails its tests. When recovering it performs BerkeleyDbCommand#readCommand from the log. The retrieved indexName is not actually garbage. I would like to help make this component workable, but this area of the database is a bit beyond the scope that I know. I know this is completely unsupported software, but can someone give me some pointers on how to fix this issue? Niels
Re: [Neo4j] nested transactions feature ?
thanks for that, the github help was most helpful. Meanwhile I realized why calling failure() on the child transaction flags the root as failed, mainly because the child transaction cannot be reused and thus we cannot know if it was retried or not, this being the difference between child transactions that failed and child transactions that failed but were retried and the retry was successful; this explains `point 2)` and thus I agree with the current implementation... I am trying to emulate nested transactions with this possibility of reusing a failed child transaction such that the parent/root one doesn't need to be flagged as failed, but doing it with bdb not neo4j... However `point 1)` still remains unaddressed (though I didn't update from github yet to see if it was fixed, or I simply don't understand it right) On Sun, Jul 24, 2011 at 4:10 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Thanks John for the details! As for the GIThub workflow: http://www.eqqon.com/index.php/Collaborative_Github_Workflow and http://help.github.com/ are good starting points! Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sun, Jul 24, 2011 at 3:54 PM, John cyuczieekc cyuczie...@gmail.com wrote: Hey Peter, Got any good working we could change to? I'm not sure I understand what you mean, do you mean that if I have any good text to add to the javadoc so it's stated more clearly ? or something related to code instead? but if code, to do what? I understand that they are using the topmost transaction now, *1)* but what I would potentially expect from the nested transaction (semantics) is to either default to fail(?)
or have it stated in the docs that it defaults to success even if success() isn't invoked (when finish() is reached), which is opposite than the parent root transaction which defaults to fail if success() isn't invoked. *2)* But also another issue with this, is that if the nested transaction called fail() - and this may explain why it doesn't default to fail() - then the parent transaction's state is fail() and can never be changed to success() again. I might want to potentially be using a nested transaction which could fail and even throw exception(which I would handle and possibly redo that nested transaction from the beginning) but I wouldn't want the parent transaction to fail because of that, especially since I managed to handle the failed nested transaction by eventually creating a new one which success-ed. In other words, if any nested fails, the entire tree chain of transactions must be aborted (that's how it is now). Would definitely be a great idea to state these in the (java)docs. Thanks! About the fork and pull request, I'd have to learn what they do and how to use them, ... in general I get the idea of what you're saying, I could code some modifs and you could check them out and if agreed upon you could add them to neo4j - or something, that's what I understand from that, but nothing about the details yet, must read... On Sun, Jul 24, 2011 at 3:25 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Hi there, yes, nested transactions are really using the most topmost transaction for control, so your assessment if the commit semantics is right (I think). Talked to Tobias about this, and maybe we could state this more clearly in the docs. Got any good working we could change to? Feel free to fork and send a pull request! 
Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Thu, Jul 21, 2011 at 7:29 PM, cyuczi eekc cyuczie...@gmail.com wrote: Hello. Are nested transaction supported? From what I'm testing here, it looks like unless I specify .failure() on the nested transaction, before it reaches .finish() the transaction is considered to be successful (even if I didn't call .success() on it). Though if I do call .failure() then the root transaction will be rolled back with exception: Exception in thread main org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:98
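The semantics described in this thread (nested transactions share the topmost one, a nested finish() without failure() changes nothing, and a nested failure() marks the root as failed for good) can be written down as a toy model. This is only a sketch of the observed behaviour as reported above, not Neo4j's implementation; FlatTransaction and its method names are hypothetical.

```java
/** Hypothetical model of "flat" nesting: every nested transaction delegates to one root. */
final class FlatTransaction {
    private final FlatTransaction root; // the top-level transaction (itself, if top-level)
    private boolean successCalled;      // only meaningful on the root
    private boolean rollbackOnly;       // sticky flag, kept on the root

    private FlatTransaction(FlatTransaction parentRoot) {
        this.root = (parentRoot == null) ? this : parentRoot;
    }

    static FlatTransaction beginTopLevel() {
        return new FlatTransaction(null);
    }

    FlatTransaction beginNested() {
        return new FlatTransaction(root);
    }

    boolean isTopLevel() {
        return root == this;
    }

    /** On a nested transaction this has no effect on the outcome in this model. */
    void success() {
        successCalled = true;
    }

    /** Marks the root as rollback-only; nothing in this model can undo that. */
    void failure() {
        root.rollbackOnly = true;
    }

    /**
     * Top-level: returns true only if success() was called and nobody called failure().
     * Nested: commits nothing itself, it only reports whether the root is still viable.
     */
    boolean finish() {
        if (!isTopLevel()) {
            return !root.rollbackOnly;
        }
        return successCalled && !rollbackOnly;
    }
}
```

In this model a failed child followed by a successful retry still leaves the root rolled back, which is the `point 2)` behaviour discussed above; supporting retries would need the child's changes to be undoable separately from the root's.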
Re: [Neo4j] Pagination in Embedded
Looks like John H. means, how do you get all results for page N and only for page N, without the overhead of getting thru all other results? So far, as I understand it (also from what Jim said), you'll have to parse all the results for all pages prior to page N, to get to page N, but not the results after page N. On Wed, Jul 27, 2011 at 9:53 AM, Jim Webber j...@neotechnology.com wrote: Hi John, In an embedded scenario, pagination doesn't make as much sense. Since calls to the embedded APIs typically return a lazily-evaluatable Iterable<T> you just call next() to efficiently advance through the results. Jim
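A minimal sketch of what "skip the earlier pages, stop after page N" looks like on top of a lazy iterator. It assumes nothing about Neo4j beyond the results being exposed as a java.util.Iterator; LazyPager and page are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: pages over any lazily evaluated iterator. */
class LazyPager {

    /**
     * Returns the items of the zero-based page pageIndex.
     * Items before the page are pulled and discarded; items after it are never pulled.
     */
    static <T> List<T> page(Iterator<T> results, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long toSkip = (long) pageIndex * pageSize;
        while (toSkip > 0 && results.hasNext()) {
            results.next(); // earlier pages still have to be walked through
            toSkip--;
        }
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }
}
```

The cost is proportional to (N + 1) * pageSize items, not to the total number of results, which matches the reading of Jim's answer above.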
Re: [Neo4j] Events this Week
Hey Allison, Just checking, those meet-ups are not accessible via internet (ie. live streamed or something) right? Otherwise, looking forward to the webinar, thanks! On Wed, Jul 27, 2011 at 8:28 PM, Allison Sparrow allison.spar...@neotechnology.com wrote: Hi all, Just a reminder on three events we have to close off the week: *TONIGHT at 18:00 PDT* Vancouver Meetup | Reference Node: Creating a Graph: www.meetup.com/graphdb-vancouver/events/24143031/ *TONIGHT at 19:00 PDT* Seattle Meetup | Discussions with Andreas Kollegger: www.meetup.com/graphdb-seattle/events/21044691/ *TOMORROW at 10:00 PDT* Webinar | Getting Started with Neo4j: https://www1.gotomeeting.com/register/855127096 See you there, *Allison Sparrow* *Marketing Manager | Neo Technology* +19499036091 | @ayeeson http://twitter.com/#%21/ayeeson
Re: [Neo4j] HyperRelationship example
Hey Niels, I like xD this seems like a lot of work and professionally done; ie. something I could not have done (I don't have that kind of experience and focus). Gratz on that, I really appreciate seeing this. I cloned the repo from git, manually, with eclipse (not using maven - don't know how with eclipse) I am getting only about 3 compile errors, like: 1) The type BinaryRelationshipRoles<T> must implement the inherited abstract method PropertyContainer.getId() 2) The constructor PropertyType<T>(String, GraphDatabaseService) is not visible 3) The return type is incompatible with RelationshipContainer.getRelationships() for org.neo4j.collections.graphdb.impl.RelationshipIterable.RelationshipIterable(Iterable<Relationship> rels) Also, I am thinking to try and implement this on top of berkeleydb just for fun/benchmarking (so to speak) to compare between that and neo4j - since I am currently unsure which one to use for my hobby project (I like that berkeleydb's searches are 0-1ms instead of few seconds) Btw, would it be any interest to you if I were to fork your repo and add ie. AllTests.java for junit and the .project and related files for eclipse project in a pull or two ? as long as it doesn't seem useless or cluttering... (note however I never actually, yet, used fork/pull but only read about it on github xD) Thanks to all, for wasting some time reading this, Greeting and salutations, John On Wed, Jul 27, 2011 at 8:48 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I just posted an example on how to use HyperRelationships: https://github.com/peterneubauer/graph-collections/wiki/HyperRelationship-example There is now a proper test for HyperRelationships, so I hereby push the software to Beta status. Please try out the Enhanced API and HyperRelationships and let me know what needs improvement.
Niels
Re: [Neo4j] shortestPath slower than it could be
)--(two) 773 ms search started (three)--(one) 0 ms search started (two)--(one) 0 ms Environment is about to shut down the neo4j database Environment shutting down complete 3,801 ms and on cold run: search started (one)--(three) 8,454 ms search started (one)--(two) 885 ms search started (three)--(one) 0 ms search started (two)--(one) 0 ms search started (one)--(three) 752 ms search started (one)--(two) 755 ms search started (three)--(one) 0 ms search started (two)--(one) 0 ms search started (one)--(three) 804 ms search started (one)--(two) 621 ms search started (three)--(one) 0 ms search started (two)--(one) 0 ms Environment is about to shut down the neo4j database Environment shutting down complete 3,141 ms = and same thing in berkeleydb (no transactions tho): first time creating the relationships... just created 1,000,000 rels, took=29,658 ms Path from one to three: true 10 ms Path from one to two: true 0 ms Path from three to one: false 0 ms Path from two to one: false 0 ms Path from one to three: true 0 ms Path from one to two: true 0 ms Path from three to one: false 0 ms Path from two to one: false 0 ms Path from one to three: true 0 ms Path from one to two: true 0 ms Path from three to one: false 0 ms Path from two to one: false 0 ms class org.bdb.BerkEnv$1 shutting down complete 152 ms and on cold: Path from one to three: true 10 ms Path from one to two: true 10 ms Path from three to one: false 0 ms Path from two to one: false 0 ms Path from one to three: true 0 ms Path from one to two: true 0 ms Path from three to one: false 0 ms Path from two to one: false 0 ms Path from one to three: true 0 ms Path from one to two: true 0 ms Path from three to one: false 0 ms Path from two to one: false 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms these progs are on: https://github.com/13th-floor/neo4john/commit/d2fd5b27dcde5560e6ff980fe9320aedc4421ab7 Cheerios, John. Song of the day: Bic Runga - She Left On A Monday PS: asserts were enable ie. 
vm arg -ea (no quotes) On Sat, Jul 23, 2011 at 4:31 PM, Mattias Persson matt...@neotechnology.com wrote: Hi John, the algorithm is written to dodge these kinds of pitfalls. Maybe there's some issue with the implementation, but in principle it should make no difference. I'll look at it when I get the time (I wrote that implementation). 2011/7/23 John cyuczieekc cyuczie...@gmail.com Hey guys, me bugging you again :) (This whole thing is kind of based on the lack of being able to get the number of relationships a node has) If I have two nodes, and the first one has 1 million outgoing relationships of the type X to 1 million unique/different nodes, and the second node has 10 incoming relationships of type X (same type) of which one is from the first node, then using GraphAlgoFactory.shortestPath (or suggest a better way?) How can I tell neo4j to iterate the search on the second node's incoming rels simply because it has 10 relationships instead of 1 million, in order to check if each relationship is in the form of firstNode--secondNode ? For the case when first node has 100,000 relationships and second node has 10, it takes *1.7 seconds* for shortestPath to find the only one link between them using: final PathFinder<Path> finder = GraphAlgoFactory.shortestPath( Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); final Path foundPath = finder.findSinglePath( *one, two* ); I can put Direction.*BOTH* and get the same amount of time *Path from one to two: (one)--(two) timedelta=1,862,726,634 ns* *BUT*, get this: if I swap the nodes: finder.findSinglePath( *two, one* ); and I use either Direction.INCOMING or Direction.*BOTH* (which makes sense for the second node, right) then I get *20ms* the time until it finishes...
*Path from one to two: (two)--(one) timedelta=20,830,111 ns* (both cases are without data being priorly cached) I was expecting it to act like this: (but only when using Direction.BOTH) see which node has the least number of relationships and iterate on those, but this would work if findSinglePath would be made for depth 1 (aka particular case), but as I read Tries to find a single path between startand end nodes. then it makes sense to me why it works like it does... that is, iterate on relationships from start node, rather than from end node... but I'm not sure if it would *not *make sense to iterate on the end node instead of start node, when knowing that end node has less relationships, for make the search faster (well at least if depth is one) - I didn't look into how neo4j actually does stuff yet :D anyway, it's fairly clear to me that I could make a simple wrapper method to make this kind of search faster, *IF* I had the ability to know how many relationships each node has, so I can call findSinglePath with the first param being the node with the least relationship count :) But as I understood it, it's
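The wrapper idea from this mail (start the depth-1 check from whichever node has fewer relationships) can be sketched without Neo4j. The sketch assumes the relationship count per node is cheaply available, which is exactly what the mail says is missing, so here it comes for free from List.size(). SmallerSideCheck, link, linked and lastScanned are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy graph whose relationship lists must be scanned linearly. */
class SmallerSideCheck {
    private final Map<String, List<String>> outgoing = new HashMap<>();
    private final Map<String, List<String>> incoming = new HashMap<>();

    /** Number of relationships inspected by the most recent linked() call. */
    int lastScanned;

    void link(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        incoming.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    /** Checks from->to by scanning the shorter of: from's outgoing list, to's incoming list. */
    boolean linked(String from, String to) {
        List<String> out = outgoing.getOrDefault(from, Collections.<String>emptyList());
        List<String> in = incoming.getOrDefault(to, Collections.<String>emptyList());
        boolean scanOutgoing = out.size() <= in.size(); // assumes counts are cheap to get
        List<String> candidates = scanOutgoing ? out : in;
        String wanted = scanOutgoing ? to : from;
        lastScanned = 0;
        for (String other : candidates) {
            lastScanned++;
            if (other.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With one node holding a million outgoing relationships and the other ten incoming ones, the scan touches at most ten entries regardless of argument order, which is the effect obtained above by manually swapping the arguments to findSinglePath.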
Re: [Neo4j] nested transactions feature ?
Hey Peter,

"Got any good wording we could change to?"

I'm not sure I understand what you mean. Do you mean whether I have any good text to add to the javadoc so it's stated more clearly, or something related to code instead? If code, to do what?

I understand that nested transactions use the topmost transaction now.

*1)* What I would expect from a nested transaction (semantics) is either that it defaults to fail(?), or that the docs state that it defaults to success even if success() isn't invoked before finish() is reached, which is the opposite of the root transaction, which defaults to fail if success() isn't invoked.

*2)* Another issue is that if the nested transaction calls failure() (and this may explain why it doesn't default to failure), then the parent transaction is marked as failed and can never be changed back to success. I might want to use a nested transaction which could fail and even throw an exception (which I would handle, possibly redoing that nested transaction from the beginning), but I wouldn't want the parent transaction to fail because of that, especially since I managed to handle the failed nested transaction by eventually creating a new one which succeeded. In other words, as it is now, if any nested transaction fails, the entire chain of transactions must be aborted.

It would definitely be a great idea to state these in the (java)docs. Thanks!

About the fork and pull request: I'd have to learn what they do and how to use them. In general I get the idea of what you're saying: I could code some modifications, you could check them out, and if agreed upon you could add them to neo4j, or something like that. That's what I understand, but I know nothing about the details yet; must read...
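To make the semantics discussed in this thread concrete, here is a minimal sketch of "flat" nested transactions in plain Java. It is a toy model, not Neo4j code: `FlatTxManager`, `Tx`, `Outcome` and `lastOutcome()` are hypothetical names, and the rules are only the ones reported in this thread (a nested finish() without failure() changes nothing, failure() at any level marks the whole tree rollback-only, and the root rolls back unless success() was called on it).

```java
/** Toy model of "flat" nested transactions as described in this thread. NOT Neo4j code. */
final class FlatTxManager {
    enum Outcome { COMMITTED, ROLLED_BACK }

    private int depth = 0;                 // number of open transactions, root included
    private boolean rootSuccess = false;   // set only by success() on the root
    private boolean rollbackOnly = false;  // set by failure() on any level
    private Outcome lastOutcome = null;    // result of the most recent root finish()

    /** Transaction handle (hypothetical); nested handles share the root's state. */
    final class Tx {
        private final boolean root;
        private boolean finished = false;

        private Tx(boolean root) { this.root = root; }

        void success() {
            if (root) rootSuccess = true;  // on a nested handle this changes nothing
        }

        void failure() {
            rollbackOnly = true;           // poisons the whole tree, and cannot be undone
        }

        void finish() {
            if (finished) return;
            finished = true;
            depth--;
            if (!root) return;             // only the root decides commit vs rollback
            boolean commit = rootSuccess && !rollbackOnly;
            boolean commitRefused = rootSuccess && rollbackOnly;
            lastOutcome = commit ? Outcome.COMMITTED : Outcome.ROLLED_BACK;
            if (commitRefused) {
                // the root asked for a commit, but some level called failure()
                throw new IllegalStateException("Unable to commit transaction");
            }
        }
    }

    Tx beginTx() {
        boolean root = (depth == 0);
        if (root) {                        // a fresh root starts with clean flags
            rootSuccess = false;
            rollbackOnly = false;
        }
        depth++;
        return new Tx(root);
    }

    Outcome lastOutcome() { return lastOutcome; }

    public static void main(String[] args) {
        FlatTxManager db = new FlatTxManager();
        Tx root = db.beginTx();
        Tx nested = db.beginTx();
        nested.finish();                   // neither success() nor failure() was called
        root.success();
        root.finish();
        System.out.println(db.lastOutcome()); // prints COMMITTED
    }
}
```

In this model, retrying a failed nested transaction cannot save the root, because the rollback-only flag lives on the shared root state; that is the limitation point 2) above complains about.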
On Sun, Jul 24, 2011 at 3:25 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:

Hi there,

yes, nested transactions really use the topmost transaction for control, so your assessment of the commit semantics is right (I think). Talked to Tobias about this, and maybe we could state this more clearly in the docs. Got any good wording we could change to? Feel free to fork and send a pull request!

Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Thu, Jul 21, 2011 at 7:29 PM, cyuczi eekc cyuczie...@gmail.com wrote:

Hello. Are nested transactions supported? From what I'm testing here, it looks like unless I call .failure() on the nested transaction before it reaches .finish(), the transaction is considered successful (even if I didn't call .success() on it). If I do call .failure(), though, the root transaction will be rolled back with an exception:

Exception in thread "main" org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
    at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:98)
    at org.neo4j.examples.CalculateShortestPath.main(CalculateShortestPath.java:116)
Caused by: javax.transaction.RollbackException: Failed to commit, transaction rolledback
    at org.neo4j.kernel.impl.transaction.TxManager.rollbackCommit(TxManager.java:811)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:645)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:109)
    at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:85)
    ... 1 more

In the root transaction, however, if I don't explicitly call .success(), it is considered a failure and rolled back on finish. This seems to be the main difference between the root transaction and the nested transaction; was this intended?

Here's a sample program to test (points 1. and 2. are important to me):

1. Run it as it is, to see that the nested transaction doesn't default to failure when neither .failure() nor .success() is called before reaching .finish(). Is this a feature? (I just remembered that possibly someone told me about this yesterday? I cannot find the message.)
2. Uncomment // nestyTx.failure(); to see what happens when explicitly stating that the nested transaction failed: the root one will roll back with an exception. That might be fine, I guess, though maybe I would expect the root transaction not to be affected by a rolled-back child transaction. I mean, I might retry the child transaction and succeed the second time, but the root will fail because some child transaction failed before.

To get the idea of this, use this code block (you'll know where to put it, replacing the old part):
Re: [Neo4j] shortestPath slower than it could be
but, even when `one` has 1 million outgoing rels and `two` has 10 or 11 incoming rels cool then ;) I was trying to say the same thing ... Have a nice day, John. On Sun, Jul 24, 2011 at 4:10 PM, Mattias Persson matt...@neotechnology.comwrote: What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). 
For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. 
I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,590 ms (most was 20ms, least was 20ms on first) TestLinkage.java (both progs) were used from here: https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 -- Here's a lame benchmark for neo4j when I added another node `three` which is the first rel `one`-`three`, then added 1mil `one`-(random nodes
Re: [Neo4j] shortestPath slower than it could be
updated to latest from github, Stops the algo as soon as possible in findSinglePath graphdb contains: one--three one--{ a million other random nodes } one--two { 10 random nodes } -- two (added in that order, except that `one--two` is somewhere between those 10 random nodes) output: with Direction.BOTH: (one)--(three) 7,660 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 2,195 ms (one)--(two) 0 ms *(three)--(one) 0 ms* (two)--(one) 0 ms *(one)--(three) 875 ms* (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 630 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 723 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 802 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 631 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 732 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 873 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 630 ms Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down complete 3,171 ms For some reason, one--three takes more than 0ms, even though three has only 1 incoming rel, and it's the first rel of `one` that outgoes to `three` (for whoever cares)the test program used is in this commit here: https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee I tried to revert that commit, and for some reason I'm getting the same results [one--three is the only one over 0ms] (maybe I failed in reverting it, though the 2 sources seem reverted) Peace :) On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson matt...@neotechnology.comwrote: It doesn't matter since the algorithm is bi-directional... 
so: one -- two will start from one OUTGOING and two INCOMING, whereas two -- one will start from two INCOMING and one OUTGOING see, no difference. It alternates side for each relationship. It will, however depend on where the INCOMING/OUTGOING relationships reside in the relationship chain, but sticking to this discussion: those two calls will yield the exact same speed. Though I just discovered a little thingie where findSinglePath didn't stop right away when after finding the first one, but now it does! 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. 
don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because
Re: [Neo4j] shortestPath slower than it could be
ok fully reverting the entire project worked (that is without that fix): with Direction.BOTH: (one)--(three) 8,438 ms (one)--(two) 873 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 734 ms (one)--(two) 743 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 784 ms (one)--(two) 621 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 682 ms (one)--(two) 733 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 620 ms (one)--(two) 683 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 762 ms (one)--(two) 621 ms (one)--(three) 712 ms (one)--(two) 631 ms (one)--(three) 672 ms (one)--(two) 773 ms (one)--(three) 620 ms (one)--(two) 743 ms (one)--(three) 620 ms (one)--(two) 683 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,721 ms and with that *fix*: with Direction.BOTH: (one)--(three) 8,710 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 1,143 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 763 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 814 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 897 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 630 ms (one)--(two) 0 ms (one)--(three) 763 ms (one)--(two) 0 ms (one)--(three) 842 ms (one)--(two) 0 ms (one)--(three) 621 ms (one)--(two) 0 ms (one)--(three) 742 ms Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down 
complete 2,711 ms So yeah, one--two got fixed, what say you about one--three though ? :D *whistle* :- On Sun, Jul 24, 2011 at 5:28 PM, John cyuczieekc cyuczie...@gmail.comwrote: updated to latest from github, Stops the algo as soon as possible in findSinglePath graphdb contains: one--three one--{ a million other random nodes } one--two { 10 random nodes } -- two (added in that order, except that `one--two` is somewhere between those 10 random nodes) output: with Direction.BOTH: (one)--(three) 7,660 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 2,195 ms (one)--(two) 0 ms *(three)--(one) 0 ms* (two)--(one) 0 ms *(one)--(three) 875 ms* (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 630 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 723 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 802 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 631 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 732 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 873 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 630 ms Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down complete 3,171 ms For some reason, one--three takes more than 0ms, even though three has only 1 incoming rel, and it's the first rel of `one` that outgoes to `three` (for whoever cares)the test program used is in this commit here: https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee I tried to revert that commit, and for some reason I'm getting the same results [one--three is the only one over 0ms] (maybe I failed in reverting it, though the 2 sources seem 
reverted) Peace :) On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson matt...@neotechnology.com wrote: It doesn't matter since the algorithm is bi-directional... so: one -- two will start from one OUTGOING and two INCOMING, whereas two -- one will start from two INCOMING and one OUTGOING see, no difference. It alternates side for each relationship. It will, however depend on where the INCOMING/OUTGOING relationships reside in the relationship chain, but sticking to this discussion: those two calls will yield the exact same speed. Though I just discovered a little thingie where findSinglePath didn't stop right away when after finding the first one, but now it does! 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes
Re: [Neo4j] shortestPath slower than it could be
I got more creative with the naming and formatting, and I post another output (more intuitively read), the graphdb contains (rels added in this order): ROOT_LIST--START ROOT_LIST--{ 500,000 new unique nodes } ROOT_LIST--MIDDLE ROOT_LIST--{ another set of 500,000 new unique nodes } ROOT_LIST--END then I use findSinglePath with the startNode being on the left, and and node being on the right side of the output below with Direction.BOTH: ROOT_LIST -- START 7,188 ms ROOT_LIST -- MIDDLE 391 ms ROOT_LIST --END 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms ROOT_LIST -- START 1,542 ms ROOT_LIST -- MIDDLE 483 ms ROOT_LIST --END 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms ROOT_LIST -- START 610 ms ROOT_LIST -- MIDDLE 423 ms ROOT_LIST --END 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms ROOT_LIST -- START 874 ms ROOT_LIST -- MIDDLE 311 ms ROOT_LIST --END 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms ROOT_LIST -- START 1,982 ms ROOT_LIST -- MIDDLE 311 ms ROOT_LIST --END 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms with Direction.INCOMING: START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms START -- ROOT_LIST 0 ms MIDDLE -- ROOT_LIST 0 ms END -- ROOT_LIST 0 ms with Direction.OUTGOING: ROOT_LIST -- START 722 ms ROOT_LIST -- MIDDLE 311 ms ROOT_LIST --END 0 ms ROOT_LIST -- START 722 ms ROOT_LIST -- MIDDLE 311 ms ROOT_LIST --END 0 ms ROOT_LIST -- START 742 ms ROOT_LIST -- MIDDLE 311 ms ROOT_LIST --END 0 ms ROOT_LIST -- START 767 ms ROOT_LIST -- MIDDLE 304 ms ROOT_LIST --END 0 ms ROOT_LIST -- START 600 ms ROOT_LIST -- MIDDLE 372 ms ROOT_LIST --END 0 ms Environment is about to shut down the neo4j database 
Environment shutting down complete 2,891 ms hopefully this is clearer than before :) code is in this commit: https://github.com/13th-floor/neo4john/commit/ec2079a9c2edf8e03922e4576c3271b3ac6119fd G'day On Sun, Jul 24, 2011 at 5:36 PM, John cyuczieekc cyuczie...@gmail.comwrote: ok fully reverting the entire project worked (that is without that fix): with Direction.BOTH: (one)--(three) 8,438 ms (one)--(two) 873 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 734 ms (one)--(two) 743 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 784 ms (one)--(two) 621 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 682 ms (one)--(two) 733 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 620 ms (one)--(two) 683 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 762 ms (one)--(two) 621 ms (one)--(three) 712 ms (one)--(two) 631 ms (one)--(three) 672 ms (one)--(two) 773 ms (one)--(three) 620 ms (one)--(two) 743 ms (one)--(three) 620 ms (one)--(two) 683 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,721 ms and with that *fix*: with Direction.BOTH: (one)--(three) 8,710 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 1,143 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 763 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 814 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 897 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: 
(one)--(three) 630 ms (one)--(two) 0 ms (one)--(three) 763 ms (one)--(two) 0 ms (one)--(three) 842 ms (one)--(two) 0 ms (one)--(three) 621 ms (one)--(two) 0 ms (one)--(three) 742 ms
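The two ideas that keep coming up in this thread (a lookup cost that depends on where a relationship sits in a node's relationship chain, and a search that alternates between both end nodes) can be compared with a small step counter. This is a plain-Java sketch under loud assumptions: each node's chain is modelled as a simple list of neighbour ids, `ChainSearch` and its methods are hypothetical names, and nothing here is Neo4j's actual implementation or storage layout. It does not attempt to reproduce the timings above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Step counter for a depth-1 "is start connected to end?" check. Toy model, NOT Neo4j code. */
final class ChainSearch {

    /** Walks only the start node's chain; returns relationships inspected, or -1 if not found. */
    static int oneSidedSteps(List<Integer> startChain, int endId) {
        int steps = 0;
        for (int neighbour : startChain) {
            steps++;
            if (neighbour == endId) return steps;
        }
        return -1;
    }

    /**
     * Alternates: one relationship from the start node's chain, then one from the
     * end node's chain, and so on. Returns relationships inspected, or -1 if not found.
     */
    static int alternatingSteps(List<Integer> startChain, int startId,
                                List<Integer> endChain, int endId) {
        int steps = 0;
        int i = 0;
        int j = 0;
        while (i < startChain.size() || j < endChain.size()) {
            if (i < startChain.size()) {
                steps++;
                if (startChain.get(i++) == endId) return steps;
            }
            if (j < endChain.size()) {
                steps++;
                if (endChain.get(j++) == startId) return steps;
            }
        }
        return -1;
    }

    /** Helper: the ids first..last (inclusive) as a chain. */
    static List<Integer> idRange(int first, int last) {
        List<Integer> ids = new ArrayList<>();
        for (int id = first; id <= last; id++) ids.add(id);
        return ids;
    }

    public static void main(String[] args) {
        int hub = 0;
        int target = 100_000;
        // The hub's chain holds 100,000 neighbours; the target sits at the far end of it.
        List<Integer> hubOutgoing = idRange(1, target);
        // The target has a single incoming relationship, from the hub.
        List<Integer> targetIncoming = Collections.singletonList(hub);

        System.out.println(oneSidedSteps(hubOutgoing, target));
        System.out.println(alternatingSteps(hubOutgoing, hub, targetIncoming, target));
    }
}
```

In this idealised model the alternating search is bounded by roughly twice the shorter of the two positions, so the order of the arguments stops mattering; the one-sided walk pays for the full position in the start node's chain.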
Re: [Neo4j] shortestPath slower than it could be
ns Path from one to two: (two)--(one) timedelta=255,431 ns and another run: Path from one to two: (two)--(one) timedelta=20,513,338 ns Path from one to two: (two)--(one) timedelta=280,314 ns Path from one to two: (two)--(one) timedelta=255,731 ns (I almost got scared there on the second and third xD) and here's if I count them first (twice): total relations count=100,011 timedelta=3,077,046,964 ns total relations count=100,011 timedelta=151,727,293 ns Path from one to two: (one)--(two) timedelta=173,697,069 ns Path from one to two: (one)--(two) timedelta=70,160,217 ns Path from one to two: (one)--(two) timedelta=91,760,936 ns swapped: total relations count=100,011 timedelta=3,219,749,623 ns total relations count=100,011 timedelta=151,675,727 ns Path from one to two: (two)--(one) timedelta=10,717,914 ns Path from one to two: (two)--(one) timedelta=204,465 ns Path from one to two: (two)--(one) timedelta=143,005 ns Relationships are stored as a chained/linked list in the disk format so it is not possible to know how many there are per node. Will this(storage way) possibly change in 1.5 or in the future? I don't really know how ie. berkeleydb je stores key-value pairs (BTree?) but it might potential help get a perspective on that(actually I believe y'all know already); In bdb I am able to count the number of values for the same key (using a cursor via *cursor.count()* ) By the way, I was storing relationships in bdb je, like this: key-value where each *key *would be the *start node*, and each *value *would be *end node*, thus a key-value would form a relationship, though I would never consider a relationship as a single object(I would always need both start and end nodes to identify/access a relationship). And a key could have more than one value associated with it. (one to many) ie. 
sameKey-value1 sameKey-value2 sameKey-value3 and using a cursor in bdb, I would be able to parse all values (and count() them tho not sure how it would do that internally, hopefully not by parsing them all). Of course i would have a second database storing them backwards(endnode as key, startnode as value) like: value1-sameKey value2-sameKey value3-sameKey this way I would be able to search by endnode too, since I could only find stuff (BTree/hash fast) only by looking up by key. (each key/value would be a nodeID btw; I would also have another database for nodeID-String and String-nodeID, so I can search for both, where the equivalent of that in neo4j would be the node having a name property with that String value - this so I could access the same nodeID access java application restarts, that is by using that fixed String) My problem with bdb je, was that I couldn't find a way (for my use case) to use transactions without eventually deadlocking...(but if I was as good as you guys I probably would've found it)... so last time I totally disabled the use of transactions so that it would seem simpler while I would focus on some other project aspect. (ie. parsing from value1 to value 3 in one thread, and from value3 to value1 in another; but not just this case; not to mention *it doesn't support nested transactions for java edition*) Cheers Cheerios =)) Michael John PS: btw, this is a lifelong project(so to speak), which I failed to complete or even begin (in 6 years so far) due to postponing it each time I ran into problems (I know this isn't the way to do it like that, instead keep at it and maybe allow compromises - so fail on my part). And finding your neo4j database is giving me a new opportunity to redo it using neo4j instead of bdbje :) I know(!) 
graph databases(and stuff based on them, that is: THAT kind of connectivity/accessibility) are the future, and I aim for immediate accessibility and customizability ;) Am 23.07.2011 um 04:42 schrieb John cyuczieekc: Hey guys, me bugging you again :) (This whole thing is kind of based on the lack of being able to get the number of relationships a node has) If I have two nodes, and the first one has 1 million outgoing relationships of the type X to 1 million unique/different nodes, and the second node has 10 incoming relationships of type X (same type) of which one is from the first node, then using GraphAlgoFactory.shortestPath (or suggest a better way?) How can I tell neo4j to iterate the search on the second node's incoming rels simply because it has 10 relationships instead of 1 million, in order to check if each relationship is in the form of firstNode--secondNode ? For the case when first node has 100,000 relationships and second node has 10, it takes *1.7 seconds* for shortestPath to find the only one link between them using: final PathFinderPath finder = GraphAlgoFactory.shortestPath( Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); final Path foundPath = finder.findSinglePath( *one, two* ); I can put Direction.*BOTH *and get the same amount of time *Path from one to two: (one)--(two
Re: [Neo4j] how many relationships?
23, 2011 at 10:51 AM, Michael Hunger michael.hun...@neotechnology.com wrote:

An internal implementation would probably be faster. If timing is that critical for you, you can have a look at EmbeddedGraphDbImpl.getAllNodes() and implement a similar solution for relationships.

Cheers
Michael

On 23.07.2011 at 04:20, John cyuczieekc wrote:

Hey Jim, I am sort of glad to hear that; maybe in the future I could see a method like getAllRelationships(), or not, np :)

Yes, using Michael's code works, but...

total relations count=100,011 timedelta=3,075,897,991 ns

it takes about 3 seconds (when not cached) to count 100k relationships (considering there are 100k+2 unique nodes too). When cached:

total relations count=100,011 timedelta=154,673,763 ns

Still, it's pretty fast, but I have to wonder if it would be faster when using relationships directly :) Either way, wish y'all a great day!

On Sat, Jul 23, 2011 at 3:57 AM, Jim Webber j...@neotechnology.com wrote:

Hi John,

Relationships are stored in a different store than nodes. This enables Neo4j to manage lifecycle events (like caching) for nodes and relationships separately. Neo4j really is a graph DB, not a triple store masquerading as a graph DB. Nonetheless, that code Michael sent still works :-)

Jim

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
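The code Michael sent is not quoted in this message, so the following is only a guess at the general shape of such a count: visit every node and tally its outgoing relationships, so that each relationship is counted exactly once. It is a self-contained plain-Java sketch; `ToyGraph`, `createNode`, `createRelationship` and `countRelationships` are hypothetical names for illustration and not the Neo4j API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal in-memory directed graph used to illustrate counting. NOT Neo4j code. */
final class ToyGraph {
    // node id -> ids of the nodes it points to (one entry per outgoing relationship)
    private final Map<Integer, List<Integer>> outgoing = new LinkedHashMap<>();
    private int nextId = 0;

    int createNode() {
        int id = nextId++;
        outgoing.put(id, new ArrayList<>());
        return id;
    }

    /** Both ids must come from createNode(). */
    void createRelationship(int from, int to) {
        outgoing.get(from).add(to);
    }

    /**
     * Counts relationships by walking every node's outgoing list element by element,
     * mirroring a store where the count is not kept anywhere and must be derived.
     */
    long countRelationships() {
        long total = 0;
        for (List<Integer> targets : outgoing.values()) {
            for (Integer ignored : targets) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        int one = g.createNode();
        int two = g.createNode();
        for (int i = 0; i < 100_000; i++) {
            g.createRelationship(one, g.createNode()); // `one` fans out to fresh nodes
        }
        for (int i = 0; i < 10; i++) {
            g.createRelationship(g.createNode(), two); // ten other nodes point at `two`
        }
        g.createRelationship(one, two);
        System.out.println(g.countRelationships());    // prints 100011
    }
}
```

Counting only the outgoing side avoids double counting, since every relationship has exactly one start node; the cost is still proportional to the total number of relationships, which is the reason a full count is slow on cold caches.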
Re: [Neo4j] shortestPath slower than it could be
at: it might be that BR Trees would be only a subset of what I'm trying to do, so I wouldn't implement what I want on top of them but rather side by side with them. Then again, I fail to focus properly, so I might be misunderstanding the whole thing.

> https://github.com/peterneubauer/graph-collections
>
> Cheers
> Michael
>
> Perhaps you should describe the intent / idea behind your project. Dropping too early into implementation details might remove some possibilities that graph databases offer, by losing the big picture.

I definitely lost myself in details and implementation limits long ago... such that whatever the idea was is now merely based on / adapted to the implementation limits.

> And I wouldn't care about performance right now (btw. System.currentTimeMillis() is mostly good enough for measurement). Focus rather on modeling your domain to the graph and specify which concepts and operations/use cases you'd like to support. You can optimize afterwards anyway.

Sometimes I would find that I had to redo/redesign the entire project in order to gain the required performance or to have a certain feature. Examples (which may not pertain to what I just said): find a centralized way of reporting all exceptions thrown, using AspectJ to hook on each call so as to be able to catch each `throw`; making sure throws in finally don't overwrite throws in try; a transactional memory attempt... this kind of silly, nonsensical detail.

> On 23.07.2011 at 14:29, John cyuczieekc wrote:
>
>> Hail :)
>> Beware, a lot of noise follows (i.e. webnoise kind of noise). While I thank anyone in advance for reading, I must also apologize for failing to be concise :-
>>
>> On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger michael.hun...@neotechnology.com wrote:
>>
>>> John, no problem :)
>>> You pointed out both problems:
>>> - cold caches
>>> - lots of rels on the one side
>>> There are some performance implications of loading millions of rels from disk. We'll be working on improving those in 1.5. The easiest way to solve that is to switch start and end node, which you already did. It is much easier for you _with domain knowledge_ about the graph than for the algorithm.
>>
>> Yeah, I would probably apply that manually, that is: call a particular method when I know which one of the nodes has the fewest rels. (As a ' `temporary` or `acceptable so far` workaround ' [u see the graph here?! xD] *I will for now use findSinglePath as it is*, simply because I don't want the overhead of using a particular method for some use case scenarios and the generic one for others. Besides, it's fast enough, for now.)
>> My problem here is that I might not know (in the future) which node (first or second) will have the fewest nodes. So I was hoping to create a generic method for such a find. For now, I could say, I can know. For example:
>>   AllLists-A
>>   AllLists-B
>>   AllLists-C
>> where A, B, C are considered nodes representing lists. In this case, I could maybe say that while AllLists can have lots of outgoing rels (aka lots of lists), list A may not have that many nodes pointing to it (aka the list is not used in as many places as there are lists), so I could consider A as being the node with the fewest nodes. Though I would want a generic method that would always know which node has the fewest nodes and iterate over that one, especially in my 1 hop case.
>> I am, as seen here, basically tagging the nodes A, B, C with AllLists, by having AllLists point to them, and this way I know that node A is of AllLists type (so to speak). This was my old way of thinking while using bdb je. I might need to revise it, since back then I was completely ignoring relationships and not considering them as a single and accessible object. (I am currently trying to port that old way of thinking to neo4j.) In this old way, I was using the node AllLists pointing/outgoing to A as being (in neo4j) a relationship of the type AllLists incoming to A.
>>
>>> Especially if you have 1 hop paths.
>>
>> In my particular use case, I will probably not (yet!) have cases with 1 hop. (I can't really visualize that far in the future xD ) But for sure I will have lots of ==1 hops cases where I just want to see if thisnode is tagged with thatnode by checking thatnode--thisnode
>>
>>> There might be ways of improving the algorithm, e.g. by iterating both sides at the same time, which would lead to the end with the fewer relationships being exhausted and resolved first.
>>
>> For now, I totally avoided getting into findSinglePath's source and considering doing changes there, or even seeing how it works :) But that algo sounds like something I'd do, in like 2 (reusable) threads even xD (I remember having considered this in the past, when using bdb je, but I opted to iterate over the node with the fewest rels instead.)
>>
>>> How large is your graph at all? And how is it structured, e.g. how many rels do the other 9 nodes have
Re: [Neo4j] how many relationships?
How would I go about getting all relationships in the entire database (with Neo4j embedded)? I see there is a db.getAllNodes() for nodes; is there something similar for relationships?

On Wed, Jul 20, 2011 at 7:13 PM, cyuczi eekc cyuczie...@gmail.com wrote:

> Is there a way to get the number of relationships without having to iterate through (and count++) them? I.e.:
>
>     rels = firstNode.getRelationships();
>     rels.size(); // doesn't actually exist

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] how many relationships?
Though that is kind of telling me that relationships only exist in Java as wrappers for an ordered tuple of nodes. I guess I was thinking that they were stored/accessed differently, as unique objects (since they each have an id)... maybe they are, but Neo4j isn't exposing a method for iterating over relationships directly, that is, without going through a node first. The method you suggested seems like the long way to visit all relationships. Still, I will use it nonetheless ;) Thank you!

On Fri, Jul 22, 2011 at 11:57 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

>     for (Node node : db.getAllNodes())
>         for (Relationship rel : node.getRelationships(Direction.OUTGOING)) {
>             // your code here
>         }
>
> Michael
>
> On 22.07.2011 at 23:40, John cyuczieekc wrote:
>
>> How would I go about getting all relationships in the entire database (with Neo4j embedded)? I see there is a db.getAllNodes() for nodes; is there something similar for relationships?
>>
>> On Wed, Jul 20, 2011 at 7:13 PM, cyuczi eekc cyuczie...@gmail.com wrote:
>>
>>> Is there a way to get the number of relationships without having to iterate through (and count++) them? I.e.:
>>>
>>>     rels = firstNode.getRelationships();
>>>     rels.size(); // doesn't actually exist
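A side note on why the loop above asks for Direction.OUTGOING only: every relationship has exactly one start node, so walking each node's outgoing relationships visits each relationship exactly once, whereas asking for both directions would see each one twice (once from each end). The sketch below shows this on a toy in-memory graph. ToyGraph and all of its methods are made up for illustration and are not Neo4j classes; self-loops are left out of the picture.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a graph store. These names are hypothetical and are NOT
// part of the Neo4j API; they only illustrate the counting logic.
final class ToyGraph {
    // outgoing.get(n) holds the end nodes of n's outgoing relationships,
    // incoming.get(n) holds the start nodes of n's incoming relationships.
    private final List<List<Integer>> outgoing = new ArrayList<>();
    private final List<List<Integer>> incoming = new ArrayList<>();

    int addNode() {
        outgoing.add(new ArrayList<>());
        incoming.add(new ArrayList<>());
        return outgoing.size() - 1;
    }

    void relate(int from, int to) {
        outgoing.get(from).add(to);
        incoming.get(to).add(from);
    }

    // Same shape as the loop in the mail: for every node, walk its
    // outgoing relationships and count++ for each one.
    long countViaOutgoing() {
        long count = 0;
        for (List<Integer> rels : outgoing) {
            for (Integer ignored : rels) {
                count++;
            }
        }
        return count;
    }

    // Walking both directions sees every relationship from both of its ends.
    long countViaBothDirections() {
        long count = countViaOutgoing();
        for (List<Integer> rels : incoming) {
            for (Integer ignored : rels) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        int allLists = g.addNode();
        int a = g.addNode();
        int b = g.addNode();
        g.relate(allLists, a);
        g.relate(allLists, b);
        g.relate(a, b);
        System.out.println(g.countViaOutgoing());       // prints 3
        System.out.println(g.countViaBothDirections()); // prints 6
    }
}
```

With the real API the cost is still one pass over every node and every relationship, which matches the complaint that there is no count without iterating.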
Re: [Neo4j] how many relationships?
Hey Jim,

I am sort of glad to hear that; maybe in the future I could see a method like getAllRelationships(), or not, np :)

Yes, using Michael's code works, but...

    total relations count=100,011 timedelta=3,075,897,991 ns

it takes about 3 seconds (when not cached) to count 100k relationships (considering there are 100k+2 unique nodes too). When cached:

    total relations count=100,011 timedelta=154,673,763 ns

Still, it's pretty fast, but I have to wonder if it would be faster when using relationships directly :)

Either way, wish y'all a great day!

On Sat, Jul 23, 2011 at 3:57 AM, Jim Webber j...@neotechnology.com wrote:

> Hi John,
>
> Relationships are stored in a different store than nodes. This enables Neo4j to manage lifecycle events (like caching) for nodes and relationships separately.
>
> Neo4j really is a graph DB, not a triple store masquerading as a graph DB.
>
> Nonetheless, that code Michael sent still works :-)
>
> Jim
[Neo4j] shortestPath slower than it could be
Hey guys, me bugging you again :) (This whole thing is kind of based on not being able to get the number of relationships a node has.)

Say I have two nodes: the first one has 1 million outgoing relationships of type X to 1 million unique/different nodes, and the second node has 10 incoming relationships of type X (same type), one of which is from the first node. When using GraphAlgoFactory.shortestPath (or suggest a better way?), how can I tell Neo4j to iterate over the second node's incoming rels, simply because it has 10 relationships instead of 1 million, in order to check whether one of them is of the form firstNode--secondNode?

For the case when the first node has 100,000 relationships and the second node has 10, it takes *1.7 seconds* for shortestPath to find the only link between them using:

    final PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
        Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 );
    final Path foundPath = finder.findSinglePath( one, two );

I can put Direction.BOTH and get the same amount of time:

    Path from one to two: (one)--(two) timedelta=1,862,726,634 ns

*BUT*, get this: if I swap the nodes:

    finder.findSinglePath( two, one );

and I use either Direction.INCOMING or Direction.BOTH (which makes sense for the second node, right?), then it finishes in *20 ms*:

    Path from one to two: (two)--(one) timedelta=20,830,111 ns

(Both cases are without the data being cached beforehand.)

I was expecting it to act like this (but only when using Direction.BOTH): see which node has the fewest relationships and iterate on those. But that would only work if findSinglePath were made for depth 1 (aka a particular case). As I read "Tries to find a single path between start and end nodes", it makes sense to me why it works like it does, that is, iterating on relationships from the start node rather than from the end node... but I'm not sure it would *not* make sense to iterate on the end node instead of the start node when it's known that the end node has fewer relationships, to make the search faster (well, at least if depth is one). I didn't look into how Neo4j actually does stuff yet :D

Anyway, it's fairly clear to me that I could make a simple wrapper method to make this kind of search faster, *IF* I had the ability to know how many relationships each node has, so I could call findSinglePath with the first param being the node with the lowest relationship count :) But as I understood it, it's not possible to find out how many rels a node has... gimme feat! :)) [By "not possible" I mean without having to iterate through all of them and count, which would make the use case here obsolete.]

PS: clearly all the text I wrote here would benefit from being represented by a graph, just think about all that grouping with autohiding of the [] parts and all kinds of stuff... heh
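For the depth-1 case described above, one possible way around the missing relationship count is to step through both ends in lockstep and stop as soon as either side runs out: if the first node's outgoing rels are exhausted without reaching the second node, or the second node's incoming rels are exhausted without coming from the first, no direct link exists. The work is then bounded by roughly twice the smaller side, without knowing either count up front. The sketch below is plain Java over iterators of node ids; LockstepLinkCheck, Result and check are hypothetical names, not Neo4j API, and it assumes both iterators cover the same relationship type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Neo4j: checks for a direct first->second
// link by alternating between the two ends.
final class LockstepLinkCheck {

    static final class Result {
        final boolean found;   // true if a direct link was seen
        final int examined;    // how many relationships were looked at in total

        Result(boolean found, int examined) {
            this.found = found;
            this.examined = examined;
        }
    }

    // outOfFirst: end-node ids of the first node's outgoing relationships.
    // inOfSecond: start-node ids of the second node's incoming relationships.
    static Result check(long first, Iterator<Long> outOfFirst,
                        long second, Iterator<Long> inOfSecond) {
        int examined = 0;
        // Continue only while BOTH sides still have relationships left;
        // an exhausted side proves that no direct link exists.
        while (outOfFirst.hasNext() && inOfSecond.hasNext()) {
            examined++;
            if (outOfFirst.next() == second) {
                return new Result(true, examined);
            }
            examined++;
            if (inOfSecond.next() == first) {
                return new Result(true, examined);
            }
        }
        return new Result(false, examined);
    }

    public static void main(String[] args) {
        long first = 0L;
        long second = 100_000L;

        // First node: 100,000 outgoing rels, the one to 'second' comes last.
        List<Long> out = new ArrayList<>();
        for (long id = 1; id <= 100_000L; id++) {
            out.add(id);
        }
        // Second node: 10 incoming rels, the one from 'first' comes last.
        List<Long> in = new ArrayList<>(Arrays.asList(
                200_001L, 200_002L, 200_003L, 200_004L, 200_005L,
                200_006L, 200_007L, 200_008L, 200_009L));
        in.add(first);

        Result r = check(first, out.iterator(), second, in.iterator());
        System.out.println(r.found + " after examining " + r.examined);
        // prints "true after examining 20"
    }
}
```

The trade-off is up to twice the work of always starting from the smaller side, in exchange for not needing to know which side that is.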