Re: [Neo4j] Arnoldi iteration
Lorenzo, On Mon, May 31, 2010 at 2:33 PM, Lorenzo Livi lorenz.l...@gmail.com wrote: I'll give the 0.6-snapshot version a try and I'll let you know. Yes, please get back with feedback! Tobias and Mattias have been looking closer at some of the algos in the graph-algo package, but not all are thoroughly tested yet, so your feedback is greatly appreciated. The last thing: the power method is not guaranteed to always converge ... especially if the graph is directed (Neo4j is always directed). Then a method like Arnoldi iteration is necessary, IMHO. I've developed the simplest centrality measure, degree centrality, and I should develop PageRank and/or HITS (not now ...). Maybe you are interested in these algos (for now, degree centrality). Let me know. Yes, we are, and we could include them in the graph-algo package. You can just sign the CLA and we can get going on this: http://wiki.neo4j.org/content/About_Contributor_License_Agreement /peter ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Node creation limit
Exactly, the problem is most likely that you try to insert all your stuff in one transaction. All data for a transaction is kept in memory until committed, so a really big transaction can fill your entire heap. Try to group 10k operations or so per transaction for big insertions, or use the batch inserter. Links: http://wiki.neo4j.org/content/Transactions#Big_transactions http://wiki.neo4j.org/content/Batch_Insert 2010/6/2, Laurent Laborde kerdez...@gmail.com: On Wed, Jun 2, 2010 at 3:50 AM, Biren Gandhi biren.gan...@gmail.com wrote: Is there any limit on the number of nodes that can be created in a neo4j instance? Any other tips? I created hundreds of millions of nodes without problems, but it was split into many transactions. -- Laurent ker2x Laborde Sysadmin DBA at http://www.over-blog.com/ -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
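The advice above — commit in chunks of roughly 10k operations so the transaction buffer never fills the heap — can be sketched like this. This is an illustrative, hedged Python mock (the `Store`, `create_node`, and `commit` names are invented for the sketch, not the Neo4j API); the point is only that peak buffered memory is bounded by the chunk size, not the total insert size:

```python
# Hypothetical in-memory transactional store: work is buffered by the open
# transaction until commit(), just like data in a Neo4j transaction is kept
# in memory until committed. Names are illustrative, not the real API.

class Store:
    def __init__(self):
        self.nodes = []        # committed data
        self._pending = []     # data buffered by the open transaction

    def create_node(self, props):
        self._pending.append(props)

    def commit(self):
        self.nodes.extend(self._pending)
        self._pending = []

def insert_in_chunks(store, records, chunk_size=10_000):
    """Commit every chunk_size operations so the buffer stays bounded."""
    for i, rec in enumerate(records, start=1):
        store.create_node(rec)
        if i % chunk_size == 0:
            store.commit()
    store.commit()  # flush the final partial chunk

store = Store()
insert_in_chunks(store, ({"id": i} for i in range(25_000)), chunk_size=10_000)
print(len(store.nodes))  # 25000 records committed across three transactions
```

With real Neo4j the same shape applies: call `tx.success(); tx.finish()` every N operations and begin a fresh transaction, or use the BatchInserter and skip transactions entirely.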
[Neo4j] Compacting files?
Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them, but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex
Re: [Neo4j] PHP REST API client
Hi! This is awesome! I tried it out and have a suggestion: to make the semantics for storing NULLs consistent, you could change the PropertyContainer::__set method to remove the property if it exists when trying to set it to NULL. This will make sure NULL is returned when you try to read the property. Something along the lines of:

public function __set($k, $v) {
    // because neo doesn't store NULLs
    if ($v === NULL) {
        if (array_key_exists($k, $this->_data)) {
            unset($this->_data[$k]);
        }
    } else {
        $this->_data[$k] = $v;
    }
}

For some reason calling Node::save twice gives me an exception, so I can't update a node after the first save and save it again with new property values. Maybe a bug? /anders On 06/02/2010 01:00 AM, Alastair James wrote: Hi there! Sorry, been a bit quiet on the PHP REST API front for a few weeks. I will be adding some features this week (traversals etc...), but in the meantime, I have (finally) written up a little blog post detailing how the current version works! http://onewheeledbicycle.com/2010/06/01/getting-started-with-neo4j-rest-api-and-php/ Stay tuned for more! Alastair
Re: [Neo4j] [Neo] TransactionEventHandler and Spring transaction handling
Antonis, Just committed some bug fixes in the event framework, and hopefully this also solves the problem you experienced when using Spring. Could you please try the latest neo4j-kernel 1.1-SNAPSHOT to see if it works? To answer your other questions: the handler is called in the same thread, and you can access node properties in the afterCommit() call (we changed it so reads without a running transaction are possible). Regards, Johan On Thu, May 20, 2010 at 2:56 PM, Antonis Lempesis ant...@di.uoa.gr wrote: To further clarify, I ran 2 tests. In the first test, my objects were configured using Spring + I had the @Transactional annotation on the test method. In the second test, I configured the same objects manually and also started and committed the transaction before and after calling the test method. In both cases, my handler got a TransactionData object (not null), but in the second case tData.assignedNodeProperties().hasNext() returned true while in the first it returned false. Thanks for your support, Antonis PS 2 questions: is the handler called in a different thread? And, in the afterCommit() method, can I access the node properties in the TransactionData object? Since the transaction is committed (I guess finished), shouldn't I get a NotInTransaction exception? On 5/20/10 3:38 PM, Johan Svensson wrote: Hi, I have not tried to reproduce this, but just looking at the code I think it is a bug, so thanks for reporting it! The synchronization hook that gathers the transaction data gets registered in the call to GraphDatabaseService.beginTx(), but when using Spring (with that configuration) UserTransaction (old JTA) will be called directly, so no events will be collected. Will fix this very soon. -Johan On Wed, May 19, 2010 at 5:49 PM, Antonis Lempesis ant...@di.uoa.gr wrote: Hello all, I have set up Spring to handle transactions for neo4j (according to the imdb example) and it works fine.
When I read about the new events framework, I checked out the latest revision (4421) and tried to register my TransactionEventHandler that simply prints the number of created nodes. The weird thing is that when I test this in a simple junit test case, the TransactionData I get contains the correct data. When I do the same thing using the Spring configuration, the TransactionData is empty. Any ideas? Thanks, Antonis
[Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
I'm developing support for traversals for the Python REST Client. The underlying idea for me is to maintain compatibility with neo4j.py (a really hard issue), but the traversals made me think about some questions: 1. How can I implement support for isStopNode or isReturnable in the REST Service? I guess that for isStopNode I may use the prune evaluator, but what about isReturnable — must I use the return filter? Why does this parameter have no body attribute in order to define a function? 2. If the max depth parameter is not set, is it equivalent to STOP_AT_END_OF_GRAPH? If that's not true, how can I get a behaviour like STOP_AT_END_OF_GRAPH? Sorry, perhaps they are dumb questions, but I need some light, please. Best regards. -- Javier de la Rosa
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
2010/6/2 Javier de la Rosa ver...@gmail.com: I'm developing support for traversals for the Python REST Client. The underlying idea for me is to maintain compatibility with neo4j.py (a really hard issue), but the traversals made me think about some questions: 1. How can I implement support for isStopNode or isReturnable in the REST Service? I guess that for isStopNode I may use the prune evaluator, but what about isReturnable — must I use the return filter? Why does this parameter have no body attribute in order to define a function? You can specify a return filter just as you can a prune evaluator, like:

... "return filter": { "language": "javascript", "body": "position.node().getProperty( 'name' ).equals( 'Javier' )" } ...

2. If the max depth parameter is not set, is it equivalent to STOP_AT_END_OF_GRAPH? If that's not true, how can I get a behaviour like STOP_AT_END_OF_GRAPH? If max depth isn't supplied, a max depth of 1 is assumed. To get the STOP_AT_END_OF_GRAPH behaviour you should do:

... "prune evaluator": { "language": "builtin", "value": "none" } ...

which converts into PruneEvaluator.NONE. Sorry, perhaps they are dumb questions, but I need some light, please. Not at all, I hope this helps you! Best regards. -- Javier de la Rosa -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
Re: [Neo4j] Compacting files?
Alex, You are correct about the holes in the store file, and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility, but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible, so any input on how that should work, what format to use etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex
Re: [Neo4j] Compacting files?
Hi Johan, Do you mean a utility that creates a new Neo4j instance and copies all entities into it from an old Neo4j instance? That's definitely no problem. I've written a bit of import/export code in my graph_gen_utils branch. I have a GraphReader interface which is generic and only contains getNodes() and getRels() method definitions, which return iterators. The iterators are of type NodeData, basically a HashMap of HashMaps for simplicity. One NodeData can contain one Node with Properties and all its Relationships with Properties. Then I implemented various readers that I needed during the thesis. For example, ChacoParser, GMLParser, TwitterParser (proprietary format), etc., which all implement GraphReader. Similarly for GraphWriter... That made it easy for me to add any parser and use my existing methods for buffering multiple entities into Transactions, etc. It's far from perfect, but might give an idea or two. Maybe some of that could be reused, although someone would definitely need to evaluate the quality of my code first. Blueprints has some import functionality too (.graphml format, for example). Cheers, Alex On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible so any input on how that should work, what format to use etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation.
I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
Thank you for your clarification. On 2 June 2010 13:31, Mattias Persson matt...@neotechnology.com wrote: "return filter": { "language": "javascript", "body": "position.node().getProperty( 'name' ).equals( 'Javier' )" } Will we see "language": "python" in the near future? -- Javier de la Rosa
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
And one more question: what's the meaning of the "uniqueness": "node path" parameter? What values does it support? What is the equivalent in neo4j.py? -- Javier de la Rosa
Re: [Neo4j] PHP REST API client
Hi! I tried it out and have a suggestion: to make the semantics for storing NULLs consistent you could change the PropertyContainer::__set method to remove the property if it exists when trying to set it to NULL. Excellent idea! I will add it ASAP. For some reason calling Node::save twice gives me an exception, so I can't update a node after the first save and save it again with new property values. Maybe a bug? Looks like it. I will fix ASAP! Alastair /anders On 06/02/2010 01:00 AM, Alastair James wrote: Hi there! Sorry, been a bit quiet on the PHP REST API front for a few weeks. I will be adding some features this week (traversals etc...), but in the meantime, I have (finally) written up a little blog post detailing how the current version works! http://onewheeledbicycle.com/2010/06/01/getting-started-with-neo4j-rest-api-and-php/ Stay tuned for more! Alastair -- Dr Alastair James CTO James Media Group www.jamesmedia.net www.adnet-media.net www.worldreviewer.com www.thehotelguru.com 'Inspiring Travel' IATA 96012851
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
I don't think the Python bindings (or any other bindings) have caught up with the new traversal framework yet. Uniqueness is all about when to visit a node and when not to. If the uniqueness is NODE_GLOBAL, a node won't be visited more than once in a traversal. NODE_PATH means that a node won't be visited again for the current path (the path from the start node to wherever the traverser is at the moment) if that node is already in the current path; it may still be visited again in another path. Also see the javadoc of Uniqueness at http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/Uniqueness.html 2010/6/2 Javier de la Rosa ver...@gmail.com: And one more question: what's the meaning of the "uniqueness": "node path" parameter? What values does it support? What is the equivalent in neo4j.py? -- Javier de la Rosa -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
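The NODE_GLOBAL vs. NODE_PATH distinction above can be illustrated with a tiny standalone depth-first traversal. This is a hedged, non-Neo4j sketch (the graph, `traverse` function, and mode strings are invented for illustration); on a diamond-shaped graph a→b→d, a→c→d, the node "d" is reached once under NODE_GLOBAL but twice under NODE_PATH, because the second path [a, c] does not yet contain it:

```python
# Illustrative depth-first traversal (not the Neo4j traversal framework)
# showing how the two uniqueness modes differ on a small diamond graph.

GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def traverse(start, uniqueness):
    order = []  # nodes in visit order

    def dfs(node, path, seen):
        path = path + [node]
        order.append(node)
        for nxt in GRAPH[node]:
            if uniqueness == "NODE_GLOBAL" and nxt in seen:
                continue  # never revisit a node anywhere in the traversal
            if uniqueness == "NODE_PATH" and nxt in path:
                continue  # only avoid revisiting nodes on the current path
            seen.add(nxt)
            dfs(nxt, path, seen)

    dfs(start, [], {start})
    return order

order_global = traverse("a", "NODE_GLOBAL")
order_path = traverse("a", "NODE_PATH")
print(order_global.count("d"), order_path.count("d"))  # 1 2
```

Neo4j's RELATIONSHIP_* uniqueness modes work analogously, but track relationships instead of nodes.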
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
2010/6/2 Javier de la Rosa ver...@gmail.com: Thank you for your clarification. On 2 June 2010 13:31, Mattias Persson matt...@neotechnology.com wrote: "return filter": { "language": "javascript", "body": "position.node().getProperty( 'name' ).equals( 'Javier' )" } Will we see "language": "python" in the near future? Yep, I very much hope so! -- Javier de la Rosa -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
[Neo4j] neo4j-utils
Is there someone out there using the neo4j-utils component, http://components.neo4j.org/neo4j-utils/ ? I'm the one responsible for creating the (somewhat messy) utilities in there. Something just hit me when looking at it: most of the public methods in the code (although not all) which do some write operation to the graph wrap the code in their own transaction. I find that to be a little off, since it's good to be explicit about the scopes of your transactions. So I was planning to remove all such transaction wrappings and also remove a lot of GraphDatabaseService references from constructors, since you can now reach the graph database via http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/PropertyContainer.html#getGraphDatabase() , making that extra reference unnecessary. Does anyone have an opinion about all this? -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
Re: [Neo4j] Compacting files?
I've thought about this briefly, and somehow it actually seems easier (to me) to consider a compacting (defragmenting) algorithm than a generic import/export. The problem is that in both cases you have to deal with the same issue: the node/relationship IDs are changed. For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me, because you have to maintain a complete table in memory of the identity map during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point. For example, take the node table and scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep the entire set in memory, you maintain the mapping of old-to-new and can use that to 'fix' the relationship table also. Rinse and repeat :-) One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of id mappings. The less common cyclic connections can be maintained with temporary IDs and a table of such IDs maintained in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp id table does indeed remain small. Probably true for many datasets. On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again.
Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible so any input on how that should work, what format to use etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex
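Craig's batched-compaction idea — collect a block of free slots near the front of the table, move records down from the top into them, and patch references using only that batch's small old-to-new map — can be sketched abstractly. This is an illustrative toy, not Neo4j's store format; `table` stands in for the node store (missing ids are holes) and `refs` for relationship-record endpoint fields:

```python
# Toy sketch of batched defragmentation: memory use is bounded by the batch
# size (the per-batch old->new map), never by the size of the whole graph.

def compact(table, refs, batch_size=2):
    """table: dict id -> record (missing ids are holes);
    refs: list of [src_id, dst_id] pairs referencing table entries."""
    while True:
        # collect free slots below the current high-water mark, lowest first
        top = max(table) if table else -1
        holes = [i for i in range(top) if i not in table][:batch_size]
        if not holes:
            return table, refs
        # move the highest records down into the holes, remembering the moves
        moved = {}
        for hole in holes:
            src = max(table)
            if src <= hole:
                break  # nothing left above this hole to pull down
            table[hole] = table.pop(src)
            moved[src] = hole
        # patch references using only this batch's old->new map
        for pair in refs:
            for k in (0, 1):
                if pair[k] in moved:
                    pair[k] = moved[pair[k]]

table = {0: "n0", 3: "n3", 5: "n5", 9: "n9"}  # holes at 1, 2, 4, 6, 7, 8
refs = [[0, 9], [3, 5]]
table, refs = compact(table, refs)
print(sorted(table), refs)  # ids now packed into [0, 1, 2, 3]
```

A real implementation would also have to rewrite the relationship chain pointers and flush each batch transactionally, but the bounded-memory structure of the loop is the same.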
Re: [Neo4j] Traversals in REST Service vs. Traversals in neo4j.py
On 2 June 2010 16:21, Mattias Persson matt...@neotechnology.com wrote: I don't think the Python bindings (or any other bindings) have caught up with the new traversal framework yet. Uniqueness is all about when to visit a node and when not to. If the uniqueness is NODE_GLOBAL, a node won't be visited more than once in a traversal. NODE_PATH means that a node won't be visited again for the current path (the path from the start node to wherever the traverser is at the moment) if that node is already in the current path; it may still be visited again in another path. Also see the javadoc of Uniqueness at http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/Uniqueness.html Great! Thank you so much. -- Javier de la Rosa
Re: [Neo4j] Compacting files?
Hi Craig, Just a quick note about needing to keep all IDs in memory during an import/export operation. The way I'm doing it at the moment, it's not necessary to do so. When exporting: write IDs to the exported format (this could be JSON, XML, GML, GraphML, etc). When importing: first import all Nodes; this is easy to do in most formats (all that I've tried). While importing Nodes, store and index one extra property in every Node, which I call GID for global ID. Next import all Relationships, using the GID and Lucene to locate the start Node and end Node. The biggest graph I've tried with this approach had 2.5 million Nodes and 250 million Relationships. It took quite a long time, but much of the slowness was because it was performed on an old laptop with 2GB of RAM, I didn't give the BatchInserter a properties file, and I used default JVM parameters. There is at least one obvious downside to this though, and that is that you pollute the dataset with GID properties. Alex On Wed, Jun 2, 2010 at 5:53 PM, Craig Taverner cr...@amanzi.com wrote: I've thought about this briefly, and somehow it actually seems easier (to me) to consider a compacting (defragmenting) algorithm than a generic import/export. The problem is that in both cases you have to deal with the same issue: the node/relationship IDs are changed. For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me because you have to maintain a complete table in memory of the identity map during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point.
For example, take the node table and scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep the entire set in memory, you maintain the mapping of old-to-new and can use that to 'fix' the relationship table also. Rinse and repeat :-) One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of id mappings. The less common cyclic connections can be maintained with temporary IDs and a table of such IDs maintained in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp id table does indeed remain small. Probably true for many datasets. On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible so any input on how that should work, what format to use etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance?
Cheers, Alex
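Alex's two-pass GID scheme above can be sketched end-to-end. This is an illustrative mock (the `import_graph` function, the export record shapes, and the plain-dict stand-in for the Lucene index are all invented for the sketch, not a real Neo4j or Lucene API); the point is that the importer itself never holds a full old-to-new id map — endpoint lookup is delegated to the index:

```python
# Sketch of the two-pass import: pass 1 creates every node and indexes an
# extra "gid" property (a dict standing in for the Lucene index); pass 2
# resolves relationship endpoints through that index.

def import_graph(exported_nodes, exported_rels):
    db_nodes = {}    # new internal id -> properties
    gid_index = {}   # stand-in for the Lucene index: gid -> new internal id
    # pass 1: nodes (new internal ids are assigned by insertion order)
    for new_id, node in enumerate(exported_nodes):
        props = dict(node["props"], gid=node["gid"])  # pollutes data with gid
        db_nodes[new_id] = props
        gid_index[node["gid"]] = new_id
    # pass 2: relationships, endpoints looked up by gid
    db_rels = [(gid_index[r["start"]], gid_index[r["end"]], r["type"])
               for r in exported_rels]
    return db_nodes, db_rels

nodes = [{"gid": 101, "props": {"name": "a"}},
         {"gid": 202, "props": {"name": "b"}}]
rels = [{"start": 101, "end": 202, "type": "KNOWS"}]
db_nodes, db_rels = import_graph(nodes, rels)
print(db_rels)  # [(0, 1, 'KNOWS')]
```

The downside Alex mentions shows up in the sketch too: the gid property survives in every imported node unless a cleanup pass removes it afterwards.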
Re: [Neo4j] Compacting files?
Yes. I guess you cannot escape an old-to-new ID map (or in your case ID-to-GID). I think it is possible to maintain that outside the database: - In memory, as I suggested, but only valid under some circumstances - On disk, and Lucene is a good idea here. Why not index with Lucene, but without storing the property on the node? Since the index method takes the node, the property and the value, I assume the property and value might be possible to index without actually being real properties and values? I've not tried, but this way the graph is cleaner, and we can delete the Lucene index afterwards! On Wed, Jun 2, 2010 at 6:12 PM, Alex Averbuch alex.averb...@gmail.com wrote: Hi Craig, Just a quick note about needing to keep all IDs in memory during an import/export operation. The way I'm doing it at the moment, it's not necessary to do so. When exporting: write IDs to the exported format (this could be JSON, XML, GML, GraphML, etc). When importing: first import all Nodes; this is easy to do in most formats (all that I've tried). While importing Nodes, store and index one extra property in every Node, which I call GID for global ID. Next import all Relationships, using the GID and Lucene to locate the start Node and end Node.
For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me because you have to maintain a complete table in memory of the identity map during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point. For example, take the node table and scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep the entire set in memory, you maintain the mapping of old-to-new and can use that to 'fix' the relationship table also. Rinse and repeat :-) One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of id mappings. The less common cyclic connections can be maintained with temporary IDs and a table of such IDs maintained in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp id table does indeed remain small. Probably true for many datasets. On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility.
The plan is to get an export/import utility in place as soon as possible so any input on how that should work, what format to use etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex
Re: [Neo4j] Compacting files?
- On disk, and Lucene is a good idea here. Why not index with Lucene, but without storing the property on the node? I like it! This sounds like a cleaner approach than my current one, and (I'm not sure about how to do this either) may be no more complex than the way I'm doing it. Like you say, we can delete the Lucene index afterwards... or just the Lucene folder associated with that one property. I'm writing exams, thesis reports, and thesis opposition reports for the next month, so I won't have time to try it out. If you give it a try, I'd be interested in hearing how the Lucene-only approach works out though. On Wed, Jun 2, 2010 at 6:42 PM, Craig Taverner cr...@amanzi.com wrote: Yes. I guess you cannot escape an old-to-new ID map (or in your case ID-to-GID). I think it is possible to maintain that outside the database: - In memory, as I suggested, but only valid under some circumstances - On disk, and Lucene is a good idea here. Why not index with Lucene, but without storing the property on the node? Since the index method takes the node, the property and the value, I assume the property and value might be possible to index without actually being real properties and values? I've not tried, but this way the graph is cleaner, and we can delete the Lucene index afterwards! On Wed, Jun 2, 2010 at 6:12 PM, Alex Averbuch alex.averb...@gmail.com wrote: Hi Craig, Just a quick note about needing to keep all IDs in memory during an import/export operation. The way I'm doing it at the moment, it's not necessary to do so. When exporting: write IDs to the exported format (this could be JSON, XML, GML, GraphML, etc). When importing: first import all Nodes; this is easy to do in most formats (all that I've tried). While importing Nodes, store and index one extra property in every Node, which I call GID for global ID. Next import all Relationships, using the GID and Lucene to locate the start Node and end Node.
The biggest graph I've tried with this approach had 2.5 million Nodes and 250 million Relationships. It took quite a long time, but much of the slowness was because it was performed on an old laptop with 2GB of RAM, I didn't give the BatchInserter a properties file, and I used default JVM parameters. There is at least one obvious downside to this though, and that is that you pollute the dataset with GID properties. Alex On Wed, Jun 2, 2010 at 5:53 PM, Craig Taverner cr...@amanzi.com wrote: I've thought about this briefly, and somehow it actually seems easier (to me) to consider a compacting (defragmenting) algorithm than a generic import/export. The problem is that in both cases you have to deal with the same issue: the node/relationship IDs are changed. For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me, because you have to maintain a complete identity map in memory during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point. For example, take the node table and scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep only that batch in memory, you can maintain the old-new mapping and use it to 'fix' the relationship table as well. Rinse and repeat :-) One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of id mappings.
The less common cyclic connections can be maintained with temporary IDs and a table of such IDs kept in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp-ID table does indeed remain small, which is probably true for many datasets. On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility, but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility.
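Craig's batched compaction idea can be sketched on a toy "node table" — an array where `null` marks a hole. Per batch: collect free low IDs, move the highest nodes down into them, and use the small per-batch old-to-new map to patch a toy relationship list. All names and sizes here are illustrative, not Neo4j's actual store layout.

```java
import java.util.*;

// Toy sketch of batched defragmentation: only the per-batch old->new map is
// ever held in memory, never an identity map for the whole graph.
class Compact {
    public static void main(String[] args) {
        String[] nodes = { "a", null, "c", null, "e", "f" }; // holes at ids 1, 3
        long[][] rels = { { 4, 5 }, { 5, 2 } };              // (start, end) node ids

        // 1. Scan from the base of the table, collecting a batch of free ids.
        Deque<Integer> free = new ArrayDeque<>();
        for (int i = 0; i < nodes.length; i++) if (nodes[i] == null) free.add(i);

        // 2. Pull nodes down from the top of the table into the holes.
        Map<Integer, Integer> oldToNew = new HashMap<>();
        int top = nodes.length - 1;
        while (!free.isEmpty() && free.peek() < top) {
            while (nodes[top] == null) top--;   // skip holes at the top
            int hole = free.poll();
            nodes[hole] = nodes[top];
            nodes[top] = null;
            oldToNew.put(top, hole);            // remember this batch's remapping
            top--;
        }

        // 3. 'Fix' the relationship table using the small in-memory batch map.
        for (long[] r : rels) {
            if (oldToNew.containsKey((int) r[0])) r[0] = oldToNew.get((int) r[0]);
            if (oldToNew.containsKey((int) r[1])) r[1] = oldToNew.get((int) r[1]);
        }
        // "f" moved into hole 1, "e" into hole 3; relationships follow.
        System.out.println(Arrays.asList(nodes[1], nodes[3], rels[0][0], rels[0][1]));
    }
}
```

Repeating this per batch ("rinse and repeat") compacts the whole table without ever holding a graph-sized map.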
[Neo4j] Tell neo to not reuse ID's
Hej, Is it somehow possible to tell Neo4j not to reuse IDs at all? I'm running some experiments on Neo4j where I want to add and delete nodes and relationships. To make sure that I can repeat the same experiment, I create a log containing the IDs of the nodes I want to delete, so each node I add has to get the same ID in every run. If IDs can be reused, that is not always the case, which is why I need to turn reuse off or work around it. Hope for your help. Cheers, Martin
Re: [Neo4j] Node creation limit
Thanks. Big transactions were indeed problematic. Splitting them into smaller chunks did the trick. I'm still disappointed by the on-disk size of a minimal node without any relationships or attributes: for 500K nodes it is taking 80MB of space (160 bytes/node), and for 1M objects it is consuming 160MB (again 160 bytes/node). Is this normal?

4.0K  active_tx_log
12K   lucene
12K   lucene-fulltext
4.0K  neostore
4.0K  neostore.id
4.4M  neostore.nodestore.db
4.0K  neostore.nodestore.db.id
12M   neostore.propertystore.db
4.0K  neostore.propertystore.db.arrays
4.0K  neostore.propertystore.db.arrays.id
4.0K  neostore.propertystore.db.id
4.0K  neostore.propertystore.db.index
4.0K  neostore.propertystore.db.index.id
4.0K  neostore.propertystore.db.index.keys
4.0K  neostore.propertystore.db.index.keys.id
64M   neostore.propertystore.db.strings
4.0K  neostore.propertystore.db.strings.id
4.0K  neostore.relationshipstore.db
4.0K  neostore.relationshipstore.db.id
4.0K  neostore.relationshiptypestore.db
4.0K  neostore.relationshiptypestore.db.id
4.0K  neostore.relationshiptypestore.db.names
4.0K  neostore.relationshiptypestore.db.names.id
4.0K  nioneo_logical.log.active
4.0K  tm_tx_log.1
80M   total

On Wed, Jun 2, 2010 at 12:17 AM, Mattias Persson matt...@neotechnology.com wrote: Exactly, the problem is most likely that you try to insert all your stuff in one transaction. All data for a transaction is kept in memory until committed, so really big transactions can fill your entire heap. Try to group 10k operations or so for big insertions, or use the batch inserter. Links: http://wiki.neo4j.org/content/Transactions#Big_transactions http://wiki.neo4j.org/content/Batch_Insert 2010/6/2, Laurent Laborde kerdez...@gmail.com: On Wed, Jun 2, 2010 at 3:50 AM, Biren Gandhi biren.gan...@gmail.com wrote: Is there any limit on number of nodes that can be created in a neo4j instance? Any other tips? I created hundreds of millions of nodes without problems, but it was split into many transactions.
-- Laurent ker2x Laborde Sysadmin & DBA at http://www.over-blog.com/ -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
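The "group ~10k operations per transaction" advice above follows this pattern. Here the Neo4j transaction is replaced by a simple commit counter so the sketch runs stand-alone; in real code each chunk would be wrapped in the era's `beginTx()` / `tx.success()` / `tx.finish()` calls.

```java
// Sketch of chunked insertion: commit every CHUNK operations so no single
// transaction has to hold the whole insert in heap memory.
class ChunkedInsert {
    static final int CHUNK = 10_000;

    public static void main(String[] args) {
        int total = 1_000_000, commits = 0, inChunk = 0;
        for (int i = 0; i < total; i++) {
            // ... create one node here (inside the current transaction) ...
            if (++inChunk == CHUNK) { commits++; inChunk = 0; } // commit chunk
        }
        if (inChunk > 0) commits++; // commit the final partial chunk
        System.out.println(commits); // 100 small commits instead of one huge tx
    }
}
```

Each commit bounds heap usage to one chunk's worth of transaction state, which is why the heap no longer fills up.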
Re: [Neo4j] Node creation limit
Only 4.4MB out of those 80 is consumed by nodes, so you must be storing some properties somewhere. Would you mind sharing your code so that it would be easier to get better insight into your problem? 2010/6/2, Biren Gandhi biren.gan...@gmail.com: Thanks. Big transactions were indeed problematic. Splitting them into smaller chunks did the trick. I'm still disappointed by the on-disk size of a minimal node without any relationships or attributes: for 500K nodes it is taking 80MB of space (160 bytes/node), and for 1M objects it is consuming 160MB (again 160 bytes/node). Is this normal?
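Mattias's point can be checked with back-of-envelope arithmetic on the directory listing quoted in this thread: the node store itself is tiny per node, while most of the 80MB sits in the property and string stores.

```java
// Per-node cost breakdown from the sizes reported in the thread
// (4.4M nodestore, 64M string store, 80M total, 500K nodes).
class SizeCheck {
    public static void main(String[] args) {
        long nodes = 500_000;
        double nodeStore = 4.4e6;   // neostore.nodestore.db
        double stringStore = 64e6;  // neostore.propertystore.db.strings
        double total = 80e6;
        System.out.println(Math.round(nodeStore / nodes));   // bytes per node record
        System.out.println(Math.round(stringStore / nodes)); // bytes per name string
        System.out.println(Math.round(total / nodes));       // bytes per node overall
    }
}
```

So each node record itself costs roughly 9 bytes on disk; the ~160 bytes/node Biren observes is dominated by the string storage for the name property.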
Re: [Neo4j] Node creation limit
There is only 1 property - n (to store the name of the node) - used as follows:

Node node = graphDb.createNode();
node.setProperty( NAME_KEY, username );

And the values of username are Node-1, Node-2 etc. On Wed, Jun 2, 2010 at 3:14 PM, Mattias Persson matt...@neotechnology.com wrote: Only 4.4MB out of those 80 is consumed by nodes, so you must be storing some properties somewhere. Would you mind sharing your code so that it would be easier to get better insight into your problem?
Re: [Neo4j] Node creation limit
Here is some content from neostore.propertystore.db.strings - another huge file. What is the largest number of nodes/relationships that people have tried with Neo4j so far? Can someone share disk space usage characteristics?

od -N 1000 -x -c neostore.propertystore.db.strings
000     8500 \0 \0 \0 205 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
020     \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
200     0100 0c00 \0 \0 \0 \0 \0 001 377 377 377 377 \0 \0 \0 \f 377 377
220     4e00 6f00 6400 6500 2d00 3000 377 377 \0 N \0 o \0 d \0 e \0 - \0 0 \0 \0
240     \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
400     ff01 00ff \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 377 377 377 377 \0
420     ff0c 00ff 004e 006f 0064 0065 \0 \0 \f 377 377 377 377 \0 N \0 o \0 d \0 e \0
440     002d 0031 - \0 1 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
460     \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
600     0100 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001
620     0c00 4e00 6f00 377 377 377 377 \0 \0 \0 \f 377 377 377 377 \0 N \0 o
640     6400 6500 2d00 3200 \0 d \0 e \0 - \0 2 \0 \0 \0 \0 \0 \0 \0 \0
660     \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001020 ff01 00ff ff0c \0 \0 \0 \0 001 377 377 377 377 \0 \0 \0 \f 377 377 377
0001040 00ff 004e 006f 0064 0065 002d 0033 377 \0 N \0 o \0 d \0 e \0 - \0 3 \0 \0 \0
0001060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001220 0100 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 377 377 377 377 \0 \0
0001240 0c00 4e00 6f00 6400 6500 2d00 \0 \f 377 377 377 377 \0 N \0 o \0 d \0 e \0 -
0001260 3400 \0 4 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0001300 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001420 ff01 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 377
0001440 00ff ff0c 00ff 004e 006f 377 377 377 \0 \0 \0 \f 377 377 377 377 \0 N \0 o \0
0001460 0064 0065 002d 0035 d \0 e \0 - \0 5 \0 \0 \0 \0 \0 \0 \0 \0 \0
0001500 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*

On Wed, Jun 2, 2010 at 3:50 PM, Biren Gandhi biren.gan...@gmail.com wrote: There is only 1 property - n (to store the name of the
node) - used as follows:

Node node = graphDb.createNode();
node.setProperty( NAME_KEY, username );

And the values of username are Node-1, Node-2 etc. On Wed, Jun 2, 2010 at 3:14 PM, Mattias Persson matt...@neotechnology.com wrote: Only 4.4MB out of those 80 is consumed by nodes, so you must be storing some properties somewhere. Would you mind sharing your code so that it would be easier to get better insight into your problem?
Re: [Neo4j] Tell neo to not reuse ID's
Here is a crazy idea that probably only works for nodes: don't actually delete the nodes, just the relationships and the node properties. The skeleton node will retain its ID in the table, preventing reuse. If these orphans are not relevant to your tests, this should have the effect you are looking for. On Wed, Jun 2, 2010 at 8:17 PM, Martin Neumann m.neumann.1...@gmail.com wrote: Hej, Is it somehow possible to tell Neo4j not to reuse IDs at all?
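The repeatability problem Martin describes can be seen with a toy ID allocator (illustrative only, not Neo4j's actual implementation): with a free-list, the ID handed out after a delete depends on history, whereas with reuse disabled, allocation order alone fixes the IDs, which is exactly why keeping skeleton nodes (so their IDs never reach the free-list) makes experiments reproducible.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy id allocator: 'reuse' mode pulls deleted ids off a free-list,
// 'stable' mode always hands out a fresh, monotonically increasing id.
class IdAllocator {
    final Deque<Long> freeList = new ArrayDeque<>();
    final boolean reuse;
    long next = 0;

    IdAllocator(boolean reuse) { this.reuse = reuse; }

    long allocate() {
        if (reuse && !freeList.isEmpty()) return freeList.poll();
        return next++;
    }

    void delete(long id) { freeList.add(id); }

    public static void main(String[] args) {
        IdAllocator reusing = new IdAllocator(true);
        IdAllocator stable = new IdAllocator(false);
        for (IdAllocator a : new IdAllocator[] { reusing, stable }) {
            a.allocate(); a.allocate();       // ids 0 and 1
            a.delete(0);                      // id 0 becomes a hole
            System.out.println(a.allocate()); // reusing: 0, stable: 2
        }
    }
}
```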
Re: [Neo4j] Compacting files?
On Wed, Jun 2, 2010 at 9:30 AM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. I like Postgres' Auto Vacuum feature. I think it would already be nice if Neo4j reused the holes; some kind of compaction and truncation of the files would be great, in my opinion. Just my 2 cents, Thomas