[Neo4j] Traversal performance
Looking for help on how to tune traversals. This is a great product with the best API, and I want to make sure I'm getting the most from it.

I'm trying to understand whether 62,500 traversals per second is the best I can do given the following scenario:

- 15.6M nodes
- 15.6M relationships
- Data is structured as shown below, so that the root has 250 children, each of its children has 250 children, and each of their children has 250 children
- If I get the entire list of children and grandchildren for a top node (max 3 levels deep), I get 62,500 nodes, and this takes about 800-1000 ms
- The server is a dual core quad 3.2 GHz Xeon with 16 GB RAM
- The neo4j.props settings are:

    neostore.nodestore.db.mapped_memory=1G
    neostore.relationshipstore.db.mapped_memory=1G
    neostore.propertystore.db.mapped_memory=1G
    neostore.propertystore.db.index.mapped_memory=1G
    neostore.propertystore.db.index.keys.mapped_memory=1G
    neostore.propertystore.db.strings.mapped_memory=1G
    neostore.propertystore.db.arrays.mapped_memory=1G

- The code that does the traversal is:

    Traverser trav = user.traverse(
        Order.BREADTH_FIRST,
        new StopEvaluator() {
            public boolean isStopNode(TraversalPosition pos) {
                // Stop expanding once depth 3 is reached
                return pos.depth() >= 3;
            }
        },
        new ReturnableEvaluator() {
            public boolean isReturnableNode(TraversalPosition pos) {
                // Return only nodes above depth 3
                return pos.depth() < 3;
            }
        },
        KNOWS,
        Direction.BOTH );

    for ( Node node : trav ) {
        // Do something with node...
        i++;
    }

Data example:

    root
    node 0-0-0
    node 0-0-1
    node 0-0-2
    ...
    node 0-1-0
    node 0-1-1
    node 0-1-2
    ...

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371038.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Traversal performance
One initial suggestion: your memory-mapped settings are probably not very near optimal. Have a look at the file sizes in your graph data directory; the closer you can get to covering each db file's entire size, the better. I would assume that some of the files will be bigger than others, and in fact you will probably find that a few of them are very small, so you are wasting memory on them that you could assign to another memory mapping.

So in one of mine I have (sizes in bytes):

    5807428     neostore.nodestore.db
    335536170   neostore.relationshipstore.db
    398675470   neostore.propertystore.db
    1208        neostore.propertystore.db.index
    6906        neostore.propertystore.db.index.keys
    1112428784  neostore.propertystore.db.strings
    158         neostore.propertystore.db.arrays

In which case there is no point in me assigning much, if any, memory to:

    neostore.propertystore.db.arrays.mapped_memory
    neostore.propertystore.db.index.keys.mapped_memory
    neostore.propertystore.db.index.mapped_memory

The other thing to take into account is that the neostore.nodestore.db.mapped_memory and neostore.relationshipstore.db.mapped_memory settings have a lot more impact on traversal than the property store settings. The property store settings help when you are reading properties from nodes or relationships. So if you can assign memory mapping settings for nodes and relationships large enough to fit both stores entirely in the memory map, that would be good. Otherwise it is still best to assign more to those two, and definitely don't give the ones like arrays much memory (unless you are using them a lot).

On Tue, Sep 27, 2011 at 12:52 PM, Rick rick.devin...@gmail.com wrote:
> [...]
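The sizing advice above can be sketched as a small helper that reads the store file sizes and prints one mapped_memory line per file, rounded up to whole megabytes. This is only an illustration, not a Neo4j tool: the class name MappedMemorySuggester, the "neostore.*.db*" file pattern, the skipping of ".id" files, and the 1M floor are all assumptions made for the sketch. The output still needs a manual pass to favor the node and relationship stores when memory is short.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Neo4j.
final class MappedMemorySuggester {

    private static final long MB = 1024L * 1024L;

    /** Maps "<store file name>.mapped_memory" to the file size in MB, rounded up (minimum 1). */
    static Map<String, Long> suggest(Path storeDir) throws IOException {
        Map<String, Long> settings = new TreeMap<>();
        // Assumption: store files match neostore.*.db*; companion ".id" files are skipped.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(storeDir, "neostore.*.db*")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                if (!Files.isRegularFile(file) || name.endsWith(".id")) {
                    continue;
                }
                long megabytes = Math.max(1, (Files.size(file) + MB - 1) / MB);
                settings.put(name + ".mapped_memory", megabytes);
            }
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        // First argument: path to the graph data directory (defaults to the current directory).
        Path storeDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : suggest(storeDir).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue() + "M");
        }
    }
}
```

Run it against the graph data directory and compare the printed lines with the current neo4j.props values; tiny files such as the arrays store come out at the 1M floor.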
Re: [Neo4j] Traversal performance
I took a look at the files and none were larger than 500MB. However, it makes a lot of sense to change the memory as you suggested, so I altered the options as shown below. I also started Eclipse with different memory options than the defaults (eclipse -vmargs -Xmx2000m -server). The changes didn't make it any faster, though.

I had read about people getting 2M traversals per second. Since I'm only seeing around 65,000/sec, I'm starting to think that figure represented the number of nodes searched through, not the number returned based on the traversal's criteria.

    neostore.nodestore.db.mapped_memory=1.5G
    neostore.relationshipstore.db.mapped_memory=1.5G
    neostore.propertystore.db.mapped_memory=1.5G
    neostore.propertystore.db.index.mapped_memory=1.5G
    neostore.propertystore.db.index.keys.mapped_memory=50M
    neostore.propertystore.db.strings.mapped_memory=50M
    neostore.propertystore.db.arrays.mapped_memory=50M

My file sizes:

    neostore.relationshipstore.db  500MB
    neostore.propertystore.db      383MB
    neostore.nodestore.db          137MB
    (others are all less than 1MB)

The largest Lucene node is 367MB.

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371379.html
Re: [Neo4j] Traversal performance
It won't make any difference whether the memory mapping settings are just larger than the file sizes or a lot larger, so fiddling with those settings won't change anything from your original test.

Generally, when people see very high performance, it is because a lot of the data they are traversing is already in memory, i.e. the caches are warmed. So is the test you are running just from a cold start? If so, can you try the test twice, within the same VM that is?

On Tue, Sep 27, 2011 at 3:48 PM, Rick Devinsus rick.devin...@gmail.com wrote:
> [...]
Re: [Neo4j] Traversal performance
That was it: the cache wasn't warmed. I tried running the same test twice, and that increased the speed around 7x (450K traversals per second). Thanks for the help.

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371546.html
Re: [Neo4j] Traversal performance
Also, try running it 100 times. Then you should see some JVM optimizations/JIT kick in.

David

On Mon, Sep 26, 2011 at 9:24 PM, Rick Devinsus rick.devin...@gmail.com wrote:
> [...]

--
David Montag david.mon...@neotechnology.com
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag
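The advice in this thread (run the test more than once in the same VM so the caches are warm, then many more times so the JIT can kick in) can be sketched as a small timing harness. This is a rough illustration under stated assumptions, not a rigorous benchmark: the class name WarmupTimer and its methods are made up for the sketch, and the counting loop in main is a stand-in for the real traversal loop, which would return the number of nodes it iterated over.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

// Hypothetical harness; none of these names come from Neo4j.
final class WarmupTimer {

    /** Runs the task `runs` times in this JVM and returns each run's wall-clock time in nanoseconds. */
    static long[] timeRuns(IntSupplier task, int runs) {
        long[] nanos = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.getAsInt();
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Converts an item count and an elapsed time in nanoseconds into items per second. */
    static double perSecond(long items, long nanos) {
        return items * 1_000_000_000.0 / nanos;
    }

    /** Returns the middle element of the sorted values (the upper middle for even lengths). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        final int nodesPerRun = 62_500;
        // Stand-in workload (assumption): replace with the traversal loop, returning its node count.
        IntSupplier task = () -> {
            int count = 0;
            for (int i = 0; i < nodesPerRun; i++) {
                count++;
            }
            return count;
        };

        long[] nanos = timeRuns(task, 100);
        // Treat the first half as warm-up and summarize only the second half.
        long[] later = Arrays.copyOfRange(nanos, nanos.length / 2, nanos.length);
        System.out.printf("first run: %.0f nodes/s%n",
                perSecond(nodesPerRun, Math.max(1, nanos[0])));
        System.out.printf("median of last %d runs: %.0f nodes/s%n",
                later.length, perSecond(nodesPerRun, Math.max(1, median(later))));
    }
}
```

Comparing the first run with the median of the later runs separates the cold-start cost from the steady-state rate, which is the gap Rick saw between roughly 62,500 and 450K traversals per second.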