[Neo4j] Traversal performance

2011-09-26 Thread Rick
I'm looking for help tuning traversals. This is a great product with the
best API, and I want to make sure I'm getting the most from it. I'm trying to
understand whether 62,500 traversals per second is the best I can do given
the following scenario:

- 15.6M nodes
- 15.6M relationships
- Data is structured as shown below: the root has 250 children, each of its
children has 250 children, and each of their children has 250 children
- If I get the entire list of children and grandchildren for a top node (max
3 levels deep), I get 62,500 nodes, and this takes about 800-1000 ms
- The server is a dual core quad 3.2 GHz Xeon with 16 GB RAM
- The neo4j.props settings are:

neostore.nodestore.db.mapped_memory=1G
neostore.relationshipstore.db.mapped_memory=1G
neostore.propertystore.db.mapped_memory=1G
neostore.propertystore.db.index.mapped_memory=1G
neostore.propertystore.db.index.keys.mapped_memory=1G
neostore.propertystore.db.strings.mapped_memory=1G
neostore.propertystore.db.arrays.mapped_memory=1G


- The code that does the traversal is 

Traverser trav = user.traverse( Order.BREADTH_FIRST,
    // Stop expanding once depth 3 is reached
    new StopEvaluator() {
        public boolean isStopNode( TraversalPosition pos ) { return pos.depth() >= 3; }
    },
    // Return only nodes above the stop depth
    new ReturnableEvaluator() {
        public boolean isReturnableNode( TraversalPosition pos ) { return pos.depth() < 3; }
    },
    KNOWS, Direction.BOTH );

for ( Node node : trav )
{   
// Do something with node...
i++;
}   


Data example
  root
node 0-0-0
  node 0-0-1
  node 0-0-2
  ...
   node 0-1-0
  node 0-1-1
  node 0-1-2
  ...
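
The figures in the scenario above can be checked with a little arithmetic. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes a uniform fan-out of 250 at every level, as described in the message. It counts the nodes in the tree and converts "62,500 nodes in 800-1000 ms" into nodes per second.

```java
public class ScenarioArithmetic {
    // Number of nodes at a given depth below the root for a fixed fan-out.
    static long nodesAtDepth(int fanOut, int depth) {
        long n = 1;
        for (int i = 0; i < depth; i++) {
            n *= fanOut;
        }
        return n;
    }

    // Throughput in nodes per second for `nodes` results returned in `millis` ms.
    static double nodesPerSecond(long nodes, long millis) {
        return nodes * 1000.0 / millis;
    }

    public static void main(String[] args) {
        int fanOut = 250; // from the message: every node has 250 children

        long total = 1; // the root itself
        for (int depth = 1; depth <= 3; depth++) {
            total += nodesAtDepth(fanOut, depth);
        }
        // 1 + 250 + 62,500 + 15,625,000, i.e. roughly the 15.6M quoted
        System.out.println("total nodes       = " + total);
        // two levels below a starting node: 250 * 250
        System.out.println("nodes at depth 2  = " + nodesAtDepth(fanOut, 2));
        // 62,500 nodes in 1000 ms and in 800 ms
        System.out.println("nodes/sec (1000ms) = " + nodesPerSecond(62_500, 1000));
        System.out.println("nodes/sec (800ms)  = " + nodesPerSecond(62_500, 800));
    }
}
```

So the reported timing corresponds to somewhere between 62,500 and 78,125 returned nodes per second.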



--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Traversal-performance-tp3371038p3371038.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Traversal performance

2011-09-26 Thread Bryce
One initial suggestion: your memory-mapped settings are probably not close
to optimal.  Have a look at the file sizes in your graph data directory; the
closer you can get to covering each db file's entire size, the better.  I
would assume that some of the files will be bigger than others, and in fact
you will probably find that a few of them are very small, so you are wasting
memory on them that you could assign to another memory mapping.

So in one of mine I have:
5807428  neostore.nodestore.db
335536170  neostore.relationshipstore.db
398675470  neostore.propertystore.db
1208  neostore.propertystore.db.index
6906  neostore.propertystore.db.index.keys
1112428784  neostore.propertystore.db.strings
158  neostore.propertystore.db.arrays

In which case there is no point in me assigning much if any memory to:
neostore.propertystore.db.arrays.mapped_memory
neostore.propertystore.db.index.keys.mapped_memory
neostore.propertystore.db.index.mapped_memory

The other thing to take into account is that the
neostore.nodestore.db.mapped_memory and
neostore.relationshipstore.db.mapped_memory
settings have a lot more impact on traversal than the property store
settings.  The property store settings will help when you are reading
properties from nodes or relationships.

So if you can assign enough mapped memory to the node and relationship
stores to fit them entirely, that would be good.  Otherwise it is still best
to assign more to those, and definitely don't give stores like arrays much
memory (unless you are using them a lot).
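
One way to apply the advice above is to read the store file sizes and derive a mapped_memory value for each. The following is a rough sketch, not an official tool: the class name, the `headroom` factor, and rounding up to whole megabytes are assumptions made for this example. The store file names and setting names are the ones listed in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class MappedMemoryAdvisor {
    // Store files mentioned in this thread.
    static final String[] STORES = {
        "neostore.nodestore.db",
        "neostore.relationshipstore.db",
        "neostore.propertystore.db",
        "neostore.propertystore.db.index",
        "neostore.propertystore.db.index.keys",
        "neostore.propertystore.db.strings",
        "neostore.propertystore.db.arrays"
    };

    // Size in whole megabytes (rounded up, minimum 1) after applying a
    // headroom factor. The headroom is an assumption of this sketch,
    // meant to leave room for the file to grow.
    static long suggestMegabytes(long sizeBytes, double headroom) {
        long mb = 1024L * 1024L;
        long padded = (long) Math.ceil(sizeBytes * headroom);
        return Math.max(1, (padded + mb - 1) / mb);
    }

    // Builds "<store>.mapped_memory" -> megabytes for each store file in dir.
    // Missing files are treated as empty.
    static Map<String, Long> suggest(Path dir, double headroom) throws IOException {
        Map<String, Long> settings = new LinkedHashMap<>();
        for (String store : STORES) {
            Path file = dir.resolve(store);
            long size = Files.exists(file) ? Files.size(file) : 0L;
            settings.put(store + ".mapped_memory", suggestMegabytes(size, headroom));
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        // Pass the graph data directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : suggest(dir, 1.2).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue() + "M");
        }
    }
}
```

If total memory is tight, the node and relationship store lines are the ones to keep at full size, since those matter most for traversal.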



Re: [Neo4j] Traversal performance

2011-09-26 Thread Rick Devinsus
I took a look at the files and none were larger than 500MB.  However, it
makes a lot of sense to change the memory as you suggested, so I altered the
options as shown below.  I also started Eclipse with different memory
options than the defaults (eclipse -vmargs -Xmx2000m -server).  The changes
didn't make it any faster, though.

I had read about people getting 2M traversals per second.  Since I'm only
seeing around 65,000/sec, I'm starting to think that figure represented the
number of nodes searched through, not the number returned based on the
traversal's criteria.


neostore.nodestore.db.mapped_memory=1.5G
neostore.relationshipstore.db.mapped_memory=1.5G
neostore.propertystore.db.mapped_memory=1.5G
neostore.propertystore.db.index.mapped_memory=1.5G
neostore.propertystore.db.index.keys.mapped_memory=50M
neostore.propertystore.db.strings.mapped_memory=50M
neostore.propertystore.db.arrays.mapped_memory=50M

my file sizes:
neostore.relationshipstore.db 500MB
neostore.propertystore.db 383MB
neostore.nodestore.db 137MB
(others are all less than 1MB)
the largest lucene node is 367MB



Re: [Neo4j] Traversal performance

2011-09-26 Thread Bryce
It won't make any difference whether the memory mapping settings are just
larger than the file sizes or a lot larger, so fiddling with those settings
won't change anything from your original test.

Generally, when people see very high performance it is because a lot of the
data they are traversing is already in memory, i.e. the caches are warmed.
So is the test you are running just from a cold start?  If so, can you try
running the test twice within the same VM?
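
A minimal way to compare a cold run with warm ones is to time the same workload several times inside one JVM. The sketch below is only an illustration: `WarmupTimer`, `timeRuns`, and the stand-in workload are hypothetical names for this example, and the workload would be replaced by the actual traversal loop, returning the number of nodes visited.

```java
import java.util.function.IntSupplier;

public class WarmupTimer {
    // Runs the workload `runs` times in the same JVM and returns the
    // elapsed time of each run in nanoseconds, in execution order.
    static long[] timeRuns(IntSupplier workload, int runs) {
        long[] elapsed = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            int visited = workload.getAsInt();
            elapsed[r] = System.nanoTime() - start;
            System.out.printf("run %d: %d nodes in %.2f ms%n",
                    r + 1, visited, elapsed[r] / 1e6);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical): replace the body with the
        // traversal loop and return the number of nodes it visited.
        IntSupplier workload = () -> {
            int count = 0;
            for (int i = 0; i < 62_500; i++) {
                count++;
            }
            return count;
        };

        // The first run is the cold one; later runs reuse whatever the
        // first run loaded into memory.
        timeRuns(workload, 2);
    }
}
```

Raising the run count gives a longer series to compare against the first, cold run.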



Re: [Neo4j] Traversal performance

2011-09-26 Thread Rick Devinsus
That was it: the cache wasn't warmed.  I tried running the same test twice,
and that increased the speed around 7x (450K traversals per second).  Thanks
for the help.



Re: [Neo4j] Traversal performance

2011-09-26 Thread David Montag
Also, try running it 100 times. Then you should see some JVM
optimizations/JIT kick in.

David

-- 
David Montag david.mon...@neotechnology.com
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag