Re: [Neo4j] Batch inserter shutdown taking forever

2010-07-27 Thread Mattias Persson
Since you're doing a depth 1 traversal please use something like this
instead:

for ( Relationship rel : graphDb.getReferenceNode().getRelationships(
        Relationships.ROUTE, Direction.OUTGOING ) )
{
    Node node = rel.getEndNode();
    // Do stuff
}

This is because a traverser keeps more in memory than a simple call to
getRelationships. Another thing: are you doing any write operations in that
for-loop of yours? Also, do you shut down the batch inserter and start a new
EmbeddedGraphDatabase to traverse on, or how do you get hold of the graphDb?
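
To tell whether the time goes into fetching the nodes or into the loop body
(writes, for example), a rough timing helper can separate the two. This is
only a sketch using the plain JDK: LoopTiming, measure, Body and Result are
made-up names for illustration, not part of Neo4j. It accepts any Iterable,
which covers both the Traverser and the getRelationships result as they are
used in the for-loops in this thread.

```java
import java.util.Iterator;

// Hypothetical helper (not a Neo4j class): splits loop time into
// "fetching the next item" and "running the loop body".
final class LoopTiming {

    /** Totals collected by measure(). */
    public static final class Result {
        public long items;       // number of items visited
        public long fetchNanos;  // time spent in hasNext()/next()
        public long bodyNanos;   // time spent in the loop body
    }

    /** The per-item work, i.e. the "do stuff" part of the loop. */
    public interface Body<T> {
        void apply(T item);
    }

    public static <T> Result measure(Iterable<T> source, Body<T> body) {
        Result result = new Result();
        Iterator<T> it = source.iterator();
        while (true) {
            long t0 = System.nanoTime();
            if (!it.hasNext()) {
                // Count the final hasNext() call as fetch time too.
                result.fetchNanos += System.nanoTime() - t0;
                break;
            }
            T item = it.next();
            long t1 = System.nanoTime();
            body.apply(item);
            long t2 = System.nanoTime();
            result.fetchNanos += t1 - t0;
            result.bodyNanos += t2 - t1;
            result.items++;
        }
        return result;
    }
}
```

Something like LoopTiming.measure(routes, node -> { /* do stuff */ }) would
then show which side dominates (on a pre-Java 8 JVM the lambda would have to
be an anonymous Body class). The nanoTime calls add some overhead per item,
so the numbers are only a rough guide.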

2010/7/26 Tim Jones bogol...@ymail.com

 OK, I found out what's taking the time. It's iterating over the result set
 of a traverser:

 // visit each Route node, and add it to the array
 Traverser routes = graphDb.getReferenceNode().traverse(
         Traverser.Order.BREADTH_FIRST,
         StopEvaluator.DEPTH_ONE,
         ReturnableEvaluator.ALL_BUT_START_NODE,
         Relationships.ROUTE, Direction.OUTGOING);

 for (Node node : routes)
 {
     // do stuff
 }


 The 'for' loop takes ages. There are probably 2m nodes being returned by
 that traverser at the moment, and that's only a very small subset of the
 data I want to add to the database.

 Is there any way to tinker with the Neo4j properties or anything else to
 improve performance here?

 Thanks


 ----- Original Message -----
  From: Mattias Persson matt...@neotechnology.com
  To: Neo4j user discussions user@lists.neo4j.org
  Sent: Sat, July 24, 2010 10:23:02 PM
  Subject: Re: [Neo4j] Batch inserter shutdown taking forever
 
  What kind of problems? Could you supply code and a description of your
  problems?

 I ran into problems doing something similar in relational DBs. Also, the API
 recommends optimising the batch search index before using it for lookups. I
 just decided not to take this approach.

 
 
  
___
   Neo4j mailing  list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
 
 
 
-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Batch inserter shutdown taking forever

2010-07-24 Thread Mattias Persson
2010/7/21 Tim Jones bogol...@ymail.com

 Hi,

 I'm using a BatchInserter and a LuceneIndexBatchInserter to insert 5m nodes
 and 5m relationships into a graph in one go. The insertion seems to work,
 but shutting down takes forever - it's been 2 hours now.

 At first, the JVM gave me a garbage collection exception, so I've set the
 heap to 2 GB.

 'top' tells me that the application is still running:

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  9994 tim   17   0 2620m 2.3g 238m S 99.5 39.1 115:48.84 java

 but checking the filesystem by running 'ls -l' a few times doesn't indicate
 that files are being updated.

 Is this normal? Is there a way to improve performance?


No, that sounds quite weird. Any chance I could have a look at your code?



 I'm loading all my data in one go to ease creating the db - it's simpler to
 create it from scratch each time instead of updating an existing database -
 so ideally I don't want to break this job down into multiple smaller jobs
 (actually, this would be OK if performance was good, but I ran into problems
 inserting data and retrieving existing nodes).


What kind of problems? Could you supply code and a description of your
problems?



 Thanks,
 Tim









-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


[Neo4j] Batch inserter shutdown taking forever

2010-07-21 Thread Tim Jones
Hi,

I'm using a BatchInserter and a LuceneIndexBatchInserter to insert 5m nodes
and 5m relationships into a graph in one go. The insertion seems to work, but
shutting down takes forever - it's been 2 hours now.

At first, the JVM gave me a garbage collection exception, so I've set the heap
to 2 GB.
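
For what it's worth, one quick way to confirm that the larger heap actually
took effect is to ask the JVM itself. Below is a minimal sketch using only
the JDK's Runtime class. The class name HeapReport is made up for
illustration, and it assumes the heap was raised with a flag such as -Xmx2g
(the post doesn't say how it was set).

```java
// Hypothetical helper: reports what the running JVM says about its heap.
final class HeapReport {

    /** Maximum heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use: allocated heap minus its free portion. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Converts bytes to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + toMegabytes(maxHeapBytes()) + " MB");
        System.out.println("used heap: " + toMegabytes(usedHeapBytes()) + " MB");
    }
}
```

Calling the same two methods just before and after the batch inserter's
shutdown would show whether the heap is nearly full at that point. The
reported maximum may come out somewhat below the configured value, depending
on the JVM.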

'top' tells me that the application is still running:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 9994 tim   17   0 2620m 2.3g 238m S 99.5 39.1 115:48.84 java

but checking the filesystem by running 'ls -l' a few times doesn't indicate
that files are being updated.

Is this normal? Is there a way to improve performance?

I'm loading all my data in one go to ease creating the db - it's simpler to 
create it from scratch each time instead of updating an existing database - so 
ideally I don't want to break this job down into multiple smaller jobs 
(actually, this would be OK if performance was good, but I ran into problems 
inserting data and retrieving existing nodes).

Thanks,
Tim



  
