I too am experiencing similar problems, possibly worse than you're seeing, as I 
am using a very modestly provisioned Windows machine (1.5 GB RAM, max heap set 
to 1 GB, oldish processor).

I found that when using the BatchInserter to load nodes, the heap grew and grew 
until it was exhausted and everything practically ground to a halt.  I 
experimented with various cache memory settings, but nothing made much 
difference. So now I reset the BatchInserter (i.e. shut it down and restart it) 
every 100,000 nodes or so.  I posted questions on the list before, but the 
replies seemed to suggest that it was just a config issue, and no config 
changes I made helped much.  I get the impression that most people are using 
Neo4j with far more memory than I can realistically expect to use at this 
stage, so maybe that is why this problem does not receive much attention.
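
A minimal sketch of the periodic-reset idea described above. It deliberately
does not use the Neo4j API: NodeSink is a hypothetical stand-in for whatever
inserter is in use, and the factory is assumed to reopen the same store so
that node ids carry on across resets. The names NodeSink, ResettingSink and
resetEvery are illustrative only.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the real inserter; this is NOT a Neo4j interface.
interface NodeSink {
    long createNode(String name);
    void shutdown();
}

// Delegates to a sink and swaps in a fresh one after every `resetEvery` inserts,
// so that whatever the old sink had accumulated on the heap can be collected.
final class ResettingSink {
    private final Supplier<NodeSink> factory;
    private final int resetEvery;
    private NodeSink current;
    private int sinceReset = 0;
    private int resets = 0;

    ResettingSink(Supplier<NodeSink> factory, int resetEvery) {
        if (resetEvery <= 0) {
            throw new IllegalArgumentException("resetEvery must be positive");
        }
        this.factory = factory;
        this.resetEvery = resetEvery;
        this.current = factory.get();
    }

    long createNode(String name) {
        if (sinceReset == resetEvery) {
            // Shut the old sink down before opening a new one on the same store.
            current.shutdown();
            current = factory.get();
            sinceReset = 0;
            resets++;
        }
        sinceReset++;
        return current.createNode(name);
    }

    void shutdown() {
        current.shutdown();
    }

    int resets() {
        return resets;
    }
}
```

With resetEvery set to 100,000 this mirrors the workaround above; the same
wrapper shape would apply to relationships.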

I have a similar approach to yours for relationships, i.e. creating them in a 
second pass.  I'm not sure how memory hungry it is, but again I have built a 
class that resets the inserters every 100,000 relationships.  It is slow, but 
experimenting with my 'reset' size didn't make much difference, so I suspect 
that it's limited by index access time.  Effectively, I suspect it's going to 
disk for every index lookup of a key that it sees for the first time. I also 
suspect that the size of the index makes a difference, as I have over 3m 
nodes in some indexes and these are the ones that are very slow.

I suspect there might be some tuning that can be done, and I really think the 
problem with running out of heap is probably a bug that should be fixed, but I 
am now turning my attention to finding ways of creating relationships when the 
initial nodes are created (at least for those where this is possible) to 
avoid the index lookup overhead.
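
One way this could look, as a rough sketch rather than a tested import: read
the adjacency file once, keep a name-to-id map in memory, and record each
relationship as soon as both ids are known. The class SinglePassImport and its
members are hypothetical; in a real import, the id allocation and the
relationship list would be replaced by calls to the inserter. The map itself
costs heap for every distinct name, so on a small machine it may not fit for
millions of nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass importer for lines of the form "Foo|Bar|Baz".
final class SinglePassImport {
    // In-memory replacement for the index lookup: node name -> node id.
    final Map<String, Long> idByName = new HashMap<>();
    // Each entry is {startId, endId}; a real import would create the
    // relationship through the inserter here instead of storing it.
    final List<long[]> relationships = new ArrayList<>();
    private long nextId = 0;

    // Returns the id for a name, allocating one the first time it is seen.
    // A real import would create the node through the inserter at this point.
    long idFor(String name) {
        Long id = idByName.get(name);
        if (id == null) {
            id = nextId++;
            idByName.put(name, id);
        }
        return id;
    }

    // Handles one line: the first entry is the start node, the rest are end nodes.
    void addLine(String line) {
        if (line.trim().isEmpty()) {
            return; // skip blank lines
        }
        String[] parts = line.split("\\|");
        if (parts.length == 0 || parts[0].isEmpty()) {
            return; // no start node on this line
        }
        long start = idFor(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            if (!parts[i].isEmpty()) {
                relationships.add(new long[] { start, idFor(parts[i]) });
            }
        }
    }
}
```

Because a node is created the first time its name appears, whether as a start
or an end entry, any properties set only for start nodes would have to be added
when that node's own line is reached.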

I'll let you know if/how this helps, but I am also interested to learn of 
others' experiences.

On 9 Jun 2011, at 10:59, Daniel Hepper wrote:

> Hi all,
> 
> I'm struggling with importing a graph with about 10m nodes and 20m
> relationships, with nodes having 0 to 10 relationships. Creating the
> nodes takes about 10 minutes, but creating the relationships is slower
> by several orders of magnitude. I'm using a 2.4 GHz i7 MacBookPro with
> 4GB RAM and conventional HDD.
> 
> The graph is stored as adjacency list in a text file where each line
> has this form:
> 
> Foo|Bar|Baz
> (Node Foo has relations to Bar and Baz)
> 
> My current approach is to iterate over the whole file twice. In the
> first run, I create a node with the property "name" for the first
> entry in the line (Foo in this case) and add it to an index.
> In the second run, I get the start node and the end nodes from the
> index by name and create the relationships.
> 
> My code can be found here: http://pastie.org/2041801
> 
> With my approach, the best I can achieve is 100 created relationships
> per second.
> I experimented with mapped memory settings, but without much effect.
> Is this the speed I can expect?
> Any advice on how to speed up this process?
> 
> Best regards,
> Daniel Hepper
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
