Re: [Neo4j] Speeding up initial import of graph

2011-06-10 Thread Paul Bandler
On 9 Jun 2011, at 22:12, Michael Hunger wrote: Please keep in mind that the HashMap of 10M strings → longs will take a substantial amount of heap memory. That's not the fault of Neo4j :) On my system it alone takes 1.8 GB of memory (distributed across the strings, the HashMap entries and

Re: [Neo4j] Speeding up initial import of graph

2011-06-10 Thread Michael Hunger
You're right, the Lucene-based import shouldn't fail for memory problems; I will look into that. My suggestion is valid if you want to use an in-memory map to speed up the import. And if you're able to analyze / partition your data, that might be a viable solution. Will get back to you
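
The in-memory map Michael describes can be sketched as a two-pass import: remember each external key and the node id the inserter returns, then resolve relationship endpoints from that map instead of hitting the Lucene index. In this illustration `createNode` is a hypothetical stand-in for Neo4j's `BatchInserter.createNode(...)` (which likewise returns the new node's `long` id); only the map-caching pattern is the point.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the in-memory map approach: cache key -> node id during node
// creation, then look up relationship endpoints in the map (O(1)) instead
// of querying an index. createNode() is a stand-in for the batch inserter.
public class MapBackedImport {

    private long nextId = 0;

    // stand-in for BatchInserter.createNode(...), which returns the node id
    long createNode(String key) {
        return nextId++;
    }

    // pass 1: create nodes, remembering external key -> node id in memory
    public Map<String, Long> importNodes(List<String> keys) {
        Map<String, Long> keyToNodeId = new HashMap<>(keys.size() * 2);
        for (String key : keys) {
            keyToNodeId.put(key, createNode(key));
        }
        return keyToNodeId;
    }

    // pass 2: resolve relationship endpoints from the map, not the index
    public long[] resolve(Map<String, Long> ids, String from, String to) {
        return new long[] { ids.get(from), ids.get(to) };
    }

    public static void main(String[] args) {
        MapBackedImport imp = new MapBackedImport();
        Map<String, Long> ids = imp.importNodes(List.of("a", "b", "c"));
        long[] rel = imp.resolve(ids, "a", "c");
        System.out.println(rel[0] + " -> " + rel[1]); // 0 -> 2
    }
}
```

The trade-off, as this thread goes on to discuss, is that the map itself lives on the heap, so the approach only works when the key set fits in memory or can be partitioned.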

[Neo4j] Speeding up initial import of graph

2011-06-09 Thread Daniel Hepper
Hi all, I'm struggling with importing a graph with about 10M nodes and 20M relationships, with nodes having 0 to 10 relationships. Creating the nodes takes about 10 minutes, but creating the relationships is slower by several orders of magnitude. I'm using a 2.4 GHz i7 MacBook Pro with 4 GB RAM and

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Paul Bandler
I too am experiencing similar problems, possibly worse than you're seeing, as I am using a very modestly provisioned Windows machine (1.5 GB RAM, max heap set to 1 GB, oldish processor). I found that when using the BatchInserter to load nodes, the heap grew and grew until it was exhausted

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Chris Gioran
Hi Daniel, I am currently working on a tool for importing big data sets into Neo4j graphs. The main problem in such operations is that the usual index implementations are just too slow for retrieving the mapping from keys to created node ids, so a custom solution is needed that is dependent on a

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Michael Hunger
I recreated Daniel's code in Java, mainly because some things were missing from his Scala example. You're right that the index is the bottleneck. But with your small data set it should be possible to cache the 10M nodes in a heap that fits in your machine. I ran it first with the index and had

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Daniel Hepper
I will try caching the nodes in the heap as Michael suggested, and I'll also look into Chris' tool. Thanks everybody for the effort and the suggestions! Daniel On Thu, Jun 9, 2011 at 1:27 PM, Michael Hunger michael.hun...@neotechnology.com wrote: I recreated Daniel's code in Java, mainly

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Paul Bandler
I ran Michael's example import program, with the Map replacing the index, on my more modestly configured machine to see whether the import scaling problems I have reported previously using BatchInserter were reproduced. They were: I gave the program 1 GB of heap and watched it run using

Re: [Neo4j] Speeding up initial import of graph

2011-06-09 Thread Michael Hunger
Please keep in mind that the HashMap of 10M strings → longs will take a substantial amount of heap memory. That's not the fault of Neo4j :) On my system it alone takes 1.8 GB of memory (distributed across the strings, the HashMap entries and the longs). So 3 GB of heap are sensible to run this,
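
A back-of-envelope calculation shows why the map alone costs gigabytes. The per-object sizes below are rough figures for a 64-bit JVM; the real numbers depend on key length, JVM version, and compressed-oops settings, so this only illustrates the order of magnitude behind the 1.8 GB Michael measured, not an exact accounting.

```java
// Rough heap estimate for a HashMap with 10M String -> Long entries.
// Per-object sizes are assumptions typical of a 64-bit JVM, not exact values.
public class MapHeapEstimate {

    static long estimateBytes(long entries) {
        long perString = 64; // String header plus backing char array for a short key
        long perLong   = 24; // boxed Long object
        long perEntry  = 48; // HashMap.Entry plus its share of the bucket table
        return entries * (perString + perLong + perEntry);
    }

    public static void main(String[] args) {
        double gb = estimateBytes(10_000_000L) / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GB of heap just for the map%n", gb); // ~1.3 GB
    }
}
```

Longer keys or a higher per-entry overhead push the figure toward the 1.8 GB observed, which is why a 3 GB heap is the sensible floor for this approach.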