Hi Daniel,

I am currently working on a tool for importing large data sets into Neo4j graphs.
The main problem in such operations is that the usual index implementations are
too slow at retrieving the mapping from keys to created node ids, so a custom
solution is needed, one that depends to a varying degree on the distribution of
values in the input set.
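
To make the key-to-node-id mapping concrete, here is a minimal Python sketch of
a two-pass import over the adjacency-list format quoted below (Foo|Bar|Baz),
using a plain in-memory dictionary instead of an index. It is only an
illustration under assumptions: the integer ids are counters standing in for
the ids Neo4j would return on node creation, the names parse_line and
import_graph are hypothetical, and targets that never appear as the first
entry of a line are assumed to get a node on demand.

```python
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split 'Foo|Bar|Baz' into ('Foo', ['Bar', 'Baz'])."""
    source, *targets = line.strip().split("|")
    return source, targets


def import_graph(
    lines: Iterable[str],
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Return (key -> node id, list of (start id, end id) relationships)."""
    rows = [parse_line(line) for line in lines if line.strip()]
    key_to_id: Dict[str, int] = {}

    # Pass 1: one node per line; the counter stands in for a real node id.
    for source, _ in rows:
        key_to_id.setdefault(source, len(key_to_id))

    # Pass 2: resolve both ends through the dictionary, not through an index.
    relationships: List[Tuple[int, int]] = []
    for source, targets in rows:
        for target in targets:
            # Assumption: unseen targets are created on demand.
            target_id = key_to_id.setdefault(target, len(key_to_id))
            relationships.append((key_to_id[source], target_id))
    return key_to_id, relationships
```

This only works while every key fits in memory, which is exactly the limit
that makes a custom solution necessary for larger inputs.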

While your dataset is smaller than the data sizes I deal with, I would like to
use it as a test case. If you could provide the actual data, or something that
emulates it, I would be grateful.

If you want to see my approach, it is available here:

https://github.com/digitalstain/BigDataImport

The core algorithm is an XJoin-style two-level hashing scheme with
adaptable eviction strategies, but it is not production ready yet,
mainly from an API perspective.
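
For readers unfamiliar with the idea, the following Python sketch shows one
way a two-level hashing scheme with eviction can look. It is not taken from
the BigDataImport repository: the class name TwoLevelKeyMap, the CRC32
partitioning, the pickle spill files and the "evict the largest partition"
policy are all assumptions chosen for illustration. The first level hashes a
key to a partition, the second level is an ordinary hash map per partition,
and when the memory budget is exceeded one partition is flushed to disk.

```python
import os
import pickle
import tempfile
import zlib
from typing import Dict, Optional


class TwoLevelKeyMap:
    """Key -> node id map: level 1 picks a partition, level 2 is a dict."""

    def __init__(self, partitions: int = 16, max_in_memory: int = 1_000_000):
        self.partitions = partitions
        self.max_in_memory = max_in_memory
        self.memory: Dict[int, Dict[str, int]] = {
            p: {} for p in range(partitions)
        }
        self.in_memory_count = 0
        self.spill_dir = tempfile.mkdtemp(prefix="keymap-")

    def _partition(self, key: str) -> int:
        # Level 1: a stable hash decides which partition owns the key.
        return zlib.crc32(key.encode("utf-8")) % self.partitions

    def _spill_path(self, partition: int) -> str:
        return os.path.join(self.spill_dir, f"part-{partition}.pkl")

    def _load_spilled(self, partition: int) -> Dict[str, int]:
        path = self._spill_path(partition)
        if not os.path.exists(path):
            return {}
        with open(path, "rb") as handle:
            return pickle.load(handle)

    def _evict_largest(self) -> None:
        # Assumed policy: flush the partition holding the most entries.
        victim = max(self.memory, key=lambda p: len(self.memory[p]))
        spilled = self._load_spilled(victim)
        spilled.update(self.memory[victim])
        with open(self._spill_path(victim), "wb") as handle:
            pickle.dump(spilled, handle)
        self.in_memory_count -= len(self.memory[victim])
        self.memory[victim] = {}

    def put(self, key: str, node_id: int) -> None:
        bucket = self.memory[self._partition(key)]
        if key not in bucket:
            self.in_memory_count += 1
        bucket[key] = node_id  # Level 2: plain hash map inside the partition.
        if self.in_memory_count > self.max_in_memory:
            self._evict_largest()

    def get(self, key: str) -> Optional[int]:
        partition = self._partition(key)
        if key in self.memory[partition]:
            return self.memory[partition][key]
        # Fall back to the on-disk part of the same partition.
        return self._load_spilled(partition).get(key)
```

Reading a whole spilled partition for a single lookup is deliberately naive;
a practical importer would more likely group lookups by partition so that
each spill file is read once per batch.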

You can contact me directly for any details regarding this issue.

cheers,
CG

On Thu, Jun 9, 2011 at 12:59 PM, Daniel Hepper <daniel.hep...@gmail.com> wrote:
> Hi all,
>
> I'm struggling with importing a graph with about 10m nodes and 20m
> relationships, with nodes having 0 to 10 relationships. Creating the
> nodes takes about 10 minutes, but creating the relationships is slower
> by several orders of magnitude. I'm using a 2.4 GHz i7 MacBook Pro with
> 4 GB of RAM and a conventional HDD.
>
> The graph is stored as an adjacency list in a text file where each line
> has this form:
>
> Foo|Bar|Baz
> (Node Foo has relations to Bar and Baz)
>
> My current approach is to iterate over the whole file twice. In the
> first run, I create a node with the property "name" for the first
> entry in the line (Foo in this case) and add it to an index.
> In the second run, I get the start node and the end nodes from the
> index by name and create the relationships.
>
> My code can be found here: http://pastie.org/2041801
>
> With my approach, the best I can achieve is 100 created relationships
> per second.
> I experimented with mapped memory settings, but without much effect.
> Is this the speed I can expect?
> Any advice on how to speed up this process?
>
> Best regards,
> Daniel Hepper
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
