Re: [Neo4j] How to create a graph database out of a huge dataset?

Michael Hunger Sun, 17 Jul 2011 10:13:21 -0700

Stephan,

This is a common problem when importing large amounts of data.


You should be able to use Lucene in both settings, transactional and batch insertion (6M authors is not that much).

Please have a look at your heap memory settings (and, in transactional mode, also 
at your memory-mapping settings for Neo4j).

With the batch inserter, you can query the index after you have called flush() on 
it. The lookup returns a node id, which you can then use to create a 
relationship to the node you looked up.

If you could share the code you've already written, we can have a look at 
it.

The simplest approach is to keep the authors in a HashMap with their name as the 
key (while still indexing them for later use). Just make sure your heap is large 
enough to hold the map and the objects created during the insert.
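A minimal sketch of the HashMap approach, in plain Java so it runs without Neo4j on the classpath. It assumes the three-column CSV format from the question quoted below (article, timestamp, author). The createNode method is a hypothetical stand-in that hands out sequential ids where a real import would create a node (for example through the batch inserter), and the relationships list stands in for the relationships you would actually create.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvGraphImport {
    // Hypothetical stand-in for the database: node ids are handed out sequentially.
    private long nextNodeId = 0;

    // Name -> node id caches, so each article and each author is created only once.
    final Map<String, Long> articles = new HashMap<>();
    final Map<String, Long> authors = new HashMap<>();

    // Each entry is {articleNodeId, authorNodeId}; stands in for created relationships.
    final List<long[]> relationships = new ArrayList<>();

    // Hypothetical placeholder for the real node-creation call.
    private long createNode() {
        return nextNodeId++;
    }

    // Return the node id for a name, creating the node the first time the name is seen.
    private long getOrCreate(Map<String, Long> cache, String name) {
        Long id = cache.get(name);
        if (id == null) {
            id = createNode();
            cache.put(name, id);
        }
        return id;
    }

    // Process one line of the form "Article, Timestamp, Author".
    // Naive split: assumes no quoted fields containing commas; the timestamp is ignored here.
    public void importLine(String line) {
        String[] fields = line.split(",");
        long articleId = getOrCreate(articles, fields[0].trim());
        long authorId = getOrCreate(authors, fields[2].trim());
        relationships.add(new long[] {articleId, authorId});
    }

    public static void main(String[] args) {
        String[] csv = {
            "Article A, Timestamp, Author A",
            "Article A, Timestamp, Author B",
            "Article A, Timestamp, Author C",
            "Article B, Timestamp, Author A",
            "Article B, Timestamp, Author B",
            "Article B, Timestamp, Author D"
        };
        CsvGraphImport importer = new CsvGraphImport();
        for (String line : csv) {
            importer.importLine(line);
        }
        // Prints "2 articles, 4 authors, 6 relationships"
        System.out.println(importer.articles.size() + " articles, "
                + importer.authors.size() + " authors, "
                + importer.relationships.size() + " relationships");
    }
}
```

Each lookup is a constant-time map access instead of an index query, at the cost of one map entry per distinct author held in memory for the whole import.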


#2: Can you describe your distributed system? You can run a cluster of Neo4j HA 
instances that each of your distributed engines connects to. (HA instances can 
run embedded or as Neo4j servers, depending on your needs.) See: 
http://docs.neo4j.org/chunked/stable/ha.html

Cheers

Michael

Am 17.07.2011 um 18:31 schrieb st3ven:

> Hi all,
> 
> I'm new to neo4j and graph databases.
> To create my graph database I got two questions for you:
> 
> 1.
> I want to create a graph database out of a huge CSV file.
> The problem is that I need to index the nodes I have already created so
> that I don't create duplicate nodes.
> 
> My CSV file looks like this:
> 
> Article A, Timestamp, Author A
> Article A, Timestamp, Author B
> Article A, Timestamp, Author C
> Article B, Timestamp, Author A
> Article B, Timestamp, Author B
> Article B, Timestamp, Author D
> 
> As you can see I need to access nodes I have already created and connect
> them to the next Article.
> Right now I'm using the LuceneIndex, but with around 6M authors this is
> getting really slow.
> Is there any other possibility to access nodes that I have already created?
> BatchInserter also doesn't work, because there you can't access nodes which
> you have created before.
> 
> 2.
> Is it possible to use neo4j in a distributed system?
> If it is possible, are there any guides or tutorials on how to set that up?
> 
> Thanks for your help,
> Stephan
> 
> 
> --
> View this message in context: 
> http://neo4j-community-discussions.438527.n3.nabble.com/How-to-create-a-graph-database-out-of-a-huge-dataset-tp3177076p3177076.html
> Sent from the Neo4J Community Discussions mailing list archive at Nabble.com.
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
