Hello Michael,

I got the zipped file from here:
http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-stub-meta-history.xml.gz
The unzipped file is an XML file. I extracted the important information
(the article name, the timestamp and the author) and saved it in my CSV
file.
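
In case it helps, one line of my CSV has just those three fields, roughly
like this (made-up example values; the ';' separator is only for
illustration):

    Some_Article;2011-05-26T12:00:00Z;SomeAuthor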

>Ok, what parameters do you give your JVM (heap space?) 

Actually, I haven't changed anything so far; I just used the standard
options from Eclipse and Java.
Should I change the heap space, and if so, what are the best settings and
how do I do that?
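
Do you mean the VM arguments in the run configuration in Eclipse?
Something like this (I am only guessing at the values):

    -Xms512m -Xmx2048m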

>The time needed for that adds up as well :) That's what I meant, so
>converting the original csv to a file that costs less time to parse (for
>every import) pays off.

What kind of file would be faster to parse, or should I change something
in my CSV file?
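
At the moment the reading side of my importer is basically just this (a
simplified sketch, not my exact code; the file name and the ';' separator
are only examples):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadRevisions {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader("revisions.csv"));
            String line;
            while ((line = in.readLine()) != null) {
                // one revision per line: article;timestamp;author
                String[] cols = line.split(";");
                String article = cols[0];
                String timestamp = cols[1];
                String author = cols[2];
                // ...node and relationship creation happens here...
            }
            in.close();
        }
    }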

>You're creating millions of relationships to a single node. That might have
>performance implications for later and might have also for now for the
>import.

Actually, I connect every article to the reference node, and that is about
20M relationships.
The authors are only connected to the articles they have written or edited.
Apart from the reference node, I don't see where I create millions of
relationships to a single node, and I can delete the relationships to the
reference node if that is the problem (see my sketch below).
If the author nodes are also a problem, how do I use this sharding key or
the index for them?
To be honest, I don't know what a "sharding" key means. :-)
An example of this would be nice.
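
Just to make sure we are talking about the same thing, here is a
simplified sketch of what one CSV line turns into, plus my guess at how
the index lookup for the authors would look (relationship type names,
property names and values are placeholders; I am assuming the embedded
API):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.index.Index;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class ImportSketch {
        // placeholder relationship type names
        enum RelTypes implements RelationshipType { ARTICLE, EDITED }

        public static void main(String[] args) {
            GraphDatabaseService db = new EmbeddedGraphDatabase("target/wikidb");
            Index<Node> authors = db.index().forNodes("authors");

            Transaction tx = db.beginTx();
            try {
                // node for the article (timestamp handling left out here)
                Node article = db.createNode();
                article.setProperty("title", "Some_Article");

                // the relationship to the reference node -- about 20M of these
                db.getReferenceNode().createRelationshipTo(article, RelTypes.ARTICLE);

                // my guess at the index lookup: create each author only once
                Node author = authors.get("name", "SomeAuthor").getSingle();
                if (author == null) {
                    author = db.createNode();
                    author.setProperty("name", "SomeAuthor");
                    authors.add(author, "name", "SomeAuthor");
                }
                author.createRelationshipTo(article, RelTypes.EDITED);

                tx.success();
            } finally {
                tx.finish();
            }
            db.shutdown();
        }
    }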

Cheers 
Stephan

