I'm not sure it's such a good idea to call tx.success() on every iteration of the loop. I suggest calling it only at the commit, and once after the loop (i.e. move it two lines down).
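To make the suggestion concrete, here is a minimal sketch of the pattern I have in mind. It is not the real Neo4j API: `Tx`, `TxSource` and `CountingSource` are hypothetical stand-ins for `Transaction` and `GraphDatabaseService.beginTx()`, so that the snippet runs without a database, and the batch size is just a parameter. Treat it as an illustration under those assumptions, not as a drop-in replacement for your import code.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCommitSketch {

    /** Hypothetical stand-in for the two Transaction calls discussed here. */
    public interface Tx {
        void success();
        void finish();
    }

    /** Hypothetical stand-in for GraphDatabaseService.beginTx(). */
    public interface TxSource {
        Tx beginTx();
    }

    /** Counts calls so the pattern can be checked without a database. */
    public static class CountingSource implements TxSource {
        public int begun;
        public int succeeded;
        public int finished;

        @Override
        public Tx beginTx() {
            begun++;
            return new Tx() {
                @Override
                public void success() { succeeded++; }

                @Override
                public void finish() { finished++; }
            };
        }
    }

    /** Marks a transaction successful once per batch, and once after the loop. */
    public static void importLines(List<String> lines, TxSource db, int batchSize) {
        Tx tx = db.beginTx();
        try {
            int count = 0;
            for (String line : lines) {
                // ... create nodes and relationships for this line here ...
                if (++count % batchSize == 0) {
                    tx.success();      // mark the full batch as good
                    tx.finish();       // commit it
                    tx = db.beginTx(); // start the next batch
                }
            }
            tx.success();              // mark the last, partial batch as good
        } finally {
            // If an exception skipped success(), the real Neo4j 1.x finish()
            // rolls the open transaction back instead of committing it.
            tx.finish();
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            lines.add("line " + i);
        }
        CountingSource db = new CountingSource();
        importLines(lines, db, 1000); // 1000 is only an illustrative batch size
        System.out.println("begun=" + db.begun
                + " succeeded=" + db.succeeded
                + " finished=" + db.finished);
    }
}
```

Note that the sketch counts with `++count`, so the first commit happens after a full batch rather than on the very first line, which is what `transactionCounter++ % 50000 == 0` does.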
Also, I think a commit size of 50k is a little large. You're probably not going to see much improvement past 10k. In fact I generally only use 1k myself (but I hear 10k is popular too :-)

On Sun, Jul 17, 2011 at 8:53 PM, st3ven <st3...@web.de> wrote:
> Hi,
>
> thanks for your fast answer.
> Right now I'm using Lucene for 6M authors, but my whole dataset consists of
> nearly 25M authors.
> Can I use Lucene there as well? I think it is getting really slow to
> check if a user already exists.
> How can I change my heap memory settings and my memory-map settings, since
> I'm using the transactional mode?
> I think with 25M authors I will get an OutOfMemory exception.
>
> Here is the code that I have written so far:
>
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.IOException;
>
> import org.neo4j.graphdb.GraphDatabaseService;
> import org.neo4j.graphdb.Node;
> import org.neo4j.graphdb.Relationship;
> import org.neo4j.graphdb.Transaction;
> import org.neo4j.graphdb.index.Index;
> import org.neo4j.graphdb.index.IndexHits;
> import org.neo4j.graphdb.index.IndexManager;
> import org.neo4j.kernel.EmbeddedGraphDatabase;
>
> public class WikiGraphRegUser {
>
>     /**
>      * @param args
>      */
>     public static void main(String[] args) throws IOException {
>
>         BufferedReader bf = new BufferedReader(new FileReader(
>                 "E:/wiki0.csv"));
>         WikiGraphRegUser wgru = new WikiGraphRegUser();
>         wgru.createGraphDatabase(bf);
>     }
>
>     private String articleName = "";
>     private GraphDatabaseService db;
>     private IndexManager index;
>     private Index<Node> authorList;
>     private int transactionCounter = 0;
>     private Node article;
>     private boolean isFirstAuthor = false;
>     private Node author;
>     private Relationship relationship;
>     private int node;
>
>     private void createGraphDatabase(BufferedReader bf) {
>         db = new EmbeddedGraphDatabase("target/db");
>         index = db.index();
>         authorList = index.forNodes("Author");
>
>         String zeile;
>         Transaction tx = db.beginTx();
>
>         try {
>             // reads lines of CSV-file
>             while ((zeile = bf.readLine()) != null) {
>                 if (transactionCounter++ % 50000 == 0) {
>
>                     tx.success();
>                     tx.finish();
>                     tx = db.beginTx();
>                 }
>                 // String[] looks like this: Article%;% Timestamp%;% Author
>                 String[] artikelinfo = zeile.split("%;% ");
>                 if (artikelinfo.length != 3) {
>                     System.out.println("ERROR: check CSV");
>                     for (int i = 0; i < artikelinfo.length; i++) {
>                         System.out.println(artikelinfo[i]);
>                     }
>                     return;
>                 }
>
>                 if (articleName == "") {
>                     // create Article and connect with ReferenceNode
>                     article = createArticle(artikelinfo[0],
>                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
>                     articleName = artikelinfo[0];
>
>                     isFirstAuthor = true;
>
>                 } else if (!articleName.equals(artikelinfo[0])) {
>                     // create Article and connect with ReferenceNode
>                     article = createArticle(artikelinfo[0],
>                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
>                     articleName = artikelinfo[0];
>                     isFirstAuthor = true;
>                 }
>                 // checks if author already exists
>                 IndexHits<Node> hits = authorList.get("Author", artikelinfo[2]);
>                 // if new author
>                 if (hits.size() == 0) {
>                     if (isFirstAuthor) {
>                         // creates author and connects him with an article
>                         author = createAndConnectNode(artikelinfo[2], article,
>                                 MyRelationshipTypes.WROTE, artikelinfo[1]);
>                         isFirstAuthor = false;
>                     } else {
>
>                         author = createAndConnectNode(artikelinfo[2], article,
>                                 MyRelationshipTypes.EDIT, artikelinfo[1]);
>                     }
>
>                 } else {
>                     // author already exists
>                     if (isFirstAuthor) {
>                         // create relationship to article
>                         relationship = hits.getSingle().createRelationshipTo(
>                                 article, MyRelationshipTypes.WROTE);
>                         relationship.setProperty("Timestamp", artikelinfo[1]);
>                         isFirstAuthor = false;
>                     } else {
>                         relationship = hits.getSingle().createRelationshipTo(
>                                 article, MyRelationshipTypes.EDIT);
>                         relationship.setProperty("Timestamp", artikelinfo[1]);
>                     }
>
>                 }
>
>                 tx.success();
>             }
>         } catch (Exception e) {
>             tx.failure();
>         } finally {
>             tx.finish();
>         }
>         db.shutdown();
>
>     }
>
>     /**
>      * creates an article and connect it with reference node
>      *
>      * @param name
>      *            Article
>      * @param reference
>      * @param relationship
>      *            Type
>      * @return Article node
>      */
>     private Node createArticle(String name, Node reference,
>             MyRelationshipTypes relationship) {
>         Node node = db.createNode();
>         node.setProperty("Article", name);
>
>         reference.createRelationshipTo(node, relationship);
>         return node;
>     }
>
>     /**
>      * creates an author node and connects him with an article
>      *
>      * @param name
>      *            Author
>      * @param otherNode
>      *            Article
>      * @param relationshipType
>      *            Type
>      * @param timestamp
>      *            Timestamp
>      * @return new Author node
>      */
>     private Node createAndConnectNode(String name, Node otherNode,
>             MyRelationshipTypes relationshipType, String timestamp) {
>
>         Node node = db.createNode();
>         node.setProperty("Name", name);
>         authorList.add(node, "Author", name);
>         relationship = node.createRelationshipTo(otherNode,
>                 relationshipType);
>         relationship.setProperty("Timestamp", timestamp);
>
>         return node;
>     }
>
> }
>
> Maybe you know some optimizations here, because my database is already 20GB
> big with 6M authors and 20M articles.
> The graph looks like this so far:
>
> ReferenceNode ---> Article <---- (Wrote/Edit) Author
>
>
> About the distributed system, I first have to check that at the University.
> We just thought about that, because we want to run some requests against the
> database, like PageRank or node degree, and within a distributed system these
> requests should be faster.
>
>
> Thank you for your help again,
> Stephan
>
> --
> View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/How-to-create-a-graph-database-out-of-a-huge-dataset-tp3177076p3177349.html
> Sent from the Neo4J Community Discussions mailing list archive at
> Nabble.com.
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user