Modifications in a transaction are kept in memory so that the transaction can be rolled back completely if something goes wrong. There could of course be a solution (I'm just speculating here) where, if a transaction gets big enough, it gets converted into its own graph database or some other on-disk data structure, which would then be merged into the main database on commit.
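Until something like that exists, the usual workaround is to commit in micro-batches of a few thousand writes so no single transaction's state grows unbounded in memory. Here's a minimal sketch of just the counting logic; the actual Neo4j calls (tx.success(); tx.finish(); then graphDb.beginTx() again) are stood in for by a Runnable, and the class name BatchCommitter is purely illustrative:

```java
// Sketch: commit every batchSize write operations instead of holding one
// giant transaction in memory. The Runnable stands in for the real
// commit-and-reopen sequence against the database.
public class BatchCommitter {
    private final int batchSize;
    private final Runnable commitAction; // e.g. tx.success(); tx.finish(); tx = graphDb.beginTx();
    private int opsSinceCommit = 0;
    private int commits = 0;

    public BatchCommitter(int batchSize, Runnable commitAction) {
        this.batchSize = batchSize;
        this.commitAction = commitAction;
    }

    /** Call after every write; commits when the current batch is full. */
    public void noteWrite() {
        if (++opsSinceCommit >= batchSize) {
            flush();
        }
    }

    /** Commit whatever is pending (call once after the loop). */
    public void flush() {
        if (opsSinceCommit > 0) {
            commitAction.run();
            commits++;
            opsSinceCommit = 0;
        }
    }

    public int getCommits() {
        return commits;
    }

    public static void main(String[] args) {
        BatchCommitter bc = new BatchCommitter(10_000, () -> { /* commit + reopen tx here */ });
        for (int i = 0; i < 25_000; i++) {
            bc.noteWrite(); // one node/property/index write
        }
        bc.flush(); // commit the final partial batch of 5,000
        System.out.println(bc.getCommits()); // 3 commits: 10k + 10k + 5k
    }
}
```

The same counter could live inside a `Mode.BATCH_INSERT`-style transaction, which is really all the proposed mode would do under the hood.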
Would it actually be worth something to be able to begin a transaction which
auto-commits every X write operations, like a batch-inserter mode that can be
used in a normal EmbeddedGraphDatabase? Kind of like:

    graphDb.beginTx( Mode.BATCH_INSERT )

...so that you can start such a transaction and then just insert data without
having to care about restarting it now and then?

Another view of this is that such big transactions (I'm assuming here) are
only really used for a first-time insertion of a big data set, which is
exactly what the BatchInserter is for... it flushes to disk whenever it feels
like it and you can just keep feeding it more and more data.

2010/7/8 Rick Bullotta <rick.bullo...@burningskysoftware.com>

> Paul, I also would like to see automatic swapping/paging to disk as part of
> Neo4j, minimally when in "bulk insert" mode... and ideally in every usage
> scenario. I don't fully understand why the in-memory logs get so large
> and/or aren't backed by the on-disk log -- or, if they are, why they need
> to be kept in memory as well. Perhaps it isn't the transaction "stuff"
> that is taking up memory, but the graph itself?
>
> Can any of the Neo team help provide some insight?
>
> Thanks!
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Paul A. Jackson
> Sent: Thursday, July 08, 2010 1:35 PM
> To: (User@lists.neo4j.org)
> Subject: [Neo4j] OutOfMemory while populating large graph
>
> I have seen people discuss committing transactions after some microbatch
> of a few hundred records, but I thought this was optional. I thought Neo4j
> would automatically write out to disk as memory became full. Well, I
> encountered an OOM and want to make sure that I understand the reason. Was
> my understanding incorrect? Is there a parameter that I need to set to
> some limit, or is the problem that I am indexing as I go?
> The stack trace, FWIW, is:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.<init>(HashMap.java:209)
>         at java.util.HashSet.<init>(HashSet.java:86)
>         at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
>         at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
>         at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
>         at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
>         at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
>         at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
>         at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
>         at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
>         at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)
>
> Thanks,
> Paul Jackson

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user