Modifications in a transaction are kept in memory so that the transaction can be rolled back completely if something goes wrong. There could of course be a solution (I'm just speculating here) where, if a transaction gets big enough, it gets converted into its own graph database or some other on-disk data structure, which would then be merged into the main database on commit.
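Until something like that exists, the usual workaround is to commit in micro-batches of a few thousand writes so no single transaction's state grows unbounded in memory. Here's a minimal sketch of just the counting logic; the actual Neo4j calls (tx.success(); tx.finish(); then graphDb.beginTx() again) are stood in for by a Runnable, and the class name BatchCommitter is purely illustrative:

```java
// Sketch: commit every batchSize write operations instead of holding one
// giant transaction in memory. The Runnable stands in for the real
// commit-and-reopen sequence against the database.
public class BatchCommitter {
    private final int batchSize;
    private final Runnable commitAction; // e.g. tx.success(); tx.finish(); tx = graphDb.beginTx();
    private int opsSinceCommit = 0;
    private int commits = 0;

    public BatchCommitter(int batchSize, Runnable commitAction) {
        this.batchSize = batchSize;
        this.commitAction = commitAction;
    }

    /** Call after every write; commits when the current batch is full. */
    public void noteWrite() {
        if (++opsSinceCommit >= batchSize) {
            flush();
        }
    }

    /** Commit whatever is pending (call once after the loop). */
    public void flush() {
        if (opsSinceCommit > 0) {
            commitAction.run();
            commits++;
            opsSinceCommit = 0;
        }
    }

    public int getCommits() {
        return commits;
    }

    public static void main(String[] args) {
        BatchCommitter bc = new BatchCommitter(10_000, () -> { /* commit + reopen tx here */ });
        for (int i = 0; i < 25_000; i++) {
            bc.noteWrite(); // one node/property/index write
        }
        bc.flush(); // commit the final partial batch of 5,000
        System.out.println(bc.getCommits()); // 3 commits: 10k + 10k + 5k
    }
}
```

The same counter could live inside a `Mode.BATCH_INSERT`-style transaction, which is really all the proposed mode would do under the hood.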
Would it actually be worth something to be able to begin a transaction which
auto-commits every X write operations, like a batch-inserter mode that can be
used in a normal EmbeddedGraphDatabase? Kind of like:

    graphDb.beginTx( Mode.BATCH_INSERT )

...so that you can start such a transaction and then just insert data without
having to care about restarting it now and then?

Another view of this is that such big transactions (I'm assuming here) are
only really used for a first-time insertion of a big data set, which is
exactly what the BatchInserter is for... it flushes to disk whenever it feels
like it and you can just keep feeding it more and more data.

2010/7/8 Rick Bullotta <rick.bullo...@burningskysoftware.com>

> Paul, I also would like to see automatic swapping/paging to disk as part of
> Neo4j, minimally when in "bulk insert" mode... and ideally in every usage
> scenario. I don't fully understand why the in-memory logs get so large
> and/or aren't backed by the on-disk log -- or, if they are, why they need
> to be kept in memory as well. Perhaps it isn't the transaction "stuff"
> that is taking up memory, but the graph itself?
>
> Can any of the Neo team help provide some insight?
>
> Thanks!
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Paul A. Jackson
> Sent: Thursday, July 08, 2010 1:35 PM
> To: (User@lists.neo4j.org)
> Subject: [Neo4j] OutOfMemory while populating large graph
>
> I have seen people discuss committing transactions after some microbatch
> of a few hundred records, but I thought this was optional. I thought Neo4j
> would automatically write out to disk as memory became full. Well, I
> encountered an OOM and want to make sure that I understand the reason. Was
> my understanding incorrect? Is there a parameter that I need to set to
> some limit, or is the problem that I am indexing as I go?
> The stack trace, FWIW, is:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.<init>(HashMap.java:209)
>         at java.util.HashSet.<init>(HashSet.java:86)
>         at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
>         at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
>         at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
>         at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
>         at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
>         at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
>         at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
>         at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
>         at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)
>
> Thanks,
> Paul Jackson

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user