Re: [Neo4j] OutOfMemory while populating large graph

2010-07-10 Thread Mattias Persson
Great, so maybe neo4j-index should be updated to depend on Lucene 2.9.3.

2010/7/9 Bill Janssen jans...@parc.com

 Note that a couple of memory issues are fixed in Lucene 2.9.3.  Leaking
 when indexing big docs, and indolent reclamation of space from the
 FieldCache.

 Bill

 Arijit Mukherjee ariji...@gmail.com wrote:

  I've a similar problem. Although I'm not going out of memory yet, I can
 see
  the heap constantly growing, and JProfiler says most of it is due to the
  Lucene indexing. And even if I do the commit after every X transactions,
  once the population is finished, the final commit is done, and the graph
 db
  closed - the heap stays like that - almost full. An explicit gc will
 clean
  up some part, but not fully.
 
  Arijit
 
  On 9 July 2010 17:00, Mattias Persson matt...@neotechnology.com wrote:
 
   2010/7/9 Marko Rodriguez okramma...@gmail.com
  
Hi,
   
 Would it actually be worth something to be able to begin a
 transaction
which
 auto-committs stuff every X write operation, like a batch inserter
 mode
 which can be used in normal EmbeddedGraphDatabase? Kind of like:

graphDb.beginTx( Mode.BATCH_INSERT )

 ...so that you can start such a transaction and then just insert
 data
 without having to care about restarting it now and then?
   
Thats cool! Does that already exist? In my code (like others on the
 list
   it
seems) I have a counter++ that every 20,000 inserts (some made up
 number
that is not going to throw an OutOfMemory) commits and the reopens a
 new
transaction. Sorta sux.
   
  
   No it doesn't, I just wrote stuff which I though someone could think of
 as
   useful. A cool thing with just telling it to do a batch insert mode
   transaction (not the actual commit interval) is that it could look at
 how
   much memory it had to play around with and commit whenever it would be
 the
   most efficient, even having the ability to change the limit on the fly
 if
   the memory suddenly ran out.
  
  
Thanks,
Marko.
   
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
   
  
  
  
   --
   Mattias Persson, [matt...@neotechnology.com]
   Hacker, Neo Technology
   www.neotechnology.com
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
 
 
 
  --
  And when the night is cloudy,
  There is still a light that shines on me,
  Shine on until tomorrow, let it be.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Mattias Persson
Modifications in a transaction are kept in memory so that the transaction can
be rolled back completely if something goes wrong. There could of course be a
solution (I'm just speculating here) where, if a tx gets big enough, it gets
converted into its own graph database or some other on-disk data structure,
which would then be merged into the main database on commit.

Would it actually be worth something to be able to begin a transaction which
auto-commits every X write operations, like a batch inserter mode which can be
used in a normal EmbeddedGraphDatabase? Kind of like:

graphDb.beginTx( Mode.BATCH_INSERT )

...so that you can start such a transaction and then just insert data
without having to care about restarting it now and then?

Another view of this is that such big transactions (I'm assuming here) are
only really used for a first-time insertion of a big data set, where the
BatchInserter can be used and does exactly that... it flushes to disk whenever
it sees fit and you can just keep feeding it more and more data.
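For reference, that path looks roughly like this - a minimal sketch from memory
of the batch inserter and old index component APIs, so treat the exact class
names as approximate:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.index.lucene.LuceneIndexBatchInserter;
import org.neo4j.index.lucene.LuceneIndexBatchInserterImpl;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class BulkLoad
{
    public static void main( String[] args )
    {
        // The batch inserter bypasses transactions and flushes to disk itself
        BatchInserter inserter = new BatchInserterImpl( "target/batch-db" );
        LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter );
        try
        {
            for ( int i = 0; i < 1000000; i++ )
            {
                Map<String, Object> props = new HashMap<String, Object>();
                props.put( "name", "node-" + i );
                long nodeId = inserter.createNode( props );
                index.index( nodeId, "name", "node-" + i );
            }
        }
        finally
        {
            index.shutdown();
            inserter.shutdown();
        }
    }
}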

2010/7/8 Rick Bullotta rick.bullo...@burningskysoftware.com

 Paul, I also would like to see automatic swapping/paging to disk as part of
 Neo4J, minimally when in bulk insert mode...and ideally in every usage
 scenario.  I don't fully understand why the in-memory logs get so large
 and/or aren't backed by the on-disk log, or if they are, why they need to
 be
 kept in memory as well.  Perhaps it isn't the transaction stuff that is
 taking up memory, but the graph itself?

 Can any of the Neo team help provide some insight?

 Thanks!


 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On
 Behalf Of Paul A. Jackson
 Sent: Thursday, July 08, 2010 1:35 PM
 To: (User@lists.neo4j.org)
 Subject: [Neo4j] OutOfMemory while populating large graph

 I have seen people discuss committing transactions after some microbatch of
 a few hundred records, but I thought this was optional.  I thought Neo4J
 would automatically write out to disk as memory became full.  Well, I
 encountered an OOM and want to make sure that I understand the reason.  Was
 my understanding incorrect, or is there a parameter that I need to set to
 some limit, or is the problem them I am indexing as I go.  The stack trace,
 FWIW, is:

 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
         at java.util.HashMap.<init>(HashMap.java:209)
         at java.util.HashSet.<init>(HashSet.java:86)
         at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
         at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
         at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
         at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
         at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
         at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
         at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
         at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
         at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)

 Thanks,
 Paul Jackson
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Marko Rodriguez
Hi,

 Would it actually be worth something to be able to begin a transaction which
 auto-committs stuff every X write operation, like a batch inserter mode
 which can be used in normal EmbeddedGraphDatabase? Kind of like:
 
graphDb.beginTx( Mode.BATCH_INSERT )
 
 ...so that you can start such a transaction and then just insert data
 without having to care about restarting it now and then?

That's cool! Does that already exist? In my code (like others on the list it 
seems) I have a counter++ that every 20,000 inserts (some made-up number that 
is not going to throw an OutOfMemory) commits and then reopens a new 
transaction. Sorta sux.
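In other words, roughly this (a simplified sketch; the string data here just
stands in for whatever is actually being inserted):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// The counter++ pattern: commit and reopen the transaction every 20,000 inserts
void insertAll( GraphDatabaseService graphDb, Iterable<String> names )
{
    Transaction tx = graphDb.beginTx();
    try
    {
        int counter = 0;
        for ( String name : names )
        {
            Node node = graphDb.createNode();
            node.setProperty( "name", name );
            if ( ++counter % 20000 == 0 )
            {
                // commit and start a new tx so its state doesn't fill the heap
                tx.success();
                tx.finish();
                tx = graphDb.beginTx();
            }
        }
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}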

Thanks,
Marko.

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Arijit Mukherjee
I have a similar problem. Although I'm not running out of memory yet, I can see
the heap constantly growing, and JProfiler says most of it is due to the
Lucene indexing. And even if I commit after every X transactions, once the
population is finished, the final commit is done and the graph db is closed,
the heap stays like that - almost full. An explicit GC will clean up some of
it, but not all.

Arijit

On 9 July 2010 17:00, Mattias Persson matt...@neotechnology.com wrote:

 2010/7/9 Marko Rodriguez okramma...@gmail.com

  Hi,
 
   Would it actually be worth something to be able to begin a transaction
  which
   auto-committs stuff every X write operation, like a batch inserter mode
   which can be used in normal EmbeddedGraphDatabase? Kind of like:
  
  graphDb.beginTx( Mode.BATCH_INSERT )
  
   ...so that you can start such a transaction and then just insert data
   without having to care about restarting it now and then?
 
  Thats cool! Does that already exist? In my code (like others on the list
 it
  seems) I have a counter++ that every 20,000 inserts (some made up number
  that is not going to throw an OutOfMemory) commits and the reopens a new
  transaction. Sorta sux.
 

 No it doesn't, I just wrote stuff which I though someone could think of as
 useful. A cool thing with just telling it to do a batch insert mode
 transaction (not the actual commit interval) is that it could look at how
 much memory it had to play around with and commit whenever it would be the
 most efficient, even having the ability to change the limit on the fly if
 the memory suddenly ran out.


  Thanks,
  Marko.
 
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 



 --
 Mattias Persson, [matt...@neotechnology.com]
 Hacker, Neo Technology
 www.neotechnology.com
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Rick Bullotta
Short answer is maybe. ;-)

There are some cases where the transaction is an all-or-nothing scenario, and
others where incremental commits are OK.  Having the ability to do incremental
auto-commits would be useful, however.  In a perfect world, it could be based on
a bucket (e.g. every XXX write operations), a time interval (e.g. every 30
seconds), or a memory usage rule.
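Something like the following, perhaps - purely a made-up sketch on top of the
existing Transaction API, nothing that exists today:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

// Hypothetical helper: commits and reopens the transaction when an operation
// count, a time interval or a crude low-memory condition is hit.
class AutoCommittingTx
{
    private final GraphDatabaseService graphDb;
    private final int maxOps;
    private final long maxMillis;
    private Transaction tx;
    private int ops;
    private long startTime;

    AutoCommittingTx( GraphDatabaseService graphDb, int maxOps, long maxMillis )
    {
        this.graphDb = graphDb;
        this.maxOps = maxOps;
        this.maxMillis = maxMillis;
        begin();
    }

    private void begin()
    {
        tx = graphDb.beginTx();
        ops = 0;
        startTime = System.currentTimeMillis();
    }

    /** Call once after each write operation. */
    void operationPerformed()
    {
        Runtime rt = Runtime.getRuntime();
        boolean lowMemory = rt.freeMemory() < rt.totalMemory() / 10; // crude rule
        boolean timedOut = System.currentTimeMillis() - startTime > maxMillis;
        if ( ++ops >= maxOps || timedOut || lowMemory )
        {
            finish();
            begin();
        }
    }

    void finish()
    {
        tx.success();
        tx.finish();
    }
}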

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Mattias Persson
Sent: Friday, July 09, 2010 7:30 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] OutOfMemory while populating large graph

2010/7/9 Marko Rodriguez okramma...@gmail.com

 Hi,

  Would it actually be worth something to be able to begin a transaction
 which
  auto-committs stuff every X write operation, like a batch inserter mode
  which can be used in normal EmbeddedGraphDatabase? Kind of like:
 
 graphDb.beginTx( Mode.BATCH_INSERT )
 
  ...so that you can start such a transaction and then just insert data
  without having to care about restarting it now and then?

 Thats cool! Does that already exist? In my code (like others on the list
it
 seems) I have a counter++ that every 20,000 inserts (some made up number
 that is not going to throw an OutOfMemory) commits and the reopens a new
 transaction. Sorta sux.


No it doesn't, I just wrote stuff which I thought someone could think of as
useful. A cool thing with just telling it to do a batch insert mode
transaction (not the actual commit interval) is that it could look at how
much memory it had to play around with and commit whenever it would be the
most efficient, even having the ability to change the limit on the fly if
the memory suddenly ran out.


 Thanks,
 Marko.

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Paul A. Jackson
I confess I had not investigated the batch inserter.  From the description it 
fits my requirements exactly.

With respect to auto-commits, it seems there are two use cases.  The first is 
everyday operations that might run out of memory.  In this case it might be 
nice for neo4j to swap memory out to temporary disk storage as needed.  If this 
performs acceptably, I think that should be the default behavior.  The second 
case is the initial population of a graph, where there is no need for rollback 
and so there is no need to commit to a temporary location.  In this case, it 
seems having neo4j decide when to commit would be ideal.

My concern with the first use case is that swapping to temporary storage at 
ideal intervals may be less efficient than having the user commit to permanent 
storage at less-than-ideal intervals.  If that is the case, then the only real 
justification for committing to temporary storage would be if there was a 
requirement to potentially roll back a transaction that was larger than memory 
could support.

-Paul


-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Mattias Persson
Sent: Friday, July 09, 2010 7:30 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] OutOfMemory while populating large graph

2010/7/9 Marko Rodriguez okramma...@gmail.com

 Hi,

  Would it actually be worth something to be able to begin a transaction
 which
  auto-committs stuff every X write operation, like a batch inserter mode
  which can be used in normal EmbeddedGraphDatabase? Kind of like:
 
 graphDb.beginTx( Mode.BATCH_INSERT )
 
  ...so that you can start such a transaction and then just insert data
  without having to care about restarting it now and then?

 Thats cool! Does that already exist? In my code (like others on the list it
 seems) I have a counter++ that every 20,000 inserts (some made up number
 that is not going to throw an OutOfMemory) commits and the reopens a new
 transaction. Sorta sux.


No it doesn't, I just wrote stuff which I thought someone could think of as
useful. A cool thing with just telling it to do a batch insert mode
transaction (not the actual commit interval) is that it could look at how
much memory it had to play around with and commit whenever it would be the
most efficient, even having the ability to change the limit on the fly if
the memory suddenly ran out.


 Thanks,
 Marko.

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-09 Thread Bill Janssen
Note that a couple of memory issues are fixed in Lucene 2.9.3: a leak when
indexing big docs, and indolent reclamation of space from the FieldCache.

Bill

Arijit Mukherjee ariji...@gmail.com wrote:

 I've a similar problem. Although I'm not going out of memory yet, I can see
 the heap constantly growing, and JProfiler says most of it is due to the
 Lucene indexing. And even if I do the commit after every X transactions,
 once the population is finished, the final commit is done, and the graph db
 closed - the heap stays like that - almost full. An explicit gc will clean
 up some part, but not fully.
 
 Arijit
 
 On 9 July 2010 17:00, Mattias Persson matt...@neotechnology.com wrote:
 
  2010/7/9 Marko Rodriguez okramma...@gmail.com
 
   Hi,
  
Would it actually be worth something to be able to begin a transaction
   which
auto-committs stuff every X write operation, like a batch inserter mode
which can be used in normal EmbeddedGraphDatabase? Kind of like:
   
   graphDb.beginTx( Mode.BATCH_INSERT )
   
...so that you can start such a transaction and then just insert data
without having to care about restarting it now and then?
  
   Thats cool! Does that already exist? In my code (like others on the list
  it
   seems) I have a counter++ that every 20,000 inserts (some made up number
   that is not going to throw an OutOfMemory) commits and the reopens a new
   transaction. Sorta sux.
  
 
  No it doesn't, I just wrote stuff which I though someone could think of as
  useful. A cool thing with just telling it to do a batch insert mode
  transaction (not the actual commit interval) is that it could look at how
  much memory it had to play around with and commit whenever it would be the
  most efficient, even having the ability to change the limit on the fly if
  the memory suddenly ran out.
 
 
   Thanks,
   Marko.
  
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
 
 
 
  --
  Mattias Persson, [matt...@neotechnology.com]
  Hacker, Neo Technology
  www.neotechnology.com
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
 
 
 
 -- 
 And when the night is cloudy,
 There is still a light that shines on me,
 Shine on until tomorrow, let it be.
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] OutOfMemory while populating large graph

2010-07-08 Thread Paul A. Jackson
I have seen people discuss committing transactions after some microbatch of a 
few hundred records, but I thought this was optional.  I thought Neo4J would 
automatically write out to disk as memory became full.  Well, I encountered an 
OOM and want to make sure that I understand the reason.  Was my understanding 
incorrect, or is there a parameter that I need to set to some limit, or is the 
problem that I am indexing as I go?  The stack trace, FWIW, is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.<init>(HashMap.java:209)
        at java.util.HashSet.<init>(HashSet.java:86)
        at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
        at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
        at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
        at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
        at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
        at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
        at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
        at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
        at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)
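For what it's worth, the loading pattern boils down to roughly this (a heavily
simplified sketch; JiraLoader internals omitted): one transaction around the
whole load, with each property also indexed as it is set.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.index.lucene.LuceneIndexService;

// One big transaction around the whole load; the per-transaction Lucene state
// (LuceneTransaction$TxCache in the trace above) is what keeps growing.
void loadEverything( GraphDatabaseService graphDb, Iterable<String> keys )
{
    LuceneIndexService index = new LuceneIndexService( graphDb );
    Transaction tx = graphDb.beginTx();
    try
    {
        for ( String key : keys )
        {
            Node node = graphDb.createNode();
            node.setProperty( "key", key );
            index.index( node, "key", key ); // kept in memory until commit
        }
        tx.success();
    }
    finally
    {
        tx.finish();
        index.shutdown();
    }
}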

Thanks,
Paul Jackson
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] OutOfMemory while populating large graph

2010-07-08 Thread Rick Bullotta
Paul, I also would like to see automatic swapping/paging to disk as part of
Neo4J, minimally when in bulk insert mode...and ideally in every usage
scenario.  I don't fully understand why the in-memory logs get so large
and/or aren't backed by the on-disk log, or if they are, why they need to be
kept in memory as well.  Perhaps it isn't the transaction stuff that is
taking up memory, but the graph itself?

Can any of the Neo team help provide some insight?

Thanks!


-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Paul A. Jackson
Sent: Thursday, July 08, 2010 1:35 PM
To: (User@lists.neo4j.org)
Subject: [Neo4j] OutOfMemory while populating large graph

I have seen people discuss committing transactions after some microbatch of
a few hundred records, but I thought this was optional.  I thought Neo4J
would automatically write out to disk as memory became full.  Well, I
encountered an OOM and want to make sure that I understand the reason.  Was
my understanding incorrect, or is there a parameter that I need to set to
some limit, or is the problem that I am indexing as I go?  The stack trace,
FWIW, is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.<init>(HashMap.java:209)
        at java.util.HashSet.<init>(HashSet.java:86)
        at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
        at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
        at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
        at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
        at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
        at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
        at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
        at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
        at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)

Thanks,
Paul Jackson
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user