Re: Out of memory when putting many rows in an Acc table

2014-10-01 Thread Eric Newton

 I realized this could be due to an inability by the JVM to create
 additional native threads


You may need to increase the nproc limit on your systems.

-Eric


On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts threadedb...@gmail.com
wrote:

 Thanks for the response.

 The only reason I was creating a new BatchWriter periodically was to
 determine whether BatchWriter was holding on to memory even after a flush.  I
 had the memory on my BatchWriterConfig set to 1M already.  I am reading my RDB
 tables in pages of 10K rows.

 Bumping up the JVM size didn't help.

 I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
 produce any output (an hprof file), I realized this could be due to an
 inability by the JVM to create additional native threads.

 What I now think is that the problem is not with Acc directly but is hiding
 out on the JDBC side.

 Perhaps this is not an Acc issue at all but merely masquerading as one.
 We'll see.


 On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser josh.el...@gmail.com wrote:

 You shouldn't have to create a new BatchWriter -- have you tried reducing
 the amount of memory the BatchWriter will use? It keeps a cache internally
 to try to do an amortization of Mutations to send to a given tabletserver.

 To limit this memory, use the BatchWriterConfig#setMaxMemory(long)
 method. By default, the maxMemory value is set to 50MB. Reducing this may
 be enough to hold less data in your client and give you some more head room.

 Alternatively, you could give your client JVM some more heap :)


 Geoffry Roberts wrote:

 I am trying to pump some data into Accumulo, but I keep encountering

 Exception in thread "Thrift Connection Pool Checker" java.lang.OutOfMemoryError: Java heap space
     at java.util.HashMap.newValueIterator(HashMap.java:971)
     at java.util.HashMap$Values.iterator(HashMap.java:1038)
     at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
     at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
     at java.lang.Thread.run(Thread.java:745)


 I tried, as a workaround, creating a new BatchWriter and closing the
 old one every ten thousand rows, but to no avail.  Data gets written up
 to the 200,000th row, then the error.

 I have a table of 8M rows in an RDB that I am pumping into Acc via a
 Groovy script.  The rows are narrow: a short text field and four floats.

 I googled of course but nothing was helpful.  What can be done?

 Thanks so much.

 --
 There are ways and there are ways,

 Geoffry Roberts




 --
 There are ways and there are ways,

 Geoffry Roberts



Re: Out of memory when putting many rows in an Acc table

2014-10-01 Thread Josh Elser

Or nofile (too). ulimit is your friend :)

Eric Newton wrote:

I realized this could be due to an inability by the JVM to create
additional native threads


You may need to increase the nproc limit on your systems.

-Eric


On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts threadedb...@gmail.com wrote:

Thanks for the response.

The only reason I was creating a new BatchWriter periodically was to
determine whether BatchWriter was holding on to memory even after a
flush.  I had the memory on my BatchWriterConfig set to 1M already.  I
am reading my RDB tables in pages of 10K rows.

Bumping up the JVM size didn't help.

I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
produce any output (an hprof file), I realized this could be due to an
inability by the JVM to create additional native threads.

What I now think is that the problem is not with Acc directly but is
hiding out on the JDBC side.

Perhaps this is not an Acc issue at all but merely masquerading as
one.  We'll see.


On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser josh.el...@gmail.com wrote:

You shouldn't have to create a new BatchWriter -- have you tried
reducing the amount of memory the BatchWriter will use? It keeps
a cache internally to try to do an amortization of Mutations to
send to a given tabletserver.

To limit this memory, use the
BatchWriterConfig#setMaxMemory(long) method. By default, the
maxMemory value is set to 50MB. Reducing this may be enough to
hold less data in your client and give you some more head room.

Alternatively, you could give your client JVM some more heap :)


Geoffry Roberts wrote:

I am trying to pump some data into Accumulo, but I keep encountering

Exception in thread "Thrift Connection Pool Checker" java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.newValueIterator(HashMap.java:971)
    at java.util.HashMap$Values.iterator(HashMap.java:1038)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
    at java.lang.Thread.run(Thread.java:745)


I tried, as a workaround, creating a new BatchWriter and closing the
old one every ten thousand rows, but to no avail.  Data gets written
up to the 200,000th row, then the error.

I have a table of 8M rows in an RDB that I am pumping into Acc via a
Groovy script.  The rows are narrow: a short text field and four
floats.

I googled of course but nothing was helpful.  What can be done?

Thanks so much.

--
There are ways and there are ways,

Geoffry Roberts




--
There are ways and there are ways,

Geoffry Roberts




Re: Out of memory when putting many rows in an Acc table

2014-10-01 Thread Geoffry Roberts
Apparently, my JDBC driver, no matter what settings I use, always tries
to load the entire table into memory.  I tried using Groovy's page facility,
thinking it would fetch rows in increments of, say, 10K.  It creates the
impression that it's doing this, but behind the scenes it still tries to
read in the entire table.

Whatever! The whole thing was cleaning my clock.  So I rolled my own paging
mechanism, which works by resubmitting the query with an offset and limit, so
I only get as many rows at a time as I want.  And it is working.
BatchWriter is doing fine.  I'm getting my 8M row table and will move on to
the bigger ones.
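
For anyone hitting the same thing, a rough sketch of this kind of offset/limit paging into a BatchWriter follows. The table name, columns, and column family below are placeholders for illustration, not my actual script, and the real version is in Groovy rather than Java.

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class PagedCopy {

  static final int PAGE_SIZE = 10_000;

  // Re-issue the query with LIMIT/OFFSET so only one page of rows is in memory at a time.
  static void copy(Connection jdbc, BatchWriter writer) throws Exception {
    String sql = "SELECT id, f1, f2, f3, f4 FROM source_table ORDER BY id LIMIT ? OFFSET ?";
    long offset = 0;
    int rowsInPage;
    do {
      rowsInPage = 0;
      try (PreparedStatement ps = jdbc.prepareStatement(sql)) {
        ps.setInt(1, PAGE_SIZE);
        ps.setLong(2, offset);
        try (ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            rowsInPage++;
            // One Mutation per source row: the row id becomes the Accumulo row, floats become values.
            Mutation m = new Mutation(new Text(rs.getString("id")));
            for (int i = 1; i <= 4; i++) {
              String col = "f" + i;
              m.put(new Text("data"), new Text(col),
                  new Value(Float.toString(rs.getFloat(col)).getBytes(StandardCharsets.UTF_8)));
            }
            writer.addMutation(m);
          }
        }
      }
      offset += rowsInPage;
    } while (rowsInPage == PAGE_SIZE);  // a short page means the last rows have been read
    writer.flush();
  }
}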

For the record, the RDB is MySQL using the latest driver.
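
An alternative I have not tried here: MySQL Connector/J is generally said to stream a result set row by row, instead of buffering the whole table, when the statement is forward-only and read-only and the fetch size is set to Integer.MIN_VALUE. A minimal sketch, with "source_table" as a placeholder:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingRead {

  // Open a streaming cursor so the driver does not hold the whole table in client memory.
  static ResultSet openStreamingCursor(Connection conn) throws Exception {
    Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);  // Connector/J hint to stream rows one at a time
    return stmt.executeQuery("SELECT id, f1, f2, f3, f4 FROM source_table");
  }
}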

Thanks for the help.

On Wed, Oct 1, 2014 at 2:56 PM, Josh Elser josh.el...@gmail.com wrote:

 Or nofile (too). ulimit is your friend :)

 Eric Newton wrote:

 I realized this could be due to an inability by the JVM to create
 additional native threads


 You may need to increase the nproc limit on your systems.

 -Eric


 On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts threadedb...@gmail.com wrote:

 Thanks for the response.

 The only reason I was creating a new BatchWriter periodically was to
 determine whether BatchWriter was holding on to memory even after a
 flush.  I had the memory on my BatchWriterConfig set to 1M already.  I
 am reading my RDB tables in pages of 10K rows.

 Bumping up the JVM size didn't help.

 I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
 produce any output (an hprof file), I realized this could be due to an
 inability by the JVM to create additional native threads.

 What I now think is that the problem is not with Acc directly but is
 hiding out on the JDBC side.

 Perhaps this is not an Acc issue at all but merely masquerading as
 one.  We'll see.


 On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser josh.el...@gmail.com wrote:

 You shouldn't have to create a new BatchWriter -- have you tried
 reducing the amount of memory the BatchWriter will use? It keeps
 a cache internally to try to do an amortization of Mutations to
 send to a given tabletserver.

 To limit this memory, use the
 BatchWriterConfig#setMaxMemory(long) method. By default, the
 maxMemory value is set to 50MB. Reducing this may be enough to
 hold less data in your client and give you some more head room.

 Alternatively, you could give your client JVM some more heap :)


 Geoffry Roberts wrote:

 I am trying to pump some data into Accumulo, but I keep encountering

 Exception in thread "Thrift Connection Pool Checker" java.lang.OutOfMemoryError: Java heap space
     at java.util.HashMap.newValueIterator(HashMap.java:971)
     at java.util.HashMap$Values.iterator(HashMap.java:1038)
     at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
     at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
     at java.lang.Thread.run(Thread.java:745)


 I tried, as a workaround, creating a new BatchWriter and closing the
 old one every ten thousand rows, but to no avail.  Data gets written
 up to the 200,000th row, then the error.

 I have a table of 8M rows in an RDB that I am pumping into Acc via a
 Groovy script.  The rows are narrow: a short text field and four
 floats.

 I googled of course but nothing was helpful.  What can be
 done?

 Thanks so much.

 --
 There are ways and there are ways,

 Geoffry Roberts




 --
 There are ways and there are ways,

 Geoffry Roberts





-- 
There are ways and there are ways,

Geoffry Roberts


Re: Out of memory when putting many rows in an Acc table

2014-09-30 Thread Josh Elser
You shouldn't have to create a new BatchWriter -- have you tried 
reducing the amount of memory the BatchWriter will use? It keeps a cache 
internally to try to do an amortization of Mutations to send to a given 
tabletserver.


To limit this memory, use the BatchWriterConfig#setMaxMemory(long) 
method. By default, the maxMemory value is set to 50MB. Reducing this 
may be enough to hold less data in your client and give you some more 
head room.


Alternatively, you could give your client JVM some more heap :)
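
In code, that might look something like the sketch below; the instance name, ZooKeeper host, credentials, and table name are placeholders, not anything from your actual setup.

import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class SmallBufferWriter {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own instance and credentials.
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost1:2181")
        .getConnector("user", new PasswordToken("secret"));

    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(1_000_000L)             // ~1MB client-side buffer instead of the 50MB default
        .setMaxLatency(30, TimeUnit.SECONDS)  // flush buffered mutations at least every 30 seconds
        .setMaxWriteThreads(4);               // threads used to send mutations to tablet servers

    BatchWriter bw = conn.createBatchWriter("mytable", cfg);
    // ... addMutation(...) calls go here ...
    bw.close();  // close() flushes anything still buffered
  }
}

A smaller buffer just means more frequent sends to the tablet servers, which is usually an acceptable trade for a bulk load.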

Geoffry Roberts wrote:

I am trying to pump some data into Accumulo, but I keep encountering

Exception in thread "Thrift Connection Pool Checker" java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.newValueIterator(HashMap.java:971)
    at java.util.HashMap$Values.iterator(HashMap.java:1038)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
    at java.lang.Thread.run(Thread.java:745)


I tried, as a workaround, creating a new BatchWriter and closing the
old one every ten thousand rows, but to no avail.  Data gets written up
to the 200,000th row, then the error.

I have a table of 8M rows in an RDB that I am pumping into Acc via a
Groovy script.  The rows are narrow: a short text field and four floats.

I googled of course but nothing was helpful.  What can be done?

Thanks so much.

--
There are ways and there are ways,

Geoffry Roberts