I have the same problem here. I analyzed the hprof file with MAT and, as you said, a LinkedBlockingQueue was holding 2.6 GB. I think Cassandra's thread pools should limit their queue size.
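For what it's worth, the usual java.util.concurrent way to cap that growth is a bounded queue plus a rejection policy that pushes back on the producer. A minimal sketch, assuming made-up thread counts and a made-up queue bound (this is not Cassandra's actual stage configuration):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedStagePool {
        public static void main(String[] args) {
            // Bound the queue at 4096 tasks instead of leaving the
            // LinkedBlockingQueue unbounded (the default), so pending work
            // cannot grow until the heap is exhausted.
            BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>(4096);

            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8, 8,                  // core and max threads (made up)
                    60L, TimeUnit.SECONDS, // keep-alive for idle threads
                    queue,
                    // When the queue is full, run the task in the caller's
                    // thread; that slows the producer down instead of OOMing.
                    new ThreadPoolExecutor.CallerRunsPolicy());

            pool.execute(new Runnable() {
                public void run() {
                    System.out.println("task ran");
                }
            });
            pool.shutdown();
        }
    }

With CallerRunsPolicy the submitting thread does the work itself once the queue is full, which is a crude but effective form of backpressure.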
cassandra 0.6.1

java version:
$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

iostat:
$ iostat -x -l 1
Device:  rrqm/s   wrqm/s     r/s    w/s     rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda       81.00  8175.00  224.00  17.00  23984.00  2728.00    221.68      1.01   1.86   0.76  18.20

tpstats (of course, this node is still alive):
$ ./nodetool -host localhost tpstats
Pool Name                  Active  Pending   Completed
FILEUTILS-DELETE-POOL           0        0        1281
STREAM-STAGE                    0        0           0
RESPONSE-STAGE                  0        0   473617241
ROW-READ-STAGE                  0        0           0
LB-OPERATIONS                   0        0           0
MESSAGE-DESERIALIZER-POOL       0        0   718355184
GMFD                            0        0      132509
LB-TARGET                       0        0           0
CONSISTENCY-MANAGER             0        0           0
ROW-MUTATION-STAGE              0        0   293735704
MESSAGE-STREAMING-POOL          0        0           6
LOAD-BALANCER-STAGE             0        0           0
FLUSH-SORTER-POOL               0        0           0
MEMTABLE-POST-FLUSHER           0        0        1870
FLUSH-WRITER-POOL               0        0        1870
AE-SERVICE-STAGE                0        0           5
HINTED-HANDOFF-POOL             0        0          21

On Tue, Apr 27, 2010 at 3:32 AM, Chris Goffinet <goffi...@digg.com> wrote:
> Upgrade to b20 of Sun's version of the JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
> -Chris
>
>
> 2010/4/26 Roland Hänel <rol...@haenel.me>
>
>> Cassandra Version 0.6.1
>> OpenJDK Server VM (build 14.0-b16, mixed mode)
>> Import speed is about 10 MB/s for the full cluster; if a compaction is
>> going on, the individual node is I/O limited.
>> tpstats: caught me, didn't know this. I will set up a test and try to
>> catch a node during the critical time.
>>
>> Thanks,
>> Roland
>>
>>
>> 2010/4/26 Chris Goffinet <goffi...@digg.com>
>>
>>> Which version of Cassandra?
>>> Which version of the Java JVM are you using?
>>> What do your I/O stats look like when bulk importing?
>>> When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing
>>> up during the import?
>>>
>>> -Chris
>>>
>>>
>>> 2010/4/26 Roland Hänel <rol...@haenel.me>
>>>
>>>> I have a cluster of 5 machines building a Cassandra datastore, and I
>>>> load bulk data into it using the Java Thrift API. The first ~250 GB runs
>>>> fine; then one of the nodes starts to throw OutOfMemory exceptions. I'm
>>>> not using any row or index caches, and since I only have 5 CFs and some
>>>> 2.5 GB of RAM allocated to the JVM (-Xmx2500M), in theory, that shouldn't
>>>> happen. All inserts are done with consistency level ALL.
>>>>
>>>> I hope with this I have avoided all the 'usual dummy errors' that lead
>>>> to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
>>>> difficult to catch the JVM at the right moment because it runs well for
>>>> several hours before this thing happens.
>>>>
>>>> One thing comes to mind, and maybe one of the experts could confirm or
>>>> reject this idea for me: is it possible that when one machine slows down
>>>> a little bit (for example, because a big compaction is going on), the
>>>> memtables don't get flushed to disk as fast as they build up under the
>>>> continuing bulk import? That would result in a downward spiral: the
>>>> system gets slower and slower on disk I/O, but since more and more data
>>>> keeps arriving over Thrift, it finally OOMs.
>>>>
>>>> I'm using the "periodic" commit log sync; could this also create a
>>>> situation where the commit log writer is too slow to keep up with the
>>>> data intake, resulting in ever-growing memory usage?
>>>>
>>>> Maybe these thoughts are just bullshit. Let me know if so... ;-)
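On the downward-spiral theory in Roland's mail quoted above: if the server really can't keep up with the intake, one workaround is to throttle the bulk loader on the client side. A rough sketch of a byte-rate throttle the importer could call before each Thrift write (batch_mutate, insert, ...); the 5 MB/s budget mentioned below and the batch-size accounting are placeholders, not numbers from this thread:

    import java.util.concurrent.TimeUnit;

    /**
     * Tiny client-side throttle for a bulk loader: caps the average number
     * of bytes handed to Thrift per second so a slow node can catch up.
     */
    public class ImportThrottle {
        private final long bytesPerSecond;
        private long windowStart = System.nanoTime();
        private long bytesInWindow = 0;

        public ImportThrottle(long bytesPerSecond) {
            this.bytesPerSecond = bytesPerSecond;
        }

        /** Call before each write with the batch's approximate size in bytes. */
        public synchronized void acquire(long bytes) throws InterruptedException {
            bytesInWindow += bytes;
            long elapsed = System.nanoTime() - windowStart;
            long allowed = (long) (bytesPerSecond * (elapsed / 1e9));
            if (bytesInWindow > allowed) {
                // Ahead of budget; sleep long enough to fall back under it.
                long excess = bytesInWindow - allowed;
                Thread.sleep(excess * 1000 / bytesPerSecond);
            }
            if (elapsed > TimeUnit.SECONDS.toNanos(10)) {
                // Start a fresh accounting window so old history doesn't dominate.
                windowStart = System.nanoTime();
                bytesInWindow = 0;
            }
        }
    }

The loader would construct something like new ImportThrottle(5 * 1024 * 1024) and call throttle.acquire(approxBatchBytes) before each write; the actual budget would have to be tuned against what the slowest node can flush and compact.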