Re: Out of memory issues

2016-05-27 Thread Kai Wang
Paolo,

try a few things in cassandra-env.sh:
1. HEAP_NEWSIZE="2G". "The 100mb/core commentary in cassandra-env.sh for
setting HEAP_NEWSIZE is *wrong*" (
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html)
2. MaxTenuringThreshold=8
3. enable GC logging (under the "# GC logging options -- uncomment to enable"
section) to compare GC behavior on good and bad nodes.
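
A sketch of what those three changes might look like in cassandra-env.sh. The exact GC-logging lines vary slightly between Cassandra versions, so treat this as an illustration rather than a drop-in diff:

```shell
# 1. Fixed young generation instead of the 100 MB/core auto-calculation
HEAP_NEWSIZE="2G"

# 2. Let objects survive more young-gen collections before promotion
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"

# 3. GC logging -- these ship commented out under the
#    "# GC logging options -- uncomment to enable" section
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

With logging enabled on both a good and a bad node, diffing promotion rates and pause times in the two gc.log files is what makes the comparison in point 3 possible.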


On Fri, May 27, 2016 at 5:36 AM, Paolo Crosato <
paolo.cros...@targaubiest.com> wrote:

> [full quoted thread trimmed; the quoted messages appear as their own entries below]

Re: Out of memory issues

2016-05-27 Thread Paolo Crosato

Hi,

thanks for the answer. There were no large insertions and the
saved_caches dir had a reasonable size. I tried to delete the caches and
set key_cache_size_in_mb to zero, but it didn't help.
Today our virtual hardware provider raised the CPUs to 4 and the memory to
32GB and doubled the disk size, and the nodes are stable again. So it was
probably an issue of severe lack of resources.
About HEAP_NEWSIZE, your suggestion is quite intriguing. I thought it
was better to set it to 100MB * number of cores, so in my case I had set it
to 200 and now I should set it to 400. Do larger values help without being harmful?
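
For context on where the 200M/400M numbers come from: the stock cassandra-env.sh of that era auto-sized the heap and the young generation roughly as below. This is a from-memory sketch of the script's arithmetic, not the script itself:

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Approximate the 2.0-era cassandra-env.sh calculate_heap_sizes()."""
    # MAX_HEAP_SIZE: max(min(1/2 RAM, 1 GB), min(1/4 RAM, 8 GB))
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    # HEAP_NEWSIZE: min(100 MB per core, 1/4 of the max heap)
    newsize = min(100 * cpu_cores, max_heap // 4)
    return max_heap, newsize

# 16 GB RAM, 2 cores -> (4096, 200): the 200M Paolo started with
# 32 GB RAM, 4 cores -> (8192, 400): the 400M he is considering
```

Note that the auto-calculated young generation is capped by core count, which is exactly the "100mb/core" heuristic Tobert's guide argues is too small for larger heaps.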


Regards,

Paolo

On 27/05/2016 03:05, Mike Yeap wrote:

[full quotes of Mike Yeap's message and the original report trimmed; both appear as their own entries below]


Re: Out of memory issues

2016-05-26 Thread Mike Yeap
Hi Paolo,

a) was there any large insertion done?
b) are there a lot of files in the saved_caches directory?
c) would you consider increasing HEAP_NEWSIZE to, say, 1200M?


Regards,
Mike Yeap

On Fri, May 27, 2016 at 12:39 AM, Paolo Crosato <
paolo.cros...@targaubiest.com> wrote:

> [full quote of the original report trimmed; it appears as its own entry below]

Out of memory issues

2016-05-26 Thread Paolo Crosato

Hi,

we are running a cluster of 4 nodes, each with the same sizing: 2
cores, 16G RAM and 1TB of disk space.


On every node we are running Cassandra 2.0.17, Oracle Java version
"1.7.0_45", and CentOS 6 with kernel version 2.6.32-431.17.1.el6.x86_64.


Two nodes are running just fine; the other two have started to go OOM on
every start.


This is the error we get:

 INFO [ScheduledTasks:1] 2016-05-26 18:15:58,460 StatusLogger.java (line 70) ReadRepairStage           0     0    116     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:15:58,462 StatusLogger.java (line 70) MutationStage            31  1369  20526     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:15:58,590 StatusLogger.java (line 70) ReplicateOnWriteStage     0     0      0     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:15:58,591 StatusLogger.java (line 70) GossipStage               0     0    335     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:16:04,195 StatusLogger.java (line 70) CacheCleanupExecutor      0     0      0     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:16:06,526 StatusLogger.java (line 70) MigrationStage            0     0      0     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:16:06,527 StatusLogger.java (line 70) MemoryMeter               1     4     26     0     0
 INFO [ScheduledTasks:1] 2016-05-26 18:16:06,527 StatusLogger.java (line 70) ValidationExecutor        0     0      0     0     0
DEBUG [MessagingService-Outgoing-/10.255.235.19] 2016-05-26 18:16:06,518 OutboundTcpConnection.java (line 290) attempting to connect to /10.255.235.19
 INFO [GossipTasks:1] 2016-05-26 18:16:22,912 Gossiper.java (line 992) InetAddress /10.255.235.28 is now DOWN
 INFO [ScheduledTasks:1] 2016-05-26 18:16:22,952 StatusLogger.java (line 70) FlushWriter               1     5     47     0    25
 INFO [ScheduledTasks:1] 2016-05-26 18:16:22,953 StatusLogger.java (line 70) InternalResponseStage     0     0      0     0     0
ERROR [ReadStage:27] 2016-05-26 18:16:29,250 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:27,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
        at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
        at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:153)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:434)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:387)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
        at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
        at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
        at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1619)
        at

Out of Memory Issues - SERIOUS

2010-10-07 Thread Dan Hendry
There seems to have been a fair amount of discussion on memory-related
issues, so I apologize if this exact situation has come up before.

 

I am currently in the process of load testing a metrics platform I have
written which uses Cassandra, and I have run into some very troubling issues.
The application is writing quite heavily, about 1000-2000 updates (columns)
per second using batch mutates of 20 columns each. This is divided between
creating new rows and adding columns to a fairly limited number of existing
index rows (30). Nearly all of these updates are read within 10 seconds and
none contain any significant amount of data (generally much less than 100
bytes of data which I specify). Initially, the test hums along nicely but
after some amount of time (1-2 hours) Cassandra crashes with an out of
memory error. Unfortunately I have not had the opportunity to watch the test
as it crashes, but it has happened in 2/2 tests.

 

This is quite annoying but the absolutely TERRIFYING behaviour is that when
I restart Cassandra, it starts replaying the commit logs then crashes with
an out of memory error again. Restart a second time, crash with OOM; it
seems to get through about 3/4 of the commit logs. Just to be absolutely
explicit, I am not trying to insert or read at this point, just recover the
previous updates. Unless somebody can suggest a way to recover the commit
logs, I have effectively lost my data. The only way I have found to recover
is wipe the data directories. It does not matter right now given that it is
only a test but this behaviour is completely unacceptable for a production
system. 

 

Here is information about the system which is probably relevant. Let me know
if any additional details about my application would help sort out this
issue:

-  Cassandra 0.7 Beta2

-  DB Machine: EC2 m1 large with the commit log directory on an ebs
and the data directory on ephemeral storage.

-  OS: Ubuntu server 10.04

-  With the exception of changing JMX settings, no memory or JVM
changes were made to options in cassandra-env.sh

-  In cassandra.yaml, I reduced binary_memtable_throughput_in_mb to
100 in my second test to try to follow the heap memory calculation formula; I
have 8 column families.

-  I am using the Sun JVM, specifically build 1.6.0_20-b02

-  The app is written in Java and I am using the latest Pelops
library; I am sending updates at consistency level ONE and reading them at
level ALL.
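
For what the "heap memory calculation formula" mentioned above implies: the 0.7-era wiki rule of thumb was, roughly, memtable threshold × 3 × number of hot column families, before caches and other overhead. A hedged sketch of that arithmetic (the formula is approximate and from memory of the old wiki):

```python
def estimated_memtable_heap_mb(throughput_mb, column_families):
    # Rough 0.7-era rule of thumb: each column family can hold up to
    # ~3x its memtable threshold on the heap (live memtable plus
    # copies pending flush), so the worst case scales with CF count.
    return throughput_mb * 3 * column_families

# A 100 MB threshold across 8 column families:
print(estimated_memtable_heap_mb(100, 8))  # -> 2400 (MB), before caches
```

On an EC2 m1.large (~7.5 GB RAM, so a much smaller default heap), ~2.4 GB of memtable headroom alone makes the OOM during replay plausible.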

 

I have been fairly impressed with Cassandra overall and, given that I am
using a beta version, I don't expect fully polished behaviour. What is
unacceptable, and quite frankly nearly unbelievable, is the fact that Cassandra
can't seem to recover from the error and I am losing data.

 

Dan Hendry



Re: Out of Memory Issues - SERIOUS

2010-10-07 Thread Jonathan Ellis
if you don't want to lose data, don't wipe your commit logs.  that
part seems pretty obvious to me. :)

cassandra aggressively logs its state when it is running out of memory
so you can troubleshoot.  look for the GCInspector lines in the log.

but in this case it sounds pretty simple; you will be able to finish
replaying the commitlogs if you lower your memtable thresholds or
alternatively increase the amount of memory given to the JVM.  (see
http://wiki.apache.org/cassandra/MemtableSSTable.)

the _binary_ memtable setting has no effect on commitlog replay (it
has no effect on anything but binary writes through the storageproxy
api, which you are not using), you need to adjust
memtable_throughput_in_mb and memtable_operations_in_millions.

If you haven't explicitly set these then Cassandra will guess based on
your heap size; here, it is guessing too high.  start by uncommenting
the settings in the .yaml and reduce by 50% until it works.
alternatively, apply the patch at
https://issues.apache.org/jira/browse/CASSANDRA-1595 to see what
Cassandra is guessing, and start at half of that.
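
Concretely, the knobs being described here live in the 0.7-era cassandra.yaml. A hedged sketch of what "uncomment and reduce by 50%" might look like (the starting values are illustrative, since unset values are auto-calculated from heap size):

```yaml
# Per-column-family memtable flush thresholds (0.7-era names).
# Lowering these reduces heap held by memtables during commitlog replay.
memtable_throughput_in_mb: 64          # e.g. half of Cassandra's guess
memtable_operations_in_millions: 0.3   # likewise, halve and retry
```

Replay then flushes memtables more often instead of accumulating them until the heap is exhausted.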

On Thu, Oct 7, 2010 at 10:32 PM, Dan Hendry d...@ec2.dustbunnytycoon.com
wrote:
 [full quote of Dan's message trimmed; it appears as its own entry above]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com