Hello.
BTW: It would be great for Cassandra to shut down on errors like OOM,
because right now I am not sure if the problem described in my previous
email is the root cause, or if one of the OOM errors found in the log
made some "writer" stop.
I am now looking at different OOMs in my cluster. Currently each node
has up to 300G of data in ~10 column families. The previous heap size of
3G seems to be not enough, so I am raising it to 5G. Looking at heap
dumps, a lot of memory is taken by memtables, much more than 1/3 of the
heap. At the same time, the logs say there is nothing to flush since
there are no dirty memtables. So, what are Cassandra's memory
requirements? Is it 1% or 2% of the on-disk data? Or maybe I am doing
something wrong?
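For what it's worth, in the 1.x cassandra.yaml the total memtable footprint can be capped explicitly; when the option is left unset, Cassandra defaults to roughly one third of the heap. A minimal fragment, assuming the 1.x option name:

```yaml
# Cap the total heap space used by all memtables combined, in MB.
# Unset, Cassandra 1.x uses roughly one third of the heap.
memtable_total_space_in_mb: 1024
```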
Best regards, Vitalii Tymchyshyn
On 03.01.12 at 20:58, aaron morton wrote:
The DynamicSnitch can result in fewer read operations being sent to a
node, but as long as a node is marked as UP, mutations are sent to all
replicas. Nodes will shed load when they pull messages off the queue
that have expired past rpc_timeout, but they will not feed flow control
back to the other nodes, other than going down or performing slowly
enough for the dynamic snitch to route reads around them.
There are also safety valves in there to reduce the size of the
memtables and caches in response to low memory. Perhaps that process
could also shed messages from thread pools with a high number of
pending messages.
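The safety valves mentioned above correspond to these cassandra.yaml settings in 1.x, shown here with their usual defaults (a sketch, assuming the 1.x option names):

```yaml
# Flush the largest memtables when this fraction of the heap is used.
flush_largest_memtables_at: 0.75
# Shrink caches when heap usage crosses this fraction ...
reduce_cache_sizes_at: 0.85
# ... reducing cache capacity to this fraction of its current size.
reduce_cache_capacity_to: 0.6
```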
**But** going OOM with 2M+ mutations in the thread pool sounds like
the server was going down anyway. Did you look into why all those
messages were there?
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:
Hello.
We have been using Cassandra in our project for some time. Currently we
are on the 1.1 trunk (it was an accidental migration, but since it's
hard to migrate back and it's performing nicely enough, we are staying
on 1.1 for now). During the New Year holidays, one of the servers
produced a number of OOM messages in the log.
According to a heap dump taken, most of the memory is taken by the
MutationStage queue (over 2 million items).
So, I am curious: does Cassandra have any flow control for messages?
We are using QUORUM for writes, and it seems to me that one slow server
may start receiving more messages than it can consume, while the writes
still succeed because they are performed by the other servers in the
replication set.
If there is no flow control, such a node should eventually go OOM. Is
that the case? Are there any plans to handle this?
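To illustrate the kind of flow control being asked about, here is a hypothetical sketch (BoundedStage is not an existing Cassandra class): a bounded queue sheds new mutations once a node falls behind, instead of growing without limit until OOM.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of bounded-queue flow control: unlike an
// unbounded MutationStage queue, a bounded queue rejects (sheds) new
// mutations once the node is saturated, rather than accumulating
// millions of pending messages until the heap is exhausted.
public final class BoundedStage {
    private final BlockingQueue<Runnable> queue;

    public BoundedStage(int capacity) {
        this.queue = new ArrayBlockingQueue<Runnable>(capacity);
    }

    /** Returns false (sheds the mutation) if the queue stays full. */
    public boolean submit(Runnable mutation, long timeoutMs)
            throws InterruptedException {
        return queue.offer(mutation, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public int pending() {
        return queue.size();
    }
}
```

A caller that gets back false could fail the write locally and let the coordinator rely on the other replicas, which is exactly the feedback that an unbounded queue never provides.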
BTW: A lot of the memory (~half) is taken by Inet4Address objects, so
caching such objects would make this problem less likely.
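Such a cache could be as simple as an interning map (a hypothetical sketch; EndpointCache is not an existing Cassandra class):

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: intern InetAddress instances so that the many
// queued messages referencing the same few replica endpoints all share
// one object instead of each holding its own Inet4Address copy.
public final class EndpointCache {
    private static final ConcurrentMap<InetAddress, InetAddress> CACHE =
            new ConcurrentHashMap<InetAddress, InetAddress>();

    public static InetAddress intern(InetAddress addr) {
        InetAddress prev = CACHE.putIfAbsent(addr, addr);
        return prev == null ? addr : prev;
    }
}
```

InetAddress implements equals/hashCode over the address bytes, so two distinct instances for the same endpoint collapse to a single cached object.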
--
Best regards,
Vitalii Tymchyshyn