On 2017-07-14 11:23 (-0700), "Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)" <hvang...@cisco.com> wrote:
> We are using a Cassandra 3.x version.
> 

Which 3.x version? 3.11.0? 3.0.14? 3.7? The exact version is important.

> Recently, our production database has been going through some instability.
> One of our nodes keeps going down, anywhere from every 2 days to a few times
> a day. The node goes down because the JVM runs out of memory. Based on my
> investigation, I suspect this is related to writing and/or compacting large
> partitions in some of our large data tables.
> Here is what might have happened:
> 1. The node went OOM because it was unable to deserialize or compact some
> large partitions under some conditions, due to memory constraints.
> 2. Once we restarted it, which was usually a few hours later, the other
> nodes in the cluster tried to perform hinted handoff to the restarted
> node to patch the missing data. From then on, that node had to
> handle the handoff plus the normal data load, which made it even busier.
> 3. The node was not able to complete the handoff and went down again.
> 4. This cycle repeated again and again.
> 

Sounds like it's always the same node? You may want to try running 'nodetool 
scrub' on that node and watching logs for errors that may indicate a corrupt 
file on disk, which would cause the behavior you're seeing.
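If it helps with the log-watching part, here is a rough sketch of a filter you
could run over the node's log during and after the scrub. It is only an
illustration: the default path /var/log/cassandra/system.log and the marker
strings (CorruptSSTableException, OutOfMemoryError, ERROR) are my assumptions
about what your install writes, and find_suspect_lines is a made-up helper
name, so adjust all of them to match your setup.

```python
import re
import sys

# Assumed markers of trouble; edit to match what your logs actually contain.
SUSPECT_PATTERNS = re.compile(r"CorruptSSTableException|OutOfMemoryError|\bERROR\b")


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT_PATTERNS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT_PATTERNS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed default log location; pass the real path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/cassandra/system.log"
    with open(path, errors="replace") as log:
        for number, text in find_suspect_lines(log):
            print(f"{path}:{number}: {text}")
```

If the same SSTable file name keeps showing up in the matching lines, that
would point toward one bad file rather than general memory pressure.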


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
