Re: frequent node up/downs

feedly team Fri, 06 Jul 2012 09:34:05 -0700

I reduced the load and the problem hasn't been happening as much. After
enabling gc logging, I see "promotion failed" messages in the log whenever
the long pauses happen, so the pauses appear to be caused by promotion
failures. From reading on the web it looks like I could try reducing the
CMSInitiatingOccupancyFraction value and/or decreasing the young gen size
to try to avoid this scenario.
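
For anyone wanting to check the same correlation in their own gc log, here
is a rough Python sketch of the kind of scan involved. It assumes the log
was written with -XX:+PrintGCDetails, so that each collection ends with a
"[Times: ... real=N secs]" field, and that the "promotion failed" text sits
on the same line as that field. The function name, the 1 second threshold
and the two sample lines are made up for illustration; they are not taken
from a real log.

```python
import re

# Wall-clock pause as printed by -XX:+PrintGCDetails, e.g. "real=12.50 secs"
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def find_long_pauses(lines, threshold_secs=1.0):
    """Return (line_number, pause_secs, promotion_failed) for slow collections.

    Assumption: one collection per line. If other logging is interleaved,
    a single event can span several lines and this simple scan will miss
    the "promotion failed" marker.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = REAL_RE.search(line)
        if match is None:
            continue  # not a line that carries a [Times: ...] field
        pause = float(match.group(1))
        if pause >= threshold_secs:
            hits.append((lineno, pause, "promotion failed" in line))
    return hits


# Hypothetical sample lines in the general shape of ParNew output
SAMPLE = [
    "10.100: [GC 10.100: [ParNew: 100K->10K(200K), 0.0100000 secs] "
    "300K->210K(1000K), 0.0101000 secs] "
    "[Times: user=0.02 sys=0.00, real=0.01 secs]",
    "20.200: [GC 20.200: [ParNew (promotion failed): 200K->200K(200K), "
    "0.5000000 secs] [Times: user=1.00 sys=0.10, real=12.50 secs]",
]

if __name__ == "__main__":
    for lineno, pause, failed in find_long_pauses(SAMPLE):
        note = "promotion failed" if failed else "no promotion failure"
        print(f"line {lineno}: {pause:.2f}s ({note})")
```

To run it against a real log, pass an open file object in place of SAMPLE.
If nearly every long pause is flagged as a promotion failure, that would
support tuning the old gen / young gen settings mentioned above.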


Also, is it normal to see the "Heap is xx full. You may need to reduce
memtable and/or cache sizes" message quite often? I haven't turned on row
caches or changed any default memtable size settings so I am wondering why
the old gen fills up.


On Wed, Jul 4, 2012 at 6:28 AM, aaron morton <aa...@thelastpickle.com> wrote:

> What accounts for the much larger virtual number? some kind of off-heap
> memory?
>
> http://wiki.apache.org/cassandra/FAQ#mmap
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping.
>
> The two are not related. On startup the JVM memory is locked so it will
> not swap, from then on memory management is pretty much up to the JVM.
>
> Getting a lot of ParNew activity does not mean the JVM is low on memory,
> it means there is a lot of activity in the new heap.
>
> If you have a lot of insert activity (typically in a load test) you can
> generate a lot of GC activity. Try reducing the load to a point where it
> does not hit GC and then increase to find the cause. Also if you can connect
> JConsole to the JVM you may get a better view of the heap usage.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/07/2012, at 3:41 PM, feedly team wrote:
>
> Couple more details. I confirmed that swap space is not being used (free
> -m shows 0 swap) and cassandra.log has a message like "JNA mlockall
> successful". top shows the process having 9g in resident memory but 21.6g
> in virtual...What accounts for the much larger virtual number? some kind of
> off-heap memory?
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping. I uncommented all the gc logging options in cassandra-env.sh to
> try to see what is going on when the node freezes.
>
> Thanks
> Kireet
>
> On Mon, Jul 2, 2012 at 9:51 PM, feedly team <feedly...@gmail.com> wrote:
>
>> Yeah I noticed the leap second problem and ran the suggested fix, but I
>> have been facing these problems before Saturday and still see the
>> occasional failures after running the fix.
>>
>> Thanks.
>>
>>
>> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both <mb...@terra.com.br> wrote:
>>
>>> Yeah! Look at that.
>>>
>>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>>> I had the same problem. The solution was rebooting.
>>>
>>> On Mon, 2 Jul 2012 11:08:57 -0400
>>> feedly team <feedly...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >    I recently set up a 2 node cassandra cluster on dedicated hardware. In
>>> > the logs there have been a lot of "InetAddress xxx is now dead" or UP
>>> > messages. Comparing the log messages between the 2 nodes, they seem to
>>> > coincide with extremely long ParNew collections. I have seen some of up to
>>> > 50 seconds. The installation is pretty vanilla, I didn't change any
>>> > settings and the machines don't seem particularly busy - cassandra is the
>>> > only thing running on the machine with an 8GB heap. The machine has 64GB of
>>> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
>>> > full. You may need to reduce memtable and/or cache sizes' messages. Would
>>> > this help with the long ParNew collections? That message seems to be
>>> > triggered on a full collection.
>>>
>>> --
>>> Marcus Both
>>>
>>>
>>
>
>
