I reduced the load and the problem hasn't been happening as much. After enabling GC logging, I see "promotion failed" messages whenever the pauses happen, so the long pauses appear to be caused by promotion failures. From reading on the web, it looks like I could try reducing the CMSInitiatingOccupancyFraction value and/or decreasing the young gen size to avoid this scenario.
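
For reference, a minimal sketch of what those two changes might look like in cassandra-env.sh. The values 65 and 400M are placeholders picked for illustration, not tested recommendations, and the sketch assumes the script's usual MAX_HEAP_SIZE, HEAP_NEWSIZE and JVM_OPTS variables. In a real file the corresponding lines most likely already exist and would be edited in place rather than appended.

```shell
# Illustrative values only: 65 and 400M are assumptions, not recommendations.

# Heap sizing. cassandra-env.sh expects these two to be set (or left unset) together.
MAX_HEAP_SIZE="8G"     # the 8GB heap mentioned earlier in this thread
HEAP_NEWSIZE="400M"    # assumed smaller young gen; adjust against the GC log

# Start CMS old gen collections at a lower occupancy than the current setting,
# leaving more free old gen space for objects promoted by ParNew.
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
# Make the JVM honor the fraction above instead of its own heuristics.
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

Change one setting at a time and compare the GC log before and after, otherwise it is hard to tell which change affected the promotion failures.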

Also, is it normal to see the "Heap is xx full. You may need to reduce memtable and/or cache sizes" message quite often? I haven't turned on row caches or changed any default memtable size settings, so I am wondering why the old gen fills up.

On Wed, Jul 4, 2012 at 6:28 AM, aaron morton <aa...@thelastpickle.com> wrote:

> What accounts for the much larger virtual number? some kind of off-heap
> memory?
>
> http://wiki.apache.org/cassandra/FAQ#mmap
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping.
>
> The two are not related. On startup the JVM memory is locked so it will
> not swap, from then on memory management is pretty much up to the JVM.
>
> Getting a lot of ParNew activity does not mean the JVM is low on memory,
> it means there is a lot of activity in the new heap.
>
> If you have a lot of insert activity (typically in a load test) you can
> generate a lot of GC activity. Try reducing the load to a point where it
> does not hit GC and then increase to find the cause. Also if you can connect
> JConsole to the JVM you may get a better view of the heap usage.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/07/2012, at 3:41 PM, feedly team wrote:
>
> Couple more details. I confirmed that swap space is not being used (free
> -m shows 0 swap) and cassandra.log has a message like "JNA mlockall
> successful". top shows the process having 9g in resident memory but 21.6g
> in virtual...What accounts for the much larger virtual number? some kind of
> off-heap memory?
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping. I uncommented all the gc logging options in cassandra-env.sh to
> try to see what is going on when the node freezes.
>
> Thanks
> Kireet
>
> On Mon, Jul 2, 2012 at 9:51 PM, feedly team <feedly...@gmail.com> wrote:
>
>> Yeah I noticed the leap second problem and ran the suggested fix, but I
>> have been facing these problems before Saturday and still see the
>> occasional failures after running the fix.
>>
>> Thanks.
>>
>>
>> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both <mb...@terra.com.br> wrote:
>>
>>> Yeah! Look at that.
>>>
>>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>>> I had the same problem. The solution was rebooting.
>>>
>>> On Mon, 2 Jul 2012 11:08:57 -0400
>>> feedly team <feedly...@gmail.com> wrote:
>>>
>>> > Hello,
>>> > I recently set up a 2 node cassandra cluster on dedicated hardware. In
>>> > the logs there have been a lot of "InetAddress xxx is now dead" or UP
>>> > messages. Comparing the log messages between the 2 nodes, they seem to
>>> > coincide with extremely long ParNew collections. I have seen some of up to
>>> > 50 seconds. The installation is pretty vanilla, I didn't change any
>>> > settings and the machines don't seem particularly busy - cassandra is the
>>> > only thing running on the machine with an 8GB heap. The machine has 64GB of
>>> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
>>> > full. You may need to reduce memtable and/or cache sizes' messages. Would
>>> > this help with the long ParNew collections? That message seems to be
>>> > triggered on a full collection.
>>>
>>> --
>>> Marcus Both