[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-29 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306030#comment-15306030
 ] 

Robert Stupp edited comment on CASSANDRA-11818 at 5/29/16 7:10 PM:
---

I've tried [~norman]'s patch for Netty 4.1 against trunk. With the patch 
enabled (requires {{-Dio.netty.noDirectBufferNoCleaner=false}}), my overloaded 
node recovers nicely. The CMS GC storm caused by {{Bits.reserveMemory()}} does 
not occur and the node remains responsive. However, while the node is in an 
overload situation, it spews a lot of errors. Unfortunately these are 
{{java.lang.OutOfMemoryError: No more memory available}}, which is generally 
fine, but in this case it just indicates that there is not enough direct memory 
to fulfill the *current* request. IMO, passing this OOM to 
{{JVMStabilityInspector}} would be wrong, since it is a recoverable error. 
(Background: with the patch, Netty uses its own, separate direct memory pool, 
which does not affect other operations or memory pools.)
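
To illustrate what "recoverable" means here, a minimal sketch (class and method 
names are made up, not the actual patch): the allocation failure is confined to 
the current request, so it can be handled on the request path instead of being 
escalated to {{JVMStabilityInspector}}.
{code}
// Hypothetical sketch, not the actual patch: an allocation failure from
// Netty's own direct-memory pool only affects the current request, so it
// can be failed locally instead of being reported as a fatal JVM error.
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

final class ResponseBufferAllocation
{
    static ByteBuf allocate(ByteBufAllocator allocator, int size)
    {
        try
        {
            return allocator.directBuffer(size);
        }
        catch (OutOfMemoryError e)
        {
            // "No more memory available" from Netty's pool: recoverable,
            // fail only this request (e.g. respond with an overloaded error)
            throw new RuntimeException("Cannot allocate " + size + " bytes for this response", e);
        }
    }
}
{code}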

I've also applied the same technique to C* internal direct memory allocations. 
(We already use {{FileUtils.clean()}} to clean up direct buffers.)
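
For reference, a minimal Java 8 sketch of the explicit-clean technique that 
{{FileUtils.clean()}} boils down to (the real implementation differs in detail):
{code}
// Minimal Java 8 sketch of explicitly freeing a direct buffer's native
// memory instead of waiting for GC to eventually run its Cleaner.
import java.nio.ByteBuffer;

final class DirectBufferCleanSketch
{
    static void clean(ByteBuffer buffer)
    {
        if (buffer == null || !buffer.isDirect())
            return;
        sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
        if (cleaner != null)   // duplicated/sliced buffers have no cleaner of their own
            cleaner.clean();   // releases the native memory immediately
    }
}
{code}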

To summarize, {{Bits.reserveMemory}} + {{Cleaner}} are the root cause IMO. 
Avoiding both reduces client latency as a side effect.
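
For context, a toy model of why {{Bits.reserveMemory}} hurts (simplified from 
the JDK 8 behavior, not the actual source; the 2g limit is just illustrative): 
every allocation that does not fit under {{-XX:MaxDirectMemorySize}} triggers a 
full {{System.gc()}} plus a retry/back-off loop before throwing OOM, which is 
exactly the GC storm seen under overload.
{code}
// Toy, self-contained model of the JDK 8 java.nio.Bits.reserveMemory()
// behaviour (simplified): a failed reservation forces System.gc() and a
// back-off retry loop, only to run Cleaners of unreachable direct buffers.
import java.util.concurrent.atomic.AtomicLong;

final class DirectMemoryBudgetModel
{
    static final long MAX_DIRECT = 2L * 1024 * 1024 * 1024;   // illustrative 2g limit
    static final AtomicLong RESERVED = new AtomicLong();

    static boolean tryReserve(long size)
    {
        long current;
        do
        {
            current = RESERVED.get();
            if (current + size > MAX_DIRECT)
                return false;
        }
        while (!RESERVED.compareAndSet(current, current + size));
        return true;
    }

    static void reserve(long size) throws InterruptedException
    {
        if (tryReserve(size))
            return;
        System.gc();                          // full GC just to trigger Cleaners
        for (long sleep = 1; sleep <= 256; sleep <<= 1)
        {
            if (tryReserve(size))
                return;
            Thread.sleep(sleep);              // exponential back-off between retries
        }
        throw new OutOfMemoryError("Direct buffer memory");
    }
}
{code}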

EDIT: removed client-latency numbers. The improvement was not from 6 ms to <1 ms 
but from 0.8 ms to 0.1 ms (99.9th percentile).

EDIT2: the 6 ms number can actually be correct. A longer GC can cause the 
native-request pool to grow (e.g. from 140 to 190); after that, the 
client latency suddenly increased from 0.1 ms to 6 ms.


was (Author: snazy):
I've tried [~norman]'s patch against Netty 4.1 against trunk. With the patch 
enabled (requires {{-Dio.netty.noDirectBufferNoCleaner=false}}) my overloaded 
node recovers nicely. The CMS-GC-storm caused by {{Bits.reserveMemory()}} does 
not occur and the node remains responsive. However, while the node is in an 
overload situation, it spews a lot of errors. Unfortunately these are 
{{java.lang.OutOfMemoryError: No more memory available}}, which is generally 
fine, but in this case it just indicates that there is not enough direct memory 
to fulfill the *current* request. IMO, passing this OOM to 
{{JVMStabilityInspector}} would be wrong, since it is a recoverable error. 
(Background: Netty has a separate, distinct direct memory pool then, which does 
not affect other operations or memory pools.)

I've also applied the same technique to C* internal direct memory allocations. 
(We already use {{FileUtils.clean()}} to cleanup direct buffers.)

To summarize, {{Bits.reserveMemory}} + {{Cleaner}} are the root cause IMO. 
Avoiding both reduces client latency as a side effect.

EDIT: removed client-latency numbers. Was not from 6 to <1ms - but from .8ms to 
.1ms (99.9percentile).

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, 
> oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> 

[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-29 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306030#comment-15306030
 ] 

Robert Stupp edited comment on CASSANDRA-11818 at 5/29/16 6:57 PM:
---

I've tried [~norman]'s patch for Netty 4.1 against trunk. With the patch 
enabled (requires {{-Dio.netty.noDirectBufferNoCleaner=false}}), my overloaded 
node recovers nicely. The CMS GC storm caused by {{Bits.reserveMemory()}} does 
not occur and the node remains responsive. However, while the node is in an 
overload situation, it spews a lot of errors. Unfortunately these are 
{{java.lang.OutOfMemoryError: No more memory available}}, which is generally 
fine, but in this case it just indicates that there is not enough direct memory 
to fulfill the *current* request. IMO, passing this OOM to 
{{JVMStabilityInspector}} would be wrong, since it is a recoverable error. 
(Background: with the patch, Netty uses its own, separate direct memory pool, 
which does not affect other operations or memory pools.)

I've also applied the same technique to C* internal direct memory allocations. 
(We already use {{FileUtils.clean()}} to clean up direct buffers.)

To summarize, {{Bits.reserveMemory}} + {{Cleaner}} are the root cause IMO. 
Avoiding both reduces client latency as a side effect.

EDIT: removed client-latency numbers. The improvement was not from 6 ms to <1 ms 
but from 0.8 ms to 0.1 ms (99.9th percentile).


was (Author: snazy):
I've tried [~norman]'s patch against Netty 4.1 against trunk. With the patch 
enabled (requires {{-Dio.netty.noDirectBufferNoCleaner=false}}) my overloaded 
node recovers nicely. The CMS-GC-storm caused by {{Bits.reserveMemory()}} does 
not occur and the node remains responsive. However, while the node is in an 
overload situation, it spews a lot of errors. Unfortunately these are 
{{java.lang.OutOfMemoryError: No more memory available}}, which is generally 
fine, but in this case it just indicates that there is not enough direct memory 
to fulfill the *current* request. IMO, passing this OOM to 
{{JVMStabilityInspector}} would be wrong, since it is a recoverable error. 
(Background: Netty has a separate, distinct direct memory pool then, which does 
not affect other operations or memory pools.)

I've also applied the same technique to C* internal direct memory allocations. 
(We already use {{FileUtils.clean()}} to cleanup direct buffers.)

To summarize, {{Bits.reserveMemory}} + {{Cleaner}} are the root cause IMO. 
Avoiding both reduces client latency as a side effect (from 6ms to <1ms in my 
test).

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, 
> oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.tra

[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304464#comment-15304464
 ] 

Norman Maurer edited comment on CASSANDRA-11818 at 5/27/16 6:14 PM:


[~snazy] Sorry for the late response (busy as always :( )... I wonder if this 
would be something that may be helpful for you in terms of Netty: 
https://github.com/netty/netty/pull/5314


was (Author: norman):
Sorry for the late response (busy as always :( )... I wonder if this would be 
something that may be helpful for you in terms of Netty: 
https://github.com/netty/netty/pull/5314

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, 
> oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445)
>  ~[main/:na]
>   at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces 
> and heap dump. Already increased {{-XX:MaxDirectMemoryS

[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295611#comment-15295611
 ] 

Robert Stupp edited comment on CASSANDRA-11818 at 5/22/16 3:59 PM:
---

This thing's getting weird. I tried a couple of different combinations (see the 
code sketch right after the list):
# with unpooled direct buffers (change in {{CBUtil}}: {{allocator = new 
UnpooledByteBufAllocator(true)}}) against Java 1.8.0_92
# with unpooled heap buffers (change in {{CBUtil}}: {{allocator = new 
UnpooledByteBufAllocator(false)}}) against Java 1.8.0_92
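
For clarity, the two variants amount to something like this (sketch only; the 
actual field and surrounding code in {{CBUtil}} look slightly different):
{code}
// Sketch of the allocator variants tried above (simplified; not the exact
// CBUtil code). The boolean constructor argument selects direct vs. heap.
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.UnpooledByteBufAllocator;

final class AllocatorVariants
{
    static final ByteBufAllocator DEFAULT_POOLED_DIRECT    = new PooledByteBufAllocator(true);
    static final ByteBufAllocator VARIANT1_UNPOOLED_DIRECT = new UnpooledByteBufAllocator(true);   // variant 1
    static final ByteBufAllocator VARIANT2_UNPOOLED_HEAP   = new UnpooledByteBufAllocator(false);  // variant 2
}
{code}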

Variant 1 showed the same behavior: it looks like GC causes long STW pauses, 
responses pile up, and the node then tries to allocate a huge amount of direct 
memory.
Variant 2 showed strange behavior with PS GC _and_ ParNew/CMS GC: it works fine 
until the old gen becomes eligible for GC and then ends in an effectively 
endless GC loop. (According to the GC log, CMS basically blocked the application.)

That led me to try Java 1.8.0_66, and that works fine with variant 2. Java 
1.8.0_77 also works fine with variant 2. So it looks like Java 1.8.0_92 is 
somehow broken. (Didn't try variant 1.)

Unfortunately, the default variant with pooled byte buffers does not work with 
1.8.0_77. It shows the same behavior as with u92.

So, the underlying problem remains: if GC runs wild (i.e. runs into some longer 
STW phases, some hundred milliseconds in my test), requests pile up and cause 
a lot of allocations, which in turn can cause OOMs. I have no idea why it runs 
into these many consecutive, long GCs; the result is lots of allocated direct 
memory that is never freed. It then looks as in the following screenshot. At 
approx. 16:03 a couple of ParNews with 200-500ms pauses kicked in but stopped 
after a while. At 16:06 ParNews with 200-500ms STW occurred again and lasted 
until the load driver was killed. As a result, all (allowed) direct memory is 
allocated (configured with 2g max direct memory).

pooled direct mem usage w/ 1.8.0u77:
!11818-direct-mem.png|width=1000!

Further, using unpooled direct buffers works a bit better than using pooled 
direct buffers: at least the unpooled buffers are released after a couple of 
minutes and C* is responsive again.

unpooled direct mem usage w/ 1.8.0u77:
!11818-direct-mem-unpooled.png|width=1000!

The buffer allocations that cause this problem are for messages of around 4MB 
or more. It is probably not a good idea to use a per-thread arena for messages 
that big.

/cc [~norman], do you have any idea?


was (Author: snazy):
This thing's getting weird. I tried a couple of different combinations:
# with unpooled direct buffers (change in {{CBUtil}}: {{allocator = new 
UnpooledByteBufAllocator(true)}}) against Java 1.8.0_92
# with unpooled heap buffers (change in {{CBUtil}}: {{allocator = new 
UnpooledByteBufAllocator(false)}}) against Java 1.8.0_92

Variant 1 showed the same behavior. Looks like GC causes long STW, causing 
responses to pile up and then trying to allocate a totally huge amount of 
direct memory
Variant 2 showed a strange behavior with PS GC _and_ ParNew/CMS GC: Works fine 
until the old gen becomes eligible for GC. And then ends with an effectively 
endless-GC-loop. (CMS basically blocked the application according to GC log)

That led me to the idea to try it with Java 1.8.0_66 - and that works fine with 
variant 2. Java 1.8.0_77 also works fine with variant 2. So it looks like Java 
1.8.0_92 is somehow broken. (Didn’t try variant 1)

Unfortunately, the default variant with pooled byte buffers does not work with 
1.8.0_77. It shows the same behavior as with u92.

So, the underlying problem remains: if GC runs wild (i.e. runs into some longer 
STW phases - some hundred milliseconds in my test), requests pile up and cause 
a lot of allocations, which in turn can cause OOMs. I have no idea why it runs 
into these many, consecutive, long GCs. Just see the result: lots of allocated 
direct memory, that’s never freed. It then looks like as in the following 
screenshot. At approx 16:03 a couple of ParNews with 200-500ms kicked in but 
stopped after a while. At 16:06 some ParNews with 200-500ms STW occurred and 
lasted until the load driver was killed. As a result all (allowed) direct 
memory is allocated (configured with 2g max direct memory).

!11818-direct-mem.png|pooled direct mem usage w/ 1.8.0u77!

Further, using unpooled direct buffers works a bit better than using pooled 
direct buffers: at least the unpooled buffers are released after a couple of 
minutes and C* is responsive again.

!11818-direct-mem-unpooled.png|unpooled direct mem usage w/ 1.8.0u77!

The buffer allocations that cause this problem are for messages of around 4MB 
or more. It is probably not a good idea to use a per-thread arena for messages 
that big.

/cc [~norman], do you have any idea?

> C* does

[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-17 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287103#comment-15287103
 ] 

Joshua McKenzie edited comment on CASSANDRA-11818 at 5/17/16 5:50 PM:
--

A couple of observations:

1) We'd need to change our handling of the error 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/Message.java#L396]
 for it to be inspected by the JVMStabilityInspector (see the sketch after this 
list). If we're OOM, wrapping's going to fail.
2) We both need CASSANDRA-8092 integrated into CI to catch new errors like this 
in the future, and its logic needs to be revised to check whether we're 
immediately rethrowing vs. attempting further operation and/or wrapping an 
exception, since the latter will fail in the OOM condition.
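
A hedged sketch of the ordering point 1 argues for (hypothetical handler, not 
the actual Message.java code): inspect the raw throwable first, so an OOM is 
not lost just because building or wrapping the error response allocates and 
fails.
{code}
// Hypothetical handler sketch (not the actual Message.java code): let
// JVMStabilityInspector see the raw Throwable before any wrapping/response
// building that might itself allocate and fail under OOM.
import org.apache.cassandra.utils.JVMStabilityInspector;

final class ErrorHandlingOrderSketch
{
    static void onUnexpectedException(Throwable cause, Runnable respondWithError)
    {
        // inspect first: this path should not allocate, so it still works
        // when we are out of (direct) memory
        JVMStabilityInspector.inspectThrowable(cause);

        try
        {
            respondWithError.run();   // wrapping/encoding an error response may allocate...
        }
        catch (OutOfMemoryError oom)
        {
            // ...and can fail under memory pressure; nothing more to do for this request
        }
    }
}
{code}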

Looks like the state of the codebase has regressed with regard to this issue 
since I submitted that ticket:
{noformat}
Total caught and rethrown as something other than Runtime: 100
Total caught and rethrown as Runtime: 68
Total Swallowed: 81
Total delegated to JVMStabilityInspector: 69
Total 'catch (Throwable ...)' analyzed: 120
Total 'catch (Exception ...)' analyzed: 198
Total catch clauses analyzed: 318
{noformat}

[~mshuler]: Any word on where CASSANDRA-8092 falls on the priority list?


was (Author: joshuamckenzie):
A few observations:

1) We'd need to change our handling of the error 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/Message.java#L396]
 for it to be inspected by the JVMStabilityInspector. If we're OOM, wrapping's 
going to fail.
2) We both need CASSANDRA-8092 to be integrated into CI to catch new errors 
like this in the future, and it needs its logic revised to check if we're 
immediately rethrowing vs. attempting further operation and/or wrapping an 
exception since that will fail in the OOM condition.

Looks like the state of the codebase has regressed with regard to this issue 
since I submitted that ticket:
{noformat}
Total caught and rethrown as something other than Runtime: 100
Total caught and rethrown as Runtime: 68
Total Swallowed: 81
Total delegated to JVMStabilityInspector: 69
Total 'catch (Throwable ...)' analyzed: 120
Total 'catch (Exception ...)' analyzed: 198
Total catch clauses analyzed: 318
{noformat}

[~mshuler]: Any word on where CASSANDRA-8092 falls on the priority list?

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channe