Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2

2018-03-23 Thread Nitan Kainth
Martin,

Would you please share the settings you had before and what you changed? We
have a similar issue.



> On Mar 23, 2018, at 8:47 AM, Martin Mačura wrote:
> 
> Never mind, we resolved the issue: the JVM heap settings were misconfigured.
> 
> Martin
> 

Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2

2018-03-23 Thread Martin Mačura
Never mind, we resolved the issue: the JVM heap settings were misconfigured.

Martin


Nodes unresponsive after upgrade 3.9 -> 3.11.2

2018-03-23 Thread Martin Mačura
Hi all,

We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
it to 3.11.2.

Each node has 32 GB RAM, 8 GB Cassandra heap size.

After the upgrade, clients started reporting connection issues:

cassandra | [ERROR] Closing established connection pool to host
 because of the following error: Read error 'connection
reset by peer' (src/pool.cpp:384)
cassandra | [ERROR] Unable to establish a control connection to host
 because of the following error: Error: 'Request timed out'
(0x010E) (src/control_connection.cpp:263)


Cassandra logs are full of garbage collection warnings:

WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
1774446288; Par Survivor Space: 858980344 -> 0
INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
Pool Name   Active   Pending  Completed   Blocked
 All Time Blocked
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
MutationStage10 92526002 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ViewMutationStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ReadStage 2 2 943544 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
RequestResponseStage  0 01666876 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
ReadRepairStage   0 0  10362 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CounterMutationStage  0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MiscStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CompactionExecutor0 0   3076 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MemtableReclaimMemory 0 0 44 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
PendingRangeCalculator0 0  4 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
GossipStage   0 0  14287 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
SecondaryIndexManagement  0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
HintsDispatcher   0 0  1 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_1 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_2 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
MigrationStage0 0  2 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
MemtablePostFlush 0 0 72 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_0 0 0 44
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
ValidationExecutor0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
Sampler   0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
MemtableFlushWriter   0 0 44 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_5 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
InternalResponseStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_3 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_4 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
AntiEntropyStage  0
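
For anyone who finds this thread with the same symptom: the GCInspector warning above reports pool sizes in raw bytes, which makes it hard to see at a glance how large each pool is relative to the configured heap. Below is a minimal Python sketch that extracts the pause time and the before/after pool sizes from such a line. The regular expressions and the function name parse_gc_warning are assumptions based only on the single log line quoted in this thread, not on any documented Cassandra log format, so other collectors or versions may need adjustments.

```python
import re

# Assumed format, taken from the one GCInspector line in this thread:
#   "<collector> GC in <N>ms.  <pool>: <before> -> <after>; <pool>: ..."
GC_RE = re.compile(r"(?P<collector>\w+) GC in (?P<ms>\d+)ms\.\s+(?P<pools>.*)")
POOL_RE = re.compile(r"(?P<name>[^:;]+):\s*(?P<before>\d+)\s*->\s*(?P<after>\d+)")

GIB = 1024 ** 3


def parse_gc_warning(line):
    """Return collector, pause (ms) and per-pool (before, after) byte counts,
    or None if the line does not look like a GCInspector message."""
    m = GC_RE.search(line)
    if m is None:
        return None
    pools = {}
    for p in POOL_RE.finditer(m.group("pools")):
        pools[p.group("name").strip()] = (int(p.group("before")),
                                          int(p.group("after")))
    return {
        "collector": m.group("collector"),
        "pause_ms": int(m.group("ms")),
        "pools": pools,
    }


# The warning from the original post, re-joined onto one line.
line = ("WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 - "
        "ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 -> "
        "1774446288; Par Survivor Space: 858980344 -> 0")

result = parse_gc_warning(line)
print(result["collector"], "pause:", result["pause_ms"], "ms")
for name, (before, after) in result["pools"].items():
    print(f"  {name}: {before / GIB:.2f} GiB -> {after / GIB:.2f} GiB")
```

On the line from this thread, Eden alone held roughly 6.4 GiB before the collection. That may be worth comparing against the 8 GB heap mentioned in the original post when checking the heap and new-generation settings, although the thread does not say which setting was actually wrong.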