Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-29 Thread Michael Semb Wever


> we had an awful performance/throughput experience with 3.x coming from 2.1. 
> 3.11 is simply a memory hog, if you are using batch statements on the client 
> side. If so, you are likely affected by 
> https://issues.apache.org/jira/browse/CASSANDRA-16201
> 


Confirming what Thomas writes, heavy users of batch statements can likely hit 
memory issues in 3.0 and 3.11.  

It is worth testing upgrades for these memory issues and, if they show up, 
waiting for CASSANDRA-16201 to land in a release before upgrading to 3.11 
(skip 3.0).

Further background info on why you want 3.11 over 3.0 is in CASSANDRA-15430, 
CASSANDRA-13929 and CASSANDRA-9766 (but this is all very much dependent on 
16201 landing).

regards,
Mick

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-28 Thread Steinmaurer, Thomas
Leon,

we had an awful performance/throughput experience with 3.x coming from 2.1. 
3.11 is simply a memory hog, if you are using batch statements on the client 
side. If so, you are likely affected by 
https://issues.apache.org/jira/browse/CASSANDRA-16201


Regards,
Thomas

From: Leon Zaruvinsky 
Sent: Wednesday, October 28, 2020 5:21 AM
To: user@cassandra.apache.org 
Subject: Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary 
upgrade



Our JVM options are unchanged between 2.2 and 3.11

For the sake of clarity, do you mean:
(a) you're using the default JVM options in 3.11 and it's different to the 
options you had in 2.2?
(b) you've copied the same JVM options you had in 2.2 to 3.11?

(b), which are the default options from 2.2 (and I believe the default options 
in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually a 
cause here because I would expect them to only impact the upgraded node and not 
the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

The distinction is important because at the moment, you need to go through a 
process of elimination to identify the cause.


Read throughput (rate, bytes read/range scanned, etc.) seems fairly consistent 
before and after the upgrade across all nodes.

What I was trying to get at is whether the upgraded node was getting hit with 
more traffic compared to the other nodes since it will indicate that the longer 
GCs are just the symptom, not the cause.


I don't see any distinct change, nor do I see an increase in traffic to the 
upgraded node that would result in longer GC pauses.  Frankly I don't see any 
changes or aberrations in client-related metrics at all that correlate to the 
GC pauses, except for the corresponding timeouts.


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
> Our JVM options are unchanged between 2.2 and 3.11
>>
>
> For the sake of clarity, do you mean:
> (a) you're using the default JVM options in 3.11 and it's different to the
> options you had in 2.2?
> (b) you've copied the same JVM options you had in 2.2 to 3.11?
>

(b), which are the default options from 2.2 (and I believe the default
options in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually
a cause here because I would expect them to only impact the upgraded node
and not the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled
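
For reference, here is roughly what these flags imply for the heap layout
reported earlier in the thread (Heap Size: 12G / New Size: 5G). This is only a
sketch of the arithmetic: the function name is made up for illustration, and
the JVM may round the actual space sizes for alignment.

```python
GIB = 1024 ** 3


def heap_layout(heap_bytes, new_bytes, survivor_ratio, cms_initiating_fraction):
    """Approximate generation sizes for a ParNew + CMS heap.

    -XX:SurvivorRatio=N makes eden N times the size of ONE survivor space,
    and the young generation holds eden plus two survivor spaces.
    """
    survivor = new_bytes // (survivor_ratio + 2)
    eden = new_bytes - 2 * survivor
    old = heap_bytes - new_bytes
    # With -XX:+UseCMSInitiatingOccupancyOnly, a CMS cycle starts once the
    # old generation reaches this occupancy.
    cms_trigger = old * cms_initiating_fraction // 100
    return {
        "eden": eden,
        "survivor": survivor,
        "old": old,
        "cms_trigger": cms_trigger,
    }


layout = heap_layout(12 * GIB, 5 * GIB, survivor_ratio=8, cms_initiating_fraction=75)
for name, size in layout.items():
    print(f"{name:12s} {size / GIB:.2f} GiB")
```

With these numbers, eden is 4 GiB, each survivor space is 0.5 GiB, the old
generation is 7 GiB, and CMS would start at about 5.25 GiB of old-gen
occupancy. With MaxTenuringThreshold=1, objects that survive more than one
young collection are promoted into that 7 GiB.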


> The distinction is important because at the moment, you need to go through
> a process of elimination to identify the cause.
>
>
>> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
>> consistent before and after the upgrade across all nodes.
>>
>
> What I was trying to get at is whether the upgraded node was getting hit
> with more traffic compared to the other nodes since it will indicate that
> the longer GCs are just the symptom, not the cause.
>
>
I don't see any distinct change, nor do I see an increase in traffic to the
upgraded node that would result in longer GC pauses.  Frankly I don't see
any changes or aberrations in client-related metrics at all that correlate
to the GC pauses, except for the corresponding timeouts.


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Erick Ramirez
>
> Our JVM options are unchanged between 2.2 and 3.11
>

For the sake of clarity, do you mean:
(a) you're using the default JVM options in 3.11 and it's different to the
options you had in 2.2?
(b) you've copied the same JVM options you had in 2.2 to 3.11?

The distinction is important because at the moment, you need to go through
a process of elimination to identify the cause.


> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
> consistent before and after the upgrade across all nodes.
>

What I was trying to get at is whether the upgraded node was getting hit
with more traffic compared to the other nodes since it will indicate that
the longer GCs are just the symptom, not the cause.

Again, it's a process of elimination. Cheers!
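
A quick way to run that traffic check, sketched under the assumption that you
can export a per-node read rate from your monitoring. The node labels, rates
and the 1.5x threshold below are hypothetical placeholders.

```python
from statistics import median


def nodes_with_excess_traffic(read_rates, threshold=1.5):
    """Return nodes whose read rate exceeds `threshold` times the cluster median."""
    mid = median(read_rates.values())
    return sorted(
        node
        for node, rate in read_rates.items()
        if mid > 0 and rate > threshold * mid
    )


# Hypothetical reads/sec per node; substitute values from your own monitoring.
sample = {
    "node1 (3.11.6)": 4100.0,
    "node2 (2.2.18)": 3900.0,
    "node3 (2.2.18)": 4000.0,
}
print(nodes_with_excess_traffic(sample))
```

An empty result means no node stands out. In that case extra traffic to the
upgraded node is unlikely to explain the longer GCs, and you can move on to
the next candidate.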



Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
Thanks Erick.



Our JVM options are unchanged between 2.2 and 3.11, and we have disk access
mode set to standard.  Generally we’ve maintained all configuration between
the two versions.


Read throughput (rate, bytes read/range scanned, etc.) seems fairly
consistent before and after the upgrade across all nodes.


Leon

On Wed, Oct 28, 2020 at 12:01 AM Erick Ramirez 
wrote:

> I haven't seen this specific behaviour in the past but things that I would
> look at are:
>
>    - JVM options which differ between 3.11 defaults and what you have
>    configured in 2.2
>    - review your monitoring and check read throughput on the upgraded
>    node as compared to 2.2 nodes
>    - whether disk access mode is set to map index files only (not
>    directly related to long GC pauses)
>
> If you're interested, I've written a post about disk access mode here --
> https://community.datastax.com/questions/6947/. Cheers!
>


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Erick Ramirez
I haven't seen this specific behaviour in the past but things that I would
look at are:

   - JVM options which differ between 3.11 defaults and what you have
   configured in 2.2
   - review your monitoring and check read throughput on the upgraded node
   as compared to 2.2 nodes
   - whether disk access mode is set to map index files only (not
   directly related to long GC pauses)

If you're interested, I've written a post about disk access mode here --
https://community.datastax.com/questions/6947/. Cheers!
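
For the first item, a minimal sketch of diffing two sets of JVM flags. It
assumes you have both sets as text with one flag per line, with lines starting
with # treated as comments. The two excerpts are hypothetical and are not the
real defaults of either version.

```python
def parse_jvm_options(text):
    """Collect JVM flags from text with one flag per line, skipping blanks and # comments."""
    options = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            options.add(line)
    return options


def diff_jvm_options(old_text, new_text):
    """Return (flags only in old, flags only in new), each sorted."""
    old = parse_jvm_options(old_text)
    new = parse_jvm_options(new_text)
    return sorted(old - new), sorted(new - old)


# Hypothetical excerpts for illustration only.
configured = """
### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=1
"""
other = """
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=2
"""

only_configured, only_other = diff_jvm_options(configured, other)
print("only in configured:", only_configured)
print("only in other:", only_other)
```

A flag whose value changed shows up on both sides, which makes changed
settings easy to spot.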



Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Erick Ramirez
On Wed, 28 Oct 2020 at 14:41, Rich Hawley  wrote:

> unsubscribe
>

You need to email user-unsubscr...@cassandra.apache.org to unsubscribe from
the list. Cheers!


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Rich Hawley
unsubscribe

On Tue, Oct 27, 2020 at 11:40 PM Leon Zaruvinsky 
wrote:

> Hi,
>
> I'm attempting an upgrade of Cassandra 2.2.18 to 3.11.6, but had to abort
> because of major performance issues associated with GC pauses.
>
> Details:
> 3 node cluster, RF 3, 1 DC
> ~2TB data per node
> Heap Size: 12G / New Size: 5G
>
> I didn't even get very far in the upgrade - I just upgraded a binary of a
> single node to 3.11.6 (did not run upgradesstables) and let it sit.  Within
> 10 minutes, I started seeing elevated GC pressure and lots of timeouts in
> the metrics.
>
> All three nodes, not just the upgraded one, are seeing GC problems.
> ParNew GC time jumped from 0.38% to 3%.  CMS times are up to 30 seconds.
>
> Once I turn off the node running 3.11.6, the cluster eventually recovers.
>
> Can anyone point me to ways to debug this?  I've taken heap dumps of all
> nodes but nothing in particular stands out, and there are no
> obvious messages in the logs that point to problems.
>




GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
Hi,

I'm attempting an upgrade of Cassandra 2.2.18 to 3.11.6, but had to abort
because of major performance issues associated with GC pauses.

Details:
3 node cluster, RF 3, 1 DC
~2TB data per node
Heap Size: 12G / New Size: 5G

I didn't even get very far in the upgrade - I just upgraded a binary of a
single node to 3.11.6 (did not run upgradesstables) and let it sit.  Within
10 minutes, I started seeing elevated GC pressure and lots of timeouts in
the metrics.

All three nodes, not just the upgraded one, are seeing GC problems.
ParNew GC time jumped from 0.38% to 3%.  CMS times are up to 30 seconds.

Once I turn off the node running 3.11.6, the cluster eventually recovers.

Can anyone point me to ways to debug this?  I've taken heap dumps of all
nodes but nothing in particular stands out, and there are no
obvious messages in the logs that point to problems.
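
One possible starting point for comparing nodes is to total up the GC pauses
each node logs. The sketch below assumes the log contains lines of the form
"<collector> GC in <N>ms", which is my understanding of what Cassandra's
GCInspector writes to system.log. The sample lines are invented for
illustration and are not from the affected cluster.

```python
import re
from collections import defaultdict

# Assumed line shape: "... ParNew GC in 310ms. ..."
GC_LINE = re.compile(r"(?P<collector>\w+) GC in (?P<millis>\d+)ms")


def summarize_gc(lines):
    """Group reported GC durations by collector name."""
    pauses = defaultdict(list)
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            pauses[match.group("collector")].append(int(match.group("millis")))
    return {
        name: {"count": len(values), "total_ms": sum(values), "max_ms": max(values)}
        for name, values in pauses.items()
    }


# Invented sample lines; feed the lines of your own system.log instead.
sample_log = [
    "INFO  [Service Thread] GCInspector - ParNew GC in 310ms.",
    "INFO  [Service Thread] GCInspector - ParNew GC in 290ms.",
    "WARN  [Service Thread] GCInspector - ConcurrentMarkSweep GC in 30012ms.",
    "INFO  [main] unrelated log line",
]

summary = summarize_gc(sample_log)
for collector, stats in sorted(summary.items()):
    print(collector, stats)
```

Running this per node, before and after starting the 3.11.6 node, would show
whether the extra pause time is mostly ParNew or mostly CMS on each node.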