Re: RE: Re: Re: High mutation stage in multi dc deployment

2021-07-20 Thread Jeff Jirsa
This is sufficiently atypical that many people aren't going to have enough
intuition to figure it out without seeing your metrics / logs / debugging
data (e.g. heap dumps).

My only guess, and it's a pretty big guess, is that your write timeout is
low enough (or the network quality bad enough, though that's unlikely with
GCP; they're usually very good at networking) that your coordinator is
timing out waiting for India to ack the write, which causes it to write a
hint. The delivery of that hint also times out (network, latency, or just
the speed of light), so you're creating a death spiral: a write is slow, so
it hints; hint replay means you're doing extra work, so it's even slower;
so new writes also hint, and maybe hints re-deliver.
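The feedback loop described above can be sketched as a toy discrete-time model (purely illustrative; the parameters and the `simulate` helper are made up for this sketch, not Cassandra internals):

```python
# Toy model of the hint "death spiral": writes that exceed spare capacity
# time out and become hints; replaying those hints next step eats capacity,
# which makes more new writes time out and hint.

def simulate(steps, incoming_rate, capacity, hint_replay_cost=1):
    """Return the hint backlog per step for a node that can do `capacity`
    units of work per step while `incoming_rate` writes arrive."""
    backlog = 0                                   # hints waiting to replay
    history = []
    for _ in range(steps):
        replay = min(backlog, capacity)           # hint replays go first
        spare = capacity - replay * hint_replay_cost
        timed_out = max(0, incoming_rate - spare) # these writes hint
        backlog = backlog - replay + timed_out
        history.append(backlog)
    return history

print(simulate(5, 8, 10))    # headroom: backlog stays at zero
print(simulate(5, 12, 10))   # overloaded: backlog grows every step
```

With headroom the backlog never forms; once incoming work exceeds capacity, hint replay steals capacity from new writes and the backlog compounds, which is the spiral.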

The other option would be that system_traces is replicated only to India,
and someone enabled tracing (either at the application level or
probabilistically).

If it's neither of those, you're going to have to debug it for real. Do
metrics show hints? Do you have table-level metrics for writes per table?
Are they what you expect? Are they higher in India? Is one table only in
India and getting lots of writes? Take a heap dump of one of the India
machines and look to see what the mutations are. Is it a table you
recognize? Is the replication factor set the way you expect?
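Part of that checklist can be scripted. Below is a sketch of pulling the MUTATION stage's Pending count out of `nodetool tpstats` output; the sample text is fabricated for illustration, so match it against your node's actual output:

```python
# Parse `nodetool tpstats`-style output and report the MutationStage
# Pending count, which is the number this thread says hits ~1 million.

def mutation_pending(tpstats_text):
    """Return the Pending count for MutationStage, or 0 if the row is absent."""
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Rows look like: Pool Name  Active  Pending  Completed  Blocked ...
        if parts and parts[0] == "MutationStage":
            return int(parts[2])
    return 0

sample = """\
Pool Name        Active  Pending  Completed  Blocked  All time blocked
MutationStage    32      1048576  991114234  0        0
ReadStage        1       0        53273461   0        0
"""

print(mutation_pending(sample))  # 1048576 -- the stage is badly backed up
```

A sustained non-zero Pending on MutationStage (rather than a brief blip) is what distinguishes a real backlog from normal bursts.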




RE: RE: Re: Re: High mutation stage in multi dc deployment

2021-07-20 Thread MyWorld
Kindly help in this regard. What could be the possible reason for the load
and mutation spike in the India data center?



RE: Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Arvinder,
It's a separate cluster. Here the max partition size is 32 MB.



Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread Arvinder Dhillon
Is this the same cluster with the 1 GB partition size?

-Arvinder



RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi daemeon,
We have already tuned the TCP settings to improve the bandwidth. Earlier we
had a lot of hint and mutation message drops, which went away after tuning
TCP. Moreover, we are writing with CL LOCAL_QUORUM on the US side, so the
ack is taken from the local DC.
I'm still concerned about what the reason for the increased mutation count
could be.
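To illustrate the ack math behind that point (a sketch assuming RF=3 in each DC, which the thread doesn't state): LOCAL_QUORUM only waits for local replicas, but the coordinator still ships every mutation to the remote DC, so local acks don't reduce India's write work.

```python
# Quorum arithmetic for LOCAL_QUORUM: acks come from the local DC only,
# yet all replicas in every DC still receive and apply the mutation.

def local_quorum(rf_local):
    """Number of local-DC replica acks LOCAL_QUORUM waits for."""
    return rf_local // 2 + 1

rf_us = 3   # assumed replication factor in the US DC
print(local_quorum(rf_us))  # 2 -- both acks satisfied inside the US DC
# India's 3 replicas each still apply one copy of every write; replicas
# that miss the write within the timeout receive it later as a hint.
```

So a fast LOCAL_QUORUM ack in the US is fully compatible with India falling behind and accumulating hints.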



RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Patrick,
Currently we are using Apache Cassandra version 3.11.6. We are performing
writes with CL LOCAL_QUORUM in the US-side DC. We have 4-5 tables,
including supplier_details, supplier_prod_details, and supplier_rating. We
also have a materialized view attached to the rating table.
For the batching part, I need to check with the ops team. However, they are
re-syncing data supplier-wise in these tables.




Re: High mutation stage in multi dc deployment

2021-07-19 Thread Patrick McFadin
Hi Ashish,

Can you give us some more information on the details? Specifically, the
version of Cassandra, the data model, the consistency levels used, and how
you are bulk loading. Is this a batch by any chance?

Patrick



Re: High mutation stage in multi dc deployment

2021-07-19 Thread daemeon reiydelle
You may want to think about the latency impacts of a cluster that has one
node "far away". This is such a basic design flaw that you need to do some
basic learning, and some basic understanding of networking and latency.


High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi all,

Currently we have a cluster with 2 DCs of 3 nodes each. One DC is in GCP-US
while the other is in GCP-India. Just to add here, the configuration of
every node across both DCs is the same: 6 CPUs, 32 GB RAM, 8 GB heap.

We do all our writes on the US data center. While performing a bulk write
on GCP-US, we observe a normal load of 1 on US, while the load at GCP-India
spikes to 10.

On observing tpstats in Grafana, we found that the mutation stage at
GCP-India intermittently goes to 1 million, though our overall write rate
is only about 300 per second per node. We don't know the reason, but
whenever we have this spike, we have load issues.
Please help: what could be the possible reason for this?
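As a rough sanity check on these numbers (assuming RF=3 per DC over 3 nodes per DC, and that the ~300 writes/sec/node are client writes coordinated on the 3 US nodes; neither is stated in the thread):

```python
# With RF=3 over 3 nodes, every node in a DC is a replica of every row,
# so each India node applies roughly one mutation per cluster-wide write.

writes_per_node = 300
us_nodes = 3
cluster_writes = writes_per_node * us_nodes   # cluster-wide writes/sec

mutations_per_india_node = cluster_writes     # one replica copy per write
print(mutations_per_india_node)  # 900 -- nowhere near the ~1,000,000 pending
```

A six-figure gap between the expected per-node mutation rate and the observed pending count points at amplification (hints, tracing, materialized views, or batch re-delivery) rather than raw client traffic.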

Regards,
Ashish