Re: RE: Re: Re: High mutation stage in multi dc deployment

2021-07-20 Thread Jeff Jirsa
This is sufficiently atypical that many people aren't going to have enough intuition to figure it out without seeing your metrics / logs / debugging data (e.g. heap dumps). My only guess, and it's a pretty big guess, is that your write timeout is low enough (or network quality bad enough, though
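
Jeff's guess concerns server-side write timeouts and network quality between the DCs. A minimal sketch of how those failures surface to a client, assuming the DataStax python cassandra-driver and hypothetical contact points, keyspace, and table (none of these come from the thread):

    from cassandra import ConsistencyLevel, Unavailable, WriteTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])           # US-DC contact point (hypothetical)
    session = cluster.connect("supplier_ks")  # hypothetical keyspace

    insert = SimpleStatement(
        "INSERT INTO supplier_details (supplier_id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )

    try:
        session.execute(insert, (42, "acme"), timeout=2.0)  # client-side timeout
    except WriteTimeout as exc:
        # The coordinator's write_request_timeout_in_ms expired before enough
        # replicas acknowledged; hints are stored for the replicas that lagged.
        print("write timed out:", exc)
    except Unavailable as exc:
        print("not enough live replicas:", exc)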

RE: RE: Re: Re: High mutation stage in multi dc deployment

2021-07-20 Thread MyWorld
Kindly help in this regard. What could be the possible reason for the load and mutation spike in the India data center? On 2021/07/20 00:14:56 MyWorld wrote: > Hi Arvinder, > It's a separate cluster. Here max partition size is 32mb. > > On 2021/07/19 23:57:27 Arvinder Dhillon wrote: > > Is this the same cl

RE: Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Arvinder, It's a separate cluster. Here the max partition size is 32 MB. On 2021/07/19 23:57:27 Arvinder Dhillon wrote: > Is this the same cluster with 1G partition size? > > -Arvinder > > On Mon, Jul 19, 2021, 4:51 PM MyWorld wrote: > > > Hi daemeon, > > We have already tuned the TCP settings to i

Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread Arvinder Dhillon
Is this the same cluster with 1G partition size? -Arvinder On Mon, Jul 19, 2021, 4:51 PM MyWorld wrote: > Hi daemeon, > We have already tuned the TCP settings to improve the bandwidth. Earlier > we had lot of hint and mutation msg drop which were gone after tuning TCP. > Moreover we are writing

RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi daemeon, We have already tuned the TCP settings to improve the bandwidth. Earlier we had a lot of hint and mutation message drops, which were gone after tuning TCP. Moreover, we are writing with CL LOCAL_QUORUM on the US side, so the ack is taken from the local DC. I am still concerned about what could be the reason for the increase
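
To illustrate the "ack is taken from the local DC" point: with LOCAL_QUORUM the coordinator only waits for replicas in its own data center, but every mutation is still shipped to the remote DC's replicas in the background, which is why the remote nodes see the full write volume. A sketch with the python driver (DC name, hosts, and keyspace are assumptions, not taken from the thread):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc="gcp-us")    # coordinate in the US DC
        ),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # ack = 2 of the 3 US replicas
    )
    cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()
    # The India replicas still receive every write asynchronously, so the remote
    # DC carries the same mutation load even though it never blocks the ack.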

RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Patrick, Currently we are using Apache Cassandra version 3.11.6. We are performing writes with CL LOCAL_QUORUM in the US-side DC. We have 4-5 tables, including supplier_details, supplier_prod_details, and supplier_rating. We also have a materialized view attached to the rating table. For the batching part, I need to check with
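
The materialized view is worth noting for the mutation question: in 3.11 every write to the base table also produces a view mutation (plus a read-before-write on the base partition), so the mview roughly doubles the work for that table. A hypothetical sketch of such a pairing; the column names are guesses for illustration, not the poster's actual schema:

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect("supplier_ks")  # hypothetical

    session.execute("""
        CREATE TABLE IF NOT EXISTS supplier_rating (
            supplier_id bigint,
            product_id  bigint,
            rating      int,
            PRIMARY KEY (supplier_id, product_id)
        )
    """)

    session.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS supplier_rating_by_score AS
            SELECT supplier_id, product_id, rating
            FROM supplier_rating
            WHERE supplier_id IS NOT NULL
              AND product_id IS NOT NULL
              AND rating IS NOT NULL
            PRIMARY KEY (rating, supplier_id, product_id)
    """)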

Re: High mutation stage in multi dc deployment

2021-07-19 Thread Patrick McFadin
Hi Ashish, Can you give us some more information about the details? Specifically, some indication of the version of Cassandra, the data model, the consistency levels used, and how you are bulk loading. Is this a batch by any chance? Patrick On Mon, Jul 19, 2021 at 10:38 AM MyWorld wrote: > Hi all, >
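
Patrick's batching question matters because the two kinds of batches load the cluster very differently: a logged batch spanning many partitions writes batchlog mutations on top of the data, while an unlogged batch scoped to one partition adds no extra mutations. A sketch of the cheaper form, with a hypothetical table:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType

    session = Cluster(["10.0.0.1"]).connect("supplier_ks")  # hypothetical
    insert = session.prepare(
        "INSERT INTO supplier_prod_details (supplier_id, product_id, price) VALUES (?, ?, ?)"
    )

    # One unlogged batch per partition key: no batchlog, minimal extra mutations.
    batch = BatchStatement(
        batch_type=BatchType.UNLOGGED,
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    batch.add(insert, (1, 100, 9.99))
    batch.add(insert, (1, 101, 4.50))   # same partition, different row
    session.execute(batch)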

Re: High mutation stage in multi dc deployment

2021-07-19 Thread daemeon reiydelle
You may want to think about the latency impacts of a cluster that has one node "far away". This is such a basic design flaw that you need to do some basic learning and gain a basic understanding of networking and latency. On Mon, Jul 19, 2021 at 10:38 AM MyWorld wrote: > Hi all, > > Currently

High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi all, Currently we have a cluster with 2 DCs of 3 nodes each. One DC is in GCP-US while the other is in GCP-India. Just to add here, the configuration of every node across both DCs is the same: CPU: 6, RAM: 32 GB, heap: 8 GB. We do all our writes on the US data center. While performing a bulk write on GCP-US, we obser

Re: Multi-DC Deployment

2011-04-21 Thread Peter Schuller
> Again, for a lot of services, it is fully acceptable, and a lot better, to > return an almost complete (or maybe even complete, but not verified by > quorum) result than no result at all. +1, except maybe "a lot" depending on how one chooses to define that. There are definitely cases where suffic

Re: Multi-DC Deployment

2011-04-21 Thread Peter Schuller
> Cassandra doesn't "replicate sstable corruptions". It detects corrupt > data and only replicates good data. This is incorrect. Depending on the nature of the corruption it may spread to other nodes. Checksumming (done right) would be a great addition to alleviate this. Yes, there is code that tri

Re: Multi-DC Deployment

2011-04-20 Thread Terje Marthinussen
Sure, the update queue could just as well replicate problems, but the queue would be a lot simpler than Cassandra and it would not modify already acknowledged data like, for instance, compaction or read-repair/hint deliveries may. There is a fair bit of re-writing/re-assembling of data even tho

Re: Multi-DC Deployment

2011-04-20 Thread Adrian Cockcroft
Queues replicate bad data just as well as anything else. The biggest source of bad data is broken app code... You will still need to implement a reconciliation/repair checker, as queues have their own failure modes when they get backed up. We have also looked at using queues to bounce data between

Re: Multi-DC Deployment

2011-04-20 Thread Terje Marthinussen
Assuming that you generally put an API on top of this, delivering to two or more systems then boils down to a message queue issue or some similar mechanism which handles secure delivery of messages. Maybe not trivial, but there are many products that can help you with this, and it is a lot easier t

Re: Multi-DC Deployment

2011-04-20 Thread Adrian Cockcroft
Hi Terje, If you feed data to two rings, you will get inconsistency drift as an update to one succeeds and to the other fails from time to time. You would have to build your own read repair. This all starts to look like "I don't trust Cassandra code to work, so I will write my own buggy one off ve

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
If you have RF=3 in both datacenters, it could be debated whether there is any point in using the built-in replication in Cassandra at all vs. feeding the data to both datacenters and getting two 100% isolated Cassandra instances that cannot replicate sstable corruptions between each other. My point is rea
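
A sketch of the dual-feed idea being discussed, written with today's python driver purely for illustration (hosts and keyspace are made up): the application writes the same mutation to two fully independent clusters instead of relying on Cassandra's cross-DC replication. As the replies above note, the hard part is handling the case where one of the two writes fails.

    from cassandra.cluster import Cluster

    dc1_session = Cluster(["10.0.0.1"]).connect("app_ks")
    dc2_session = Cluster(["10.1.0.1"]).connect("app_ks")

    def dual_write(query, params):
        """Apply the same write to both isolated clusters; collect failures."""
        errors = []
        for session in (dc1_session, dc2_session):
            try:
                session.execute(query, params)
            except Exception as exc:       # sketch only; real code would be narrower
                errors.append(exc)         # a queue/reconciliation step would go here
        return errors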

Re: Multi-DC Deployment

2011-04-19 Thread Adrian Cockcroft
If you want to use local quorum for a distributed setup, it doesn't make sense to have less than RF=3 local and remote. Three copies at both ends will give you high availability. Only one copy of the data is sent over the wide area link (with recent versions). There is no need to use mirrored or R
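
The placement Adrian describes, three replicas in each data center, corresponds to a NetworkTopologyStrategy keyspace. Sketched here with current CQL/driver syntax for illustration (the 2011-era syntax differed; DC names are hypothetical):

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app_ks
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'DC1': 3,
            'DC2': 3
        }
    """)
    # LOCAL_QUORUM then needs 2 of the 3 replicas in the coordinator's own DC,
    # and each write crosses the wide-area link only once (a forwarding
    # coordinator in the remote DC fans it out locally).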

Re: Multi-DC Deployment

2011-04-18 Thread Terje Marthinussen
Hum... Seems like it could be an idea in a case like this to have a mode where a result is always returned (if possible), but with a flag saying whether the consistency level was met, or to what level it was met (the number of nodes answering, for instance)? Terje On Tue, Apr 19, 2011 at 1:13 AM, Jonathan El
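
The response does not carry such a flag, but the python driver's DowngradingConsistencyRetryPolicy (where available; it is deprecated in newer driver versions) gives a rough client-side approximation of "return what you can": on Unavailable it retries at a lower consistency level instead of failing. A sketch with hypothetical hosts:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DowngradingConsistencyRetryPolicy

    profile = ExecutionProfile(
        consistency_level=ConsistencyLevel.QUORUM,
        retry_policy=DowngradingConsistencyRetryPolicy(),
    )
    cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect("app_ks")
    # Requests that cannot reach QUORUM are retried at whatever level the live
    # replicas can satisfy; note the caller is not told which level was met.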

Re: Multi-DC Deployment

2011-04-18 Thread Jonathan Ellis
They will time out until the failure detector realizes the DC1 nodes are down (~10 seconds). After that they will immediately return UnavailableException until DC1 comes back up. On Mon, Apr 18, 2011 at 10:43 AM, Baskar Duraikannu wrote: > We are planning to deploy Cassandra on two data centers. Let
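
The arithmetic behind Jonathan's answer, sketched with the python driver for illustration (hosts and table are hypothetical): RF=3 overall with 2 replicas in DC1 and 1 in DC2 means QUORUM needs floor(3/2) + 1 = 2 live replicas, and with DC1 down only 1 remains.

    from cassandra import ConsistencyLevel, ReadTimeout, Unavailable
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.1.0.1"]).connect("app_ks")  # surviving DC2 node
    query = SimpleStatement(
        "SELECT * FROM some_table WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    try:
        rows = session.execute(query, (42,))
    except ReadTimeout:
        print("DC1 not yet marked down: the request waited out the timeout")
    except Unavailable as exc:
        # Failure detector has marked the DC1 replicas down: 1 alive < 2 required.
        print("required", exc.required_replicas, "alive", exc.alive_replicas)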

Multi-DC Deployment

2011-04-18 Thread Baskar Duraikannu
We are planning to deploy Cassandra on two data centers. Let us say that we went with three replicas, with 2 being in one data center and the last replica in the 2nd data center. What will happen to quorum reads and writes when DC1 goes down (2 of 3 replicas are unreachable)? Will they time out? R