Philip, I don't see the benefit of a multi-DC C* cluster in this case. What you need is two separate C* clusters, using Kafka to record and replay writes to the DR site. The DR cluster only receives writes from the Kafka consumer. You won't need to deal with "Removing everything from Cassandra that -isn't- in Kafka".
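In case a concrete outline helps, below is a minimal sketch of that record/replay pattern, combined with Jeff's LOCAL_QUORUM-then-emit suggestion quoted further down the thread: the application in the primary DC writes locally at LOCAL_QUORUM, then publishes the event to a Kafka topic; a consumer in the DR site applies those events, in order, to the separate DR cluster. It uses kafka-python and the DataStax Python driver, and the topic, keyspace, table, and host names are made up for illustration. Treat it as a sketch of the idea, not a tested implementation.

    import json
    from kafka import KafkaConsumer, KafkaProducer   # kafka-python
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster            # DataStax Python driver

    # ---- Primary DC: persist locally, then record the write to Kafka ----
    # (In practice the two halves below would be separate processes.)
    primary = Cluster(['primary-cassandra']).connect('myks')   # hypothetical keyspace
    insert = primary.prepare(
        "INSERT INTO events (entity_id, seq, payload) VALUES (?, ?, ?)")
    insert.consistency_level = ConsistencyLevel.LOCAL_QUORUM   # low-latency local write

    producer = KafkaProducer(
        bootstrap_servers='primary-kafka:9092',
        acks='all',                                            # don't lose the record
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode())

    def record_event(entity_id, seq, payload):
        # 1. The local Cassandra write must succeed before anything is emitted.
        primary.execute(insert, (entity_id, seq, payload))
        # 2. Publish to the replication topic, keyed by entity so that causally
        #    related events land in the same partition and keep their order.
        #    (The topic is assumed to be mirrored to, or readable from, the DR site.)
        producer.send('dr-replication', key=entity_id,
                      value={'entity_id': entity_id, 'seq': seq,
                             'payload': payload}).get(30)

    # ---- DR DC: consume the topic and apply writes to the standby cluster ----
    dr = Cluster(['dr-cassandra']).connect('myks')
    dr_insert = dr.prepare(
        "INSERT INTO events (entity_id, seq, payload) VALUES (?, ?, ?)")
    dr_insert.consistency_level = ConsistencyLevel.LOCAL_QUORUM

    consumer = KafkaConsumer(
        'dr-replication',
        bootstrap_servers='dr-kafka:9092',
        group_id='dr-applier',
        enable_auto_commit=False,                              # commit only after applying
        value_deserializer=lambda v: json.loads(v.decode()))

    for msg in consumer:
        evt = msg.value
        dr.execute(dr_insert, (evt['entity_id'], evt['seq'], evt['payload']))
        consumer.commit()   # the DR cluster never gets ahead of what Kafka has recorded

Because the DR cluster is only ever written by the consumer, its state is always a prefix of what Kafka recorded, which is what gives you the causal ordering without EACH_QUORUM latency.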
On Mon, Dec 14, 2015 at 7:13 PM, Philip Persad <philip.per...@gmail.com> wrote:

> I did consider doubling down and replicating both Kafka and Cassandra to the secondary DC. It seemed a bit complicated (term used relatively), and I didn't want to think about the unlikely scenario of Cassandra writes getting across before the Kafka ones. Inserting everything in Kafka into Cassandra after a failure is easy. Removing everything from Cassandra that -isn't- in Kafka is not a problem I want to take a swing at if I don't have to.
>
> On Mon, Dec 14, 2015 at 4:02 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>
>> Emit a message to a new kafka topic once the first write is persisted into cassandra with LOCAL_QUORUM (gives you low latency), then consume off of that topic to get higher-latency-but-causally-correct writes to the subsequent (disconnected) DR DC.
>>
>> From: Philip Persad
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, December 14, 2015 at 3:37 PM
>> To: Cassandra Users
>> Subject: Re: Replicating Data Between Separate Data Centres
>>
>> Hi Jeff,
>>
>> You're dead on with that article. That is a very good explanation of the problem I'm facing. You're also right that, fascinating though that research is, letting it anywhere near my production data is not something I'd think about.
>>
>> Basically, I want EACH_QUORUM, but I'm not willing to pay for it. My system needs to be reasonably close to a real-time system (let's say a soft real-time system). Waiting for each write to make its way across a continent is not something I can live with (to say nothing of what happens if the WAN temporarily fails).
>>
>> Basically I guess what I'm hearing is that the best way to create a clone of a Cassandra cluster in another DC is to snapshot and restore.
>>
>> Thanks!
>>
>> -Phil
>>
>> On Mon, Dec 14, 2015 at 3:18 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>>> There is research into causal consistency and Cassandra (http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html, for example), though you'll note that it uses a fork (https://github.com/wlloyd/eiger), which is not something you'd likely ever want to consider in production. Let's pretend it doesn't exist, and won't in the near future.
>>>
>>> The typical approach here is to have multiple active datacenters and EACH_QUORUM writes, which gives you the ability to have a full DC failure without impact. This also solves your fail-back problem, because when the primary DC is restored, you simply run a repair. What part of EACH_QUORUM is insufficient for your needs? The failure scenarios when the WAN link breaks and it impacts local writes?
>>>
>>> Short of that, your 'occasional snapshots and restore in case of emergency' is going to be your next-best thing.
>>>
>>> From: Philip Persad
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Monday, December 14, 2015 at 3:11 PM
>>> To: Cassandra Users
>>> Subject: Re: Replicating Data Between Separate Data Centres
>>>
>>> Hi Jim,
>>>
>>> Thanks for taking the time to answer. By Causal Consistency, what I mean is that I need strict ordering of all related events which might have a causal relationship. For example (albeit slightly contrived), if we are looking at recording an event stream, it is very important that the event creating a user be visible before the event which assigns a permission to that user. However, I don't care at all about the ordering of the creation of two different users. This is what I mean by Causal Consistency.
>>>
>>> The reason why LOCAL_QUORUM replication does not work for me is that, while I can get guarantees about the order in which writes will become visible in the Primary DC, I cannot get those guarantees about the Secondary DC. As a result (to use another slightly contrived example), if a user is created and then takes an action shortly before the failure of the Primary DC, there are four possible situations with respect to what will be visible in the Secondary DC:
>>>
>>> 1) Both events are visible in the Secondary DC
>>> 2) Neither event is visible in the Secondary DC
>>> 3) The creation event is visible in the Secondary DC, but the action event is not
>>> 4) The action event is visible in the Secondary DC, but the creation event is not
>>>
>>> States 1, 2, and 3 are all acceptable. State 4 is not. However, if I understand Cassandra's asynchronous DC replication correctly, I do not believe I get any guarantee that situation 4 will not happen. Eventual Consistency promises to "eventually" settle into State 1, but "eventually" does me very little good if Godzilla steps on my Primary DC. I'm willing to accept loss of data which was created close to the disaster (States 2 and 3), but I cannot accept the inconsistent history of events in State 4.
>>>
>>> I have a mechanism outside of normal Cassandra replication which can give me the consistency I need. My problem is effectively with setting up a new recovery DC after the failure of the primary. How do I go about getting all of my data into a new cluster?
>>>
>>> Thanks,
>>>
>>> -Phil
>>>
>>> On Mon, Dec 14, 2015 at 1:06 PM, Jim Ancona <j...@anconafamily.com> wrote:
>>>
>>>> Could you define what you mean by Causal Consistency and explain why you think you won't have that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the way many of us handle DR, so I'd like to understand why it doesn't work for you.
>>>>
>>>> I'm afraid I don't understand your scenario. Are you planning on building out a new recovery DC *after* the primary has failed, or keeping two DCs in sync so that you can switch over after a failure?
>>>>
>>>> Jim
>>>>
>>>> On Mon, Dec 14, 2015 at 2:59 PM, Philip Persad <philip.per...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently looking at Cassandra in the context of Disaster Recovery. I have two Data Centres: one is the Primary and the other acts as a Standby. There is a Cassandra cluster in each Data Centre. For the time being I'm running Cassandra 2.0.9. Unfortunately, due to the nature of my data, the consistency levels that I would get out of LOCAL_QUORUM writes followed by asynchronous replication to the secondary data centre are insufficient. In the event of a failure, it is acceptable to lose some data, but I need Causal Consistency to be maintained. Since I don't have the luxury of performing nodetool repairs after Godzilla steps on my primary data centre, I use more strictly ordered means of transporting events between the Data Centres (Kafka, for anyone who cares about that detail).
>>>>>
>>>>> What I'm not sure about is how to go about copying all the data in one Cassandra cluster to a new cluster, either to bring up a new Standby Data Centre or as part of failing back to the Primary after I pick up the pieces. I'm thinking that I should either:
>>>>>
>>>>> 1. Take a snapshot (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html) and then restore that snapshot on my new cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html)
>>>>>
>>>>> 2. Join the new data centre to the existing cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html). Then separate the two data centres into two individual clusters by doing . . . something???
>>>>>
>>>>> Does anyone have any advice about how to tackle this problem?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> -Phil
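On option 1 above, in case it helps to see the moving parts: the usual flow is to snapshot each node in the surviving cluster, recreate the schema on the new cluster, and then stream each node's snapshot SSTables in with sstableloader. Below is a rough sketch of what runs on each source node; the keyspace, table, paths, and contact point are invented for illustration, and it assumes the 2.0-style data directory layout. The linked DataStax procedure is the authoritative version.

    import shutil
    import subprocess

    KEYSPACE = "myks"                 # hypothetical keyspace
    TABLE = "events"                  # hypothetical table
    TAG = "dr-seed"                   # snapshot name
    NEW_CLUSTER_NODE = "10.0.2.10"    # a contact point in the new cluster
    DATA_DIR = "/var/lib/cassandra/data"

    # 1. Take a named snapshot (flushes memtables, then hard-links the current SSTables).
    subprocess.run(["nodetool", "snapshot", "-t", TAG, KEYSPACE], check=True)

    # 2. sstableloader expects a directory path ending in <keyspace>/<table>, so stage
    #    the snapshot files into that shape. The schema must already exist on the
    #    new cluster (e.g. replayed from a schema dump via cqlsh).
    src = f"{DATA_DIR}/{KEYSPACE}/{TABLE}/snapshots/{TAG}"
    stage = f"/tmp/restore/{KEYSPACE}/{TABLE}"
    shutil.copytree(src, stage)

    # 3. Stream this node's SSTables into the new cluster. Repeat steps 1-3 on every
    #    node of the source cluster so the new cluster ends up with the full data set.
    subprocess.run(["sstableloader", "-d", NEW_CLUSTER_NODE, stage], check=True)

Because sstableloader streams data according to the new cluster's own topology, the new cluster does not need the same node count or token layout as the old one, which is convenient when rebuilding after a failure.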