Personally, I wouldn't ever do this. I recommend separate DCs if you want to keep workloads separate.
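As a concrete illustration of what "separate DCs per workload" can look like at the keyspace level, here is a minimal sketch. The DC names read and write are simply the ones discussed later in this thread, and my_ks is a placeholder keyspace name:

    -- Three replicas of the keyspace in each logical datacenter.
    CREATE KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'read': 3,
      'write': 3
    };

    -- A client that should only touch the "read" DC sets that DC as its
    -- local datacenter and uses a LOCAL_* consistency level, e.g. in cqlsh:
    CONSISTENCY LOCAL_QUORUM;

Writes still land in every DC listed in the replication map no matter where the client is pointed, which is the caveat Reid raises further down the thread.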
On Wed, Oct 23, 2019 at 4:06 PM Sergio <[email protected]> wrote:

> I forgot to comment on
>
> OPTION C)
>
>   Node  DC     RACK  AZ
>   1     read   ONE   us-east-1a
>   2     read   ONE   us-east-1b
>   3     read   ONE   us-east-1c
>   4     write  TWO   us-east-1a
>   5     write  TWO   us-east-1b
>   6     write  TWO   us-east-1c
>
> I would expect that I would need to decrease the consistency level of the
> reads if one of the AZs goes down. Please consider the one below as the
> real OPTION A; the previous one looks wrong because the same rack is
> assigned to 2 different DCs.
>
> OPTION A)
>
>   Node  DC     RACK  AZ
>   1     read   ONE   us-east-1a
>   2     read   ONE   us-east-1a
>   3     read   ONE   us-east-1a
>   4     write  TWO   us-east-1b
>   5     write  TWO   us-east-1b
>   6     write  TWO   us-east-1b
>
> Thanks,
>
> Sergio
>
> On Wed, Oct 23, 2019 at 12:33 PM Sergio <[email protected]> wrote:
>
>> Hi Reid,
>>
>> Thank you very much for clearing up these concepts for me.
>> I posted this question about our unbalanced cluster on the DataStax forum
>> (https://community.datastax.com/comments/1133/view.html), and the reply
>> was that the *number of racks should be a multiple of the replication
>> factor* (or 1) for the cluster to be balanced. I thought, then, that with
>> 3 availability zones I should have 3 racks for each datacenter, not the 2
>> (us-east-1a, us-east-1b) I have right now, or, in the simplest setup, one
>> rack for each datacenter.
>>
>> Datacenter: live
>> ================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load        Tokens  Owns  Host ID                               Rack
>> UN  10.1.20.49   289.75 GiB  256     ?     be5a0193-56e7-4d42-8cc8-5d2141ab4872  us-east-1a
>> UN  10.1.30.112  103.03 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
>> UN  10.1.19.163  129.61 GiB  256     ?     3c2efdda-8dd4-4f08-b991-9aff062a5388  us-east-1a
>> UN  10.1.26.181  145.28 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
>> UN  10.1.17.213  149.04 GiB  256     ?     71563e86-b2ae-4d2c-91c5-49aa08386f67  us-east-1a
>> DN  10.1.19.198  52.41 GiB   256     ?     613b43c0-0688-4b86-994c-dc772b6fb8d2  us-east-1b
>> UN  10.1.31.60   195.17 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
>> UN  10.1.25.206  100.67 GiB  256     ?     f43532ad-7d2e-4480-a9ce-2529b47f823d  us-east-1b
>>
>> Each rack label right now matches the availability zone, and we have 3
>> datacenters and 2 availability zones with 2 racks per DC, but the output
>> above is clearly unbalanced.
>>
>> If I have a keyspace with replication factor = 3 and I want to minimize
>> the number of nodes needed to scale the cluster up and down while keeping
>> it balanced, should I consider an approach like OPTION A?
>>
>> OPTION A)
>>
>>   Node  DC     RACK  AZ
>>   1     read   ONE   us-east-1a
>>   2     read   ONE   us-east-1a
>>   3     read   ONE   us-east-1a
>>   4     write  ONE   us-east-1b
>>   5     write  ONE   us-east-1b
>>   6     write  ONE   us-east-1b
>>
>> OPTION B)
>>
>>   Node  DC     RACK  AZ
>>   1     read   ONE   us-east-1a
>>   2     read   ONE   us-east-1a
>>   3     read   ONE   us-east-1a
>>   4     write  TWO   us-east-1b
>>   5     write  TWO   us-east-1b
>>   6     write  TWO   us-east-1b
>>   7     read   ONE   us-east-1c
>>   8     write  TWO   us-east-1c
>>   9     read   ONE   us-east-1c
>>
>> Option B looks unbalanced, so I would exclude it.
>>
>> OPTION C)
>>
>>   Node  DC     RACK  AZ
>>   1     read   ONE   us-east-1a
>>   2     read   ONE   us-east-1b
>>   3     read   ONE   us-east-1c
>>   4     write  TWO   us-east-1a
>>   5     write  TWO   us-east-1b
>>   6     write  TWO   us-east-1c
>>
>> So I am thinking of A if I have the restriction of 2 AZs, but I guess that
>> OPTION C would be the best.
>>
>> If I have to add another DC for reads, because we want to assign a new DC
>> to each new microservice, it would look like:
>>
>> OPTION EXTRA DC FOR READS)
>>
>>   Node  DC          RACK   AZ
>>   1     read        ONE    us-east-1a
>>   2     read        ONE    us-east-1b
>>   3     read        ONE    us-east-1c
>>   4     write       TWO    us-east-1a
>>   5     write       TWO    us-east-1b
>>   6     write       TWO    us-east-1c
>>   7     extra-read  THREE  us-east-1a
>>   8     extra-read  THREE  us-east-1b
>>   9     extra-read  THREE  us-east-1c
>>
>> The *write* DC will replicate the data to the other datacenters. My goal
>> is to keep the *read* machines dedicated to serving reads and the *write*
>> machines to serving writes; Cassandra will handle the replication for me.
>> Is there any other option that I am missing, or any wrong assumption? I am
>> thinking of writing a blog post about all my learnings so far. Thank you
>> very much for the replies.
>>
>> Best,
>>
>> Sergio
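For a layout like OPTION C or the extra-read variant above, each node declares its own DC and rack. A minimal sketch, assuming GossipingPropertyFileSnitch and using node 4 of OPTION C purely as an illustration:

    # cassandra.yaml, on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on node 4 of OPTION C:
    # the node physically sits in us-east-1a, but its logical rack is the
    # single rack "TWO" that spans the whole write DC.
    dc=write
    rack=TWO

With one rack per DC, NetworkTopologyStrategy simply spreads the RF copies across the nodes of that DC; with one rack per AZ and 3 AZs, each rack ends up holding one of the 3 copies when RF = 3, which is where the "racks should be a multiple of the replication factor" advice comes from.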
>> On Wed, Oct 23, 2019 at 10:57 AM Reid Pinchback <[email protected]> wrote:
>>
>>> No, that's not correct. The point of racks is to help you distribute the
>>> replicas, not further-replicate the replicas. Data centers are what do
>>> the latter. So for example, if you wanted to ensure that you always had
>>> quorum when an AZ went down, you could have two DCs, one in each AZ, and
>>> use one rack in each DC. In your situation I think I'd be more tempted to
>>> consider that. Then if an AZ went away, you could fail over your traffic
>>> to the remaining DC and still be perfectly fine.
>>>
>>> For background on replicas vs. racks, I believe the information you want
>>> is under the heading 'NetworkTopologyStrategy' at:
>>>
>>> http://cassandra.apache.org/doc/latest/architecture/dynamo.html
>>>
>>> That should help you better understand how replicas distribute.
>>>
>>> As mentioned before, while you can choose to do the reads in one DC,
>>> except for concerns about contention related to network traffic and
>>> connection handling, you can't isolate reads from writes. You can
>>> _mostly_ insulate the write DC from the activity within the read DC, and
>>> even that isn't absolute because of repairs. However, your mileage may
>>> vary, so do what makes sense for your usage pattern.
>>>
>>> R
>>>
>>> *From:* Sergio <[email protected]>
>>> *Reply-To:* "[email protected]" <[email protected]>
>>> *Date:* Wednesday, October 23, 2019 at 12:50 PM
>>> *To:* "[email protected]" <[email protected]>
>>> *Subject:* Re: Cassandra Rack - Datacenter Load Balancing relations
>>>
>>> Hi Reid,
>>>
>>> Thanks for your reply. I really appreciate your explanation.
>>>
>>> We are in AWS, and right now we are using 2 availability zones, not 3.
>>> We found our cluster really unbalanced because the keyspace has
>>> replication factor = 3 while there are 2 racks and 2 datacenters.
>>> We want the writes spread across all the nodes, but we want the reads
>>> isolated from the writes, both to keep the load on the read nodes low and
>>> to be able to tell whether a problem comes from the consumer (read) or
>>> producer (write) applications.
>>> It looks like each rack contains an entire copy of the data, so the data
>>> would be replicated for each rack and then for each node.
>>> If I am correct, a keyspace with 100 GB of data, replication factor = 3,
>>> and 3 racks would give 100 * 3 * 3 = 900 GB. If I had only one rack
>>> across 2 or even 3 availability zones, I would save space and have only
>>> 300 GB. Please correct me if I am wrong.
>>>
>>> Best,
>>>
>>> Sergio
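A note on the arithmetic above, in line with Reid's point that racks distribute replicas rather than add more of them: with NetworkTopologyStrategy the per-DC replication factor alone fixes how many copies a DC stores, and the rack count only decides which nodes hold them. A 9-copy, 900 GB footprint would need three DCs each at RF = 3, not three racks. A minimal sketch of the math (my_ks again being a placeholder keyspace):

    -- 100 GB of unique data, RF = 3 in each DC:
    --   per DC  ~ 100 GB * 3 = 300 GB, whether the DC has 1 rack or 3
    --   cluster ~ 300 GB * (number of DCs named in the replication map)
    ALTER KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'read': 3,
      'write': 3
    };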
>>> On Wed, Oct 23, 2019 at 9:21 AM Reid Pinchback <[email protected]> wrote:
>>>
>>> Datacenters and racks are different concepts. While they don't have to be
>>> associated with their historical meanings, the historical meanings
>>> probably provide a helpful model for understanding what you want from
>>> them.
>>>
>>> When companies own their own physical servers and have them housed
>>> somewhere, questions arise about where to locate any particular server.
>>> It's a balancing act between things like the network speed of related
>>> servers being able to talk to each other, and the fault tolerance of not
>>> having many servers all exposed to the same risks.
>>>
>>> "Same rack" in that physical world tended to mean something like "all
>>> behind the same network switch and all sharing the same power bus". The
>>> morning after an electrical glitch fries a power bus, and thus everything
>>> in that rack, you realize you wish you didn't have so many of the same
>>> type of server together. Well, they were servers. Now they are door
>>> stops. Badness and sadness.
>>>
>>> That's kind of the mindset to have with racks in Cassandra. A rack is an
>>> artifact for you to separate servers into pools so that the disparate
>>> pools have hopefully somewhat independent infrastructure risks. However,
>>> all those servers are still doing the same kind of work, are the same
>>> version, etc.
>>>
>>> Datacenters are amalgams of those racks, and how similar or different
>>> they are from each other depends on what you want to do with them. What
>>> is true is that if you have N datacenters, each one of them must have
>>> enough disk storage to house all the data. The actual physical footprint
>>> of that data in each DC depends on the replication factors in play.
>>>
>>> Note that you sorta can't have "one datacenter for writes", because the
>>> writes will replicate across the data centers. You could definitely
>>> choose to have only one that takes read queries, but it's best to think
>>> of writing as being universal. One scenario you can have is where the DC
>>> not taking live read traffic is the one you use for maintenance,
>>> performance testing, or version upgrades.
>>>
>>> One rack makes your life easier if you don't have a reason for multiple
>>> racks. It depends on the environment you deploy into and your fault
>>> tolerance goals. If you were in AWS and wanted to spread risk across
>>> availability zones, then you would likely have as many racks as the AZs
>>> you choose to be in, because that's really the point of using multiple
>>> AZs.
>>>
>>> R
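On the "as many racks as AZs" point: in AWS the EC2 snitches can derive that mapping automatically, at the cost of fixing the DC and rack names, while a deployment that wants custom DC names such as read and write would stay on GossipingPropertyFileSnitch and set the AZ as the rack by hand. A minimal sketch of both, purely as an illustration:

    # Option 1: cassandra.yaml with the EC2 snitch; the region becomes the
    # DC and the AZ becomes the rack (us-east-1a -> DC "us-east", rack "1a").
    endpoint_snitch: Ec2Snitch

    # Option 2: GossipingPropertyFileSnitch with the AZ used as the rack
    # label, which keeps custom DC names possible
    # (cassandra-rackdc.properties, set per node).
    dc=read
    rack=us-east-1a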
>>> On 10/23/19, 4:06 AM, "Sergio Bilello" <[email protected]> wrote:
>>>
>>> Hello guys!
>>>
>>> I was reading
>>> https://cassandra.apache.org/doc/latest/architecture/dynamo.html#networktopologystrategy
>>> and I would like to understand a concept related to node load balancing.
>>>
>>> I know that Jon recommends vnodes = 4, but right now I found a cluster
>>> with vnodes = 256, replication factor = 3, and 2 racks. This is
>>> unbalanced because the number of racks is not a multiple of the
>>> replication factor.
>>>
>>> However, my plan is to move all the nodes into a single rack so that I
>>> can eventually scale the cluster up and down one node at a time.
>>>
>>> If I had 3 racks and wanted to keep things balanced, I should scale up 3
>>> nodes at a time, one for each rack.
>>>
>>> If I had 3 racks, should I also have 3 different datacenters, one
>>> datacenter for each rack?
>>>
>>> Can I have 2 datacenters and 3 racks? If that is possible, one datacenter
>>> would have more nodes than the other; could that be a problem?
>>>
>>> I am thinking of splitting my cluster into one datacenter for reads and
>>> one for writes, and keeping all the nodes in the same rack so I can scale
>>> up one node at a time.
>>>
>>> Please correct me if I am wrong.
>>>
>>> Thanks,
>>>
>>> Sergio
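One last practical note on spotting imbalance: the Owns column in the nodetool output earlier in this thread shows "?" because no keyspace was given. Passing the keyspace (the name below is a placeholder) makes nodetool report effective ownership per node, which is the number to check after any of the topology changes discussed above:

    # Effective ownership is only meaningful per keyspace, since it depends
    # on that keyspace's replication settings.
    nodetool status my_ks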
