> Hey,
> 
> I have a few VM host (bare metal) machines with varying amounts of free 
> hard drive space on them. For simplicity let’s say I have three machines like 
> so:
>  * Machine 1:
>   - Hard drive 1: 150 GB available.
>  * Machine 2:
>   - Hard drive 1: 150 GB available.
>   - Hard drive 2: 150 GB available.
>  * Machine 3:
>   - Hard drive 1: 150 GB available.
> 
> I am setting up a Cassandra cluster across them and, as I see it, I have two 
> options:
> 
> 1. I set up one Cassandra node/VM per bare metal machine. I assign all free 
> hard drive space to each Cassandra node and I balance the cluster using 
> vnodes proportionally to the amount of free hard drive space (CPU/RAM is not 
> going to be a bottleneck here).
> 
> 2. I set up four VMs, each running a Cassandra node with an equal amount of 
> hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup will potentially create a situation where, if Machine 2 goes down, 
you may lose two replicas, as the two VMs on Machine 2 might hold replicas of 
the same key.
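
If you do go that route, one way to mitigate this is to tell Cassandra about 
the physical topology through the snitch, so that NetworkTopologyStrategy 
spreads replicas across physical machines rather than just across VMs. A rough 
sketch (assuming GossipingPropertyFileSnitch and a keyspace using 
NetworkTopologyStrategy; the dc/rack names below are just placeholders):

    # cassandra.yaml (on every node)
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on the VM on Machine 1
    dc=DC1
    rack=machine1

    # cassandra-rackdc.properties on BOTH VMs on Machine 2
    dc=DC1
    rack=machine2

    # cassandra-rackdc.properties on the VM on Machine 3
    dc=DC1
    rack=machine3

With one "rack" per physical box, NetworkTopologyStrategy will try to place 
replicas on distinct racks, so two replicas of the same key shouldn't land on 
Machine 2 together.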

> 
> General question: Is either of these preferable to the other? I understand 1) 
> yields lower high-availability (since nodes are on the same hardware).

Other way around (2 would potentially be the lower-availability option)… 
Cassandra thinks two of the VMs are separate when in fact they rely on the 
same underlying machine.

> 
> Question about alternative 1: With varying vnodes, can I always be sure that 
> replicas are never put on the same virtual machine?

Yes… mostly (see https://issues.apache.org/jira/browse/CASSANDRA-4123).

> Or is varying vnodes really only useful/recommended when migrating from 
> machines with varying hardware (like mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is 
responsible for. You can use it to account for different types of hardware; 
you can also use it to create awesome situations like hotspots if you aren't 
careful… YMMV.
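
Concretely, that weighting is done with num_tokens in cassandra.yaml; a rough 
sketch for the layout above (the exact numbers are arbitrary, only the ratio 
between nodes matters):

    # cassandra.yaml on Machines 1 and 3 (150 GB each)
    num_tokens: 256

    # cassandra.yaml on Machine 2 (300 GB, if you run a single node on it)
    num_tokens: 512

A node with twice the tokens owns roughly twice the ring, so it will also take 
roughly twice the data and twice the read/write traffic, which is exactly how 
you end up with a hotspot if that box can't keep up.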

At the end of the day I would throw out the extra hard drive / not use it / put 
more hard drives in the other machines. Why? Hard drives are cheap and your 
time as the cluster's admin isn't. If you do add more hard drives you can 
also split out the commit log etc. onto different disks.
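
For reference, that split is just two settings in cassandra.yaml; a minimal 
sketch (the paths are placeholders for wherever you mount the extra disks):

    # cassandra.yaml
    data_file_directories:
        - /mnt/disk1/cassandra/data
    commitlog_directory: /mnt/disk2/cassandra/commitlog

Keeping the commit log on its own disk means its sequential writes don't have 
to compete with compaction and read I/O on the data disk.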

I would take fewer problems over trying to wring every last scrap of 
performance out of the available hardware any day of the year. 


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
