Cassandra storage: Some thoughts

Vangelis Koukis Fri, 09 Mar 2018 07:57:06 -0800

Hello all,

My name is Vangelis Koukis and I am a Founder and the CTO of Arrikto.

I'm writing to share our thoughts on how people run distributed,
stateful applications such as Cassandra on modern infrastructure,
and would love to get the community's feedback and comments.

The fundamental question is: Where does a Cassandra node find its data?
Does it run over local storage, e.g., a super-fast NVMe device, or does
it run over some sort of external, managed storage, e.g., EBS on AWS?

Going in one of the two directions is a tradeoff between flexibility on
one hand, and performance/cost on the other.

* External storage, e.g., EBS:

Easy backups as thin/instant EBS snapshots, and easy node recovery
in the case of instance failure by re-attaching the EBS data volume
to a newly-created instance. But then, I/O bandwidth, I/O latency,
and cost suffer.

* Local NVMe:

Blazing fast, with very low latency, excellent bandwidth, a
fraction of the cost, but then it is not obvious how one backs up
their data, or recovers from node failure.

At Arrikto we are building decentralized storage to tackle this problem
for cloud-native apps. Our software, Rok, allows you to run stateful
apps directly over fast, local NVMe storage on-prem or in the cloud, and
still be able to snapshot the containers and distribute them
efficiently: across machines of the same cluster, or across distinct
locations and administrative domains over a decentralized network.

Rok runs alongside Cassandra, which accesses local storage
directly. It only has to intervene during snapshot-based node recovery,
which is transparent to the application. It does not invoke an
application-wide data recovery and rebalancing operation, which would
put load on the whole cluster and impact application responsiveness.
Instead, it performs block-level recovery of this specific node from the
Rok snapshot store, e.g., S3, with predictable performance.
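
To make the difference concrete, here is a rough back-of-the-envelope
sketch in Python. It only models how much data the rest of the cluster
has to stream to a replacement node. The function name and all figures
are hypothetical, chosen for illustration, and are not measurements of
Rok or Cassandra. In the snapshot case the bulk of the data still has to
be read, but from the snapshot store rather than from peer nodes.

```python
def peer_streamed_gb(node_data_gb, changed_fraction=None):
    """Return the GB that peer Cassandra nodes must stream to a new node.

    changed_fraction=None models a full rebuild from peers. Otherwise,
    only the data written since the last snapshot is streamed, and the
    rest is restored at block level from the snapshot store.
    """
    if changed_fraction is None:
        return float(node_data_gb)
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    return node_data_gb * changed_fraction


# Hypothetical figures, for illustration only.
NODE_DATA_GB = 1000            # data held by the failed node
CHANGED_SINCE_SNAPSHOT = 0.02  # fraction written after the last snapshot

full_rebuild = peer_streamed_gb(NODE_DATA_GB)
snapshot_restore = peer_streamed_gb(NODE_DATA_GB, CHANGED_SINCE_SNAPSHOT)

print(f"full rebuild from peers:  {full_rebuild:.0f} GB")
print(f"snapshot restore + delta: {snapshot_restore:.0f} GB")
```

The more recent the snapshot, the smaller the changed fraction, and the
less load recovery puts on the rest of the cluster.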

This solves four important issues we have seen people running Cassandra
at scale face today:

* Node recovery / node migration:

If you lose an entire Cassandra node, then your database will
continue operating normally, as Rok in combination with your
Container Orchestrator (e.g., Kubernetes) will present another
Cassandra node. This node will have the data of the latest
snapshot that resides on the Rok snapshot store. In this case,
Cassandra only has to recover the changed parts, which is just a
small fraction of the node data, and does not cause CPU load on
the whole cluster. Similarly, you can migrate a Cassandra node
from one physical host to another, without depending on external,
EBS-like storage.

* Backup and recovery:

You can use Rok to take a full backup of your whole application,
along with the DB, as a group-consistent snapshot of its VMs or
containers, and store it externally. This does not depend on app-
or Cassandra-specific functionality.

* Data mobility:

You can synchronize these snapshots to different locations, e.g.,
across regions or cloud providers, and across administrative
domains, i.e., share them with others without giving them direct
access to your Cassandra DB. You can then spawn your entire
application stack in the new location.

* Testing / analytics:

Being able to spawn a copy of your Cassandra DB as a thin clone
means you can have test & dev workflows running in parallel, on
independent, mutable clones, with real data underneath. Similarly,
your analytics team can run their lengthy reporting and analytics
workloads on an independent clone of your transactional DB, on
completely distinct hardware, or even in a different location.

So far, initial validation of our solution with early adopters shows
significant performance gains at a fraction of the cost of external
storage, while enabling a multi-region setup.

Here are some numbers and a whitepaper to support this:
https://journal.arrikto.com/why-your-cassandra-needs-local-nvme-and-rok-1787b9fc286d
http://arrikto.com/wp-content/uploads/2018/03/20180206-rok_decentralized_storage_for_the_cloud_native_world.pdf

If the above sounds interesting, we are eager to hear from you, learn
about your potential use cases, and include you in our beta test
program.

Thank you,
Vangelis.

--
Vangelis Koukis
CTO, Arrikto Inc.
3505 El Camino Real, Palo Alto, CA 94306
www.arrikto.com
