I'll give you some general guidance that applies to any type of
storage you pick. It applies even to local disks, but it bears
directly on your question.

The key to success with storage and Cassandra is sequential,
concurrent IO. Most of the large IO operations are either writing or
reading a large file on disk (memtable flushes write new SSTables,
compaction reads existing ones while writing new ones), and sometimes,
which is the harder case to manage, both at the same time. Storage
systems that are biased toward reads or toward writes will create an
imbalance that can lead to issues. It's worth emphasizing that these
are sequential reads and writes, so IOPS are mostly irrelevant. The
second aspect to manage is latency: disk latency correlates directly
with query performance.
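
To make "sequential, concurrent IO" a bit more concrete, here is a
rough Python sketch. It is not a benchmark tool, and every name in it
is made up for illustration. It mimics a compaction-like pattern: one
thread reads an existing file sequentially while another writes a new
file sequentially, and it reports throughput plus p50/p99 per-chunk
latency for each side. Assumptions: 1 MiB chunks, an fsync after every
write chunk so write latency reflects the device rather than the page
cache, and a best-effort page-cache eviction (Linux only) before the
read. For real numbers, a dedicated tool such as fio with direct IO
and a file larger than RAM is the better choice.

```python
import os
import tempfile
import threading
import time

CHUNK = 1 << 20  # 1 MiB sequential chunk (illustrative size, an assumption)


def _percentile(samples, pct):
    """Nearest-rank style percentile over a list of latencies (seconds)."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(round(pct / 100 * (len(ordered) - 1))))
    return ordered[idx]


def drop_from_page_cache(path):
    """Best effort: ask the kernel to evict cached pages (no-op if unsupported)."""
    if hasattr(os, "posix_fadvise"):
        fd = os.open(path, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)


def sequential_write(path, total_bytes, latencies):
    """Write total_bytes in CHUNK-sized pieces, fsyncing each one."""
    buf = os.urandom(CHUNK)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            start = time.perf_counter()
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # push the chunk to the device, not just the cache
            latencies.append(time.perf_counter() - start)
            written += CHUNK
    return written


def sequential_read(path, latencies):
    """Read a file front to back in CHUNK-sized pieces."""
    total = 0
    with open(path, "rb") as f:
        while True:
            start = time.perf_counter()
            data = f.read(CHUNK)
            if not data:
                break
            latencies.append(time.perf_counter() - start)
            total += len(data)
    return total


def run_mixed_test(directory=None, size_mb=64):
    """Read one file while writing another at the same time; summarize both sides."""
    total = size_mb * CHUNK
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        src = os.path.join(scratch, "src.bin")
        dst = os.path.join(scratch, "dst.bin")
        sequential_write(src, total, [])  # pre-create the file to be read
        drop_from_page_cache(src)

        def reader():
            lat, t0 = [], time.perf_counter()
            n = sequential_read(src, lat)
            results["read"] = (n, time.perf_counter() - t0, lat)

        def writer():
            lat, t0 = [], time.perf_counter()
            n = sequential_write(dst, total, lat)
            results["write"] = (n, time.perf_counter() - t0, lat)

        threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    summary = {}
    for kind, (nbytes, elapsed, lat) in results.items():
        summary[kind] = {
            "mib_per_s": nbytes / CHUNK / elapsed,
            "p50_ms": _percentile(lat, 50) * 1000,
            "p99_ms": _percentile(lat, 99) * 1000,
        }
    summary["read_write_ratio"] = summary["read"]["mib_per_s"] / summary["write"]["mib_per_s"]
    return summary
```

Point it at the volume under evaluation, e.g.
`run_mixed_test("/mnt/candidate-volume", size_mb=4096)` (hypothetical
path). Treat read_write_ratio as a rough signal only: the
fsync-per-chunk choice penalizes writes, so compare it across
candidate volumes rather than against a fixed target, and pay as much
attention to how p99 behaves as to the averages.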

Remote storage tends to have more trouble meeting these
requirements. NFS, for example, has far too many latency and
concurrency problems. Just don't use it. The best thing you can do
when evaluating your options is to run some simple tests. Here's
another Jon Haddad resource, but a great one:
https://www.youtube.com/watch?v=dPpEORxoMRU
You don't even need to run Cassandra for the test. Just do some IO
testing: verify that the storage can read and write in a balanced
manner, observe the latency, and watch for any IOWait that creeps up.
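
Tools like `iostat -x` or `vmstat` (the `wa` column) will show IOWait
directly. If you'd rather log it from a script while an IO test runs,
here is a minimal, Linux-only Python sketch that samples the aggregate
`cpu` line of `/proc/stat` and reports the share of CPU time spent in
iowait over each interval. The function names are hypothetical; the
field order (user, nice, system, idle, iowait, irq, softirq, steal,
guest, guest_nice) is the standard `/proc/stat` layout.

```python
import time


def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # guest/guest_nice are already included in user/nice, so sum only the first 8.
    total = sum(values[:8])
    return values[4], total


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    io_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    return 100.0 * io_delta / total_delta if total_delta > 0 else 0.0


def read_cpu_sample(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())  # first line is the aggregate 'cpu' line


def watch_iowait(interval_s=1.0, samples=5):
    """Print iowait % once per interval while your IO test runs elsewhere."""
    prev = read_cpu_sample()
    for _ in range(samples):
        time.sleep(interval_s)
        cur = read_cpu_sample()
        print(f"iowait: {iowait_percent(prev, cur):.1f}%")
        prev = cur
```

IOWait that keeps climbing while throughput stays flat is the pattern
to watch for: the CPUs are idle waiting on storage.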

If you have a specific technology combination in mind, just ask
here. Collectively, we have probably seen it all.

Patrick

On Wed, Feb 19, 2025 at 10:27 PM Long Pan <panlong...@gmail.com> wrote:
>
> Hi Cassandra Community,
>
> I’m exploring the feasibility of running Cassandra with remote storage, 
> primarily block storage (e.g., AWS EBS, OCI Block Volume, Google Persistent 
> Disk) and possibly even file storage (e.g., NFS, EFS, FSx). While local SSDs 
> are the typical recommendation for optimal performance, I’d like to 
> understand if anyone has experience or insights on using remote disks in 
> production.
>
> Specifically, I’m looking for guidance on:
>
> - Feasibility – Has anyone successfully run Cassandra with remote storage?
>   If so, what use cases worked well?
> - Major Downsides & Caveats – Are there any known performance bottlenecks
>   or consistency issues?
> - Configuration Tuning – Are there any special settings (e.g., compaction,
>   memtable flush thresholds, disk I/O tuning) that can help mitigate
>   potential drawbacks?
> - Monitoring & Alerting – What are the key metrics and failure scenarios
>   to watch out for when using remote storage?
>
> I’d appreciate any insights, war stories, or best practices from those who 
> have experimented with or deployed Cassandra on remote storage.
>
> Thanks,
> Long Pan
