I'll give you the general guidance that holds for any type of storage you pick. It applies even to local disks, but it bears directly on your question.

The key to success with storage and Cassandra is sequential, concurrent IO. Most of the large IO operations are either writing or reading a large file on disk and sometimes, in the harder case to manage, both at the same time. Storage systems that are biased toward reads or toward writes create an imbalance that can lead to issues. It is worth emphasizing that these are sequential reads and writes. IOPS are mostly irrelevant.

The second aspect to manage is latency. Latency from disk directly correlates to query performance.

Remote storage tends to have more trouble meeting these requirements. NFS, for example, has far too many problems with latency and concurrency. Just don't use it.

The best thing you can do when looking at choices is run some simple tests. Another Jon Haddad resource, but a great one: https://www.youtube.com/watch?v=dPpEORxoMRU

You don't even need to run Cassandra in the test. Just do some IO testing and verify that the storage can read and write in a balanced manner, observe the latency, and watch for any IOWait that creeps up.

If you have a specific technology combination, just ask here. Collectively we have probably seen it all.

Patrick

On Wed, Feb 19, 2025 at 10:27 PM Long Pan <panlong...@gmail.com> wrote:
>
> Hi Cassandra Community,
>
> I’m exploring the feasibility of running Cassandra with remote storage,
> primarily block storage (e.g., AWS EBS, OCI Block Volume, Google Persistent
> Disk) and possibly even file storage (e.g., NFS, EFS, FSx). While local SSDs
> are the typical recommendation for optimal performance, I’d like to
> understand if anyone has experience or insights on using remote disks in
> production.
>
> Specifically, I’m looking for guidance on:
>
> Feasibility – Has anyone successfully run Cassandra with remote storage? If
> so, what use cases worked well?
> Major Downsides & Caveats – Are there any known performance bottlenecks,
> consistency issues?
> Configuration Tuning – Are there any special settings (e.g., compaction,
> memtable flush thresholds, disk I/O tuning) that can help mitigate potential
> drawbacks?
> Monitoring & Alerting – What are the key metrics and failure scenarios to
> watch out for when using remote storage?
>
> I’d appreciate any insights, war stories, or best practices from those who
> have experimented with or deployed Cassandra on remote storage.
>
> Thanks,
> Long Pan
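
For anyone who wants a starting point for the simple IO testing described in the reply above, here is a minimal sketch in Python 3. It writes one large file sequentially while reading another sequentially at the same time, then reports throughput and p99 latency for each stream so you can see whether reads and writes stay balanced. The function names, the 1 MiB chunk size, and the 64 MiB default are illustrative assumptions, not something from this thread or from Cassandra tooling. A dedicated tool such as fio is likely the better choice for serious measurement.

```python
import os
import threading
import time

CHUNK = 1024 * 1024  # 1 MiB per sequential IO (assumed size, tune as needed)


def seq_write(path, total_bytes, latencies):
    """Write total_bytes sequentially, recording each chunk's latency in seconds."""
    buf = os.urandom(CHUNK)
    written = 0
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            start = time.perf_counter()
            f.write(buf)
            os.fsync(f.fileno())  # push the chunk to storage before the timer stops
            latencies.append(time.perf_counter() - start)
            written += CHUNK


def seq_read(path, latencies):
    """Read the file sequentially, recording each chunk's latency in seconds."""
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            data = f.read(CHUNK)
            elapsed = time.perf_counter() - start
            if not data:
                break
            latencies.append(elapsed)


def summarize(latencies, total_mib):
    """Return (MiB/s, p99 latency in ms) for one stream."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return total_mib / sum(latencies), p99 * 1000.0


def run_balance_test(directory, total_mib=64):
    """Run a sequential read and a sequential write concurrently in `directory`."""
    total_bytes = total_mib * CHUNK
    src = os.path.join(directory, "read_src.bin")
    dst = os.path.join(directory, "write_dst.bin")
    try:
        # Prepare the file to be read. Note: it will probably sit in the page
        # cache afterwards, so read numbers are optimistic unless the file is
        # larger than RAM or the cache is dropped first.
        seq_write(src, total_bytes, [])
        write_lat, read_lat = [], []
        threads = [
            threading.Thread(target=seq_write, args=(dst, total_bytes, write_lat)),
            threading.Thread(target=seq_read, args=(src, read_lat)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        write_mib_s, write_p99 = summarize(write_lat, total_mib)
        read_mib_s, read_p99 = summarize(read_lat, total_mib)
        return {
            "write_mib_s": write_mib_s,
            "read_mib_s": read_mib_s,
            "write_p99_ms": write_p99,
            "read_p99_ms": read_p99,
        }
    finally:
        # Clean up the test files even if a step fails.
        for path in (src, dst):
            if os.path.exists(path):
                os.remove(path)
```

Calling `run_balance_test("/path/on/the/volume", total_mib=4096)` (the path is a placeholder) returns a dict of the four numbers. A large gap between read and write throughput, or a p99 far above the typical latency, is the kind of imbalance described above. While it runs, watch IOWait with something like iostat or vmstat in another terminal.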