Hello Manish,

I think any disk works, as long as it is big enough. It's also better if it's a reliable system (some kind of redundant RAID, a NAS, storage like GCS or S3...). During a backup we are mostly not looking for speed, but for resiliency and for not harming the source cluster, I would say. How fast you can write to the backup storage system will then most often be limited by how fast you can read from the source cluster. The backups have to be taken from running nodes, so it's easy to overload the disk (reads), the network (exporting backup data to its final destination), and even the CPU (if the machine handles the transfer).

> What are the best practices while designing backup storage system for big
> Cassandra cluster?

What is nice to have (not to say mandatory) is a system of incremental backups. You should not pull all the data from the nodes every time, or you'll either harm the cluster regularly or spend days transferring the data (once the amount of data grows big enough).

I'm not speaking about Cassandra incremental snapshots, but about using something like AWS Snapshot, or reproducing this behaviour programmatically by taking (copy, link?) old SSTables from previous backups when they exist. This greatly reduces the cluster's work and the resources needed, as soon enough a substantial amount of the data will come from the backup storage itself.

The problem with incremental snapshots is that when restoring, you have to restore multiple pieces, which makes it harder and involves a lot of compaction work. The "caching" technique mentioned above gives the best of both worlds:

- You always back up from the nodes only the SSTables you don't already have in your backup storage system,
- You always restore easily, as each backup is a full backup.

It's not really a "hands-on" piece, but it should let you know about existing ways to do backups and their tradeoffs. I wrote this a year ago: http://thelastpickle.com/blog/2018/04/03/cassandra-backup-and-restore-aws-ebs.html .

It's a complex topic, I hope some of this is helpful to you.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, 28 Mar 2019 at 11:24, manish khandelwal <manishkhandelwa...@gmail.com> wrote:

> Hi
>
> I would like to know is there any guideline for selecting storage device
> (disk type) for Cassandra backups.
>
> As per my current observation, NearLine (NL) disk on SAN slows down
> significantly while copying backup files (taking full backup) from all node
> simultaneously.
> Will using SSD disk on SAN help us in this regard?
>
> Apart from using SSD disk, what are the alternative approach to make my
> backup process fast?
>
> What are the best practices while designing backup storage system for big
> Cassandra cluster?
>
> Regards
>
> Manish
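
For illustration, the "caching" idea from the reply above (copy from the node only the SSTables missing from backup storage, and link the rest from the previous backup) could be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested backup tool: the function name `backup_snapshot`, the one-directory-per-backup layout, and the "same relative path and same size means same file" check are all hypothetical choices made for the example. It also assumes the backup storage is a regular filesystem that supports hard links.

```python
import os
import shutil
from pathlib import Path


def backup_snapshot(snapshot_dir, backup_root, name, previous=None):
    """Create a full backup in backup_root/name, reusing files from `previous`.

    Returns (copied, linked): files read from the node, and files
    hard-linked from the previous backup instead.
    """
    src = Path(snapshot_dir)
    dest = Path(backup_root) / name
    prev = Path(backup_root) / previous if previous else None
    copied, linked = [], []

    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev else None

        # Assumption: SSTables are immutable once written, so a file with the
        # same relative path and size in the previous backup is the same file.
        if old is not None and old.is_file() and old.stat().st_size == path.stat().st_size:
            # Hard link: no file data is read from the source node.
            os.link(old, target)
            linked.append(str(rel))
        else:
            # New SSTable: this is the only data transferred from the node.
            shutil.copy2(path, target)
            copied.append(str(rel))

    return copied, linked
```

Each backup directory then holds every file of the snapshot, so a restore only needs that one directory, while only new SSTables are ever read from the node. On object storage such as S3 or GCS there are no hard links, so the same idea would need a server-side copy or a manifest of object references instead.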