Hello Alex,
- use DSBulk - it's a very effective tool for unloading & loading data
from/to Cassandra/DSE. Use zstd compression for offloaded data to save
disk space (see blog links below for more details). But the *preserving
metadata* could be a problem.

What exactly do you mean here by "preserving metadata"? Would you mind
explaining?

On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada
<jaibheem...@gmail.com> wrote:

> Thank you for the suggestions
>
> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>
>> CQLSH definitely won't work for that amount of data, so you need to use
>> other tools.
>>
>> But before selecting them, you need to define the requirements. For
>> example:
>>
>>    1. Are you copying the data into tables with exactly the same
>>    structure?
>>    2. Do you need to preserve metadata, like writetime & TTL?
>>
>> Depending on that, you may have the following choices:
>>
>>    - use sstableloader - it will preserve all metadata, like TTL and
>>    writetime. You just need to copy the SSTable files, or stream them
>>    directly from the source cluster. But this requires copying the data
>>    into tables with exactly the same structure (and in the case of UDTs,
>>    the keyspace names should be the same)
>>    - use DSBulk - it's a very effective tool for unloading & loading
>>    data from/to Cassandra/DSE. Use zstd compression for offloaded data
>>    to save disk space (see the blog links below for more details). But
>>    preserving metadata could be a problem.
>>    - use Spark + Spark Cassandra Connector. But here, too, preserving
>>    the metadata is not an easy task, and it requires programming to
>>    handle all edge cases (see
>>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)
>>
>> Blog series on DSBulk:
>>
>>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>
>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada
>> <jaibheem...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I would like to copy some data from one Cassandra cluster to another
>>> Cassandra cluster using the CQLSH COPY command. Is this a good approach
>>> if the dataset size on the source cluster is very high (500 GB - 1 TB)?
>>> If not, what is a safe approach? And are there any limitations/known
>>> issues to keep in mind before attempting this?
>>
>> --
>> With best wishes, Alex Ott
>> http://alexott.net/
>> Twitter: alexott_en (English), alexott (Russian)
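
For reference, here is a minimal sketch of the DSBulk unload/load approach
described above. The host, keyspace, table, and path names are
placeholders, and the zstd setting assumes a DSBulk version (1.6 or later)
where the connector.csv.compression option is available:

    # unload from the source cluster into zstd-compressed CSV files
    dsbulk unload -h source_host -k my_keyspace -t my_table \
        -url /backups/my_keyspace/my_table \
        --connector.csv.compression zstd

    # load the same files into the target cluster
    dsbulk load -h target_host -k my_keyspace -t my_table \
        -url /backups/my_keyspace/my_table \
        --connector.csv.compression zstd

Note that a plain unload/load like this copies only the current column
values; per-cell writetime and TTL are not carried over, which is the
"preserving metadata" caveat discussed in the thread above.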