Alex is referring to the "writetime" and "ttl" values for each cell. Most tools copy via CQL writes and, by default, don't carry over the original writetime and ttl values; instead, each copied cell gets a new writetime that matches the copy time rather than the initial insert time.
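To see the values in question, you can query them directly with CQL's writetime() and ttl() functions. A quick sketch (the keyspace, table, and column names here are placeholders):

    # Inspect per-cell metadata on the source and target clusters.
    # After a plain CQL-based copy, wt reflects the copy time, not the
    # original insert time, and any previous TTL countdown is lost.
    cqlsh source-host -e "SELECT id, val, writetime(val) AS wt, ttl(val) AS remaining_ttl FROM my_ks.my_table LIMIT 10;"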
On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

> Hello Alex,
>
> > - use DSBulk - it's a very effective tool for unloading & loading data
> > from/to Cassandra/DSE. Use zstd compression for offloaded data to save
> > disk space (see blog links below for more details). But the *preserving
> > metadata* could be a problem.
>
> Here, what exactly do you mean by "preserving metadata"? Would you mind
> explaining?
>
> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
>
>> Thank you for the suggestions.
>>
>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>
>>> CQLSH definitely won't work for that amount of data, so you need to use
>>> other tools.
>>>
>>> But before selecting one, you need to define your requirements. For
>>> example:
>>>
>>> 1. Are you copying the data into tables with exactly the same structure?
>>> 2. Do you need to preserve metadata, like writetime & TTL?
>>>
>>> Depending on that, you have the following choices:
>>>
>>> - use sstableloader - it will preserve all metadata, like TTL and
>>>   writetime. You just need to copy the SSTable files, or stream directly
>>>   from the source cluster. But this requires copying the data into tables
>>>   with exactly the same structure (and in the case of UDTs, the keyspace
>>>   names must be the same)
>>> - use DSBulk - it's a very effective tool for unloading & loading data
>>>   from/to Cassandra/DSE. Use zstd compression for offloaded data to save
>>>   disk space (see the blog links below for more details). But preserving
>>>   the metadata could be a problem.
>>> - use Spark + the Spark Cassandra Connector. But here too, preserving
>>>   the metadata is not an easy task, and it requires programming to handle
>>>   all edge cases (see https://datastax-oss.atlassian.net/browse/SPARKC-596
>>>   for details)
>>>
>>> Blog series on DSBulk:
>>>
>>> - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>> - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>> - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>> - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>> - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>> - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>
>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to copy some data from one Cassandra cluster to another
>>>> using the CQLSH COPY command. Is this a good approach if the dataset
>>>> size on the source cluster is very high (500 GB - 1 TB)? If not, what
>>>> is the safe approach? And are there any limitations/known issues to
>>>> keep in mind before attempting this?
>>>
>>> --
>>> With best wishes, Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
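For concreteness, a minimal DSBulk round trip along the lines Alex describes might look like the sketch below. The host, keyspace, and table names are placeholders, and the zstd option assumes a DSBulk version with compressed-connector support; as noted above, this copies current cell values only, so the target cluster assigns fresh writetimes.

    # Unload from the source cluster into zstd-compressed CSV files
    # (placeholder host/keyspace/table names)
    dsbulk unload -h source-host -k my_ks -t my_table \
      -url /data/my_table --connector.csv.compression zstd

    # Load the same files into the target cluster; writetime and TTL
    # are NOT preserved by this round trip
    dsbulk load -h target-host -k my_ks -t my_table \
      -url /data/my_table --connector.csv.compression zstd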