Thank you.

On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> Alex is referring to the "writetime" and "ttl" values for each cell. Most
> tools copy via CQL writes and by default don't carry over the original
> writetime and TTL values; instead, every copied cell gets a new writetime
> that matches the copy time rather than the initial insert time.
>
> On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> Hello Alex,
>>
>>    - use DSBulk - it's a very effective tool for unloading & loading
>>    data from/to Cassandra/DSE. Use zstd compression for offloaded data
>>    to save disk space (see blog links below for more details). But
>>    *preserving metadata* could be a problem.
>>
>> Here, what exactly do you mean by "preserving metadata"? Would you mind
>> explaining?
>>
>> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Thank you for the suggestions.
>>>
>>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>>
>>>> CQLSH definitely won't work for that amount of data, so you need to
>>>> use other tools.
>>>>
>>>> But before selecting them, you need to define your requirements. For
>>>> example:
>>>>
>>>>    1. Are you copying the data into tables with exactly the same
>>>>    structure?
>>>>    2. Do you need to preserve metadata, like writetime & TTL?
>>>>
>>>> Depending on that, you may have the following choices:
>>>>
>>>>    - use sstableloader - it will preserve all metadata, like TTL and
>>>>    writetime. You just need to copy the SSTable files, or stream them
>>>>    directly from the source cluster. But this requires copying the
>>>>    data into tables with exactly the same structure (and in the case
>>>>    of UDTs, the keyspace names should be the same)
>>>>    - use DSBulk - it's a very effective tool for unloading & loading
>>>>    data from/to Cassandra/DSE. Use zstd compression for offloaded data
>>>>    to save disk space (see blog links below for more details). But
>>>>    preserving metadata could be a problem.
>>>>    - use Spark + the Spark Cassandra Connector. But here as well,
>>>>    preserving the metadata is not an easy task and requires programming
>>>>    to handle all the edge cases (see
>>>>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)
>>>>
>>>> blog series on DSBulk:
>>>>
>>>>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>>>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>>>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>>>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>>
>>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I would like to copy some data from one Cassandra cluster to another
>>>>> Cassandra cluster using the CQLSH COPY command. Is this a good
>>>>> approach if the dataset size on the source cluster is very large
>>>>> (500 GB - 1 TB)? If not, what is a safer approach? And are there any
>>>>> limitations/known issues to keep in mind before attempting this?
>>>>>
>>>> --
>>>> With best wishes, Alex Ott
>>>> http://alexott.net/
>>>> Twitter: alexott_en (English), alexott (Russian)
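A minimal CQL sketch of the writetime/TTL metadata discussed above. The
keyspace, table, and column names (my_keyspace, my_table, id, val) are
hypothetical; writetime is microseconds since the epoch and TTL is the
remaining lifetime in seconds:

    -- On the source cluster: read the data together with its metadata.
    -- WRITETIME() and TTL() apply to regular (non-primary-key) columns.
    SELECT id, val, WRITETIME(val) AS val_writetime, TTL(val) AS val_ttl
    FROM my_keyspace.my_table;

    -- A plain copy on the target cluster discards that metadata: the cell
    -- gets a writetime equal to the copy time and no TTL.
    INSERT INTO my_keyspace.my_table (id, val) VALUES (1, 'x');

    -- Preserving it means every write has to carry the original values:
    INSERT INTO my_keyspace.my_table (id, val) VALUES (1, 'x')
    USING TIMESTAMP 1594808460000000 AND TTL 86400;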
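A sketch of the sstableloader route, assuming the target table already exists
with exactly the same schema. Host names, the table UUID, and the data
directory path are hypothetical and depend on the installation:

    # Copy (or snapshot and copy) the table's SSTable directory from a source
    # node, then stream it into the target cluster; -d lists target contact
    # points.
    sstableloader -d target-node1,target-node2 \
        /var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4e5f67890a1b2c3d4e5f67890/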
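And a sketch of the DSBulk unload/load round trip, again with hypothetical
hosts and paths; the compression option name may differ between DSBulk
versions, so check the DSBulk settings documentation and the blog posts
linked above:

    # Unload from the source cluster to zstd-compressed CSV files.
    dsbulk unload -h source-node1 -k my_keyspace -t my_table \
        -url /data/export/my_table --connector.csv.compression zstd

    # Load the same files into the target cluster (same table structure).
    dsbulk load -h target-node1 -k my_keyspace -t my_table \
        -url /data/export/my_table --connector.csv.compression zstd

As noted above, rows loaded this way get new writetimes and no TTLs unless the
unload also exports WRITETIME()/TTL() per column and the load maps them back,
which is where preserving metadata becomes the hard part.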