Thank you

On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Alex is referring to the "writetime" and "ttl" values for each cell. Most
> tools copy via CQL writes and, by default, don't carry over the original
> writetime and ttl values; instead each copied cell gets a new writetime that
> matches the copy time rather than the initial insert time.
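>
> As a quick CQL illustration (the table and column names here are made up):
>
>     cqlsh source-host -e "SELECT pk, val, writetime(val), ttl(val) FROM ks.tbl"
>
>     # a plain re-insert on the target gets a brand-new writetime and no TTL
>     cqlsh target-host -e "INSERT INTO ks.tbl (pk, val) VALUES (1, 'x')"
>
>     # keeping the original metadata means setting it explicitly on every write
>     cqlsh target-host -e "INSERT INTO ks.tbl (pk, val) VALUES (1, 'x') USING TIMESTAMP 1594823460000000 AND TTL 86400"
>
> Tools that skip that last step leave you with copy-time writetimes and no TTLs on the target.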
>
> On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> Hello Alex,
>>
>>
>>    - use DSBulk - it's a very effective tool for unloading & loading
>>    data from/to Cassandra/DSE. Use zstd compression for offloaded data
>>    to save disk space (see blog links below for more details). But the
>>    *preserving metadata* could be a problem.
>>
>> What exactly do you mean here by "preserving metadata"? Would you mind
>> explaining?
>>
>> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Thank you for the suggestions
>>>
>>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>>
>>>> CQLSH definitely won't work for that amount of data, so you need to use
>>>> other tools.
>>>>
>>>> But before selecting them, you need to define requirements. For example:
>>>>
>>>>    1. Are you copying the data into tables with exactly the same
>>>>    structure?
>>>>    2. Do you need to preserve metadata, such as writetime & TTL?
>>>>
>>>> Depending on that, you have the following choices:
>>>>
>>>>    - use sstableloader - it will preserve all metadata, such as ttl and
>>>>    writetime. You just need to copy the SSTable files, or stream them
>>>>    directly from the source cluster. But this requires copying the data
>>>>    into tables with exactly the same structure (and in the case of UDTs,
>>>>    the keyspace names must be the same). A minimal command sketch is
>>>>    below this list.
>>>>    - use DSBulk - it's a very effective tool for unloading & loading
>>>>    data from/to Cassandra/DSE. Use zstd compression for the offloaded
>>>>    data to save disk space (see the blog links below for more details).
>>>>    But preserving metadata could be a problem. A sketch of the
>>>>    unload/load commands is below this list.
>>>>    - use Spark + the Spark Cassandra Connector. But here too, preserving
>>>>    the metadata is not an easy task, and it requires programming to
>>>>    handle all the edge cases (see
>>>>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)
>>>>
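>>>> For the sstableloader route, the rough shape is as follows (host names
>>>> and paths are just placeholders, adjust to your layout):
>>>>
>>>>     # on a source node: snapshot the keyspace to get a stable set of SSTables
>>>>     nodetool snapshot -t migration ks
>>>>
>>>>     # copy the snapshot's files into a directory whose last two path
>>>>     # components are <keyspace>/<table>, e.g. /tmp/migration/ks/tbl/,
>>>>     # then stream them into the target cluster:
>>>>     sstableloader -d target-node1,target-node2 /tmp/migration/ks/tbl
>>>>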
>>>>
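>>>> And a minimal DSBulk unload/load pair (again, hosts and paths are
>>>> placeholders; the compression option is only in recent DSBulk releases,
>>>> so check the blog posts below for your version):
>>>>
>>>>     # unload from the source cluster to zstd-compressed CSV files
>>>>     dsbulk unload -h source-host -k ks -t tbl -url /data/ks_tbl \
>>>>         --connector.csv.compression zstd
>>>>
>>>>     # load the same files into the target cluster
>>>>     dsbulk load -h target-host -k ks -t tbl -url /data/ks_tbl \
>>>>         --connector.csv.compression zstd
>>>>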
>>>> blog series on DSBulk:
>>>>
>>>>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>>>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>>>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>>>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>>
>>>>
>>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I would like to copy some data from one Cassandra cluster to another
>>>>> using the CQLSH COPY command. Is this a good approach if the dataset on
>>>>> the source cluster is very large (500 GB - 1 TB)? If not, what is a safe
>>>>> approach? And are there any limitations/known issues to keep in mind
>>>>> before attempting this?
>>>>>
>>>>
>>>>
>>>> --
>>>> With best wishes,                    Alex Ott
>>>> http://alexott.net/
>>>> Twitter: alexott_en (English), alexott (Russian)
>>>>
>>>
