okay, is there a way to export the TTL using CQLsh or DSBulk?
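(cqlsh's COPY command has no option for this, but with DSBulk it can be done by hand: unload writetime()/ttl() through a custom query, then bind them back with USING TIMESTAMP/TTL on load. A minimal sketch only; the single partition-key column "pk", the hosts, and the paths are hypothetical, since the real schema isn't shown in this thread:

    # Unload the data plus per-cell metadata; writetime() is microseconds
    # since epoch, ttl() is the remaining seconds.
    dsbulk unload -h 10.0.0.1 \
      -query "SELECT pk, secret, writetime(secret) AS w, ttl(secret) AS t FROM ks_blah.cf_blah" \
      -url /tmp/cf_blah_export

    # Load it back, re-applying the metadata on every insert. Rows whose
    # exported t is null (no TTL set) may need a second pass without the
    # USING TTL clause.
    dsbulk load -h 10.0.0.2 \
      -query "INSERT INTO ks_blah.cf_blah (pk, secret) VALUES (:pk, :secret) USING TIMESTAMP :w AND TTL :t" \
      -url /tmp/cf_blah_export

Tables with more than one regular column need one writetime()/ttl() pair per column, since the metadata is per cell.)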
On Thu, Jul 16, 2020 at 11:20 AM Alex Ott <alex...@gmail.com> wrote:

> if you didn't export the TTL explicitly, and didn't load it back, then
> you'll get data that never expires.
>
> On Thu, Jul 16, 2020 at 7:48 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> I tried to verify the metadata. The writetime is set to the insert
>> time, but the TTL value shows as null. Is this expected? Does this
>> mean the record will never expire after the insert? Is there any
>> alternative to preserve the TTL?
>>
>> In the new table, inserted with cqlsh and DSBulk:
>>
>> cqlsh> SELECT ttl(secret) FROM ks_blah.cf_blah ;
>>
>>  ttl(secret)
>> --------------
>>         null
>>         null
>>
>> (2 rows)
>>
>> In the old table, where the data was written by the application:
>>
>> cqlsh> SELECT ttl(secret) FROM ks_old.cf_old ;
>>
>>  ttl(secret)
>> --------------------
>>              4517461
>>              4525958
>>
>> (2 rows)
>>
>> On Wed, Jul 15, 2020 at 1:17 PM Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> thank you
>>>
>>> On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> Alex is referring to the "writetime" and "ttl" values for each
>>>> cell. Most tools copy via CQL writes and by default don't copy the
>>>> previous writetime and ttl values; instead, each copied row gets a
>>>> new writetime that matches the copy time rather than the initial
>>>> insert time.
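(To make this point concrete: the metadata survives a copy only if every write re-applies it explicitly. A minimal CQL sketch using the thread's table names, with a hypothetical text primary-key column "pk" and made-up values:

    -- A plain re-insert (what most copy tools issue) gets a fresh
    -- writetime and no TTL:
    INSERT INTO ks_blah.cf_blah (pk, secret) VALUES ('k1', 's3cr3t');
    SELECT writetime(secret), ttl(secret) FROM ks_blah.cf_blah WHERE pk = 'k1';
    -- writetime(secret) = the time of the copy, ttl(secret) = null

    -- Carrying the metadata over must be done explicitly on each write;
    -- TIMESTAMP is microseconds since epoch, TTL is the remaining seconds:
    INSERT INTO ks_blah.cf_blah (pk, secret) VALUES ('k1', 's3cr3t')
      USING TIMESTAMP 1594821600000000 AND TTL 4517461;

)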
>>>> On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> Hello Alex,
>>>>>
>>>>> - use DSBulk - it's a very effective tool for unloading & loading
>>>>> data from/to Cassandra/DSE. Use zstd compression for offloaded
>>>>> data to save disk space (see blog links below for more details).
>>>>> But the *preserving metadata* could be a problem.
>>>>>
>>>>> Here, what exactly do you mean by "preserving metadata"? Would
>>>>> you mind explaining?
>>>>>
>>>>> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <
>>>>> jaibheem...@gmail.com> wrote:
>>>>>
>>>>>> Thank you for the suggestions
>>>>>>
>>>>>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>>>>>
>>>>>>> CQLSH definitely won't work for that amount of data, so you
>>>>>>> need to use other tools.
>>>>>>>
>>>>>>> But before selecting one, you need to define your requirements.
>>>>>>> For example:
>>>>>>>
>>>>>>> 1. Are you copying the data into tables with exactly the same
>>>>>>> structure?
>>>>>>> 2. Do you need to preserve metadata, like writetime & TTL?
>>>>>>>
>>>>>>> Depending on that, you have the following choices:
>>>>>>>
>>>>>>> - use sstableloader - it preserves all metadata, like ttl and
>>>>>>> writetime. You just need to copy the SSTable files, or stream
>>>>>>> directly from the source cluster. But this requires copying the
>>>>>>> data into tables with exactly the same structure (and in the
>>>>>>> case of UDTs, the keyspace names must be the same as well)
>>>>>>> - use DSBulk - it's a very effective tool for unloading &
>>>>>>> loading data from/to Cassandra/DSE. Use zstd compression for
>>>>>>> offloaded data to save disk space (see the blog links below for
>>>>>>> more details). But preserving metadata could be a problem.
>>>>>>> - use Spark + Spark Cassandra Connector. But here too,
>>>>>>> preserving the metadata is not an easy task, and requires
>>>>>>> programming to handle all the edge cases (see
>>>>>>> https://datastax-oss.atlassian.net/browse/SPARKC-596 for
>>>>>>> details)
>>>>>>>
>>>>>>> Blog series on DSBulk:
>>>>>>>
>>>>>>> - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>>>>>> - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>>>>>> - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>>>>>> - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>>>>>> - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>>>>>> - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>>>>>
>>>>>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
>>>>>>> jaibheem...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would like to copy some data from one Cassandra cluster to
>>>>>>>> another using the CQLSH COPY command. Is this a good approach
>>>>>>>> if the dataset size on the source cluster is very high
>>>>>>>> (500 GB - 1 TB)? If not, what is a safe approach? And are
>>>>>>>> there any limitations/known issues to keep in mind before
>>>>>>>> attempting this?
>>>>>>>
>>>>>>> --
>>>>>>> With best wishes, Alex Ott
>>>>>>> http://alexott.net/
>>>>>>> Twitter: alexott_en (English), alexott (Russian)
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
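(For the sstableloader route Alex describes above, the invocation is roughly as follows. A sketch only: the hosts, paths, and table-directory suffix are illustrative, and the step has to be repeated for the SSTables of every source node, or for a snapshot set that covers all replicas:

    # Flush memtables on the source node so the on-disk SSTables are
    # complete before copying them.
    nodetool flush ks_old cf_old

    # Stream the SSTables to the target cluster; writetime and TTL are
    # preserved. The target must already have an identical table schema,
    # and the last two path components must be <keyspace>/<table-dir>.
    sstableloader -d 10.0.0.2,10.0.0.3 \
      /var/lib/cassandra/data/ks_old/cf_old-3c9e1a20b0e511ea/

)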