Look into the series of blog posts that I sent; I think it is covered in the 4th post (the one on unloading).
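In case it helps, here is a minimal sketch of exporting and re-applying TTL and writetime with DSBulk's -query option. The column names other than secret (e.g. the key column id) are hypothetical, and the exact option and mapping syntax should be double-checked against the DSBulk docs and those posts:

    # unload the data together with its per-cell TTL and writetime
    dsbulk unload -h <source_host> -url /tmp/cf_old_export \
      -query "SELECT id, secret, ttl(secret) AS ttl_secret, writetime(secret) AS wt_secret FROM ks_old.cf_old"
    # (newer DSBulk versions can also compress the unloaded files, see the blog posts)

    # load it back, re-applying TTL and writetime via the named bind variables
    dsbulk load -h <target_host> -url /tmp/cf_old_export \
      -query "INSERT INTO ks_blah.cf_blah (id, secret) VALUES (:id, :secret) USING TTL :ttl_secret AND TIMESTAMP :wt_secret"

Note that rows which never had a TTL will unload it as null, so those may need to be loaded separately without the USING TTL part.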
On Thu, Jul 16, 2020 at 8:27 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

Okay, is there a way to export the TTL using CQLsh or DSBulk?

On Thu, Jul 16, 2020 at 11:20 AM Alex Ott <alex...@gmail.com> wrote:

If you didn't export the TTL explicitly, and didn't load it back, then you'll get non-expiring data.

On Thu, Jul 16, 2020 at 7:48 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

I tried to verify the metadata. The writetime is set to the insert time, but the TTL value shows as null. Is this expected? Does this mean the record will never expire after the insert? Is there any alternative to preserve the TTL?

In the new table, inserted with cqlsh and DSBulk:

cqlsh> SELECT ttl(secret) FROM ks_blah.cf_blah;

 ttl(secret)
-------------
        null
        null

(2 rows)

In the old table, where the data was written by the application:

cqlsh> SELECT ttl(secret) FROM ks_old.cf_old;

 ttl(secret)
-------------
     4517461
     4525958

(2 rows)

On Wed, Jul 15, 2020 at 1:17 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

Thank you.

On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

Alex is referring to the "writetime" and "ttl" values for each cell. Most tools copy via CQL writes and by default don't copy the previous writetime and ttl values; instead they just assign a new writetime, which matches the copy time rather than the initial insert time.

On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

Hello Alex,

> - use DSBulk - it's a very effective tool for unloading & loading data from/to Cassandra/DSE. Use zstd compression for offloaded data to save disk space (see blog links below for more details). But *preserving metadata* could be a problem.

Here, what exactly do you mean by "preserving metadata"? Would you mind explaining?

On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

Thank you for the suggestions.

On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:

CQLSH definitely won't work for that amount of data, so you need to use other tools. But before selecting them, you need to define requirements. For example:

1. Are you copying the data into tables with exactly the same structure?
2. Do you need to preserve metadata, like writetime & TTL?

Depending on that, you may have the following choices:

- use sstableloader (a rough command sketch follows this list) - it preserves all metadata, like ttl and writetime. You just need to copy the SSTable files, or stream them directly from the source cluster. But this requires copying the data into tables with exactly the same structure (and, in the case of UDTs, the keyspace names should be the same).
- use DSBulk - it's a very effective tool for unloading & loading data from/to Cassandra/DSE. Use zstd compression for offloaded data to save disk space (see blog links below for more details). But preserving metadata could be a problem.
- use Spark + Spark Cassandra Connector. But again, preserving the metadata is not an easy task, and requires programming to handle all edge cases (see https://datastax-oss.atlassian.net/browse/SPARKC-596 for details).
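A minimal sketch of the sstableloader route, assuming the target cluster already has a table with an identical schema; the hosts, paths and snapshot tag below are placeholders:

    # on each source node: flush memtables and snapshot the keyspace's SSTables
    nodetool flush ks_old cf_old
    nodetool snapshot -t ttl_migration ks_old

    # copy the snapshot files into a staging directory whose last two path
    # components are <keyspace>/<table>, e.g. /staging/ks_old/cf_old/,
    # then stream them to the target cluster's contact points:
    sstableloader -d 10.0.1.1,10.0.1.2 /staging/ks_old/cf_old

Because sstableloader streams the SSTable files themselves, the per-cell TTL and writetime are carried over unchanged.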
Blog series on DSBulk:

- https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
- https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
- https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
- https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
- https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
- https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations

On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

Hello,

I would like to copy some data from one Cassandra cluster to another Cassandra cluster using the CQLSH COPY command. Is this a good approach if the dataset size on the source cluster is very large (500 GB - 1 TB)? If not, what is a safe approach? And are there any limitations/known issues to keep in mind before attempting this?

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)