Re: Cqlsh copy command on a larger data set

2020-07-16 Thread Jai Bheemsen Rao Dhanwada
thank you

Re: Cqlsh copy command on a larger data set

2020-07-16 Thread Alex Ott
Look at the series of blog posts that I sent; I think it should be in the 4th post.
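
Roughly, the idea is to unload each column together with its writetime and TTL via a custom query, then load it back with a query that re-applies them. A sketch using the table/column names from this thread (pk stands in for the real key column; exact option names may differ between DSBulk versions):

dsbulk unload -url /tmp/export -query \
  "SELECT pk, secret, writetime(secret) AS w, ttl(secret) AS t FROM ks_old.cf_old"

dsbulk load -url /tmp/export -query \
  "INSERT INTO ks_blah.cf_blah (pk, secret) VALUES (:pk, :secret) USING TIMESTAMP :w AND TTL :t"

Note that writetime and TTL are tracked per cell, so a table with several non-key columns needs them unloaded and re-applied per column.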

Re: Cqlsh copy command on a larger data set

2020-07-16 Thread Jai Bheemsen Rao Dhanwada
Okay, is there a way to export the TTL using cqlsh or DSBulk?

Re: Cqlsh copy command on a larger data set

2020-07-16 Thread Alex Ott
If you didn't export the TTL explicitly and load it back, the copied data will never expire.
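
One related thing to check is whether the destination table has a table-level default TTL, since that applies to newly written rows that don't carry an explicit TTL. On Cassandra 3.0+ it can be read from system_schema, e.g.:

cqlsh > SELECT default_time_to_live FROM system_schema.tables
        WHERE keyspace_name = 'ks_blah' AND table_name = 'cf_blah' ;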

-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


Re: Cqlsh copy command on a larger data set

2020-07-16 Thread Jai Bheemsen Rao Dhanwada
I tried to verify the metadata. The writetime is being set to the insert time, but the TTL value shows as null. Is this expected? Does this mean the record will never expire after the insert?
Is there any alternative to preserve the TTL?

In the new table, inserted with cqlsh and DSBulk:

cqlsh > SELECT ttl(secret) from ks_blah.cf_blah ;

 ttl(secret)
-------------
        null
        null

(2 rows)

In the old table, where the data was written by the application:

cqlsh > SELECT ttl(secret) from ks_old.cf_old ;

 ttl(secret)
-------------
     4517461
     4525958

(2 rows)



Re: Cqlsh copy command on a larger data set

2020-07-15 Thread Jai Bheemsen Rao Dhanwada
thank you



Re: Cqlsh copy command on a larger data set

2020-07-15 Thread Russell Spitzer
Alex is referring to the "writetime" and "ttl" values stored for each cell. Most tools copy via CQL writes and by default don't carry over those original writetime and ttl values; instead, each copied cell gets a new writetime that matches the copy time rather than the initial insert time.
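
You can see both values from cqlsh, e.g. (column names as in the earlier examples in this thread):

cqlsh > SELECT secret, writetime(secret), ttl(secret) FROM ks_old.cf_old LIMIT 5 ;

After a plain row-by-row copy, writetime(secret) will show the copy time, and ttl(secret) will be null unless the target table has a default TTL or the load explicitly re-applied one.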



Re: Cqlsh copy command on a larger data set

2020-07-15 Thread Jai Bheemsen Rao Dhanwada
Hello Alex,


   - use DSBulk - it's a very effective tool for unloading & loading data
   from/to Cassandra/DSE. Use zstd compression for offloaded data to save disk
   space (see blog links below for more details).  But the *preserving
   metadata* could be a problem.

What exactly do you mean here by "preserving metadata"? Would you mind explaining?



Re: Cqlsh copy command on a larger data set

2020-07-14 Thread Jai Bheemsen Rao Dhanwada
Thank you for the suggestions



Re: Cqlsh copy command on a larger data set

2020-07-14 Thread Alex Ott
CQLSH definitely won't work for that amount of data, so you need to use
other tools.

But before selecting them, you need to define requirements. For example:

   1. Are you copying the data into tables with exactly the same structure?
   2. Do you need to preserve metadata such as writetime & TTL?

Depending on that, you have the following choices (rough command sketches for the first two are shown after this list):

   - use sstableloader - it preserves all metadata (TTL and writetime). You
   just need to copy the SSTable files, or stream them directly from the
   source cluster. But this requires copying the data into tables with
   exactly the same structure (and in the case of UDTs, the keyspace names
   should be the same)
   - use DSBulk - it's a very effective tool for unloading & loading data
   from/to Cassandra/DSE. Use zstd compression for the offloaded data to
   save disk space (see the blog links below for more details). But
   preserving the metadata could be a problem.
   - use Spark + the Spark Cassandra Connector. Here too, preserving the
   metadata is not an easy task and requires programming to handle all the
   edge cases (see https://datastax-oss.atlassian.net/browse/SPARKC-596
   for details)
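
Rough sketches of the first two options (host names, paths, and keyspace/table are placeholders; check the documentation of your Cassandra/DSBulk versions for the exact options):

# sstableloader: stream a table's SSTable files (e.g. from a snapshot) into the target cluster
sstableloader -d target_node1,target_node2 /path/to/snapshot/ks_old/cf_old/

# DSBulk: unload to CSV files, then load them into the target cluster
dsbulk unload -h source_node -k ks_old -t cf_old -url /tmp/cf_old_export
dsbulk load -h target_node -k ks_old -t cf_old -url /tmp/cf_old_export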


blog series on DSBulk:

   - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
   - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
   - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
   - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
   - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
   - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations


-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


Re: Cqlsh copy command on a larger data set

2020-07-13 Thread Kiran mk
I wouldn't say it's a good approach for that size, but you can try the DSBulk approach too.

Try to split the output into multiple files; a rough example is below.
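
For instance, with cqlsh COPY TO the MAXOUTPUTSIZE option caps the number of rows per output file, so a large export is written as numbered segments (option availability and sensible values depend on your cqlsh version):

cqlsh > COPY ks_old.cf_old TO '/tmp/cf_old.csv' WITH HEADER = TRUE AND MAXOUTPUTSIZE = 1000000 ;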

Best Regards,
Kiran M K



Cqlsh copy command on a larger data set

2020-07-13 Thread Jai Bheemsen Rao Dhanwada
Hello,

I would like to copy some data from one Cassandra cluster to another
Cassandra cluster using the CQLSH COPY command. Is this a good approach
if the dataset size on the source cluster is very high (500 GB - 1 TB)? If not,
what is a safe approach? And are there any limitations/known issues to
keep in mind before attempting this?