Re: Cqlsh copy command on a larger data set
thank you

On Thu, Jul 16, 2020 at 12:29 PM Alex Ott wrote:
> look into a series of the blog posts that I sent, I think that it should
> be in the 4th post
Re: Cqlsh copy command on a larger data set
Look into the series of blog posts that I sent; I think it should be covered in the 4th post.

On Thu, Jul 16, 2020 at 8:27 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> okay, is there a way to export the TTL using CQLsh or DSBulk?
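For reference, the approach covered there is roughly the following. This is only a sketch,
with hypothetical host, keyspace, table and column names (my_ks.my_table with key pk and a
single regular column secret); check the exact option syntax against the DSBulk
documentation and the linked posts. Note it only handles one regular column cleanly, which
is part of why preserving metadata for arbitrary tables is harder than it looks.

# unload, exporting each cell's ttl and writetime as extra CSV columns
dsbulk unload -h source_host \
  -query "SELECT pk, secret, ttl(secret) AS secret_ttl, writetime(secret) AS secret_wt FROM my_ks.my_table" \
  -url /tmp/export

# load into the target cluster, re-applying the exported ttl and writetime per row
dsbulk load -h target_host \
  -query "INSERT INTO my_ks.my_table (pk, secret) VALUES (:pk, :secret) USING TTL :secret_ttl AND TIMESTAMP :secret_wt" \
  -url /tmp/export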
Re: Cqlsh copy command on a larger data set
okay, is there a way to export the TTL using CQLsh or DSBulk?

On Thu, Jul 16, 2020 at 11:20 AM Alex Ott wrote:
> if you didn't export TTL explicitly, and didn't load it back, then you'll
> get not expirable data.
Re: Cqlsh copy command on a larger data set
If you didn't export the TTL explicitly and didn't load it back, then you'll get
non-expiring data.

On Thu, Jul 16, 2020 at 7:48 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> Does this mean this record will never expire after the insert?
> Is there any alternative to preserve the TTL?
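In other words, a row written back without an explicit TTL simply has no expiration. A
small illustration in plain CQL, with hypothetical names:

-- plain insert: ttl(secret) is null and the row never expires
INSERT INTO my_ks.my_table (pk, secret) VALUES (1, 'abc');

-- insert carrying a TTL (in seconds): ttl(secret) counts down from the given value
INSERT INTO my_ks.my_table (pk, secret) VALUES (2, 'def') USING TTL 4517461;

SELECT pk, ttl(secret) FROM my_ks.my_table;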
Re: Cqlsh copy command on a larger data set
I tried to verify the metadata. The writetime is set to the insert time, but the TTL value
is showing as null. Is this expected? Does this mean this record will never expire after
the insert? Is there any alternative to preserve the TTL?

In the new table, inserted with cqlsh and DSBulk:

cqlsh> SELECT ttl(secret) FROM ks_blah.cf_blah ;

 ttl(secret)
-------------
        null
        null

(2 rows)

In the old table, where the data was written from the application:

cqlsh> SELECT ttl(secret) FROM ks_old.cf_old ;

 ttl(secret)
-------------
     4517461
     4525958

(2 rows)
Re: Cqlsh copy command on a larger data set
thank you

On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> Alex is referring to the "writetime" and "ttl" values for each cell. Most tools copy via
> CQL writes and don't by default copy those previous writetime and ttl values.
Re: Cqlsh copy command on a larger data set
Alex is referring to the "writetime" and "ttl" values for each cell. Most tools copy via
CQL writes and, by default, don't copy the previous writetime and ttl values; instead each
cell gets a new writetime that matches the copy time rather than the initial insert time.

On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> Here what exactly do you mean by "preserving metadata" ? would you mind explaining?
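A quick way to see what gets lost is to run the same query against the source table and the
copied table and compare; writetime() and ttl() are standard CQL functions, and the
keyspace/table/column names below are hypothetical:

cqlsh> SELECT pk, writetime(secret), ttl(secret) FROM my_ks.my_table;

On the copied table, writetime(secret) will typically show the time of the copy, and
ttl(secret) will be null unless the tool was explicitly told to carry both across.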
Re: Cqlsh copy command on a larger data set
Hello Alex,

> use DSBulk - it's a very effective tool for unloading & loading data from/to
> Cassandra/DSE. Use zstd compression for offloaded data to save disk space (see blog
> links below for more details). But the *preserving metadata* could be a problem.

Here, what exactly do you mean by "preserving metadata"? Would you mind explaining?
Re: Cqlsh copy command on a larger data set
Thank you for the suggestions

On Tue, Jul 14, 2020 at 1:42 AM Alex Ott wrote:
> CQLSH definitely won't work for that amount of data, so you need to use other tools.
Re: Cqlsh copy command on a larger data set
CQLSH definitely won't work for that amount of data, so you need to use other tools.

But before selecting them, you need to define requirements. For example:

1. Are you copying the data into tables with exactly the same structure?
2. Do you need to preserve metadata, like writetime & TTL?

Depending on that, you may have the following choices:

- use sstableloader - it will preserve all metadata, like ttl and writetime. You just need
  to copy SSTable files, or stream directly from the source cluster. But this will require
  copying the data into tables with exactly the same structure (and in case of UDTs, the
  keyspace names should be the same). A minimal invocation is sketched after this message.
- use DSBulk - it's a very effective tool for unloading & loading data from/to
  Cassandra/DSE. Use zstd compression for offloaded data to save disk space (see blog
  links below for more details). But preserving metadata could be a problem.
- use Spark + Spark Cassandra Connector. But also, preserving the metadata is not an easy
  task, and requires programming to handle all edge cases (see
  https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)

Blog series on DSBulk:

- https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
- https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
- https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
- https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
- https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
- https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations

On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> Hello,
>
> I would like to copy some data from one Cassandra cluster to another Cassandra cluster
> using the CQLSH copy command. Is this a good approach if the dataset size on the source
> cluster is very high (500G - 1TB)? If not, what is the safe approach? And are there any
> limitations/known issues to keep in mind before attempting this?

--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
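To make the sstableloader option concrete, here is a minimal sketch. It assumes the target
table already exists with exactly the same schema; the paths, host names and snapshot tag
are placeholders:

# on a source node: take a snapshot so you have a consistent set of SSTable files
nodetool snapshot -t copy_snap my_keyspace

# copy the snapshot contents (found under
#   .../data/my_keyspace/my_table-<uuid>/snapshots/copy_snap/)
# into a local directory whose last two path components are <keyspace>/<table>,
# then stream them into the target cluster; writetime and TTL are preserved
sstableloader -d target_node1,target_node2 /path/to/my_keyspace/my_table/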
Re: Cqlsh copy command on a larger data set
I wouldn't say it's a good approach for that size, but you can try the DSBulk approach too.
Try to split the output into multiple files.

Best Regards,
Kiran M K

On Tue, Jul 14, 2020, 5:17 AM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> I would like to copy some data from one cassandra cluster to another cassandra cluster
> using the CQLSH copy command. Is this the good approach if the dataset size on the source
> cluster is very high (500G - 1TB)?
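If you do stay with cqlsh, COPY TO can at least split the export into multiple files via
MAXOUTPUTSIZE (the maximum number of rows per output file). A rough sketch, with
hypothetical keyspace, table and path names; check the cqlsh COPY documentation for the
full option list:

-- on the source cluster: export, at most 5 million rows per output file
COPY my_ks.my_table TO '/tmp/export/data.csv' WITH HEADER = TRUE AND MAXOUTPUTSIZE = 5000000;

-- on the target cluster: import the resulting files back
COPY my_ks.my_table FROM '/tmp/export/data.csv*' WITH HEADER = TRUE;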