Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Dor Laor
Another option instead of raw sstables is to use the Spark Migrator [1].
It reads from a source cluster, can apply some transformations (such as
renaming tables/columns), and writes to a target cluster. It's a very
convenient tool, OSS and free of charge.

[1] https://github.com/scylladb/scylla-migrator





Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Erick Ramirez
>
>
> *In terms of speed, the sstableloader should be faster, correct? Maybe the
> DSE BulkLoader finds application when you want a slice of the data and not
> the entire cake. Is it correct?*


There's no real direct comparison because DSBulk is designed for operating
on data in CSV or JSON as a replacement for the COPY command. Cheers!



Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Sergio
Hi everyone,

Is the DSE BulkLoader faster than the sstableloader?

Sometimes I need to take a cluster snapshot and replicate Cluster A to a
Cluster B that has lower performance capacity but the same data size.

In terms of speed, the sstableloader should be faster, correct?

Maybe the DSE BulkLoader is useful when you want a slice of the
data and not the entire cake. Is that correct?

Thanks,

Sergio


RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
Not sure what you mean by “online” migration. You can load data into the 
same-named table in cluster B. If the primary keys match, the data will be 
overwritten (effectively, not actually on disk). I think you can pipe the 
output of a dsbulk unload into a dsbulk load and make the data transfer very 
quick. Your clusters are very small, so this probably wouldn’t take long.
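
As a rough illustration of the unload-to-load pipe mentioned above, here is a minimal Python sketch. It assumes dsbulk is on the PATH and that the -h (hosts), -k (keyspace) and -t (table) shortcuts behave as in the DSBulk documentation; check them against the installed version. The host, keyspace and table names are hypothetical placeholders, not values from this thread.

```python
import shlex
import subprocess


def dsbulk_pipe_commands(src_host, dst_host, keyspace, table):
    """Build argv lists for `dsbulk unload ... | dsbulk load ...`.

    With no -url given, unload writes CSV to stdout and load reads CSV
    from stdin, which is what makes the pipe possible.
    """
    unload = ["dsbulk", "unload", "-h", src_host, "-k", keyspace, "-t", table]
    load = ["dsbulk", "load", "-h", dst_host, "-k", keyspace, "-t", table]
    return unload, load


def run_pipe(unload, load):
    """Stream rows from the unload process straight into the load process."""
    producer = subprocess.Popen(unload, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(load, stdin=producer.stdout)
    # Drop our copy of the pipe so the producer is signalled if the consumer exits early.
    producer.stdout.close()
    consumer.wait()
    producer.wait()
    return producer.returncode, consumer.returncode


if __name__ == "__main__":
    # Hypothetical node and table names, for illustration only.
    unload, load = dsbulk_pipe_commands(
        "cluster-a-node1", "cluster-b-node1", "my_ks", "my_table"
    )
    # Print the equivalent shell pipeline instead of running it.
    print(" ".join(map(shlex.quote, unload)), "|", " ".join(map(shlex.quote, load)))
```

Calling run_pipe(unload, load) would actually start the transfer; the sketch only prints the command line so it can be reviewed first.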

How you get the client apps to connect to the correct cluster/stop running/etc. 
is beyond the scope of Cassandra.



Sean Durity – Staff Systems Engineer, Cassandra


Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Hi Sean,

You got all valid points.

Please see my answers below -

1. Reason we want to move from 'A' to 'B' is to get rid of 'A' Azure region
completely.

2. Cluster names in 'A' and 'B' are different.

3. DSBulk - Is there any way I can do an online migration? I still need to
get clarity on whether data for the same keyspace/table names can be merged
between A and B. So there are 2 cases:
   a. If merging is not an issue, I guess DSBulk or sstableloader would be
      an option?
   b. If merging is an issue, I am guessing this won't be possible without
      an app code change, right?
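
To make the "merge" question concrete: when rows with the same primary key exist on both sides, Cassandra keeps, per cell, the value with the newest write timestamp. The Python sketch below only simulates that rule on plain dictionaries; the keys, values and timestamps are made up, and it deliberately ignores tombstones, TTLs and timestamp ties.

```python
def merge_cells(existing, incoming):
    """Merge two {(primary_key, column): (value, write_ts)} maps.

    For each cell the entry with the larger write timestamp wins,
    which approximates last-write-wins reconciliation.
    """
    merged = dict(existing)
    for cell, (value, ts) in incoming.items():
        if cell not in merged or ts > merged[cell][1]:
            merged[cell] = (value, ts)
    return merged


# Hypothetical data: cluster B already holds a newer write for user-1.
cluster_b = {
    ("user-1", "email"): ("b@example.com", 200),
    ("user-2", "email"): ("x@example.com", 150),
}
cluster_a = {
    ("user-1", "email"): ("a@example.com", 100),
    ("user-3", "email"): ("y@example.com", 120),
}

merged = merge_cells(cluster_b, cluster_a)
for cell in sorted(merged):
    print(cell, merged[cell])
```

If the two clusters hold different customers under the same primary keys, this kind of merge silently drops one side's value, which is why the question of identical versus distinct data matters before choosing a tool.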


*Thanks & Regards,*
*Ankit Gadhiya*




Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Jeff Jirsa
The migration requirements are impossible given the current state of the 
database.

You probably can’t join two distinct clusters without app changes and without 
downtime unless you’re very lucky (same cluster name, app using quorum but not 
local quorum, both clusters using NetworkTopologyStrategy, neither app using 
serial reads or writes), and trying to do it with conflicting keyspace and 
table names makes it impossible.

I would just assume this isn’t possible and look for alternate plans, like 
downtime or code changes.
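
For readers less familiar with the quorum versus local quorum distinction above, here is a small Python sketch of the replica counts involved. It assumes the usual formula of floor(RF / 2) + 1 and a hypothetical two-data-center layout with RF=3 in each; the data center names are placeholders.

```python
def quorum(replication_factors):
    """Replicas needed for QUORUM: a majority of all replicas across all DCs."""
    total = sum(replication_factors.values())
    return total // 2 + 1


def local_quorum(replication_factors, local_dc):
    """Replicas needed for LOCAL_QUORUM: a majority within the local DC only."""
    return replication_factors[local_dc] // 2 + 1


# Hypothetical layout: one keyspace replicated 3 times in each of two DCs.
rf = {"dc_a": 3, "dc_b": 3}

print("QUORUM needs", quorum(rf), "replicas")
print("LOCAL_QUORUM in dc_b needs", local_quorum(rf, "dc_b"), "replicas")
```

With this layout, QUORUM has to reach replicas in both data centers, while LOCAL_QUORUM is satisfied inside one of them.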



RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
A couple things to consider:

  *   A separation of apps into their own clusters is typically a better model 
to avoid later entanglements
  *   DSBulk (1.4.1) is now available for open-source-only clusters. It is a 
great tool for unloading/loading.
  *   What data problem are you trying to solve with Cassandra and this move to 
another cluster? If it is high-availability, then trying to get to 2 DCs would 
be important. However, I think you will need at least a new keyspace if you 
can’t combine the data from the clusters. Whether this requires a code or 
config change depends on how configurable the developers made the connection 
and query details. (As a side rant: why is it that developers will write all 
kinds of new code, but don’t want to touch existing code?)
  *   Your migration requirements are quite stringent (“we don’t want to change 
anything, lose anything, or stop anything. Make it happen!”). There may be a 
solution, but you may end up with something even more fragile afterwards. I 
would push back to see what is negotiable.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya 
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy, but since 
they are both separate clusters, how would I do that? Keyspaces already use 
NetworkTopologyStrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma 
<028upasana...@gmail.com> wrote:
Hi,

Did you consider adding the Cassandra nodes from cluster B into cluster A as a 
different data center?

Your keyspace would then use NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra 
using rebalancing.


At the client/application level you will have to ensure local quorum/local 
consistency so that there is no impact on latencies.

Once you have moved data and applications to the new cluster, you can then 
remove the old data center (cluster A), and cluster B would have fresh data.




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya <ankitgadh...@gmail.com> wrote:
Thanks but there’s no DSE License.
Wondering how sstableloader will help, as some of the keyspace and table names 
are the same. Also, how do I sync a few system keyspaces?


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov <vvs...@gmail.com> wrote:
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov <vvs...@gmail.com> wrote:
DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth <nitankai...@gmail.com> wrote:
If the keyspace already exists, use the COPY command or sstableloader to merge 
data. If the data volume is too big, consider Spark or a custom Java program.

Regards,
Nitan
Cell: 510 449 9629


On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya <ankitgadh...@gmail.com> wrote:

Any leads on this ?

— Ankit

On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya <ankitgadh...@gmail.com> wrote:
Hi Arvinder,

Thanks for your response.

Yes - Cluster B already has some data. Table/keyspace names are identical; as 
for the data, I still don't have clarity on whether it is identical or not. I 
am assuming not, since it's for different customers, but I need confirmation.

Thanks & Regards,
Ankit Gadhiya


On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon <dhillona...@gmail.com> wrote:
So as I understand it, Cluster B already has some data and is not an empty 
cluster.

When you say, clusters share same keyspace and table names, do you mean both 
clusters have identical data on those ks/tables?

-Arvi

On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya <ankitgadh...@gmail.com> wrote:
Hello Group,

I have a requirement in one of the production systems where I need to be able 
to migrate entire dataset from Cluster A (Azure Region A) to Cluster B (Azure 
Region B).

Each cluster has 3 Cassandra nodes (RF=3) used by different applications. 
A few of the applications are common to Cluster A and Cluster B, and therefore 
share the same keyspace/table names.

I need suggestions for the best possible migration strategy here, considering:
1. No application code changes possible (minor config/infra changes can be 
   considered).
2. Zero data loss.
3. No/minimal downtime.

It'd be great to hear ideas from all of you based on your experiences.

Cassandra Version - Cassandra 3.0.13 on both sides.
Total Data size - Cluster A: 70 GB, Cluster B: 15 GB

Thanks & Regards,
Ankit Gadhiya