Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-06-26 Thread Venkata Hari Krishna Nukala
Hello all,

I have started a voting thread for this CEP with the subject line:
"[VOTE] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating
Instances". It looks like it went unnoticed (no votes yet). Please cast
your vote in that thread.

Thanks!
Hari



Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-06-21 Thread Venkata Hari Krishna Nukala
Hi all,

I did not hear anything in the last 10+ days. I am taking it as a positive
sign and proceeding to the voting stage for this CEP.

Thanks!
Hari


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-06-07 Thread Venkata Hari Krishna Nukala
Summarizing the discussion so far



*Data copy using rsync vs Sidecar*
Data copy via rsync is an incomplete solution and has to be executed
outside of the Cassandra ecosystem. Data copy via Sidecar is valuable because
it gives Cassandra an ecosystem-native approach outside the streaming path,
avoiding repairs, decommissions and bootstraps. The proposed solution
poses fewer security concerns than rsync, and an ecosystem-native approach is
more instrumentable and measurable than rsync. Tooling can be built on top
of it.

*File digest/checksum*

The initial proposal mentioned that a combination of file path and size would
be used to verify that destination and source have the same set of data. Scott,
Jon and Dinesh expressed concerns about hitting corner cases where verifying
only path & size is not good enough. I have updated CEP-40 to use
binary-level file verification with a digest algorithm.
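For illustration, a minimal sketch of the difference between a path+size check and binary-level digest verification (Python, with hypothetical helper names; the actual implementation would live in the Java Sidecar):

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Stream the file through a digest so large sstables never load fully into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(source: Path, destination: Path) -> bool:
    """Size check first (cheap), then a full-content digest to catch bit-level
    corruption that a path+size comparison alone would miss."""
    if source.stat().st_size != destination.stat().st_size:
        return False
    return file_digest(source) == file_digest(destination)
```

Two files of identical size but differing contents pass the path+size check yet fail the digest comparison, which is exactly the corner case raised in the discussion.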

*Managing C* life cycle with Sidecar*

The proposed migration process requires bringing the Cassandra
instances up and down. This CEP called out that bringing the instances up/down
is not in scope. Jon and Jordan expressed that adding this ability, to make the
entire workflow self-managed, would be the biggest win.

Managing the C* lifecycle (safely start, stop & restart) is already considered
in scope for CEP-1, and can be leveraged once it is implemented there.

*Abstraction of how files get moved, backup and restore*

Jordan & German mentioned that having an abstraction of how files get moved and
put in place would allow others to plug in alternative means of data
movement, like pulling down from backups/S3/any other source. Jeff added the
following points. 1) If you think of it instead as “change the backup/restore
mechanism to be able to safely restore from a running instance”, you may
end up with a cleaner abstraction that’s easier to think about (and may
also be easier to generalize in clouds where you have other tools
available). 2) “Ensure the original source node isn’t running”, “migrate the
config”, “choose and copy a snapshot”, and maybe “forcibly exclude the
original instance from the cluster” are all things the restore code is
going to need to do anyway, and if restore doesn’t do that today, it seems
like we can solve it once. It accomplishes the original goal, in largely
the same fashion; it just makes the logic reusable for other purposes.

Jon and Jordan mentioned that framing of replacing a node as restoring a
node and then kicking off a replace node is an interesting perspective. The
data copy task mentioned in this CEP can be viewed as a restore task/job
which treats another running Sidecar as a source. When it is generalised to
support other sources like S3 or disk snapshots, support for many use cases
can be added like restoring from S3 or disk snapshots.

Updated CEP-40 with the details of how files get moved and put in place, which
can be treated as the default implementation for live migration. Having a
cleaner abstraction/interface for the source is added as one of the goals. The
data copy task can be decoupled from live migration so that it can be used to
copy data from any (remote) source to local storage. This way it can be
leveraged across different use cases. The data copy task endpoint can be
tailored to accommodate different plugins during implementation. Francisco
mentioned that Sidecar now has the ability to restore data from S3 (for the
Analytics library) and it can be extended for live migration, backup and
restore, and other use cases.
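As a rough illustration of the abstraction being discussed, a pluggable source interface might look like the following sketch (names such as `FileSource` are hypothetical, not from the CEP; an in-memory source stands in for a remote Sidecar or S3 bucket):

```python
from abc import ABC, abstractmethod
from typing import Iterator

class FileSource(ABC):
    """Hypothetical abstraction: where the files to migrate/restore come from."""
    @abstractmethod
    def list_files(self) -> Iterator[str]: ...
    @abstractmethod
    def read(self, relative_path: str) -> bytes: ...

class InMemorySource(FileSource):
    """Stand-in for a remote Sidecar, S3 bucket, or disk snapshot in this sketch."""
    def __init__(self, files: dict):
        self._files = files
    def list_files(self) -> Iterator[str]:
        return iter(sorted(self._files))
    def read(self, relative_path: str) -> bytes:
        return self._files[relative_path]

def copy_all(source: FileSource, destination: dict) -> int:
    """The data-copy task depends only on the interface, not the transport."""
    count = 0
    for path in source.list_files():
        destination[path] = source.read(path)
        count += 1
    return count
```

Because `copy_all` only sees the interface, a new source (S3, snapshots, another Sidecar) plugs in without changing the copy task itself, which is the reuse point made above.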

*Supporting live migration within the Cassandra process instead of Sidecar*

Paulo and Ariel raised a point about supporting migration in the main
process via entire sstable streaming which could also help people who
aren't running the Sidecar.

Jon, Francisco, Jordan, Scott & Dinesh mentioned the following benefits of
doing it via Sidecar. Sidecar can be used for coordination to start and
stop instances, or to do things that require something out of process. Sidecar
would be able to migrate from a Cassandra instance that is already dead and
cannot recover (for reasons other than disk issues). If we do this in the main
process, then we have to do additional work to ensure that it doesn’t
put pressure on the JVM and introduce latency. The host replacement process
also puts a lot of stress on gossip and is a great way to encounter all
sorts of painful races if you perform it hundreds or thousands of times
(but shouldn’t be a problem in TCM-world). It is also valuable to have a
paved path implementation of a safe migration/forklift state machine when
you’re in a bind, or need to do this hundreds or thousands of times.

*Migrating a specific keyspace to a dedicated cluster*

Patrick brought up an interesting use case. In his words: in many cases,
multiple tenants cause the cluster to be overpressured. The best
solution in that case is to migrate the largest keyspace to a dedicated
cluster.

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-30 Thread Alex Petrov
> Alex, just want to make sure that I understand your point correctly. Are you 
> suggesting this sequence of operations with TCM?
> 
> * Make config changes
> * Do the initial data copy
> * Make destination part of write placements (same as source)
> * Start destination instance
> * Decommission the source
> * Enable reads for destination by making it part of read placements (as 
> source)

Almost. I am suggesting reusing the logic we have in TCM and already use for 
bootstraps and replacements. I think the sequencing will be 
something like:
  * Make config changes
  * Start destination instance
  * Make destination part of write placements (same as source)
  * Do the initial data copy
  * Load sstables from the initial data copy
  * Enable reads for destination by making it part of read placements
  * Decommission the source

We've also had a short discussion offline, which brought up a good point: this 
may require extra care to make sure that the initial-data-copy sstables 
aren't involved in the regular node sstable lifecycle, since in that case we 
may inadvertently remove or compact them. 
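Purely as an illustration, the sequencing above can be written as a strictly ordered set of steps (step names are assumed for this sketch, not from TCM itself):

```python
from enum import Enum, auto
from typing import Optional

class MigrationStep(Enum):
    """Illustrative names for the TCM-based sequencing sketched above."""
    MAKE_CONFIG_CHANGES = auto()
    START_DESTINATION = auto()
    ADD_DESTINATION_TO_WRITE_PLACEMENTS = auto()
    INITIAL_DATA_COPY = auto()
    LOAD_COPIED_SSTABLES = auto()
    ADD_DESTINATION_TO_READ_PLACEMENTS = auto()
    DECOMMISSION_SOURCE = auto()

def next_step(current: MigrationStep) -> Optional[MigrationStep]:
    """Steps are strictly ordered: reads are enabled only after the copied
    sstables are loaded, and the source is decommissioned last."""
    steps = list(MigrationStep)
    index = steps.index(current)
    return steps[index + 1] if index + 1 < len(steps) else None
```

The ordering is the point: the destination takes (pending) writes before the copy, serves reads only after the copied sstables are loaded, and the source leaves last.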

> It is a fair point. It is good to have the understanding of availability and 
> durability guarantees during migration. I can create a JIRA for it later.

Sounds good. As I mentioned, I'm fine either way: if we do it as a part of CEP, 
or as a follow-up. 

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-20 Thread Jordan West
On Wed, May 1, 2024 at 3:34 AM Alex Petrov  wrote:

>
> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.
>
>
I 100% agree that this is the more durable approach, and that bringing the
source node down reduces availability during the second phase. While my
inclination is that it would be better to implement the logic in the manner
you describe, from a pure correctness perspective, that loss of
availability of the r/w quorum is rare in my experience. Running a setup
like CEP-40 currently describes (but using S3 for the file transfer) for
over 3 years, in practice I have a hard time remembering one incident of
it. I'm sure it's happened, but at the rate we replace hardware it's not
something we deal with regularly, despite taking the risk. I do agree as
well that it needs to be well documented, as surprising edge cases are never
fun. I think the existing and future TCM implementations cover the more
conservative/correct case, and having this option as an alternative, or for
when the instance is unable to bring up the C* process, is good to have.



> On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-12 Thread Venkata Hari Krishna Nukala
Replies from my side for the other points of the discussion:

*Managing C* life cycle with Sidecar*

>lifecycle / orchestration portion is the more challenging aspect. It would
be nice to address that as well so we don’t end up with something like
repair where the building blocks are there but the hard parts are left to
the operator

CEP-1 has lifecycle operations under scope.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224#CEP1:ApacheCassandraManagementProcess(es)-3.Lifecycle(safelystart,stop,restartC*).
I think it can be leveraged when implemented as part of CEP-1.

*On backup & restore use case*

I see similarities between backup/restore & this migration. But I feel
there will be considerable differences in the implementation, and we might
need to tailor the API to make it usable for backup & restore. I think
making the code/logic reusable can be an implicit goal. Does calling backup
& restore a stretch goal, or creating a separate CEP, sound fair?

*Migrate the largest keyspace to a dedicated cluster*

Patrick, the proposed API can help copy a specific keyspace's data to another
cluster. "No chance of doing this manually without some serious brain
surgery on c* and downtime." - sounds a bit tricky to me. Since the
clusters are independent, doing it without any coordination between
clusters and without downtime sounds like a case this CEP is not targeting
at the moment.

*Live migration + TCM*

>We can implement CEP-40 using a similar approach: we can leave the source
node as both a read and write target, and allow the new node to be a target
for (pending) writes. Unfortunately, this does not help with availability
(in fact, it decreases write availability, since we will have to collect
2+1 mandatory write responses instead of just 2), but increases durability,
and I think helps to fully eliminate the second phase. This also increases
read availability when the source node is up, since we can still use the
source node as a part of the read quorum.

Alex, just want to make sure that I understand your point correctly. Are
you suggesting this sequence of operations with TCM?

* Make config changes
* Do the initial data copy
* Make destination part of write placements (same as source)
* Start destination instance
* Decommission the source
* Enable reads for destination by making it part of read placements (as
source)

>I am also not against to have this to be done post-factum, after
implementation of CEP in its current form, but I think it would be good to
have good understanding of availability and durability guarantees we want
to provide with it, and have it stated explicitly, for both "source node
down" and "source node up" cases.

It is a fair point. It is good to have the understanding of availability
and durability guarantees during migration. I can create a JIRA for it
later.

Thanks!
Hari


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-02 Thread Alex Petrov
Thank you for input! 

> Would it be possible to create a new type of write target node?  The new 
> write target node is notified of writes (like any other write node) but does 
> not participate in the write availability calculation. 

We could make some kind of optional write, but unfortunately that way we cannot 
codify our consistency level. Since we already use the notion of pending 
ranges, which requires 1 extra ack, and we as a community are OK with it, I think 
for simplicity we should stick to the same notion. 

If there is a lot of interest in this kind of availability/durability tradeoff, 
we should discuss all implications in a separate CEP, but then it probably 
would make sense to make it available for all operations. 

My personal opinion is that if we can't guarantee/rely on the number of acks, 
this may accidentally mislead people as they would expect it to work and lead 
to surprises when it does not.
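A toy version of the ack arithmetic behind this (a sketch assuming simple replication and QUORUM, not Cassandra's actual replica-plan code):

```python
def required_write_acks(replication_factor: int, pending_replicas: int) -> int:
    """QUORUM needs a majority of the full replicas, plus one extra mandatory
    ack per pending replica - the '2+1 instead of just 2' trade-off discussed
    earlier in the thread."""
    quorum = replication_factor // 2 + 1
    return quorum + pending_replicas
```

With RF=3 and no pending replicas a QUORUM write needs 2 acks; adding the migrating destination as a pending replica raises that to 3, which lowers write availability but raises durability.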

On Wed, May 1, 2024, at 4:38 PM, Claude Warren, Jr via dev wrote:
> Alex,
> 
>  you write:
>> We can implement CEP-40 using a similar approach: we can leave the source 
>> node as both a read and write target, and allow the new node to be a target 
>> for (pending) writes. Unfortunately, this does not help with availability 
>> (in fact, it decreases write availability, since we will have to collect 2+1 
>> mandatory write responses instead of just 2), but increases durability, and 
>> I think helps to fully eliminate the second phase. This also increases read 
>> availability when the source node is up, since we can still use the source 
>> node as a part of read quorum.
> 
> Would it be possible to create a new type of write target node?  The new 
> write target node is notified of writes (like any other write node) but does 
> not participate in the write availability calculation.  In this way a node 
> this is being migrated to could receive writes and have minimal impact on the 
> current operation of the cluster?
> 
> Claude
> 
> 
> 
> On Wed, May 1, 2024 at 12:33 PM Alex Petrov  wrote:
>> __
>> Thank you for submitting this CEP!
>> 
>> Wanted to discuss this point from the description:
>> 
>> > How to bring up/down Cassandra/Sidecar instances or making/applying config 
>> > changes are outside the scope of this document.
>> 
>> One advantage of doing migration via sidecar is the fact that we can stream 
>> sstables to the target node from the source node while the source node is 
>> down. Also, if the source node is down, it does not matter that we can’t use it 
>> as a write target. However, if we are replacing a live node, we do lose both 
>> durability and availability during the second copy phase. There are copious 
>> other advantages described by others in the thread above.
>> 
>> For example, we have three adjacent nodes A,B,C and simple RF 3. C (source) 
>> is up and is being replaced with live-migrated D (destination). According to 
>> the described process in CEP-40, we perform streaming in 2 phases: first one 
>> is a full copy (similar to bootstrap/replacement in cassandra), and the 
>> second one is just a diff. The second phase is still going to take a 
>> non-trivial amount of time, and is likely to last at very least minutes. 
>> During this time, we only have nodes A and B as both read and write targets, 
>> with no alternatives: we have to have both of them present for any 
>> operation, and losing either one of them leaves us with only one copy of 
>> data.
>> 
>> To contrast this, TCM bootstrap process is 4-step: between the old owner 
>> being phased out and the new owner brought in, we always ensure r/w quorum 
>> consistency and liveness of at least 2 nodes for the read quorum, 3 nodes 
>> available for reads in best case, and 2+1 pending replica for the write 
>> quorum, with 4 nodes (3 existing owners + 1 pending) being available for 
>> writes in best case. Replacement in TCM is implemented similarly, with the 
>> old node remaining an (unavailable) read target, but new node already being 
>> the target for (pending) writes.
>> 
>> We can implement CEP-40 using a similar approach: we can leave the source 
>> node as both a read and write target, and allow the new node to be a target 
>> for (pending) writes. Unfortunately, this does not help with availability 
>> (in fact, it decreases write availability, since we will have to collect 2+1 
>> mandatory write responses instead of just 2), but increases durability, and 
>> I think helps to fully eliminate the second phase. This also increases read 
>> availability when the source node is up, since we can still use the source 
>> node as a part of read quorum.
>> 
>> I think if we want to call this feature "live migration", we may want to 
>> provide similar guarantees, since this term is used in the hypervisor 
>> community to describe an instant and uninterrupted migration of an instance 
>> from one host to another without the guest instance noticing so much as a 
>> time jump. 
>> 
>> I am also not against t

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-01 Thread Claude Warren, Jr via dev
Alex,

You write:

> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.


Would it be possible to create a new type of write target node? The new
write target node is notified of writes (like any other write node) but
does not participate in the write availability calculation. In this way a
node that is being migrated to could receive writes and have minimal impact
on the current operation of the cluster?

Claude



On Wed, May 1, 2024 at 12:33 PM Alex Petrov  wrote:

> Thank you for submitting this CEP!
>
> Wanted to discuss this point from the description:
>
> > How to bring up/down Cassandra/Sidecar instances or making/applying
> config changes are outside the scope of this document.
>
> One advantage of doing migration via sidecar is the fact that we can
> stream sstables to the target node from the source node while the source
> node is down. Also if the source node is down, it does not matter if we
> can’t use it as a write target. However, if we are replacing a live node, we
> do lose both durability and availability during the second copy phase.
> There are copious other advantages described by others in the thread above.
>
> For example, we have three adjacent nodes A,B,C and simple RF 3. C
> (source) is up and is being replaced with live-migrated D (destination).
> According to the described process in CEP-40, we perform streaming in 2
> phases: first one is a full copy (similar to bootstrap/replacement in
> cassandra), and the second one is just a diff. The second phase is still
> going to take a non-trivial amount of time, and is likely to last at the very
> least minutes. During this time, we only have nodes A and B as both read
> and write targets, with no alternatives: we have to have both of them
> present for any operation, and losing either one of them leaves us with
> only one copy of data.
>
> To contrast this, TCM bootstrap process is 4-step: between the old owner
> being phased out and the new owner brought in, we always ensure r/w quorum
> consistency and liveness of at least 2 nodes for the read quorum, 3 nodes
> available for reads in best case, and 2+1 pending replica for the write
> quorum, with 4 nodes (3 existing owners + 1 pending) being available for
> writes in best case. Replacement in TCM is implemented similarly, with the
> old node remaining an (unavailable) read target, but new node already being
> the target for (pending) writes.
>
> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.
>
> I think if we want to call this feature "live migration", we may want to
> provide similar guarantees, since this term is used in the hypervisor
> community to describe an instant and uninterrupted migration of an instance
> from one host to another without the guest instance noticing so much as a
> time jump.
>
> I am also not against having this done post-factum, after implementation
> of the CEP in its current form, but I think it would be good to have a solid
> understanding of the availability and durability guarantees we want to
> provide with it, stated explicitly, for both "source node
> down" and "source node up" cases. That said, since we will have to
> integrate CEP-40 with TCM, and will have to ensure correctness of sstable
> diffing for the second phase, it might make sense to consider reusing some
> of the existing replacement logic from TCM. Just to make sure this is
> mentioned explicitly, my proposal is only concerned with the second copy
> phase, without any implications about the first.
>
> Thank you,
> --Alex
>
> On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. De

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-01 Thread Alex Petrov
Thank you for submitting this CEP!

Wanted to discuss this point from the description:

> How to bring up/down Cassandra/Sidecar instances or making/applying config 
> changes are outside the scope of this document.

One advantage of doing migration via sidecar is the fact that we can stream 
sstables to the target node from the source node while the source node is down. 
Also if the source node is down, it does not matter if we can’t use it as a 
write target. However, if we are replacing a live node, we do lose both 
durability and availability during the second copy phase. There are copious 
other advantages described by others in the thread above.

For example, we have three adjacent nodes A,B,C and simple RF 3. C (source) is 
up and is being replaced with live-migrated D (destination). According to the 
described process in CEP-40, we perform streaming in 2 phases: first one is a 
full copy (similar to bootstrap/replacement in cassandra), and the second one 
is just a diff. The second phase is still going to take a non-trivial amount of 
time, and is likely to last at the very least minutes. During this time, we only 
have nodes A and B as both read and write targets, with no alternatives: we 
have to have both of them present for any operation, and losing either one of 
them leaves us with only one copy of data.
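The two-phase scheme described above (a full copy followed by a diff) can be sketched in miniature. The sketch below is illustrative Python, not the Sidecar implementation; `file_digest` and `diff_copy` are hypothetical helper names, and a real migration would stream SSTables over Sidecar endpoints rather than copy local paths.

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path: Path, algo: str = "sha256") -> str:
    """Digest a file's full binary content in 1 MiB chunks."""
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_copy(source: Path, destination: Path) -> list[Path]:
    """Second-phase copy: transfer only files missing or changed at the
    destination, comparing binary digests rather than name + size."""
    copied = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if dst_file.exists() and file_digest(dst_file) == file_digest(src_file):
            continue  # byte-identical at the destination, skip
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel)
    return copied
```

The first phase is the same call against an empty destination; the second phase then copies only what changed while the source kept serving traffic, which is why its duration depends on the write rate during phase one.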

To contrast this, TCM bootstrap process is 4-step: between the old owner being 
phased out and the new owner brought in, we always ensure r/w quorum 
consistency and liveness of at least 2 nodes for the read quorum, 3 nodes 
available for reads in best case, and 2+1 pending replica for the write quorum, 
with 4 nodes (3 existing owners + 1 pending) being available for writes in best 
case. Replacement in TCM is implemented similarly, with the old node remaining 
an (unavailable) read target, but new node already being the target for 
(pending) writes.

We can implement CEP-40 using a similar approach: we can leave the source node 
as both a read and write target, and allow the new node to be a target for 
(pending) writes. Unfortunately, this does not help with availability (in fact, 
it decreases write availability, since we will have to collect 2+1 mandatory 
write responses instead of just 2), but increases durability, and I think helps 
to fully eliminate the second phase. This also increases read availability when 
the source node is up, since we can still use the source node as a part of read 
quorum.
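The quorum arithmetic in the paragraph above can be made concrete with a toy helper; `write_quorum_acks` is illustrative only, not a Cassandra API, but the numbers match the 2+1 pending-replica behaviour described here.

```python
def write_quorum_acks(rf: int, pending: int) -> int:
    """Write acks required at QUORUM: a quorum of the replication factor,
    plus one additional mandatory ack per pending (joining) replica."""
    return rf // 2 + 1 + pending

# RF=3, no pending replica: 2 acks required, so one replica may be down.
assert write_quorum_acks(3, 0) == 2
# RF=3 with D as a pending write target: 3 mandatory acks (2+1),
# lower write availability but higher durability during the migration.
assert write_quorum_acks(3, 1) == 3
```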

I think if we want to call this feature "live migration", we may want to 
provide similar guarantees, since this term is used in the hypervisor 
community to describe an instant and uninterrupted migration of an instance 
from one host to another without the guest instance noticing so much as a time jump. 

I am also not against having this done post-factum, after implementation of the 
CEP in its current form, but I think it would be good to have a solid 
understanding of the availability and durability guarantees we want to provide 
with it, stated explicitly, for both "source node down" and "source node 
up" cases. That said, since we will have to integrate CEP-40 with TCM, and will 
have to ensure correctness of sstable diffing for the second phase, it might 
make sense to consider reusing some of the existing replacement logic from TCM. 
Just to make sure this is mentioned explicitly, my proposal is only concerned 
with the second copy phase, without any implications about the first.

Thank you,
--Alex

On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
> Hi all,
> 
> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
> Cassandra Sidecar.
> 
> When someone needs to move all or a portion of the Cassandra nodes belonging 
> to a cluster to different hosts, the traditional approach of Cassandra node 
> replacement can be time-consuming due to repairs and the bootstrapping of new 
> nodes. Depending on the volume of the storage service load, replacements 
> (repair + bootstrap) may take anywhere from a few hours to days.
> 
> Proposing a Sidecar-based solution to address these challenges. This solution 
> proposes transferring data from the old host (source) to the new host 
> (destination) and then bringing up the Cassandra process at the destination, 
> to enable fast instance migration. This approach would help to minimise node 
> downtime, as it is based on a Sidecar solution for data transfer and avoids 
> repairs and bootstrap.
> 
> Looking forward to the discussions.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> 
> Thanks!
> Hari


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-30 Thread Venkata Hari Krishna Nukala
Just realised that for this live migration case, it is not just about
validating whether the SSTables present locally are valid, but also about
validating whether the destination has files identical to the source. In
that case, binary-level content verification needs to be done between
source and destination.
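A minimal sketch of such binary-level verification, assuming a digest manifest is built on each side and then compared; the helper names below are hypothetical and merely stand in for the Sidecar's file digest endpoint.

```python
import hashlib
from pathlib import Path

def digest_manifest(data_dir: Path, algo: str = "sha256") -> dict[str, str]:
    """Map each file's relative path to a digest of its full binary content."""
    manifest = {}
    for f in sorted(data_dir.rglob("*")):
        if f.is_file():
            digest = hashlib.new(algo, f.read_bytes()).hexdigest()
            manifest[str(f.relative_to(data_dir))] = digest
    return manifest

def verify(source_manifest: dict[str, str],
           destination_manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or differ at the destination."""
    return [path for path, digest in source_manifest.items()
            if destination_manifest.get(path) != digest]
```

Comparing digests instead of name + size catches the corner case discussed earlier in the thread: two files with the same path and length but different bytes.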

On Wed, May 1, 2024 at 12:27 AM Venkata Hari Krishna Nukala <
[email protected]> wrote:

> >In the next iteration, I am going to address the below point.
> >> I would like to see more abstraction of how the files get moved / put
> in place with the proposed solution being the default implementation. That
> would allow others to plug in alternative means of data movement like
> pulling down backups from S3 or rsync, etc.
>
> I have added details to the "New or Changed Public Interfaces" section.
>
> >Sidecar has now the ability to restore data from S3 (although the
> restores
> are for bulk write jobs coming from the Cassandra Analytics library).
>
> Francisco, can you share a JIRA ticket or some references to it?
>
> >There are some components which may be mutated and therefore their
> checksum may need to be recomputed.
> Initially I thought just running sstableverify (or nodetool verify) would
> be sufficient, but it looks like it has its own problems (CASSANDRA-9947,
> CASSANDRA-17017 & CASSANDRA-12682). Maybe I am exaggerating the problem. I
> would wait for expert opinion here.
>
> In this case, Sidecar needs to verify all content (not just sstables). It
> needs to cover commit logs, hints, etc. I was thinking of giving a flexible
> option to use different digest algorithms with some caching/optimization at
> Sidecar with the file digest endpoint (which considers each and every
> file). If verifying sstables is good enough, then probably it can be
> skipped.
>
>
> On Mon, Apr 29, 2024 at 10:37 PM Dinesh Joshi  wrote:
>
>> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
>> [email protected]> wrote:
>>
>>> reason why I called out binary level verification out of initial scope
>>> is because of these two reasons: 1) Calculating digest for each file may
>>> increase CPU utilisation and 2) Disk would also be under pressure as
>>> complete disk content will also be read to calculate digest. As called out
>>> in the discussion, I think we can't
>>>
>>
>> We should have a digest / checksum for each of the file components
>> computed and stored on disk so this doesn't need to be recomputed each
>> time. Most files / components are immutable and therefore their checksum
>> won't change. There are some components which may be mutated and therefore
>> their checksum may need to be recomputed. However, data integrity is not
>> something we can compromise on. On the receiving node, CPU utilization is
>> not a big issue as that node isn't servicing traffic.
>>
>> I was too lazy to dig into the code and someone who is more familiar with
>> the SSTable components / file format can help shed light on checksums.
>>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-30 Thread Venkata Hari Krishna Nukala
>In the next iteration, I am going to address the below point.
>> I would like to see more abstraction of how the files get moved / put in
place with the proposed solution being the default implementation. That
would allow others to plug in alternative means of data movement like
pulling down backups from S3 or rsync, etc.

I have added details to the "New or Changed Public Interfaces" section.

>Sidecar has now the ability to restore data from S3 (although the restores
are for bulk write jobs coming from the Cassandra Analytics library).

Francisco, can you share a JIRA ticket or some references to it?

>There are some components which may be mutated and therefore their
checksum may need to be recomputed.
Initially I thought just running sstableverify (or nodetool verify) would
be sufficient, but it looks like it has its own problems (CASSANDRA-9947,
CASSANDRA-17017 & CASSANDRA-12682). Maybe I am exaggerating the problem. I
would wait for expert opinion here.

In this case, Sidecar needs to verify all content (not just sstables). It
needs to cover commit logs, hints, etc. I was thinking of giving a flexible
option to use different digest algorithms with some caching/optimization at
Sidecar with the file digest endpoint (which considers each and every
file). If verifying sstables is good enough, then probably it can be
skipped.


On Mon, Apr 29, 2024 at 10:37 PM Dinesh Joshi  wrote:

> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
> [email protected]> wrote:
>
>> reason why I called out binary level verification out of initial scope is
>> because of these two reasons: 1) Calculating digest for each file may
>> increase CPU utilisation and 2) Disk would also be under pressure as
>> complete disk content will also be read to calculate digest. As called out
>> in the discussion, I think we can't
>>
>
> We should have a digest / checksum for each of the file components
> computed and stored on disk so this doesn't need to be recomputed each
> time. Most files / components are immutable and therefore their checksum
> won't change. There are some components which may be mutated and therefore
> their checksum may need to be recomputed. However, data integrity is not
> something we can compromise on. On the receiving node, CPU utilization is
> not a big issue as that node isn't servicing traffic.
>
> I was too lazy to dig into the code and someone who is more familiar with
> the SSTable components / file format can help shed light on checksums.
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-29 Thread Dinesh Joshi
On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
[email protected]> wrote:

> reason why I called out binary level verification out of initial scope is
> because of these two reasons: 1) Calculating digest for each file may
> increase CPU utilisation and 2) Disk would also be under pressure as
> complete disk content will also be read to calculate digest. As called out
> in the discussion, I think we can't
>

We should have a digest / checksum for each of the file components computed
and stored on disk so this doesn't need to be recomputed each time. Most
files / components are immutable and therefore their checksum won't change.
There are some components which may be mutated and therefore their checksum
may need to be recomputed. However, data integrity is not something we can
compromise on. On the receiving node, CPU utilization is not a big issue as
that node isn't servicing traffic.

I was too lazy to dig into the code and someone who is more familiar with
the SSTable components / file format can help shed light on checksums.
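The store-and-reuse idea above can be sketched as a small cache keyed by file size and modification time, so immutable SSTable components are hashed once and only mutated components are recomputed. This is an illustrative sketch under those assumptions, not Cassandra or Sidecar code; `DigestCache` is a hypothetical name.

```python
import hashlib
from pathlib import Path

class DigestCache:
    """Cache file digests keyed by (size, mtime_ns); a digest is recomputed
    only when the file's metadata changes, so immutable components pay the
    hashing cost exactly once."""

    def __init__(self) -> None:
        self._cache: dict[Path, tuple[tuple[int, int], str]] = {}

    def digest(self, path: Path) -> str:
        st = path.stat()
        key = (st.st_size, st.st_mtime_ns)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == key:
            return entry[1]  # unchanged since the last computation
        value = hashlib.sha256(path.read_bytes()).hexdigest()
        self._cache[path] = (key, value)
        return value
```

Persisting such a cache next to the data files would give the "computed and stored on disk" behaviour Dinesh describes, at the cost of re-validating entries whose metadata has changed.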


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-29 Thread Francisco Guerrero
I just wanted to chime in here about a recently introduced feature in
Sidecar.

Sidecar has now the ability to restore data from S3 (although the restores
are for bulk write jobs coming from the Cassandra Analytics library). The
ability to restore data from S3 (or any other cloud provider) can be extended
for other use cases such as live migration, backup and restore, and others.

On 2024/04/25 17:57:16 Venkata Hari Krishna Nukala wrote:
> I have updated the CEP to use binary level file digest verification.
> 
> In the next iteration, I am going to address the below point.
> > I would like to see more abstraction of how the files get moved / put in
> place with the proposed solution being the default implementation. That
> would allow others to plug in alternative means of data movement like
> pulling down backups from S3 or rsync, etc.
> 
> Thanks!
> Hari
> 
> On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin  wrote:
> 
> > I finally got a chance to digest this CEP and am happy to see it raised.
> > This feature has been left to the end user for far too long.
> >
> > It might get roasted for scope creep, but here goes. Related and something
> > that I've heard for years is the ability to migrate a single keyspace away
> > from a set of hardware... online. Similar problem but a lot more
> > coordination.
> >  - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
> >  - Establish replication between keyspaces and sync schema
> >  - Move data from Cluster A to B
> >  - Decommission keyspace in Cluster A
> >
> > In many cases, the presence of multiple tenants causes the cluster to be overpressured.
> > The best solution in that case is to migrate the largest keyspace to a
> > dedicated cluster.
> >
> > Live migration but a bit more complicated. No chance of doing this
> > manually without some serious brain surgery on c* and downtime.
> >
> > Patrick
> >
> >
> > On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
> > [email protected]> wrote:
> >
> >> Thank you all for the inputs and apologies for the late reply. I see good
> >> points raised in this discussion. *Please allow me to reply to each
> >> point individually.*
> >>
> >> To start with, let me focus on the point raised by Scott & Jon about file
> >> content verification at the destination with the source in this reply.
> >> Agree that just verifying the file name + size is not foolproof. The
> >> reason why I called out binary level verification out of initial scope is
> >> because of these two reasons: 1) Calculating digest for each file may
> >> increase CPU utilisation and 2) Disk would also be under pressure as
> >> complete disk content will also be read to calculate digest. As called out
> >> in the discussion, I think we can't compromise on binary level check for
> >> these two reasons. Let me update the CEP to include binary level
> >> verification. During implementation, it can probably be made optional so
> >> that it can be skipped if someone doesn't want it.
> >>
> >> Thanks!
> >> Hari
> >>
> >> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
> >> [email protected]> wrote:
> >>
> >>> We use backup/restore for our implementation of this concept. It has the
> >>> added benefit that the backup / restore path gets exercised much more
> >>> regularly than it would in normal operations, finding edge case bugs at a
> >>> time when you still have other ways of recovering rather than in a full
> >>> disaster scenario.
> >>>
> >>>
> >>>
> >>> Cheers
> >>>
> >>> Ben
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *From: *Jordan West 
> >>> *Date: *Sunday, 21 April 2024 at 05:38
> >>> *To: *[email protected] 
> >>> *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar
> >>> for Live Migrating Instances
> >>>
> >>> *EXTERNAL EMAIL - USE CAUTION when clicking links or attachments *
> >>>
> >>>
> >>>
> >>> I do really like the framing that replacing a node is restoring a node and
> >>> then kicking off a replace. That is effectively what we do internally.
> >>>
> >>>
> >>>
> >>> I also agree we should be able to do data movement well both

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-26 Thread Josh McKenzie
> might get roasted for scope creep
This community *would never*.

What you've outlined seems like a very reasonable stretch goal or v2 to keep in 
mind, so we architect something in v1 that's also supportive of a v2 
keyspace-only migration.

On Thu, Apr 25, 2024, at 1:57 PM, Venkata Hari Krishna Nukala wrote:
> I have updated the CEP to use binary level file digest verification.
> 
> In the next iteration, I am going to address the below point. 
> > I would like to see more abstraction of how the files get moved / put in 
> > place with the proposed solution being the default implementation. That 
> > would allow others to plug in alternative means of data movement like 
> > pulling down backups from S3 or rsync, etc. 
> 
> Thanks!
> Hari
> 
> On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin  wrote:
>> I finally got a chance to digest this CEP and am happy to see it raised. 
>> This feature has been left to the end user for far too long.
>> 
>> It might get roasted for scope creep, but here goes. Related and something 
>> that I've heard for years is the ability to migrate a single keyspace away 
>> from a set of hardware... online. Similar problem but a lot more 
>> coordination.
>>  - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
>>  - Establish replication between keyspaces and sync schema
>>  - Move data from Cluster A to B
>>  - Decommission keyspace in Cluster A
>> 
>> In many cases, the presence of multiple tenants causes the cluster to be overpressured. 
>> The best solution in that case is to migrate the largest keyspace to a 
>> dedicated cluster.
>> 
>> Live migration but a bit more complicated. No chance of doing this manually 
>> without some serious brain surgery on c* and downtime.
>> 
>> Patrick
>> 
>> 
>> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala 
>>  wrote:
>>> Thank you all for the inputs and apologies for the late reply. I see good 
>>> points raised in this discussion. _Please allow me to reply to each point 
>>> individually._
>>> 
>>> To start with, let me focus on the point raised by Scott & Jon about file 
>>> content verification at the destination with the source in this reply. 
>>> Agree that just verifying the file name + size is not foolproof. The 
>>> reason why I called out binary level verification out of initial scope is 
>>> because of these two reasons: 1) Calculating digest for each file may 
>>> increase CPU utilisation and 2) Disk would also be under pressure as 
>>> complete disk content will also be read to calculate digest. As called out 
>>> in the discussion, I think we can't compromise on binary level check for 
>>> these two reasons. Let me update the CEP to include binary level 
>>> verification. During implementation, it can probably be made optional so 
>>> that it can be skipped if someone doesn't want it.
>>> 
>>> Thanks!
>>> Hari
>>> 
>>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev 
>>>  wrote:
 We use backup/restore for our implementation of this concept. It has the 
 added benefit that the backup / restore path gets exercised much more 
 regularly than it would in normal operations, finding edge case bugs at a 
 time when you still have other ways of recovering rather than in a full 
 disaster scenario.
 Cheers
 Ben
 *From: *Jordan West 
 *Date: *Sunday, 21 April 2024 at 05:38
 *To: *[email protected] 
 *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for 
 Live Migrating Instances
 
 
I do really like the framing that replacing a node is restoring a node and 
 then kicking off a replace. That is effectively what we do internally. 
 I also agree we should be able to do data movement well both internal to 
 Cassandra and externally for a variety of reasons. 
 We’ve seen great performance with “ZCS+TLS” even though it’s not full zero 
 copy — nodes that previously took *days* to replace now take a few hours. 
 But we have seen it put pressure on nodes and drive up latencies which is 
 the main reason we still rely on an external data movement system by 
 default — falling back to ZCS+TLS as needed. 
 Jordan 
 On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:
> Jeff, this is probably the best explanation and justification of the idea 
> that I've heard so far.
> I like it because
> 1) we really should have something official for backups
> 2) backups / object store would be great for analytics
> 3) it solves a much bigger problem than the single goal of moving 
> instances.
> I'm a huge +1 in favor of this perspective, with live migration being one 
> use case for backup / restore.
>>

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-25 Thread Venkata Hari Krishna Nukala
I have updated the CEP to use binary level file digest verification.

In the next iteration, I am going to address the below point.
> I would like to see more abstraction of how the files get moved / put in
place with the proposed solution being the default implementation. That
would allow others to plug in alternative means of data movement like
pulling down backups from S3 or rsync, etc.

Thanks!
Hari

On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin  wrote:

> I finally got a chance to digest this CEP and am happy to see it raised.
> This feature has been left to the end user for far too long.
>
> It might get roasted for scope creep, but here goes. Related and something
> that I've heard for years is the ability to migrate a single keyspace away
> from a set of hardware... online. Similar problem but a lot more
> coordination.
>  - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
>  - Establish replication between keyspaces and sync schema
>  - Move data from Cluster A to B
>  - Decommission keyspace in Cluster A
>
> In many cases, the presence of multiple tenants causes the cluster to be overpressured.
> The best solution in that case is to migrate the largest keyspace to a
> dedicated cluster.
>
> Live migration but a bit more complicated. No chance of doing this
> manually without some serious brain surgery on c* and downtime.
>
> Patrick
>
>
> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
> [email protected]> wrote:
>
>> Thank you all for the inputs and apologies for the late reply. I see good
>> points raised in this discussion. *Please allow me to reply to each
>> point individually.*
>>
>> To start with, let me focus on the point raised by Scott & Jon about file
>> content verification at the destination with the source in this reply.
>> Agree that just verifying the file name + size is not foolproof. The
>> reason why I called out binary level verification out of initial scope is
>> because of these two reasons: 1) Calculating digest for each file may
>> increase CPU utilisation and 2) Disk would also be under pressure as
>> complete disk content will also be read to calculate digest. As called out
>> in the discussion, I think we can't compromise on binary level check for
>> these two reasons. Let me update the CEP to include binary level
>> verification. During implementation, it can probably be made optional so
>> that it can be skipped if someone doesn't want it.
>>
>> Thanks!
>> Hari
>>
>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
>> [email protected]> wrote:
>>
>>> We use backup/restore for our implementation of this concept. It has the
>>> added benefit that the backup / restore path gets exercised much more
>>> regularly than it would in normal operations, finding edge case bugs at a
>>> time when you still have other ways of recovering rather than in a full
>>> disaster scenario.
>>>
>>>
>>>
>>> Cheers
>>>
>>> Ben
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Jordan West 
>>> *Date: *Sunday, 21 April 2024 at 05:38
>>> *To: *[email protected] 
>>> *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar
>>> for Live Migrating Instances
>>>
>>>
>>>
>>>
>>>
>>> I do really like the framing that replacing a node is restoring a node and
>>> then kicking off a replace. That is effectively what we do internally.
>>>
>>>
>>>
>>> I also agree we should be able to do data movement well both internal to
>>> Cassandra and externally for a variety of reasons.
>>>
>>>
>>>
>>> We’ve seen great performance with “ZCS+TLS” even though it’s not full
>>> zero copy — nodes that previously took *days* to replace now take a few
>>> hours. But we have seen it put pressure on nodes and drive up latencies
>>> which is the main reason we still rely on an external data movement system
>>> by default — falling back to ZCS+TLS as needed.
>>>
>>>
>>>
>>> Jordan
>>>
>>>
>>>
>>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:
>>>
>>> Jeff, this is probably the best explanation and justification of the
>>> idea that I've heard so far.
>>>
>>>
>>>
>>> I like it because
>>>
>>

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-23 Thread Patrick McFadin
I finally got a chance to digest this CEP and am happy to see it raised.
This feature has been left to the end user for far too long.

It might get roasted for scope creep, but here goes. Related, and something
I've heard requested for years, is the ability to migrate a single keyspace
away from a set of hardware... online. A similar problem, but with a lot more
coordination:
 - Create a keyspace in Cluster B mimicking the keyspace in Cluster A
 - Establish replication between the keyspaces and sync schema
 - Move data from Cluster A to B
 - Decommission the keyspace in Cluster A

In many cases, the presence of multiple tenants puts the cluster under
pressure. The best solution in that case is to migrate the largest keyspace
to a dedicated cluster.

It's live migration, but a bit more complicated. There is no chance of doing
this manually without some serious brain surgery on C* and downtime.

Patrick
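Patrick's four steps could be sketched as an ordered state machine. Everything below is hypothetical: the step names simply mirror his list, and nothing like this exists in the Sidecar today.

```python
# Hypothetical orchestration skeleton for the keyspace migration described
# above; step names mirror the list, implementations are left to a callback.
from enum import Enum, auto

class Step(Enum):
    CREATE_TARGET_KEYSPACE = auto()   # mimic the keyspace schema in Cluster B
    ESTABLISH_REPLICATION = auto()    # replicate / sync schema from A to B
    MOVE_DATA = auto()                # bulk-copy historical data A -> B
    DECOMMISSION_SOURCE = auto()      # drop the keyspace in Cluster A

def run_migration(execute):
    """Run each step in definition order; stop on the first failure so an
    operator can resume from that step rather than restart from scratch.
    Returns (steps completed, failed step or None)."""
    completed = []
    for step in Step:
        if not execute(step):
            return completed, step
        completed.append(step)
    return completed, None
```

The value of spelling the flow out this way is that each step becomes individually retryable and observable, which is what makes an "online" migration plausible at all.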


On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
[email protected]> wrote:

> Thank you all for the inputs, and apologies for the late reply. I see good
> points raised in this discussion. *Please allow me to reply to each point
> individually.*
>
> To start with, let me focus in this reply on the point raised by Scott &
> Jon about verifying file content at the destination against the source. I
> agree that verifying just the file name + size is not foolproof. I
> originally called binary-level verification out of scope for two reasons:
> 1) calculating a digest for each file may increase CPU utilisation, and 2)
> the disk would also be under pressure, since the complete disk contents
> must be read to compute digests. As called out in the discussion, though,
> we can't compromise on a binary-level check despite these two costs. Let me
> update the CEP to include binary-level verification. During implementation,
> it can probably be made optional so that it can be skipped if someone
> doesn't want it.
>
> Thanks!
> Hari
>

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-23 Thread Venkata Hari Krishna Nukala
Thank you all for the inputs, and apologies for the late reply. I see good
points raised in this discussion. *Please allow me to reply to each point
individually.*

To start with, let me focus in this reply on the point raised by Scott & Jon
about verifying file content at the destination against the source. I agree
that verifying just the file name + size is not foolproof. I originally
called binary-level verification out of scope for two reasons: 1) calculating
a digest for each file may increase CPU utilisation, and 2) the disk would
also be under pressure, since the complete disk contents must be read to
compute digests. As called out in the discussion, though, we can't compromise
on a binary-level check despite these two costs. Let me update the CEP to
include binary-level verification. During implementation, it can probably be
made optional so that it can be skipped if someone doesn't want it.

Thanks!
Hari
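The binary-level verification Hari describes could look roughly like the sketch below: stream each file through a digest so memory use stays flat, then compare (size, digest) manifests between source and destination. This is an illustration only; the function names and the choice of SHA-256 are assumptions, not anything CEP-40 specifies.

```python
# Hypothetical sketch of streaming binary-level file verification.
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks, so large SSTable
    components are never held in memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(data_dir):
    """Map relative path -> (size, digest) for every file under data_dir."""
    manifest = {}
    for root, _, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, data_dir)
            manifest[rel] = (os.path.getsize(full), file_digest(full))
    return manifest

def verify(source_manifest, dest_manifest):
    """Return relative paths that are missing at the destination or whose
    size/digest differs -- the corner cases name+size alone would miss."""
    return [rel for rel, entry in source_manifest.items()
            if dest_manifest.get(rel) != entry]
```

This also shows why the digest pass is optional-friendly: it is a pure post-copy check, so skipping it changes nothing about the transfer itself, only the verification guarantee.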

On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
[email protected]> wrote:

> We use backup/restore for our implementation of this concept. It has the
> added benefit that the backup / restore path gets exercised much more
> regularly than it would in normal operations, finding edge case bugs at a
> time when you still have other ways of recovering rather than in a full
> disaster scenario.
>
>
>
> Cheers
>
> Ben

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-21 Thread Slater, Ben via dev
We use backup/restore for our implementation of this concept. It has the added 
benefit that the backup / restore path gets exercised much more regularly than 
it would in normal operations, finding edge case bugs at a time when you still 
have other ways of recovering rather than in a full disaster scenario.

Cheers
Ben




From: Jordan West 
Date: Sunday, 21 April 2024 at 05:38
To: [email protected] 
Subject: Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live 
Migrating Instances

I do really like the framing of replacing a node as restoring a node and then 
kicking off a replace. That is effectively what we do internally.

I also agree we should be able to do data movement well both internal to 
Cassandra and externally for a variety of reasons.

We’ve seen great performance with “ZCS+TLS” even though it’s not full zero copy 
— nodes that previously took *days* to replace now take a few hours. But we 
have seen it put pressure on nodes and drive up latencies which is the main 
reason we still rely on an external data movement system by default — falling 
back to ZCS+TLS as needed.

Jordan




Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-20 Thread Jordan West
I do really like the framing of replacing a node as restoring a node and
then kicking off a replace. That is effectively what we do internally.

I also agree we should be able to do data movement well both internal to
Cassandra and externally for a variety of reasons.

We’ve seen great performance with “ZCS+TLS” even though it’s not full zero
copy — nodes that previously took *days* to replace now take a few hours.
But we have seen it put pressure on nodes and drive up latencies which is
the main reason we still rely on an external data movement system by
default — falling back to ZCS+TLS as needed.

Jordan

On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:

> Jeff, this is probably the best explanation and justification of the idea
> that I've heard so far.
>
> I like it because
>
> 1) we really should have something official for backups
> 2) backups / object store would be great for analytics
> 3) it solves a much bigger problem than the single goal of moving
> instances.
>
> I'm a huge +1 in favor of this perspective, with live migration being one
> use case for backup / restore.
>
> Jon


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jon Haddad
Jeff, this is probably the best explanation and justification of the idea
that I've heard so far.

I like it because

1) we really should have something official for backups
2) backups / object store would be great for analytics
3) it solves a much bigger problem than the single goal of moving instances.

I'm a huge +1 in favor of this perspective, with live migration being one
use case for backup / restore.

Jon


On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa  wrote:

> I think Jordan and German had an interesting insight, or at least their
> comment made me think about this slightly differently, and I’m going to
> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>
> The CEP treats this as “move a live instance from one machine to another”.
> I know why the author wants to do this.
>
> If you think of it instead as “change backup/restore mechanism to be able
> to safely restore from a running instance”, you may end up with a cleaner
> abstraction that’s easier to think about (and may also be easier to
> generalize in clouds where you have other tools available ).
>
> I’m not familiar enough with the sidecar to know the state of
> orchestration for backup/restore, but “ensure the original source node
> isn’t running” , “migrate the config”, “choose and copy a snapshot” , maybe
> “forcibly exclude the original instance from the cluster” are all things
> the restore code is going to need to do anyway, and if restore doesn’t do
> that today, it seems like we can solve it once.
>
> Backup probably needs to be generalized to support many sources, too.
> Object storage is obvious (s3 download). Block storage is obvious (snapshot
> and reattach). Reading sstables from another sidecar seems reasonable, too.
>
> It accomplishes the original goal, in largely the same fashion, it just
> makes the logic reusable for other purposes?


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jeff Jirsa
I think Jordan and German had an interesting insight, or at least their comment made me think about this slightly differently, and I’m going to repeat it so it’s not lost in the discussion about zerocopy / sendfile.

The CEP treats this as “move a live instance from one machine to another”. I know why the author wants to do this.

If you think of it instead as “change backup/restore mechanism to be able to safely restore from a running instance”, you may end up with a cleaner abstraction that’s easier to think about (and may also be easier to generalize in clouds where you have other tools available).

I’m not familiar enough with the sidecar to know the state of orchestration for backup/restore, but “ensure the original source node isn’t running”, “migrate the config”, “choose and copy a snapshot”, maybe “forcibly exclude the original instance from the cluster” are all things the restore code is going to need to do anyway, and if restore doesn’t do that today, it seems like we can solve it once.

Backup probably needs to be generalized to support many sources, too. Object storage is obvious (s3 download). Block storage is obvious (snapshot and reattach). Reading sstables from another sidecar seems reasonable, too.

It accomplishes the original goal, in largely the same fashion, it just makes the logic reusable for other purposes?

On Apr 19, 2024, at 5:52 PM, Dinesh Joshi  wrote:

On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg  wrote:

If there is a faster/better way to replace a node why not have Cassandra support that natively without the sidecar so people who aren’t running the sidecar can benefit?

I am not the author of the CEP so take whatever I say with a pinch of salt. Scott and Jordan have pointed out some benefits of doing this in the Sidecar vs Cassandra.

Today Cassandra is able to do fast node replacements. However, this CEP is addressing an important corner case when Cassandra is unable to start up due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die on old hardware? Sure. However, you would still need operator intervention to start it up in some special mode both on the old and new node so the new node can peer with the old node, copy over its data and join the ring. This would still require some orchestration outside the database. The Sidecar can do that orchestration for the operator. The point I'm making here is that the CEP addresses a real issue. The way it is currently built can improve over time with improvements in Cassandra.

Dinesh
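Jeff's point that "backup probably needs to be generalized to support many sources" can be sketched as a small pluggable interface: one restore loop, multiple interchangeable sources. The class and method names below are invented for illustration and are not Sidecar APIs.

```python
# Hypothetical source-agnostic restore flow: the same loop could serve live
# migration, disaster recovery, or analytics exports, with the source being
# object storage, a block-device snapshot, or another sidecar.
import os
from abc import ABC, abstractmethod

class RestoreSource(ABC):
    @abstractmethod
    def list_files(self):
        """Relative paths of SSTable components in the chosen snapshot."""

    @abstractmethod
    def fetch(self, rel_path, dest_dir):
        """Copy one file into dest_dir: an s3 download, a block-storage
        reattach, or an HTTP GET against another sidecar."""

class InMemorySource(RestoreSource):
    """Toy source standing in for the real s3/EBS/sidecar backends."""
    def __init__(self, files):
        self.files = files  # rel_path -> bytes

    def list_files(self):
        return sorted(self.files)

    def fetch(self, rel_path, dest_dir):
        # Flatten paths for this toy; a real source would recreate dirs.
        name = rel_path.replace("/", "_")
        with open(os.path.join(dest_dir, name), "wb") as f:
            f.write(self.files[rel_path])

def restore(source, dest_dir):
    """Pull every file from the source; returns the file count."""
    files = source.list_files()
    for rel in files:
        source.fetch(rel, dest_dir)
    return len(files)
```

The design choice Jeff is pointing at is exactly this seam: the orchestration steps (stop source, migrate config, choose snapshot) live above the interface, so solving them once covers every backend.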


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Dinesh Joshi
On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg  wrote:

>
> If there is a faster/better way to replace a node why not  have Cassandra
> support that natively without the sidecar so people who aren’t running the
> sidecar can benefit?
>

I am not the author of the CEP so take whatever I say with a pinch of salt.
Scott and Jordan have pointed out some benefits of doing this in the
Sidecar vs Cassandra.

Today Cassandra is able to do fast node replacements. However, this CEP is
addressing an important corner case when Cassandra is unable to start up
due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
on old hardware? Sure. However, you would still need operator intervention
to start it up in some special mode both on the old and new node so the new
node can peer with the old node, copy over its data and join the ring. This
would still require some orchestration outside the database. The Sidecar
can do that orchestration for the operator. The point I'm making here is
that the CEP addresses a real issue. The way it is currently built can
improve over time with improvements in Cassandra.

Dinesh


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Dinesh Joshi
On Fri, Apr 19, 2024 at 3:12 PM Jon Haddad  wrote:

> I haven't looked at streaming over TLS, so I might be way off base here,
> but our own docs (
> https://cassandra.apache.org/doc/latest/cassandra/architecture/streaming.html)
> say ZCS is not available when using encryption, and if we have to bring the
> data into the JVM then I'm not sure how it would even work.  sendfile is a
> direct file descriptor to file descriptor copy.  How are we simultaneously
> doing kernel-only operations while also performing encryption in the JVM?
>

Yes, the 'zero copy' aspect of streaming is not available when we stream
over TLS, as we're required to bring those bytes into the JVM to encrypt them.
However, we still get the benefit of copying entire files and skipping the
non-trivial ser/deser & GC overhead associated with streaming individual
partitions. Cassandra will handle this transparently[1] depending on
whether you enable TLS or not.

Dinesh

[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/AsyncStreamingOutputPlus.java#L159
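The distinction Dinesh draws can be illustrated outside the JVM. The sketch below contrasts a kernel-only sendfile(2) copy with a userspace read-transform-write loop; the `encrypt` callback is a stand-in for illustration, not Cassandra's actual TLS stack, and the sendfile path assumes Linux.

```python
# Illustrative contrast between the two streaming paths discussed above.
import os

def copy_zero_copy(src_fd, dst_fd, count):
    """Kernel-only copy via sendfile(2): no bytes enter this process
    (file-to-file works on Linux)."""
    sent = 0
    while sent < count:
        n = os.sendfile(dst_fd, src_fd, sent, count - sent)
        if n == 0:
            break
        sent += n
    return sent

def copy_with_encryption(src, dst, encrypt, chunk_size=64 * 1024):
    """Userspace copy: each chunk is read in, transformed, then written.
    This is the extra copy TLS forces -- still far cheaper than per-row
    serialization/deserialization of SSTables."""
    total = 0
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(encrypt(chunk))
        total += len(chunk)
    return total
```

Either way the unit of work is a whole file, which is the property that makes entire-file streaming fast relative to legacy per-partition streaming.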


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jon Haddad
I haven't looked at streaming over TLS, so I might be way off base here,
but our own docs (
https://cassandra.apache.org/doc/latest/cassandra/architecture/streaming.html)
say ZCS is not available when using encryption, and if we have to bring the
data into the JVM then I'm not sure how it would even work.  sendfile is a
direct file descriptor to file descriptor copy.  How are we simultaneously
doing kernel-only operations while also performing encryption in the JVM?

I'm assuming you mean something other than ZCS when you say "ZCS with
TLS"?  Maybe "no serde" streaming?

Jon




On Fri, Apr 19, 2024 at 2:36 PM C. Scott Andreas 
wrote:

> These are the salient points here for me, yes:
>
> > My understanding from the proposal is that Sidecar would be able to
> migrate from a Cassandra instance that is already dead and cannot recover.
>
> > That’s one thing I like about having it an external process — not that
> it’s bullet proof but it’s one less thing to worry about.
>
> The manual/rsync version of the state machine Hari describes in the CEP is
> one of the best escape hatches for migrating an instance that’s
> overstressed, limping on ailing hardware, or that has exhausted disk. If
> the system is functional but the C* process is in bad shape, it’s great to
> have a paved-path flow for migrating the instance and data to more capable
> hardware.
>
> I also agree in principle that “streaming should be just as fast via the
> C* process itself.” This hits a couple snags today:
>
> - This option isn’t available when the C* instance is struggling.
> - In the scenario of replacing an entire cluster’s hardware with new
> machines, applying this process to an entire cluster via host replacements
> of all instances (which also requires repairs) or by doubling then halving
> capacity is incredibly cumbersome and operationally-impacting to the
> database’s users - especially if the DB is already having a hard time.
> - The host replacement process also puts a lot of stress on gossip and is
> a great way to encounter all sorts of painful races if you perform it
> hundreds or thousands of times (but shouldn’t be a problem in TCM-world).
>
> So I think I agree with both points:
>
> - Cassandra should be able to do this itself.
> - It is also valuable to have a paved path implementation of a safe
> migration/forklift state machine when you’re in a bind, or need to do this
> hundreds or thousands of times.
>
> On zero copy: what really makes ZCS fast compared to legacy streaming is
> that the JVM is able to ship entire files around, rather than deserializing
> SSTables and reserializing them to stream each individual row. That’s the
> slow and expensive part. It’s true that TLS means you incur an extra memcpy
> as that stream is encrypted before it’s chunked into packets — but the cost
> of that memcpy for encryption pales in comparison to how slow
> deserializing/reserializing SSTables is/was.
>
> ZCS with TLS can push 20Gbps+ today on decent but not extravagant Xeon
> hardware. In-kernel TLS would also still encounter a memcpy in the
> encryption path; the kernel.org doc alludes to this via “the kernel will
> need to allocate a buffer for the encrypted data.” But it would allow using
> sendfile and cut a copy in userspace. If someone is interested in testing
> it out I’d love to learn what they find. It’s always a great surprise to
> learn there’s a more perf left on the table. This comparison looks
> promising: https://tinselcity.github.io/SSL_Sendfile/
>
> – Scott
>
> —
> Mobile
>
> On Apr 19, 2024, at 11:31 AM, Jordan West  wrote:
>
> 
> If we are considering the main process then we have to do some additional
> work to ensure that it doesn’t put pressure on the JVM and introduce
> latency. That’s one thing I like about having it an external process — not
> that it’s bullet proof but it’s one less thing to worry about.
>
> Jordan
>
> On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero 
> wrote:
>
>> My understanding from the proposal is that Sidecar would be able to
>> migrate
>> from a Cassandra instance that is already dead and cannot recover. This
>> is a
>> scenario that is possible where Sidecar should still be able to migrate
>> to a new
>> instance.
>>
>> Alternatively, Cassandra itself could have some flag to start up with
>> limited
>> subsystems enabled to allow live migration.
>>
>> In any case, we'll need to weigh in the pros and cons of each alternative
>> and
>> decide if the live migration process can be handled within the C* process
>> itself
>> or if we allow this functionality to be handled by Sidecar.
>>
>> I am looking forward to this feature though, as it will be of great value
>> for many
>> users across the ecosystem.
>>
>> On 2024/04/18 22:25:23 Jon Haddad wrote:
>> > Hmm... I guess if you're using encryption you can't use ZCS so there's
>> that.
>> >
>> > It probably makes sense to implement kernel TLS:
>> > https://www.kernel.org/doc/html/v5.7/networking/tls.html
>> >
>> > Then we can get ZCS a

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread C. Scott Andreas
These are the salient points here for me, yes:

> My understanding from the proposal is that Sidecar would be able to migrate from a Cassandra instance that is already dead and cannot recover.

> That’s one thing I like about having it an external process — not that it’s bullet proof but it’s one less thing to worry about.

The manual/rsync version of the state machine Hari describes in the CEP is one of the best escape hatches for migrating an instance that’s overstressed, limping on ailing hardware, or that has exhausted disk. If the system is functional but the C* process is in bad shape, it’s great to have a paved-path flow for migrating the instance and data to more capable hardware.

I also agree in principle that “streaming should be just as fast via the C* process itself.” This hits a couple snags today:

- This option isn’t available when the C* instance is struggling.
- In the scenario of replacing an entire cluster’s hardware with new machines, applying this process to an entire cluster via host replacements of all instances (which also requires repairs) or by doubling then halving capacity is incredibly cumbersome and operationally-impacting to the database’s users - especially if the DB is already having a hard time.
- The host replacement process also puts a lot of stress on gossip and is a great way to encounter all sorts of painful races if you perform it hundreds or thousands of times (but shouldn’t be a problem in TCM-world).

So I think I agree with both points:

- Cassandra should be able to do this itself.
- It is also valuable to have a paved path implementation of a safe migration/forklift state machine when you’re in a bind, or need to do this hundreds or thousands of times.

On zero copy: what really makes ZCS fast compared to legacy streaming is that the JVM is able to ship entire files around, rather than deserializing SSTables and reserializing them to stream each individual row. That’s the slow and expensive part. It’s true that TLS means you incur an extra memcpy as that stream is encrypted before it’s chunked into packets — but the cost of that memcpy for encryption pales in comparison to how slow deserializing/reserializing SSTables is/was.

ZCS with TLS can push 20Gbps+ today on decent but not extravagant Xeon hardware. In-kernel TLS would also still encounter a memcpy in the encryption path; the kernel.org doc alludes to this via “the kernel will need to allocate a buffer for the encrypted data.” But it would allow using sendfile and cut a copy in userspace. If someone is interested in testing it out I’d love to learn what they find. It’s always a great surprise to learn there’s more perf left on the table. This comparison looks promising: https://tinselcity.github.io/SSL_Sendfile/

– Scott

—
Mobile

On Apr 19, 2024, at 11:31 AM, Jordan West  wrote:

If we are considering the main process then we have to do some additional work to ensure that it doesn’t put pressure on the JVM and introduce latency. That’s one thing I like about having it an external process — not that it’s bullet proof but it’s one less thing to worry about.

Jordan

On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero  wrote:
My understanding from the proposal is that Sidecar would be able to migrate
from a Cassandra instance that is already dead and cannot recover. This is a
scenario that is possible where Sidecar should still be able to migrate to a new
instance.

Alternatively, Cassandra itself could have some flag to start up with limited
subsystems enabled to allow live migration.

In any case, we'll need to weigh the pros and cons of each alternative and
decide if the live migration process can be handled within the C* process itself
or if we allow this functionality to be handled by Sidecar.

I am looking forward to this feature though, as it will be of great value for
many users across the ecosystem.

On 2024/04/18 22:25:23 Jon Haddad wrote:
> Hmm... I guess if you're using encryption you can't use ZCS so there's that.
> 
> It probably makes sense to implement kernel TLS:
> https://www.kernel.org/doc/html/v5.7/networking/tls.html
> 
> Then we can get ZCS all the time, for bootstrap & replacements.
> 
> Jon
> 
> 
> On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad  wrote:
> 
> > Ariel, having it in C* process makes sense to me.
> >
> > Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
> > have no distinguishable difference in overhead from doing it using the
> > sidecar?  Since the underlying call is sendfile, never hitting userspace, I
> > can't see why we'd opt for the transfer in sidecar.  What's the
> > advantage of duplicating the work that's already been done?
> >
> > I can see using the sidecar for coordination to start and stop instances
> > or do things that require something out of process.
> >
> > Jon
> >
> >
> > On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:
> >
> >> Hi,
> >>
> >> If there is a faster/better way to replace a node w

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jordan West
If we are considering the main process then we have to do some additional
work to ensure that it doesn’t put pressure on the JVM and introduce
latency. That’s one thing I like about having it an external process — not
that it’s bullet proof but it’s one less thing to worry about.

Jordan

On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero  wrote:

> My understanding from the proposal is that Sidecar would be able to migrate
> from a Cassandra instance that is already dead and cannot recover. This is a
> scenario that is possible where Sidecar should still be able to migrate to a
> new instance.
>
> Alternatively, Cassandra itself could have some flag to start up with limited
> subsystems enabled to allow live migration.
>
> In any case, we'll need to weigh the pros and cons of each alternative and
> decide if the live migration process can be handled within the C* process
> itself or if we allow this functionality to be handled by Sidecar.
>
> I am looking forward to this feature though, as it will be of great value
> for many users across the ecosystem.
>
> On 2024/04/18 22:25:23 Jon Haddad wrote:
> > Hmm... I guess if you're using encryption you can't use ZCS so there's
> that.
> >
> > It probably makes sense to implement kernel TLS:
> > https://www.kernel.org/doc/html/v5.7/networking/tls.html
> >
> > Then we can get ZCS all the time, for bootstrap & replacements.
> >
> > Jon
> >
> >
> > On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad  wrote:
> >
> > > Ariel, having it in C* process makes sense to me.
> > >
> > > Please correct me if I'm wrong here, but shouldn't using ZCS to
> transfer
> > > have no distinguishable difference in overhead from doing it using the
> > > sidecar?  Since the underlying call is sendfile, never hitting
> userspace, I
> > > can't see why we'd opt for the transfer in sidecar.  What's the
> > > advantage of duplicating the work that's already been done?
> > >
> > > I can see using the sidecar for coordination to start and stop
> instances
> > > or do things that require something out of process.
> > >
> > > Jon
> > >
> > >
> > > On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg 
> wrote:
> > >
> > >> Hi,
> > >>
> > >> If there is a faster/better way to replace a node why not  have
> Cassandra
> > >> support that natively without the sidecar so people who aren’t
> running the
> > >> sidecar can benefit?
> > >>
> > >> Copying files over a network shouldn’t be slow in C* and it would also
> > >> already have all the connectivity issues solved.
> > >>
> > >> Regards,
> > >> Ariel
> > >>
> > >> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I have filed CEP-40 [1] for live migrating Cassandra instances using
> the
> > >> Cassandra Sidecar.
> > >>
> > >> When someone needs to move all or a portion of the Cassandra nodes
> > >> belonging to a cluster to different hosts, the traditional approach of
> > >> Cassandra node replacement can be time-consuming due to repairs and
> the
> > >> bootstrapping of new nodes. Depending on the volume of the storage
> service
> > >> load, replacements (repair + bootstrap) may take anywhere from a few
> hours
> > >> to days.
> > >>
> > >> Proposing a Sidecar based solution to address these challenges. This
> > >> solution proposes transferring data from the old host (source) to the
> new
> > >> host (destination) and then bringing up the Cassandra process at the
> > >> destination, to enable fast instance migration. This approach would
> help to
> > >> minimise node downtime, as it is based on a Sidecar solution for data
> > >> transfer and avoids repairs and bootstrap.
> > >>
> > >> Looking forward to the discussions.
> > >>
> > >> [1]
> > >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> > >>
> > >> Thanks!
> > >> Hari
> > >>
> > >>
> > >>
> >
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Francisco Guerrero
My understanding from the proposal is that Sidecar would be able to migrate
from a Cassandra instance that is already dead and cannot recover. This is a
scenario that is possible where Sidecar should still be able to migrate to a new
instance.

Alternatively, Cassandra itself could have some flag to start up with limited
subsystems enabled to allow live migration.

In any case, we'll need to weigh the pros and cons of each alternative and
decide if the live migration process can be handled within the C* process itself
or if we allow this functionality to be handled by Sidecar.

I am looking forward to this feature though, as it will be of great value for
many users across the ecosystem.

On 2024/04/18 22:25:23 Jon Haddad wrote:
> Hmm... I guess if you're using encryption you can't use ZCS so there's that.
> 
> It probably makes sense to implement kernel TLS:
> https://www.kernel.org/doc/html/v5.7/networking/tls.html
> 
> Then we can get ZCS all the time, for bootstrap & replacements.
> 
> Jon
> 
> 
> On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad  wrote:
> 
> > Ariel, having it in C* process makes sense to me.
> >
> > Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
> > have no distinguishable difference in overhead from doing it using the
> > sidecar?  Since the underlying call is sendfile, never hitting userspace, I
> > can't see why we'd opt for the transfer in sidecar.  What's the
> > advantage of duplicating the work that's already been done?
> >
> > I can see using the sidecar for coordination to start and stop instances
> > or do things that require something out of process.
> >
> > Jon
> >
> >
> > On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:
> >
> >> Hi,
> >>
> >> If there is a faster/better way to replace a node why not  have Cassandra
> >> support that natively without the sidecar so people who aren’t running the
> >> sidecar can benefit?
> >>
> >> Copying files over a network shouldn’t be slow in C* and it would also
> >> already have all the connectivity issues solved.
> >>
> >> Regards,
> >> Ariel
> >>
> >> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> >>
> >> Hi all,
> >>
> >> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> >> Cassandra Sidecar.
> >>
> >> When someone needs to move all or a portion of the Cassandra nodes
> >> belonging to a cluster to different hosts, the traditional approach of
> >> Cassandra node replacement can be time-consuming due to repairs and the
> >> bootstrapping of new nodes. Depending on the volume of the storage service
> >> load, replacements (repair + bootstrap) may take anywhere from a few hours
> >> to days.
> >>
> >> Proposing a Sidecar based solution to address these challenges. This
> >> solution proposes transferring data from the old host (source) to the new
> >> host (destination) and then bringing up the Cassandra process at the
> >> destination, to enable fast instance migration. This approach would help to
> >> minimise node downtime, as it is based on a Sidecar solution for data
> >> transfer and avoids repairs and bootstrap.
> >>
> >> Looking forward to the discussions.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> >>
> >> Thanks!
> >> Hari
> >>
> >>
> >>
> 


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Jon Haddad
Hmm... I guess if you're using encryption you can't use ZCS so there's that.

It probably makes sense to implement kernel TLS:
https://www.kernel.org/doc/html/v5.7/networking/tls.html

Then we can get ZCS all the time, for bootstrap & replacements.

Jon


On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad  wrote:

> Ariel, having it in C* process makes sense to me.
>
> Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
> have no distinguishable difference in overhead from doing it using the
> sidecar?  Since the underlying call is sendfile, never hitting userspace, I
> can't see why we'd opt for the transfer in sidecar.  What's the
> advantage of duplicating the work that's already been done?
>
> I can see using the sidecar for coordination to start and stop instances
> or do things that require something out of process.
>
> Jon
>
>
> On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> If there is a faster/better way to replace a node why not  have Cassandra
>> support that natively without the sidecar so people who aren’t running the
>> sidecar can benefit?
>>
>> Copying files over a network shouldn’t be slow in C* and it would also
>> already have all the connectivity issues solved.
>>
>> Regards,
>> Ariel
>>
>> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
>>
>> Hi all,
>>
>> I have filed CEP-40 [1] for live migrating Cassandra instances using the
>> Cassandra Sidecar.
>>
>> When someone needs to move all or a portion of the Cassandra nodes
>> belonging to a cluster to different hosts, the traditional approach of
>> Cassandra node replacement can be time-consuming due to repairs and the
>> bootstrapping of new nodes. Depending on the volume of the storage service
>> load, replacements (repair + bootstrap) may take anywhere from a few hours
>> to days.
>>
>> Proposing a Sidecar based solution to address these challenges. This
>> solution proposes transferring data from the old host (source) to the new
>> host (destination) and then bringing up the Cassandra process at the
>> destination, to enable fast instance migration. This approach would help to
>> minimise node downtime, as it is based on a Sidecar solution for data
>> transfer and avoids repairs and bootstrap.
>>
>> Looking forward to the discussions.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>
>> Thanks!
>> Hari
>>
>>
>>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Jon Haddad
Ariel, having it in C* process makes sense to me.

Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
have no distinguishable difference in overhead from doing it using the
sidecar?  Since the underlying call is sendfile, never hitting userspace, I
can't see why we'd opt for the transfer in sidecar.  What's the
advantage of duplicating the work that's already been done?

I can see using the sidecar for coordination to start and stop instances or
do things that require something out of process.

Jon
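Jon's argument is that ZCS bottoms out in sendfile(2): the kernel moves file pages from the page cache straight to the socket, so the transfer cost should be roughly the same no matter which process issues the call. A minimal sketch of a sendfile-backed file transfer (Python for brevity; the file path and demo wiring are illustrative, not from Cassandra or the Sidecar):

```python
import socket
import threading

SRC = "/tmp/zcs_demo-Data.db"  # stand-in for an SSTable component
with open(SRC, "wb") as f:
    f.write(b"sstable-bytes" * 1000)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # ephemeral port for the demo
srv.listen(1)
port = srv.getsockname()[1]

def serve() -> None:
    conn, _ = srv.accept()
    with conn, open(SRC, "rb") as f:
        # socket.sendfile() uses os.sendfile() on Linux: the file bytes
        # never pass through a userspace buffer on the way to the socket.
        conn.sendfile(f)

t = threading.Thread(target=serve)
t.start()

received = b""
with socket.create_connection(("127.0.0.1", port)) as c:
    while chunk := c.recv(65536):
        received += chunk
t.join()
srv.close()
assert received == b"sstable-bytes" * 1000
```

This is also why TLS changes the picture: once the stream must be encrypted in userspace, the plain sendfile path is unavailable, which is what motivates the kernel-TLS discussion above.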


On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:

> Hi,
>
> If there is a faster/better way to replace a node why not  have Cassandra
> support that natively without the sidecar so people who aren’t running the
> sidecar can benefit?
>
> Copying files over a network shouldn’t be slow in C* and it would also
> already have all the connectivity issues solved.
>
> Regards,
> Ariel
>
> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Ariel Weisberg
Hi,

If there is a faster/better way to replace a node, why not have Cassandra
support that natively, without the sidecar, so people who aren’t running the
sidecar can benefit?

Copying files over a network shouldn’t be slow in C* and it would also already 
have all the connectivity issues solved.

Regards,
Ariel

On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> Hi all,
> 
> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
> Cassandra Sidecar.
> 
> When someone needs to move all or a portion of the Cassandra nodes belonging 
> to a cluster to different hosts, the traditional approach of Cassandra node 
> replacement can be time-consuming due to repairs and the bootstrapping of new 
> nodes. Depending on the volume of the storage service load, replacements 
> (repair + bootstrap) may take anywhere from a few hours to days.
> 
> Proposing a Sidecar based solution to address these challenges. This solution 
> proposes transferring data from the old host (source) to the new host 
> (destination) and then bringing up the Cassandra process at the destination, 
> to enable fast instance migration. This approach would help to minimise node 
> downtime, as it is based on a Sidecar solution for data transfer and avoids 
> repairs and bootstrap.
> 
> Looking forward to the discussions.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> 
> Thanks!
> Hari


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Claude Warren, Jr via dev
I think this solution would solve one of the problems that Aiven has with
node replacement currently.  Though TCM will probably help as well.

On Mon, Apr 15, 2024 at 11:47 PM German Eichberger via dev <
[email protected]> wrote:

> Thanks for the proposal. I second Jordan that we need more abstraction in
> (1), e.g. most cloud provider allow for disk snapshots and starting nodes
> from a snapshot which would be a good mechanism if you find yourself there.
>
> German
> --
> *From:* Jordan West 
> *Sent:* Sunday, April 14, 2024 12:27 PM
> *To:* [email protected] 
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra
> Sidecar for Live Migrating Instances
>
> Thanks for proposing this CEP! We have something like this internally so I
> have some familiarity with the approach and the challenges. After reading
> the CEP a couple things come to mind:
>
> 1. I would like to see more abstraction of how the files get moved / put
> in place with the proposed solution being the default implementation. That
> would allow others to plug in alternatives means of data movement like
> pulling down backups from S3 or rsync, etc.
>
> 2. I do agree with Jon’s last email that the lifecycle / orchestration
> portion is the more challenging aspect. It would be nice to address that as
> well so we don’t end up with something like repair where the building
> blocks are there but the hard parts are left to the operator. I do,
> however, see that portion being done in a follow-on CEP to limit the scope
> of CEP-40 and have a higher chance for success by incrementally adding
> these features.
>
> Jordan
>
> On Thu, Apr 11, 2024 at 12:31 Jon Haddad  wrote:
>
> First off, let me apologize for my initial reply, it came off harsher than
> I had intended.
>
> I know I didn't say it initially, but I like the idea of making it easier
> to replace a node.  I think it's probably not obvious to folks that you can
> use rsync (with stunnel, or alternatively rclone), and for a lot of teams
> it's intimidating to do so.  Whether it actually is easy or not to do with
> rsync is irrelevant.  Having tooling that does it right is better than duct
> taping things together.
>
> So with that said, if you're looking to get feedback on how to make the
> CEP more generally useful, I have a couple thoughts.
>
> > Managing the Cassandra processes like bringing them up or down while
> migrating the instances.
>
> Maybe I missed this, but I thought we already had support for managing the
> C* lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me
> that adding the ability to make this entire workflow self managed would be
> the biggest win, because having a live migrate *feature* instead of what's
> essentially a runbook would be far more useful.
>
> > To verify whether the desired file set matches with source, only file
> path and size is considered at the moment. Strict binary level verification
> is deferred for later.
>
> Scott already mentioned this is a problem and I agree, we cannot simply
> rely on file path and size.
>
> TL;DR: I like the intention of the CEP.  I think it would be better if it
> managed the entire lifecycle of the migration, but you might not have an
> appetite to implement all that.
>
> Jon
>
>
> On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala <
> [email protected]> wrote:
>
> Thanks Jon & Scott for taking time to go through this CEP and providing
> inputs.
>
> I am completely with what Scott had mentioned earlier (I would have added
> more details into the CEP). Adding a few more points to the same.
>
> Having a solution with Sidecar can make the migration easy without
> depending on rsync. At least in the cases I have seen, rsync is not enabled
> by default and most of them want to run OS/images with as minimal
> requirements as possible. Installing rsync requires admin privileges and
> syncing data is a manual operation. If an API is provided with Sidecar,
> then tooling can be built around it reducing the scope for manual errors.
>
> From performance wise, at least in the cases I had seen, the File
> Streaming API in Sidecar performs a lot better. To give an idea on the
> performance, I would like to quote "up to 7 Gbps/instance writes (depending
> on hardware)" from CEP-28 as this CEP proposes to leverage the same.
>
> For:
>
> >When enabled for LCS, single sstable uplevel will mutate only the level
> of an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level o

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-15 Thread German Eichberger via dev
Thanks for the proposal. I second Jordan that we need more abstraction in (1), 
e.g. most cloud provider allow for disk snapshots and starting nodes from a 
snapshot which would be a good mechanism if you find yourself there.

German

From: Jordan West 
Sent: Sunday, April 14, 2024 12:27 PM
To: [email protected] 
Subject: [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar 
for Live Migrating Instances

Thanks for proposing this CEP! We have something like this internally so I have 
some familiarity with the approach and the challenges. After reading the CEP a 
couple things come to mind:

1. I would like to see more abstraction of how the files get moved / put in
place, with the proposed solution being the default implementation. That would
allow others to plug in alternative means of data movement, like pulling down
backups from S3 or rsync, etc.

2. I do agree with Jon’s last email that the lifecycle / orchestration portion 
is the more challenging aspect. It would be nice to address that as well so we 
don’t end up with something like repair where the building blocks are there but 
the hard parts are left to the operator. I do, however, see that portion being 
done in a follow-on CEP to limit the scope of CEP-40 and have a higher chance 
for success by incrementally adding these features.

Jordan

On Thu, Apr 11, 2024 at 12:31 Jon Haddad wrote:
First off, let me apologize for my initial reply, it came off harsher than I 
had intended.

I know I didn't say it initially, but I like the idea of making it easier to 
replace a node.  I think it's probably not obvious to folks that you can use 
rsync (with stunnel, or alternatively rclone), and for a lot of teams it's 
intimidating to do so.  Whether it actually is easy or not to do with rsync is 
irrelevant.  Having tooling that does it right is better than duct taping 
things together.

So with that said, if you're looking to get feedback on how to make the CEP 
more generally useful, I have a couple thoughts.

> Managing the Cassandra processes like bringing them up or down while 
> migrating the instances.

Maybe I missed this, but I thought we already had support for managing the C* 
lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me that 
adding the ability to make this entire workflow self managed would be the 
biggest win, because having a live migrate *feature* instead of what's 
essentially a runbook would be far more useful.

> To verify whether the desired file set matches with source, only file path 
> and size is considered at the moment. Strict binary level verification is 
> deferred for later.

Scott already mentioned this is a problem and I agree, we cannot simply rely on 
file path and size.

TL;DR: I like the intention of the CEP.  I think it would be better if it 
managed the entire lifecycle of the migration, but you might not have an 
appetite to implement all that.

Jon


On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala wrote:
Thanks Jon & Scott for taking time to go through this CEP and providing inputs.

I am fully on board with what Scott mentioned earlier (I should have added more
details to the CEP). Adding a few more points to the same.

Having a solution with Sidecar can make the migration easy without depending on 
rsync. At least in the cases I have seen, rsync is not enabled by default and 
most of them want to run OS/images with as minimal requirements as possible. 
Installing rsync requires admin privileges and syncing data is a manual 
operation. If an API is provided with Sidecar, then tooling can be built around 
it reducing the scope for manual errors.

Performance-wise, at least in the cases I have seen, the File Streaming API
in Sidecar performs a lot better. To give an idea of the performance, I would
like to quote "up to 7 Gbps/instance writes (depending on hardware)" from
CEP-28, as this CEP proposes to leverage the same.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of an 
>SSTable in its stats metadata component, which wouldn't alter the filename and 
>may not alter the length of the stats metadata component. A change to the 
>level of an SSTable on the source via single sstable uplevel may not be caught 
>by a digest based only on filename and length.

In this case the file size may not change, but the last-modified timestamp
would change, right? It is addressed in section MIGRATING ONE INSTANCE, point
2.b.ii, which says "If a file is present at the destination but did not match
(by size or timestamp) with the source file, then local file is deleted and
added to list of files to download.". And after the final data copy task
downloads it, the file should match the source.
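The (path, size, timestamp) check Hari describes can be folded into the single manifest digest the CEP proposes. A hedged sketch of such a digest (the function name and layout are my own, not the CEP's API):

```python
import hashlib
import os

def manifest_digest(root: str) -> str:
    """SHA-256 over sorted (relative path, size, mtime_ns) triples.

    Size alone can miss in-place mutations such as an LCS single-SSTable
    uplevel rewriting the level inside a stats metadata component of the
    same length; folding in the last-modified timestamp catches that case
    without hashing every file's contents.
    """
    h = hashlib.sha256()
    for dirpath, _, names in sorted(os.walk(root)):
        for name in sorted(names):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            h.update(f"{rel}\x00{st.st_size}\x00{st.st_mtime_ns}\n".encode())
    return h.hexdigest()

# Demo: a same-length metadata rewrite changes only mtime, and the digest
# still changes, so source and destination manifests would mismatch.
os.makedirs("/tmp/src_data", exist_ok=True)
with open("/tmp/src_data/nb-1-big-Statistics.db", "wb") as f:
    f.write(b"\x00" * 64)
d1 = manifest_digest("/tmp/src_data")
os.utime("/tmp/src_data/nb-1-big-Statistics.db", ns=(1, 1))  # simulate rewrite
d2 = manifest_digest("/tmp/src_data")
assert d1 != d2
```

Strict binary-level verification (hashing contents) can layer on top of the same manifest later, as the CEP defers it.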

On Thu, Apr 11, 2024 at 7:30 AM C

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-15 Thread Paulo Motta
I am sympathetic to having a native fast instance migration solution as
this is a common operational recipe.

>  It's valuable for Cassandra to have an ecosystem-native mechanism of
migrating data between physical/virtual instances outside the standard
streaming path. As Hari mentions, the current ecosystem-native approach of
executing repairs, decommissions, and bootstraps is time-consuming and
cumbersome.

Have you considered supporting fast instance migration in the main process
via entire-sstable streaming, which from my understanding is meant to make
streaming faster? My only concern is moving core functionality to the
sidecar that could be efficiently supported in the main process.

In my view this could conceptually work as follows:
1. Add new startup flag -Dcassandra.replace_live_node=.
2. Start replacement node as a pending replica with the same ranges as
source node, so it will receive in-progress writes from source node during
migration, while not serving reads.
3. Perform entire sstable streaming to perform fast migration to
destination node.
4. Old node is shut down after entire sstable streaming is completed, new
node takes over.
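The four steps above form a strictly linear flow; a conceptual sketch of that flow as a state machine (state names are illustrative, not a proposed API):

```python
from enum import Enum, auto

class MigrationState(Enum):
    START_PENDING_REPLICA = auto()  # steps 1-2: new node joins as a pending
                                    # replica with the source node's ranges
    STREAM_SSTABLES = auto()        # step 3: entire-sstable streaming
    SOURCE_SHUTDOWN = auto()        # step 4: old node is shut down
    TAKEOVER = auto()               # new node starts serving reads/writes
    DONE = auto()

# Legal transitions of the conceptual flow; anything else is an error.
TRANSITIONS = {
    MigrationState.START_PENDING_REPLICA: MigrationState.STREAM_SSTABLES,
    MigrationState.STREAM_SSTABLES: MigrationState.SOURCE_SHUTDOWN,
    MigrationState.SOURCE_SHUTDOWN: MigrationState.TAKEOVER,
    MigrationState.TAKEOVER: MigrationState.DONE,
}

def advance(state: MigrationState) -> MigrationState:
    if state not in TRANSITIONS:
        raise ValueError(f"terminal state: {state}")
    return TRANSITIONS[state]

s = MigrationState.START_PENDING_REPLICA
order = [s]
while s is not MigrationState.DONE:
    s = advance(s)
    order.append(s)
assert order[-1] is MigrationState.DONE and len(order) == 5
```

The hard part, as noted below, is not the sequence itself but cluster membership: two live nodes briefly own the same tokens between streaming and takeover.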

While this approach is conceptually simple, the biggest difficulty would be
supporting live nodes with the same tokens on different cluster membership
states, but perhaps this was made possible or easier on 5.x with
transactional cluster metadata?

On Wed, Apr 10, 2024 at 10:00 PM C. Scott Andreas 
wrote:

> Oh, one note on this item:
>
> >  The operator can ensure that files in the destination matches with the
> source. In the first iteration of this feature, an API is introduced to
> calculate digest for the list of file names and their lengths to identify
> any mismatches. It does not validate the file contents at the binary level,
> but, such feature can be added at a later point of time.
>
> When enabled for LCS, single sstable uplevel will mutate only the level of
> an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> Including the file’s modification timestamp would address this without
> requiring a deep hash of the data. This would be good to include to ensure
> SSTables aren’t downleveled unexpectedly during migration.
>
> - Scott
>
> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas  wrote:
>
> 
> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. Would
> encourage you to approach it from an attitude of seeking understanding on
> the part of the first-time CEP author, as this reply casts it off pretty
> quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as
> valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path. As Hari mentions, the current ecosystem-native approach of
> executing repairs, decommissions, and bootstraps is time-consuming and
> cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync.
> Defining a safe protocol to migrate data between instances via rsync
> without downtime is surprisingly difficult - and even more so to do safely
> and repeatedly at scale. Enabling this process to be orchestrated by a
> control plane mechanizing official endpoints of the database and sidecar –
> rather than trying to move data around behind its back – is much safer than
> hoping one's cobbled together the right set of scripts to move data in a
> way that won't violate strong / transactional consistency guarantees. This
> complexity is kind of exemplified by the "Migrating One Instance" section
> of the doc and state machine diagram, which illustrates an approach to
> solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync.
> mTLS-authenticated endpoints in the sidecar for data movement eliminate the
> requirement for orchestration to occur via (typically) high-privilege SSH,
> which often allows for code execution of some form or complex efforts to
> scope SSH privileges of particular users; and eliminates the need to manage
> and secure rsyncd processes on each instance if not via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than
> rsync. Support for data migration endpoints in the sidecar would allow for
> metrics reporting, stats collection, and alerting via mature and modern
> mechanisms rather than monitoring the output of a shell script.
>
> I'll yield to Hari to share more, though today is a public holiday in
> India.
>
> I do see this CEP as solving an important problem.
>
> Thanks,
>
> – Scott
>
> On Apr 8, 2024, at 10:23 AM, Jon Haddad  wrote:
>
>
> This seems like a lot of work to create an rsync alternative.  I can't
> really sa

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-14 Thread Jordan West
Thanks for proposing this CEP! We have something like this internally so I
have some familiarity with the approach and the challenges. After reading
the CEP a couple things come to mind:

1. I would like to see more abstraction of how the files get moved / put in
place with the proposed solution being the default implementation. That
would allow others to plug in alternatives means of data movement like
pulling down backups from S3 or rsync, etc.

2. I do agree with Jon’s last email that the lifecycle / orchestration
portion is the more challenging aspect. It would be nice to address that as
well so we don’t end up with something like repair where the building
blocks are there but the hard parts are left to the operator. I do,
however, see that portion being done in a follow-on CEP to limit the scope
of CEP-40 and have a higher chance for success by incrementally adding
these features.

Jordan
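
[Editor's note] The pluggable data-movement abstraction Jordan describes in point 1 might be sketched roughly as follows; all names here are hypothetical illustrations, not part of the CEP or the sidecar's actual API:

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class FileMover(ABC):
    """Pluggable mechanism for placing source files on the destination.

    The sidecar's file streaming API would be the default implementation;
    operators could plug in alternatives (restoring from S3 backups,
    rsync, etc.) without changing the migration orchestration.
    """

    @abstractmethod
    def fetch(self, relative_path: str, destination_root: Path) -> Path:
        """Materialize one source file under destination_root; return its path."""


class LocalCopyMover(FileMover):
    """Trivial implementation copying from a local source tree (for tests)."""

    def __init__(self, source_root: Path):
        self.source_root = source_root

    def fetch(self, relative_path: str, destination_root: Path) -> Path:
        target = destination_root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 preserves the modification timestamp, which matters for
        # the size/mtime-based verification discussed in this thread
        shutil.copy2(self.source_root / relative_path, target)
        return target
```

The orchestration would depend only on the interface, so a follow-on CEP could add alternative movers without touching the state machine.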

On Thu, Apr 11, 2024 at 12:31 Jon Haddad  wrote:

> First off, let me apologize for my initial reply, it came off harsher than
> I had intended.
>
> I know I didn't say it initially, but I like the idea of making it easier
> to replace a node.  I think it's probably not obvious to folks that you can
> use rsync (with stunnel, or alternatively rclone), and for a lot of teams
> it's intimidating to do so.  Whether it actually is easy or not to do with
> rsync is irrelevant.  Having tooling that does it right is better than duct
> taping things together.
>
> So with that said, if you're looking to get feedback on how to make the
> CEP more generally useful, I have a couple thoughts.
>
> > Managing the Cassandra processes like bringing them up or down while
> migrating the instances.
>
> Maybe I missed this, but I thought we already had support for managing the
> C* lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me
> that adding the ability to make this entire workflow self managed would be
> the biggest win, because having a live migrate *feature* instead of what's
> essentially a runbook would be far more useful.
>
> > To verify whether the desired file set matches with source, only file
> path and size is considered at the moment. Strict binary level verification
> is deferred for later.
>
> Scott already mentioned this is a problem and I agree, we cannot simply
> rely on file path and size.
>
> TL;DR: I like the intention of the CEP.  I think it would be better if it
> managed the entire lifecycle of the migration, but you might not have an
> appetite to implement all that.
>
> Jon
>
>
> On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala <
> [email protected]> wrote:
>
>> Thanks Jon & Scott for taking time to go through this CEP and providing
>> inputs.
>>
>> I am completely with what Scott had mentioned earlier (I would have added
>> more details into the CEP). Adding a few more points to the same.
>>
>> Having a solution with Sidecar can make the migration easy without
>> depending on rsync. At least in the cases I have seen, rsync is not enabled
>> by default and most of them want to run OS/images with as minimal
>> requirements as possible. Installing rsync requires admin privileges and
>> syncing data is a manual operation. If an API is provided with Sidecar,
>> then tooling can be built around it reducing the scope for manual errors.
>>
>> Performance-wise, at least in the cases I have seen, the File
>> Streaming API in Sidecar performs a lot better. To give an idea on the
>> performance, I would like to quote "up to 7 Gbps/instance writes (depending
>> on hardware)" from CEP-28 as this CEP proposes to leverage the same.
>>
>> For:
>>
>> >When enabled for LCS, single sstable uplevel will mutate only the level
>> of an SSTable in its stats metadata component, which wouldn't alter the
>> filename and may not alter the length of the stats metadata component. A
>> change to the level of an SSTable on the source via single sstable uplevel
>> may not be caught by a digest based only on filename and length.
>>
>> In this case file size may not change, but the timestamp of last modified
>> time would change, right? It is addressed in section MIGRATING ONE
>> INSTANCE, point 2.b.ii which says "If a file is present at the destination
>> but did not match (by size or timestamp) with the source file, then local
>> file is deleted and added to list of files to download.". And after
>> download by final data copy task, file should match with source.
>>
>> On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas 
>> wrote:
>>
>>> Oh, one note on this item:
>>>
>>> >  The operator can ensure that files in the destination matches with
>>> the source. In the first iteration of this feature, an API is introduced to
>>> calculate digest for the list of file names and their lengths to identify
>>> any mismatches. It does not validate the file contents at the binary level,
>>> but, such feature can be added at a later point of time.
>>>
>>> When enabled for LCS, single sstable uplevel will mutate only the level
>>> of an SSTable in its stats metadata component.

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Jon Haddad
First off, let me apologize for my initial reply, it came off harsher than
I had intended.

I know I didn't say it initially, but I like the idea of making it easier
to replace a node.  I think it's probably not obvious to folks that you can
use rsync (with stunnel, or alternatively rclone), and for a lot of teams
it's intimidating to do so.  Whether it actually is easy or not to do with
rsync is irrelevant.  Having tooling that does it right is better than duct
taping things together.

So with that said, if you're looking to get feedback on how to make the CEP
more generally useful, I have a couple thoughts.

> Managing the Cassandra processes like bringing them up or down while
migrating the instances.

Maybe I missed this, but I thought we already had support for managing the
C* lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me
that adding the ability to make this entire workflow self managed would be
the biggest win, because having a live migrate *feature* instead of what's
essentially a runbook would be far more useful.

> To verify whether the desired file set matches with source, only file
path and size is considered at the moment. Strict binary level verification
is deferred for later.

Scott already mentioned this is a problem and I agree, we cannot simply
rely on file path and size.

TL;DR: I like the intention of the CEP.  I think it would be better if it
managed the entire lifecycle of the migration, but you might not have an
appetite to implement all that.

Jon


On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala <
[email protected]> wrote:

> Thanks Jon & Scott for taking time to go through this CEP and providing
> inputs.
>
> I am completely with what Scott had mentioned earlier (I would have added
> more details into the CEP). Adding a few more points to the same.
>
> Having a solution with Sidecar can make the migration easy without
> depending on rsync. At least in the cases I have seen, rsync is not enabled
> by default and most of them want to run OS/images with as minimal
> requirements as possible. Installing rsync requires admin privileges and
> syncing data is a manual operation. If an API is provided with Sidecar,
> then tooling can be built around it reducing the scope for manual errors.
>
> Performance-wise, at least in the cases I have seen, the File
> Streaming API in Sidecar performs a lot better. To give an idea on the
> performance, I would like to quote "up to 7 Gbps/instance writes (depending
> on hardware)" from CEP-28 as this CEP proposes to leverage the same.
>
> For:
>
> >When enabled for LCS, single sstable uplevel will mutate only the level
> of an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> In this case file size may not change, but the timestamp of last modified
> time would change, right? It is addressed in section MIGRATING ONE
> INSTANCE, point 2.b.ii which says "If a file is present at the destination
> but did not match (by size or timestamp) with the source file, then local
> file is deleted and added to list of files to download.". And after
> download by final data copy task, file should match with source.
>
> On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas 
> wrote:
>
>> Oh, one note on this item:
>>
>> >  The operator can ensure that files in the destination matches with
>> the source. In the first iteration of this feature, an API is introduced to
>> calculate digest for the list of file names and their lengths to identify
>> any mismatches. It does not validate the file contents at the binary level,
>> but, such feature can be added at a later point of time.
>>
>> When enabled for LCS, single sstable uplevel will mutate only the level
>> of an SSTable in its stats metadata component, which wouldn't alter the
>> filename and may not alter the length of the stats metadata component. A
>> change to the level of an SSTable on the source via single sstable uplevel
>> may not be caught by a digest based only on filename and length.
>>
>> Including the file’s modification timestamp would address this without
>> requiring a deep hash of the data. This would be good to include to ensure
>> SSTables aren’t downleveled unexpectedly during migration.
>>
>> - Scott
>>
>> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas 
>> wrote:
>>
>> 
>> Hi Jon,
>>
>> Thanks for taking the time to read and reply to this proposal. Would
>> encourage you to approach it from an attitude of seeking understanding on
>> the part of the first-time CEP author, as this reply casts it off pretty
>> quickly as NIH.
>>
>> The proposal isn't mine, but I'll offer a few notes on where I see this
>> as valuable:
>>
>> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path.

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Dinesh Joshi
On Mon, Apr 8, 2024 at 10:23 AM Jon Haddad  wrote:

> This seems like a lot of work to create an rsync alternative.  I can't
> really say I see the point.  I noticed your "rejected alternatives"
> mentions it with this note:
>

I want to point out a few things before dismissing it as an 'rsync
alternative' -

1. rsync is dangerous for many reasons. Top reason is security. rsync
executed over ssh offers a much broader access than is necessary for this
use-case. Operators also have to maintain multiple sets of credentials for
AuthN/AuthZ - ssh being just one of them. Finally, ssh simply isn't allowed
in some environments.

2. rsync is an incomplete solution. You still need to wrap rsync in a
script that will ensure that it does the right thing for each version of
Cassandra, accounts for failures, retries, etc.

The way I see it, if this solves a problem and adds value for even a
subset of our users, it is worth accepting.

Dinesh


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Venkata Hari Krishna Nukala
Thanks Jon & Scott for taking time to go through this CEP and providing
inputs.

I am completely with what Scott had mentioned earlier (I would have added
more details into the CEP). Adding a few more points to the same.

Having a solution with Sidecar can make the migration easy without
depending on rsync. At least in the cases I have seen, rsync is not enabled
by default and most of them want to run OS/images with as minimal
requirements as possible. Installing rsync requires admin privileges and
syncing data is a manual operation. If an API is provided with Sidecar,
then tooling can be built around it reducing the scope for manual errors.

Performance-wise, at least in the cases I have seen, the File Streaming
API in Sidecar performs a lot better. To give an idea of the performance, I
would like to quote "up to 7 Gbps/instance writes (depending on hardware)"
from CEP-28 as this CEP proposes to leverage the same.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of
an SSTable in its stats metadata component, which wouldn't alter the
filename and may not alter the length of the stats metadata component. A
change to the level of an SSTable on the source via single sstable uplevel
may not be caught by a digest based only on filename and length.

In this case file size may not change, but the timestamp of last modified
time would change, right? It is addressed in section MIGRATING ONE
INSTANCE, point 2.b.ii which says "If a file is present at the destination
but did not match (by size or timestamp) with the source file, then local
file is deleted and added to list of files to download.". After the final
data copy task downloads it, the file should match the source.
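
[Editor's note] The reconciliation rule in point 2.b.ii - re-download any file that is missing at the destination, or whose size or modification time differs from the source - can be sketched as below. The manifest shape is a hypothetical illustration, not the CEP's actual API:

```python
from typing import Dict, List, Tuple

# Manifest: relative file path -> (size_bytes, mtime_epoch_seconds)
Manifest = Dict[str, Tuple[int, int]]


def files_to_download(source: Manifest, destination: Manifest) -> List[str]:
    """Return source files that are absent at the destination, or whose
    size/mtime does not match. Per 2.b.ii, mismatched local copies would
    be deleted before being re-fetched."""
    pending = []
    for path, meta in source.items():
        # destination.get(path) is None if missing, or its (size, mtime)
        if destination.get(path) != meta:
            pending.append(path)
    return sorted(pending)
```

Run repeatedly until the final data copy pass returns an empty list, at which point destination and source agree by this check.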

On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas 
wrote:

> Oh, one note on this item:
>
> >  The operator can ensure that files in the destination matches with the
> source. In the first iteration of this feature, an API is introduced to
> calculate digest for the list of file names and their lengths to identify
> any mismatches. It does not validate the file contents at the binary level,
> but, such feature can be added at a later point of time.
>
> When enabled for LCS, single sstable uplevel will mutate only the level of
> an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> Including the file’s modification timestamp would address this without
> requiring a deep hash of the data. This would be good to include to ensure
> SSTables aren’t downleveled unexpectedly during migration.
>
> - Scott
>
> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas  wrote:
>
> 
> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. Would
> encourage you to approach it from an attitude of seeking understanding on
> the part of the first-time CEP author, as this reply casts it off pretty
> quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as
> valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path. As Hari mentions, the current ecosystem-native approach of
> executing repairs, decommissions, and bootstraps is time-consuming and
> cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync.
> Defining a safe protocol to migrate data between instances via rsync
> without downtime is surprisingly difficult - and even more so to do safely
> and repeatedly at scale. Enabling this process to be orchestrated by a
> control plane mechanizing official endpoints of the database and sidecar –
> rather than trying to move data around behind its back – is much safer than
> hoping one's cobbled together the right set of scripts to move data in a
> way that won't violate strong / transactional consistency guarantees. This
> complexity is kind of exemplified by the "Migrating One Instance" section
> of the doc and state machine diagram, which illustrates an approach to
> solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync.
> mTLS-authenticated endpoints in the sidecar for data movement eliminate the
> requirement for orchestration to occur via (typically) high-privilege SSH,
> which often allows for code execution of some form or complex efforts to
> scope SSH privileges of particular users; and eliminates the need to manage
> and secure rsyncd processes on each instance if not via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than
> rsync. Support for data migration endpoints in the sidecar would allow for
> metrics reporting, stats collection, and alerting via mature and modern
> mechanisms rather than monitoring the output of a shell script.

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-10 Thread C. Scott Andreas
Oh, one note on this item:

> The operator can ensure that files in the destination matches with the source. In the first iteration of this feature, an API is introduced to calculate digest for the list of file names and their lengths to identify any mismatches. It does not validate the file contents at the binary level, but, such feature can be added at a later point of time.

When enabled for LCS, single sstable uplevel will mutate only the level of an SSTable in its stats metadata component, which wouldn't alter the filename and may not alter the length of the stats metadata component. A change to the level of an SSTable on the source via single sstable uplevel may not be caught by a digest based only on filename and length.

Including the file's modification timestamp would address this without requiring a deep hash of the data. This would be good to include to ensure SSTables aren't downleveled unexpectedly during migration.

- Scott

On Apr 8, 2024, at 2:15 PM, C. Scott Andreas wrote:

Hi Jon,

Thanks for taking the time to read and reply to this proposal. Would encourage you to approach it from an attitude of seeking understanding on the part of the first-time CEP author, as this reply casts it off pretty quickly as NIH.

The proposal isn't mine, but I'll offer a few notes on where I see this as valuable:

– It's valuable for Cassandra to have an ecosystem-native mechanism of migrating data between physical/virtual instances outside the standard streaming path. As Hari mentions, the current ecosystem-native approach of executing repairs, decommissions, and bootstraps is time-consuming and cumbersome.

– An ecosystem-native solution is safer than a bunch of bash and rsync. Defining a safe protocol to migrate data between instances via rsync without downtime is surprisingly difficult - and even more so to do safely and repeatedly at scale. Enabling this process to be orchestrated by a control plane mechanizing official endpoints of the database and sidecar – rather than trying to move data around behind its back – is much safer than hoping one's cobbled together the right set of scripts to move data in a way that won't violate strong / transactional consistency guarantees. This complexity is kind of exemplified by the "Migrating One Instance" section of the doc and state machine diagram, which illustrates an approach to solving that problem.

– An ecosystem-native approach poses fewer security concerns than rsync. mTLS-authenticated endpoints in the sidecar for data movement eliminate the requirement for orchestration to occur via (typically) high-privilege SSH, which often allows for code execution of some form or complex efforts to scope SSH privileges of particular users; and eliminates the need to manage and secure rsyncd processes on each instance if not via SSH.

– An ecosystem-native approach is more instrumentable and measurable than rsync. Support for data migration endpoints in the sidecar would allow for metrics reporting, stats collection, and alerting via mature and modern mechanisms rather than monitoring the output of a shell script.

I'll yield to Hari to share more, though today is a public holiday in India.

I do see this CEP as solving an important problem.

Thanks,

– Scott

On Apr 8, 2024, at 10:23 AM, Jon Haddad wrote:

This seems like a lot of work to create an rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives" mentions it with this note: "However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration." This feels more like NIH than solving a real problem, as what you've listed is a hypothetical, and one that's easily addressed.

Jon

On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <[email protected]> wrote:

Hi all,

I have filed CEP-40 [1] for live migrating Cassandra instances using the Cassandra Sidecar.

When someone needs to move all or a portion of the Cassandra nodes belonging to a cluster to different hosts, the traditional approach of Cassandra node replacement can be time-consuming due to repairs and the bootstrapping of new nodes. Depending on the volume of the storage service load, replacements (repair + bootstrap) may take anywhere from a few hours to days.

Proposing a Sidecar based solution to address these challenges. This solution proposes transferring data from the old host (source) to the new host (destination) and then bringing up the Cassandra process at the destination, to enable fast instance migration. This approach would help to minimise node downtime, as it is based on a Sidecar solution for data transfer and avoids repairs and bootstrap.

Looking forward to the discussions.

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances

Thanks!
Hari
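
[Editor's note] The lightweight digest Scott suggests - computed over each file's name, length, and modification time rather than a deep hash of file contents - could look something like the sketch below. This is an illustration, not the sidecar's actual digest endpoint:

```python
import hashlib
from typing import Dict, Tuple


def manifest_digest(files: Dict[str, Tuple[int, int]]) -> str:
    """Digest over (path, size_bytes, mtime_epoch_seconds) triples.

    Sorting makes the result independent of iteration order. Because an
    in-place stats-metadata mutation (e.g. an LCS single-SSTable uplevel)
    bumps the file's mtime even when name and length are unchanged, the
    mismatch is still caught without hashing file contents.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        size, mtime = files[path]
        # NUL separators prevent ambiguous concatenations of fields
        h.update(f"{path}\x00{size}\x00{mtime}\x00".encode())
    return h.hexdigest()
```

Source and destination would each compute this over their local file sets and compare the two hex strings; any difference triggers per-file reconciliation.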

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-08 Thread C. Scott Andreas

Hi Jon,

Thanks for taking the time to read and reply to this proposal. Would encourage you to approach it from an attitude of seeking understanding on the part of the first-time CEP author, as this reply casts it off pretty quickly as NIH.

The proposal isn't mine, but I'll offer a few notes on where I see this as valuable:

– It's valuable for Cassandra to have an ecosystem-native mechanism of migrating data between physical/virtual instances outside the standard streaming path. As Hari mentions, the current ecosystem-native approach of executing repairs, decommissions, and bootstraps is time-consuming and cumbersome.

– An ecosystem-native solution is safer than a bunch of bash and rsync. Defining a safe protocol to migrate data between instances via rsync without downtime is surprisingly difficult - and even more so to do safely and repeatedly at scale. Enabling this process to be orchestrated by a control plane mechanizing official endpoints of the database and sidecar – rather than trying to move data around behind its back – is much safer than hoping one's cobbled together the right set of scripts to move data in a way that won't violate strong / transactional consistency guarantees. This complexity is kind of exemplified by the "Migrating One Instance" section of the doc and state machine diagram, which illustrates an approach to solving that problem.

– An ecosystem-native approach poses fewer security concerns than rsync. mTLS-authenticated endpoints in the sidecar for data movement eliminate the requirement for orchestration to occur via (typically) high-privilege SSH, which often allows for code execution of some form or complex efforts to scope SSH privileges of particular users; and eliminates the need to manage and secure rsyncd processes on each instance if not via SSH.

– An ecosystem-native approach is more instrumentable and measurable than rsync. Support for data migration endpoints in the sidecar would allow for metrics reporting, stats collection, and alerting via mature and modern mechanisms rather than monitoring the output of a shell script.

I'll yield to Hari to share more, though today is a public holiday in India.

I do see this CEP as solving an important problem.

Thanks,

– Scott

On Apr 8, 2024, at 10:23 AM, Jon Haddad wrote:

This seems like a lot of work to create an rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives" mentions it with this note: "However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration." This feels more like NIH than solving a real problem, as what you've listed is a hypothetical, and one that's easily addressed.

Jon

On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <[email protected]> wrote:

Hi all,

I have filed CEP-40 [1] for live migrating Cassandra instances using the Cassandra Sidecar.

When someone needs to move all or a portion of the Cassandra nodes belonging to a cluster to different hosts, the traditional approach of Cassandra node replacement can be time-consuming due to repairs and the bootstrapping of new nodes. Depending on the volume of the storage service load, replacements (repair + bootstrap) may take anywhere from a few hours to days.

Proposing a Sidecar based solution to address these challenges. This solution proposes transferring data from the old host (source) to the new host (destination) and then bringing up the Cassandra process at the destination, to enable fast instance migration. This approach would help to minimise node downtime, as it is based on a Sidecar solution for data transfer and avoids repairs and bootstrap.

Looking forward to the discussions.

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances

Thanks!
Hari

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-08 Thread Jon Haddad
This seems like a lot of work to create an rsync alternative.  I can't
really say I see the point.  I noticed your "rejected alternatives"
mentions it with this note:


   - However, it might not be permitted by the administrator or available
   in various environments such as Kubernetes or virtual instances like EC2.
   Enabling data transfer through a sidecar facilitates smooth instance
   migration.

This feels more like NIH than solving a real problem, as what you've listed
is a hypothetical, and one that's easily addressed.

Jon



On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <
[email protected]> wrote:

> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>