Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-10 Thread Inès Potier
Thanks for your response!
Following your advice, I filed a Jira ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-18319

On Thu, Mar 9, 2023 at 11:16 AM Jeff Jirsa  wrote:

> I described something roughly similar to this a few years ago on the list.
> The specific chain you're describing isn't one I've thought about before,
> but if you open a JIRA for tracking and attribution, I'll ask some folks to
> take a peek at it.
>
>
>
> On Thu, Mar 9, 2023 at 10:57 AM Inès Potier 
> wrote:
>
>> Hi Cassandra community,
>>
>> Reaching out again in case anyone has recently faced the below issue.
>> Additional opinions on this would be super helpful for us.
>>
>> Thanks in advance,
>> Ines
>>
>> On Thu, Feb 23, 2023 at 3:40 PM Inès Potier 
>> wrote:
>>
>>> Hi Cassandra community,
>>>
>>> We have recently encountered a recurring old IP reappearance issue while
>>> testing decommissions on some of our Kubernetes Cassandra staging clusters.
>>> We have not yet found other references to this issue online. We could
>>> really use some additional inputs/opinions, both on the problem itself and
>>> the fix we are currently considering.
>>>
>>> *Issue Description*
>>>
>>> In Kubernetes, a Cassandra node can change IP at each pod bounce. We
>>> have noticed that this behavior, combined with a decommission operation,
>>> can get the cluster into an erroneous state.
>>>
>>> Consider the following situation: a Cassandra node, node1, with hostId1,
>>> owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP).
>>> After a couple of gossip iterations, all other nodes’ nodetool status output
>>> includes a new_IP UN entry owning 20.5% of the token ring and no old_IP
>>> entry.
>>>
>>> Shortly after the bounce, node1 gets decommissioned. Our cluster does
>>> not have a lot of data, and the decommission operation completes pretty
>>> quickly. Logs on other nodes start showing acknowledgment that node1 has
>>> left, and soon nodetool status’ new_IP UL entry disappears. node1’s
>>> pod is deleted.
>>>
>>> About a minute later, the cluster enters the erroneous state. An old_IP
>>> DN entry reappears in nodetool status, owning 20.5% of the token ring.
>>> No node owns this IP anymore, and according to the logs, old_IP is still
>>> associated with hostId1.
>>>
>>> *Issue Root Cause*
>>>
>>> By digging through Cassandra logs and re-testing this scenario over and
>>> over again, we have reached the following conclusions:
>>>
>>> - Other nodes will continue exchanging gossip about old_IP, even
>>>   after it becomes a fatClient.
>>> - The fatClient timeout and subsequent quarantine do not stop old_IP
>>>   from reappearing in a node’s Gossip state once its quarantine is over.
>>>   We believe this is because nodes do not agree on when old_IP’s state
>>>   should expire.
>>> - Once new_IP has left the cluster and a node receives old_IP’s next
>>>   gossip state message, StorageService will no longer face collisions
>>>   (or will, but with an even older IP) for hostId1 and its corresponding
>>>   tokens. As a result, old_IP will regain ownership of 20.5% of the
>>>   token ring.
>>>
>>>
>>> *Proposed fix*
>>>
>>> Following the above investigation, we were thinking about implementing
>>> the following fix:
>>>
>>> When a node receives a gossip status change with STATE_LEFT for a
>>> leaving endpoint new_IP, before evicting new_IP from the token ring,
>>> purge from Gossip (i.e. evictFromMembership) all endpoints that meet the
>>> following criteria (a rough sketch of the check follows the list):
>>>
>>> - endpointStateMap contains this endpoint
>>> - The endpoint is not currently a token owner
>>>   (!tokenMetadata.isMember(endpoint))
>>> - The endpoint’s hostId matches the hostId of new_IP
>>> - The endpoint is older than the leaving endpoint new_IP
>>>   (per Gossiper.instance.compareEndpointStartup)
>>> - The endpoint’s token range (from endpointStateMap) intersects with
>>>   new_IP’s
>>>
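>>> To make these criteria concrete, here is a rough, self-contained sketch
>>> of the check we have in mind. This is plain Java over placeholder types,
>>> not actual Gossiper/StorageService code; the real change would iterate
>>> the entries of endpointStateMap and call evictFromMembership on matches:
>>>
>>>     import java.net.InetAddress;
>>>     import java.util.Collections;
>>>     import java.util.Set;
>>>     import java.util.UUID;
>>>
>>>     // Placeholder for the per-endpoint data we would read from gossip
>>>     // state and TokenMetadata; field names are illustrative only.
>>>     class EndpointInfo {
>>>         InetAddress address;
>>>         UUID hostId;
>>>         long startupTime;     // stand-in for Gossiper.instance.compareEndpointStartup
>>>         boolean isTokenOwner; // stand-in for tokenMetadata.isMember(endpoint)
>>>         Set<Long> tokens;     // tokens advertised in its gossip state
>>>     }
>>>
>>>     class LeftPurgeCheck {
>>>         // True if 'candidate' (e.g. old_IP) should be purged from Gossip
>>>         // while processing STATE_LEFT for 'leaving' (new_IP).
>>>         static boolean shouldPurge(EndpointInfo candidate, EndpointInfo leaving) {
>>>             if (candidate.address.equals(leaving.address))
>>>                 return false;                  // skip the leaving endpoint itself
>>>             if (candidate.isTokenOwner)
>>>                 return false;                  // still a ring member, do not touch
>>>             if (!candidate.hostId.equals(leaving.hostId))
>>>                 return false;                  // unrelated host
>>>             if (candidate.startupTime >= leaving.startupTime)
>>>                 return false;                  // not older than the leaving endpoint
>>>             // token ranges intersect (approximated here as shared tokens)
>>>             return !Collections.disjoint(candidate.tokens, leaving.tokens);
>>>         }
>>>     }
>>>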
>>> The intention of this modification is to force nodes to realign on
>>> old_IP’s expiration and expunge it from Gossip, so that it does not
>>> reappear after new_IP leaves the ring.
>>>
>>>
>>> Additional opinions/ideas regarding the fix’s viability and the issue
>>> itself would be really helpful.
>>> Thanks in advance,
>>> Ines
>>>
>>


Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Tom Nora
unsubscribe




Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Jeff Jirsa
I described something roughly similar to this a few years ago on the list.
The specific chain you're describing isn't one I've thought about before,
but if you open a JIRA for tracking and attribution, I'll ask some folks to
take a peek at it.





Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Inès Potier
Hi Cassandra community,

Reaching out again in case anyone has recently faced the below issue.
Additional opinions on this would be super helpful for us.

Thanks in advance,
Ines



Re: Cassandra on Kubernetes

2019-10-30 Thread John Sanda
One of the problems I have experienced in the past has more to do with Java
than with Cassandra specifically: the JVM ignoring cgroups. With Cassandra I
would often see memory usage go higher than desired, which would lead to pods
getting OOM-killed. This was fixed in Java 10, though, and I believe the fix
was even backported to Java 8.
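
For anyone hitting this today, a quick sanity check (a tiny standalone
snippet, not Cassandra code — run it with the same JVM version and flags as
your Cassandra pods, without an explicit -Xmx) is to ask the JVM what it
actually sees from inside the container; a cgroup-aware JVM should report a
max heap derived from the pod's memory limit rather than from the host's RAM:

    import java.lang.management.ManagementFactory;

    // Hypothetical helper: print the limits this JVM has settled on so they
    // can be compared against the pod's resource limits.
    public class ContainerLimitsCheck {
        public static void main(String[] args) {
            long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            int cpus = Runtime.getRuntime().availableProcessors();
            System.out.println("Max heap the JVM will use: " + maxHeapMb + " MiB");
            System.out.println("CPUs the JVM sees: " + cpus);
            System.out.println("JVM args: "
                    + ManagementFactory.getRuntimeMXBean().getInputArguments());
        }
    }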

I think another issue is the lack of options for backup/restore.

I worked with Cassandra in Kubernetes pre-StatefulSets. That was a bit
rough :)

Local volumes were promoted to GA in Kubernetes 1.14. That is certainly a
good thing for stateful applications like Cassandra.

It is also important to have a sufficiently large value for the termination
grace period on pods to allow the drain operation to complete, assuming you
perform a drain on shutdown.


-- 

- John


Re: Cassandra on Kubernetes

2019-10-30 Thread Jean-Armel Luce
Hi Jain,


Thanks for your comments about CassKop.

We began the development of CassKop at the beginning of 2018. At that time,
some K8S objects (statefulsets, operators, …) were still in beta, and we
discovered a few strange behaviours.


We upgraded to K8S 1.12 in mid-2018.

After this upgrade, we did not encounter any real problems.


However, we continue our tests; for example, we have a cluster (16 nodes in
3 DCs in the same region) which was deployed via CassKop and has been
working correctly for 3 months, and we will continue to observe it during
the next months before hopefully deploying a C* cluster via CassKop in
production.

In parallel, we are running other robustness and performance tests; right now,
we have no real issues to report about K8S for our use case.


I can’t say more right now.


Thanks for your question. I would be more than happy to read other answers.



Re: Cassandra on Kubernetes

2019-10-30 Thread Akshit Jain
Hi Jean
Thanks for replying. I had seen CassKop, and the amount of functionality it
provides is quite awesome compared to other operators.

I would like to know how stable Kubernetes is for stateful/database
applications right now.

I haven't read/heard of any major production stateful application running on
k8s.


-Akshit






Re: Cassandra on Kubernetes

2019-10-30 Thread Jean-Armel Luce
Hi,

We are currently developing CassKop, a Cassandra operator for K8S.
This operator is developed in Go, based on the operator-sdk framework.

At this stage of the project, the goal is to deploy a Cassandra cluster in 1
Kubernetes datacenter, but this will change in future versions to deal with
multi-datacenter Kubernetes deployments.

The following features are already supported by CassKop:
- Deployment of a C* cluster (rack or AZ aware)
- Scaling up the cluster (with cleanup)
- Scaling down the cluster (with decommission prior to Kubernetes scale
down)
- Pod operations (removenode, upgradesstables, cleanup, rebuild, ...)
- Adding a Cassandra DC
- Removing a Cassandra DC
- Setting and modifying configuration files
- Setting and modifying configuration parameters
- Update of the Cassandra docker image
- Rolling update of a Cassandra cluster
- Update of Cassandra version (including upgradesstables in case of major
upgrade)
- Update of JVM
- Update of configuration
- Stopping a Kubernetes node for maintenance
- Process a remove node (and create new Cassandra node on another
Kubernetes node)
- Process a replace address (of the old Cassandra node on another
Kubernetes node)
- Manage operations on pods through the CassKop plugin (cleanup, rebuild,
upgradesstables, removenode, ...)
- Monitoring (using Instaclustr Prometheus exporter to Prometheus/Grafana)
- Pause/Restart & rolling restart operations through the CassKop plugin.

We also use Cassandra Reaper for scheduling repair sessions.


If you would like more information about this operator, you may have a
look here: https://github.com/Orange-OpenSource/cassandra-k8s-operator

Please feel free to download it and try it. We would be more than happy to
receive your feedback.


If you have any questions about this operator, feel free to contact us via
our mailing list (prj.casskop.supp...@list.orangeportails.net) or on our
Slack: https://casskop.slack.com

Note: this operator is still in alpha and works only in a single-region
architecture for now. We are currently working hard on adding new features
in order to run it in a multi-region architecture.


Thanks.



On Wed, Oct 30, 2019 at 13:56, Akshit Jain  wrote:

> Hi everyone,
>
> Is there anyone who is running Cassandra on K8s clusters? It would be
> great if you could share your experience, the operator you are using, and
> the overall stability of StatefulSets in Kubernetes.
>
> -Akshit
>


Re: Cassandra and Kubernetes and scaling

2016-09-12 Thread Jens Rantil
David,

Were you the one who wrote the article? I just finished reading it. It's
excellent! I'm also excited that running mutable infrastructure on
containers is maturing. I have a few specific questions you (or someone
else!) might be able to answer.

1. In the article you state

> We deployed 1,009 minion nodes to Google Compute Engine (GCE), spread
> across 4 zones, running a custom version of the Kubernetes 1.3 beta.

Did you deploy a custom Kubernetes on GCE because 1.3 wasn't available? Or
was it because the Pet Sets alpha feature was disabled on Google Cloud
Platform's hosted Kubernetes[1]?

[1] http://serverfault.com/q/802437/37237

2. The article stated

> Yes we deployed 1,000 pets, but one really did not want to join the party!

Do you have any speculation why this happened? By default Cassandra doesn't
allow concurrent nodes joining the cluster, but Pet Sets are added serially
by definition, right?

3. The article doesn't mention downscaling. Do you have any idea how
that would/could be done? I consider myself a Kubernetes/container noob. Is
there an equivalent of `readinessProbe` for shutting down containers? Or
would an external agent have to be deployed that runs `nodetool
decommission` on an instance and then reduces the number of replicas by one
for the Pet Set?

4. For a smaller number of Cassandra nodes, would you feel comfortable
running it on Kubernetes 1.3? ;)

Cheers,
Jens



Re: Cassandra and Kubernetes and scaling

2016-09-11 Thread David Aronchick
Please let me know if I can help at all!



Re: Cassandra and Kubernetes and scaling

2016-09-11 Thread Jens Rantil
Hi Aiman,

I noticed you never got any reply. This might be of interest:
http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using-kubernetes-pet-set.html

Cheers,
Jens


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Cassandra and Kubernetes and scaling

2016-05-24 Thread Aiman Parvaiz
Looking forward to hearing from the community about this.

Sent from my iPhone

> On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz  wrote:
> 
> I saw a thread from April 2016 talking about Cassandra and Kubernetes, and 
> have a few follow-up questions. It seems that, especially after v1.2 of 
> Kubernetes and the upcoming 1.3 features, this would be a very viable option 
> for running Cassandra on.
> 
> My questions pertain to HostIds and Scaling Up/Down, and are related:
> 
> 1. If a container's host dies and is then brought up on another host, can 
> you start up with the same PersistentVolume as the original container had?  
> Which begs the question: would the new container get a new HostId, implying it 
> would need to bootstrap into the environment? If it's a bootstrap, does the 
> old one get deco'd/assassinated?
> 
> 2. Scaling up/down. Scaling up would be relatively easy, as it should just 
> kick off bootstrapping the node into the cluster, but what if you need to 
> scale down? Would the container get deco'd by the scaling-down process? Or 
> just terminated, leaving you with potentially missing replicas?
> 
> 3. Scaling up and increasing the RF of a particular keyspace: would there be 
> a clean way to do this with the Kubernetes tooling? 
> 
> In the end I'm wondering how much of the Kubernetes + Cassandra involves 
> nodetool, and how much is just a Docker image where you need to manage that 
> all yourself (painfully)
> 
> -- 
> --mike