An operator, as I understand it, is just a pod that interacts with your
application and the Kubernetes API server as necessary to do whatever you
would otherwise be doing manually.

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
https://kubernetes.io/docs/reference/using-api/client-libraries/

You might start by creating an admin-pod containing the Ignite tools
(control.sh, sqlline.sh, a thin client, and so on) plus kubectl or some other
Kubernetes API client, which you can exec into to perform all of the rolling
update steps by hand. Once you know you have all of the tools and the steps
work end to end, you can add scripts to the pod to automate sequences of
steps. Then, once the scripts are fairly robust and complete, you can use the
admin-pod as the basis for Kubernetes Job definitions. It's up to you whether
you'd like to integrate with Kubernetes further: the next steps would be to
create a CustomResourceDefinition instead of using a Kubernetes Job, or to
write a Kubernetes-compatible API that does what your Job's command-line
startup does, but with more control over parameters.
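
For illustration, here is a minimal sketch of the kind of script the
admin-pod could run and a Job could later reuse. Everything in it is an
assumption about your setup: a StatefulSet and headless Service both named
"ignite", pods labelled app=ignite, control.sh available in the admin-pod at
/opt/ignite/bin, and an Ignite version whose control.sh supports the
--metric command. Adjust all of those names to match your deployment.

    #!/usr/bin/env bash
    # Rough rolling-restart sketch: bounce one Ignite pod at a time and wait
    # for the whole cluster to finish rebalancing before touching the next.
    set -euo pipefail

    wait_for_rebalance() {
      # Poll one node for the cluster-wide metric. The metric name/case and
      # the control.sh path are assumptions; adjust for your Ignite version.
      local host=$1
      until /opt/ignite/bin/control.sh --host "$host" \
            --metric cluster.Rebalanced | grep -qi true; do
        echo "cluster still rebalancing..."; sleep 10
      done
    }

    for pod in $(kubectl get pods -l app=ignite -o name); do
      pod=${pod#pod/}
      kubectl delete pod "$pod"        # the StatefulSet recreates it
      sleep 10                         # give the controller time to recreate it
      kubectl wait --for=condition=Ready "pod/$pod" --timeout=10m
      wait_for_rebalance "$pod.ignite" # per-pod DNS via the headless Service
    done

Once a sequence like that is reliable, the same script body can become the
command of a Kubernetes Job, and eventually the reconcile loop of an
operator.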

Please share your results once you've got things working. Best of luck!

On Fri, Sep 6, 2024 at 10:15 AM Humphrey <hmmlo...@gmail.com> wrote:

> Thanks for the explanation. Is there any operator ready for use? Is it
> hard to create our own Operator if one doesn’t exist yet?
>
> Thanks
>
> On 5 Sep 2024, at 19:39, Jeremy McMillan <j...@gridgain.com> wrote:
>
> It is correct for an operator, but not correct for a readiness probe. The
> problem isn't your understanding of Ignite metrics; it's your understanding
> of Kubernetes.
> Kubernetes rolling-update logic assumes all of your service backend nodes
> are completely independent, but you have chosen a readiness probe that
> reflects how the nodes interact and depend on one another.
>
> Hypothetically:
>   We have bounced one node; it has rejoined the cluster and is rebalancing.
>   If Kubernetes probes this node for readiness, the probe fails because the
> cluster is rebalancing, so the rolling update blocks.
>   If Kubernetes probes any other node for readiness, that probe also fails
> for the same reason, so the node is removed from its Service endpoints.
>   All of the nodes reflect the state of the cluster: rebalancing.
>   No nodes remain in the service backend. If you are using the Kubernetes
> discovery SPI, the restarted node will find itself unable to discover any
> peers.
>
> The problem is that Kubernetes interprets the readiness probe as a NODE
> STATE. The cluster.rebalanced metric is a CLUSTER STATE.
>
> If you had a Kubernetes Job that executed kubectl commands from within the
> cluster, looping over the pods in a StatefulSet and restarting them, it
> would make perfect sense to check cluster.rebalanced and block until
> rebalancing finishes. Kubernetes, however, does something different with
> readiness probes, based on assumptions about clustering that do not apply
> to Ignite.
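>
> As a rough illustration only (the host name, metric name, and control.sh
> path below are assumptions about your setup), the Job's "block until
> rebalanced" step could be as small as:
>
>   # poll any live node until the cluster-wide metric reports that
>   # rebalancing has finished
>   until /opt/ignite/bin/control.sh --host ignite-0.ignite \
>         --metric cluster.Rebalanced | grep -qi true; do
>     sleep 10
>   done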
>
> On Thu, Sep 5, 2024 at 11:29 AM Humphrey Lopez <hmmlo...@gmail.com> wrote:
>
>> Yes, I’m trying to read the cluster.rebalanced metric from the JMX MBean;
>> is that the correct one? I’ve built that into the Actuator readiness
>> endpoint and let Kubernetes wait for the cluster to be ready before moving
>> on to the next pod.
>>
>> Humphrey
>>
>> On 5 Sep 2024, at 17:34, Jeremy McMillan <j...@gridgain.com> wrote:
>>
>> I assume you have created your caches/tables with backups>=1.
>>
>> You should restart one node at a time: wait until the restarted node has
>> rejoined the cluster, then wait for rebalancing to begin, then wait for
>> rebalancing to finish before restarting the next node. Kubernetes readiness
>> probes aren't sophisticated enough for this. "Node ready" isn't the same
>> thing as "cluster ready", but Kubernetes can't tell the difference. This
>> should be handled by an operator, either a human or an automated Kubernetes
>> one.
>>
>> On Tue, Sep 3, 2024 at 1:13 PM Humphrey <hmmlo...@gmail.com> wrote:
>>
>>> Thanks, I meant a rolling update of the same version of Ignite (2.16),
>>> not an upgrade to a new version. We have Ignite embedded in a Spring Boot
>>> application, and when the code changes we need to deploy a new version of
>>> the jar.
>>>
>>> Humphrey
>>>
>>> On 3 Sep 2024, at 19:24, Gianluca Bonetti <gianluca.bone...@gmail.com>
>>> wrote:
>>>
>>> Hello
>>>
>>> If you want to upgrade the Apache Ignite version, that is not supported
>>> by Apache Ignite:
>>>
>>> "Ignite cluster cannot have nodes that run on different Ignite versions.
>>> You need to stop the cluster and start it again on the new Ignite version."
>>> https://ignite.apache.org/docs/latest/installation/upgrades
>>>
>>> If you need rolling upgrades, you can upgrade to GridGain, which brings
>>> rolling upgrades along with many other interesting features:
>>> "Rolling Upgrades is a feature of GridGain Enterprise and Ultimate
>>> Edition that allows nodes with different GridGain versions to coexist in a
>>> cluster while you roll out a new version. This prevents downtime when
>>> performing software upgrades."
>>> https://www.gridgain.com/docs/latest/installation-guide/rolling-upgrades
>>>
>>> Cheers
>>> Gianluca Bonetti
>>>
>>> On Tue, 3 Sept 2024 at 18:15, Humphrey Lopez <hmmlo...@gmail.com> wrote:
>>>
>>>> Hello, we have several pods with Ignite caches running in Kubernetes.
>>>> We only use in-memory mode (no persistence) and want to perform a rolling
>>>> update without losing data. What metric should we monitor to know when
>>>> it’s safe to replace the next pod?
>>>>
>>>> We have tried the Cluster.Rebalanced (1) metric from JMX in a readiness
>>>> probe, but we still end up losing data from the caches.
>>>>
>>>> 1)
>>>> https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#cluster
>>>>
>>>> Should we use another mechanism or metric for determining the readiness
>>>> of the newly started pod?
>>>>
>>>>
>>>> Humphrey
>>>>
>>>