[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242124#comment-17242124 ]

Xintong Song commented on FLINK-12884:
--------------------------------------

[~shravan.adharapurapu], the K8s HA service will be released with Flink 1.12.0. Flink 1.12.0 is not released yet, but it should be out very soon. You can try it out now with the RC version:
https://dist.apache.org/repos/dist/dev/flink/flink-1.12.0-rc2/

> FLIP-144: Native Kubernetes HA Service
> --------------------------------------
>
>                 Key: FLINK-12884
>                 URL: https://issues.apache.org/jira/browse/FLINK-12884
>             Project: Flink
>          Issue Type: New Feature
>          Components: Deployment / Kubernetes, Runtime / Coordination
>            Reporter: MalcolmSanders
>            Assignee: Yang Wang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Currently Flink only supports a HighAvailabilityService based on ZooKeeper. As a
> result, a ZooKeeper cluster has to be deployed on the k8s cluster if our
> customers need high availability for Flink. If we support a
> HighAvailabilityService based on native k8s APIs, it will save the effort of
> deploying ZooKeeper as well as the resources used by the ZooKeeper cluster. It
> might be especially helpful for customers who run small-scale k8s clusters, so
> that the Flink HighAvailabilityService does not cause too much overhead on those
> clusters.
> Previously, [FLINK-11105|https://issues.apache.org/jira/browse/FLINK-11105]
> proposed a HighAvailabilityService using etcd. As [~NathanHowell]
> suggested in FLINK-11105, since k8s doesn't expose its own etcd cluster by
> design (see [Securing etcd
> clusters|https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#securing-etcd-clusters]),
> using etcd to achieve HA would also require deploying a separate etcd cluster.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
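For anyone trying the RC, here is a minimal sketch of the settings that would switch HA from ZooKeeper to the Kubernetes-based service. The option names are an assumption taken from the Flink 1.12 documentation rather than from this thread, so please verify them against the docs shipped with the RC. The helper name `render_k8s_ha_conf`, the cluster id, and the storage path are hypothetical placeholders.

```python
def render_k8s_ha_conf(cluster_id: str, storage_dir: str) -> str:
    """Render flink-conf.yaml lines for Kubernetes-based HA.

    The option names are assumed from the Flink 1.12 docs, not from this thread.
    """
    if not cluster_id or not storage_dir:
        raise ValueError("cluster_id and storage_dir are both required")
    options = {
        # Identifies the Flink cluster whose HA metadata is being managed.
        "kubernetes.cluster-id": cluster_id,
        # Selects the Kubernetes HA services factory instead of ZooKeeper.
        "high-availability": (
            "org.apache.flink.kubernetes.highavailability."
            "KubernetesHaServicesFactory"
        ),
        # Durable storage for JobManager metadata; must be reachable by all pods.
        "high-availability.storageDir": storage_dir,
    }
    return "\n".join(f"{key}: {value}" for key, value in options.items())


if __name__ == "__main__":
    # Placeholder values; replace with your own cluster id and storage location.
    print(render_k8s_ha_conf("my-flink-cluster", "s3://my-bucket/flink/recovery"))
```

The storage directory still has to live on a shared file system or object store, since only small HA metadata would be expected to sit in Kubernetes itself.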
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242086#comment-17242086 ]

shravan commented on FLINK-12884:
---------------------------------

[~trohrmann] [~fly_in_gis] Has the K8s HA service been released with 1.12? We tested our lower environment with ZooKeeper, but we are hoping to use the K8s HA service now instead of migrating at a later point. Could you please confirm?
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232568#comment-17232568 ]

Yang Wang commented on FLINK-12884:
-----------------------------------

[~ksp0422] Thanks for your suggestion. I second your idea and am trying to add an E2E test to cover the whole process.
 * Start a Flink application with HA configured
 * The Flink job completes checkpoints successfully
 * Kill the JobManager
 * A new one should be launched and take over the leadership
 * The Flink job should be recovered from the latest checkpoint successfully
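The steps above can be sketched as a toy simulation. This is only an in-memory model to make the expected behaviour concrete, not Flink's API and not the actual E2E test: `HaStore`, `JobManager`, `kill`, and `run_failover_scenario` are hypothetical names, and the "store" stands in for whatever backend holds the leader record and the pointer to the latest checkpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HaStore:
    """Hypothetical HA backend: a leader record plus the latest checkpoint id."""
    leader: Optional[str] = None
    latest_checkpoint: Optional[int] = None


@dataclass
class JobManager:
    name: str
    store: HaStore
    restored_from: Optional[int] = None

    def try_acquire_leadership(self) -> bool:
        # Only one JobManager may hold the leader record at a time.
        if self.store.leader is not None:
            return False
        self.store.leader = self.name
        # A new leader restores from the checkpoint the store points at, if any.
        self.restored_from = self.store.latest_checkpoint
        return True

    def complete_checkpoint(self, checkpoint_id: int) -> None:
        # Only the current leader may record a completed checkpoint.
        if self.store.leader != self.name:
            raise RuntimeError(f"{self.name} is not the leader")
        self.store.latest_checkpoint = checkpoint_id


def kill(jm: JobManager) -> None:
    # Losing the leader frees the leader record; checkpoint metadata survives.
    if jm.store.leader == jm.name:
        jm.store.leader = None


def run_failover_scenario() -> JobManager:
    store = HaStore()
    jm1 = JobManager("jobmanager-1", store)
    assert jm1.try_acquire_leadership()   # start the application with HA configured
    for checkpoint_id in (1, 2, 3):       # the job completes checkpoints
        jm1.complete_checkpoint(checkpoint_id)
    kill(jm1)                             # kill the JobManager
    jm2 = JobManager("jobmanager-2", store)
    assert jm2.try_acquire_leadership()   # a new one takes over the leadership
    assert jm2.restored_from == 3         # recovery uses the latest checkpoint
    return jm2


if __name__ == "__main__":
    new_leader = run_failover_scenario()
    print(new_leader.name, new_leader.restored_from)
```

The point of the model is that the checkpoint pointer must outlive the JobManager process, so a real test would need to assert on the restored checkpoint and not only on the new leader coming up.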
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232493#comment-17232493 ]

Kevin Kwon commented on FLINK-12884:
------------------------------------

In my opinion, the e2e test should mostly focus on killing the JobManager, since ZooKeeper was used for checkpoint metadata storage in addition to leader election (which is innately handled by Kubernetes' pod spawning).
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222398#comment-17222398 ]

shravan commented on FLINK-12884:
---------------------------------

Thanks [~trohrmann]
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222047#comment-17222047 ]

Till Rohrmann commented on FLINK-12884:
---------------------------------------

Hi [~shravan.adharapurapu], we are currently working on merging the K8s HA services into master. We hope that we can ship them with the {{1.12}} release, which should come in the next couple of weeks. The best thing to do is to follow this ticket in order to see the feature's progress.
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221944#comment-17221944 ]

shravan commented on FLINK-12884:
---------------------------------

[~fly_in_gis] We have just migrated to Kubernetes (EKS) and are setting up the Flink cluster/operator on K8s at the moment. We need to enable HA for the Flink JobManager, and since we already have AWS MSK (AWS-managed Kafka, which runs on ZooKeeper), we would prefer not to set up another ZooKeeper cluster on EKS.

I just wanted to check whether the native Kubernetes HA service is available to use now. If yes, is it a stable version? Please share any documentation or runbook steps to follow. Also, if you have any other thoughts on setting up HA, kindly share them.

Thanks,
Shravan
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214508#comment-17214508 ]

Yang Wang commented on FLINK-12884:
-----------------------------------

[~mapohl] Great. Thanks:)
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214489#comment-17214489 ]

Matthias commented on FLINK-12884:
----------------------------------

[~fly_in_gis] I quickly assigned you to all the subtasks since I was looking at the issues anyway.
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214483#comment-17214483 ]

Yang Wang commented on FLINK-12884:
-----------------------------------

[~xintongsong] I have attached a PR for the first subtask and will keep working on this. Would you mind assigning all the subtasks to me?