[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629585#comment-17629585 ]

Yang Wang commented on FLINK-22262:
-----------------------------------

Thanks [~yunta] for sharing your problem. I think the root cause might be that the kubelet learns about the deployment deletion too late, so it simply starts the JobManager again right after it exits.

If you want to avoid this issue completely, I suggest using the job result store [1]. FYI: the Flink Kubernetes operator also uses the JRS to avoid duplicate submissions.

[1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't
> happen and causes all sorts of issues when the classpath and the jars of the
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars
> of the Job.
> Can you please give guidance if there are additional caveats on manually
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049:
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader:
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name:
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink:
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
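The job result store suggestion above can be sketched as extra `-D` options on the submission command. This is a minimal sketch, not a verified setup: it assumes Flink 1.15 or later and the option names `job-result-store.storage-path` and `job-result-store.delete-on-commit` from the Flink configuration reference; the helper name `jrs_opts` and the `job-result-store` sub-directory are hypothetical choices for illustration.

```shell
# Sketch: print the -D options that pin the JobResultStore location and keep
# committed ("clean") entries, so that a JobManager restarted after
# cancellation can see that the job already reached a terminal state.
# Assumption: Flink 1.15+ option names; jrs_opts is a hypothetical helper.
jrs_opts() {
  ha_storage="$1"   # e.g. the same directory as high-availability.storageDir
  printf '%s\n' \
    "-Djob-result-store.storage-path=${ha_storage}/job-result-store" \
    "-Djob-result-store.delete-on-commit=false"
}

# Usage (not executed here):
#   $FLINK_HOME/bin/flink run-application -t kubernetes-application \
#     $(jrs_opts "$HA_STORAGE") ... local:///path/to/job.jar
```

With `delete-on-commit` set to `false`, the entries are no longer cleaned up by Flink, so removing them becomes the user's responsibility.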
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628320#comment-17628320 ]

Yun Tang commented on FLINK-22262:
----------------------------------

[~wangyang0918] I came across a rare case: while the JobManager pod was deleting the deployment on job cancellation, it suddenly restarted for some reason. The job was therefore submitted again from the previous savepoint and created new HA-related ConfigMaps pointing at the restored savepoint, just as if the job had been started again. After a while, since the deployment had been deleted, the JobManager was finally deleted as well and no TaskManagers could be created. However, those HA-related ConfigMaps were left behind because they have no OwnerReference.

When the user then submits the job again, the leftover HA-related ConfigMaps cause the job to resume from the previous savepoint, which leads to incorrect job state.

I think offering an option to let the HA-related ConfigMaps have an OwnerReference to the deployment is reasonable in some cases. Or do you have any suggestions for working around this problem?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
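Until such an option exists, leftover HA ConfigMaps of the kind described above can be found and removed by label before resubmitting. The sketch below assumes the labels shown in the example ConfigMap of this issue (`app=<cluster-id>` and `configmap-type=high-availability`); the function names are hypothetical helpers, and the `kubectl` calls are only defined here, not run.

```shell
# Sketch: select HA ConfigMaps by the labels visible in the example ConfigMap.
# Assumption: the "app" label carries the Flink cluster id.
ha_selector() {
  printf 'app=%s,configmap-type=high-availability' "$1"
}

# List leftover HA ConfigMaps. $1 = namespace, $2 = cluster id.
list_ha_configmaps() {
  kubectl get configmaps -n "$1" -l "$(ha_selector "$2")"
}

# Delete them. Only do this when the cluster is really gone: the HA metadata
# (leader information, checkpoint pointers) is discarded with the ConfigMaps.
delete_ha_configmaps() {
  kubectl delete configmaps -n "$1" -l "$(ha_selector "$2")"
}

# Usage (not executed here):
#   list_ha_configmaps taxi-ride-fare taxi-ride-fare-processor
#   delete_ha_configmaps taxi-ride-fare taxi-ride-fare-processor
```

Listing first and deleting second keeps the destructive step explicit; the files in the HA storage directory are not touched by either call.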
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408643#comment-17408643 ]

Robert Metzger commented on FLINK-22262:
----------------------------------------

Thanks a lot for your response. I hadn't considered the deletion of the HA storage, thanks for mentioning it.

Given that, I'll consider changing my operator to implement cancellation through a proper cancel REST call. So for now, I won't need the feature of setting owner references.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Minor
> Labels: auto-deprioritized-major
> Attachments: jm.log
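The cancel REST call mentioned above could look roughly like the following. It is a sketch under assumptions: it uses the `PATCH /jobs/:jobid?mode=cancel` endpoint of the Flink REST API, and the helper names, the base URL, and the job id in the usage comment are placeholders, not values from this thread.

```shell
# Sketch: build the URL of Flink's job termination endpoint.
# $1 = REST base URL (a trailing slash is tolerated), $2 = job id.
cancel_url() {
  printf '%s/jobs/%s?mode=cancel' "${1%/}" "$2"
}

# Send the cancellation request. With a proper cancel, the JobManager shuts
# the job down itself and can clean up its HA data instead of being killed.
cancel_job() {
  curl --fail --silent --show-error -X PATCH "$(cancel_url "$1" "$2")"
}

# Usage (not executed here; both arguments are placeholders):
#   cancel_job http://localhost:8081 "$JOB_ID"
```

An operator would typically issue this call first and delete the Kubernetes resources only after the job has reached a terminal state.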
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408503#comment-17408503 ]

Yang Wang commented on FLINK-22262:
-----------------------------------

[~rmetzger] Thanks for reviving this discussion and sharing your use case.

As a user, I think it is reasonable to have such a config option to set the owner reference of the HA-related ConfigMaps to the JobManager deployment. The only major concern I still have is about the residual HA storage. Even though the HA-related ConfigMaps could be deleted automatically, the HA storage (on HDFS, S3, etc.) would be left behind.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Minor
> Labels: auto-deprioritized-major
> Attachments: jm.log
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408210#comment-17408210 ]

Robert Metzger commented on FLINK-22262:
----------------------------------------

I understand that the lifecycle of the HA ConfigMaps is well-defined in the current implementation, and that deletion of the ConfigMaps should not happen under normal circumstances.

However, I wonder if we could add an optional parameter to the K8s HA mode to set an owner reference on the created ConfigMaps. My use case is the following: I have a K8s operator which, based on an input "FlinkCluster" custom resource, creates a Flink cluster with Kubernetes HA enabled. Cancellation (and cleanup in general) is implemented by just deleting the "FlinkCluster" custom resource instance, which, through owner references, also deletes the pods running the Flink cluster components. However, this leaves the HA ConfigMaps behind: because the JobManager gets killed, the job does not shut down properly.

In this case, it would be great if I could configure Flink to set owner references on the ConfigMaps, so that the ConfigMaps also disappear when the job gets erased.

What do you think about this case [~wangyang0918]?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Minor
> Labels: auto-deprioritized-major
> Attachments: jm.log
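As an illustration of what such an owner reference amounts to, the sketch below attaches one to an existing ConfigMap from the outside with `kubectl patch`. This is not a Flink feature and not something proposed in the thread: the function names are hypothetical, it assumes the owner is an `apps/v1` Deployment in the same namespace as the ConfigMap, and it does nothing about the files in the HA storage directory.

```shell
# Sketch: build a JSON merge patch that makes a Deployment the owner of an
# object. $1 = deployment name, $2 = deployment UID.
owner_ref_patch() {
  printf '{"metadata":{"ownerReferences":[{"apiVersion":"apps/v1","kind":"Deployment","name":"%s","uid":"%s"}]}}' "$1" "$2"
}

# Look up the Deployment UID and patch the ConfigMap.
# $1 = namespace, $2 = ConfigMap name, $3 = Deployment name.
set_owner_ref() {
  uid=$(kubectl get deployment "$3" -n "$1" -o jsonpath='{.metadata.uid}') || return 1
  kubectl patch configmap "$2" -n "$1" --type merge -p "$(owner_ref_patch "$3" "$uid")"
}

# Usage (not executed here; names are placeholders):
#   set_owner_ref "$NAMESPACE" "$CONFIGMAP_NAME" "$DEPLOYMENT_NAME"
```

Once the reference is set, the Kubernetes garbage collector removes the ConfigMap when the Deployment is deleted, which is the behaviour asked for above.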
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324956#comment-17324956 ]

Yang Wang commented on FLINK-22262:
-----------------------------------

When using the example Flink job (state machine), I could not reproduce your issue.

I found the log line "A fatal error has occurred. The streamlet is going to shutdown". Does that mean it will crash the JVM directly? Moreover, could you share your user jar so that I can run it directly on my minikube?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Major
> Attachments: jm.log
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322706#comment-17322706 ]

Andrea Peruffo commented on FLINK-22262:
----------------------------------------

There are no calls to {{System.exit}}; the JobManager crashes and gets restarted by K8s, which is possibly the problem.

How can I help more in debugging this issue?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Major
> Attachments: jm.log
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322562#comment-17322562 ]

Yang Wang commented on FLINK-22262:
-----------------------------------

Did you call {{System.exit}} in {{cloudflow.runner.Runner}}?

From your attached logs, I could not find any log lines about deregistering the Flink application from K8s or about {{SIGTERM}} handling.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Major
> Attachments: jm.log
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322161#comment-17322161 ]

Andrea Peruffo commented on FLINK-22262:
----------------------------------------

Thanks for your input [~fly_in_gis], I have incorporated most of your suggestions in my repro:

[https://github.com/andreaTP/repro-FLINK-22262]

I can still observe the same behavior: after running "flink cancel", the JobManager Pod enters something similar to "CrashLoopBackoff" (it keeps being restarted and fails again after some time) and the relevant ConfigMaps are not removed.

Attached is the full log of the JobManager that received the "cancel" command.

[^jm.log]

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Major
> Attachments: jm.log
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321885#comment-17321885 ]

Yang Wang commented on FLINK-22262:
-----------------------------------

Could you please share the JobManager logs from a run where you cancel the Flink job successfully and still have the residual ConfigMaps? I think you could use {{kubectl logs podname}} to get the logs.

I have used the following steps to start/stop Flink applications on K8s with HA enabled in my minikube, and it works well.

1. Start the native Flink K8s application
{code:bash}
$FLINK_HOME/bin/flink run-application -d -t kubernetes-application \
  -Dkubernetes.cluster-id=$CLUSTER_ID \
  -Dkubernetes.namespace=$NAMESPACE \
  -Dkubernetes.container.image=wangyang09180523/flink:1.13.0-rc0 \
  -Dkubernetes.container.image.pull-policy=Always \
  -Dkubernetes.rest-service.exposed.type=NodePort \
  -Dkubernetes.jobmanager.cpu=0.5 -Djobmanager.memory.process.size=1700m \
  -Dkubernetes.jobmanager.service-account=default \
  -Dkubernetes.taskmanager.cpu=0.5 -Dtaskmanager.memory.process.size=1500m -Dtaskmanager.numberOfTaskSlots=4 \
  -Dstate.checkpoints.dir=$HA_STORAGE \
  -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
  -Dhigh-availability.storageDir=$HA_STORAGE \
  -Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=1 \
  -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.13.0.jar -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.13.0.jar \
  -Dstate.savepoints.dir=$HA_STORAGE \
  local:///opt/flink/examples/streaming/StateMachineExample.jar
{code}

2. Cancel the Flink job with a savepoint. All the K8s resources will be deleted. I did not find any residual HA ConfigMaps after the job was cancelled successfully.
{code:bash}
./bin/flink cancel --target kubernetes-application --withSavepoint -Dkubernetes.cluster-id=k8s-app-ha-1-113-rc1 -Dkubernetes.namespace=default
... ...
Cancelled job .
Savepoint stored in oss://flink-debug-yiqi/flink-ha/savepoint-00-8741523cb1d1.
{code}

3. Optionally change the user code and resubmit the Flink application from the stored savepoint
{code:bash}
$FLINK_HOME/bin/flink run-application -d -t kubernetes-application \
  --fromSavepoint oss://flink-debug-yiqi/flink-ha/savepoint-00-8741523cb1d1 \
  ... ...
  local:///opt/flink/examples/streaming/StateMachineExample.jar
{code}

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't
> happen and causes all sorts of issues when the classpath and the jars of the
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars
> of the Job.
> Can you please give guidance if there are additional caveats on manually
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap: > {{apiVersion: v1}} > {{data:}} > {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}} > {{ checkpointID-049: > rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}} > {{ counter: "50"}} > {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}} > {{kind: ConfigMap}} > {{metadata:}} > {{ annotations:}} > {{ control-plane.alpha.kubernetes.io/leader: > '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}} > {{ creationTimestamp: "2021-04-13T14:30:51Z"}} > {{ labels:}} > {{ app: taxi-ride-fare-processor}} > {{ configmap-type: high-availability}} > {{ type:
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321212#comment-17321212 ]

Andrea Peruffo commented on FLINK-22262:

I have spent some time trying to build a reproduction, and this is the best I have come up with for now: [https://github.com/andreaTP/repro-FLINK-22262]

The repository contains the full Exception thrown by the JobManager after canceling the Job. So far I have been unable to reproduce in that repo the class loader error that follows it (I can share a larger example if needed).

> If you change your code, I think you might need to cancel your job with
> savepoint.

Can you please give more guidance on how to achieve this, since the Job "cancel" does not seem to be working as expected?

> Flink on Kubernetes ConfigMaps are created without OwnerReference > - > > Key: FLINK-22262 > URL: https://issues.apache.org/jira/browse/FLINK-22262 > Project: Flink > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Andrea Peruffo >Priority: Major > > According to the documentation: > [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup] > The ConfigMaps created along with the Flink deployment is supposed to have an > OwnerReference pointing to the Deployment itself, unfortunately, this doesn't > happen and causes all sorts of issues when the classpath and the jars of the > job are updated. > i.e.: > Without manually removing the ConfigMap of the Job I cannot update the Jars > of the Job. > Can you please give guidance if there are additional caveats on manually > removing the ConfigMap? Any other workaround that can be used? > Thanks in advance.
> Example ConfigMap: > {{apiVersion: v1}} > {{data:}} > {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}} > {{ checkpointID-049: > rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}} > {{ counter: "50"}} > {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}} > {{kind: ConfigMap}} > {{metadata:}} > {{ annotations:}} > {{ control-plane.alpha.kubernetes.io/leader: > '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}} > {{ creationTimestamp: "2021-04-13T14:30:51Z"}} > {{ labels:}} > {{ app: taxi-ride-fare-processor}} > {{ configmap-type: high-availability}} > {{ type: flink-native-kubernetes}} > {{ name: > taxi-ride-fare-processor--jobmanager-leader}} > {{ namespace: taxi-ride-fare}} > {{ resourceVersion: "64100"}} > {{ selfLink: > /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}} > {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320895#comment-17320895 ]

Yang Wang commented on FLINK-22262:

Could you share the JobManager logs from the run in which you canceled the Flink job but the HA-related ConfigMaps were not deleted automatically? We recently fixed a bug that might be related; refer to FLINK-21008 for more information.

If you change your code, I think you might need to cancel your job with a savepoint. After that, you can resubmit the job with the new binary and recover from the savepoint. In that case, the HA ConfigMaps should be cleaned up automatically.

> Flink on Kubernetes ConfigMaps are created without OwnerReference > - > > Key: FLINK-22262 > URL: https://issues.apache.org/jira/browse/FLINK-22262 > Project: Flink > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Andrea Peruffo >Priority: Major > > According to the documentation: > [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup] > The ConfigMaps created along with the Flink deployment is supposed to have an > OwnerReference pointing to the Deployment itself, unfortunately, this doesn't > happen and causes all sorts of issues when the classpath and the jars of the > job are updated. > i.e.: > Without manually removing the ConfigMap of the Job I cannot update the Jars > of the Job. > Can you please give guidance if there are additional caveats on manually > removing the ConfigMap? Any other workaround that can be used? > Thanks in advance.
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320807#comment-17320807 ]

Andrea Peruffo commented on FLINK-22262:

Thanks for the quick answer, [~fly_in_gis]!

> if you stop your Flink application with cancel, then I believe the HA related
> ConfigMaps will be deleted automatically

I tested it, and the ConfigMaps are not deleted automatically. This is fine as long as the Job code doesn't change, but if the Job code is updated, the TM starts to fail with user classpath issues and the error is unrecoverable.

> Flink on Kubernetes ConfigMaps are created without OwnerReference > - > > Key: FLINK-22262 > URL: https://issues.apache.org/jira/browse/FLINK-22262 > Project: Flink > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Andrea Peruffo >Priority: Major > > According to the documentation: > [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup] > The ConfigMaps created along with the Flink deployment is supposed to have an > OwnerReference pointing to the Deployment itself, unfortunately, this doesn't > happen and causes all sorts of issues when the classpath and the jars of the > job are updated. > i.e.: > Without manually removing the ConfigMap of the Job I cannot update the Jars > of the Job. > Can you please give guidance if there are additional caveats on manually > removing the ConfigMap? Any other workaround that can be used? > Thanks in advance.
[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference
[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320663#comment-17320663 ]

Yang Wang commented on FLINK-22262:

I think we cannot set the owner reference on the HA-related ConfigMaps, because it can happen that we delete the K8s resources but still want to recover the Flink jobs afterwards.

Actually, if you stop your Flink application with cancel, then I believe the HA-related ConfigMaps will be deleted automatically. See [1] for more information about cleaning up the HA-related ConfigMaps.

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up

> Flink on Kubernetes ConfigMaps are created without OwnerReference > - > > Key: FLINK-22262 > URL: https://issues.apache.org/jira/browse/FLINK-22262 > Project: Flink > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Andrea Peruffo >Priority: Major > > According to the documentation: > [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup] > The ConfigMaps created along with the Flink deployment is supposed to have an > OwnerReference pointing to the Deployment itself, unfortunately, this doesn't > happen and causes all sorts of issues when the classpath and the jars of the > job are updated. > i.e.: > Without manually removing the ConfigMap of the Job I cannot update the Jars > of the Job. > Can you please give guidance if there are additional caveats on manually > removing the ConfigMap? Any other workaround that can be used? > Thanks in advance.
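For the manual removal asked about in the issue description, here is a minimal shell sketch. It assumes the HA ConfigMaps carry the labels app=<cluster-id> and configmap-type=high-availability, as on the example ConfigMap quoted in this thread. The function names are hypothetical helpers, not part of Flink or kubectl. Note that deleting these ConfigMaps discards the HA metadata they hold (leader address, checkpoint pointers), so the job can no longer be recovered from it.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Flink or kubectl APIs.

# Build the label selector matching the HA ConfigMaps of one Flink cluster.
# Both label keys are taken from the example ConfigMap quoted in this thread.
ha_selector() {
  printf 'app=%s,configmap-type=high-availability' "$1"
}

# List the HA ConfigMaps still present for a cluster-id ($1) in a namespace ($2).
list_ha_configmaps() {
  kubectl get configmaps --namespace "$2" --selector "$(ha_selector "$1")"
}

# Delete those ConfigMaps. Only do this when the job should NOT be
# recovered from its HA metadata afterwards.
delete_ha_configmaps() {
  kubectl delete configmaps --namespace "$2" --selector "$(ha_selector "$1")"
}
```

For example, list_ha_configmaps taxi-ride-fare-processor taxi-ride-fare would show what is left after a cancel. delete_ha_configmaps takes the same arguments and is probably best run only after confirming that a savepoint exists or that recovery is not wanted.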