[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2022-11-06 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629585#comment-17629585
 ] 

Yang Wang commented on FLINK-22262:
---

Thanks [~yunta] for sharing your problem. I think the root cause might be 
kubelet is too late to know the deployment deletion. So the kubelet simply 
start the JobManager again exactly after it exited.

If you want to completely ignore this issue, I suggest to use the job result 
store[1].

FYI: The Flink Kubernetes operator is also using the JRS to avoid duplicated 
submission.

 

[1]. 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2022-11-03 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628320#comment-17628320
 ] 

Yun Tang commented on FLINK-22262:
--

[~wangyang0918] I come across a rare case:
When the jobmanager pod deletes the deployment on job cancelation, it suddenly 
restarts due to some reason. Thus the job would submit again from previous 
savepoint and create new HA related configmaps with the restoring savepoint 
just as the job started again. After a while, since the deployment has been 
deleted, the job manager would finally be deleted and no taskmanagers could be 
created. However, those HA related configmaps left behind due to not having 
OwnerReference.
Then user submit the job again, however, since the left HA related configmaps, 
the job would resume from previous savepoints, which leads to incorrect job 
state.
I think offering options to let HA related configmaps have OwnerReference with 
deployment is reasonable in some cases.
Or do you have some suggestions to walk around this problem?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-09-02 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408643#comment-17408643
 ] 

Robert Metzger commented on FLINK-22262:


Thanks a lot for your response. I didn't consider the deletion of the HA 
storage, thanks a lot for mentioning this.
Given that, I'll consider changing my operator to implement the cancellation 
through a proper cancel-REST call.

So for now, I won't need the feature of setting owner references.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Minor
>  Labels: auto-deprioritized-major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-09-01 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408503#comment-17408503
 ] 

Yang Wang commented on FLINK-22262:
---

[~rmetzger] Thanks for reviving this discussion and sharing your use case.

As a user, I think it is reasonable to have such a config option to set owner 
reference of HA related ConfigMaps to the JobManager deployment.

The only major concern I still have is about the residual HA storage. Even 
though the HA related ConfigMaps could be deleted automatically, the HA 
storage(on the HDFS, S3, etc.) will be left behind.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Minor
>  Labels: auto-deprioritized-major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-09-01 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408210#comment-17408210
 ] 

Robert Metzger commented on FLINK-22262:


I understand that the lifecycle for HA-configmaps is well-defined in the 
current implementation, and that deletion of the configMaps should not happen 
under normal circumstances.

However, I wonder if we could add an optional parameter to the K8s HA mode to 
set an owner reference to the created config maps.
My use-case is the following: I have a K8s operator which, based on some input 
"FlinkCluster" custom resource creates a Flink cluster with Kubernetes HA 
enabled.
Cancellation (and in general cleanup) is implemented by just deleting the 
"FlinkCluster" custom resource instance, which, through owner references also 
deletes the pods responsible for running the Flink cluster components. ... but 
this leaves behind the HA config maps, because the JobManager gets killed, the 
job is not shutting down properly.
In this case, it would be great if I could configure Flink to set owner 
references for the config maps, so that (when the job gets erased), also the 
ConfigMaps disappear.

What do you think about this case [~wangyang0918]]?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Minor
>  Labels: auto-deprioritized-major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-19 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324956#comment-17324956
 ] 

Yang Wang commented on FLINK-22262:
---

When using the example Flink job(state machine), I could not reproduce your 
issue. 

I find the log "A fatal error has occurred. The streamlet is going to 
shutdown", does it mean that it will crash the JVM directly?

 

Moreover, could you share your user jar that I could directly run on my 
minikube?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-16 Thread Andrea Peruffo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322706#comment-17322706
 ] 

Andrea Peruffo commented on FLINK-22262:


No calls to {{System.exit }}, the JobManager crashes and gets restarted by K8s, 
that's, possibly, the problem.
How can I help more in debugging this issue?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-15 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322562#comment-17322562
 ] 

Yang Wang commented on FLINK-22262:
---

Do you have called {{System.exit}} in the {{cloudflow.runner.Runner}}? 

>From your attached logs, I did not find the logs about deregistering the Flink 
>application from K8s or {{SIGTERM}} handling.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-15 Thread Andrea Peruffo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322161#comment-17322161
 ] 

Andrea Peruffo commented on FLINK-22262:


Thanks for your input [~fly_in_gis] , I have incorporated most of your 
suggestions in my repro:
[https://github.com/andreaTP/repro-FLINK-22262]

And I can still observe the same behavior:
after running "flink cancel" the JobManager Pod enters in something similar to 
"CrashLoopBackoff" (keeps being restarted failing after some time) and the 
relevant ConfigMaps are not removed.


Attached the full log of the JobManager that received the "cancel" command. 
[^jm.log]

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-14 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321885#comment-17321885
 ] 

Yang Wang commented on FLINK-22262:
---

Could you please share the JobManager logs when you cancel the Flink job 
successfully and still have the residual ConfigMaps? I think you could use 
{{kubectl logs podname}} to get the logs.

 

I have used the following steps to start/stop Flink applications on K8s with HA 
enable in my minikube. And it works well.

1. Start the native Flink K8a application 

 
{code:java}
$FLINK_HOME/bin/flink run-application -d -t kubernetes-application \
-Dkubernetes.cluster-id=$CLUSTER_ID \
-Dkubernetes.namespace=$NAMESPACE \
-Dkubernetes.container.image=wangyang09180523/flink:1.13.0-rc0 \
-Dkubernetes.container.image.pull-policy=Always \
-Dkubernetes.rest-service.exposed.type=NodePort \
-Dkubernetes.jobmanager.cpu=0.5 -Djobmanager.memory.process.size=1700m \
-Dkubernetes.jobmanager.service-account=default \
-Dkubernetes.taskmanager.cpu=0.5 -Dtaskmanager.memory.process.size=1500m 
-Dtaskmanager.numberOfTaskSlots=4 \
-Dstate.checkpoints.dir=$HA_STORAGE \
-Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
 \
-Dhigh-availability.storageDir=$HA_STORAGE \
-Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=1 \
-Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.13.0.jar
 
-Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.13.0.jar
 \
-Dstate.savepoints.dir=$HA_STORAGE \
local:///opt/flink/examples/streaming/StateMachineExample.jar
{code}
 

2. Cancel the Flink job with savepoint. All the K8s resources will be deleted. 
I do not find residual HA ConfigMaps after canceled successfully.

 
{code:java}
./bin/flink cancel --target kubernetes-application --withSavepoint 
-Dkubernetes.cluster-id=k8s-app-ha-1-113-rc1 -Dkubernetes.namespace=default 

... ...
Cancelled job . Savepoint stored in 
oss://flink-debug-yiqi/flink-ha/savepoint-00-8741523cb1d1.
{code}
3. Maybe change the user codes and resubmit the Flink application with stored 
savepoint

 
{code:java}
$FLINK_HOME/bin/flink run-application -d -t kubernetes-application \
--fromSavepoint oss://flink-debug-yiqi/flink-ha/savepoint-00-8741523cb1d1 \
... ...
local:///opt/flink/examples/streaming/StateMachineExample.jar{code}
 

 

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: 

[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-14 Thread Andrea Peruffo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321212#comment-17321212
 ] 

Andrea Peruffo commented on FLINK-22262:


I have spent some time trying to make a reproduction and this is the best I 
came up with for now:

[https://github.com/andreaTP/repro-FLINK-22262] 

In the repository, you have the full Exception thrown by the JobManager after 
canceling the Job.
 So far I was unable to reproduce in that repo the following error with the 
class loader (I can eventually share a larger example).

> If you change your code, I think you might need to cancel your job with 
> savepoint. 

Can you please give more guidance on how to achieve it since the Job "cancel" 
seems to not be working as expected?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-14 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320895#comment-17320895
 ] 

Yang Wang commented on FLINK-22262:
---

Could you share the JobManager logs that you canceled the Flink job but HA 
related configmaps are not deleted automatically? Recently, we have fixed a bug 
which might be related. Refer to FLINK-21008 for more information.

 

If you change you code, I think you might need to cancel your job with 
savepoint. After then you could resubmit the job with new binary and recover 
from the savepoint. In such case, the HA configmaps should be cleaned up 
automatically.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-14 Thread Andrea Peruffo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320807#comment-17320807
 ] 

Andrea Peruffo commented on FLINK-22262:


thanks for the quick answer [~fly_in_gis] !

> if you stop your Flink application with cancel, then I believe the HA related 
>ConfigMaps will be deleted automatically

I tested it and the ConfigMaps are not automatically deleted.
This is ok if the code of the Job doesn't change, but, if the Job code is 
updated the TM starts to fail with user classpath issues and the error is 
unrecoverable.

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22262) Flink on Kubernetes ConfigMaps are created without OwnerReference

2021-04-13 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320663#comment-17320663
 ] 

Yang Wang commented on FLINK-22262:
---

I think we could not set the owner reference for the HA related ConfigMaps. 
Because it could happen that we delete the K8s resources but want to recover 
the Flink jobs. Actually, if you stop your Flink application with cancel, then 
I believe the HA related ConfigMaps will be deleted automatically.

 

Refer to here[1] for more information about the HA related ConfigMaps clean up.

 

1. 
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Andrea Peruffo
>Priority: Major
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment is supposed to have an 
> OwnerReference pointing to the Deployment itself, unfortunately, this doesn't 
> happen and causes all sorts of issues when the classpath and the jars of the 
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars 
> of the Job.
> Can you please give guidance if there are additional caveats on manually 
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-049: 
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: 
> '\{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.0,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: 
> taxi-ride-fare-processor--jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: 
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor--jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)