[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719145#comment-17719145 ]

Zhihao Chen commented on FLINK-31135:

It's working as expected now after we fixed our S3 deletion issue. Thanks for your help!

> ConfigMap DataSize went > 1 MB and cluster stopped working
> --
>
> Key: FLINK-31135
> URL: https://issues.apache.org/jira/browse/FLINK-31135
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.2.0
> Reporter: Sriram Ganesh
> Priority: Major
> Attachments: dump_cm.yaml,
> flink--kubernetes-application-0-parked-logs-ingestion-644b80-b4bc58747-lc865.log.zip,
> image-2023-04-19-09-48-19-089.png, image-2023-05-03-13-47-51-440.png,
> image-2023-05-03-13-50-54-783.png, image-2023-05-03-13-51-21-685.png,
> jobmanager_log.txt,
> parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml
>
> I am using the Flink Operator to manage clusters. Flink version: 1.15.2. Flink jobs
> failed with the below error. It seems the config map size went beyond 1 MB
> (the default limit).
> Since it is managed by the operator and config maps are not updated with any
> manual intervention, I suspect it could be an operator issue.
>
> {code:java}
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: PUT at:
> https:///api/v1/namespaces//configmaps/-config-map. Message:
> ConfigMap "-config-map" is invalid: []: Too long: must have at most
> 1048576 bytes.
> Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=[], message=Too long: must have at most 1048576 bytes, reason=FieldValueTooLong, additionalProperties={})], group=null, kind=ConfigMap, name=-config-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=ConfigMap "-config-map" is invalid: []: Too long: must have at most 1048576 bytes, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:347) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:327) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:781) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:183) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:188) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:130) ~[flink-dist-1.15.2.jar:1.15.2]
> at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$attemptCheckAndUpdateConfigMap$11(Fabric8FlinkKubeClient.java:325) ~[flink-dist-1.15.2.jar:1.15.2]
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
> ... 3 more
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
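The "must have at most 1048576 bytes" message above is the Kubernetes cap on a single object's size (1 MiB). A quick way to gauge how close an HA ConfigMap is to that cap is to export it and measure the dump. A minimal sketch, assuming the ConfigMap was already saved locally with something like `kubectl get configmap <name> -n <namespace> -o yaml > cm_dump.yaml` (the function name and file names are placeholders, not from the ticket):

```shell
# Sketch: report whether a saved ConfigMap dump is near the 1 MiB API limit.
# Assumes the ConfigMap was exported first, e.g.:
#   kubectl get configmap <name> -n <namespace> -o yaml > cm_dump.yaml
check_cm_size() {
  limit=1048576                        # Kubernetes rejects objects larger than this
  size=$(wc -c < "$1" | tr -d ' ')     # byte count of the dump
  if [ "$size" -gt "$limit" ]; then
    echo "OVER: ${size} bytes (limit ${limit})"
  else
    echo "OK: ${size} bytes (limit ${limit})"
  fi
}
```

Note that the server-side object is somewhat larger than a plain YAML dump (encoding and managed fields differ), so treat this as a rough indicator only.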
[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719102#comment-17719102 ]

Zhihao Chen commented on FLINK-31135:

[~Swathi Chandrashekar] thank you for pointing it out! I believe there are some S3 permission issues on our side; I had missed that error information. I'll fix it on our side and let you know when it's all good. Please feel free to close this ticket.
[jira] [Comment Edited] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718781#comment-17718781 ]

Zhihao Chen edited comment on FLINK-31135 at 5/3/23 4:03 AM:

hey [~Swathi Chandrashekar], thank you for looking into it.

{quote}This error was populated for all the checkpoints due to state inconsistency which resulted in storing lot of checkpoints in S3, which eventually caused the size of the configMap > 1MB{quote}

I don't think that's the case. Instead, none of the checkpoint records in the CM was ever cleaned up. The error "Flink was not able to determine whether the metadata was successfully persisted" starts to happen when the CM reaches the 1 MB size limit.

I have another Flink job running here as an example. ConfigMap: [^parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml]

The checkpoint IDs run consecutively: "checkpointID-001", "checkpointID-002", ... "checkpointID-0001040", i.e. from "1" to "1040". The ConfigMap has reached the 1 MB size limit, and "Flink was not able to determine whether the metadata was successfully persisted." first appears when the CM appended the record "1040". Please see the logs below. The bottom one is the first error log, which complains about the record "1041". That makes sense: it was never recorded in the CM, hence Flink couldn't determine whether the metadata was successfully persisted.

!image-2023-05-03-13-47-51-440.png|width=1465,height=799!

The Flink dashboard log also supports this assumption.

!image-2023-05-03-13-51-21-685.png|width=1473,height=783!

JM log: [^flink--kubernetes-application-0-parked-logs-ingestion-644b80-b4bc58747-lc865.log.zip]

My guess is that Flink never cleaned up any of the records in the CM in our case.
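The observation above, that checkpointID records accumulate without ever being cleaned up, can be spot-checked by counting the checkpointID keys in an exported copy of the HA ConfigMap. A minimal sketch, assuming the ConfigMap was first dumped to JSON with kubectl (the function name and file name are illustrative):

```shell
# Sketch: count distinct "checkpointID-*" keys in a saved HA ConfigMap dump.
# Export first (name/namespace are placeholders):
#   kubectl get configmap <name> -n <namespace> -o json > cm.json
count_checkpoint_keys() {
  # Pull out quoted checkpointID keys, de-duplicate, and count them.
  grep -o '"checkpointID-[0-9]*"' "$1" | sort -u | wc -l | tr -d ' '
}
```

With working cleanup the count should hover around the retained-checkpoint setting; a count growing one-for-one with completed checkpoints matches the behaviour described in this comment.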
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihao Chen updated FLINK-31135:
Attachment: flink--kubernetes-application-0-parked-logs-ingestion-644b80-b4bc58747-lc865.log.zip
[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718781#comment-17718781 ]

Zhihao Chen commented on FLINK-31135:

hey [~Swathi Chandrashekar], thank you for looking into it.

{quote}This error was populated for all the checkpoints due to state inconsistency which resulted in storing lot of checkpoints in S3, which eventually caused the size of the configMap > 1MB{quote}

I don't think that's the case. Instead, none of the checkpoint records in the CM was ever cleaned up. When the CM reaches the 1 MB size limitation, the error "Flink was not able to determine whether the metadata was successfully persisted." starts to happen.

I have another Flink job running here as an example. ConfigMap: [^parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml]

The checkpoint IDs run consecutively: "checkpointID-001", "checkpointID-002", ... "checkpointID-0001040", i.e. from "1" to "1040". The ConfigMap has reached the 1 MB size limitation, and "Flink was not able to determine whether the metadata was successfully persisted." actually happens when the CM appended the record "1040". Please see the logs below. The bottom one is the first error log, which complains about the record "1041". That makes sense: it was never recorded in the CM, hence Flink couldn't determine whether the metadata was successfully persisted.

!image-2023-05-03-13-47-51-440.png|width=1579,height=861!

The Flink dashboard log also supports this assumption.

!image-2023-05-03-13-51-21-685.png|width=1473,height=783!

My guess is that Flink never cleaned up any of the records in the CM in our case.
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-50-54-783.png -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-51-21-685.png
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-47-51-440.png
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: parked-logs-ingestion-644b80-3494e4c01b82eb7a75a76080974b41cd-config-map.yaml
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: (was: image-2023-05-03-13-26-51-992.png)
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: (was: image-2023-05-03-13-27-58-256.png)
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: (was: image-2023-05-03-13-27-44-449.png)
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: (was: image-2023-05-03-13-27-50-513.png)
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-27-50-513.png
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-27-58-256.png
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-26-51-992.png
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: image-2023-05-03-13-27-44-449.png
[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17717014#comment-17717014 ] Zhihao Chen commented on FLINK-31135: - [~Swathi Chandrashekar], please see the attached JobManager log showing this issue. I didn't find any error message about discarding completed checkpoints, though. [^jobmanager_log.txt]
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: jobmanager_log.txt
[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716479#comment-17716479 ] Zhihao Chen commented on FLINK-31135: - Hi [~Swathi Chandrashekar], is there any update on this?
[jira] [Comment Edited] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713796#comment-17713796 ] Zhihao Chen edited comment on FLINK-31135 at 4/18/23 11:49 PM: --- Hi [~Swathi Chandrashekar], in my case state.checkpoints.num-retained is always set to 5 for our Flink jobs, but it does not appear to be respected. Please see the snippet below from the FlinkDeployment managed by the flink-kubernetes-operator.
{code:yaml}
apiVersion: v1
items:
- apiVersion: flink.apache.org/v1beta1
  kind: FlinkDeployment
  metadata:
    creationTimestamp: "2023-04-04T03:02:25Z"
    finalizers:
    - flinkdeployments.flink.apache.org/finalizer
    generation: 2
    labels:
      instanceId: parked-logs-ingestion-16805773-a96408
      jobName: parked-logs-ingestion-16805773
    name: parked-logs-ingestion-16805773-a96408
    namespace: parked-logs-ingestion-16805773-a96408
    resourceVersion: "533476748"
    uid: 182b9c7e-74cc-490b-8045-9fddaa7b8aa9
  spec:
    flinkConfiguration:
      execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
      execution.checkpointing.interval: "6"
      execution.checkpointing.max-concurrent-checkpoints: "1"
      execution.checkpointing.min-pause: 5s
      execution.checkpointing.mode: EXACTLY_ONCE
      execution.checkpointing.prefer-checkpoint-for-recovery: "true"
      execution.checkpointing.timeout: 60min
      high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
      high-availability.storageDir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/ha
      jobmanager.memory.process.size: 1024m
      metrics.reporter.stsd.factory.class: org.apache.flink.metrics.statsd.StatsDReporterFactory
      metrics.reporter.stsd.host: localhost
      metrics.reporter.stsd.interval: 30 SECONDS
      metrics.reporter.stsd.port: "8125"
      metrics.reporters: stsd
      metrics.scope.jm: jobmanager
      metrics.scope.jm.job: jobmanager.
      metrics.scope.operator: taskmanager..
      metrics.scope.task: taskmanager..
      metrics.scope.tm: taskmanager
      metrics.scope.tm.job: taskmanager.
      metrics.system-resource: "true"
      metrics.system-resource-probing-interval: "3"
      restart-strategy: fixed-delay
      restart-strategy.fixed-delay.attempts: "2147483647"
      state.backend: hashmap
      state.checkpoint-storage: filesystem
      state.checkpoints.dir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/checkpoints
      state.checkpoints.num-retained: "5"
      state.savepoints.dir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/savepoints
      taskmanager.memory.managed.size: "0"
      taskmanager.memory.network.fraction: "0.1"
      taskmanager.memory.network.max: 1000m
      taskmanager.memory.network.min: 64m
      taskmanager.memory.process.size: 2048m
      taskmanager.numberOfTaskSlots: "10"
      web.cancel.enable: "false"
    flinkVersion: v1_15
{code}
in UI: !image-2023-04-19-09-48-19-089.png|width=2730,height=1786!
I hit the same issue before we switched to the flink-kubernetes-operator, when we were running a Flink standalone deployment on Kubernetes with state.checkpoints.num-retained set to 5.
[jira] [Comment Edited] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713796#comment-17713796 ] Zhihao Chen edited comment on FLINK-31135 at 4/18/23 11:49 PM: --- Hi [~Swathi Chandrashekar] , in my case, state.checkpoints.num-retained for our Flink jobs is always set to 5, but it looks like that is not respected. Please see the snippet below from the FlinkDeployment managed by the flink-kubernetes-operator.
{code:yaml}
apiVersion: v1
items:
- apiVersion: flink.apache.org/v1beta1
  kind: FlinkDeployment
  metadata:
    creationTimestamp: "2023-04-04T03:02:25Z"
    finalizers:
    - flinkdeployments.flink.apache.org/finalizer
    generation: 2
    labels:
      instanceId: parked-logs-ingestion-16805773-a96408
      jobName: parked-logs-ingestion-16805773
    name: parked-logs-ingestion-16805773-a96408
    namespace: parked-logs-ingestion-16805773-a96408
    resourceVersion: "533476748"
    uid: 182b9c7e-74cc-490b-8045-9fddaa7b8aa9
  spec:
    flinkConfiguration:
      execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
      execution.checkpointing.interval: "6"
      execution.checkpointing.max-concurrent-checkpoints: "1"
      execution.checkpointing.min-pause: 5s
      execution.checkpointing.mode: EXACTLY_ONCE
      execution.checkpointing.prefer-checkpoint-for-recovery: "true"
      execution.checkpointing.timeout: 60min
      high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
      high-availability.storageDir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/ha
      jobmanager.memory.process.size: 1024m
      metrics.reporter.stsd.factory.class: org.apache.flink.metrics.statsd.StatsDReporterFactory
      metrics.reporter.stsd.host: localhost
      metrics.reporter.stsd.interval: 30 SECONDS
      metrics.reporter.stsd.port: "8125"
      metrics.reporters: stsd
      metrics.scope.jm: jobmanager
      metrics.scope.jm.job: jobmanager.
      metrics.scope.operator: taskmanager..
      metrics.scope.task: taskmanager..
      metrics.scope.tm: taskmanager
      metrics.scope.tm.job: taskmanager.
      metrics.system-resource: "true"
      metrics.system-resource-probing-interval: "3"
      restart-strategy: fixed-delay
      restart-strategy.fixed-delay.attempts: "2147483647"
      state.backend: hashmap
      state.checkpoint-storage: filesystem
      state.checkpoints.dir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/checkpoints
      state.checkpoints.num-retained: "5"
      state.savepoints.dir: s3://eureka-flink-data-prod/parked-logs-ingestion-16805773-a96408/savepoints
      taskmanager.memory.managed.size: "0"
      taskmanager.memory.network.fraction: "0.1"
      taskmanager.memory.network.max: 1000m
      taskmanager.memory.network.min: 64m
      taskmanager.memory.process.size: 2048m
      taskmanager.numberOfTaskSlots: "10"
      web.cancel.enable: "false"
    flinkVersion: v1_15
{code}
In the UI: !image-2023-04-19-09-48-19-089.png|width=590,height=386!
I got the same issue before we switched to the flink-kubernetes-operator. At that time we were using a Flink standalone deployment on Kubernetes. We set state.checkpoints.num-retained to 5, but hit the same issue.
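A rough size model helps explain the 1048576-byte figure in this ticket. The sketch below (Python, with hypothetical entry names and sizes, not taken from the attached ConfigMap dump) shows that 5 retained checkpoint pointers alone are nowhere near the limit, so the map can only reach 1 MiB if stale entries accumulate:

```python
# Kubernetes rejects objects larger than 1 MiB; this is the limit behind the
# "Too long: must have at most 1048576 bytes" error in this ticket.
MAX_CONFIGMAP_BYTES = 1048576

def data_size(data: dict) -> int:
    # Approximate serialized size as the sum of key/value byte lengths.
    return sum(len(k.encode()) + len(v.encode()) for k, v in data.items())

# Hypothetical: 5 retained checkpoint pointers of ~2 KB each -> far below 1 MiB.
retained = {f"completedCheckpoint{i:08d}": "x" * 2048 for i in range(5)}
assert data_size(retained) < MAX_CONFIGMAP_BYTES

# But if old entries are never cleaned up (e.g. checkpoint file deletion keeps
# failing), a few hundred leaked entries push the map past the limit and the
# API server starts rejecting the PUT with FieldValueTooLong.
leaked = {f"completedCheckpoint{i:08d}": "x" * 2048 for i in range(600)}
assert data_size(leaked) > MAX_CONFIGMAP_BYTES
```

Under these assumed sizes, honoring num-retained would keep the map tiny, which is why an entry leak is the more plausible growth mechanism than the retention setting itself.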
[jira] [Commented] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712878#comment-17712878 ] Zhihao Chen commented on FLINK-31135: - Hi [~Swathi Chandrashekar] , please see the attached ConfigMap file: [^dump_cm.yaml]
The error shown in the Flink dashboard is:
*Checkpoint Detail:*
*Path:* - *Discarded:* - *Checkpoint Type:* aligned checkpoint
*Failure Message:*
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://10.32.228.1/api/v1/namespaces/parked-logs-ingestion-16805773-a96408/configmaps/parked-logs-ingestion-16805773-a96408-110331249bb495a4d23b4d69849c8224-config-map. Message: ConfigMap "parked-logs-ingestion-16805773-a96408-110331249bb495a4d23b4d69849c8224-config-map" is invalid: []: Too long: must have at most 1048576 bytes. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=[], message=Too long: must have at most 1048576 bytes, reason=FieldValueTooLong, additionalProperties={})], group=null, kind=ConfigMap, name=parked-logs-ingestion-16805773-a96408-110331249bb495a4d23b4d69849c8224-config-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=ConfigMap "parked-logs-ingestion-16805773-a96408-110331249bb495a4d23b4d69849c8224-config-map" is invalid: []: Too long: must have at most 1048576 bytes, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
{code}
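Conceptually, retention over the HA ConfigMap should prune old completedCheckpoint entries down to state.checkpoints.num-retained; a checkpoint pointer can only be dropped once its backing files are gone, so failed deletions keep entries alive. The hypothetical sketch below (not Flink's actual code) models the pruning step:

```python
def prune_retained(data: dict, num_retained: int = 5) -> dict:
    # Keep only the newest `num_retained` checkpoint pointers; IDs are
    # zero-padded, so lexicographic order matches numeric order.
    ckpts = sorted(k for k in data if k.startswith("completedCheckpoint"))
    drop = set(ckpts[:-num_retained])
    return {k: v for k, v in data.items() if k not in drop}

# Hypothetical ConfigMap data: 12 checkpoint pointers plus another HA entry.
data = {f"completedCheckpoint{i:08d}": f"s3://bucket/chk-{i}" for i in range(12)}
data["counter"] = "12"  # non-checkpoint HA entries must survive pruning

pruned = prune_retained(data, num_retained=5)
assert sum(k.startswith("completedCheckpoint") for k in pruned) == 5
assert "counter" in pruned
```

If the files behind an old checkpoint cannot be deleted, the corresponding entry is not eligible for removal and the map keeps growing regardless of num-retained, which would match the S3 deletion problem mentioned later in this thread.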
[jira] [Updated] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihao Chen updated FLINK-31135: Attachment: dump_cm.yaml -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-31135) ConfigMap DataSize went > 1 MB and cluster stopped working
[ https://issues.apache.org/jira/browse/FLINK-31135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712854#comment-17712854 ] Zhihao Chen edited comment on FLINK-31135 at 4/17/23 1:48 AM: -- I have encountered the same issue; in fact, it is an ongoing issue for us. I believe it has nothing to do with the flink-kubernetes-operator, as it happened with both a Flink standalone Kubernetes deployment and a flink-kubernetes-operator deployment. I have checked our configuration but did not find anything interesting. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-6610) WebServer could not be created,when set the "jobmanager.web.submit.enable" to false
zhihao chen created FLINK-6610: -- Summary: WebServer could not be created, when set the "jobmanager.web.submit.enable" to false Key: FLINK-6610 URL: https://issues.apache.org/jira/browse/FLINK-6610 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 1.3.0 Reporter: zhihao chen Assignee: zhihao chen
The WebServer could not be created when "jobmanager.web.submit.enable" is set to false, because WebFrontendBootstrap requires the upload directory to be non-null: this.uploadDir = Preconditions.checkNotNull(directory);
{code}
2017-05-17 15:15:46,938 ERROR org.apache.flink.runtime.webmonitor.WebMonitorUtils - WebServer could not be created
java.lang.NullPointerException
	at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:58)
	at org.apache.flink.runtime.webmonitor.utils.WebFrontendBootstrap.<init>(WebFrontendBootstrap.java:73)
	at org.apache.flink.runtime.webmonitor.WebRuntimeMonitor.<init>(WebRuntimeMonitor.java:359)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.flink.runtime.webmonitor.WebMonitorUtils.startWebRuntimeMonitor(WebMonitorUtils.java:135)
	at org.apache.flink.runtime.clusterframework.BootstrapTools.createWebMonitorIfConfigured(BootstrapTools.java:242)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.runApplicationMaster(YarnApplicationMasterRunner.java:352)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:195)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.run(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.main(YarnApplicationMasterRunner.java:116)
{code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
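The failure mode can be reproduced in miniature. The sketch below is a Python stand-in (hypothetical names, not Flink's Java code) for the unconditional Preconditions.checkNotNull call: when web submission is disabled, no upload directory is created, so the check fails immediately; a guarded bootstrap tolerates the missing directory.

```python
def check_not_null(ref):
    # Mirrors the behavior of org.apache.flink.util.Preconditions.checkNotNull,
    # which throws a NullPointerException when the reference is null.
    if ref is None:
        raise TypeError("reference must not be null")
    return ref

def web_frontend_bootstrap(upload_dir, submit_enabled):
    if not submit_enabled:
        # Hypothetical fix: only enforce the precondition when job submission
        # (and hence file upload) is actually enabled.
        return None
    return check_not_null(upload_dir)

# The unconditional check fails with no upload dir:
try:
    check_not_null(None)
    raised = False
except TypeError:
    raised = True
assert raised
# The guarded bootstrap tolerates the missing directory:
assert web_frontend_bootstrap(None, submit_enabled=False) is None
```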
[jira] [Commented] (FLINK-6477) The first time to click Taskmanager cannot get the actual data
[ https://issues.apache.org/jira/browse/FLINK-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004314#comment-16004314 ] zhihao chen commented on FLINK-6477: I ran a test comparing the first request with the second one: the first request cannot get the metric data, while the second one can. I have an idea, though I do not know whether it is feasible; please help to check it. We reach the TM metrics in these steps: * Overview -> Task Managers -> Task Manager Metrics. Could we send the request that fetches the metrics data already in the second step? > The first time to click Taskmanager cannot get the actual data > -- > > Key: FLINK-6477 > URL: https://issues.apache.org/jira/browse/FLINK-6477 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0 >Reporter: zhihao chen >Assignee: zhihao chen > Attachments: errDisplay.jpg > > > On the Flink web page, the first click on Taskmanager gets less than the actual data. When > the parameter "jobmanager.web.refresh-interval" is set to a larger value, e.g. > 180, the page does not display normally until the interval elapses, unless you refresh it manually. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
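The symptom above is consistent with a poll-interval-driven fetcher: a page rendered before the first metrics poll shows an empty snapshot. A minimal model (hypothetical, not Flink's MetricFetcher implementation):

```python
class IntervalMetricFetcher:
    """Serves a cached snapshot; polls only after `interval` seconds elapse."""

    def __init__(self, interval: float):
        self.interval = interval
        self.last_poll = 0.0  # fetcher started at t=0, nothing polled yet
        self.metrics = {}

    def query(self, now: float) -> dict:
        # A poll is only triggered once the interval has elapsed, so an early
        # query returns the (still empty) cached snapshot.
        if now - self.last_poll >= self.interval:
            self.metrics = {"heap.used": 123}  # hypothetical polled value
            self.last_poll = now
        return dict(self.metrics)

f = IntervalMetricFetcher(interval=180.0)  # jobmanager.web.refresh-interval
assert f.query(now=5.0) == {}              # first click: no data yet
assert f.query(now=200.0) == {"heap.used": 123}  # after the interval
```

Triggering the fetch when the Task Managers list is opened (the second step proposed above) would warm the cache before the metrics page renders.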
[jira] [Comment Edited] (FLINK-5901) DAG can not show properly in IE
[ https://issues.apache.org/jira/browse/FLINK-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003962#comment-16003962 ] zhihao chen edited comment on FLINK-5901 at 5/10/17 3:10 AM: - [~StephanEwen][~WangTao] I encountered the same problem and verified it in accordance with [FLINK-5902|https://issues.apache.org/jira/browse/FLINK-5902], but that does not solve this problem. I found that we use the foreignObject element to draw the SVG graph, which may be the reason. Per Microsoft's documentation: in IE9 Mode, IE10 Mode, and IE11 Mode (all versions), the foreignObject element is not supported. [https://msdn.microsoft.com/en-us/library/hh834675%28v=vs.85%29.aspx] was (Author: chenzio): [~StephanEwen][~WangTao] I encountered the same problem and verified it in accordance with [FLINK-5902|https://issues.apache.org/jira/browse/FLINK-5902], but that does not solve this problem. I found that we use the foreignObject element to draw the SVG graph, which may be the reason. [2.1.24 [SVG11] Section 23.3, The 'foreignObject' element|https://msdn.microsoft.com/en-us/library/hh834675%28v=vs.85%29.aspx] > DAG can not show properly in IE > --- > > Key: FLINK-5901 > URL: https://issues.apache.org/jira/browse/FLINK-5901 > Project: Flink > Issue Type: Bug > Components: Webfrontend > Environment: IE 11 >Reporter: Tao Wang >Priority: Critical > Attachments: using chrom(same job).png, using IE.png > > > The DAG of running jobs can not show properly in IE11 (I am using > 11.0.9600.18059, but I assume the same happens with IE9). The description of the task is > not shown within the rectangle. > It works well in Chrome. I pasted the screenshots under IE and Chrome below. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5901) DAG can not show properly in IE
[ https://issues.apache.org/jira/browse/FLINK-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003962#comment-16003962 ] zhihao chen commented on FLINK-5901: [~StephanEwen][~WangTao] I encountered the same problem and verified it in accordance with [FLINK-5902|https://issues.apache.org/jira/browse/FLINK-5902], but that does not solve this problem. I found that we use the foreignObject element to draw the SVG graph, which may be the reason. [2.1.24 [SVG11] Section 23.3, The 'foreignObject' element|https://msdn.microsoft.com/en-us/library/hh834675%28v=vs.85%29.aspx] > DAG can not show properly in IE > --- > > Key: FLINK-5901 > URL: https://issues.apache.org/jira/browse/FLINK-5901 > Project: Flink > Issue Type: Bug > Components: Webfrontend > Environment: IE 11 >Reporter: Tao Wang >Priority: Critical > Attachments: using chrom(same job).png, using IE.png > > > The DAG of running jobs can not show properly in IE11 (I am using > 11.0.9600.18059, but I assume the same happens with IE9). The description of the task is > not shown within the rectangle. > It works well in Chrome. I pasted the screenshots under IE and Chrome below. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6477) The first time to click Taskmanager cannot get the actual data
[ https://issues.apache.org/jira/browse/FLINK-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003951#comment-16003951 ] zhihao chen commented on FLINK-6477: hi Chesnay Schepler: yes, this problem is only visible on the first access. The first request does not display the actual metrics data from the TM, but if we repeat the request as follows, the display is normal; I do not quite understand why.
{code}
.controller 'SingleTaskManagerController', ($scope, $stateParams, SingleTaskManagerService, $interval, flinkConfig) ->
  $scope.metrics = {}
  SingleTaskManagerService.loadMetrics($stateParams.taskmanagerid).then (data) ->
    $scope.metrics = data[0]
  SingleTaskManagerService.loadMetrics($stateParams.taskmanagerid).then (data) ->
    $scope.metrics = data[0]
{code}
> The first time to click Taskmanager cannot get the actual data > -- > > Key: FLINK-6477 > URL: https://issues.apache.org/jira/browse/FLINK-6477 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0 >Reporter: zhihao chen >Assignee: zhihao chen > Attachments: errDisplay.jpg > > > On the Flink web page, the first click on a TaskManager does not fetch the actual data. When > the parameter "jobmanager.web.refresh-interval" is set to a larger value, eg: > 180, the page does not display correctly until that interval elapses, unless you > refresh it manually. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
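The workaround above simply issues the same request twice and keeps the second result. A generic sketch of that idea (hypothetical helper, not Flink code), written in Java:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Generic sketch of the "request twice" workaround (hypothetical helper, not Flink code):
// fetch a value, and if the first attempt comes back empty, retry exactly once.
public class FetchTwice {

    static <T> Optional<T> fetchWithOneRetry(Supplier<Optional<T>> fetch) {
        Optional<T> first = fetch.get();
        return first.isPresent() ? first : fetch.get();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulate an endpoint that returns no data on the first call only.
        Supplier<Optional<String>> flaky =
            () -> calls.incrementAndGet() == 1 ? Optional.<String>empty() : Optional.of("metrics");
        System.out.println(fetchWithOneRetry(flaky).orElse("none")); // prints "metrics"
    }
}
```

A retry masks rather than explains the first-access failure, so it is a stopgap; the cleaner fix would be issuing the metrics request at the step where the data is actually needed, as suggested in the earlier comment.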
[jira] [Updated] (FLINK-6477) The first time to click Taskmanager cannot get the actual data
[ https://issues.apache.org/jira/browse/FLINK-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihao chen updated FLINK-6477: --- Attachment: errDisplay.jpg > The first time to click Taskmanager cannot get the actual data > -- > > Key: FLINK-6477 > URL: https://issues.apache.org/jira/browse/FLINK-6477 > Project: Flink > Issue Type: Bug > Components: Web Client >Affects Versions: 1.2.0 >Reporter: zhihao chen >Assignee: zhihao chen > Attachments: errDisplay.jpg > > > On the Flink web page, the first click on a TaskManager does not fetch the actual data. When > the parameter "jobmanager.web.refresh-interval" is set to a larger value, eg: > 180, the page does not display correctly until that interval elapses, unless you > refresh it manually. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6477) The first time to click Taskmanager cannot get the actual data
[ https://issues.apache.org/jira/browse/FLINK-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihao chen updated FLINK-6477: --- Affects Version/s: 1.2.0 Component/s: Web Client > The first time to click Taskmanager cannot get the actual data > -- > > Key: FLINK-6477 > URL: https://issues.apache.org/jira/browse/FLINK-6477 > Project: Flink > Issue Type: Bug > Components: Web Client >Affects Versions: 1.2.0 >Reporter: zhihao chen >Assignee: zhihao chen > > On the Flink web page, the first click on a TaskManager does not fetch the actual data. When > the parameter "jobmanager.web.refresh-interval" is set to a larger value, eg: > 180, the page does not display correctly until that interval elapses, unless you > refresh it manually. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6477) The first time to click Taskmanager cannot get the actual data
zhihao chen created FLINK-6477: -- Summary: The first time to click Taskmanager cannot get the actual data Key: FLINK-6477 URL: https://issues.apache.org/jira/browse/FLINK-6477 Project: Flink Issue Type: Bug Reporter: zhihao chen Assignee: zhihao chen On the Flink web page, the first click on a TaskManager does not fetch the actual data. When the parameter "jobmanager.web.refresh-interval" is set to a larger value, eg: 180, the page does not display correctly until that interval elapses, unless you refresh it manually. -- This message was sent by Atlassian JIRA (v6.3.15#6346)