[jira] [Created] (FLINK-17709) Active Kubernetes integration phase 3 - Advanced Features
Canbin Zheng created FLINK-17709:

Summary: Active Kubernetes integration phase 3 - Advanced Features
Key: FLINK-17709
URL: https://issues.apache.org/jira/browse/FLINK-17709
Project: Flink
Issue Type: New Feature
Components: Deployment / Kubernetes, Runtime / Coordination
Reporter: Canbin Zheng
Fix For: 1.12.0

This is the umbrella issue to track all the advanced features for phase 3 of the active Kubernetes integration in Flink 1.12.0. Some of the features are:

# Support multiple JobManagers in ZooKeeper based HA setups.
# Support user-specified pod templates.
# Support FileSystem based high availability.
# Support running PyFlink.
# Support accessing secured services via K8s secrets.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-17707) Support configuring replica of Deployment in ZooKeeper based HA setups
Canbin Zheng created FLINK-17707:

Summary: Support configuring replica of Deployment in ZooKeeper based HA setups
Key: FLINK-17707
URL: https://issues.apache.org/jira/browse/FLINK-17707
Project: Flink
Issue Type: New Feature
Components: Deployment / Kubernetes, Runtime / Coordination
Reporter: Canbin Zheng
Fix For: 1.12.0

At the moment, in the native K8s setups, we hard-code the replica of the Deployment to 1. However, when users enable the ZooKeeper HighAvailabilityServices, they would also like to configure the replica of the Deployment for faster failover. This ticket proposes to make the *replica* of the Deployment configurable in ZooKeeper based HA setups.
[jira] [Created] (FLINK-17598) Implement FileSystemHAServices for native K8s setups
Canbin Zheng created FLINK-17598:

Summary: Implement FileSystemHAServices for native K8s setups
Key: FLINK-17598
URL: https://issues.apache.org/jira/browse/FLINK-17598
Project: Flink
Issue Type: New Feature
Components: Deployment / Kubernetes, Runtime / Coordination
Reporter: Canbin Zheng

At the moment we use ZooKeeper as the distributed coordinator for implementing the JobManager high availability services. But in the cloud-native environment, there is a trend that more and more users prefer to use *Kubernetes* as the underlying scheduler backend and *object storage* as the storage medium; neither of these two services requires a ZooKeeper deployment. As a result, in the K8s setups, people have to deploy and maintain additional ZooKeeper clusters just to solve the JobManager SPOF. This ticket proposes to provide a simplified FileSystem HA implementation with the leader election removed, saving the effort of ZooKeeper deployment and maintenance. To achieve this, we plan to:

# Introduce a {{FileSystemHaServices}} that implements the {{HighAvailabilityServices}}.
# Replace the Deployment with a StatefulSet to ensure *at most one* semantics and avoid potential concurrent access to the underlying FileSystem.
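The core idea of the two steps above can be sketched in plain Java: once a StatefulSet guarantees at most one JobManager, leader election becomes unnecessary and the "leader" address can simply be persisted to shared storage. This is an illustrative sketch only; the class and method names are hypothetical and not Flink's actual {{HighAvailabilityServices}} API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Minimal sketch (hypothetical names, not Flink's API) of filesystem-based HA:
 * with at-most-one JobManager guaranteed by a StatefulSet, there is no leader
 * election; the single JobManager publishes its address to shared storage and
 * everyone else reads it back.
 */
public class FileSystemLeaderStore {
    private final Path leaderFile;

    public FileSystemLeaderStore(Path leaderFile) {
        this.leaderFile = leaderFile;
    }

    /** The single JobManager publishes its address on startup. */
    public void publishLeader(String address) {
        try {
            Files.write(leaderFile, address.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** TaskManagers and clients retrieve the current leader address. */
    public String retrieveLeader() {
        try {
            return new String(Files.readAllBytes(leaderFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real implementation would additionally need fencing against stale writers, which is exactly why the ticket pairs the FileSystem store with StatefulSet's at-most-one semantics.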
[jira] [Created] (FLINK-17566) Fix potential K8s resources leak after JobManager finishes in Application mode
Canbin Zheng created FLINK-17566:

Summary: Fix potential K8s resources leak after JobManager finishes in Application mode
Key: FLINK-17566
URL: https://issues.apache.org/jira/browse/FLINK-17566
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Reporter: Canbin Zheng

FLINK-10934 introduced application mode support in the native K8s setups, but as discussed in [https://github.com/apache/flink/pull/12003], there is a high probability that all the K8s resources leak after the JobManager finishes, except that the replica of the Deployment is scaled down to 0. We need to find out the root cause and fix it. This may be related to the way the fabric8 SDK deletes a Deployment. It splits the procedure into three steps, as follows:

# Scale down the replica to 0.
# Wait until the scaling down succeeds.
# Delete the ReplicaSet.
[jira] [Created] (FLINK-17565) Bump fabric8 version from 4.5.2 to 4.10.1
Canbin Zheng created FLINK-17565:

Summary: Bump fabric8 version from 4.5.2 to 4.10.1
Key: FLINK-17565
URL: https://issues.apache.org/jira/browse/FLINK-17565
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Reporter: Canbin Zheng

Currently we are using version 4.5.2; it would be better to upgrade to 4.10.1 for features like K8s 1.17 support, LeaderElection support, etc. For more details, please refer to the [fabric8 releases|https://github.com/fabric8io/kubernetes-client/releases].
[jira] [Created] (FLINK-17549) Support running Stateful Functions on native Kubernetes setup
Canbin Zheng created FLINK-17549:

Summary: Support running Stateful Functions on native Kubernetes setup
Key: FLINK-17549
URL: https://issues.apache.org/jira/browse/FLINK-17549
Project: Flink
Issue Type: New Feature
Components: Build System / Stateful Functions, Deployment / Kubernetes
Reporter: Canbin Zheng

This is the umbrella issue for running Stateful Functions on Kubernetes in native mode.
[jira] [Created] (FLINK-17480) Support PyFlink on native Kubernetes setup
Canbin Zheng created FLINK-17480:

Summary: Support PyFlink on native Kubernetes setup
Key: FLINK-17480
URL: https://issues.apache.org/jira/browse/FLINK-17480
Project: Flink
Issue Type: New Feature
Components: Deployment / Kubernetes
Reporter: Canbin Zheng

This is the umbrella issue for all PyFlink-related tasks for Flink on Kubernetes.
[jira] [Created] (FLINK-17332) Fix restart policy not equals to Never for native task manager pods
Canbin Zheng created FLINK-17332:

Summary: Fix restart policy not equals to Never for native task manager pods
Key: FLINK-17332
URL: https://issues.apache.org/jira/browse/FLINK-17332
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

Currently, we do not explicitly set the {{RestartPolicy}} for the TaskManager Pod in the native K8s setups, so it defaults to {{Always}}. The task manager pod itself should not restart a failed Container; that decision should always be made by the job manager. Therefore, this ticket proposes to set the {{RestartPolicy}} to {{Never}} for the task manager pods.
[jira] [Created] (FLINK-17232) Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint
Canbin Zheng created FLINK-17232:

Summary: Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint
Key: FLINK-17232
URL: https://issues.apache.org/jira/browse/FLINK-17232
Project: Flink
Issue Type: Sub-task
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

Currently, for a LB/NodePort type Service, if we find that the {{LoadBalancer}} in the {{Service}} is null, we use the externalIPs configured in the external Service as the address of the Endpoint. Again, this is another implicit toleration and may confuse users. This ticket proposes to rethink this implicit toleration behaviour.
[jira] [Created] (FLINK-17231) Deprecate possible implicit fallback from LB to NodePort When retrieving Endpoint of LB Service
Canbin Zheng created FLINK-17231:

Summary: Deprecate possible implicit fallback from LB to NodePort When retrieving Endpoint of LB Service
Key: FLINK-17231
URL: https://issues.apache.org/jira/browse/FLINK-17231
Project: Flink
Issue Type: Sub-task
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

Currently, if people use the LB Service, when it comes to retrieving the Endpoint for the external Service, we immediately make an implicit fallback to returning the NodePort address and port if we fail to get the LB address in a single try. This kind of toleration can confuse users, since the NodePort address/port may be inaccessible for reasons such as a network security policy, and users may not know what really happens behind the scenes. This ticket proposes to always return the LB address/port, or otherwise throw an Exception indicating that the LB is unready or abnormal.
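The proposed fail-loudly behavior can be sketched as follows; the class and method names are illustrative only, not Flink's actual retrieval code.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Sketch of the proposal (hypothetical names): return the LB endpoint when it
 * is available, otherwise throw instead of silently falling back to the
 * NodePort endpoint, which may be unreachable behind a network security policy.
 */
public class LoadBalancerEndpointRetriever {
    static String retrieve(Supplier<Optional<String>> lbEndpoint) {
        return lbEndpoint.get().orElseThrow(() -> new IllegalStateException(
            "The LoadBalancer of the external Service is unready or abnormal; "
                + "no implicit fallback to the NodePort endpoint is performed."));
    }
}
```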
[jira] [Created] (FLINK-17230) Fix incorrect returned address of Endpoint for the ClusterIP Service
Canbin Zheng created FLINK-17230:

Summary: Fix incorrect returned address of Endpoint for the ClusterIP Service
Key: FLINK-17230
URL: https://issues.apache.org/jira/browse/FLINK-17230
Project: Flink
Issue Type: Sub-task
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

At the moment, when the type of the external Service is set to {{ClusterIP}}, we return an incorrect address, {{KubernetesUtils.getInternalServiceName(clusterId) + "." + nameSpace}}, for the Endpoint. This ticket aims to fix this bug by returning {{KubernetesUtils.getRestServiceName(clusterId) + "." + nameSpace}} instead.
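The fix boils down to deriving the in-cluster DNS name from the REST Service rather than the internal Service. The sketch below reimplements the two helpers for illustration (the {{-rest}} suffix follows the external Service naming described elsewhere in this digest, FLINK-17032); it is not Flink's actual {{KubernetesUtils}}.

```java
/**
 * Sketch of the corrected endpoint derivation. The helper names mirror the
 * ticket's KubernetesUtils methods but are reimplemented here for illustration.
 */
public class ClusterIpEndpoint {
    static String getInternalServiceName(String clusterId) {
        return clusterId;
    }

    static String getRestServiceName(String clusterId) {
        return clusterId + "-rest";
    }

    /** Correct: point clients at the REST service's DNS name, not the internal one. */
    static String restEndpointAddress(String clusterId, String namespace) {
        return getRestServiceName(clusterId) + "." + namespace;
    }
}
```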
[jira] [Created] (FLINK-17196) Rework the implementation of Fabric8FlinkKubeClient#getRestEndpoint
Canbin Zheng created FLINK-17196:

Summary: Rework the implementation of Fabric8FlinkKubeClient#getRestEndpoint
Key: FLINK-17196
URL: https://issues.apache.org/jira/browse/FLINK-17196
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

This is the umbrella issue for the rework of the implementation of {{Fabric8FlinkKubeClient#getRestEndpoint}}.
[jira] [Created] (FLINK-17177) Handle Pod ERROR event correctly in KubernetesResourceManager#onError
Canbin Zheng created FLINK-17177:

Summary: Handle Pod ERROR event correctly in KubernetesResourceManager#onError
Key: FLINK-17177
URL: https://issues.apache.org/jira/browse/FLINK-17177
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.10.0, 1.10.1
Reporter: Canbin Zheng
Fix For: 1.11.0

Currently, once we receive an *ERROR* event sent from the K8s API server via the K8s {{Watcher}}, {{KubernetesResourceManager#onError}} handles it by calling {{KubernetesResourceManager#removePodIfTerminated}}. This is incorrect, since the *ERROR* event indicates an exception in the HTTP layer caused by the K8s Server, which means the previously created {{Watcher}} is no longer available and we should re-create a new {{Watcher}} immediately.
[jira] [Created] (FLINK-17176) Slow down Pod recreation in KubernetesResourceManager#PodCallbackHandler
Canbin Zheng created FLINK-17176:

Summary: Slow down Pod recreation in KubernetesResourceManager#PodCallbackHandler
Key: FLINK-17176
URL: https://issues.apache.org/jira/browse/FLINK-17176
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
Fix For: 1.11.0

In the native K8s setups, there are cases in which we do not control the speed of pod re-creation in the {{PodCallbackHandler}} implementation of {{KubernetesResourceManager}}, which poses a potential risk of flooding the K8s API Server. Here are the steps to reproduce this kind of problem:

# Mount {{/opt/flink/log}} in the TaskManager Container to a path on the K8s nodes via HostPath; make sure that the path exists but that the TaskManager process has no write permission. We can achieve this via the user-specified pod template support, or just hardcode it for testing only.
# Launch a session cluster.
# Submit a new job to the session cluster. As expected, we can observe that the Pod constantly fails quickly while launching the main Container; then {{KubernetesResourceManager#onModified}} is invoked to re-create a new Pod immediately, without any speed control.

To sum up, once the {{KubernetesResourceManager}} receives the Pod *ADD* event and that Pod is terminated before successfully registering with the {{KubernetesResourceManager}}, the {{KubernetesResourceManager}} sends another creation request to the K8s API Server immediately.
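One common way to implement the "speed control" the ticket asks for is capped exponential backoff between creation requests. The sketch below is purely illustrative: the class name and constants are hypothetical, not Flink's actual implementation.

```java
/**
 * Sketch of throttled pod re-creation: the delay before the next creation
 * request doubles with each consecutive failure, up to a cap, and resets once
 * a pod registers successfully. Names and constants are illustrative only.
 */
public class PodRecreationBackoff {
    private final long initialDelayMs;
    private final long maxDelayMs;
    private int consecutiveFailures;

    public PodRecreationBackoff(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Delay to wait before the next creation request; doubles per consecutive failure. */
    public long nextDelayMs() {
        // Cap the shift to avoid overflow on long streaks of failures.
        long delay = initialDelayMs << Math.min(consecutiveFailures, 30);
        consecutiveFailures++;
        return Math.min(delay, maxDelayMs);
    }

    /** Reset once a pod successfully registers with the ResourceManager. */
    public void reset() {
        consecutiveFailures = 0;
    }
}
```

With, say, an initial delay of 100 ms and a 10 s cap, a crash-looping pod would be retried at 100 ms, 200 ms, 400 ms, and so on, rather than hammering the API Server with immediate re-creations.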
[jira] [Created] (FLINK-17104) Support registering custom JobStatusListeners from config
Canbin Zheng created FLINK-17104:

Summary: Support registering custom JobStatusListeners from config
Key: FLINK-17104
URL: https://issues.apache.org/jira/browse/FLINK-17104
Project: Flink
Issue Type: New Feature
Components: API / Core
Reporter: Canbin Zheng

Currently, a variety of users are asking for support for registering custom JobStatusListeners so that they can get timely feedback on the status transitions of their jobs. This could be an important feature for building effective Flink cluster monitoring systems.
[jira] [Created] (FLINK-17102) Add -Dkubernetes.container.image= for the start-flink-session section
Canbin Zheng created FLINK-17102:

Summary: Add -Dkubernetes.container.image= for the start-flink-session section
Key: FLINK-17102
URL: https://issues.apache.org/jira/browse/FLINK-17102
Project: Flink
Issue Type: Sub-task
Reporter: Canbin Zheng

Add {{-Dkubernetes.container.image=}} as a guide for new users in the existing command:

{code:java}
./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id= \
  -Dkubernetes.container.image= \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dresourcemanager.taskmanager-timeout=360
{code}

For details, refer to [https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html#start-flink-session]
[jira] [Created] (FLINK-17100) Document Native Kubernetes Improvements
Canbin Zheng created FLINK-17100:

Summary: Document Native Kubernetes Improvements
Key: FLINK-17100
URL: https://issues.apache.org/jira/browse/FLINK-17100
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes, Documentation
Reporter: Canbin Zheng

This is the umbrella issue for the native Kubernetes documentation improvements.
[jira] [Created] (FLINK-17090) Harden precheck for the KubernetesConfigOptions.JOB_MANAGER_CPU
Canbin Zheng created FLINK-17090:

Summary: Harden precheck for the KubernetesConfigOptions.JOB_MANAGER_CPU
Key: FLINK-17090
URL: https://issues.apache.org/jira/browse/FLINK-17090
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
Fix For: 1.11.0

If people specify a negative value for the config option {{KubernetesConfigOptions#JOB_MANAGER_CPU}}, as the following command does,

{code:java}
./bin/kubernetes-session.sh -Dkubernetes.jobmanager.cpu=-3.0 -Dkubernetes.cluster-id=...
{code}

then it will throw an exception as follows:

{quote}org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "felix1".
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:192)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:129)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:108)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:185)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:185)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://cls-cf5wqdwy.ccs.tencent-cloud.com/apis/apps/v1/namespaces/default/deployments. Message: Deployment.apps "felix1" is invalid: [spec.template.spec.containers[0].resources.limits[cpu]: Invalid value: "-3": must be greater than or equal to 0, spec.template.spec.containers[0].resources.requests[cpu]: Invalid value: "-3": must be greater than or equal to 0].
Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].resources.limits[cpu], message=Invalid value: "-3": must be greater than or equal to 0, reason=FieldValueInvalid, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].resources.requests[cpu], message=Invalid value: "-3": must be greater than or equal to 0, reason=FieldValueInvalid, additionalProperties={})], group=apps, kind=Deployment, name=felix1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "felix1" is invalid: [spec.template.spec.containers[0].resources.limits[cpu]: Invalid value: "-3": must be greater than or equal to 0, spec.template.spec.containers[0].resources.requests[cpu]: Invalid value: "-3": must be greater than or equal to 0], metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}). 
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:798)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:328)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:324)
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.createJobManagerComponent(Fabric8FlinkKubeClient.java:83)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:182){quote}

Since there is a gap between the Flink-side and the K8s-side configuration models, this ticket proposes to harden the precheck in the Flink K8s parameter-parsing tools and to throw a more user-friendly exception message, such as "the value of {{kubernetes.jobmanager.cpu}} must be greater than or equal to 0".
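A client-side precheck of the kind this ticket proposes could look like the following sketch. The config key {{kubernetes.jobmanager.cpu}} is the real one; the class and method are hypothetical illustrations, not Flink's actual parameter-parsing code.

```java
/**
 * Sketch of a client-side precheck that fails fast with a friendly message
 * instead of surfacing the 422 error from the K8s API server.
 * Class and method names are illustrative only.
 */
public class KubernetesCpuPrecheck {
    static double checkJobManagerCpu(double cpu) {
        if (cpu < 0) {
            throw new IllegalArgumentException(
                "The value of kubernetes.jobmanager.cpu must be greater than or equal to 0, was: " + cpu);
        }
        return cpu;
    }
}
```

The failure then happens before any REST call, so the user sees the offending config key directly rather than a fabric8 stack trace.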
[jira] [Created] (FLINK-17087) Use constant port for rest.port when it's set as 0 on Kubernetes
Canbin Zheng created FLINK-17087:

Summary: Use constant port for rest.port when it's set as 0 on Kubernetes
Key: FLINK-17087
URL: https://issues.apache.org/jira/browse/FLINK-17087
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Reporter: Canbin Zheng
Fix For: 1.11.0

If people set {{rest.port}} to 0 when deploying a native K8s session cluster, as the following command does,

{code:java}
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=felix1 -Drest.port=0 ...
{code}

the submission client will throw an Exception as follows:

{quote}org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster felix1
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:189)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:129)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:108)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:185)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:185)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://xxx/apis/apps/v1/namespaces/default/deployments. Message: Deployment.apps "felix1" is invalid: spec.template.spec.containers[0].ports[0].containerPort: Required value.
Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].ports[0].containerPort, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=felix1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "felix1" is invalid: spec.template.spec.containers[0].ports[0].containerPort: Required value, metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:798)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:328)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:324)
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.createJobManagerComponent(Fabric8FlinkKubeClient.java:83)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:184)
... 5 more{quote}

As we can see, the exception message is unintuitive and may confuse users. Therefore, this ticket proposes to use a fixed port instead when people set it to 0, like what we have done for {{blob.server.port}} and {{taskmanager.rpc.port}}.
[jira] [Created] (FLINK-17078) Logging output is misleading when executing bin/flink -e kubernetes-session without specifying cluster-id
Canbin Zheng created FLINK-17078:

Summary: Logging output is misleading when executing bin/flink -e kubernetes-session without specifying cluster-id
Key: FLINK-17078
URL: https://issues.apache.org/jira/browse/FLINK-17078
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Reporter: Canbin Zheng
Fix For: 1.11.0

When executing the following command:

{code:java}
./bin/flink run -d -e kubernetes-session examples/streaming/SocketWindowWordCount.jar --hostname 172.16.0.6 --port 12345
{code}

the exception stack is:

{quote}org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of flink-cluster-6fa5f5dc-4b50-48c3-b57f-d91e686fa474
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:148)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of flink-cluster-6fa5f5dc-4b50-48c3-b57f-d91e686fa474
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:94)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:118)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:59)
at org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:63)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1756)
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:106)
at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:72)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1643)
at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
... 8 more
Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of flink-cluster-6fa5f5dc-4b50-48c3-b57f-d91e686fa474
... 22 more{quote}

The logging output is misleading; we had better throw an exception indicating that people should explicitly specify the value of {{kubernetes.cluster-id}}.
[jira] [Created] (FLINK-17055) List jobs via bin/flink throws FlinkException indicating no cluster id was specified
Canbin Zheng created FLINK-17055:

Summary: List jobs via bin/flink throws FlinkException indicating no cluster id was specified
Key: FLINK-17055
URL: https://issues.apache.org/jira/browse/FLINK-17055
Project: Flink
Issue Type: Bug
Components: Command Line Client
Affects Versions: 1.10.0
Reporter: Canbin Zheng

The first command works fine:

{code:java}
./bin/flink list -m yarn-cluster -yid application_1580534782944_0015
{code}

The second command throws an Exception:

{code:java}
./bin/flink list -e yarn-per-job -yid application_1580534782944_0015
{code}

The exception stack is:

> The program finished with the following exception:
> org.apache.flink.util.FlinkException: No cluster id was specified. Please specify a cluster to which you would like to connect.
> at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:836)
> at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:334)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:896)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
[jira] [Created] (FLINK-17034) Execute the container CMD under TINI for better hygiene
Canbin Zheng created FLINK-17034:

Summary: Execute the container CMD under TINI for better hygiene
Key: FLINK-17034
URL: https://issues.apache.org/jira/browse/FLINK-17034
Project: Flink
Issue Type: Improvement
Components: Deployment / Docker, Deployment / Kubernetes, Dockerfiles
Reporter: Canbin Zheng

The container of the JM or the TMs runs its command in shell form, so it cannot receive the *TERM* signal when the pod is deleted. Some of the resulting problems are:

* The JM and the TMs have no chance to clean up; FLINK-15843 was created for tracking this problem.
* The pod can take a long time (up to 40 seconds) to be deleted after the K8s API Server receives the deletion request.

For the discussion, refer to [http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-111-Docker-image-unification-td38444i20.html]
[jira] [Created] (FLINK-17033) Upgrade OpenJDK docker image for Kubernetes
Canbin Zheng created FLINK-17033:

Summary: Upgrade OpenJDK docker image for Kubernetes
Key: FLINK-17033
URL: https://issues.apache.org/jira/browse/FLINK-17033
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Reporter: Canbin Zheng
Fix For: 1.11.0

The current docker image {{openjdk:8-jre-alpine}} used by Kubernetes has many problems; here are some of them:

# There is no official support for this image since the commit [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099]
# [DNS-lookup-5s-delay|https://k8s.imroc.io/troubleshooting/cases/dns-lookup-5s-delay]
# [DNS resolver does not read /etc/hosts|https://github.com/golang/go/issues/22846]

Therefore, this ticket proposes to investigate an alternative official JDK docker image. I think {{openjdk:8-jre-slim}} (184 MB) is a good choice, for the following reasons:

# It has official support from openjdk: [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links]
# It is based on Debian and does not have problems such as the DNS lookup delay.
# It is much smaller than the other official docker images.
[jira] [Created] (FLINK-17032) Naming convention unification for all the Kubernetes Resources
Canbin Zheng created FLINK-17032:

Summary: Naming convention unification for all the Kubernetes Resources
Key: FLINK-17032
URL: https://issues.apache.org/jira/browse/FLINK-17032
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
Fix For: 1.11.0

Currently, the naming rules differ among the Kubernetes resources we create:

# The Deployment: ${clusterId}
# The internal Service: ${clusterId}
# The external Service: ${clusterId}-rest
# The Flink Configuration ConfigMap: flink-config-${clusterId}
# The Hadoop Configuration ConfigMap: hadoop-config-${clusterId}
# The JobManager Pod: ${clusterId}-${random string}-${random string}
# The TaskManager Pod: ${clusterId}-taskmanager-${currentMaxAttemptId}-${currentMaxPodId}

In the future we will add other Kubernetes resources, and it would be better to have a unified naming convention for all of them. This ticket proposes the following naming convention:

* The Deployment: ${clusterId}
* The internal Service: ${clusterId}-inter-svc
* The external Service: ${clusterId}-rest-svc
* The Flink Configuration ConfigMap: ${clusterId}-flink-config
* The Hadoop Configuration ConfigMap: ${clusterId}-hadoop-config
* The JobManager Pod: ${clusterId}-${random string}-${random string}
* The TaskManager Pod: ${clusterId}-taskmanager-${currentMaxAttemptId}-${currentMaxPodId}
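The proposed convention above is easy to centralize in a single helper, which is the practical benefit of unifying the prefix-style names. The sketch below encodes the proposal's fixed-suffix names (the random/attempt suffixes for Pods are elided); the class itself is illustrative, not Flink's actual code.

```java
/**
 * Sketch of the unified naming scheme proposed in this ticket: every resource
 * name starts with the clusterId followed by a fixed suffix. Illustrative only.
 */
public class KubernetesResourceNames {
    static String deployment(String clusterId) {
        return clusterId;
    }

    static String internalService(String clusterId) {
        return clusterId + "-inter-svc";
    }

    static String externalService(String clusterId) {
        return clusterId + "-rest-svc";
    }

    static String flinkConfigMap(String clusterId) {
        return clusterId + "-flink-config";
    }

    static String hadoopConfigMap(String clusterId) {
        return clusterId + "-hadoop-config";
    }
}
```

One consequence of the ${clusterId}-prefix style is that all resources of a cluster sort together and can be matched with a single prefix, unlike the current flink-config-${clusterId} style.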
[jira] [Created] (FLINK-17008) Handle null value for HADOOP_CONF_DIR/HADOOP_HOME env in AbstractKubernetesParameters#getLocalHadoopConfigurationDirectory
Canbin Zheng created FLINK-17008:

Summary: Handle null value for HADOOP_CONF_DIR/HADOOP_HOME env in AbstractKubernetesParameters#getLocalHadoopConfigurationDirectory
Key: FLINK-17008
URL: https://issues.apache.org/jira/browse/FLINK-17008
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Reporter: Canbin Zheng
Fix For: 1.11.0

{{System.getenv(Constants.ENV_HADOOP_CONF_DIR)}} or {{System.getenv(Constants.HADOOP_HOME)}} can return null if the corresponding variable is not set in the system environment; it is a minor improvement to take these situations into consideration.
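A null-safe version of that lookup can be sketched as below. The resolver takes the env accessor as a function so it is testable; the class name and the {{HADOOP_HOME/etc/hadoop}} fallback convention are illustrative assumptions, not Flink's actual implementation.

```java
import java.util.Optional;
import java.util.function.Function;

/**
 * Sketch of a null-safe Hadoop conf dir lookup (illustrative names):
 * prefer HADOOP_CONF_DIR, fall back to HADOOP_HOME/etc/hadoop, and return
 * empty instead of dereferencing a null env value when neither is set.
 */
public class HadoopConfDirResolver {
    static Optional<String> resolve(Function<String, String> env) {
        String confDir = env.apply("HADOOP_CONF_DIR");
        if (confDir != null) {
            return Optional.of(confDir);
        }
        String hadoopHome = env.apply("HADOOP_HOME");
        if (hadoopHome != null) {
            return Optional.of(hadoopHome + "/etc/hadoop");
        }
        return Optional.empty();
    }
}
```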
[jira] [Created] (FLINK-16913) ReadableConfigToConfigurationAdapter#getEnum throws UnsupportedOperationException
Canbin Zheng created FLINK-16913: Summary: ReadableConfigToConfigurationAdapter#getEnum throws UnsupportedOperationException Key: FLINK-16913 URL: https://issues.apache.org/jira/browse/FLINK-16913 Project: Flink Issue Type: Improvement Components: Runtime / Configuration Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.10.1, 1.11.0 Attachments: image-2020-04-01-16-46-13-122.png Steps to reproduce the issue:
# Set in flink-conf.yaml:
** state.backend: rocksdb
** state.checkpoints.dir: hdfs:///flink-checkpoints
** state.savepoints.dir: hdfs:///flink-checkpoints
# Start a Kubernetes session cluster
# Submit a job to the session cluster; it fails with an UnsupportedOperationException:
{code:java}
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: The adapter does not support this method
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:143)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:659)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:890)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:963)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:963)
Caused by: java.lang.UnsupportedOperationException: The adapter does not support this method
	at org.apache.flink.configuration.ReadableConfigToConfigurationAdapter.getEnum(ReadableConfigToConfigurationAdapter.java:258)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.<init>(RocksDBStateBackend.java:336)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.configure(RocksDBStateBackend.java:394)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory.createFromConfig(RocksDBStateBackendFactory.java:47)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory.createFromConfig(RocksDBStateBackendFactory.java:32)
	at org.apache.flink.runtime.state.StateBackendLoader.loadStateBackendFromConfig(StateBackendLoader.java:154)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.loadStateBackend(StreamExecutionEnvironment.java:792)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.configure(StreamExecutionEnvironment.java:761)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.<init>(StreamExecutionEnvironment.java:217)
	at org.apache.flink.client.program.StreamContextEnvironment.<init>(StreamContextEnvironment.java:53)
	at org.apache.flink.client.program.StreamContextEnvironment.lambda$setAsContext$2(StreamContextEnvironment.java:103)
	at java.util.Optional.map(Optional.java:215)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment(StreamExecutionEnvironment.java:1882)
	at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
	... 8 more
{code}
The call stack is shown in the attached screenshot. I am wondering why we introduce {{ReadableConfigToConfigurationAdapter}} to wrap the {{Configuration}} but leave many of its methods throwing UnsupportedOperationException, which causes problems. Why don't we use the {{Configuration}} object directly?
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16855) Update K8s template for the standalone setup
Canbin Zheng created FLINK-16855: Summary: Update K8s template for the standalone setup Key: FLINK-16855 URL: https://issues.apache.org/jira/browse/FLINK-16855 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Following FLINK-16602 and the discussion in [https://github.com/apache/flink/pull/11456], we would like to update the K8s template for the standalone setup as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16737) Remove KubernetesUtils#getContentFromFile
Canbin Zheng created FLINK-16737: Summary: Remove KubernetesUtils#getContentFromFile Key: FLINK-16737 URL: https://issues.apache.org/jira/browse/FLINK-16737 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Since {{org.apache.flink.util.FileUtils}} already provides utilities such as {{readFile}} and {{readFileUtf8}} for reading file contents, we can remove {{KubernetesUtils#getContentFromFile}} and use {{FileUtils}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
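For illustration, what a {{readFileUtf8}}-style utility boils down to can be sketched with plain JDK I/O — this is an equivalent stand-alone version, not Flink's {{FileUtils}} source:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileDemo {
    // Read a whole file into a String, decoding it as UTF-8 — the behaviour
    // the duplicated KubernetesUtils#getContentFromFile re-implemented.
    static String readFileUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("flink-conf", ".yaml");
        Files.write(tmp, "kubernetes.cluster-id: demo".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFileUtf8(tmp));
        Files.delete(tmp);
    }
}
```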
[jira] [Created] (FLINK-16715) Always use the configuration argument in YarnClusterDescriptor#startAppMaster to make it more self-contained
Canbin Zheng created FLINK-16715: Summary: Always use the configuration argument in YarnClusterDescriptor#startAppMaster to make it more self-contained Key: FLINK-16715 URL: https://issues.apache.org/jira/browse/FLINK-16715 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Canbin Zheng Fix For: 1.11.0 In {{YarnClusterDescriptor#startAppMaster()}} we sometimes use the configuration argument of the method to get/set config options, and sometimes the flinkConfiguration class member. This ticket proposes to always use the configuration argument to make the method more self-contained. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16699) Support accessing secured services via K8s secrets
Canbin Zheng created FLINK-16699: Summary: Support accessing secured services via K8s secrets Key: FLINK-16699 URL: https://issues.apache.org/jira/browse/FLINK-16699 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Kubernetes [Secrets|https://kubernetes.io/docs/concepts/configuration/secret/] can be used to provide credentials for a Flink application to access secured services. This ticket proposes to:
# Support mounting user-specified K8s Secrets into the JobManager/TaskManager Container.
# Support using a user-specified K8s Secret through an environment variable.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
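A hedged YAML sketch of the two mechanisms the ticket proposes — mounting a Secret as files and exposing a Secret key as an environment variable. All concrete names here (my-flink-secret, the mount path, the env var) are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jobmanager-example
spec:
  containers:
    - name: flink-main-container
      image: flink:1.11
      env:
        # Expose a single key of the Secret as an environment variable.
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: my-flink-secret
              key: password
      volumeMounts:
        # Mount the whole Secret as read-only files.
        - name: secret-volume
          mountPath: /opt/flink/secrets
          readOnly: true
  volumes:
    - name: secret-volume
      secret:
        secretName: my-flink-secret
```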
[jira] [Created] (FLINK-16625) Extract BootstrapTools#getEnvironmentVariables to a general utility in ConfigurationUtil
Canbin Zheng created FLINK-16625: Summary: Extract BootstrapTools#getEnvironmentVariables to a general utility in ConfigurationUtil Key: FLINK-16625 URL: https://issues.apache.org/jira/browse/FLINK-16625 Project: Flink Issue Type: Improvement Components: API / Core Reporter: Canbin Zheng Fix For: 1.11.0 {{BootstrapTools#getEnvironmentVariables}} is actually a general utility that extracts key-value pairs, with a specified prefix trimmed, from the Flink Configuration object. It can be used not only to extract customized environment variables in the YARN setup but also for customized annotations/labels/node-selectors in the Kubernetes setup. This ticket proposes to rename it to {{ConfigurationUtil#getPrefixedKeyValuePairs}} and move it to the {{flink-core}} module as a more general utility shared between the YARN and Kubernetes setups. -- This message was sent by Atlassian Jira (v8.3.4#803005)
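The utility's behaviour can be sketched as follows. The method name mirrors the proposed {{ConfigurationUtil#getPrefixedKeyValuePairs}}, but this is an illustrative stand-alone version operating on a plain Map rather than a Flink Configuration:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixedKeyValues {
    /**
     * Extracts all entries whose key starts with the given prefix, with the
     * prefix trimmed off the resulting keys — e.g. with prefix
     * "containerized.master.env.", the entry "containerized.master.env.JAVA_OPTS"
     * becomes "JAVA_OPTS".
     */
    static Map<String, String> getPrefixedKeyValuePairs(String prefix, Map<String, String> config) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("containerized.master.env.JAVA_OPTS", "-Xmx1g");
        conf.put("kubernetes.jobmanager.labels.app", "flink");
        System.out.println(getPrefixedKeyValuePairs("containerized.master.env.", conf));
    }
}
```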
[jira] [Created] (FLINK-16624) Support to customize annotations for the rest Service on Kubernetes
Canbin Zheng created FLINK-16624: Summary: Support to customize annotations for the rest Service on Kubernetes Key: FLINK-16624 URL: https://issues.apache.org/jira/browse/FLINK-16624 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 There are scenarios where people would like to customize annotations for the rest Service, for example:
# Specify the LB type to decide whether it should have an external IP.
# Specify the Security Policy for the LB.
It's a common need, especially in cloud environments. This ticket proposes to add support for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16602) Rework the Service design for Kubernetes deployment
Canbin Zheng created FLINK-16602: Summary: Rework the Service design for Kubernetes deployment Key: FLINK-16602 URL: https://issues.apache.org/jira/browse/FLINK-16602 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 At the moment we usually create two Services for a Flink application: the internal Service, which forwards requests from the TMs to the JM, and the so-called rest Service, which mainly serves as an external service for the Flink application. Here is a summary of the issues:
# The functionality boundary of the two Services is not clear enough, since the internal Service can also act as the rest Service when its exposed type is ClusterIP.
# In the high availability scenario, we create a useless internal Service that does not help forward internal requests, since the TMs communicate directly with the JM via the IP or hostname of the JM Pod.
# A headless Service is enough to forward internal requests from the TMs to the JM. A Service of ClusterIP type adds corresponding rules to the iptables; too many iptables rules lower kube-proxy's efficiency in refreshing the iptables when notified of change events, which can cause severe stability problems in a Kubernetes cluster.
Therefore, we propose some improvements to the current design:
# Clarify the functionality boundary of the two Services: the internal Service only serves the internal communication from TMs to JM, while the rest Service makes the Flink cluster accessible from outside. The internal Service only exposes the RPC and BLOB ports, while the external one exposes the REST port.
# Do not create the internal Service in the high availability case.
# Use HEADLESS type for the internal Service.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16601) Correct the way to get Endpoint address for NodePort rest Service
Canbin Zheng created FLINK-16601: Summary: Correct the way to get Endpoint address for NodePort rest Service Key: FLINK-16601 URL: https://issues.apache.org/jira/browse/FLINK-16601 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Currently, if one sets the type of the rest Service to {{NodePort}}, the Endpoint address is obtained by calling 'KubernetesClient.getMasterUrl().getHost()'. This solution works fine for non-managed Kubernetes clusters but not for managed ones: in managed Kubernetes setups, the Kubernetes masters are deployed in a pool separate from the Kubernetes nodes and do not expose the NodePort of a NodePort Service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16600) Respect the rest.bind-port for the Kubernetes setup
Canbin Zheng created FLINK-16600: Summary: Respect the rest.bind-port for the Kubernetes setup Key: FLINK-16600 URL: https://issues.apache.org/jira/browse/FLINK-16600 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Our current logic only takes care of {{RestOptions.PORT}} but not {{RestOptions.BIND_PORT}}, which is a bug. For example, when one sets {{RestOptions.BIND_PORT}} to a value other than {{RestOptions.PORT}}, jobs cannot be submitted to an existing session cluster deployed via kubernetes-session.sh. -- This message was sent by Atlassian Jira (v8.3.4#803005)
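A hypothetical flink-conf.yaml excerpt showing the situation the ticket describes. To my understanding, rest.port is the port clients use to reach the REST endpoint, while rest.bind-port (which may also be given as a range) is what the server actually binds; when the two differ, deployment logic that only reads rest.port breaks:

```yaml
# Hypothetical excerpt — a setup where the server binds 8082 while clients
# are expected to connect on 8081 (e.g. behind a proxy). Logic that only
# honours rest.port mis-handles this case.
rest.port: 8081
rest.bind-port: "8082"
```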
[jira] [Created] (FLINK-16598) Respect the rest port exposed by Service in Fabric8FlinkKubeClient#getRestEndpoint
Canbin Zheng created FLINK-16598: Summary: Respect the rest port exposed by Service in Fabric8FlinkKubeClient#getRestEndpoint Key: FLINK-16598 URL: https://issues.apache.org/jira/browse/FLINK-16598 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Steps to reproduce the problem:
# Start a Kubernetes session cluster; by default, the rest port exposed by the rest Service is 8081.
# Submit a job to the session cluster, but specify a different rest port via -Drest.port=8082
As we can see, the job can never be submitted to the existing session cluster, since we retrieve the Endpoint from the Flink configuration object instead of from the created Service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16547) Correct the order to write temporary files in YarnClusterDescriptor#startAppMaster
Canbin Zheng created FLINK-16547: Summary: Correct the order to write temporary files in YarnClusterDescriptor#startAppMaster Key: FLINK-16547 URL: https://issues.apache.org/jira/browse/FLINK-16547 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Canbin Zheng Fix For: 1.11.0 Currently, in {{YarnClusterDescriptor#startAppMaster}}, we first write out and upload the Flink Configuration file, and only then write out the JobGraph file and set its name into the Flink Configuration object; this later setting is not written into the Flink Configuration file, so it does not take effect on the cluster side. Since on the client side we name the JobGraph file with the default value of the FileJobGraphRetriever.JOB_GRAPH_FILE_PATH option, the cluster side can still retrieve that file. This ticket proposes to write out the JobGraph file before the Configuration file, ensuring that the setting of FileJobGraphRetriever.JOB_GRAPH_FILE_PATH is delivered to the cluster side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16546) Fix logging bug in YarnClusterDescriptor#startAppMaster
Canbin Zheng created FLINK-16546: Summary: Fix logging bug in YarnClusterDescriptor#startAppMaster Key: FLINK-16546 URL: https://issues.apache.org/jira/browse/FLINK-16546 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Canbin Zheng Fix For: 1.11.0 It's a minor fixup of the warning in case of tmpJobGraphFile deletion failure. From {code:java} LOG.warn("Fail to delete temporary file {}.", tmpConfigurationFile.toPath()); {code} to {code:java} LOG.warn("Fail to delete temporary file {}.", tmpJobGraphFile.toPath()); {code} [https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L854] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16521) Remove unused FileUtils#isClassFile
Canbin Zheng created FLINK-16521: Summary: Remove unused FileUtils#isClassFile Key: FLINK-16521 URL: https://issues.apache.org/jira/browse/FLINK-16521 Project: Flink Issue Type: Improvement Components: API / Core Reporter: Canbin Zheng Fix For: 1.10.1, 1.11.0 Remove the unused public method of {{FileUtils#isClassFile}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16508) Name the ports exposed by the main Container in Pod
Canbin Zheng created FLINK-16508: Summary: Name the ports exposed by the main Container in Pod Key: FLINK-16508 URL: https://issues.apache.org/jira/browse/FLINK-16508 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.10.1, 1.11.0 Currently, we expose some ports via the main Container of the JobManager and the TaskManager, but we do not name those ports, which can confuse people because there is no description of the port usage. This ticket proposes to explicitly name the ports in the Container to help people understand what those ports are used for. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16494) Use enum type instead of string type for KubernetesConfigOptions.CONTAINER_IMAGE_PULL_POLICY
Canbin Zheng created FLINK-16494: Summary: Use enum type instead of string type for KubernetesConfigOptions.CONTAINER_IMAGE_PULL_POLICY Key: FLINK-16494 URL: https://issues.apache.org/jira/browse/FLINK-16494 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 At the moment the value of the {{KubernetesConfigOptions.CONTAINER_IMAGE_PULL_POLICY}} option can only be one of _Always_, _Never_, and _IfNotPresent_; this ticket proposes to use an enum type for the option instead of a string type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
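A minimal sketch of the enum-typed option idea. The enum values mirror the three values Kubernetes accepts for imagePullPolicy, but the class itself is hypothetical, not Flink's actual option definition:

```java
public class ImagePullPolicyDemo {
    // The three values Kubernetes accepts for a container's imagePullPolicy.
    enum ImagePullPolicy { Always, Never, IfNotPresent }

    static ImagePullPolicy parse(String value) {
        // valueOf fails fast on a typo instead of silently passing an
        // invalid string through to the Kubernetes API server.
        return ImagePullPolicy.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("IfNotPresent"));
    }
}
```

The benefit over a raw string option is that validation happens once, at configuration parsing time, with an explicit error message.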
[jira] [Created] (FLINK-16493) Use enum type instead of string type for KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE
Canbin Zheng created FLINK-16493: Summary: Use enum type instead of string type for KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE Key: FLINK-16493 URL: https://issues.apache.org/jira/browse/FLINK-16493 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 At the moment the value of the {{KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE}} option can only be one of _ClusterIP_, _NodePort_, and _LoadBalancer_; this ticket proposes to use an enum type for the option instead of a string type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16253) Switch to Log4j 2 by default for flink-kubernetes submodule
Canbin Zheng created FLINK-16253: Summary: Switch to Log4j 2 by default for flink-kubernetes submodule Key: FLINK-16253 URL: https://issues.apache.org/jira/browse/FLINK-16253 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Switch to Log4j 2 by default for flink-kubernetes submodule, including the script and the container startup command or parameters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16240) Port KubernetesUtilsTest to the right package
Canbin Zheng created FLINK-16240: Summary: Port KubernetesUtilsTest to the right package Key: FLINK-16240 URL: https://issues.apache.org/jira/browse/FLINK-16240 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Port {{KubernetesUtilsTest}} from {{org.apache.flink.kubernetes}} to {{org.apache.flink.kubernetes.utils}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16239) Port KubernetesSessionCliTest to the right package
Canbin Zheng created FLINK-16239: Summary: Port KubernetesSessionCliTest to the right package Key: FLINK-16239 URL: https://issues.apache.org/jira/browse/FLINK-16239 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Port KubernetesSessionCliTest from {{org.apache.flink.kubernetes}} to {{org.apache.flink.kubernetes.cli}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16238) Rename class name of Fabric8ClientTest to Fabric8FlinkKubeClient
Canbin Zheng created FLINK-16238: Summary: Rename class name of Fabric8ClientTest to Fabric8FlinkKubeClient Key: FLINK-16238 URL: https://issues.apache.org/jira/browse/FLINK-16238 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 It's a minor change to align the test class name with {{org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16194) Refactor the Kubernetes resources construction architecture
Canbin Zheng created FLINK-16194: Summary: Refactor the Kubernetes resources construction architecture Key: FLINK-16194 URL: https://issues.apache.org/jira/browse/FLINK-16194 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 So far, Flink has made efforts toward a native integration with Kubernetes. However, it is always essential to evaluate the existing design and consider alternatives that are better designed and easier to maintain in the long run. We have suffered from some problems while developing new features based on the current code. Here are some of them:
# We don't have a unified monadic-step based orchestrator architecture to construct all the Kubernetes resources.
** There are inconsistencies between the orchestrator architecture the client uses to create the Kubernetes resources and the one the master uses to create Pods. This confuses new contributors, as there is a cognitive burden in understanding two architectural philosophies instead of one; moreover, maintenance and new feature development become quite challenging.
** Pod construction is done in one step. With the introduction of new features for the Pod, the construction process could become far more complicated and the functionality of a single class could explode, which hurts code readability, writability, and testability. We have already encountered such challenges and realized that it is not easy to develop new features related to the Pod.
** The implementation of a specific feature is usually scattered across multiple decoration classes. For example, the current design uses a decoration class chain containing five Decorator classes to mount a configuration file into the Pod. If people would like to introduce support for other configuration files, such as the Hadoop configuration or Keytab files, they have no choice but to repeat the same tedious and scattered process.
# We don't have dedicated objects or tools for centrally parsing, verifying, and managing the Kubernetes parameters, which has raised maintenance and inconsistency issues.
** There is a lot of duplicated parsing and validation code, covering settings such as Image, ImagePullPolicy, ClusterID, ConfDir, Labels, etc. This not only harms readability and testability but is also prone to mistakes; refer to FLINK-16025 for an inconsistent parsing of the same parameter.
** The parameters are scattered, so some method signatures have to declare many unnecessary input parameters, such as FlinkMasterDeploymentDecorator#createJobManagerContainer.
To solve these issues, we propose to:
# Introduce a unified monadic-step based orchestrator architecture that provides a better, cleaner, and more consistent abstraction for the Kubernetes resources construction process.
# Add dedicated tools for centrally parsing, verifying, and managing the Kubernetes parameters.
Refer to the design doc for details; any feedback is welcome. -- This message was sent by Atlassian Jira (v8.3.4#803005)
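The monadic-step idea above can be sketched as follows: each step is a function from a partially-built spec to an enriched one, and the orchestrator simply folds the steps in order. This is a hypothetical minimal illustration (strings stand in for Pod specs), not Flink's actual API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class StepOrchestrator {
    /**
     * Fold a list of steps over an initial spec: each step receives the
     * output of the previous one, so features compose instead of piling
     * into a single monolithic construction class.
     */
    static String buildPodSpec(String initialSpec, List<UnaryOperator<String>> steps) {
        String spec = initialSpec;
        for (UnaryOperator<String> step : steps) {
            spec = step.apply(spec);
        }
        return spec;
    }

    public static void main(String[] args) {
        // Each "decorator" is one focused step; adding a feature means adding a step.
        List<UnaryOperator<String>> steps = Arrays.asList(
                spec -> spec + "+mainContainer",
                spec -> spec + "+flinkConfVolume",
                spec -> spec + "+hadoopConfVolume");
        System.out.println(buildPodSpec("pod", steps));
    }
}
```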
[jira] [Created] (FLINK-16025) Service could expose different blob server port mismatched with JM Container
Canbin Zheng created FLINK-16025: Summary: Service could expose different blob server port mismatched with JM Container Key: FLINK-16025 URL: https://issues.apache.org/jira/browse/FLINK-16025 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.10.1, 1.11.0 The Service always exposes port 6124 for the blob server, and while building the ServicePort we do not explicitly specify a target port, so the target port is always 6124 too:
{code:java}
// From ServiceDecorator.java
servicePorts.add(getServicePort(
    getPortName(BlobServerOptions.PORT.key()),
    Constants.BLOB_SERVER_PORT));

private ServicePort getServicePort(String name, int port) {
    return new ServicePortBuilder()
        .withName(name)
        .withPort(port)
        .build();
}
{code}
Meanwhile, the Container of the JM exposes the blob server port configured in the Flink Configuration:
{code:java}
// From FlinkMasterDeploymentDecorator.java
final int blobServerPort = KubernetesUtils.parsePort(flinkConfig, BlobServerOptions.PORT);
...
final Container container = createJobManagerContainer(flinkConfig, mainClass, hasLogback, hasLog4j, blobServerPort);
{code}
So if one configures the blob server port to a value other than 6124, the Service exposes a blob server port different from the one the JM Container listens on, and the TMs risk failing to execute Tasks because they cannot fetch their dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
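A possible shape of the fix, sketched against the fabric8 builder API used in the snippets above (not compiled here; it assumes the fabric8 kubernetes-client model classes on the classpath, and the extra parameter is my addition):

```java
import io.fabric8.kubernetes.api.model.IntOrString;
import io.fabric8.kubernetes.api.model.ServicePort;
import io.fabric8.kubernetes.api.model.ServicePortBuilder;

class ServicePortFix {
    /**
     * Build a ServicePort whose targetPort explicitly follows the configured
     * blob server port, instead of implicitly defaulting to the Service
     * port (6124).
     */
    static ServicePort getServicePort(String name, int port, int targetPort) {
        return new ServicePortBuilder()
            .withName(name)
            .withPort(port)
            .withTargetPort(new IntOrString(targetPort))
            .build();
    }
}
```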
[jira] [Created] (FLINK-15857) Support --help for the kubernetes-session script
Canbin Zheng created FLINK-15857: Summary: Support --help for the kubernetes-session script Key: FLINK-15857 URL: https://issues.apache.org/jira/browse/FLINK-15857 Project: Flink Issue Type: Sub-task Components: Command Line Client, Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 Add --help support for the kubernetes-session script to show usage of the Kubernetes session options for users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15856) Various Kubernetes command client improvements
Canbin Zheng created FLINK-15856: Summary: Various Kubernetes command client improvements Key: FLINK-15856 URL: https://issues.apache.org/jira/browse/FLINK-15856 Project: Flink Issue Type: Improvement Components: Command Line Client, Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 This is the umbrella issue for the Kubernetes related command client improvements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15852) Job is submitted to the wrong session cluster
Canbin Zheng created FLINK-15852: Summary: Job is submitted to the wrong session cluster Key: FLINK-15852 URL: https://issues.apache.org/jira/browse/FLINK-15852 Project: Flink Issue Type: Bug Components: Command Line Client Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0, 1.10.1 Steps to reproduce the problem:
# Deploy a YARN session cluster with {{./bin/yarn-session.sh -d}}
# Deploy a Kubernetes session cluster with {{./bin/kubernetes-session.sh -Dkubernetes.cluster-id=test ...}}
# Try to submit a job to the Kubernetes session cluster with {{./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=test examples/streaming/WordCount.jar}}
It is expected that the job will be submitted to the Kubernetes session cluster whose cluster-id is *test*; however, the job is submitted to the YARN session cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15843) Do not violently kill TaskManagers
Canbin Zheng created FLINK-15843: Summary: Do not violently kill TaskManagers Key: FLINK-15843 URL: https://issues.apache.org/jira/browse/FLINK-15843 Project: Flink Issue Type: Sub-task Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0 The current way of stopping a TaskManager instance when the JobManager sends a deletion request is to directly call {{KubernetesClient.pods().withName().delete}}, so the instance is violently killed with a _KILL_ signal and has no chance to clean up. This can cause problems, because we expect the process to terminate gracefully when it is no longer needed. According to the guide on [Termination of Pods|https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods], on Kubernetes a _TERM_ signal is first sent to the main process in each container, and may be followed by a force _KILL_ signal if the grace period has expired. The Unix signal is sent to the process with PID 1 ([Docker Kill|https://docs.docker.com/engine/reference/commandline/kill/]); however, the TaskManagerRunner process is spawned by /opt/flink/bin/kubernetes-entry.sh and never has PID 1, so it never receives the Unix signal. One workaround could be that the JobManager first sends a *KILL_WORKER* message to the TaskManager, then the TaskManager gracefully terminates itself to ensure that the clean-up is completely finished, and lastly the JobManager deletes the Pod after a configurable graceful shut-down period. -- This message was sent by Atlassian Jira (v8.3.4#803005)
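The PID-1 point can be demonstrated with plain shell. When an entry script launches the JVM with `exec`, the JVM replaces the shell process and inherits its PID, so in a container it becomes PID 1 and receives the TERM signal; without `exec`, the shell keeps PID 1 and the signal never reaches the JVM. A small sketch (the demo uses nested `sh` instead of a real JVM):

```shell
#!/bin/sh
# Demo: `exec` replaces the current shell rather than forking a child.
# Both lines print the SAME pid, because the exec'd shell reuses it —
# which is exactly why `exec java ...` in an entrypoint makes the JVM PID 1.
sh -c 'echo "before exec: $$"; exec sh -c "echo \"after exec: \$\$\""'
```

In a real entrypoint this would look like `exec java ... TaskManagerRunner "$@"` as the last line of the script (a hypothetical command, not the actual kubernetes-entry.sh contents).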
[jira] [Created] (FLINK-15822) Rethink the necessity of the internal Service
Canbin Zheng created FLINK-15822: Summary: Rethink the necessity of the internal Service Key: FLINK-15822 URL: https://issues.apache.org/jira/browse/FLINK-15822 Project: Flink Issue Type: Sub-task Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0, 1.10.1 The current design introduces two Kubernetes Services when deploying a new session cluster. The rest Service serves external communication, while the internal Service mainly serves two purposes:
# As a discovery service, it directs communication from TaskManagers to the JobManager Pod whose labels contain the internal Service's selector in the non-HA mode, so that the TM Pods can re-register to the new JM Pod once a JM Pod failover occurs. In the HA mode, there can be one active and multiple standby JM Pods, so we use the Pod IP of the active one for internal communication instead of the internal Service.
# It is the OwnerReference of all other Kubernetes resources, including the rest Service, the ConfigMap, and the JobManager Deployment.
Is it possible to create one single Service instead of two? I think things could work quite well with only the rest Service, and the design and code could be more succinct. This ticket proposes to remove the internal Service; the core changes include:
# In the non-HA mode, use the rest Service as the JobManager Pod discovery service.
# Set the JobManager Deployment as the OwnerReference of all the other Kubernetes resources, including the rest Service and the ConfigMap.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15817) Kubernetes Resource leak while deployment exception happens
Canbin Zheng created FLINK-15817: Summary: Kubernetes Resource leak while deployment exception happens Key: FLINK-15817 URL: https://issues.apache.org/jira/browse/FLINK-15817 Project: Flink Issue Type: Sub-task Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0, 1.10.1 When we deploy a new session cluster on a Kubernetes cluster, there are usually four steps to create the Kubernetes components, in the following order: internal Service -> rest Service -> ConfigMap -> JobManager Deployment. After the internal Service is created, any exception that fails the cluster deployment process causes a Kubernetes resource leak, for example:
# If creation of the rest Service fails due to the service name constraint ([FLINK-15816|https://issues.apache.org/jira/browse/FLINK-15816]), the internal Service is not cleaned up when the deployment process terminates.
# If creation of the JobManager Deployment fails (one case is that _jobmanager.heap.size_ is too small, e.g. 512M, which is less than the default value of 'containerized.heap-cutoff-min'), the internal Service, the rest Service, and the ConfigMap all leak.
This ticket proposes to clean up the residual Services and ConfigMap on the client side if the cluster deployment process terminates unexpectedly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
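The proposed clean-up can be sketched as a rollback pattern: each creation step registers an undo action, and if a later step fails, the already-created resources are deleted in reverse creation order. A hypothetical stand-alone illustration (strings stand in for Kubernetes resources; not Flink code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DeploymentCleanupDemo {
    /** Run steps in order; on failure, run registered undo actions in reverse. */
    static void deployAll(List<Runnable> steps, Deque<Runnable> cleanups) {
        try {
            for (Runnable step : steps) {
                step.run();
            }
        } catch (RuntimeException e) {
            // Undo already-created resources in reverse creation order.
            while (!cleanups.isEmpty()) {
                cleanups.pop().run();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> created = new ArrayList<>();
        Deque<Runnable> cleanups = new ArrayDeque<>();
        List<Runnable> steps = new ArrayList<>();
        for (String res : new String[] {"internal Service", "rest Service", "ConfigMap"}) {
            steps.add(() -> {
                created.add(res);                          // "create" the resource
                cleanups.push(() -> created.remove(res));  // register its undo action
            });
        }
        // The JobManager Deployment step fails, e.g. because the heap is too small.
        steps.add(() -> { throw new RuntimeException("Deployment creation failed"); });
        try {
            deployAll(steps, cleanups);
        } catch (RuntimeException expected) {
            System.out.println("leaked resources: " + created);  // empty — all cleaned up
        }
    }
}
```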
[jira] [Created] (FLINK-15816) Limit the maximum length of the value of kubernetes.cluster-id configuration option
Canbin Zheng created FLINK-15816: Summary: Limit the maximum length of the value of kubernetes.cluster-id configuration option Key: FLINK-15816 URL: https://issues.apache.org/jira/browse/FLINK-15816 Project: Flink Issue Type: Sub-task Components: Deployment / Kubernetes Affects Versions: 1.10.0 Reporter: Canbin Zheng Fix For: 1.11.0, 1.10.1 Two Kubernetes Services are created when a session cluster is deployed: the internal Service and the rest Service. The internal Service name is set to the value of the _kubernetes.cluster-id_ configuration option, and the rest Service name is _${kubernetes.cluster-id}_ with the suffix *-rest* appended. For example, if _kubernetes.cluster-id_ is set to *session-test*, the internal Service name is *session-test* and the rest Service name is *session-test-rest*. Kubernetes constrains a Service name to no more than 63 characters, so under the current naming convention the value of _kubernetes.cluster-id_ must not exceed 58 characters; otherwise the internal Service may be created successfully and a ClusterDeploymentException is then thrown when creating the rest Service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
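A simple client-side check could fail fast before any resource is created. This is a sketch with illustrative class, constant and method names:

```java
public final class ClusterIdValidator {
    // Kubernetes caps Service names at 63 characters (the DNS label limit);
    // the rest Service appends the 5-character suffix "-rest".
    private static final int MAX_SERVICE_NAME_LENGTH = 63;
    private static final String REST_SERVICE_SUFFIX = "-rest";
    static final int MAX_CLUSTER_ID_LENGTH =
        MAX_SERVICE_NAME_LENGTH - REST_SERVICE_SUFFIX.length(); // 58

    /** Rejects cluster ids whose rest Service name would exceed 63 characters. */
    static void validate(String clusterId) {
        if (clusterId.length() > MAX_CLUSTER_ID_LENGTH) {
            throw new IllegalArgumentException(
                "kubernetes.cluster-id must be at most " + MAX_CLUSTER_ID_LENGTH
                    + " characters so that '" + clusterId + REST_SERVICE_SUFFIX
                    + "' fits the Kubernetes Service name limit of "
                    + MAX_SERVICE_NAME_LENGTH + ".");
        }
    }
}
```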
[jira] [Created] (FLINK-15667) Hadoop Configurations Mount Support
Canbin Zheng created FLINK-15667: Summary: Hadoop Configurations Mount Support Key: FLINK-15667 URL: https://issues.apache.org/jira/browse/FLINK-15667 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng Mount the Hadoop configuration, either from a pre-defined ConfigMap or from a local configuration directory, onto the Pod. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15666) GPU scheduling support in Kubernetes mode
Canbin Zheng created FLINK-15666: Summary: GPU scheduling support in Kubernetes mode Key: FLINK-15666 URL: https://issues.apache.org/jira/browse/FLINK-15666 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng This is an umbrella ticket for work on GPU scheduling in Kubernetes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15665) External shuffle service support in Kubernetes mode
Canbin Zheng created FLINK-15665: Summary: External shuffle service support in Kubernetes mode Key: FLINK-15665 URL: https://issues.apache.org/jira/browse/FLINK-15665 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng This is an umbrella ticket for work on Kubernetes-specific external shuffle service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15663) Kerberos Support in Kubernetes Deploy Mode
Canbin Zheng created FLINK-15663: Summary: Kerberos Support in Kubernetes Deploy Mode Key: FLINK-15663 URL: https://issues.apache.org/jira/browse/FLINK-15663 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng This is the umbrella issue for all Kerberos-related tasks concerning Flink on Kubernetes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15662) Make K8s client timeouts configurable
Canbin Zheng created FLINK-15662: Summary: Make K8s client timeouts configurable Key: FLINK-15662 URL: https://issues.apache.org/jira/browse/FLINK-15662 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng The Kubernetes clients used for client-side submission and for requesting worker pods should have configurable read and connect timeouts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
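With the fabric8 client, the two timeouts can be passed through its ConfigBuilder. A minimal sketch, assuming the millisecond values come from new Flink configuration options (the method name is illustrative):

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public final class TimeoutConfigSketch {

    /** Builds a client whose connect and read (request) timeouts are configurable. */
    static KubernetesClient createClient(int connectTimeoutMs, int requestTimeoutMs) {
        Config config = new ConfigBuilder()
            .withConnectionTimeout(connectTimeoutMs) // e.g. 10_000
            .withRequestTimeout(requestTimeoutMs)    // e.g. 60_000
            .build();
        return new DefaultKubernetesClient(config);
    }
}
```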
[jira] [Created] (FLINK-15659) Introduce a client-side StatusWatcher for JM Pods
Canbin Zheng created FLINK-15659: Summary: Introduce a client-side StatusWatcher for JM Pods Key: FLINK-15659 URL: https://issues.apache.org/jira/browse/FLINK-15659 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng It would be quite useful to introduce a client-side StatusWatcher that tracks and logs the status of the JobManager Pods, so that users who deploy applications on K8s via the CLI can conveniently follow the deployment progress. The status could be logged on every state change, or at a fixed interval. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15656) Support user-specified pod templates
Canbin Zheng created FLINK-15656: Summary: Support user-specified pod templates Key: FLINK-15656 URL: https://issues.apache.org/jira/browse/FLINK-15656 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng The current approach of introducing a new configuration option for each aspect of the pod specification a user might wish to customize is becoming unwieldy: we have to maintain more and more Flink-side Kubernetes configuration options, and users have to bridge the gap between the declarative model used by Kubernetes and the configuration model used by Flink. It would be a great improvement to allow users to specify pod templates as a central place for all customization needs of the jobmanager and taskmanager pods. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15654) Expose podIP for Containers by Environment Variables
Canbin Zheng created FLINK-15654: Summary: Expose podIP for Containers by Environment Variables Key: FLINK-15654 URL: https://issues.apache.org/jira/browse/FLINK-15654 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng Expose the IP of a Pod to its containers as an environment variable, via the Kubernetes downward API:
{code:java}
new ContainerBuilder()
    .addNewEnv()
        .withName(ENV_JOBMANAGER_BIND_ADDRESS)
        .withValueFrom(new EnvVarSourceBuilder()
            .withNewFieldRef("v1", "status.podIP")
            .build())
    .endEnv()
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15653) Support to configure request or limit for MEM requirement
Canbin Zheng created FLINK-15653: Summary: Support to configure request or limit for MEM requirement Key: FLINK-15653 URL: https://issues.apache.org/jira/browse/FLINK-15653 Project: Flink Issue Type: New Feature Reporter: Canbin Zheng -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15652) Support for imagePullSecrets k8s option
Canbin Zheng created FLINK-15652: Summary: Support for imagePullSecrets k8s option Key: FLINK-15652 URL: https://issues.apache.org/jira/browse/FLINK-15652 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng It is likely that users will pull images from a private image registry. Credentials can be passed with the Pod specification through the `imagePullSecrets` parameter, which refers to a k8s Secret by name. Implementation-wise, we expose a new configuration option to the users and then pass it along to Kubernetes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
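Passing the option through to the pod spec could look like this fabric8 sketch (the method name is illustrative, and the secret name is a placeholder for the user-configured value):

```java
import io.fabric8.kubernetes.api.model.LocalObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.PodSpec;
import io.fabric8.kubernetes.api.model.PodSpecBuilder;

public final class ImagePullSecretSketch {

    /** Attaches the named k8s Secret as an image pull credential to the pod spec. */
    static PodSpec withPullSecret(String secretName) {
        return new PodSpecBuilder()
            .withImagePullSecrets(
                new LocalObjectReferenceBuilder().withName(secretName).build())
            .build();
    }
}
```

The Secret itself (e.g. one created with `kubectl create secret docker-registry`) must already exist in the cluster; Flink would only reference it by name.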
[jira] [Created] (FLINK-15649) Support mounting volumes
Canbin Zheng created FLINK-15649: Summary: Support mounting volumes Key: FLINK-15649 URL: https://issues.apache.org/jira/browse/FLINK-15649 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15648) Support to configure request or limit for CPU requirement
Canbin Zheng created FLINK-15648: Summary: Support to configure request or limit for CPU requirement Key: FLINK-15648 URL: https://issues.apache.org/jira/browse/FLINK-15648 Project: Flink Issue Type: New Feature Reporter: Canbin Zheng The current branch uses kubernetes.xx.cpu to configure both the request and the limit resource requirement for a Container. It would be an important improvement to separate these two configurations: we can use kubernetes.xx.request.cpu and kubernetes.xx.limit.cpu to specify the request and the limit resource requirements respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
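Mapping the two proposed options onto a container's resource requirements could be sketched as follows with fabric8 (the class and method names are illustrative; the option names are the ones proposed in this ticket):

```java
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.api.model.ResourceRequirements;
import io.fabric8.kubernetes.api.model.ResourceRequirementsBuilder;

public final class CpuResourceSketch {

    /** The request is the guaranteed share used for scheduling; the limit is a hard cap. */
    static ResourceRequirements cpuRequirements(String requestCpu, String limitCpu) {
        return new ResourceRequirementsBuilder()
            .addToRequests("cpu", new Quantity(requestCpu)) // kubernetes.xx.request.cpu
            .addToLimits("cpu", new Quantity(limitCpu))     // kubernetes.xx.limit.cpu
            .build();
    }
}
```

Separating the two lets users run with a small request for dense packing while still allowing bursts up to the limit.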
[jira] [Created] (FLINK-15647) Support to set annotations for JM/TM Pods
Canbin Zheng created FLINK-15647: Summary: Support to set annotations for JM/TM Pods Key: FLINK-15647 URL: https://issues.apache.org/jira/browse/FLINK-15647 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng One can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels. [Kubernetes Annotation|https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15646) Configurable K8s context support
Canbin Zheng created FLINK-15646: Summary: Configurable K8s context support Key: FLINK-15646 URL: https://issues.apache.org/jira/browse/FLINK-15646 Project: Flink Issue Type: New Feature Components: Deployment / Kubernetes Reporter: Canbin Zheng When working with multiple K8S clusters, or with multiple K8S "users" for interacting with a given cluster, one is currently forced to run {{kubectl config use-context}} first to switch to the desired context. It would therefore be an important improvement to add an option (kubernetes.context) for configuring an arbitrary context when deploying a Flink cluster. If that option is not specified, the current context in the config file is used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
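With the fabric8 client, honoring a configured context is close to a one-liner. A sketch, assuming `Config.autoConfigure` behaves as documented (it loads the kubeconfig and switches to the named context; `null` keeps the file's current context):

```java
import io.fabric8.kubernetes.client.Config;

public final class ContextConfigSketch {

    /**
     * Loads the kubeconfig, switching to the context named by the proposed
     * kubernetes.context option; passing null keeps the file's current context.
     */
    static Config loadConfig(String configuredContext) {
        return Config.autoConfigure(configuredContext);
    }
}
```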
[jira] [Created] (FLINK-14432) Add 'SHOW SYSTEM FUNCTIONS' support in SQL CLI
Canbin Zheng created FLINK-14432: Summary: Add 'SHOW SYSTEM FUNCTIONS' support in SQL CLI Key: FLINK-14432 URL: https://issues.apache.org/jira/browse/FLINK-14432 Project: Flink Issue Type: Improvement Components: Table SQL / Client Reporter: Canbin Zheng Add 'SHOW SYSTEM FUNCTIONS' as an end-to-end functionality for the SQL CLI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14179) Wrong description of SqlCommand.SHOW_FUNCTIONS
Canbin Zheng created FLINK-14179: Summary: Wrong description of SqlCommand.SHOW_FUNCTIONS Key: FLINK-14179 URL: https://issues.apache.org/jira/browse/FLINK-14179 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.9.0 Reporter: Canbin Zheng Fix For: 1.10.0 Attachments: image-2019-09-24-10-59-26-286.png Currently '*SHOW FUNCTIONS*' lists not only user-defined functions but also system-defined ones; the description *'Shows all registered user-defined functions.'* does not correctly describe this functionality. I think we can change the description to '*Shows all system-defined and user-defined functions.*' !image-2019-09-24-10-59-26-286.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14171) Add 'SHOW USER FUNCTIONS' support in SQL CLI
Canbin Zheng created FLINK-14171: Summary: Add 'SHOW USER FUNCTIONS' support in SQL CLI Key: FLINK-14171 URL: https://issues.apache.org/jira/browse/FLINK-14171 Project: Flink Issue Type: New Feature Components: Table SQL / Client Affects Versions: 1.9.0 Reporter: Canbin Zheng Fix For: 1.10.0 Currently *listUserDefinedFunctions* is already supported in the Executor; I think we can add 'SHOW USER FUNCTIONS' support to make it an end-to-end functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14102) Introduce DB2Dialect
Canbin Zheng created FLINK-14102: Summary: Introduce DB2Dialect Key: FLINK-14102 URL: https://issues.apache.org/jira/browse/FLINK-14102 Project: Flink Issue Type: Sub-task Reporter: Canbin Zheng -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-14101) Introduce SqlServerDialect
Canbin Zheng created FLINK-14101: Summary: Introduce SqlServerDialect Key: FLINK-14101 URL: https://issues.apache.org/jira/browse/FLINK-14101 Project: Flink Issue Type: Sub-task Reporter: Canbin Zheng -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-14100) Introduce OracleDialect
Canbin Zheng created FLINK-14100: Summary: Introduce OracleDialect Key: FLINK-14100 URL: https://issues.apache.org/jira/browse/FLINK-14100 Project: Flink Issue Type: Sub-task Reporter: Canbin Zheng -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-14098) Support multiple sql statements splitting by semicolon for TableEnvironment
Canbin Zheng created FLINK-14098: Summary: Support multiple sql statements splitting by semicolon for TableEnvironment Key: FLINK-14098 URL: https://issues.apache.org/jira/browse/FLINK-14098 Project: Flink Issue Type: New Feature Reporter: Canbin Zheng Currently TableEnvironment.sqlUpdate supports parsing a single sql statement by invoking SqlParser.parseStmt. After the work of [CALCITE-2453|https://issues.apache.org/jira/browse/CALCITE-2453], Calcite's SqlParser is able to parse multiple sql statements separated by semicolons. IMO, it would be useful to refactor TableEnvironment.sqlUpdate to support multiple sql statements too, by invoking SqlParser.parseStmtList instead. I am not sure whether this is a duplicate ticket; if it is, let me know, thanks. -- This message was sent by Atlassian Jira (v8.3.2#803003)
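The proposed change boils down to switching the parser entry point. A minimal sketch against Calcite's parser API, assuming the default parser configuration (the class and method names are illustrative):

```java
import org.apache.calcite.sql.SqlNodeList;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public final class MultiStatementParseSketch {

    /**
     * Parses e.g. "INSERT INTO t1 SELECT * FROM s; INSERT INTO t2 SELECT * FROM s"
     * into a SqlNodeList with one SqlNode per statement, where parseStmt
     * would accept only a single statement.
     */
    static SqlNodeList parseStatements(String sql) throws SqlParseException {
        return SqlParser.create(sql).parseStmtList();
    }
}
```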
[jira] [Created] (FLINK-14078) Introduce more JDBCDialect implementations
Canbin Zheng created FLINK-14078: Summary: Introduce more JDBCDialect implementations Key: FLINK-14078 URL: https://issues.apache.org/jira/browse/FLINK-14078 Project: Flink Issue Type: New Feature Components: Connectors / JDBC Reporter: Canbin Zheng MySQL, Derby and Postgres JDBCDialect are available now, maybe we can introduce more JDBCDialect implementations, such as OracleDialect, SqlServerDialect, DB2Dialect, etc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-14075) Remove unnecessary resource close in JDBCOutputFormatTest.java
Canbin Zheng created FLINK-14075: Summary: Remove unnecessary resource close in JDBCOutputFormatTest.java Key: FLINK-14075 URL: https://issues.apache.org/jira/browse/FLINK-14075 Project: Flink Issue Type: Improvement Components: Connectors / JDBC Reporter: Canbin Zheng Fix For: 1.10.0
{code:java}
public void clearOutputTable() throws Exception {
    Class.forName(DRIVER_CLASS);
    try (
        Connection conn = DriverManager.getConnection(DB_URL);
        Statement stat = conn.createStatement()) {
        stat.execute("DELETE FROM " + OUTPUT_TABLE);
        stat.close(); // redundant: try-with-resources closes stat automatically
        conn.close(); // redundant: try-with-resources closes conn automatically
    }
}
{code}
The explicit calls to stat.close() and conn.close() are unnecessary and can be removed: try-with-resources already closes both resources automatically, in reverse order of declaration. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-6269) var could be a val
CanBin Zheng created FLINK-6269: --- Summary: var could be a val Key: FLINK-6269 URL: https://issues.apache.org/jira/browse/FLINK-6269 Project: Flink Issue Type: Wish Components: JobManager Affects Versions: 1.2.0 Reporter: CanBin Zheng Priority: Trivial In JobManager.scala var userCodeLoader = libraryCacheManager.getClassLoader(jobGraph.getJobID) this var could be a val -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6224) RemoteStreamEnvironment not resolve hostname of JobManager
CanBin Zheng created FLINK-6224: --- Summary: RemoteStreamEnvironment not resolve hostname of JobManager Key: FLINK-6224 URL: https://issues.apache.org/jira/browse/FLINK-6224 Project: Flink Issue Type: Bug Components: Client, DataStream API Affects Versions: 1.2.0 Reporter: CanBin Zheng Assignee: CanBin Zheng I ran two examples on the same client. The first uses ExecutionEnvironment.createRemoteEnvironment("10.75.203.170", 59551); the second uses StreamExecutionEnvironment.createRemoteEnvironment("10.75.203.170", 59551). The first runs successfully, but the second fails (connecting to the JobManager times out); for the second one, if I change the host parameter from the IP to the hostname, it works. I checked the source code and found that ExecutionEnvironment.createRemoteEnvironment resolves the given hostname, which looks up the hostname for the given IP. In contrast, StreamExecutionEnvironment.createRemoteEnvironment does not. As Till Rohrmann mentioned, the problem is that with FLINK-2821 [1] we can no longer resolve the hostname on the JobManager, so we'd better resolve the hostname for the given IP in RemoteStreamEnvironment too. -- This message was sent by Atlassian JIRA (v6.3.15#6346)