[jira] [Commented] (FLINK-33112) Support placement constraint
[ https://issues.apache.org/jira/browse/FLINK-33112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778513#comment-17778513 ] Junfan Zhang commented on FLINK-33112: -- [https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html] The detailed info is here. > Support placement constraint > > > Key: FLINK-33112 > URL: https://issues.apache.org/jira/browse/FLINK-33112 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > Yarn placement constraint is introduced in hadoop3.2.0 , which is useful for > specify affinity or anti-affinity or colocation with K8s -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33112) Support placement constraint
[ https://issues.apache.org/jira/browse/FLINK-33112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773083#comment-17773083 ] Junfan Zhang commented on FLINK-33112: -- PTAL [~guoyangze] > Support placement constraint > > > Key: FLINK-33112 > URL: https://issues.apache.org/jira/browse/FLINK-33112 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > Yarn placement constraint is introduced in hadoop3.2.0 , which is useful for > specify affinity or anti-affinity or colocation with K8s -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33112) Support placement constraint
Junfan Zhang created FLINK-33112: Summary: Support placement constraint Key: FLINK-33112 URL: https://issues.apache.org/jira/browse/FLINK-33112 Project: Flink Issue Type: New Feature Components: Deployment / YARN Reporter: Junfan Zhang Yarn placement constraint is introduced in hadoop3.2.0 , which is useful for specify affinity or anti-affinity or colocation with K8s -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn
[ https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-27550: - Description: When invoking the method of {{checkYarnQueues}} in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of {{QueueInfo.getQueuePath}} instead of {{getQueueName}}. Due to this, it will always print out the yarn all queues log, but it also can be submitted to Yarn successfully. The api of getQueuePath is introduced in the latest hadoop version(https://issues.apache.org/jira/browse/YARN-10658), so it's hard to solve this problem in the older hadoop cluster. According to the above description, the process of checking is unnecessary. was: When invoking the method of {{checkYarnQueues}} in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of {{QueueInfo.getQueuePath}} instead of {{getQueueName}}. Due to this, it will always print out the yarn all queues log, but it also can be submitted to Yarn successfully. According to the above description, the process of checking is unnecessary. > Remove checking yarn queues before submitting job to Yarn > - > > Key: FLINK-27550 > URL: https://issues.apache.org/jira/browse/FLINK-27550 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > When invoking the method of {{checkYarnQueues}} in YarnClusterDescriptor, > it will check the specified yarnQueue whether exists in the queues gotten by > the YarnClient.QueueInfo. > However when using the capacity-scheduler, the yarn queues path should be > retrieved by the api of {{QueueInfo.getQueuePath}} > instead of {{getQueueName}}. > Due to this, it will always print out the yarn all queues log, but it also > can be submitted to Yarn successfully. > The api of getQueuePath is introduced in the latest hadoop > version(https://issues.apache.org/jira/browse/YARN-10658), so it's hard to > solve this problem in the older hadoop cluster. > According to the above description, the process of checking is unnecessary. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn
[ https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-27550: - Description: When invoking the method of {{checkYarnQueues}} in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of {{QueueInfo.getQueuePath}} instead of {{getQueueName}}. Due to this, it will always print out the yarn all queues log, but it also can be submitted to Yarn successfully. According to the above description, the process of checking is unnecessary. was: When invoking the method of checkYarnQueues in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of QueueInfo.getQueuePath instead of getQueueName. Due to this, it will always print out the yarn queues log, but it also can be submmitted to Yarn successfully. According to the above description, the process of checking is unnecessary. > Remove checking yarn queues before submitting job to Yarn > - > > Key: FLINK-27550 > URL: https://issues.apache.org/jira/browse/FLINK-27550 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > When invoking the method of {{checkYarnQueues}} in YarnClusterDescriptor, > it will check the specified yarnQueue whether exists in the queues gotten by > the YarnClient.QueueInfo. > However when using the capacity-scheduler, the yarn queues path should be > retrieved by the api of {{QueueInfo.getQueuePath}} > instead of {{getQueueName}}. > Due to this, it will always print out the yarn all queues log, but it also > can be submitted to Yarn successfully. > According to the above description, the process of checking is unnecessary. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn
[ https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-27550: - Description: When invoking the method of checkYarnQueues in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of QueueInfo.getQueuePath instead of getQueueName. Due to this, it will always print out the yarn queues log, but it also can be submmitted to Yarn successfully. According to the above description, the process of checking is unnecessary. was: When invoking the method of checkYarnQueues in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of QueueInfo.getQueuePath instead of getQueueName. Due to this, it will always print out the yarn queues log, but it also can be submmitted to Yarn successfully. According to the above description, the process of checking is unnecessary. > Remove checking yarn queues before submitting job to Yarn > - > > Key: FLINK-27550 > URL: https://issues.apache.org/jira/browse/FLINK-27550 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > When invoking the method of checkYarnQueues in YarnClusterDescriptor, it will > check the specified yarnQueue whether exists in the queues gotten by the > YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn > queues path should be retrieved by the api of QueueInfo.getQueuePath instead > of getQueueName. Due to this, it will always print out the yarn queues log, > but it also can be submmitted to Yarn successfully. > According to the above description, the process of checking is unnecessary. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn
Junfan Zhang created FLINK-27550: Summary: Remove checking yarn queues before submitting job to Yarn Key: FLINK-27550 URL: https://issues.apache.org/jira/browse/FLINK-27550 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Junfan Zhang When invoking the method of checkYarnQueues in YarnClusterDescriptor, it will check the specified yarnQueue whether exists in the queues gotten by the YarnClient.QueueInfo. However when using the capacity-scheduler, the yarn queues path should be retrieved by the api of QueueInfo.getQueuePath instead of getQueueName. Due to this, it will always print out the yarn queues log, but it also can be submmitted to Yarn successfully. According to the above description, the process of checking is unnecessary. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-26697) Detailed log with flink app name
Junfan Zhang created FLINK-26697: Summary: Detailed log with flink app name Key: FLINK-26697 URL: https://issues.apache.org/jira/browse/FLINK-26697 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26696) Reuse the method of judgment
Junfan Zhang created FLINK-26696: Summary: Reuse the method of judgment Key: FLINK-26696 URL: https://issues.apache.org/jira/browse/FLINK-26696 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26329) Adjust the order of var initialization in FlinkControllerConfig
Junfan Zhang created FLINK-26329: Summary: Adjust the order of var initialization in FlinkControllerConfig Key: FLINK-26329 URL: https://issues.apache.org/jira/browse/FLINK-26329 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26324) Remove duplicate condition judgment
Junfan Zhang created FLINK-26324: Summary: Remove duplicate condition judgment Key: FLINK-26324 URL: https://issues.apache.org/jira/browse/FLINK-26324 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491294#comment-17491294 ] Junfan Zhang commented on FLINK-25495: -- Thanks for your suggestions. I have drafted, if u have time, could you help review it? [~wangyang0918] > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488725#comment-17488725 ] Junfan Zhang commented on FLINK-25495: -- As discussed in mail list, the community will drop per-job mode. So i think this feature should be supported to keep consistent with the mode of per-job. Could u help assign to me? [~wangyang0918] [~knaufk] > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480118#comment-17480118 ] Junfan Zhang commented on FLINK-25749: -- All failure above looks lost connection with mini-yarn, maybe due to unstable network. Maybe we could set config of ipc client to solve, like as follows: {code:java} yarnClusterConf.setInt("ipc.client.connection.maxidletime", 1000); yarnClusterConf.setInt("ipc.client.connect.max.retries", 3); yarnClusterConf.setInt("ipc.client.connect.retry.interval", 10); yarnClusterConf.setInt("ipc.client.connect.timeout", 1000); yarnClusterConf.setInt("ipc.client.connect.max.retries.on.timeouts", 3); {code} [~trohrmann] Do you think so? Maybe I can take over this ticket to improve test stability. > YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP > -- > > Key: FLINK-25749 > URL: https://issues.apache.org/jira/browse/FLINK-25749 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > > The test {{YARNSessionFIFOSecuredITCase.testDetachedMode}} fails on AZP: > {code} > 2022-01-21T03:28:18.3712993Z Jan 21 03:28:18 java.lang.AssertionError: > 2022-01-21T03:28:18.3715115Z Jan 21 03:28:18 Found a file > /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-0_0/application_1642735639007_0002/container_1642735639007_0002_01_01/jobmanager.log > with a prohibited string (one of [Exception, Started > SelectChannelConnector@0.0.0.0:8081]). Excerpts: > 2022-01-21T03:28:18.3716389Z Jan 21 03:28:18 [ > 2022-01-21T03:28:18.3717531Z Jan 21 03:28:18 2022-01-21 03:27:56,921 INFO > org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - > Resource manager service is not running. Ignore revoking leadership. > 2022-01-21T03:28:18.3720496Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopped > dispatcher akka.tcp://flink@11c5f741db81:37697/user/rpc/dispatcher_0. > 2022-01-21T03:28:18.3722401Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - > Interrupted while waiting for queue > 2022-01-21T03:28:18.3723661Z Jan 21 03:28:18 java.lang.InterruptedException: > null > 2022-01-21T03:28:18.3724529Z Jan 21 03:28:18 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > ~[?:1.8.0_292] > 2022-01-21T03:28:18.3725450Z Jan 21 03:28:18 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) > ~[?:1.8.0_292] > 2022-01-21T03:28:18.3726239Z Jan 21 03:28:18 at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > ~[?:1.8.0_292] > 2022-01-21T03:28:18.3727618Z Jan 21 03:28:18 at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323) > [hadoop-yarn-client-2.8.5.jar:?] > 2022-01-21T03:28:18.3729147Z Jan 21 03:28:18 2022-01-21 03:27:56,927 WARN > org.apache.hadoop.ipc.Client [] - Failed to > connect to server: 11c5f741db81/172.25.0.2:39121: retries get failed due to > exceeded maximum allowed retries number: 0 > 2022-01-21T03:28:18.3730293Z Jan 21 03:28:18 > java.nio.channels.ClosedByInterruptException: null > 2022-01-21T03:28:18.3730834Z Jan 21 03:28:18 > java.nio.channels.ClosedByInterruptException: null > 2022-01-21T03:28:18.3731499Z Jan 21 03:28:18 at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > ~[?:1.8.0_292] > 2022-01-21T03:28:18.3732203Z Jan 21 03:28:18 at > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658) > ~[?:1.8.0_292] > 2022-01-21T03:28:18.3733478Z Jan 21 03:28:18 at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) > ~[hadoop-common-2.8.5.jar:?] > 2022-01-21T03:28:18.3734470Z Jan 21 03:28:18 at > org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > ~[hadoop-common-2.8.5.jar:?] > 2022-01-21T03:28:18.3735432Z Jan 21 03:28:18 at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685) > [hadoop-common-2.8.5.jar:?] > 2022-01-21T03:28:18.3736414Z Jan 21 03:28:18 at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788) > [hadoop-common-2.8.5.jar:?] > 2022-01-21T03:28:18.3737734Z Jan 21 03:28:18 at > org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410) > [hadoop-common-2.8.5.jar:?] > 2022-01-21T03:28:18.3738853Z Jan 21 03:28:18 at >
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471736#comment-17471736 ] Junfan Zhang commented on FLINK-25495: -- [~wangyang0918] Thanks for your reply. {noformat} I am afraid that depending on the attach mode is not reliable since the Flink client may disconnect because of network issues or JobManager failover. {noformat} I'm doubt whether the above problems will also occur when in per-job attach mode. > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25445) TaskExecutor always creates local file for task even when local state store is not used
[ https://issues.apache.org/jira/browse/FLINK-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471347#comment-17471347 ] Junfan Zhang edited comment on FLINK-25445 at 1/10/22, 3:10 AM: Do you mind i take over this ticket? [~zjureel] I'm learning the code of task local-recovery and interested on it. was (Author: zuston): Do you mind i take over this ticket? [~zjureel] I'm learn the code of task local-recovery and interested on it. > TaskExecutor always creates local file for task even when local state store > is not used > --- > > Key: FLINK-25445 > URL: https://issues.apache.org/jira/browse/FLINK-25445 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.12.7, 1.13.5, 1.14.2 >Reporter: Shammon >Priority: Major > Labels: pull-request-available > > `TaskExecutor` will create file, check `localRecoveryEnabled` and load local > state store for each task submission in method `localStateStoreForSubtask`. > For batch jobs, the `localRecoveryEnabled` is always false, and there's no > need to create these local files for task in `TaskExecutor` -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25445) TaskExecutor always creates local file for task even when local state store is not used
[ https://issues.apache.org/jira/browse/FLINK-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471347#comment-17471347 ] Junfan Zhang commented on FLINK-25445: -- Do you mind i take over this ticket? [~zjureel] I'm learn the code of task local-recovery and interested on it. > TaskExecutor always creates local file for task even when local state store > is not used > --- > > Key: FLINK-25445 > URL: https://issues.apache.org/jira/browse/FLINK-25445 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.12.7, 1.13.5, 1.14.2 >Reporter: Shammon >Priority: Major > Labels: pull-request-available > > `TaskExecutor` will create file, check `localRecoveryEnabled` and load local > state store for each task submission in method `localStateStoreForSubtask`. > For batch jobs, the `localRecoveryEnabled` is always false, and there's no > need to create these local files for task in `TaskExecutor` -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25564) TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470741#comment-17470741 ] Junfan Zhang commented on FLINK-25564: -- I can't reproduce this in my local machine, and you? [~trohrmann] > TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure > fails on AZP > - > > Key: FLINK-25564 > URL: https://issues.apache.org/jira/browse/FLINK-25564 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > The test > {{TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure}} > fails on AZP with > {code} > Jan 07 05:07:22 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, > Time elapsed: 31.057 s <<< FAILURE! - in > org.apache.flink.test.recovery.TaskManagerProcessFailureStreamingRecoveryITCase > Jan 07 05:07:22 [ERROR] > org.apache.flink.test.recovery.TaskManagerProcessFailureStreamingRecoveryITCase.testTaskManagerProcessFailure > Time elapsed: 31.012 s <<< FAILURE! > Jan 07 05:07:22 java.lang.AssertionError: The program encountered a > IOExceptionList : /tmp/junit2133275241637829858/junit7793757951823298127 > Jan 07 05:07:22 at org.junit.Assert.fail(Assert.java:89) > Jan 07 05:07:22 at > org.apache.flink.test.recovery.AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure(AbstractTaskManagerProcessFailureRecoveryTest.java:205) > Jan 07 05:07:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Jan 07 05:07:22 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jan 07 05:07:22 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jan 07 05:07:22 at java.lang.reflect.Method.invoke(Method.java:498) > Jan 07 05:07:22 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jan 07 05:07:22 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jan 07 05:07:22 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jan 07 05:07:22 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jan 07 05:07:22 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jan 07 05:07:22 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jan 07 05:07:22 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jan 07 05:07:22 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jan 07 05:07:22 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jan 07 05:07:22 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jan 07 05:07:22 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jan 07 05:07:22 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jan 07 05:07:22 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jan 07 05:07:22 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Jan 07 05:07:22 at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > Jan 07 05:07:22 at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > Jan 07 05:07:22 at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) > Jan 07 05:07:22 at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) > Jan 07 05:07:22 at >
[jira] [Created] (FLINK-25536) Minor Fix: Adjust the order of variable declaration and comment in StateAssignmentOperation
Junfan Zhang created FLINK-25536: Summary: Minor Fix: Adjust the order of variable declaration and comment in StateAssignmentOperation Key: FLINK-25536 URL: https://issues.apache.org/jira/browse/FLINK-25536 Project: Flink Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25534) execute pre-job throws org.apache.flink.table.api.TableException: Failed to execute sql
[ https://issues.apache.org/jira/browse/FLINK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469627#comment-17469627 ] Junfan Zhang commented on FLINK-25534: -- I think you could login to the appmaster container on yarn nodemanager to check the detailed logs. > execute pre-job throws org.apache.flink.table.api.TableException: Failed to > execute sql > --- > > Key: FLINK-25534 > URL: https://issues.apache.org/jira/browse/FLINK-25534 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Reporter: jychen >Priority: Blocker > > org.apache.flink.table.api.TableException: Failed to execute sql > at > org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:777) > ~[flink-table-blink_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:742) > ~[flink-table-blink_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.table.api.internal.StatementSetImpl.execute(StatementSetImpl.java:99) > ~[flink-table-blink_2.12-1.13.3.jar:1.13.3] > at com.cig.cdp.flink.JobApplication.main(JobApplication.java:91) > ~[flink-streaming-core.jar:1.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_312] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_312] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_312] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_312] > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > ...skipping 1 line > at > org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_312] > at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_312] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > [hadoop-common-3.0.0.jar:?] > at > org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > [flink-dist_2.12-1.13.3.jar:1.13.3] > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) > [flink-dist_2.12-1.13.3.jar:1.13.3] > Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: > Could not deploy Yarn job cluster. > at > org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1957) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:137) > ~[flink-dist_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55) > ~[flink-table-blink_2.12-1.13.3.jar:1.13.3] > at > org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:759) > ~[flink-table-blink_2.12-1.13.3.jar:1.13.3] > ... 19 more > Caused by: > org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN > application unexpectedly switched to state FAILED during deployment. > Diagnostics from YARN: Application application_1641278381631_0020 failed 2 > times in previous 1 milliseconds due to AM Container for > appattempt_1641278381631_0020_02 exited with exi > tCode: 1 > Failing this attempt.Diagnostics: [2022-01-06 09:15:11.758]Exception from > container-launch. > Container id: container_1641278381631_0020_02_01 > Exit code: 1 > [2022-01-06 09:15:11.759]Container exited with a non-zero exit code 1. Error > file: prelaunch.err. > Last 4096 bytes of prelaunch.err : >
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468583#comment-17468583 ] Junfan Zhang commented on FLINK-25495: -- [~trohrmann] Polling is always OK. However, when in our internal platform, we integrate Flink batch into apache/Oozie. If having this feature, we don't need to poll. And it will make easy to integrate with other projects. I think this is a usability improvement. By the way, perjob mode also support attach. Of source, if no this feature, other solutions also could solve it. > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468330#comment-17468330 ] Junfan Zhang commented on FLINK-25495: -- Thanks for your reply [~trohrmann] As I know, flink cli don't support attach mode when using the Application mode, refer to https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-clients/src/main/java/org/apache/flink/client/deployment/application/cli/ApplicationClusterDeployer.java#L67. But I hope that the flink client couldn't exit until the application finished, especially in batch. > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25496) ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation failed on azure
[ https://issues.apache.org/jira/browse/FLINK-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467128#comment-17467128 ] Junfan Zhang commented on FLINK-25496: -- Sorry for that. I'll correct it asap. > ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation failed on azure > --- > > Key: FLINK-25496 > URL: https://issues.apache.org/jira/browse/FLINK-25496 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Reporter: Yun Gao >Priority: Major > Labels: test-stability > Fix For: 1.15.0 > > > > {code:java} > Dec 31 02:53:26 [ERROR] Failures: > Dec 31 02:53:26 [ERROR] > ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation:66 > expected:<"main" [prio=5 ]Id=1 RUNNABLE > Dec 31 02:53:26 at ja...> but was:<"main" []Id=1 RUNNABLE > Dec 31 02:53:26 at ja...> > Dec 31 02:53:26 [INFO] > Dec 31 02:53:26 [ERROR] Tests run: 5958, Failures: 1, Errors: 0, Skipped: 26 > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28779=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=13859 > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467125#comment-17467125 ] Junfan Zhang commented on FLINK-25495: -- Could you help check this feature [~trohrmann]? If required, i could take over. Thanks > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25495) Client support attach mode when using the deployment of application mode
[ https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25495: - Description: h2. Why In our internal data platform, we support flink batch and streaming job submission. To reduce the submission worker overload, we use the Flink application mode to submit flink job. It's a nice feature! However, on batch mode, we hope flink client couldn't exit until the batch application finished (No need to get job result, just wait). Now the flink lack this feature, and it is not stated in the document that Application Mode does not support attach. > Client support attach mode when using the deployment of application mode > > > Key: FLINK-25495 > URL: https://issues.apache.org/jira/browse/FLINK-25495 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal data platform, we support flink batch and streaming job > submission. To reduce the submission worker overload, we use the Flink > application mode to submit flink job. It's a nice feature! > However, on batch mode, we hope flink client couldn't exit until the batch > application finished (No need to get job result, just wait). Now the flink > lack this feature, and it is not stated in the document that Application Mode > does not support attach. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25495) Client support attach mode when using the deployment of application mode
Junfan Zhang created FLINK-25495: Summary: Client support attach mode when using the deployment of application mode Key: FLINK-25495 URL: https://issues.apache.org/jira/browse/FLINK-25495 Project: Flink Issue Type: Improvement Components: Client / Job Submission Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464468#comment-17464468 ] Junfan Zhang commented on FLINK-25410: -- Sounds great! [~trohrmann]. It's elegant and general. I think i can take over this ticket to solve using your approach > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang at > this time. > Because the Yarn cluster resources are tight and the scheduling efficiency > becomes low when lots of apps needs to be started. > And flink-cli will not exit until the app status changes to running. > In addition, I also think there is no need to wait when app status is > accepted with detach mode on Yarn. > h2. How > When app in accpeted status, flink-cli should exit directly to release > submitter worker process resource. The PR could refer to : > https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464399#comment-17464399 ] Junfan Zhang commented on FLINK-17808: -- Gentle ping [~yunta] . Do you have any ideas? > Rename checkpoint meta file to "_metadata" until it has completed writing > - > > Key: FLINK-17808 > URL: https://issues.apache.org/jira/browse/FLINK-17808 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.10.0 >Reporter: Yun Tang >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > In practice, some developers or customers would use some strategy to find the > recent _metadata as the checkpoint to recover (e.g as many proposals in > FLINK-9043 suggest). However, there existed a "_meatadata" file does not mean > the checkpoint have been completed as the writing to create the "_meatadata" > file could break as some force quit (e.g. yarn application -kill). > We could create the checkpoint meta stream to write data to file named as > "_metadata.inprogress" and renamed it to "_metadata" once completed writing. > By doing so, we could ensure the "_metadata" is not broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463624#comment-17463624 ] Junfan Zhang commented on FLINK-17808: -- I overlooked this, and thanks [~yunta] for pointing it. I think If filesystem dont support, we could fall back to original implementation in {{{}FsCheckpointMetadataOutputStream{}}}. Maybe we should underline this point in doc. > Rename checkpoint meta file to "_metadata" until it has completed writing > - > > Key: FLINK-17808 > URL: https://issues.apache.org/jira/browse/FLINK-17808 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.10.0 >Reporter: Yun Tang >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > In practice, some developers or customers would use some strategy to find the > recent _metadata as the checkpoint to recover (e.g as many proposals in > FLINK-9043 suggest). However, there existed a "_meatadata" file does not mean > the checkpoint have been completed as the writing to create the "_meatadata" > file could break as some force quit (e.g. yarn application -kill). > We could create the checkpoint meta stream to write data to file named as > "_metadata.inprogress" and renamed it to "_metadata" once completed writing. > By doing so, we could ensure the "_metadata" is not broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588 ] Junfan Zhang edited comment on FLINK-25410 at 12/22/21, 5:47 AM: - Could you help check this feature? [~guoyangze] [~trohrmann] [~xtsong] [~yunta] [~wangyang0918] If OK, please assign to me. PR will be attached sooner Thanks ~ was (Author: zuston): Could you help check this feature? [~guoyangze] [~trohrmann] [~xtsong] [~yunta] If OK, please assign to me. PR will be attached sooner Thanks ~ > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang at > this time. > Because the Yarn cluster resources are tight and the scheduling efficiency > becomes low when lots of apps needs to be started. > And flink-cli will not exit until the app status changes to running. > In addition, I also think there is no need to wait when app status is > accepted with detach mode on Yarn. > h2. How > When app in accpeted status, flink-cli should exit directly to release > submitter worker process resource. The PR could refer to : > https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588 ] Junfan Zhang edited comment on FLINK-25410 at 12/22/21, 5:43 AM: - Could you help check this feature? [~guoyangze] [~trohrmann] [~xtsong] [~yunta] If OK, please assign to me. PR will be attached sooner Thanks ~ was (Author: zuston): Could you help check this feature? [~guoyangze] [~trohrmann] [~xtsong] [~yunta] Thanks ~ > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang at > this time. > Because the Yarn cluster resources are tight and the scheduling efficiency > becomes low when lots of apps needs to be started. > And flink-cli will not exit until the app status changes to running. > In addition, I also think there is no need to wait when app status is > accepted with detach mode on Yarn. > h2. How > When app in accpeted status, flink-cli should exit directly to release > submitter worker process resource. The PR could refer to : > https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588 ] Junfan Zhang commented on FLINK-25410: -- Could you help check this feature? [~guoyangze] [~trohrmann] [~xtsong] [~yunta] Thanks ~ > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang at > this time. > Because the Yarn cluster resources are tight and the scheduling efficiency > becomes low when lots of apps needs to be started. > And flink-cli will not exit until the app status changes to running. > In addition, I also think there is no need to wait when app status is > accepted with detach mode on Yarn. > h2. How > When app in accpeted status, flink-cli should exit directly to release > submitter worker process resource. The PR could refer to : > https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25410: - Description: h2. Why In our internal streaming platform, we will use flink-cli tool to submit Flink streaming application on Yarn. However when encountering Hadoop cluster down and then lots of flink apps need to be resubmitted, the submitter of worker in our platform will hang at this time. Because the Yarn cluster resources are tight and the scheduling efficiency becomes low when lots of apps needs to be started. And flink-cli will not exit until the app status changes to running. In addition, I also think there is no need to wait when app status is accepted with detach mode on Yarn. h2. How When app in accpeted status, flink-cli should exit directly to release submitter worker process resource. The PR could refer to : https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 was: h2. Why In our internal streaming platform, we will use flink-cli tool to submit Flink streaming application on Yarn. However when encountering Hadoop cluster down and then lots of flink apps need to be resubmitted, the submitter of worker in our platform will hang. Because the Yarn cluster resources are tight, flink-cli will exit until the app's status change to running > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang at > this time. > Because the Yarn cluster resources are tight and the scheduling efficiency > becomes low when lots of apps needs to be started. > And flink-cli will not exit until the app status changes to running. > In addition, I also think there is no need to wait when app status is > accepted with detach mode on Yarn. > h2. How > When app in accpeted status, flink-cli should exit directly to release > submitter worker process resource. The PR could refer to : > https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25410: - Description: h2. Why In our internal streaming platform, we will use flink-cli tool to submit Flink streaming application on Yarn. However when encountering Hadoop cluster down and then lots of flink apps need to be resubmitted, the submitter of worker in our platform will hang. Because the Yarn cluster resources are tight, flink-cli will exit until the app's status change to running > Flink CLI should exit when app is accepted with detach mode on Yarn > --- > > Key: FLINK-25410 > URL: https://issues.apache.org/jira/browse/FLINK-25410 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > h2. Why > In our internal streaming platform, we will use flink-cli tool to submit > Flink streaming application on Yarn. > However when encountering Hadoop cluster down and then lots of flink apps > need to be resubmitted, the submitter of worker in our platform will hang. > Because the Yarn cluster resources are tight, flink-cli will exit until the > app's status change to running -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn
Junfan Zhang created FLINK-25410: Summary: Flink CLI should exit when app is accepted with detach mode on Yarn Key: FLINK-25410 URL: https://issues.apache.org/jira/browse/FLINK-25410 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463584#comment-17463584 ] Junfan Zhang commented on FLINK-17808: -- [~yunta] As Stephan mentioned above, i also think the first one is the better option. So we could use the {{RecoverableFsDataOutputStream}} close and commit to ensure the writing file atomicity instead of using the {{FSDataOutputStream}} in {{FsCheckpointMetadataOutputStream}}. Right? Please let me know what u think. > Rename checkpoint meta file to "_metadata" until it has completed writing > - > > Key: FLINK-17808 > URL: https://issues.apache.org/jira/browse/FLINK-17808 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.10.0 >Reporter: Yun Tang >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > In practice, some developers or customers would use some strategy to find the > recent _metadata as the checkpoint to recover (e.g as many proposals in > FLINK-9043 suggest). However, there existed a "_meatadata" file does not mean > the checkpoint have been completed as the writing to create the "_meatadata" > file could break as some force quit (e.g. yarn application -kill). > We could create the checkpoint meta stream to write data to file named as > "_metadata.inprogress" and renamed it to "_metadata" once completed writing. > By doing so, we could ensure the "_metadata" is not broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463582#comment-17463582 ] Junfan Zhang commented on FLINK-25398: -- [~Thesharing] Thanks for your advice. Agree with u, i will introduce the extra config to control the max depth. Could you help tell me how to introduce new unit test to cover it? I have no ideas on it. :D. If just testing the rpc response and request, i think the previous UT is enough. > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463068#comment-17463068 ] Junfan Zhang commented on FLINK-25398: -- [~xtsong] Besides, Spark print the complete stacktrace in executor's runtime webui, it works well. > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463067#comment-17463067 ] Junfan Zhang commented on FLINK-25398: -- [~xtsong] Thanks for your quick reply >From my perspective, this feature makes sense. Usually we use thread dump in >runtime ui to debug the Flink job and analyze which calling the job stucked. >However due to stacktrace frame depth limitation, I have to login to >nodemanager to check this flink thread info. It's a terriable experience. Hence when debugging the problems, i think the cost of dumping all complete stack is deserved. Do we need to introduce extra user config to enable stack depth limit? Maybe it should be determinzed by user? > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463067#comment-17463067 ] Junfan Zhang edited comment on FLINK-25398 at 12/21/21, 8:26 AM: - [~xtsong] Thanks for your quick reply >From my perspective, this feature makes sense. Usually we use thread dump in >runtime ui to debug the Flink job and analyze which calling the job stucked. >However due to stacktrace frame depth limitation, I have to login to >nodemanager to check the complete thread info. It's a terriable experience. Hence when debugging the problems, i think the cost of dumping all complete stack is deserved. Do we need to introduce extra user config to enable stack depth limit? Maybe it should be determinzed by user? was (Author: zuston): [~xtsong] Thanks for your quick reply >From my perspective, this feature makes sense. Usually we use thread dump in >runtime ui to debug the Flink job and analyze which calling the job stucked. >However due to stacktrace frame depth limitation, I have to login to >nodemanager to check this flink thread info. It's a terriable experience. Hence when debugging the problems, i think the cost of dumping all complete stack is deserved. Do we need to introduce extra user config to enable stack depth limit? Maybe it should be determinzed by user? > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463009#comment-17463009 ] Junfan Zhang commented on FLINK-25398: -- Could you help check this ticket? [~guoyangze] [~yunta] [~xtsong] Besides optimized stacktrace has been attached in above attachment. > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25398: - Attachment: stacktrace.png > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: stacktrace.png > > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25398) Show complete stacktrace when requesting thread dump
[ https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25398: - Description: h2. Why Now the stacktrace is not complete when clicking the task executor's threaddump in runtime webui. Hence it's hard to the initial calling according to the stacktrace. Now the thread stacktrace is limited to 8, refer to openjdk: [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] h2. Solution Using the custom {{stringify}} method to return stacktrace instead of using {{ThreadInfo.toString}} directly > Show complete stacktrace when requesting thread dump > > > Key: FLINK-25398 > URL: https://issues.apache.org/jira/browse/FLINK-25398 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > > h2. Why > Now the stacktrace is not complete when clicking the task executor's > threaddump in runtime webui. Hence it's hard to the initial calling > according to the stacktrace. > Now the thread stacktrace is limited to 8, refer to openjdk: > [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597] > > h2. Solution > Using the custom {{stringify}} method to return stacktrace instead of using > {{ThreadInfo.toString}} directly > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25398) Show complete stacktrace when requesting thread dump
Junfan Zhang created FLINK-25398: Summary: Show complete stacktrace when requesting thread dump Key: FLINK-25398 URL: https://issues.apache.org/jira/browse/FLINK-25398 Project: Flink Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462957#comment-17462957 ] Junfan Zhang commented on FLINK-17808: -- Could i take over this ticket? [~yunta] [~guoyangze] [~zhoujira86] Draft PR link: https://github.com/apache/flink/pull/18157 If OK, i will optimize and add more tests on it. > Rename checkpoint meta file to "_metadata" until it has completed writing > - > > Key: FLINK-17808 > URL: https://issues.apache.org/jira/browse/FLINK-17808 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.10.0 >Reporter: Yun Tang >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > In practice, some developers or customers would use some strategy to find the > recent _metadata as the checkpoint to recover (e.g as many proposals in > FLINK-9043 suggest). However, there existed a "_meatadata" file does not mean > the checkpoint have been completed as the writing to create the "_meatadata" > file could break as some force quit (e.g. yarn application -kill). > We could create the checkpoint meta stream to write data to file named as > "_metadata.inprogress" and renamed it to "_metadata" once completed writing. > By doing so, we could ensure the "_metadata" is not broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-22485) Support client attach on application mode
[ https://issues.apache.org/jira/browse/FLINK-22485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460386#comment-17460386 ] Junfan Zhang commented on FLINK-22485: -- Rediscuss this ticket. After reading the code about the application mode implementation, i found that it dont support client attach. Do we need the policy of client attach with Application mode ? I think Yes! In batch mode, we hope that client will exit until application end when using the application mode. [~rmetzger] [~guoyangze] [~chesnay] Could you help check this feature? If OK, i could contribute. Thanks > Support client attach on application mode > - > > Key: FLINK-22485 > URL: https://issues.apache.org/jira/browse/FLINK-22485 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Reporter: Junfan Zhang >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor > > Now, client will not wait until job finish when using application mode. > Can we support client attach on application mode? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-22485) Support client attach on application mode
[ https://issues.apache.org/jira/browse/FLINK-22485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22485: - Priority: Major (was: Not a Priority) > Support client attach on application mode > - > > Key: FLINK-22485 > URL: https://issues.apache.org/jira/browse/FLINK-22485 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Reporter: Junfan Zhang >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor > > Now, client will not wait until job finish when using application mode. > Can we support client attach on application mode? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25313) Enable flink runtime web
[ https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25313: - Summary: Enable flink runtime web (was: Enable flink-web-ui) > Enable flink runtime web > > > Key: FLINK-25313 > URL: https://issues.apache.org/jira/browse/FLINK-25313 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: Junfan Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25313) Enable flink-web-ui
[ https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459847#comment-17459847 ] Junfan Zhang commented on FLINK-25313: -- Sorry for lacking the description. When i run flink application with local mode from flink-training project , I want to see some information in flink runtime web. So adding this flink-runtime-web dependency. Now rename the PR titile to {{Enable flink runtime web}} [~danderson] > Enable flink-web-ui > --- > > Key: FLINK-25313 > URL: https://issues.apache.org/jira/browse/FLINK-25313 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: Junfan Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25313) Enable flink-web-ui
[ https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459630#comment-17459630 ] Junfan Zhang commented on FLINK-25313: -- Linked PR: https://github.com/apache/flink-training/pull/45 > Enable flink-web-ui > --- > > Key: FLINK-25313 > URL: https://issues.apache.org/jira/browse/FLINK-25313 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: Junfan Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25313) Enable flink-web-ui
Junfan Zhang created FLINK-25313: Summary: Enable flink-web-ui Key: FLINK-25313 URL: https://issues.apache.org/jira/browse/FLINK-25313 Project: Flink Issue Type: Improvement Components: Documentation / Training / Exercises Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25268) Support task manager node-label in Yarn deployment
[ https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459604#comment-17459604 ] Junfan Zhang commented on FLINK-25268: -- Thanks [~guoyangze] > Support task manager node-label in Yarn deployment > -- > > Key: FLINK-25268 > URL: https://issues.apache.org/jira/browse/FLINK-25268 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > > Now Flink only support application level node label, it's necessary to > introduce task manager level node-label on Yarn deployment. > h2. Why we need it? > Sometimes we will implement Flink to support deep learning payload using GPU, > so if having this feature, job manager and task managers could use different > nodes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25268) Support task manager node-label in Yarn deployment
[ https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458159#comment-17458159 ] Junfan Zhang edited comment on FLINK-25268 at 12/14/21, 11:28 AM: -- Could you help check this features? [~wangyang0918] [~guoyangze] [~chesnay] Now this PR is just draft, if this feature could be invovled in Flink, i will optimize. And maybe jobmanager(Yarn application) node label should also be supported. was (Author: zuston): Could you help check this features? [~wangyang0918] [~guoyangze] Now this PR is just draft, if this feature could be invovled in Flink, i will optimize. And maybe jobmanager(Yarn application) node label should also be supported. > Support task manager node-label in Yarn deployment > -- > > Key: FLINK-25268 > URL: https://issues.apache.org/jira/browse/FLINK-25268 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > Now Flink only support application level node label, it's necessary to > introduce task manager level node-label on Yarn deployment. > h2. Why we need it? > Sometimes we will implement Flink to support deep learning payload using GPU, > so if having this feature, job manager and task managers could use different > nodes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25268) Support task manager node-label in Yarn deployment
[ https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458159#comment-17458159 ] Junfan Zhang commented on FLINK-25268: -- Could you help check this features? [~wangyang0918] [~guoyangze] Now this PR is just draft, if this feature could be invovled in Flink, i will optimize. And maybe jobmanager(Yarn application) node label should also be supported. > Support task manager node-label in Yarn deployment > -- > > Key: FLINK-25268 > URL: https://issues.apache.org/jira/browse/FLINK-25268 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > Now Flink only support application level node label, it's necessary to > introduce task manager level node-label on Yarn deployment. > h2. Why we need it? > Sometimes we will implement Flink to support deep learning payload using GPU, > so if having this feature, job manager and task managers could use different > nodes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25268) Support task manager node-label in Yarn deployment
[ https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-25268: - Description: Now Flink only support application level node label, it's necessary to introduce task manager level node-label on Yarn deployment. h2. Why we need it? Sometimes we will implement Flink to support deep learning payload using GPU, so if having this feature, job manager and task managers could use different nodes. was:Now Flink only support application level node label, it's necessary to introduce task manager level node-label on Yarn deployment. > Support task manager node-label in Yarn deployment > -- > > Key: FLINK-25268 > URL: https://issues.apache.org/jira/browse/FLINK-25268 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Reporter: Junfan Zhang >Priority: Major > > Now Flink only support application level node label, it's necessary to > introduce task manager level node-label on Yarn deployment. > h2. Why we need it? > Sometimes we will implement Flink to support deep learning payload using GPU, > so if having this feature, job manager and task managers could use different > nodes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25268) Support task manager node-label in Yarn deployment
Junfan Zhang created FLINK-25268: Summary: Support task manager node-label in Yarn deployment Key: FLINK-25268 URL: https://issues.apache.org/jira/browse/FLINK-25268 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Junfan Zhang Now Flink only support application level node label, it's necessary to introduce task manager level node-label on Yarn deployment. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25153) Inappropriate variable naming
Junfan Zhang created FLINK-25153: Summary: Inappropriate variable naming Key: FLINK-25153 URL: https://issues.apache.org/jira/browse/FLINK-25153 Project: Flink Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451514#comment-17451514 ] Junfan Zhang commented on FLINK-25099: -- If you could provide more detailed info like submmision cli example, i maybe reproduce it. Just guess that nodemanager dont have the flinkcluster nameservice due to using the default yarn cluster config instead of the configured flink.hadoop. [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at >
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451479#comment-17451479 ] Junfan Zhang commented on FLINK-25099: -- Good news [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){noformat} > Is there a solution to the above
[jira] [Comment Edited] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451479#comment-17451479 ] Junfan Zhang edited comment on FLINK-25099 at 12/1/21, 2:54 AM: Glad to help u [~libra_816] was (Author: zuston): Good news [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
[jira] [Comment Edited] (FLINK-16654) Implement Application Mode according to FLIP-85
[ https://issues.apache.org/jira/browse/FLINK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451211#comment-17451211 ] Junfan Zhang edited comment on FLINK-16654 at 11/30/21, 4:03 PM: - Now when using the ApplicationMode, the flink cli will be executed in default detach mode. Why not support attach mode that client could wait until job finished? [~wangyang0918] [~kkl0u] [~aljoscha] was (Author: zuston): Now when using the ApplicationMode, the flink cli will be executed in default detach mode. Why not support attach mode that client could wait until job finished? [~wangyang0918] [~kkl0u] [~aljoscha] > Implement Application Mode according to FLIP-85 > --- > > Key: FLINK-16654 > URL: https://issues.apache.org/jira/browse/FLINK-16654 > Project: Flink > Issue Type: New Feature > Components: Client / Job Submission >Affects Versions: 1.10.0 >Reporter: Kostas Kloudas >Assignee: Kostas Kloudas >Priority: Major > Fix For: 1.11.0 > > > This is an umbrella issue that will help us track the progress of > [FLIP-85|https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode]. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-16654) Implement Application Mode according to FLIP-85
[ https://issues.apache.org/jira/browse/FLINK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451211#comment-17451211 ] Junfan Zhang commented on FLINK-16654: -- Now when using the ApplicationMode, the flink cli will be executed in default detach mode. Why not support attach mode that client could wait until job finished? [~wangyang0918] [~kkl0u] [~aljoscha] > Implement Application Mode according to FLIP-85 > --- > > Key: FLINK-16654 > URL: https://issues.apache.org/jira/browse/FLINK-16654 > Project: Flink > Issue Type: New Feature > Components: Client / Job Submission >Affects Versions: 1.10.0 >Reporter: Kostas Kloudas >Assignee: Kostas Kloudas >Priority: Major > Fix For: 1.11.0 > > > This is an umbrella issue that will help us track the progress of > [FLIP-85|https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode]. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451160#comment-17451160 ] Junfan Zhang edited comment on FLINK-25099 at 11/30/21, 2:28 PM: - Just ignore setting the default.fs=flinkcluster. but preserve the others above config. And why not directly specifying the checkpoint path to hdfs://flinkcluster/x. As i know that the conf of default.fs is just for appending the path's scheme and authority, which not specified the whole path like /tmp/. [~libra_816] was (Author: zuston): Just ignore setting the default.fs=flinkcluster. but preserve the others above config. And why not directly specifying the checkpoint path to hdfs://flinkcluster/x. As i know that the conf of default.fs is just for when not specified the whole path, like /tmp/. [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at >
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451160#comment-17451160 ] Junfan Zhang commented on FLINK-25099: -- Just ignore setting the default.fs=flinkcluster. but preserve the others above config. And why not directly specifying the checkpoint path to hdfs://flinkcluster/x. As i know that the conf of default.fs is just for when not specified the whole path, like /tmp/. [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451123#comment-17451123 ] Junfan Zhang commented on FLINK-25099: -- I think in yours architecture, you should not reset the default.fs config, just use the yarn cluster's default.fs [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451071#comment-17451071 ] Junfan Zhang commented on FLINK-25099: -- The submission cluster is {{hdfsn21n159}}. but you set the default.fs=flinkcluster. When ContainerLocalizer, it will using the {{flinkcluster}} namespace in hdfsn21n159, but hdfsn21n159 have no {{flinkcluster}} ns config. [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > Attachments: flink-chenqizhu-client-hdfsn21n163.log > > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at >
[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters
[ https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450853#comment-17450853 ] Junfan Zhang commented on FLINK-25099: -- Could you make the log level debug and attach more logs ? [~libra_816] > flink on yarn Accessing two HDFS Clusters > - > > Key: FLINK-25099 > URL: https://issues.apache.org/jira/browse/FLINK-25099 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / State Backends >Affects Versions: 1.13.3 > Environment: flink : 1.13.3 > hadoop : 3.3.0 >Reporter: chenqizhu >Priority: Major > > Flink version 1.13 supports configuration of Hadoop properties in > flink-conf.yaml via flink.hadoop.*. There is A requirement to write > checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, > but this HDFS cluster is not the default HDFS in the flink client (called > cluster A by default). Yaml is configured with nameservices for cluster A and > cluster B, which is similar to HDFS federated mode. > The configuration is as follows: > > {code:java} > flink.hadoop.dfs.nameservices: ACluster,BCluster > flink.hadoop.fs.defaultFS: hdfs://BCluster > flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070 > flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070 > flink.hadoop.dfs.client.failover.proxy.provider.ACluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070 > flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000 > flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070 > flink.hadoop.dfs.client.failover.proxy.provider.BCluster: > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > {code} > > However, an error occurred during the startup of the job, which is reported > as follows: > (change configuration items to A flink local client default HDFS cluster, the > operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) > {noformat} > Caused by: BCluster > java.net.UnknownHostException: BCluster > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){noformat} > Is there a solution to the above problems? The pain
[jira] [Created] (FLINK-24963) Remove the tail separator when outputting yarn queue names
Junfan Zhang created FLINK-24963: Summary: Remove the tail separator when outputting yarn queue names Key: FLINK-24963 URL: https://issues.apache.org/jira/browse/FLINK-24963 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end). Besides we dont set the {{pipeline.classpaths}} param conf. {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. Because these configuration will be written to the flink-conf.yaml and Flink jobmanager will read but fail to read the corresponding key's value as the warn log described. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} was: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. Because these configuration will be written to the flink-conf.yaml and Flink jobmanager will read but fail to read the corresponding key's value as the warn log described. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO
[jira] [Commented] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429095#comment-17429095 ] Junfan Zhang commented on FLINK-24555: -- [~jark] Could you help review it? Thanks > Incorrectly put empty list to flink configuration > - > > Key: FLINK-24555 > URL: https://issues.apache.org/jira/browse/FLINK-24555 > Project: Flink > Issue Type: Bug >Reporter: Junfan Zhang >Priority: Major > > h2. Why > As I found some warn logs in our production flink jobs, like as follows(the > detail logs attached at the end) > {code:java} > 2021-10-14 17:46:41 WARN - [main] - > org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying > to split key and value in configuration file > /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: > "pipeline.classpaths: " > {code} > I dig the flink code and found it put the empty list into configuration by > {{ConfigUtils.encodeCollectionToConfig}}. > h2. How > So it's better to ignore the empty collection to put in the configuration. > Because these configuration will be written to the flink-conf.yaml and Flink > jobmanager will read but fail to read the corresponding key's value as the > warn log described. > h2. Appendix > {code:java} > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: internal.jobgraph-path, job.graph > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: > restart-strategy.failure-rate.max-failures-per-interval, 3 > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: jobmanager.execution.failover-strategy, region > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: high-availability.cluster-id, > application_1633896099002_1043646 > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: jobmanager.rpc.address, localhost > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: execution.runtime-mode, AUTOMATIC > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: security.kerberos.fetch.delegation-token, false > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: execution.savepoint.ignore-unclaimed-state, false > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: parallelism.default, 2 > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: taskmanager.numberOfTaskSlots, 1 > 2021-10-14 17:46:41 WARN - [main] - > org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying > to split key and value in configuration file > /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: > "pipeline.classpaths: " > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: restart-strategy.failure-rate.failure-rate-interval, > 1 d > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: yarn.application.name, > app_StreamEngine_Prod_jiandan_beat_rec_test_all > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: yarn.application.queue, talos.job_streaming > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: taskmanager.memory.process.size, 1728m > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. Because these configuration will be written to the flink-conf.yaml and Flink jobmanager will read but fail to read the corresponding key's value as the warn log described. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} was: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. Because these configuration will be written to the flink-conf.yaml and Flink jobmanager will read but fail to read the corresponding key's value as the warn log described. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] -
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. Because these configuration will be written to the flink-conf.yaml and Flink jobmanager will read but fail to read the corresponding key's value as the warn log described. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} was: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] -
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. h2. Appendix {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} was: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] -
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows(the detail logs attached at the end) {code:java} 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file {code} I dig the flink code and found it put the empty list into configuration by {{ConfigUtils.encodeCollectionToConfig}}. h2. How So it's better to ignore the empty collection to put in the configuration. {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} was: h2. Why As I found some warn logs in our production flink jobs, like as follows {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property:
[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration
[ https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-24555: - Description: h2. Why As I found some warn logs in our production flink jobs, like as follows {code:java} 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: internal.jobgraph-path, job.graph 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.max-failures-per-interval, 3 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: high-availability.cluster-id, application_1633896099002_1043646 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: jobmanager.rpc.address, localhost 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.runtime-mode, AUTOMATIC 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: security.kerberos.fetch.delegation-token, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: execution.savepoint.ignore-unclaimed-state, false 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: parallelism.default, 2 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-10-14 17:46:41 WARN - [main] - org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to split key and value in configuration file /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11: "pipeline.classpaths: " 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: restart-strategy.failure-rate.failure-rate-interval, 1 d 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: yarn.application.queue, talos.job_streaming 2021-10-14 17:46:41 INFO - [main] - org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration property: taskmanager.memory.process.size, 1728m {code} > Incorrectly put empty list to flink configuration > - > > Key: FLINK-24555 > URL: https://issues.apache.org/jira/browse/FLINK-24555 > Project: Flink > Issue Type: Bug >Reporter: Junfan Zhang >Priority: Major > > h2. Why > As I found some warn logs in our production flink jobs, like as follows > {code:java} > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: internal.jobgraph-path, job.graph > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: > restart-strategy.failure-rate.max-failures-per-interval, 3 > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: jobmanager.execution.failover-strategy, region > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: high-availability.cluster-id, > application_1633896099002_1043646 > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: jobmanager.rpc.address, localhost > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: execution.runtime-mode, AUTOMATIC > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: security.kerberos.fetch.delegation-token, false > 2021-10-14 17:46:41 INFO - [main] - > org.apache.flink.configuration.GlobalConfiguration(213) - Loading > configuration property: execution.savepoint.ignore-unclaimed-state, false > 2021-10-14
[jira] [Created] (FLINK-24555) Incorrectly put empty list to flink configuration
Junfan Zhang created FLINK-24555: Summary: Incorrectly put empty list to flink configuration Key: FLINK-24555 URL: https://issues.apache.org/jira/browse/FLINK-24555 Project: Flink Issue Type: Bug Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme
[ https://issues.apache.org/jira/browse/FLINK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407934#comment-17407934 ] Junfan Zhang commented on FLINK-23991: -- Could you help assign ticket to me? [~lirui] [~jark] > Specifying yarn.staging-dir fail when staging scheme is different from > default fs scheme > > > Key: FLINK-23991 > URL: https://issues.apache.org/jira/browse/FLINK-23991 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.13.2 >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > When the yarn.staging-dir path scheme is different from the default fs > scheme, the client will fail fast. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme
[ https://issues.apache.org/jira/browse/FLINK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-23991: - Description: When the yarn.staging-dir path scheme is different from the default fs scheme, the client will fail fast. > Specifying yarn.staging-dir fail when staging scheme is different from > default fs scheme > > > Key: FLINK-23991 > URL: https://issues.apache.org/jira/browse/FLINK-23991 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.13.2 >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > When the yarn.staging-dir path scheme is different from the default fs > scheme, the client will fail fast. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme
Junfan Zhang created FLINK-23991: Summary: Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme Key: FLINK-23991 URL: https://issues.apache.org/jira/browse/FLINK-23991 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.13.2 Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23004) Fix misleading log
Junfan Zhang created FLINK-23004: Summary: Fix misleading log Key: FLINK-23004 URL: https://issues.apache.org/jira/browse/FLINK-23004 Project: Flink Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22919) Remove support for Hadoop1.x in HadoopInputFormatCommonBase.getCredentialsFromUGI
[ https://issues.apache.org/jira/browse/FLINK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359850#comment-17359850 ] Junfan Zhang commented on FLINK-22919: -- [~lirui] Please review it. Thanks > Remove support for Hadoop1.x in > HadoopInputFormatCommonBase.getCredentialsFromUGI > - > > Key: FLINK-22919 > URL: https://issues.apache.org/jira/browse/FLINK-22919 > Project: Flink > Issue Type: Improvement >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22919) Remove support for Hadoop1.x in HadoopInputFormatCommonBase.getCredentialsFromUGI
Junfan Zhang created FLINK-22919: Summary: Remove support for Hadoop1.x in HadoopInputFormatCommonBase.getCredentialsFromUGI Key: FLINK-22919 URL: https://issues.apache.org/jira/browse/FLINK-22919 Project: Flink Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22690) kerberos integration with flink,kerberos tokens will expire
[ https://issues.apache.org/jira/browse/FLINK-22690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346635#comment-17346635 ] Junfan Zhang commented on FLINK-22690: -- Please add more details info, such as submission mode(Yarn or K8s), with keytab or not and related submission command. [~kevinsun1000] > kerberos integration with flink,kerberos tokens will expire > > > Key: FLINK-22690 > URL: https://issues.apache.org/jira/browse/FLINK-22690 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Reporter: kevinsun >Priority: Major > > flink on yarn job. flink sink hdfs,hive,hbase... ,some times later,kerberos > tokens will expire. > error: Failed to find any Kerberos tgt > eg*:StreamingFileSink* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-21232) Introduce pluggable Hadoop delegation token providers
[ https://issues.apache.org/jira/browse/FLINK-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345857#comment-17345857 ] Junfan Zhang edited comment on FLINK-21232 at 5/17/21, 3:54 AM: Hi [~jackwangcs] [~lirui] Nice improvement! Thanks for your work. [~jackwangcs] And any update on it? I am happy to apply this patch to our production environment for testing. was (Author: zuston): Nice improvement! Thanks for your work. [~jackwangcs] And any update on it? [~lirui] I am happy to apply this patch to our production environment for testing. > Introduce pluggable Hadoop delegation token providers > - > > Key: FLINK-21232 > URL: https://issues.apache.org/jira/browse/FLINK-21232 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common, Deployment / YARN >Reporter: jackwangcs >Priority: Major > Labels: auto-unassigned, pull-request-available > > Introduce a pluggable delegation provider via SPI. > Delegation provider could be placed in connector related code and is more > extendable comparing using reflection way to obtain DTs. > Email dicussion thread: > [https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E] > > [https://lists.apache.org/thread.html/r20d4be431ff2f6faff94129b5321a047fcbb0c71c8e092504cd91183%40%3Cdev.flink.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21232) Introduce pluggable Hadoop delegation token providers
[ https://issues.apache.org/jira/browse/FLINK-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345857#comment-17345857 ] Junfan Zhang commented on FLINK-21232: -- Nice improvement! Thanks for your work. [~jackwangcs] And any update on it? [~lirui] I am happy to apply this patch to our production environment for testing. > Introduce pluggable Hadoop delegation token providers > - > > Key: FLINK-21232 > URL: https://issues.apache.org/jira/browse/FLINK-21232 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common, Deployment / YARN >Reporter: jackwangcs >Priority: Major > Labels: auto-unassigned, pull-request-available > > Introduce a pluggable delegation provider via SPI. > Delegation provider could be placed in connector related code and is more > extendable comparing using reflection way to obtain DTs. > Email dicussion thread: > [https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E] > > [https://lists.apache.org/thread.html/r20d4be431ff2f6faff94129b5321a047fcbb0c71c8e092504cd91183%40%3Cdev.flink.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource
[ https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344497#comment-17344497 ] Junfan Zhang commented on FLINK-22329: -- Update it, please review it again. Thanks [~lirui] > Missing credentials in jobconf causes repeated authentication in Hive > datasource > > > Key: FLINK-22329 > URL: https://issues.apache.org/jira/browse/FLINK-22329 > Project: Flink > Issue Type: Bug > Components: Connectors / Hive >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > Related Flink code: > [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107] > > In this {{getSplits}} method, it will call hadoop {{FileInputFormat's > getSplits}} method. related hadoop code is > [here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426]. > Simple code is as follows > {code:java} > // Hadoop FileInputFormat > public InputSplit[] getSplits(JobConf job, int numSplits) > throws IOException { > StopWatch sw = new StopWatch().start(); > FileStatus[] stats = listStatus(job); > > .. > } > protected FileStatus[] listStatus(JobConf job) throws IOException { > Path[] dirs = getInputPaths(job); > if (dirs.length == 0) { > throw new IOException("No input paths specified in job"); > } > // get tokens for all the required FileSystems.. > TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job); > > // Whether we need to recursive look into the directory structure > .. > } > {code} > > In {{listStatus}} method, it will obtain delegation tokens by calling > {{TokenCache.obtainTokensForNamenodes}} method. Howerver this method will > give up to get delegation tokens when credentials in jobconf. > So it's neccessary to inject current ugi credentials into jobconf. > > Besides, when Flink support delegation tokens directly without keytab([refer > to this PR|https://issues.apache.org/jira/browse/FLINK-21700]), > {{TokenCache.obtainTokensForNamenodes}} will failed without this patch > because of no corresponding credentials. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource
[ https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22329: - Summary: Missing credentials in jobconf causes repeated authentication in Hive datasource (was: Missing crendentials in jobconf causes repeated authentication in Hive datasource) > Missing credentials in jobconf causes repeated authentication in Hive > datasource > > > Key: FLINK-22329 > URL: https://issues.apache.org/jira/browse/FLINK-22329 > Project: Flink > Issue Type: Bug > Components: Connectors / Hive >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > > Related Flink code: > [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107] > > In this {{getSplits}} method, it will call hadoop {{FileInputFormat's > getSplits}} method. related hadoop code is > [here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426]. > Simple code is as follows > {code:java} > // Hadoop FileInputFormat > public InputSplit[] getSplits(JobConf job, int numSplits) > throws IOException { > StopWatch sw = new StopWatch().start(); > FileStatus[] stats = listStatus(job); > > .. > } > protected FileStatus[] listStatus(JobConf job) throws IOException { > Path[] dirs = getInputPaths(job); > if (dirs.length == 0) { > throw new IOException("No input paths specified in job"); > } > // get tokens for all the required FileSystems.. > TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job); > > // Whether we need to recursive look into the directory structure > .. > } > {code} > > In {{listStatus}} method, it will obtain delegation tokens by calling > {{TokenCache.obtainTokensForNamenodes}} method. Howerver this method will > give up to get delegation tokens when credentials in jobconf. > So it's neccessary to inject current ugi credentials into jobconf. > > Besides, when Flink support delegation tokens directly without keytab([refer > to this PR|https://issues.apache.org/jira/browse/FLINK-21700]), > {{TokenCache.obtainTokensForNamenodes}} will failed without this patch > because of no corresponding credentials. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21700) Allow to disable fetching Hadoop delegation token on Yarn
[ https://issues.apache.org/jira/browse/FLINK-21700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344323#comment-17344323 ] Junfan Zhang commented on FLINK-21700: -- [~fly_in_gis] [~lirui] Could you help set status to resolved. Thanks~ > Allow to disable fetching Hadoop delegation token on Yarn > - > > Key: FLINK-21700 > URL: https://issues.apache.org/jira/browse/FLINK-21700 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > > h3. Why > I want to support Flink Action on Oozie. > As we know, Oozie will obtain HDFS/HBase delegation token before starting > Flink submitter cli. > Actually, Spark support disable fetching delegation token on Spark client, > [related Spark > doc|https://spark.apache.org/docs/latest/running-on-yarn.html#launching-your-application-with-apache-oozie]. > > So i think Flink should allow to disable fetching Hadoop delegation token on > Yarn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343786#comment-17343786 ] Junfan Zhang commented on FLINK-22534: -- Wow, thanks for your review. [~lirui]. > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342391#comment-17342391 ] Junfan Zhang commented on FLINK-22534: -- [~lirui] Thanks for your reply. Yes > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342268#comment-17342268 ] Junfan Zhang commented on FLINK-22534: -- Any ideas on it? [~lirui] [~mapohl] > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340761#comment-17340761 ] Junfan Zhang edited comment on FLINK-22534 at 5/7/21, 11:55 AM: [~mapohl]. Sorry for the late reply. More details infos are added in description. ping [~karmagyz] [~lirui] was (Author: zuston): [~mapohl]. Sorry for the late reply. More details infos are added in description. ping [~karmagyz] [~lirui] [~wangyang] > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340761#comment-17340761 ] Junfan Zhang commented on FLINK-22534: -- [~mapohl]. Sorry for the late reply. More details infos are added in description. ping [~karmagyz] [~lirui] [~wangyang] > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] and [Yarn Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, [refer to code here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. !debug2.PNG! was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. !debug2.PNG! > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and [Yarn > Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, [refer to code > here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209]. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. !debug2.PNG! was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. !debug2.PNG! > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code > [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101] > and Yarn Utils. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, refer to code here. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. !debug2.PNG! was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code HadoopModule and Yarn > Utils. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, refer to code here. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > !debug2.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Attachment: debug2.PNG > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code HadoopModule and Yarn > Utils. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS federation mode, it will cause > the problem of overwriting the different delegation tokens with the same > identifier, refer to code here. > h5. When does the same identifier delegation tokens appear? > Hadoop HA delegation tokens will have the same identifier(Refer to > [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but its' > service name is different. So we can use service name as alias. > The following figure from > [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] can show that the > identifier of HA delegation token is the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? When in HDFS HA mode, Hadoop HA delegation tokens will have the same identifier(Refer to HDFS-9276), but its' service name is different. So we can use service name as alias. The following figure from HDFS-9276 can show that the identifier of HA delegation token is the same. was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS federation mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? Hadoop HA delegation tokens will have the same identifier(Refer to [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but its' service name is different. So we can use service name as alias. The following figure from [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] can show that the identifier of HA delegation token is the same. > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Attachments: debug2.PNG > > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code HadoopModule and Yarn > Utils. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS HA mode, it will cause the > problem of overwriting the different delegation tokens with the same > identifier, refer to code here. > h5. When does the same identifier delegation tokens appear? > When in HDFS HA mode, Hadoop HA delegation tokens will have the same > identifier(Refer to HDFS-9276), but its' service name is different. So we can > use service name as alias. > The following figure from HDFS-9276 can show that the identifier of HA > delegation token is the same. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils. Firstly, I think we could use the same way to set credential alias, like delegation token's service name. It will be more clear. Secondly, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS federation mode, it will cause the problem of overwriting the different delegation tokens with the same identifier, refer to code here. h5. When does the same identifier delegation tokens appear? Hadoop HA delegation tokens will have the same identifier(Refer to [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but its' service name is different. So we can use service name as alias. The following figure from [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] can show that the identifier of HA delegation token is the same. was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to code. I think we could use the same way to set credential alias, like delegation token's service name. Besides, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS federation mode, it will cause > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to Flink code HadoopModule and Yarn > Utils. > Firstly, I think we could use the same way to set credential alias, like > delegation token's service name. It will be more clear. > Secondly, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS federation mode, it will cause > the problem of overwriting the different delegation tokens with the same > identifier, refer to code here. > h5. When does the same identifier delegation tokens appear? > Hadoop HA delegation tokens will have the same identifier(Refer to > [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but its' > service name is different. So we can use service name as alias. > The following figure from > [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] can show that the > identifier of HA delegation token is the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias
[ https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated FLINK-22534: - Description: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to code. I think we could use the same way to set credential alias, like delegation token's service name. Besides, when fetching HDFS delegation token and then inject all tokens to current UserGroupInformation in Hadoop HDFS federation mode, it will cause was: h4. What Set the Hadoop delegation token's service name as credential alias. h4. Why In current implementation, Flink will use delegation token's service name or identifer as credential alias, refer to code. I think we could use the same way to set credential alias, like delegation token's service name. Besides, in > Set delegation token's service name as credential alias > --- > > Key: FLINK-22534 > URL: https://issues.apache.org/jira/browse/FLINK-22534 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hadoop Compatibility >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > > h4. What > Set the Hadoop delegation token's service name as credential alias. > h4. Why > In current implementation, Flink will use delegation token's service name or > identifer as credential alias, refer to code. > I think we could use the same way to set credential alias, like delegation > token's service name. > Besides, when fetching HDFS delegation token and then inject all tokens to > current UserGroupInformation in Hadoop HDFS federation mode, it will cause -- This message was sent by Atlassian Jira (v8.3.4#803005)