[jira] [Commented] (FLINK-33112) Support placement constraint

2023-10-22 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778513#comment-17778513
 ] 

Junfan Zhang commented on FLINK-33112:
--

The detailed info is here: [https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html]

> Support placement constraint
> 
>
> Key: FLINK-33112
> URL: https://issues.apache.org/jira/browse/FLINK-33112
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> YARN placement constraints were introduced in Hadoop 3.2.0 and are useful for
> specifying affinity, anti-affinity, or co-location, similar to Kubernetes.
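As a conceptual illustration (the class and method names below are hypothetical, not the actual YARN `PlacementConstraints` API), an anti-affinity constraint restricts the candidate nodes for a container to those that do not already host a container carrying the same allocation tag:

```java
import java.util.*;

// Conceptual model of a YARN anti-affinity placement constraint: a container
// tagged "flink-tm" must not be placed on a node that already hosts a
// container carrying the same allocation tag.
public class AntiAffinitySketch {

    /** Returns, in sorted order, the nodes on which the tagged container may land. */
    static List<String> candidateNodes(Map<String, Set<String>> nodeTags, String tag) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : nodeTags.entrySet()) {
            if (!e.getValue().contains(tag)) { // anti-affinity: tag must be absent
                candidates.add(e.getKey());
            }
        }
        Collections.sort(candidates);
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> nodeTags = new HashMap<>();
        nodeTags.put("node1", new HashSet<>(Collections.singletonList("flink-tm")));
        nodeTags.put("node2", new HashSet<>());
        nodeTags.put("node3", new HashSet<>(Collections.singletonList("hbase")));
        // node1 already hosts a "flink-tm" container, so only node2/node3 qualify
        System.out.println(candidateNodes(nodeTags, "flink-tm")); // [node2, node3]
    }
}
```

Affinity would be the dual check (tag must be present); the real YARN API expresses both via SchedulingRequests that carry placement constraints.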



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33112) Support placement constraint

2023-10-08 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773083#comment-17773083
 ] 

Junfan Zhang commented on FLINK-33112:
--

PTAL [~guoyangze] 

> Support placement constraint
> 
>
> Key: FLINK-33112
> URL: https://issues.apache.org/jira/browse/FLINK-33112
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> YARN placement constraints were introduced in Hadoop 3.2.0 and are useful for
> specifying affinity, anti-affinity, or co-location, similar to Kubernetes.





[jira] [Created] (FLINK-33112) Support placement constraint

2023-09-18 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-33112:


 Summary: Support placement constraint
 Key: FLINK-33112
 URL: https://issues.apache.org/jira/browse/FLINK-33112
 Project: Flink
  Issue Type: New Feature
  Components: Deployment / YARN
Reporter: Junfan Zhang


YARN placement constraints were introduced in Hadoop 3.2.0 and are useful for
specifying affinity, anti-affinity, or co-location, similar to Kubernetes.





[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn

2022-05-09 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-27550:
-
Description: 
When the method {{checkYarnQueues}} in YarnClusterDescriptor is invoked, it
checks whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo.

However, when using the capacity scheduler, the YARN queue path should be
retrieved via the {{QueueInfo.getQueuePath}} API instead of {{getQueueName}}.

Because of this, Flink always logs the full list of YARN queues, even though
the job can still be submitted to YARN successfully.

The getQueuePath API was only introduced in recent Hadoop versions
(https://issues.apache.org/jira/browse/YARN-10658), so this problem is hard to
solve on older Hadoop clusters.

Given the above, the checking step is unnecessary.
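A minimal sketch of the mismatch (illustrative names, not Flink's actual code): with the capacity scheduler a nested queue is identified by its path, so matching the user-specified queue against leaf names reports a spurious miss:

```java
import java.util.Arrays;
import java.util.List;

// Why name-based queue checking misfires with the capacity scheduler:
// the user supplies a full queue path such as "root.a.b", while
// QueueInfo#getQueueName only yields the leaf name "b".
public class QueueCheckSketch {

    static boolean existsByLeafName(List<String> leafNames, String specified) {
        return leafNames.contains(specified);
    }

    static boolean existsByPath(List<String> queuePaths, String specified) {
        return queuePaths.contains(specified);
    }

    public static void main(String[] args) {
        List<String> leafNames = Arrays.asList("default", "b");              // getQueueName
        List<String> queuePaths = Arrays.asList("root.default", "root.a.b"); // getQueuePath
        String specified = "root.a.b";
        System.out.println(existsByLeafName(leafNames, specified)); // false -> spurious warning
        System.out.println(existsByPath(queuePaths, specified));    // true  -> queue exists
    }
}
```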

  was:
When the method {{checkYarnQueues}} in YarnClusterDescriptor is invoked, it
checks whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo.

However, when using the capacity scheduler, the YARN queue path should be
retrieved via the {{QueueInfo.getQueuePath}} API instead of {{getQueueName}}.
Because of this, it always logs the full list of YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.


> Remove checking yarn queues before submitting job to Yarn
> -
>
> Key: FLINK-27550
> URL: https://issues.apache.org/jira/browse/FLINK-27550
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> When the method {{checkYarnQueues}} in YarnClusterDescriptor is invoked, it
> checks whether the specified yarnQueue exists among the queues obtained from
> YarnClient.QueueInfo.
> However, when using the capacity scheduler, the YARN queue path should be
> retrieved via the {{QueueInfo.getQueuePath}} API instead of {{getQueueName}}.
> Because of this, Flink always logs the full list of YARN queues, even though
> the job can still be submitted to YARN successfully.
> The getQueuePath API was only introduced in recent Hadoop versions
> (https://issues.apache.org/jira/browse/YARN-10658), so this problem is hard
> to solve on older Hadoop clusters.
> Given the above, the checking step is unnecessary.





[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn

2022-05-09 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-27550:
-
Description: 
When the method {{checkYarnQueues}} in YarnClusterDescriptor is invoked, it
checks whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo.

However, when using the capacity scheduler, the YARN queue path should be
retrieved via the {{QueueInfo.getQueuePath}} API instead of {{getQueueName}}.
Because of this, it always logs the full list of YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.

  was:
When the method checkYarnQueues in YarnClusterDescriptor is invoked, it checks
whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo. However, when using the capacity scheduler, the YARN
queue path should be retrieved via the QueueInfo.getQueuePath API instead of
getQueueName. Because of this, it always logs the YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.


> Remove checking yarn queues before submitting job to Yarn
> -
>
> Key: FLINK-27550
> URL: https://issues.apache.org/jira/browse/FLINK-27550
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> When the method {{checkYarnQueues}} in YarnClusterDescriptor is invoked, it
> checks whether the specified yarnQueue exists among the queues obtained from
> YarnClient.QueueInfo.
> However, when using the capacity scheduler, the YARN queue path should be
> retrieved via the {{QueueInfo.getQueuePath}} API instead of {{getQueueName}}.
> Because of this, it always logs the full list of YARN queues, even though
> the job can still be submitted to YARN successfully.
> Given the above, the checking step is unnecessary.





[jira] [Updated] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn

2022-05-09 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-27550:
-
Description: 
When the method checkYarnQueues in YarnClusterDescriptor is invoked, it checks
whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo. However, when using the capacity scheduler, the YARN
queue path should be retrieved via the QueueInfo.getQueuePath API instead of
getQueueName. Because of this, it always logs the YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.

  was:
When the method checkYarnQueues in YarnClusterDescriptor is invoked, it checks
whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo. However, when using the capacity scheduler, the YARN
queue path should be retrieved via the QueueInfo.getQueuePath API instead of
getQueueName. Because of this, it always logs the YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.


> Remove checking yarn queues before submitting job to Yarn
> -
>
> Key: FLINK-27550
> URL: https://issues.apache.org/jira/browse/FLINK-27550
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> When the method checkYarnQueues in YarnClusterDescriptor is invoked, it
> checks whether the specified yarnQueue exists among the queues obtained from
> YarnClient.QueueInfo. However, when using the capacity scheduler, the YARN
> queue path should be retrieved via the QueueInfo.getQueuePath API instead of
> getQueueName. Because of this, it always logs the YARN queues, even though
> the job can still be submitted to YARN successfully.
> Given the above, the checking step is unnecessary.





[jira] [Created] (FLINK-27550) Remove checking yarn queues before submitting job to Yarn

2022-05-09 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-27550:


 Summary: Remove checking yarn queues before submitting job to Yarn
 Key: FLINK-27550
 URL: https://issues.apache.org/jira/browse/FLINK-27550
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Junfan Zhang


When the method checkYarnQueues in YarnClusterDescriptor is invoked, it checks
whether the specified yarnQueue exists among the queues obtained from
YarnClient.QueueInfo. However, when using the capacity scheduler, the YARN
queue path should be retrieved via the QueueInfo.getQueuePath API instead of
getQueueName. Because of this, it always logs the YARN queues, even though the
job can still be submitted to YARN successfully.

Given the above, the checking step is unnecessary.





[jira] [Created] (FLINK-26697) Detailed log with flink app name

2022-03-17 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-26697:


 Summary: Detailed log with flink app name
 Key: FLINK-26697
 URL: https://issues.apache.org/jira/browse/FLINK-26697
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Junfan Zhang








[jira] [Created] (FLINK-26696) Reuse the method of judgment

2022-03-17 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-26696:


 Summary: Reuse the method of judgment
 Key: FLINK-26696
 URL: https://issues.apache.org/jira/browse/FLINK-26696
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Junfan Zhang








[jira] [Created] (FLINK-26329) Adjust the order of var initialization in FlinkControllerConfig

2022-02-23 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-26329:


 Summary: Adjust the order of var initialization in 
FlinkControllerConfig
 Key: FLINK-26329
 URL: https://issues.apache.org/jira/browse/FLINK-26329
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Junfan Zhang








[jira] [Created] (FLINK-26324) Remove duplicate condition judgment

2022-02-23 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-26324:


 Summary: Remove duplicate condition judgment
 Key: FLINK-26324
 URL: https://issues.apache.org/jira/browse/FLINK-26324
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Junfan Zhang








[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2022-02-12 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491294#comment-17491294
 ] 

Junfan Zhang commented on FLINK-25495:
--

Thanks for your suggestions. I have a draft ready; if you have time, could you
help review it? [~wangyang0918]

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job
> submission. To reduce the load on the submission workers, we use Flink's
> application mode to submit jobs. It's a nice feature!
> However, in batch mode, we would like the Flink client not to exit until the
> batch application has finished (no need to fetch the job result, just wait).
> Flink currently lacks this feature, and the documentation does not state
> that Application Mode does not support attach mode.





[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2022-02-08 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488725#comment-17488725
 ] 

Junfan Zhang commented on FLINK-25495:
--

As discussed on the mailing list, the community will drop per-job mode, so I
think this feature should be supported to stay consistent with per-job mode.

Could you assign this to me? [~wangyang0918] [~knaufk]

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job
> submission. To reduce the load on the submission workers, we use Flink's
> application mode to submit jobs. It's a nice feature!
> However, in batch mode, we would like the Flink client not to exit until the
> batch application has finished (no need to fetch the job result, just wait).
> Flink currently lacks this feature, and the documentation does not state
> that Application Mode does not support attach mode.





[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP

2022-01-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480118#comment-17480118
 ] 

Junfan Zhang commented on FLINK-25749:
--

All of the failures above look like lost connections to the mini YARN cluster,
possibly caused by an unstable network.

Maybe we could tune the IPC client configuration to mitigate this, for example:
{code:java}
// Fail fast on dead connections instead of hanging the test for minutes:
// short idle/connect timeouts and a small, bounded number of retries.
yarnClusterConf.setInt("ipc.client.connection.maxidletime", 1000);
yarnClusterConf.setInt("ipc.client.connect.max.retries", 3);
yarnClusterConf.setInt("ipc.client.connect.retry.interval", 10);
yarnClusterConf.setInt("ipc.client.connect.timeout", 1000);
yarnClusterConf.setInt("ipc.client.connect.max.retries.on.timeouts", 3);
{code}
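As a back-of-the-envelope check (assuming retries are sequential and additive, which is a simplification of the real IPC client), those settings bound how long a single failed connect can block:

```java
// Rough upper bound on how long one IPC connect can block with the settings
// above, assuming each timed-out attempt costs the full connect timeout and
// each plain retry waits the retry interval.
public class IpcRetryBudget {

    static int worstCaseConnectMillis(int connectTimeoutMs, int retriesOnTimeout,
                                      int maxRetries, int retryIntervalMs) {
        return retriesOnTimeout * connectTimeoutMs + maxRetries * retryIntervalMs;
    }

    public static void main(String[] args) {
        // 3 timeouts x 1000 ms + 3 retries x 10 ms = 3030 ms worst case
        System.out.println(worstCaseConnectMillis(1000, 3, 3, 10)); // 3030
    }
}
```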
 

[~trohrmann] What do you think? I could take over this ticket to improve the
test stability.

> YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
> --
>
> Key: FLINK-25749
> URL: https://issues.apache.org/jira/browse/FLINK-25749
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test {{YARNSessionFIFOSecuredITCase.testDetachedMode}} fails on AZP:
> {code}
> 2022-01-21T03:28:18.3712993Z Jan 21 03:28:18 java.lang.AssertionError: 
> 2022-01-21T03:28:18.3715115Z Jan 21 03:28:18 Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-0_0/application_1642735639007_0002/container_1642735639007_0002_01_01/jobmanager.log
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> 2022-01-21T03:28:18.3716389Z Jan 21 03:28:18 [
> 2022-01-21T03:28:18.3717531Z Jan 21 03:28:18 2022-01-21 03:27:56,921 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Resource manager service is not running. Ignore revoking leadership.
> 2022-01-21T03:28:18.3720496Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopped 
> dispatcher akka.tcp://flink@11c5f741db81:37697/user/rpc/dispatcher_0.
> 2022-01-21T03:28:18.3722401Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO  
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - 
> Interrupted while waiting for queue
> 2022-01-21T03:28:18.3723661Z Jan 21 03:28:18 java.lang.InterruptedException: 
> null
> 2022-01-21T03:28:18.3724529Z Jan 21 03:28:18  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>  ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3725450Z Jan 21 03:28:18  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>  ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3726239Z Jan 21 03:28:18  at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
> ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3727618Z Jan 21 03:28:18  at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323)
>  [hadoop-yarn-client-2.8.5.jar:?]
> 2022-01-21T03:28:18.3729147Z Jan 21 03:28:18 2022-01-21 03:27:56,927 WARN  
> org.apache.hadoop.ipc.Client [] - Failed to 
> connect to server: 11c5f741db81/172.25.0.2:39121: retries get failed due to 
> exceeded maximum allowed retries number: 0
> 2022-01-21T03:28:18.3730293Z Jan 21 03:28:18 
> java.nio.channels.ClosedByInterruptException: null
> 2022-01-21T03:28:18.3730834Z Jan 21 03:28:18 
> java.nio.channels.ClosedByInterruptException: null
> 2022-01-21T03:28:18.3731499Z Jan 21 03:28:18  at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>  ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3732203Z Jan 21 03:28:18  at 
> sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658) 
> ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3733478Z Jan 21 03:28:18  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>  ~[hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3734470Z Jan 21 03:28:18  at 
> org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) 
> ~[hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3735432Z Jan 21 03:28:18  at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685) 
> [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3736414Z Jan 21 03:28:18  at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788) 
> [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3737734Z Jan 21 03:28:18  at 
> org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410) 
> [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3738853Z Jan 21 03:28:18  at 
> 

[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2022-01-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471736#comment-17471736
 ] 

Junfan Zhang commented on FLINK-25495:
--

[~wangyang0918] Thanks for your reply.

{noformat}
I am afraid that depending on the attach mode is not reliable since the Flink 
client may disconnect because of network issues or JobManager failover.
{noformat}

I wonder whether the above problems would also occur in per-job attach mode.

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job
> submission. To reduce the load on the submission workers, we use Flink's
> application mode to submit jobs. It's a nice feature!
> However, in batch mode, we would like the Flink client not to exit until the
> batch application has finished (no need to fetch the job result, just wait).
> Flink currently lacks this feature, and the documentation does not state
> that Application Mode does not support attach mode.





[jira] [Comment Edited] (FLINK-25445) TaskExecutor always creates local file for task even when local state store is not used

2022-01-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471347#comment-17471347
 ] 

Junfan Zhang edited comment on FLINK-25445 at 1/10/22, 3:10 AM:


Do you mind if I take over this ticket? [~zjureel]

I'm learning the code of task local recovery and am interested in it.


was (Author: zuston):
Do you mind i take over this ticket? [~zjureel] 

I'm learn the code of task local-recovery and interested on it.

> TaskExecutor always creates local file for task even when local state store 
> is not used
> ---
>
> Key: FLINK-25445
> URL: https://issues.apache.org/jira/browse/FLINK-25445
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.12.7, 1.13.5, 1.14.2
>Reporter: Shammon
>Priority: Major
>  Labels: pull-request-available
>
> For each task submission, `TaskExecutor` creates a file, checks
> `localRecoveryEnabled`, and loads the local state store in the method
> `localStateStoreForSubtask`. For batch jobs, `localRecoveryEnabled` is
> always false, so there is no need to create these local files for tasks in
> `TaskExecutor`.





[jira] [Commented] (FLINK-25445) TaskExecutor always creates local file for task even when local state store is not used

2022-01-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471347#comment-17471347
 ] 

Junfan Zhang commented on FLINK-25445:
--

Do you mind if I take over this ticket? [~zjureel]

I'm learning the code of task local recovery and am interested in it.

> TaskExecutor always creates local file for task even when local state store 
> is not used
> ---
>
> Key: FLINK-25445
> URL: https://issues.apache.org/jira/browse/FLINK-25445
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.12.7, 1.13.5, 1.14.2
>Reporter: Shammon
>Priority: Major
>  Labels: pull-request-available
>
> For each task submission, `TaskExecutor` creates a file, checks
> `localRecoveryEnabled`, and loads the local state store in the method
> `localStateStoreForSubtask`. For batch jobs, `localRecoveryEnabled` is
> always false, so there is no need to create these local files for tasks in
> `TaskExecutor`.





[jira] [Commented] (FLINK-25564) TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure fails on AZP

2022-01-07 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470741#comment-17470741
 ] 

Junfan Zhang commented on FLINK-25564:
--

I can't reproduce this on my local machine. Can you? [~trohrmann]

> TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure
>  fails on AZP
> -
>
> Key: FLINK-25564
> URL: https://issues.apache.org/jira/browse/FLINK-25564
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> The test 
> {{TaskManagerProcessFailureStreamingRecoveryITCase>AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure}}
>  fails on AZP with
> {code}
> Jan 07 05:07:22 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 31.057 s <<< FAILURE! - in 
> org.apache.flink.test.recovery.TaskManagerProcessFailureStreamingRecoveryITCase
> Jan 07 05:07:22 [ERROR] 
> org.apache.flink.test.recovery.TaskManagerProcessFailureStreamingRecoveryITCase.testTaskManagerProcessFailure
>   Time elapsed: 31.012 s  <<< FAILURE!
> Jan 07 05:07:22 java.lang.AssertionError: The program encountered a 
> IOExceptionList : /tmp/junit2133275241637829858/junit7793757951823298127
> Jan 07 05:07:22   at org.junit.Assert.fail(Assert.java:89)
> Jan 07 05:07:22   at 
> org.apache.flink.test.recovery.AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure(AbstractTaskManagerProcessFailureRecoveryTest.java:205)
> Jan 07 05:07:22   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jan 07 05:07:22   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jan 07 05:07:22   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jan 07 05:07:22   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 07 05:07:22   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jan 07 05:07:22   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jan 07 05:07:22   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jan 07 05:07:22   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jan 07 05:07:22   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jan 07 05:07:22   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jan 07 05:07:22   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jan 07 05:07:22   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jan 07 05:07:22   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jan 07 05:07:22   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jan 07 05:07:22   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jan 07 05:07:22   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jan 07 05:07:22   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jan 07 05:07:22   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jan 07 05:07:22   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> Jan 07 05:07:22   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> Jan 07 05:07:22   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> Jan 07 05:07:22   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
> Jan 07 05:07:22   at 
> 

[jira] [Created] (FLINK-25536) Minor Fix: Adjust the order of variable declaration and comment in StateAssignmentOperation

2022-01-05 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25536:


 Summary: Minor Fix: Adjust the order of variable declaration and 
comment in StateAssignmentOperation
 Key: FLINK-25536
 URL: https://issues.apache.org/jira/browse/FLINK-25536
 Project: Flink
  Issue Type: Improvement
Reporter: Junfan Zhang








[jira] [Commented] (FLINK-25534) execute pre-job throws org.apache.flink.table.api.TableException: Failed to execute sql

2022-01-05 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469627#comment-17469627
 ] 

Junfan Zhang commented on FLINK-25534:
--

I think you could log in to the ApplicationMaster container on the YARN
NodeManager to check the detailed logs.

> execute pre-job throws org.apache.flink.table.api.TableException: Failed to 
> execute sql
> ---
>
> Key: FLINK-25534
> URL: https://issues.apache.org/jira/browse/FLINK-25534
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: jychen
>Priority: Blocker
>
> org.apache.flink.table.api.TableException: Failed to execute sql
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:777)
>  ~[flink-table-blink_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:742)
>  ~[flink-table-blink_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.table.api.internal.StatementSetImpl.execute(StatementSetImpl.java:99)
>  ~[flink-table-blink_2.12-1.13.3.jar:1.13.3]
>     at com.cig.cdp.flink.JobApplication.main(JobApplication.java:91) 
> ~[flink-streaming-core.jar:1.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_312]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_312]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_312]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_312]
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812) 
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246) 
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> ...skipping 1 line
>     at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) 
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_312]
>     at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_312]
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>  [hadoop-common-3.0.0.jar:?]
>     at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  [flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) 
> [flink-dist_2.12-1.13.3.jar:1.13.3]
> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: 
> Could not deploy Yarn job cluster.
>     at 
> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1957)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:137)
>  ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)
>  ~[flink-table-blink_2.12-1.13.3.jar:1.13.3]
>     at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:759)
>  ~[flink-table-blink_2.12-1.13.3.jar:1.13.3]
>     ... 19 more
> Caused by: 
> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN 
> application unexpectedly switched to state FAILED during deployment. 
> Diagnostics from YARN: Application application_1641278381631_0020 failed 2 
> times in previous 1 milliseconds due to AM Container for 
> appattempt_1641278381631_0020_02 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2022-01-06 09:15:11.758]Exception from 
> container-launch.
> Container id: container_1641278381631_0020_02_01
> Exit code: 1
> [2022-01-06 09:15:11.759]Container exited with a non-zero exit code 1. Error 
> file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> 

[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2022-01-04 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468583#comment-17468583
 ] 

Junfan Zhang commented on FLINK-25495:
--

[~trohrmann] 

Polling always works.

However, in our internal platform we integrate Flink batch into apache/Oozie. 
With this feature we would not need to poll, and it would make it easier to 
integrate with other projects.

I think this is a usability improvement. By the way, per-job mode also 
supports attach.

Of course, even without this feature, other solutions could also solve it.
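For context, the polling alternative discussed here boils down to a loop that repeatedly queries the application status until a terminal state is reached. The following is a hypothetical sketch, not Flink's actual client code; the `AppState` names and the `Supplier`-based status source are assumptions made for illustration:

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

public class AttachModePolling {
    // Hypothetical states, loosely modeled after YarnApplicationState.
    enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    static final Set<AppState> TERMINAL =
            EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    /** Polls the status supplier until a terminal state appears, then returns it. */
    static AppState waitUntilFinished(Supplier<AppState> statusSupplier, long pollIntervalMs)
            throws InterruptedException {
        AppState state = statusSupplier.get();
        while (!TERMINAL.contains(state)) {
            Thread.sleep(pollIntervalMs);
            state = statusSupplier.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a batch job that reports RUNNING twice and then FINISHED.
        AppState[] sequence = {AppState.RUNNING, AppState.RUNNING, AppState.FINISHED};
        int[] i = {0};
        AppState terminal =
                waitUntilFinished(() -> sequence[Math.min(i[0]++, sequence.length - 1)], 1L);
        System.out.println("Terminal state: " + terminal); // Terminal state: FINISHED
    }
}
```

An attach mode built into the client would replace this external loop: the client process itself would block until the application reaches a terminal state.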

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job 
> submission. To reduce the load on the submission workers, we use Flink 
> application mode to submit jobs. It's a nice feature!
> However, in batch mode we would like the Flink client to wait until the 
> batch application finishes (no need to get the job result, just wait). 
> Flink currently lacks this feature, and the documentation does not state 
> that Application Mode does not support attach.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2022-01-03 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468330#comment-17468330
 ] 

Junfan Zhang commented on FLINK-25495:
--

Thanks for your reply [~trohrmann]

As far as I know, the Flink CLI doesn't support attach mode when using 
Application mode; refer to 
https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-clients/src/main/java/org/apache/flink/client/deployment/application/cli/ApplicationClusterDeployer.java#L67.

But I hope the Flink client would not exit until the application finishes, 
especially for batch jobs.

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job 
> submission. To reduce the load on the submission workers, we use Flink 
> application mode to submit jobs. It's a nice feature!
> However, in batch mode we would like the Flink client to wait until the 
> batch application finishes (no need to get the job result, just wait). 
> Flink currently lacks this feature, and the documentation does not state 
> that Application Mode does not support attach.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25496) ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation failed on azure

2021-12-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467128#comment-17467128
 ] 

Junfan Zhang commented on FLINK-25496:
--

Sorry for that. I'll correct it asap.

>  ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation failed on azure
> ---
>
> Key: FLINK-25496
> URL: https://issues.apache.org/jira/browse/FLINK-25496
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Reporter: Yun Gao
>Priority: Major
>  Labels: test-stability
> Fix For: 1.15.0
>
>
>  
> {code:java}
> Dec 31 02:53:26 [ERROR] Failures: 
> Dec 31 02:53:26 [ERROR]   
> ThreadDumpInfoTest.testComparedWithDefaultJDKImplemetation:66 
> expected:<"main" [prio=5 ]Id=1 RUNNABLE
> Dec 31 02:53:26   at ja...> but was:<"main" []Id=1 RUNNABLE
> Dec 31 02:53:26   at ja...>
> Dec 31 02:53:26 [INFO] 
> Dec 31 02:53:26 [ERROR] Tests run: 5958, Failures: 1, Errors: 0, Skipped: 26
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28779=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=13859
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25495) Client support attach mode when using the deployment of application mode

2021-12-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467125#comment-17467125
 ] 

Junfan Zhang commented on FLINK-25495:
--

Could you help check this feature [~trohrmann]? 

If required, I could take it over.

Thanks 

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job 
> submission. To reduce the load on the submission workers, we use Flink 
> application mode to submit jobs. It's a nice feature!
> However, in batch mode we would like the Flink client to wait until the 
> batch application finishes (no need to get the job result, just wait). 
> Flink currently lacks this feature, and the documentation does not state 
> that Application Mode does not support attach.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25495) Client support attach mode when using the deployment of application mode

2021-12-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25495:
-
Description: 
h2. Why
In our internal data platform, we support Flink batch and streaming job 
submission. To reduce the load on the submission workers, we use Flink 
application mode to submit jobs. It's a nice feature!

However, in batch mode we would like the Flink client to wait until the batch 
application finishes (no need to get the job result, just wait). Flink 
currently lacks this feature, and the documentation does not state that 
Application Mode does not support attach.

> Client support attach mode when using the deployment of application mode
> 
>
> Key: FLINK-25495
> URL: https://issues.apache.org/jira/browse/FLINK-25495
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal data platform, we support Flink batch and streaming job 
> submission. To reduce the load on the submission workers, we use Flink 
> application mode to submit jobs. It's a nice feature!
> However, in batch mode we would like the Flink client to wait until the 
> batch application finishes (no need to get the job result, just wait). 
> Flink currently lacks this feature, and the documentation does not state 
> that Application Mode does not support attach.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25495) Client support attach mode when using the deployment of application mode

2021-12-30 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25495:


 Summary: Client support attach mode when using the deployment of 
application mode
 Key: FLINK-25495
 URL: https://issues.apache.org/jira/browse/FLINK-25495
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-23 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464468#comment-17464468
 ] 

Junfan Zhang commented on FLINK-25410:
--

Sounds great, [~trohrmann]! It's elegant and general.

I think I can take over this ticket and solve it using your approach.

> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang, because the 
> Yarn cluster resources are tight and scheduling efficiency becomes low 
> when many apps need to be started, and flink-cli does not exit until the 
> app status changes to running.
> In addition, I think there is no need to wait when the app status is 
> accepted with detach mode on Yarn.
> h2. How
> When the app is in accepted status, flink-cli should exit directly to 
> release the submitter worker's process resources. The PR could refer to: 
> https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224
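The behavior proposed in the How section can be illustrated with a minimal deployment-wait loop that treats ACCEPTED as sufficient for a detached submission. This is a hypothetical sketch only; the state names mirror YARN's `YarnApplicationState`, but the loop is not Flink's actual `YarnClusterDescriptor` code:

```java
import java.util.function.Supplier;

public class DetachedSubmitWait {
    // Hypothetical states, mirroring YarnApplicationState.
    enum YarnState { NEW, SUBMITTED, ACCEPTED, RUNNING, FAILED, KILLED }

    /**
     * Waits for deployment. In detached mode, returns as soon as the app is
     * ACCEPTED instead of waiting for RUNNING, so the submitter can exit early.
     */
    static YarnState waitForDeployment(Supplier<YarnState> status, boolean detached)
            throws InterruptedException {
        while (true) {
            YarnState s = status.get();
            switch (s) {
                case FAILED:
                case KILLED:
                    throw new IllegalStateException("Deployment failed in state " + s);
                case RUNNING:
                    return s;
                case ACCEPTED:
                    if (detached) {
                        return s; // proposed behavior: don't wait for RUNNING
                    }
                    break;
                default:
                    break;
            }
            Thread.sleep(10L);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        YarnState[] seq = {YarnState.SUBMITTED, YarnState.ACCEPTED, YarnState.RUNNING};
        int[] i = {0};
        // Detached: returns at ACCEPTED, never waiting for the RUNNING transition.
        YarnState got = waitForDeployment(() -> seq[Math.min(i[0]++, seq.length - 1)], true);
        System.out.println(got); // ACCEPTED
    }
}
```

The design point is that under resource pressure an app can sit in ACCEPTED for a long time, and a detached submitter gains nothing by waiting through that window.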



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

2021-12-23 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464399#comment-17464399
 ] 

Junfan Zhang commented on FLINK-17808:
--

Gentle ping [~yunta]. Do you have any ideas?

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -
>
> Key: FLINK-17808
> URL: https://issues.apache.org/jira/browse/FLINK-17808
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In practice, some developers or customers use some strategy to find the 
> most recent _metadata file as the checkpoint to recover from (e.g. as many 
> proposals in FLINK-9043 suggest). However, the existence of a "_metadata" 
> file does not mean the checkpoint has completed, as the write that creates 
> the "_metadata" file can be interrupted by a forced quit (e.g. yarn 
> application -kill).
> We could create the checkpoint meta stream to write data to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing completes. 
> By doing so, we could ensure the "_metadata" file is never broken.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463624#comment-17463624
 ] 

Junfan Zhang commented on FLINK-17808:
--

I overlooked this; thanks [~yunta] for pointing it out.

I think if the filesystem doesn't support it, we could fall back to the 
original implementation in {{FsCheckpointMetadataOutputStream}}. Maybe we 
should underline this point in the docs.

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -
>
> Key: FLINK-17808
> URL: https://issues.apache.org/jira/browse/FLINK-17808
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In practice, some developers or customers use some strategy to find the 
> most recent _metadata file as the checkpoint to recover from (e.g. as many 
> proposals in FLINK-9043 suggest). However, the existence of a "_metadata" 
> file does not mean the checkpoint has completed, as the write that creates 
> the "_metadata" file can be interrupted by a forced quit (e.g. yarn 
> application -kill).
> We could create the checkpoint meta stream to write data to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing completes. 
> By doing so, we could ensure the "_metadata" file is never broken.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588
 ] 

Junfan Zhang edited comment on FLINK-25410 at 12/22/21, 5:47 AM:
-

Could you help check this feature? [~guoyangze]  [~trohrmann]  [~xtsong] 
[~yunta] [~wangyang0918]

If OK, please assign it to me. A PR will be attached soon.

Thanks ~


was (Author: zuston):
Could you help check this feature? [~guoyangze]  [~trohrmann]  [~xtsong] 
[~yunta] 

If OK, please assign to me. PR will be attached sooner

Thanks ~

> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang, because the 
> Yarn cluster resources are tight and scheduling efficiency becomes low 
> when many apps need to be started, and flink-cli does not exit until the 
> app status changes to running.
> In addition, I think there is no need to wait when the app status is 
> accepted with detach mode on Yarn.
> h2. How
> When the app is in accepted status, flink-cli should exit directly to 
> release the submitter worker's process resources. The PR could refer to: 
> https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588
 ] 

Junfan Zhang edited comment on FLINK-25410 at 12/22/21, 5:43 AM:
-

Could you help check this feature? [~guoyangze]  [~trohrmann]  [~xtsong] 
[~yunta] 

If OK, please assign it to me. A PR will be attached soon.

Thanks ~


was (Author: zuston):
Could you help check this feature? [~guoyangze]  [~trohrmann]  [~xtsong] 
[~yunta] 

Thanks ~

> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang, because the 
> Yarn cluster resources are tight and scheduling efficiency becomes low 
> when many apps need to be started, and flink-cli does not exit until the 
> app status changes to running.
> In addition, I think there is no need to wait when the app status is 
> accepted with detach mode on Yarn.
> h2. How
> When the app is in accepted status, flink-cli should exit directly to 
> release the submitter worker's process resources. The PR could refer to: 
> https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463588#comment-17463588
 ] 

Junfan Zhang commented on FLINK-25410:
--

Could you help check this feature? [~guoyangze]  [~trohrmann]  [~xtsong] 
[~yunta] 

Thanks ~

> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang, because the 
> Yarn cluster resources are tight and scheduling efficiency becomes low 
> when many apps need to be started, and flink-cli does not exit until the 
> app status changes to running.
> In addition, I think there is no need to wait when the app status is 
> accepted with detach mode on Yarn.
> h2. How
> When the app is in accepted status, flink-cli should exit directly to 
> release the submitter worker's process resources. The PR could refer to: 
> https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25410:
-
Description: 
h2. Why

In our internal streaming platform, we use the flink-cli tool to submit Flink 
streaming applications on Yarn.

However, when the Hadoop cluster goes down and lots of Flink apps need to be 
resubmitted, the submitter workers in our platform hang.

This is because the Yarn cluster resources are tight and scheduling 
efficiency becomes low when many apps need to be started.

And flink-cli does not exit until the app status changes to running.

In addition, I think there is no need to wait when the app status is accepted 
with detach mode on Yarn.
h2. How

When the app is in accepted status, flink-cli should exit directly to release 
the submitter worker's process resources. The PR could refer to: 
https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224

  was:
h2. Why
In our internal streaming platform, we will use flink-cli tool to submit Flink 
streaming application on Yarn. 

However when encountering Hadoop cluster down and then lots of flink apps need 
to be resubmitted, the submitter of worker in our platform will hang.

Because the Yarn cluster resources are tight, flink-cli will exit until the 
app's status change to running


> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang, because the 
> Yarn cluster resources are tight and scheduling efficiency becomes low 
> when many apps need to be started, and flink-cli does not exit until the 
> app status changes to running.
> In addition, I think there is no need to wait when the app status is 
> accepted with detach mode on Yarn.
> h2. How
> When the app is in accepted status, flink-cli should exit directly to 
> release the submitter worker's process resources. The PR could refer to: 
> https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25410:
-
Description: 
h2. Why
In our internal streaming platform, we use the flink-cli tool to submit Flink 
streaming applications on Yarn.

However, when the Hadoop cluster goes down and lots of Flink apps need to be 
resubmitted, the submitter workers in our platform hang.

Because the Yarn cluster resources are tight, flink-cli will not exit until 
the app's status changes to running.

> Flink CLI should exit when app is accepted with detach mode on Yarn
> ---
>
> Key: FLINK-25410
> URL: https://issues.apache.org/jira/browse/FLINK-25410
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> In our internal streaming platform, we use the flink-cli tool to submit 
> Flink streaming applications on Yarn.
> However, when the Hadoop cluster goes down and lots of Flink apps need to 
> be resubmitted, the submitter workers in our platform hang.
> Because the Yarn cluster resources are tight, flink-cli will not exit 
> until the app's status changes to running.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25410) Flink CLI should exit when app is accepted with detach mode on Yarn

2021-12-21 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25410:


 Summary: Flink CLI should exit when app is accepted with detach 
mode on Yarn
 Key: FLINK-25410
 URL: https://issues.apache.org/jira/browse/FLINK-25410
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463584#comment-17463584
 ] 

Junfan Zhang commented on FLINK-17808:
--

[~yunta] 
As Stephan mentioned above, I also think the first one is the better option.

So we could use {{RecoverableFsDataOutputStream}}'s close-and-commit to 
ensure atomicity when writing the file, instead of using the 
{{FSDataOutputStream}} in {{FsCheckpointMetadataOutputStream}}. Right?

Please let me know what you think.
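The write-then-rename pattern under discussion can be sketched with plain `java.nio.file` as a local-filesystem analogue of the commit step (a sketch only; Flink's actual recoverable streams abstract over HDFS/S3, and this code does not):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMetadataWrite {
    /**
     * Writes checkpoint metadata to "_metadata.inprogress" and atomically
     * renames it to "_metadata" only after the write has fully completed,
     * so a half-written "_metadata" file is never observable.
     */
    static Path writeMetadata(Path checkpointDir, byte[] metadata) throws IOException {
        Path inProgress = checkpointDir.resolve("_metadata.inprogress");
        Path finalFile = checkpointDir.resolve("_metadata");
        // If the process is killed here, only the .inprogress file exists,
        // and recovery tooling will never mistake it for a complete checkpoint.
        Files.write(inProgress, metadata);
        return Files.move(inProgress, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chk-1");
        Path meta = writeMetadata(dir, "checkpoint-meta".getBytes());
        System.out.println(Files.exists(dir.resolve("_metadata.inprogress"))); // false
        System.out.println(new String(Files.readAllBytes(meta)));
    }
}
```

On object stores without atomic rename this pattern does not hold, which is exactly why the comment above proposes falling back to the original implementation when the filesystem lacks support.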

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -
>
> Key: FLINK-17808
> URL: https://issues.apache.org/jira/browse/FLINK-17808
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In practice, some developers or customers use some strategy to find the 
> most recent _metadata file as the checkpoint to recover from (e.g. as many 
> proposals in FLINK-9043 suggest). However, the existence of a "_metadata" 
> file does not mean the checkpoint has completed, as the write that creates 
> the "_metadata" file can be interrupted by a forced quit (e.g. yarn 
> application -kill).
> We could create the checkpoint meta stream to write data to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing completes. 
> By doing so, we could ensure the "_metadata" file is never broken.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463582#comment-17463582
 ] 

Junfan Zhang commented on FLINK-25398:
--

[~Thesharing] Thanks for your advice. I agree; I will introduce an extra 
config option to control the max depth.

Could you tell me how to introduce a new unit test to cover it? I have no 
ideas on it. :D If we are just testing the RPC request and response, I think 
the previous UT is enough.
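A custom stringify along the lines proposed in this ticket can be sketched as below (a hypothetical sketch, not the actual Flink patch): it formats the frames itself instead of relying on `ThreadInfo.toString()`, whose output is capped at 8 frames, and requests the full stack via `Integer.MAX_VALUE`:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class FullThreadDump {
    /** Formats a ThreadInfo without the 8-frame cap of ThreadInfo.toString(). */
    static String stringify(ThreadInfo info, int maxDepth) {
        StringBuilder sb = new StringBuilder()
                .append('"').append(info.getThreadName()).append('"')
                .append(" Id=").append(info.getThreadId())
                .append(' ').append(info.getThreadState()).append('\n');
        StackTraceElement[] trace = info.getStackTrace();
        // maxDepth is the proposed user-configurable limit discussed above.
        for (int i = 0; i < trace.length && i < maxDepth; i++) {
            sb.append("\tat ").append(trace[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // dumpAllThreads returns ThreadInfo objects with full stack traces.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            System.out.print(stringify(info, Integer.MAX_VALUE));
        }
    }
}
```

The `maxDepth` parameter shows where the extra config option for limiting stack depth could plug in.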

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Currently the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI, so it is hard to find the initial call 
> from the stacktrace.
> The thread stacktrace depth is limited to 8; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to render the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463068#comment-17463068
 ] 

Junfan Zhang commented on FLINK-25398:
--

[~xtsong] Besides, Spark prints the complete stacktrace in the executor's 
runtime web UI, and it works well.

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Currently the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI, so it is hard to find the initial call 
> from the stacktrace.
> The thread stacktrace depth is limited to 8; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to render the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463067#comment-17463067
 ] 

Junfan Zhang commented on FLINK-25398:
--

[~xtsong] Thanks for your quick reply.

From my perspective, this feature makes sense. Usually we use the thread dump 
in the runtime UI to debug a Flink job and analyze which call the job is 
stuck in. However, due to the stacktrace frame depth limitation, I have to 
log in to the NodeManager to check the Flink thread info. It's a terrible 
experience.

Hence, when debugging problems, I think the cost of dumping the complete 
stack is worth it.

Do we need to introduce an extra user config to enable a stack depth limit? 
Maybe it should be determined by the user?

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Currently the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI, so it is hard to find the initial call 
> from the stacktrace.
> The thread stacktrace depth is limited to 8; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to render the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-21 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463067#comment-17463067
 ] 

Junfan Zhang edited comment on FLINK-25398 at 12/21/21, 8:26 AM:
-

[~xtsong] Thanks for your quick reply.

From my perspective, this feature makes sense. Usually we use the thread dump 
in the runtime UI to debug a Flink job and analyze which call the job is 
stuck in. However, due to the stacktrace frame depth limitation, I have to 
log in to the NodeManager to check the complete thread info. It's a terrible 
experience.

Hence, when debugging problems, I think the cost of dumping the complete 
stack is worth it.

Do we need to introduce an extra user config to enable a stack depth limit? 
Maybe it should be determined by the user?


was (Author: zuston):
[~xtsong] Thanks for your quick reply

From my perspective, this feature makes sense. Usually we use the thread dump 
in the runtime UI to debug a Flink job and analyze which call the job is 
stuck on. However, due to the stacktrace frame depth limitation, I have to 
log in to the NodeManager to check this Flink thread info. It's a terrible 
experience. 

Hence, when debugging such problems, I think the cost of dumping the complete 
stack is worth it.

Do we need to introduce an extra user config to enable a stack depth limit? 
Maybe it should be determined by the user?

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Now the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI. Hence it's hard to trace the initial 
> call from the stacktrace.
> Now the thread stacktrace is limited to 8 frames; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to return the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-20 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463009#comment-17463009
 ] 

Junfan Zhang commented on FLINK-25398:
--

Could you help check this ticket? [~guoyangze]  [~yunta]  [~xtsong] 

Besides, the optimized stacktrace is attached above.

 

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Now the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI. Hence it's hard to trace the initial 
> call from the stacktrace.
> Now the thread stacktrace is limited to 8 frames; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to return the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-20 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25398:
-
Attachment: stacktrace.png

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: stacktrace.png
>
>
> h2. Why
> Now the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI. Hence it's hard to trace the initial 
> call from the stacktrace.
> Now the thread stacktrace is limited to 8 frames; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to return the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-20 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25398:
-
Description: 
h2. Why

Now the stacktrace is not complete when clicking the task executor's thread 
dump in the runtime web UI. Hence it's hard to trace the initial call from 
the stacktrace.

Now the thread stacktrace is limited to 8 frames; refer to OpenJDK: 

[https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]

 

h2. Solution
Use a custom {{stringify}} method to return the stacktrace instead of using 
{{ThreadInfo.toString}} directly

 

> Show complete stacktrace when requesting thread dump
> 
>
> Key: FLINK-25398
> URL: https://issues.apache.org/jira/browse/FLINK-25398
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> Now the stacktrace is not complete when clicking the task executor's 
> thread dump in the runtime web UI. Hence it's hard to trace the initial 
> call from the stacktrace.
> Now the thread stacktrace is limited to 8 frames; refer to OpenJDK: 
> [https://github.com/openjdk/jdk/blob/master/src/java.management/share/classes/java/lang/management/ThreadInfo.java#L597]
>  
> h2. Solution
> Use a custom {{stringify}} method to return the stacktrace instead of 
> using {{ThreadInfo.toString}} directly
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25398) Show complete stacktrace when requesting thread dump

2021-12-20 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25398:


 Summary: Show complete stacktrace when requesting thread dump
 Key: FLINK-25398
 URL: https://issues.apache.org/jira/browse/FLINK-25398
 Project: Flink
  Issue Type: Improvement
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

2021-12-20 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462957#comment-17462957
 ] 

Junfan Zhang commented on FLINK-17808:
--

Could I take over this ticket? [~yunta] [~guoyangze] [~zhoujira86]

Draft PR link: https://github.com/apache/flink/pull/18157

If OK, I will polish it and add more tests.

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -
>
> Key: FLINK-17808
> URL: https://issues.apache.org/jira/browse/FLINK-17808
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In practice, some developers or customers use a strategy of finding the most 
> recent _metadata file as the checkpoint to recover from (e.g. as many 
> proposals in FLINK-9043 suggest). However, the existence of a "_metadata" 
> file does not mean the checkpoint has been completed, as the writing that 
> creates the "_metadata" file could be interrupted by a forced quit (e.g. 
> yarn application -kill).
> We could create the checkpoint meta stream to write data to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing has 
> completed. By doing so, we ensure the "_metadata" file is never broken.
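The write-then-rename protocol described above can be sketched as follows. This is a local-filesystem illustration under assumed names (`AtomicMetadataWrite`, `writeMetadata`); the actual fix would target Flink's own FileSystem abstraction:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Sketch of the write-then-rename pattern: metadata is written to
 * "_metadata.inprogress" and only renamed to "_metadata" once the write has
 * fully completed, so readers never observe a half-written metadata file.
 * Illustrative local-filesystem version, not Flink's actual code.
 */
public class AtomicMetadataWrite {

    public static Path writeMetadata(Path checkpointDir, byte[] metadata) throws IOException {
        Path inProgress = checkpointDir.resolve("_metadata.inprogress");
        Path finalFile = checkpointDir.resolve("_metadata");
        // An interrupted write leaves only the ".inprogress" file behind,
        // which recovery logic can safely ignore.
        Files.write(inProgress, metadata);
        // The rename is the commit point: a visible "_metadata" implies a
        // complete write (ATOMIC_MOVE within the same directory).
        return Files.move(inProgress, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chk-1");
        Path result = writeMetadata(dir, "checkpoint state".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(result)); // prints "true"
    }
}
```

Recovery code that scans for the newest `_metadata` then never needs to guess whether the file is complete.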



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-22485) Support client attach on application mode

2021-12-15 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460386#comment-17460386
 ] 

Junfan Zhang commented on FLINK-22485:
--

Revisiting this ticket. 

After reading the code of the application mode implementation, I found that it 
doesn't support client attach.

Do we need a client-attach policy for application mode? I think yes!

In batch mode, we hope the client will not exit until the application ends 
when using application mode.

[~rmetzger] [~guoyangze] [~chesnay]  Could you help check this feature? If OK, 
I could contribute. Thanks

> Support client attach on application mode
> -
>
> Key: FLINK-22485
> URL: https://issues.apache.org/jira/browse/FLINK-22485
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> Now, the client will not wait until the job finishes when using application 
> mode.
> Can we support client attach on application mode?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-22485) Support client attach on application mode

2021-12-15 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22485:
-
Priority: Major  (was: Not a Priority)

> Support client attach on application mode
> -
>
> Key: FLINK-22485
> URL: https://issues.apache.org/jira/browse/FLINK-22485
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> Now, the client will not wait until the job finishes when using application 
> mode.
> Can we support client attach on application mode?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25313) Enable flink runtime web

2021-12-15 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25313:
-
Summary: Enable flink runtime web  (was: Enable flink-web-ui)

> Enable flink runtime web
> 
>
> Key: FLINK-25313
> URL: https://issues.apache.org/jira/browse/FLINK-25313
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training / Exercises
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25313) Enable flink-web-ui

2021-12-15 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459847#comment-17459847
 ] 

Junfan Zhang commented on FLINK-25313:
--

Sorry for lacking a description. When I run a Flink application in local mode 
from the flink-training project, I want to see some information in the Flink 
runtime web UI, hence adding this flink-runtime-web dependency. 

Now the PR title is renamed to {{Enable flink runtime web}}

[~danderson]

> Enable flink-web-ui
> ---
>
> Key: FLINK-25313
> URL: https://issues.apache.org/jira/browse/FLINK-25313
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training / Exercises
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25313) Enable flink-web-ui

2021-12-14 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459630#comment-17459630
 ] 

Junfan Zhang commented on FLINK-25313:
--

Linked PR: https://github.com/apache/flink-training/pull/45

> Enable flink-web-ui
> ---
>
> Key: FLINK-25313
> URL: https://issues.apache.org/jira/browse/FLINK-25313
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training / Exercises
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25313) Enable flink-web-ui

2021-12-14 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25313:


 Summary: Enable flink-web-ui
 Key: FLINK-25313
 URL: https://issues.apache.org/jira/browse/FLINK-25313
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25268) Support task manager node-label in Yarn deployment

2021-12-14 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459604#comment-17459604
 ] 

Junfan Zhang commented on FLINK-25268:
--

Thanks [~guoyangze] 

> Support task manager node-label in Yarn deployment
> --
>
> Key: FLINK-25268
> URL: https://issues.apache.org/jira/browse/FLINK-25268
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Now Flink only supports an application-level node label; it's necessary to 
> introduce a task-manager-level node label for YARN deployments.
> h2. Why we need it?
> Sometimes we run Flink for deep learning workloads using GPUs, so with this 
> feature the job manager and task managers could be placed on different 
> nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25268) Support task manager node-label in Yarn deployment

2021-12-14 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458159#comment-17458159
 ] 

Junfan Zhang edited comment on FLINK-25268 at 12/14/21, 11:28 AM:
--

Could you help check this feature? [~wangyang0918] [~guoyangze] [~chesnay] 
This PR is just a draft; if this feature can be accepted into Flink, I will 
polish it.

And maybe the jobmanager (YARN application) node label should also be 
supported.


was (Author: zuston):
Could you help check this feature? [~wangyang0918] [~guoyangze]
This PR is just a draft; if this feature can be accepted into Flink, I will 
polish it.

And maybe the jobmanager (YARN application) node label should also be 
supported.

> Support task manager node-label in Yarn deployment
> --
>
> Key: FLINK-25268
> URL: https://issues.apache.org/jira/browse/FLINK-25268
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Now Flink only supports an application-level node label; it's necessary to 
> introduce a task-manager-level node label for YARN deployments.
> h2. Why we need it?
> Sometimes we run Flink for deep learning workloads using GPUs, so with this 
> feature the job manager and task managers could be placed on different 
> nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25268) Support task manager node-label in Yarn deployment

2021-12-12 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458159#comment-17458159
 ] 

Junfan Zhang commented on FLINK-25268:
--

Could you help check this feature? [~wangyang0918] [~guoyangze]
This PR is just a draft; if this feature can be accepted into Flink, I will 
polish it.

And maybe the jobmanager (YARN application) node label should also be 
supported.

> Support task manager node-label in Yarn deployment
> --
>
> Key: FLINK-25268
> URL: https://issues.apache.org/jira/browse/FLINK-25268
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Now Flink only supports an application-level node label; it's necessary to 
> introduce a task-manager-level node label for YARN deployments.
> h2. Why we need it?
> Sometimes we run Flink for deep learning workloads using GPUs, so with this 
> feature the job manager and task managers could be placed on different 
> nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25268) Support task manager node-label in Yarn deployment

2021-12-12 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25268:
-
Description: 
Now Flink only supports an application-level node label; it's necessary to 
introduce a task-manager-level node label for YARN deployments.

h2. Why we need it?

Sometimes we run Flink for deep learning workloads using GPUs, so with this 
feature the job manager and task managers could be placed on different nodes.

  was:Now Flink only supports an application-level node label; it's necessary 
to introduce a task-manager-level node label for YARN deployments.


> Support task manager node-label in Yarn deployment
> --
>
> Key: FLINK-25268
> URL: https://issues.apache.org/jira/browse/FLINK-25268
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Priority: Major
>
> Now Flink only supports an application-level node label; it's necessary to 
> introduce a task-manager-level node label for YARN deployments.
> h2. Why we need it?
> Sometimes we run Flink for deep learning workloads using GPUs, so with this 
> feature the job manager and task managers could be placed on different 
> nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25268) Support task manager node-label in Yarn deployment

2021-12-12 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25268:


 Summary: Support task manager node-label in Yarn deployment
 Key: FLINK-25268
 URL: https://issues.apache.org/jira/browse/FLINK-25268
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Junfan Zhang


Now Flink only supports an application-level node label; it's necessary to 
introduce a task-manager-level node label for YARN deployments.
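A configuration sketch of what the proposal could look like in flink-conf.yaml. `yarn.application.node-label` is Flink's existing application-level option; the task-manager-level key below is hypothetical, written here only to illustrate the intent:

```yaml
# Existing: one label constrains the whole YARN application (JM + TMs).
yarn.application.node-label: default

# Proposed (hypothetical key): a separate label for TaskManager containers,
# so e.g. TaskManagers land on GPU nodes while the JobManager does not.
yarn.taskmanager.node-label: gpu
```

With separate labels, YARN's node partitions can place the JobManager on ordinary nodes while reserving scarce GPU nodes for the TaskManagers that actually need them.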



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25153) Inappropriate variable naming

2021-12-03 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-25153:


 Summary: Inappropriate variable naming
 Key: FLINK-25153
 URL: https://issues.apache.org/jira/browse/FLINK-25153
 Project: Flink
  Issue Type: Improvement
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451514#comment-17451514
 ] 

Junfan Zhang commented on FLINK-25099:
--

If you could provide more detailed info, like a submission CLI example, I may 
be able to reproduce it.

My guess is that the NodeManager doesn't know the Flink cluster's 
nameservice, because it uses the default YARN cluster config instead of the 
configured flink.hadoop.* properties.

[~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is a requirement to write 
> checkpoints to an HDFS cluster with SSDs (called cluster B) to speed up 
> checkpoint writing, but this HDFS cluster is not the default HDFS of the 
> flink client (called cluster A). flink-conf.yaml is configured with 
> nameservices for both cluster A and cluster B, similar to HDFS federation 
> mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, reported as 
> follows:
> (if the configuration is changed back to the flink client's default HDFS 
> cluster, the job starts normally: flink.hadoop.fs.defaultFS: 
> hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> 

[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451479#comment-17451479
 ] 

Junfan Zhang commented on FLINK-25099:
--

Good news [~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is a requirement to write 
> checkpoints to an HDFS cluster with SSDs (called cluster B) to speed up 
> checkpoint writing, but this HDFS cluster is not the default HDFS of the 
> flink client (called cluster A). flink-conf.yaml is configured with 
> nameservices for both cluster A and cluster B, similar to HDFS federation 
> mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, reported as 
> follows:
> (if the configuration is changed back to the flink client's default HDFS 
> cluster, the job starts normally: flink.hadoop.fs.defaultFS: 
> hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){noformat}
> Is there a solution to the above 

[jira] [Comment Edited] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451479#comment-17451479
 ] 

Junfan Zhang edited comment on FLINK-25099 at 12/1/21, 2:54 AM:


Glad to help you [~libra_816]


was (Author: zuston):
Good news [~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is a requirement to write 
> checkpoints to an HDFS cluster with SSDs (called cluster B) to speed up 
> checkpoint writing, but this HDFS cluster is not the default HDFS of the 
> flink client (called cluster A). flink-conf.yaml is configured with 
> nameservices for both cluster A and cluster B, similar to HDFS federation 
> mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, reported as 
> follows:
> (if the configuration is changed back to the flink client's default HDFS 
> cluster, the job starts normally: flink.hadoop.fs.defaultFS: 
> hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 

[jira] [Comment Edited] (FLINK-16654) Implement Application Mode according to FLIP-85

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451211#comment-17451211
 ] 

Junfan Zhang edited comment on FLINK-16654 at 11/30/21, 4:03 PM:
-

Currently, when using Application Mode, the Flink CLI runs in detached mode by 
default. Why not support an attached mode in which the client could wait until 
the job finishes? 

[~wangyang0918] 
[~kkl0u] 
[~aljoscha]


was (Author: zuston):
Now when using the ApplicationMode, the flink cli will be executed in default 
detach mode. Why not support attach mode that client could wait until job 
finished? 

[~wangyang0918] [~kkl0u] [~aljoscha]

> Implement Application Mode according to FLIP-85
> ---
>
> Key: FLINK-16654
> URL: https://issues.apache.org/jira/browse/FLINK-16654
> Project: Flink
>  Issue Type: New Feature
>  Components: Client / Job Submission
>Affects Versions: 1.10.0
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.11.0
>
>
> This is an umbrella issue that will help us track the progress of 
> [FLIP-85|https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-16654) Implement Application Mode according to FLIP-85

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451211#comment-17451211
 ] 

Junfan Zhang commented on FLINK-16654:
--

Currently, when using Application Mode, the Flink CLI runs in detached mode by 
default. Why not support an attached mode in which the client could wait until 
the job finishes? 

[~wangyang0918] [~kkl0u] [~aljoscha]

> Implement Application Mode according to FLIP-85
> ---
>
> Key: FLINK-16654
> URL: https://issues.apache.org/jira/browse/FLINK-16654
> Project: Flink
>  Issue Type: New Feature
>  Components: Client / Job Submission
>Affects Versions: 1.10.0
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.11.0
>
>
> This is an umbrella issue that will help us track the progress of 
> [FLIP-85|https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451160#comment-17451160
 ] 

Junfan Zhang edited comment on FLINK-25099 at 11/30/21, 2:28 PM:
-

Just skip setting default.fs=flinkcluster, but keep the other configuration 
above. 
And why not specify the checkpoint path directly as 
hdfs://flinkcluster/x? As far as I know, fs.defaultFS only supplies the 
scheme and authority for paths that are not fully qualified, 
like /tmp/.

[~libra_816]


was (Author: zuston):
Just ignore setting the default.fs=flinkcluster. but preserve the others above 
config. 
And why not directly specifying the checkpoint path to 
hdfs://flinkcluster/x. As i know that the conf of default.fs is just for 
when not specified the whole path, like /tmp/.

[~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is A requirement to write 
> checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, 
> but this HDFS cluster is not the default HDFS in the flink client (called 
> cluster A by default). Yaml is configured with nameservices for cluster A and 
> cluster B, which is similar to HDFS federated mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, which is reported 
> as follows:
> (When the configuration is switched back to the Flink client's local default 
> HDFS cluster, the job starts normally:  flink.hadoop.fs.defaultFS: hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> 

[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451160#comment-17451160
 ] 

Junfan Zhang commented on FLINK-25099:
--

Just skip setting default.fs=flinkcluster, but keep the other configuration 
above. 
And why not specify the checkpoint path directly as 
hdfs://flinkcluster/x? As far as I know, fs.defaultFS is only consulted 
when the path is not fully qualified, like /tmp/.

[~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is A requirement to write 
> checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, 
> but this HDFS cluster is not the default HDFS in the flink client (called 
> cluster A by default). Yaml is configured with nameservices for cluster A and 
> cluster B, which is similar to HDFS federated mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, which is reported 
> as follows:
> (When the configuration is switched back to the Flink client's local default 
> HDFS cluster, the job starts normally:  flink.hadoop.fs.defaultFS: hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 

[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451123#comment-17451123
 ] 

Junfan Zhang commented on FLINK-25099:
--

I think that in your architecture you should not override the default.fs 
config; just use the YARN cluster's default.fs. [~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is A requirement to write 
> checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, 
> but this HDFS cluster is not the default HDFS in the flink client (called 
> cluster A by default). Yaml is configured with nameservices for cluster A and 
> cluster B, which is similar to HDFS federated mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, which is reported 
> as follows:
> (When the configuration is switched back to the Flink client's local default 
> HDFS cluster, the job starts normally:  flink.hadoop.fs.defaultFS: hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 

[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451071#comment-17451071
 ] 

Junfan Zhang commented on FLINK-25099:
--

The submission cluster is {{hdfsn21n159}}, but you set 
default.fs=flinkcluster. During container localization, the ContainerLocalizer 
resolves the {{flinkcluster}} nameservice on hdfsn21n159, but hdfsn21n159 has 
no {{flinkcluster}} nameservice configured. [~libra_816]


> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
> Attachments: flink-chenqizhu-client-hdfsn21n163.log
>
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is A requirement to write 
> checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, 
> but this HDFS cluster is not the default HDFS in the flink client (called 
> cluster A by default). Yaml is configured with nameservices for cluster A and 
> cluster B, which is similar to HDFS federated mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, which is reported 
> as follows:
> (When the configuration is switched back to the Flink client's local default 
> HDFS cluster, the job starts normally:  flink.hadoop.fs.defaultFS: hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> 
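The suggestion in the comments above can be sketched as a configuration fragment: keep the submission cluster as the default filesystem and point only the checkpoint directory at the fast cluster via a fully qualified URI. This is a hypothetical sketch reusing the reporter's ACluster/BCluster names:

```yaml
# Default FS stays on the submission (YARN) cluster, so localization works.
flink.hadoop.fs.defaultFS: hdfs://ACluster
# Checkpoints go to the SSD-backed cluster via a fully qualified path,
# so fs.defaultFS is never consulted for this location.
state.checkpoints.dir: hdfs://BCluster/flink/checkpoints
```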

[jira] [Commented] (FLINK-25099) flink on yarn Accessing two HDFS Clusters

2021-11-29 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450853#comment-17450853
 ] 

Junfan Zhang commented on FLINK-25099:
--

Could you set the log level to DEBUG and attach more logs? [~libra_816]

> flink on yarn Accessing two HDFS Clusters
> -
>
> Key: FLINK-25099
> URL: https://issues.apache.org/jira/browse/FLINK-25099
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / State Backends
>Affects Versions: 1.13.3
> Environment: flink : 1.13.3
> hadoop : 3.3.0
>Reporter: chenqizhu
>Priority: Major
>
> Flink version 1.13 supports configuration of Hadoop properties in 
> flink-conf.yaml via flink.hadoop.*. There is A requirement to write 
> checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, 
> but this HDFS cluster is not the default HDFS in the flink client (called 
> cluster A by default). Yaml is configured with nameservices for cluster A and 
> cluster B, which is similar to HDFS federated mode.
> The configuration is as follows:
>  
> {code:java}
> flink.hadoop.dfs.nameservices: ACluster,BCluster
> flink.hadoop.fs.defaultFS: hdfs://BCluster
> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.:50070
> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xx:50070
> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xx:50070
> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xx:9000
> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.x:50070
> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> {code}
>  
> However, an error occurred during the startup of the job, which is reported 
> as follows:
> (When the configuration is switched back to the Flink client's local default 
> HDFS cluster, the job starts normally:  flink.hadoop.fs.defaultFS: hdfs://ACluster)
> {noformat}
> Caused by: BCluster
> java.net.UnknownHostException: BCluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
> at 
> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){noformat}
> Is there a solution to the above problems? The pain 

[jira] [Created] (FLINK-24963) Remove the tail separator when outputting yarn queue names

2021-11-19 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-24963:


 Summary: Remove the tail separator when outputting yarn queue names
 Key: FLINK-24963
 URL: https://issues.apache.org/jira/browse/FLINK-24963
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why

I found some warning logs in our production Flink jobs, as follows (the 
detailed logs are attached at the end). Note that we do not set the 
{{pipeline.classpaths}} config option.
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
{code}
I dug into the Flink code and found that the empty list is put into the 
configuration by {{ConfigUtils.encodeCollectionToConfig}}.
h2. How

It would be better to skip putting empty collections into the configuration.

These entries are written to flink-conf.yaml, and the Flink JobManager then 
fails to parse the corresponding key's value, as the warning log 
shows.
h2. Appendix
{code:java}
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: internal.jobgraph-path, job.graph
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.max-failures-per-interval, 3
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.execution.failover-strategy, region
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: high-availability.cluster-id, application_1633896099002_1043646
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.rpc.address, localhost
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.runtime-mode, AUTOMATIC
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: security.kerberos.fetch.delegation-token, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.savepoint.ignore-unclaimed-state, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: parallelism.default, 2
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.numberOfTaskSlots, 1
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.failure-rate-interval, 1 d
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.queue, talos.job_streaming
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.memory.process.size, 1728m

{code}

  was:
h2. Why

As I found some warn logs in our production flink jobs, like as follows(the 
detail logs attached at the end)
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

These configuration entries are written to flink-conf.yaml, and the Flink 
JobManager then fails to parse the corresponding key's value, as the warning 
log above shows.
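The proposed guard can be sketched as follows — a minimal illustration under assumptions, not Flink's actual implementation: the helper name mirrors {{ConfigUtils.encodeCollectionToConfig}}, but the signature and the semicolon separator are invented for this example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConfigUtilsSketch {

    // Hypothetical variant of the encoding helper: skip the key entirely
    // when the collection is empty, so that no "pipeline.classpaths: " line
    // with a blank value ever reaches flink-conf.yaml.
    static <T> void encodeCollectionToConfig(
            Map<String, String> config,
            String key,
            Collection<T> values,
            Function<T, String> encoder) {
        if (values == null || values.isEmpty()) {
            return; // proposed behavior: ignore empty collections
        }
        config.put(
                key,
                values.stream().map(encoder).collect(Collectors.joining(";")));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        encodeCollectionToConfig(conf, "pipeline.classpaths", List.of(), Object::toString);
        System.out.println(conf.containsKey("pipeline.classpaths")); // false
        encodeCollectionToConfig(conf, "pipeline.jars", List.of("a.jar"), Function.identity());
        System.out.println(conf.get("pipeline.jars")); // a.jar
    }
}
```

With the original behavior, the empty list is serialized as a blank value, which is exactly the "pipeline.classpaths: " entry that GlobalConfiguration later fails to split.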
h2. Appendix
{code:java}
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: internal.jobgraph-path, job.graph
2021-10-14 17:46:41 INFO  

[jira] [Commented] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429095#comment-17429095
 ] 

Junfan Zhang commented on FLINK-24555:
--

[~jark] Could you help review it? Thanks

> Incorrectly put empty list to flink configuration
> -
>
> Key: FLINK-24555
> URL: https://issues.apache.org/jira/browse/FLINK-24555
> Project: Flink
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> While investigating our production Flink jobs, I found warning logs like the 
> following (the full logs are attached at the end):
> {code:java}
> 2021-10-14 17:46:41 WARN  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying 
> to split key and value in configuration file 
> /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
>  "pipeline.classpaths: "
> {code}
> I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
> puts an empty list into the configuration.
> h2. How
> So it would be better to skip empty collections instead of putting them into 
> the configuration.
> These configuration entries are written to flink-conf.yaml, and the Flink 
> JobManager then fails to parse the corresponding key's value, as the 
> warning log above shows.
> h2. Appendix
> {code:java}
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: internal.jobgraph-path, job.graph
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: 
> restart-strategy.failure-rate.max-failures-per-interval, 3
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: jobmanager.execution.failover-strategy, region
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: high-availability.cluster-id, 
> application_1633896099002_1043646
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: jobmanager.rpc.address, localhost
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: execution.runtime-mode, AUTOMATIC
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: security.kerberos.fetch.delegation-token, false
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: execution.savepoint.ignore-unclaimed-state, false
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: parallelism.default, 2
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: taskmanager.numberOfTaskSlots, 1
> 2021-10-14 17:46:41 WARN  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying 
> to split key and value in configuration file 
> /data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
>  "pipeline.classpaths: "
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: restart-strategy.failure-rate.failure-rate-interval, 
> 1 d
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: yarn.application.name, 
> app_StreamEngine_Prod_jiandan_beat_rec_test_all
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: yarn.application.queue, talos.job_streaming
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: taskmanager.memory.process.size, 1728m
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why

While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

These configuration entries are written to flink-conf.yaml, and the Flink 
JobManager then fails to parse the corresponding key's value, as the warning 
log above shows.
h2. Appendix
{code:java}
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: internal.jobgraph-path, job.graph
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.max-failures-per-interval, 3
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.execution.failover-strategy, region
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: high-availability.cluster-id, application_1633896099002_1043646
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.rpc.address, localhost
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.runtime-mode, AUTOMATIC
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: security.kerberos.fetch.delegation-token, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.savepoint.ignore-unclaimed-state, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: parallelism.default, 2
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.numberOfTaskSlots, 1
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.failure-rate-interval, 1 d
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.queue, talos.job_streaming
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.memory.process.size, 1728m

{code}

  was:
h2. Why

While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

These configuration entries are written to flink-conf.yaml, and the Flink 
JobManager then fails to parse the corresponding key's value, as the warning 
log above shows.

[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why

While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

These configuration entries are written to flink-conf.yaml, and the Flink 
JobManager then fails to parse the corresponding key's value, as the warning 
log above shows.

  was:
h2. Why

While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

 


[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why

While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}
I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.
h2. How

So it would be better to skip empty collections instead of putting them into the 
configuration.

 


  was:
h2. Why
While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}

I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.

h2. How
So it would be better to skip empty collections instead of putting them into the 
configuration.



[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why
While investigating our production Flink jobs, I found warning logs like the 
following (the full logs are attached at the end):
{code:java}
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
{code}

I dug into the Flink code and found that {{ConfigUtils.encodeCollectionToConfig}} 
puts an empty list into the configuration.

h2. How
So it would be better to skip empty collections instead of putting them into the 
configuration.




  was:
h2. Why
While investigating our production Flink jobs, I found warning logs like the following:

{code:java}
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: internal.jobgraph-path, job.graph
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.max-failures-per-interval, 3
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.execution.failover-strategy, region
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: high-availability.cluster-id, application_1633896099002_1043646
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.rpc.address, localhost
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.runtime-mode, AUTOMATIC
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: security.kerberos.fetch.delegation-token, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: 

[jira] [Updated] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-24555:
-
Description: 
h2. Why
While investigating our production Flink jobs, I found warning logs like the following:

{code:java}
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: internal.jobgraph-path, job.graph
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.max-failures-per-interval, 3
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.execution.failover-strategy, region
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: high-availability.cluster-id, application_1633896099002_1043646
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: jobmanager.rpc.address, localhost
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.runtime-mode, AUTOMATIC
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: security.kerberos.fetch.delegation-token, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: execution.savepoint.ignore-unclaimed-state, false
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: parallelism.default, 2
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.numberOfTaskSlots, 1
2021-10-14 17:46:41 WARN  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(186) - Error while trying to 
split key and value in configuration file 
/data8/yarn/local/usercache/pizza/appcache/application_1633896099002_1043646/container_e429_1633896099002_1043646_01_01/flink-conf.yaml:11:
 "pipeline.classpaths: "
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: restart-strategy.failure-rate.failure-rate-interval, 1 d
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.name, app_StreamEngine_Prod_jiandan_beat_rec_test_all
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: yarn.application.queue, talos.job_streaming
2021-10-14 17:46:41 INFO  - [main] - 
org.apache.flink.configuration.GlobalConfiguration(213) - Loading configuration 
property: taskmanager.memory.process.size, 1728m

{code}


> Incorrectly put empty list to flink configuration
> -
>
> Key: FLINK-24555
> URL: https://issues.apache.org/jira/browse/FLINK-24555
> Project: Flink
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> h2. Why
> While investigating our production Flink jobs, I found warning logs like the following:
> {code:java}
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: internal.jobgraph-path, job.graph
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: 
> restart-strategy.failure-rate.max-failures-per-interval, 3
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: jobmanager.execution.failover-strategy, region
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: high-availability.cluster-id, 
> application_1633896099002_1043646
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: jobmanager.rpc.address, localhost
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: execution.runtime-mode, AUTOMATIC
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: security.kerberos.fetch.delegation-token, false
> 2021-10-14 17:46:41 INFO  - [main] - 
> org.apache.flink.configuration.GlobalConfiguration(213) - Loading 
> configuration property: execution.savepoint.ignore-unclaimed-state, false
> 2021-10-14 

[jira] [Created] (FLINK-24555) Incorrectly put empty list to flink configuration

2021-10-14 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-24555:


 Summary: Incorrectly put empty list to flink configuration
 Key: FLINK-24555
 URL: https://issues.apache.org/jira/browse/FLINK-24555
 Project: Flink
  Issue Type: Bug
Reporter: Junfan Zhang








[jira] [Commented] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme

2021-09-01 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407934#comment-17407934
 ] 

Junfan Zhang commented on FLINK-23991:
--

Could you help assign this ticket to me? [~lirui] [~jark]

> Specifying yarn.staging-dir fail when staging scheme is different from 
> default fs scheme
> 
>
> Key: FLINK-23991
> URL: https://issues.apache.org/jira/browse/FLINK-23991
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.2
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When the yarn.staging-dir path scheme is different from the default fs 
> scheme, the client will fail fast.





[jira] [Updated] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme

2021-08-26 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-23991:
-
Description: When the yarn.staging-dir path scheme is different from the 
default fs scheme, the client will fail fast.
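
The failing condition can be sketched with a small self-contained snippet that compares the staging path's scheme with the default filesystem's scheme via java.net.URI; the method name {{schemeMismatch}} and the example URIs are hypothetical illustrations, not Flink's actual YARN client code:

```java
import java.net.URI;

public class StagingSchemeCheck {
    // True when the staging dir's scheme differs from the default fs scheme --
    // the situation in which the YARN client described above fails fast.
    public static boolean schemeMismatch(String stagingDir, String defaultFs) {
        String stagingScheme = URI.create(stagingDir).getScheme();
        String defaultScheme = URI.create(defaultFs).getScheme();
        return stagingScheme != null && !stagingScheme.equals(defaultScheme);
    }

    public static void main(String[] args) {
        // e.g. a viewfs:// default filesystem with an hdfs:// staging dir
        System.out.println(schemeMismatch("hdfs://nn1/tmp/staging", "viewfs://cluster/")); // true
        System.out.println(schemeMismatch("hdfs://nn1/tmp/staging", "hdfs://nn1/"));       // false
    }
}
```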

> Specifying yarn.staging-dir fail when staging scheme is different from 
> default fs scheme
> 
>
> Key: FLINK-23991
> URL: https://issues.apache.org/jira/browse/FLINK-23991
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.2
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When the yarn.staging-dir path scheme is different from the default fs 
> scheme, the client will fail fast.





[jira] [Created] (FLINK-23991) Specifying yarn.staging-dir fail when staging scheme is different from default fs scheme

2021-08-26 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-23991:


 Summary: Specifying yarn.staging-dir fail when staging scheme is 
different from default fs scheme
 Key: FLINK-23991
 URL: https://issues.apache.org/jira/browse/FLINK-23991
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.13.2
Reporter: Junfan Zhang








[jira] [Created] (FLINK-23004) Fix misleading log

2021-06-15 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-23004:


 Summary: Fix misleading log
 Key: FLINK-23004
 URL: https://issues.apache.org/jira/browse/FLINK-23004
 Project: Flink
  Issue Type: Improvement
Reporter: Junfan Zhang








[jira] [Commented] (FLINK-22919) Remove support for Hadoop1.x in HadoopInputFormatCommonBase.getCredentialsFromUGI

2021-06-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359850#comment-17359850
 ] 

Junfan Zhang commented on FLINK-22919:
--

[~lirui] Please review it. Thanks

> Remove support for Hadoop1.x in 
> HadoopInputFormatCommonBase.getCredentialsFromUGI
> -
>
> Key: FLINK-22919
> URL: https://issues.apache.org/jira/browse/FLINK-22919
> Project: Flink
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (FLINK-22919) Remove support for Hadoop1.x in HadoopInputFormatCommonBase.getCredentialsFromUGI

2021-06-08 Thread Junfan Zhang (Jira)
Junfan Zhang created FLINK-22919:


 Summary: Remove support for Hadoop1.x in 
HadoopInputFormatCommonBase.getCredentialsFromUGI
 Key: FLINK-22919
 URL: https://issues.apache.org/jira/browse/FLINK-22919
 Project: Flink
  Issue Type: Improvement
Reporter: Junfan Zhang








[jira] [Commented] (FLINK-22690) kerberos integration with flink,kerberos tokens will expire

2021-05-18 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346635#comment-17346635
 ] 

Junfan Zhang commented on FLINK-22690:
--

Please add more detailed info, such as the submission mode (YARN or K8s), whether a keytab 
is used, and the related submission command. [~kevinsun1000]

> kerberos integration with flink,kerberos  tokens will expire
> 
>
> Key: FLINK-22690
> URL: https://issues.apache.org/jira/browse/FLINK-22690
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Reporter: kevinsun
>Priority: Major
>
> Flink on YARN job. Flink sinks to HDFS, Hive, HBase, etc.; some time later, the 
> Kerberos tokens will expire.
> Error: Failed to find any Kerberos tgt
> e.g. *StreamingFileSink*





[jira] [Comment Edited] (FLINK-21232) Introduce pluggable Hadoop delegation token providers

2021-05-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345857#comment-17345857
 ] 

Junfan Zhang edited comment on FLINK-21232 at 5/17/21, 3:54 AM:


Hi [~jackwangcs] [~lirui]
Nice improvement!  Thanks for your work. [~jackwangcs] 
And any update on it? 
I am happy to apply this patch to our production environment for testing.


was (Author: zuston):
Nice improvement!  Thanks for your work. [~jackwangcs] 
And any update on it? [~lirui]
I am happy to apply this patch to our production environment for testing.

> Introduce pluggable Hadoop delegation token providers
> -
>
> Key: FLINK-21232
> URL: https://issues.apache.org/jira/browse/FLINK-21232
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common, Deployment / YARN
>Reporter: jackwangcs
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
>
> Introduce a pluggable delegation token provider via SPI. 
> A delegation token provider can live in connector-related code, which is more 
> extensible than obtaining DTs via reflection.
> Email discussion threads:
> [https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E]
>  
> [https://lists.apache.org/thread.html/r20d4be431ff2f6faff94129b5321a047fcbb0c71c8e092504cd91183%40%3Cdev.flink.apache.org%3E]
>  
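
The SPI idea described above can be illustrated with a minimal self-contained sketch. The interface, the HBase provider, and {{collectTokens}} are hypothetical stand-ins, not Flink's actual API; a real SPI setup would discover implementations via java.util.ServiceLoader and META-INF/services entries rather than an explicit list:

```java
import java.util.*;

public class TokenProviderSpi {
    /** Hypothetical SPI contract; connector modules would ship implementations. */
    public interface DelegationTokenProvider {
        String serviceName();
        byte[] obtainToken();
    }

    /** A provider that a connector (e.g. HBase) could register. */
    public static class HBaseTokenProvider implements DelegationTokenProvider {
        public String serviceName() { return "hbase"; }
        public byte[] obtainToken() { return new byte[]{1}; }
    }

    /**
     * In a real SPI setup this would iterate
     * ServiceLoader.load(DelegationTokenProvider.class); providers are passed
     * explicitly here to keep the sketch self-contained.
     */
    public static Map<String, byte[]> collectTokens(Iterable<DelegationTokenProvider> providers) {
        Map<String, byte[]> tokens = new HashMap<>();
        for (DelegationTokenProvider p : providers) {
            tokens.put(p.serviceName(), p.obtainToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, byte[]> t = collectTokens(List.of(new HBaseTokenProvider()));
        System.out.println(t.keySet()); // [hbase]
    }
}
```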





[jira] [Commented] (FLINK-21232) Introduce pluggable Hadoop delegation token providers

2021-05-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345857#comment-17345857
 ] 

Junfan Zhang commented on FLINK-21232:
--

Nice improvement!  Thanks for your work. [~jackwangcs] 
And any update on it? [~lirui]
I am happy to apply this patch to our production environment for testing.

> Introduce pluggable Hadoop delegation token providers
> -
>
> Key: FLINK-21232
> URL: https://issues.apache.org/jira/browse/FLINK-21232
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common, Deployment / YARN
>Reporter: jackwangcs
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
>
> Introduce a pluggable delegation token provider via SPI. 
> A delegation token provider can live in connector-related code, which is more 
> extensible than obtaining DTs via reflection.
> Email discussion threads:
> [https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E]
>  
> [https://lists.apache.org/thread.html/r20d4be431ff2f6faff94129b5321a047fcbb0c71c8e092504cd91183%40%3Cdev.flink.apache.org%3E]
>  





[jira] [Commented] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-05-14 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344497#comment-17344497
 ] 

Junfan Zhang commented on FLINK-22329:
--

Updated; please review it again. Thanks [~lirui]

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> 
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Related Flink code: 
> [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]
>  
> This {{getSplits}} method calls Hadoop's {{FileInputFormat#getSplits}}; the 
> related Hadoop code is 
> [here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426].
>  Simplified code follows:
> {code:java}
> // Hadoop FileInputFormat
> public InputSplit[] getSplits(JobConf job, int numSplits)
>   throws IOException {
>   StopWatch sw = new StopWatch().start();
>   FileStatus[] stats = listStatus(job);
>  
>   ..
> }
> protected FileStatus[] listStatus(JobConf job) throws IOException {
>   Path[] dirs = getInputPaths(job);
>   if (dirs.length == 0) {
> throw new IOException("No input paths specified in job");
>   }
>   // get tokens for all the required FileSystems..
>   TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
>   
>   // Whether we need to recursive look into the directory structure
>   ..
> }
> {code}
>  
> The {{listStatus}} method obtains delegation tokens by calling 
> {{TokenCache.obtainTokensForNamenodes}}. However, that method skips fetching 
> delegation tokens for filesystems whose credentials are already present in the jobconf.
> So it is necessary to inject the current UGI credentials into the jobconf.
>  
> Besides, once Flink supports delegation tokens directly without a keytab ([refer 
> to FLINK-21700|https://issues.apache.org/jira/browse/FLINK-21700]), 
> {{TokenCache.obtainTokensForNamenodes}} will fail without this patch 
> because there are no corresponding credentials.
>  
>  
>  
>  
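
The skip behaviour and the fix described above can be sketched with plain maps standing in for Hadoop's Credentials and JobConf (all names here are hypothetical illustrations, not the real Hadoop API):

```java
import java.util.*;

public class CredentialInjection {
    /**
     * Mirrors TokenCache.obtainTokensForNamenodes' skip behaviour with plain
     * maps: a token is only fetched (i.e. authentication happens) when the
     * jobconf credentials do not already hold one for that filesystem.
     * Returns the number of tokens freshly fetched.
     */
    public static int obtainTokens(Map<String, String> jobCredentials, List<String> namenodes) {
        int fetched = 0;
        for (String nn : namenodes) {
            if (!jobCredentials.containsKey(nn)) { // already present -> skip re-authentication
                jobCredentials.put(nn, "token-for-" + nn);
                fetched++;
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, String> ugiCredentials = Map.of("hdfs://nn1", "existing-token");
        Map<String, String> jobCredentials = new HashMap<>();

        // Without injecting the UGI credentials, a fresh fetch (repeated
        // authentication) happens even though a token already exists in the UGI.
        System.out.println(obtainTokens(new HashMap<>(jobCredentials), List.of("hdfs://nn1"))); // 1

        // With the fix: copy the current UGI credentials into the jobconf first.
        jobCredentials.putAll(ugiCredentials);
        System.out.println(obtainTokens(jobCredentials, List.of("hdfs://nn1"))); // 0
    }
}
```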





[jira] [Updated] (FLINK-22329) Missing credentials in jobconf causes repeated authentication in Hive datasource

2021-05-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22329:
-
Summary: Missing credentials in jobconf causes repeated authentication in 
Hive datasource  (was: Missing crendentials in jobconf causes repeated 
authentication in Hive datasource)

> Missing credentials in jobconf causes repeated authentication in Hive 
> datasource
> 
>
> Key: FLINK-22329
> URL: https://issues.apache.org/jira/browse/FLINK-22329
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Related Flink code: 
> [https://github.com/apache/flink/blob/577113f0c339df844f2cc32b1d4a09d3da28085a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSourceFileEnumerator.java#L107]
>  
> This {{getSplits}} method calls Hadoop's {{FileInputFormat#getSplits}}; the 
> related Hadoop code is 
> [here|https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L426].
>  Simplified code follows:
> {code:java}
> // Hadoop FileInputFormat
> public InputSplit[] getSplits(JobConf job, int numSplits)
>   throws IOException {
>   StopWatch sw = new StopWatch().start();
>   FileStatus[] stats = listStatus(job);
>  
>   ..
> }
> protected FileStatus[] listStatus(JobConf job) throws IOException {
>   Path[] dirs = getInputPaths(job);
>   if (dirs.length == 0) {
> throw new IOException("No input paths specified in job");
>   }
>   // get tokens for all the required FileSystems..
>   TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, job);
>   
>   // Whether we need to recursive look into the directory structure
>   ..
> }
> {code}
>  
> The {{listStatus}} method obtains delegation tokens by calling 
> {{TokenCache.obtainTokensForNamenodes}}. However, that method skips fetching 
> delegation tokens for filesystems whose credentials are already present in the jobconf.
> So it is necessary to inject the current UGI credentials into the jobconf.
>  
> Besides, once Flink supports delegation tokens directly without a keytab ([refer 
> to FLINK-21700|https://issues.apache.org/jira/browse/FLINK-21700]), 
> {{TokenCache.obtainTokensForNamenodes}} will fail without this patch 
> because there are no corresponding credentials.
>  
>  
>  
>  





[jira] [Commented] (FLINK-21700) Allow to disable fetching Hadoop delegation token on Yarn

2021-05-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344323#comment-17344323
 ] 

Junfan Zhang commented on FLINK-21700:
--

[~fly_in_gis] [~lirui] Could you help set the status to resolved? Thanks~

> Allow to disable fetching Hadoop delegation token on Yarn
> -
>
> Key: FLINK-21700
> URL: https://issues.apache.org/jira/browse/FLINK-21700
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> h3. Why
> I want to support a Flink action on Oozie. 
>  As we know, Oozie obtains HDFS/HBase delegation tokens before starting the 
> Flink submitter CLI.
>  Spark already supports disabling delegation token fetching on the Spark client; see the 
> [related Spark 
> doc|https://spark.apache.org/docs/latest/running-on-yarn.html#launching-your-application-with-apache-oozie].
>  
> So I think Flink should allow disabling Hadoop delegation token fetching on 
> YARN.
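
The switch this ticket asks for corresponds to the configuration option visible in the log excerpt at the top of this digest (`security.kerberos.fetch.delegation-token`); a hedged flink-conf.yaml sketch, assuming that option name:

```yaml
# Disable Flink's own Hadoop delegation token fetching; an external
# scheduler such as Oozie supplies the tokens before submission.
security.kerberos.fetch.delegation-token: false
```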





[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343786#comment-17343786
 ] 

Junfan Zhang commented on FLINK-22534:
--

Wow, thanks for your review. [~lirui].

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!
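
The overwriting problem described in this issue can be reproduced with a self-contained sketch: keying a credential map by the token identifier collapses the two HA tokens, while keying by service name keeps both. The class, method, and example host names are hypothetical illustrations, not Flink's actual code:

```java
import java.util.*;

public class TokenAlias {
    /**
     * Adds tokens to a credential map keyed by the chosen field.
     * Each token is {identifier, service, secret}; keyIndex 0 keys by
     * identifier, keyIndex 1 keys by service name.
     */
    public static Map<String, String> addAll(List<String[]> tokens, int keyIndex) {
        Map<String, String> credentials = new HashMap<>();
        for (String[] t : tokens) {
            credentials.put(t[keyIndex], t[2]);
        }
        return credentials;
    }

    public static void main(String[] args) {
        // With HDFS HA (HDFS-9276): same identifier, distinct service names.
        List<String[]> haTokens = List.of(
                new String[]{"ha-hdfs:cluster", "nn1.example.com:8020", "secret-1"},
                new String[]{"ha-hdfs:cluster", "nn2.example.com:8020", "secret-2"});

        System.out.println(addAll(haTokens, 0).size()); // keyed by identifier: 1 (one token overwritten)
        System.out.println(addAll(haTokens, 1).size()); // keyed by service name: 2 (both kept)
    }
}
```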





[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-11 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342391#comment-17342391
 ] 

Junfan Zhang commented on FLINK-22534:
--

[~lirui] Thanks for your reply.

Yes

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-10 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342268#comment-17342268
 ] 

Junfan Zhang commented on FLINK-22534:
--

Any ideas on it? [~lirui] [~mapohl]

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Comment Edited] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340761#comment-17340761
 ] 

Junfan Zhang edited comment on FLINK-22534 at 5/7/21, 11:55 AM:


[~mapohl]. Sorry for the late reply.

More detailed info has been added to the description.

ping [~karmagyz] [~lirui] 


was (Author: zuston):
[~mapohl]. Sorry for the late reply.

More detailed info has been added to the description.

ping [~karmagyz] [~lirui] [~wangyang]

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Commented] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340761#comment-17340761
 ] 

Junfan Zhang commented on FLINK-22534:
--

[~mapohl]. Sorry for the late reply.

More detailed info has been added to the description.

ping [~karmagyz] [~lirui] [~wangyang]

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What

Set the Hadoop delegation token's service name as the credential alias.
h4. Why

In the current implementation, Flink uses either the delegation token's service name or 
its identifier as the credential alias; refer to Flink code 
[HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
 and [Yarn 
Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].

Firstly, consistently using the delegation token's service name as the credential 
alias is clearer.

Secondly, when fetching HDFS delegation tokens and then injecting all tokens into the 
current UserGroupInformation in Hadoop HDFS HA mode, different delegation tokens that 
share the same identifier overwrite each other, [refer 
to code 
here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, Hadoop HA delegation tokens have the same 
identifier (refer to HDFS-9276), but their service names differ. So we can 
use the service name as the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

  !debug2.PNG!

  was:
h4. What

Set the Hadoop delegation token's service name as the credential alias.
h4. Why

In the current implementation, Flink uses either the delegation token's service name or 
its identifier as the credential alias; refer to Flink code 
[HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
 and Yarn Utils.

Firstly, consistently using the delegation token's service name as the credential 
alias is clearer.

Secondly, when fetching HDFS delegation tokens and then injecting all tokens into the 
current UserGroupInformation in Hadoop HDFS HA mode, different delegation tokens that 
share the same identifier overwrite each other; refer 
to code here.
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, Hadoop HA delegation tokens have the same 
identifier (refer to HDFS-9276), but their service names differ. So we can 
use the service name as the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

  !debug2.PNG!


> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as the credential alias.
> h4. Why
> In the current implementation, Flink uses either the delegation token's service name 
> or its identifier as the credential alias; refer to Flink code 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and [Yarn 
> Utils|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> Firstly, consistently using the delegation token's service name as the credential 
> alias is clearer.
> Secondly, when fetching HDFS delegation tokens and then injecting all tokens into 
> the current UserGroupInformation in Hadoop HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite each other, [refer to code 
> here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L209].
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, Hadoop HA delegation tokens have the same 
> identifier (refer to HDFS-9276), but their service names differ. So we can 
> use the service name as the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What

Set the Hadoop delegation token's service name as the credential alias.
h4. Why

In the current implementation, Flink uses either the delegation token's service name or 
its identifier as the credential alias; refer to Flink code 
[HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
 and Yarn Utils.

Firstly, consistently using the delegation token's service name as the credential 
alias is clearer.

Secondly, when fetching HDFS delegation tokens and then injecting all tokens into the 
current UserGroupInformation in Hadoop HDFS HA mode, different delegation tokens that 
share the same identifier overwrite each other; refer 
to code here.
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, Hadoop HA delegation tokens have the same 
identifier (refer to HDFS-9276), but their service names differ. So we can 
use the service name as the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

  !debug2.PNG!

  was:
h4. What

Set the Hadoop delegation token's service name as the credential alias.
h4. Why

In the current implementation, Flink uses either the delegation token's service name or 
its identifier as the credential alias; refer to Flink code HadoopModule and Yarn Utils.

Firstly, consistently using the delegation token's service name as the credential 
alias is clearer.

Secondly, when fetching HDFS delegation tokens and then injecting all tokens into the 
current UserGroupInformation in Hadoop HDFS HA mode, different delegation tokens that 
share the same identifier overwrite each other; refer 
to code here.
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, Hadoop HA delegation tokens have the same 
identifier (refer to HDFS-9276), but their service names differ. So we can 
use the service name as the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

  !debug2.PNG!


> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the Flink code in 
> [HadoopModule|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L101]
>  and the YARN utils.
> First, I think we should set the credential alias consistently, using the 
> delegation token's service name. That would be clearer.
> Second, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite one another; refer to the 
> code here.
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, the HA delegation tokens have the same identifier (refer to 
> HDFS-9276), but their service names differ, so we can use the service name as 
> the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!
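
The overwrite described above can be sketched with a plain map. This is a hypothetical illustration, not Flink's actual HadoopModule or Hadoop's Credentials class; the class name TokenAliasSketch, the token values, and addTokens are all made up for the example. It only shows why aliasing by identifier loses one of the two HA tokens while aliasing by service name keeps both:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in HDFS HA mode, both namenode tokens carry the same
// identifier (HDFS-9276), so a credential map keyed by identifier retains only
// one of them, while keying by service name retains both.
public class TokenAliasSketch {

    /** Minimal stand-in for a Hadoop delegation token. */
    static final class Token {
        final String identifier;
        final String service;
        Token(String identifier, String service) {
            this.identifier = identifier;
            this.service = service;
        }
    }

    /** Adds tokens to a credential map, keyed either by identifier or by service name. */
    static Map<String, Token> addTokens(Token[] tokens, boolean aliasByService) {
        Map<String, Token> credentials = new HashMap<>();
        for (Token t : tokens) {
            String alias = aliasByService ? t.service : t.identifier;
            credentials.put(alias, t); // an equal alias silently overwrites the earlier token
        }
        return credentials;
    }

    public static void main(String[] args) {
        // Two HA tokens: same identifier, different service names (hypothetical values).
        Token[] haTokens = {
            new Token("HDFS_DELEGATION_TOKEN-42", "nn1.example.com:8020"),
            new Token("HDFS_DELEGATION_TOKEN-42", "nn2.example.com:8020"),
        };
        System.out.println(addTokens(haTokens, false).size()); // identifier alias -> 1
        System.out.println(addTokens(haTokens, true).size());  // service alias   -> 2
    }
}
```

Keying by service name, as proposed in this issue, keeps both HA tokens distinct.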



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What

Set the Hadoop delegation token's service name as credential alias.
h4. Why

In the current implementation, Flink uses the delegation token's service name or 
identifier as the credential alias; refer to the Flink code in HadoopModule and 
the YARN utils.

First, I think we should set the credential alias consistently, using the 
delegation token's service name. That would be clearer.

Second, when fetching HDFS delegation tokens and then injecting all of them into 
the current UserGroupInformation in HDFS HA mode, different delegation tokens 
that share the same identifier overwrite one another; refer to the code here.
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, the HA delegation tokens have the same identifier (refer to 
HDFS-9276), but their service names differ, so we can use the service name as 
the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

  !debug2.PNG!

  was:
h4. What

Set the Hadoop delegation token's service name as credential alias.
h4. Why

In current implementation, Flink will use delegation token's service name or 
identifer as credential alias, refer to Flink code HadoopModule and Yarn Utils.

Firstly, I think we could use the same way to set credential alias, like 
delegation token's service name. It will be more clear.

Secondly, when fetching HDFS delegation token and then inject all tokens to 
current UserGroupInformation in Hadoop HDFS HA mode, it will cause the problem 
of overwriting the different delegation tokens with the same identifier, refer 
to code here.
h5. When does the same identifier delegation tokens appear?

When in HDFS HA mode, Hadoop HA delegation tokens will have the same 
identifier(Refer to HDFS-9276), but its' service name is different. So we can 
use service name as alias.

The following figure from HDFS-9276 can show that the identifier of HA 
delegation token is the same.

 


> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the Flink code in 
> HadoopModule and the YARN utils.
> First, I think we should set the credential alias consistently, using the 
> delegation token's service name. That would be clearer.
> Second, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite one another; refer to the 
> code here.
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, the HA delegation tokens have the same identifier (refer to 
> HDFS-9276), but their service names differ, so we can use the service name as 
> the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>   !debug2.PNG!





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Attachment: debug2.PNG

> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the Flink code in 
> HadoopModule and the YARN utils.
> First, I think we should set the credential alias consistently, using the 
> delegation token's service name. That would be clearer.
> Second, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in Hadoop HDFS federation mode, 
> different delegation tokens that share the same identifier overwrite one 
> another; refer to the code here.
> h5. When do delegation tokens with the same identifier appear?
> Hadoop HA delegation tokens have the same identifier (refer to 
> [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but their 
> service names differ, so we can use the service name as the alias.
> The following figure from 
> [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] shows that the 
> identifiers of the HA delegation tokens are the same.





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What

Set the Hadoop delegation token's service name as credential alias.
h4. Why

In the current implementation, Flink uses the delegation token's service name or 
identifier as the credential alias; refer to the Flink code in HadoopModule and 
the YARN utils.

First, I think we should set the credential alias consistently, using the 
delegation token's service name. That would be clearer.

Second, when fetching HDFS delegation tokens and then injecting all of them into 
the current UserGroupInformation in HDFS HA mode, different delegation tokens 
that share the same identifier overwrite one another; refer to the code here.
h5. When do delegation tokens with the same identifier appear?

In HDFS HA mode, the HA delegation tokens have the same identifier (refer to 
HDFS-9276), but their service names differ, so we can use the service name as 
the alias.

The following figure from HDFS-9276 shows that the identifiers of the HA 
delegation tokens are the same.

 

  was:
h4. What
Set the Hadoop delegation token's service name as credential alias.

h4. Why
In current implementation, Flink will use delegation token's service name or 
identifer as credential  alias, refer to Flink code HadoopModule and Yarn 
Utils. 

Firstly, I think we could use the same way to set credential alias, like 
delegation token's service name. It will be more clear.

Secondly, when fetching HDFS delegation token and then inject all tokens to 
current UserGroupInformation in Hadoop HDFS federation mode, it will cause the 
problem of overwriting the different delegation tokens with the same 
identifier, refer to code here.

h5. When does the same identifier delegation tokens appear?
Hadoop HA delegation tokens will have the same identifier(Refer to 
[HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but its' service 
name is different. So we can use service name as alias.

The following figure from 
[HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] can show that the 
identifier of HA delegation token is the same.





> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug2.PNG
>
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the Flink code in 
> HadoopModule and the YARN utils.
> First, I think we should set the credential alias consistently, using the 
> delegation token's service name. That would be clearer.
> Second, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in HDFS HA mode, different delegation 
> tokens that share the same identifier overwrite one another; refer to the 
> code here.
> h5. When do delegation tokens with the same identifier appear?
> In HDFS HA mode, the HA delegation tokens have the same identifier (refer to 
> HDFS-9276), but their service names differ, so we can use the service name as 
> the alias.
> The following figure from HDFS-9276 shows that the identifiers of the HA 
> delegation tokens are the same.
>  





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What
Set the Hadoop delegation token's service name as credential alias.

h4. Why
In the current implementation, Flink uses the delegation token's service name or 
identifier as the credential alias; refer to the Flink code in HadoopModule and 
the YARN utils.

First, I think we should set the credential alias consistently, using the 
delegation token's service name. That would be clearer.

Second, when fetching HDFS delegation tokens and then injecting all of them into 
the current UserGroupInformation in Hadoop HDFS federation mode, different 
delegation tokens that share the same identifier overwrite one another; refer to 
the code here.

h5. When do delegation tokens with the same identifier appear?
Hadoop HA delegation tokens have the same identifier (refer to 
[HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but their service 
names differ, so we can use the service name as the alias.

The following figure from 
[HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] shows that the 
identifiers of the HA delegation tokens are the same.




  was:
h4. What
Set the Hadoop delegation token's service name as credential alias.

h4. Why
In current implementation, Flink will use delegation token's service name or 
identifer as credential  alias, refer to code. 
I think we could use the same way to set credential alias, like delegation 
token's service name.

Besides, when fetching HDFS delegation token and then inject all tokens to 
current UserGroupInformation in Hadoop HDFS federation mode, it will cause 


> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the Flink code in 
> HadoopModule and the YARN utils.
> First, I think we should set the credential alias consistently, using the 
> delegation token's service name. That would be clearer.
> Second, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in Hadoop HDFS federation mode, 
> different delegation tokens that share the same identifier overwrite one 
> another; refer to the code here.
> h5. When do delegation tokens with the same identifier appear?
> Hadoop HA delegation tokens have the same identifier (refer to 
> [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276]), but their 
> service names differ, so we can use the service name as the alias.
> The following figure from 
> [HDFS-9276|https://issues.apache.org/jira/browse/HDFS-9276] shows that the 
> identifiers of the HA delegation tokens are the same.





[jira] [Updated] (FLINK-22534) Set delegation token's service name as credential alias

2021-05-07 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-22534:
-
Description: 
h4. What
Set the Hadoop delegation token's service name as credential alias.

h4. Why
In the current implementation, Flink uses the delegation token's service name or 
identifier as the credential alias; refer to the code.
I think we could set the credential alias in the same way, using the delegation 
token's service name.

Besides, when fetching HDFS delegation tokens and then injecting all of them 
into the current UserGroupInformation in Hadoop HDFS federation mode, it will 
cause 

  was:
h4. What
Set the Hadoop delegation token's service name as credential alias.

h4. Why
In current implementation, Flink will use delegation token's service name or 
identifer as credential  alias, refer to code. 
I think we could use the same way to set credential alias, like delegation 
token's service name.

Besides, in 


> Set delegation token's service name as credential alias
> ---
>
> Key: FLINK-22534
> URL: https://issues.apache.org/jira/browse/FLINK-22534
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hadoop Compatibility
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> h4. What
> Set the Hadoop delegation token's service name as credential alias.
> h4. Why
> In the current implementation, Flink uses the delegation token's service name 
> or identifier as the credential alias; refer to the code.
> I think we could set the credential alias in the same way, using the 
> delegation token's service name.
> Besides, when fetching HDFS delegation tokens and then injecting all of them 
> into the current UserGroupInformation in Hadoop HDFS federation mode, it will 
> cause 





  1   2   >