[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @tzulitai, OK, thanks a lot! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3776 @Rucongzhang OK, understood. I agree that in general the current `AbstractYarnClusterDescriptor` has poor separation of concerns, so it is a bit hard to write contained tests. We should remember to add this, perhaps when refactoring it for FLIP-6. I'll give this a test run on YARN and then merge it :)
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776
@tzulitai, when I fixed the problem, I wanted to write an IT case. But in the YARN IT case:
1. It uses the YARN mini cluster, which is for testing. I do not know whether it uses an HDFS delegation token or not.
2. What's more, the HDFS delegation token is used by the YARN node manager. It is difficult to tell from the YARN client whether this token is present. The token is set into the YARN application context, but the YARN client does not have an API to get the YARN application context.
What do you think? Thanks a lot!
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3776 @Rucongzhang @EronWright thanks for the explanations, the changes look good to me then. I'll rebase this, perform some tests and then merge this if all goes well.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @StephanEwen, @tzulitai, please review the code. If it is OK, please help me merge the PR. Thanks!
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user EronWright commented on the issue: https://github.com/apache/flink/pull/3776 +1
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @EronWright, thank you very much. Yes, you are right. But about solution 1: we only need to add the HDFS delegation token to the YARN container context. The YARN client does not need to refresh the token; the YARN resource manager can refresh it.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user EronWright commented on the issue: https://github.com/apache/flink/pull/3776
@Rucongzhang thanks for the contribution. I think I understand the problem and your solution, which I will recap. I also found YARN-2704 to be useful background.

*Problem*:
1. YARN log aggregation depends on an HDFS delegation token, which it obtains from container token storage, not from the UGI. In keytab mode, the Flink client doesn't upload any delegation tokens, causing log aggregation to fail.
2. The Flink cluster doesn't renew delegation tokens. Note: Flink does renew _Kerberos tickets_ using the keytab.
3. When the UGI contains both a delegation token and a Kerberos ticket, the delegation token is preferred. After expiration, Flink does not fall back to using the ticket.

*Solution*:
1. Change the Flink client to upload delegation tokens. Addresses problem 1.
2. Change the Flink cluster to filter out the HDFS delegation token from the tokens loaded from storage when populating the UGI. Addresses problem 3.
3. Change the JM to propagate its stored tokens to the TM, rather than the tokens from the UGI (which were filtered in (2)).
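Solution steps 2 and 3 in the recap above can be illustrated with a minimal, self-contained sketch. It is not the PR's code: the stored credentials are modeled as a plain alias-to-kind map rather than Hadoop's `Credentials` and `Token` classes, and the class and method names are hypothetical. The kind string `HDFS_DELEGATION_TOKEN` is the kind HDFS assigns to its delegation tokens; `SOME_OTHER_TOKEN` in the usage note is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the token handling discussed in this thread.
final class DelegationTokenFilterSketch {
    // Token kind used by HDFS delegation tokens.
    static final String HDFS_DELEGATION_KIND = "HDFS_DELEGATION_TOKEN";

    // Step 2: copy the stored tokens (alias -> kind) into the set used to
    // populate the UGI, leaving out HDFS delegation tokens so that the
    // keytab login is what authenticates against HDFS.
    static Map<String, String> tokensForUgi(Map<String, String> stored) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (!HDFS_DELEGATION_KIND.equals(e.getValue())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    // Step 3: the set handed on to TM containers is the unfiltered stored
    // set, so YARN still finds the HDFS delegation token there.
    static Map<String, String> tokensForTaskManager(Map<String, String> stored) {
        return new LinkedHashMap<>(stored);
    }
}
```

With a stored map containing one `HDFS_DELEGATION_TOKEN` entry and one `SOME_OTHER_TOKEN` entry, `tokensForUgi` keeps only the second entry while `tokensForTaskManager` keeps both, and the stored map itself is left unchanged.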
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776
@tzulitai, you are right. There are two problems in YARN cluster mode:
1. When we use the keytab, we do not set the HDFS delegation token in the YARN container context, but YARN needs it.
2. When we use the keytab and also get an HDFS delegation token, the UGI contains both, but the UGI uses the token first to communicate with HDFS. The default expiry time of the HDFS delegation token is 7 days, and Flink does not refresh the token.
So I resolved this problem with the following solution:
1. We use the keytab and also get the HDFS delegation token. The token is set in the YARN container context, and the UGI only uses the keytab.
Maybe the best solution, I think, is for the AM to refresh the token, as Spark does. Maybe we can create a FLIP to do this.
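The second problem in the comment above can be made concrete with a toy model. This is only a sketch of the behaviour as described in this thread, not Hadoop's implementation: the class, enum and method names are hypothetical, the 7-day lifetime is the default quoted in the comment, and the keytab login is assumed never to age out because Flink renews it.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of "token is preferred over keytab, and never refreshed".
final class TokenPreferenceSketch {
    // Default HDFS delegation token lifetime as quoted in the thread.
    static final Duration MAX_LIFETIME = Duration.ofDays(7);

    enum Auth { TOKEN, KERBEROS }

    // A delegation token in the UGI, if present, wins over the keytab login.
    static Auth choose(boolean ugiHasToken) {
        return ugiHasToken ? Auth.TOKEN : Auth.KERBEROS;
    }

    // True if an HDFS call made at 'now' would still authenticate.
    static boolean callSucceeds(boolean ugiHasToken, Instant tokenIssued, Instant now) {
        if (choose(ugiHasToken) == Auth.KERBEROS) {
            return true; // assumption: the keytab login is renewed, so it does not expire here
        }
        // No fallback to Kerberos: once the token is past its lifetime, the call fails.
        return now.isBefore(tokenIssued.plus(MAX_LIFETIME));
    }
}
```

In this model a call made 6 days after the token was issued succeeds, a call made 8 days after fails while the token is still in the UGI, and the same call succeeds once the token is kept out of the UGI, which is the effect of the proposed fix.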
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3776 Would like to follow up on this PR. @Rucongzhang can you confirm my understanding of the problem? So, the root cause of the issue is that when both token AND keytab are configured, we're incorrectly using the token for authentication?
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @StephanEwen, we resolved this problem. We only add the HDFS delegation token to the JM and TM YARN container contexts. When the keytab is configured, the JM and TM use the keytab to authenticate with HDFS.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @StephanEwen, yes, when a keytab is configured, the Hadoop code automatically renews delegation tokens. But when both a token and a keytab are available, Hadoop uses the token first, not the keytab.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3776 @Rucongzhang My understanding was that the Hadoop code should automatically renew delegation tokens when a Kerberos Keytab is present. @EronWright Can you comment on that assumption?
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 After resolving this problem, we found another one: when we configure the keytab and principal and add the HDFS delegation token, the JM and TM also use this token, not the keytab, when communicating with HDFS. When the token expires, nothing in Flink refreshes it. But the token was added only for use by the YARN node manager. We are now resolving this problem. Thanks!
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776
@StephanEwen, I can run the YARN IT case! Thanks very much. But in the YARN IT case:
1. It uses the YARN mini cluster, which is for testing. I do not know whether it uses an HDFS delegation token or not.
2. What's more, the HDFS delegation token is used by the YARN node manager. It is difficult to tell from the YARN client whether this token is present. The token is set into the YARN application context, but the YARN client does not have an API to get the YARN application context.
What do you think? Thanks a lot!
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @StephanEwen, do you mean the cause of this problem is the Hadoop version? Is version 2.7.2 OK? Thanks!
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3776 The Yarn tests cannot work with Hadoop 2.3.0, which is the default version of master.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776
@StephanEwen, when I run the YARNHighAvailabilityITCase, the following error occurs. I set the file /etc/hosts as follows:

    9.96.101.32 9-96-101-32
    127.0.0.1 localhost

The Hadoop version I use is the master default version, 2.7.0. Do you know how to fix the following error? Thanks a lot in advance!

Test testMultipleAMKill(org.apache.flink.yarn.YARNHighAvailabilityITCase) failed with:
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "9-96-101-32":8032; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1448)
    at org.apache.hadoop.ipc.Client.call(Client.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1359)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy76.getApplications(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy77.getApplications(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:285)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:262)
    at org.apache.flink.yarn.YarnTestBase.checkClusterEmpty(YarnTestBase.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at
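The exception in the trace above is raised while the IPC client is constructing its connection to `"9-96-101-32":8032`, so one plausible first check is whether that destination host name resolves from the JVM running the test. The sketch below uses only `java.net.InetSocketAddress` from the JDK; the class and method names are hypothetical, and it is a diagnostic aid under that assumption, not a confirmed explanation of this failure.

```java
import java.net.InetSocketAddress;

// Hypothetical diagnostic helper for the UnknownHostException above.
final class HostResolutionCheck {
    // InetSocketAddress attempts to resolve the host name when constructed
    // and marks the address as unresolved if the lookup fails.
    static boolean resolves(String host, int port) {
        return !new InetSocketAddress(host, port).isUnresolved();
    }
}
```

Calling `HostResolutionCheck.resolves("9-96-101-32", 8032)` on the test machine shows whether the `/etc/hosts` entry is actually picked up; if it returns `false`, the hosts file (for example, whether each entry sits on its own line) would be the next thing to inspect.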
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user Rucongzhang commented on the issue: https://github.com/apache/flink/pull/3776 @StephanEwen, yes, I agree with you! I will see how to add the YARN IT case.
[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3776 Looks like a good fix. I think it would be good to add secure yarn tests (IT Cases) that test this behavior. Otherwise it may soon be accidentally broken again...