[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-06-30 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@tzulitai, OK, thanks a lot!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-06-30 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3776
  
@Rucongzhang ok, understood. I agree that in general the current
`AbstractYarnClusterDescriptor` has poor separation of concerns, and it is a bit
hard to write contained tests for it. We should remember to address this when
refactoring it for FLIP-6.

I'll give this a test run on YARN and then merge it :)




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-06-20 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@tzulitai, when I fixed the problem, I wanted to write an IT case. But in the
YARN IT case:
1. It uses the YARN mini cluster, which is for testing; I do not know
whether it uses an HDFS delegation token or not.
2. Moreover, the HDFS delegation token is used by the YARN NodeManager.
It is difficult to check in the YARN client whether the token is present: the
token is set into the YARN application context, but the YARN client has no API
to get the application context.
What do you think? Thanks a lot!




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-06-20 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3776
  
@Rucongzhang @EronWright thanks for the explanations; the changes look
good to me then.
I'll rebase this, perform some tests, and then merge this if all goes well.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-06-08 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, @tzulitai, please review the code; if it is OK, please help
me merge the PR. Thanks!




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-27 Thread EronWright
Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/3776
  
+1




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-26 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@EronWright, thank you very much. Yes, you are right. But regarding solution
1: we only need to add the HDFS delegation token to the YARN container context;
the YARN client does not need to refresh the token, since the YARN
ResourceManager can refresh it.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-26 Thread EronWright
Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/3776
  
@Rucongzhang thanks for the contribution.  I think I understand the problem
and your solution, which I will recap.  I also found YARN-2704 to be useful
background.

*Problem*:
1. YARN log aggregation depends on an HDFS delegation token, which it
obtains from container token storage, not from the UGI.  In keytab mode, the
Flink client doesn't upload any delegation tokens, causing log aggregation to
fail.
2. The Flink cluster doesn't renew delegation tokens.  Note: Flink does
renew _Kerberos tickets_ using the keytab.
3. When the UGI contains both a delegation token and a Kerberos ticket, the
delegation token is preferred.  After expiration, Flink does not fall back to
using the ticket.

*Solution*:
1. Change the Flink client to upload delegation tokens.  Addresses problem 1.
2. Change the Flink cluster to filter out the HDFS delegation token from the
tokens loaded from storage when populating the UGI.  Addresses problem 3.
3. Change the JM to propagate its stored tokens to the TM, rather than the
tokens from the UGI (which were filtered out in (2)).
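
The filtering in step (2) of the solution can be sketched roughly as below. This
is a self-contained illustration only, not Flink's actual code: the
`SimpleToken` class and the `tokensForUgi` helper are hypothetical stand-ins for
Hadoop's `org.apache.hadoop.security.token.Token` and its
`HDFS_DELEGATION_TOKEN` kind.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal stand-in for a Hadoop security token; only the "kind" matters here. */
class SimpleToken {
    final String kind;
    final String service;
    SimpleToken(String kind, String service) { this.kind = kind; this.service = service; }
}

public class UgiTokenFilter {
    static final String HDFS_DELEGATION_KIND = "HDFS_DELEGATION_TOKEN";

    /**
     * Drop HDFS delegation tokens before populating the UGI, so that the
     * Kerberos keytab (which Flink can re-login with indefinitely) is used
     * for HDFS, while the full token set still goes to the NodeManager for
     * log aggregation.
     */
    static List<SimpleToken> tokensForUgi(List<SimpleToken> stored) {
        return stored.stream()
                .filter(t -> !HDFS_DELEGATION_KIND.equals(t.kind))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SimpleToken> stored = Arrays.asList(
                new SimpleToken("HDFS_DELEGATION_TOKEN", "namenode:8020"),
                new SimpleToken("YARN_AM_RM_TOKEN", "resourcemanager:8030"));
        // Only the non-HDFS token reaches the UGI.
        System.out.println(tokensForUgi(stored).size());
    }
}
```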




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-25 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@tzulitai, you are right. There are two problems in YARN cluster mode:
1. When we use the keytab, we do not set the HDFS delegation token into
the YARN container context, but YARN needs it.
2. When we use the keytab and also obtain an HDFS delegation token, the UGI
contains both, but the UGI uses the token first to communicate with HDFS. The
default expiry time of an HDFS delegation token is 7 days, and Flink does not
refresh the token.
So, I resolved this problem with the following solution:
1. We use the keytab and also obtain the HDFS delegation token. The token is
set into the YARN container context, and the UGI only uses the keytab.
The best solution, I think, is for the AM to refresh the token like Spark does.
Maybe we can create a FLIP to do this.
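
The AM-side renewal idea mentioned above (what Spark does) amounts to scheduling
a refresh well before the token expires. The fragment below is an illustrative
sketch only: the `nextRenewalDelayMs` helper, the 75% factor, and the one-minute
floor are assumptions for the example, not Flink or Spark code.

```java
public class RenewalSchedule {
    /**
     * Compute how long to wait before refreshing a delegation token:
     * 75% of its remaining lifetime, with a one-minute floor so a
     * renewal loop never spins on an almost-expired token.
     */
    static long nextRenewalDelayMs(long expiryTimeMs, long nowMs) {
        long remainingMs = expiryTimeMs - nowMs;
        return Math.max(60_000L, (long) (remainingMs * 0.75));
    }

    public static void main(String[] args) {
        // Default HDFS delegation token lifetime is 7 days.
        long sevenDaysMs = 7L * 24 * 3600 * 1000;
        // Checked right after issue, the refresh lands at 75% of 7 days.
        System.out.println(nextRenewalDelayMs(sevenDaysMs, 0L)); // 453600000
    }
}
```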




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-24 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3776
  
Would like to follow up on this PR.

@Rucongzhang can you confirm my understanding of the problem:
the root cause of the issue is that when both a token AND a keytab are
configured, we're incorrectly using the token for authentication?




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-19 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, we resolved this problem. We only add the HDFS delegation
token to the JM and TM YARN container contexts, and when the keytab is
configured, the JM and TM use the keytab to authenticate with HDFS.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-08 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, yes, when a keytab is configured, the Hadoop code
automatically renews delegation tokens. But when both a token and a keytab are
available, Hadoop uses the token first, not the keytab.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-08 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3776
  
@Rucongzhang My understanding was that the Hadoop code should automatically 
renew delegation tokens when a Kerberos Keytab is present. @EronWright Can you 
comment on that assumption?




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-07 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
After resolving this problem, we found another problem: when we configure
the keytab and principal and add the HDFS delegation token, the JM and TM also
use this token, not the keytab, when communicating with HDFS. When the token
expires, nothing in Flink refreshes it.
But the purpose of adding this token is only for the YARN NodeManager. We
are now resolving this problem. Thanks!




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-04 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, I can run the YARN IT case! Thanks very much. But in the
YARN IT case:
1. It uses the YARN mini cluster, which is for testing; I do not know
whether it uses an HDFS delegation token or not.
2. Moreover, the HDFS delegation token is used by the YARN NodeManager.
It is difficult to check in the YARN client whether the token is present: the
token is set into the YARN application context, but the YARN client has no API
to get the application context.
What do you think? Thanks a lot!




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-03 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, you mean the reason for this problem is the Hadoop version?
Is version 2.7.2 OK? Thanks!




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-03 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3776
  
The YARN tests cannot run with Hadoop 2.3.0, which is the default version
on master.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-05-03 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, when I run YARNHighAvailabilityITCase, the following error
occurs. I set the file /etc/hosts as follows:
9.96.101.32 9-96-101-32
127.0.0.1 localhost

The Hadoop version I use is the master default version, 2.7.0. Do you know how
to fix the following error? Thanks a lot in advance!

Test testMultipleAMKill(org.apache.flink.yarn.YARNHighAvailabilityITCase) failed with:
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "9-96-101-32":8032; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1448)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy76.getApplications(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy77.getApplications(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:285)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:262)
at org.apache.flink.yarn.YarnTestBase.checkClusterEmpty(YarnTestBase.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at

[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-04-28 Thread Rucongzhang
Github user Rucongzhang commented on the issue:

https://github.com/apache/flink/pull/3776
  
@StephanEwen, yes, I agree with you! I will see how to add the YARN IT
case.




[GitHub] flink issue #3776: [FLINK-6376]when deploy flink cluster on the yarn, it is ...

2017-04-26 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3776
  
Looks like a good fix.

I think it would be good to add secure yarn tests (IT Cases) that test this 
behavior. Otherwise it may soon be accidentally broken again...

