Thank you for looking into the problem, Niels. Let us know if you need anything. We would be happy to merge a pull request once you have verified the fix.
On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <ni...@basjes.nl> wrote:

> I created https://issues.apache.org/jira/browse/FLINK-2977
>
> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
>> Hi Niels,
>> thank you for analyzing the issue so thoroughly. I agree with you. It seems
>> that HDFS and HBase are using their own tokens, which we need to transfer
>> from the client to the YARN containers. We should be able to port the fix
>> from Spark (which they got from Storm) into our YARN client.
>> I think we would add this in org.apache.flink.yarn.Utils#setTokensFor().
>>
>> Do you want to implement and verify the fix yourself? If you are too busy
>> at the moment, we can also discuss how we share the work (I implement it,
>> you test the fix).
>>
>> Robert
>>
>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> An update on the status so far: I suspect I have found a problem in a
>>> secure setup.
>>>
>>> I have created a very simple Flink topology consisting of a streaming
>>> source (that outputs the timestamp a few times per second) and a sink
>>> (that writes that timestamp into a single record in HBase).
>>> Running this on a non-secure YARN cluster works fine.
>>>
>>> To run it on a secured YARN cluster, my main routine now looks like this:
>>>
>>> public static void main(String[] args) throws Exception {
>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>     UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net",
>>>         "/home/nbasjes/.krb/nbasjes.keytab");
>>>
>>>     final StreamExecutionEnvironment env =
>>>         StreamExecutionEnvironment.getExecutionEnvironment();
>>>     env.setParallelism(1);
>>>
>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>     stream.addSink(new SetHBaseRowSink());
>>>     env.execute("Long running Flink application");
>>> }
>>>
>>> When I run this:
>>>
>>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>
>>> I see after the startup messages:
>>>
>>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>>     - Login successful for user nbas...@xxxxxx.net using keytab
>>>       file /home/nbasjes/.krb/nbasjes.keytab
>>> 11/03/2015 17:13:25  Job execution switched to status RUNNING.
>>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>
>>> Which looks good.
>>>
>>> However ... no data goes into HBase.
>>> After some digging I found this error in the task manager's log:
>>>
>>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient
>>>     - Exception encountered while connecting to the server :
>>>       javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>       GSSException: No valid credentials provided (Mechanism level: Failed to
>>>       find any Kerberos tgt)]
>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient
>>>     - SASL authentication failed. The most likely cause is missing or
>>>       invalid credentials. Consider 'kinit'.
>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>> GSSException: No valid credentials provided (Mechanism level: Failed to
>>> find any Kerberos tgt)]
>>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>     at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>
>>> First starting a yarn-session and then loading my job gives the same error.
>>>
>>> My best guess at this point is that Flink needs the same fix as described here:
>>> https://issues.apache.org/jira/browse/SPARK-6918
>>> (https://github.com/apache/spark/pull/5586)
>>>
>>> What do you guys think?
>>>
>>> Niels Basjes
>>>
>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <m...@apache.org> wrote:
>>>
>>>> Hi Niels,
>>>>
>>>> You're welcome. Some more information on how this would be configured:
>>>>
>>>> In kdc.conf, there are two variables:
>>>>
>>>> max_life = 2h 0m 0s
>>>> max_renewable_life = 7d 0h 0m 0s
>>>>
>>>> max_life is the maximum lifetime of the current ticket. However, it may be
>>>> renewed up to a time span of max_renewable_life from the first ticket
>>>> issue on. This means that from the first ticket issue, new tickets may be
>>>> requested for one week. Each renewed ticket has a lifetime of max_life
>>>> (2 hours in this case).
>>>>
>>>> Please let us know about any difficulties with long-running streaming
>>>> applications and Kerberos.
>>>>
>>>> Best regards,
>>>> Max
>>>>
>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for your feedback.
>>>>> So I guess I'll have to talk to the security guys about having special
>>>>> Kerberos ticket expiry times for these types of jobs.
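The max_life and max_renewable_life values Max quotes put a hard ceiling on how long a job can keep authenticating. A minimal, self-contained Java sketch of that arithmetic (purely illustrative, using the example values from the kdc.conf excerpt; this is not Flink or Hadoop code):

```java
import java.time.Duration;

public class TicketLifetime {
    public static void main(String[] args) {
        // Example values from the kdc.conf excerpt above.
        Duration maxLife = Duration.ofHours(2);      // max_life = 2h 0m 0s
        Duration maxRenewable = Duration.ofDays(7);  // max_renewable_life = 7d 0h 0m 0s

        // Each renewal grants a fresh ticket of length max_life, but no ticket
        // may extend past (first issue + max_renewable_life). So the job must
        // renew roughly every max_life, and stops being renewable entirely
        // after max_renewable_life.
        long renewals = maxRenewable.toHours() / maxLife.toHours();
        System.out.println("Renewals before hard expiry: " + renewals);            // prints 84
        System.out.println("Absolute ceiling in hours:   " + maxRenewable.toHours()); // prints 168
    }
}
```

With these defaults a "never stop it" streaming job is guaranteed to lose its credentials after one week, which is exactly why special expiry times (or delegation tokens) are needed.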
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org> wrote:
>>>>>
>>>>>> Hi Niels,
>>>>>>
>>>>>> Thank you for your question. Flink relies entirely on the Kerberos
>>>>>> support of Hadoop. So your question could also be rephrased as "Does
>>>>>> Hadoop support long-term authentication using Kerberos?". And the
>>>>>> answer is: yes!
>>>>>>
>>>>>> While Hadoop uses Kerberos tickets to authenticate users with services
>>>>>> initially, the authentication process continues differently
>>>>>> afterwards. Instead of saving the ticket to authenticate on a later
>>>>>> access, Hadoop creates its own security tokens (DelegationToken) that
>>>>>> it passes around. These are authenticated against Kerberos
>>>>>> periodically. To my knowledge, the tokens have a life span identical
>>>>>> to the Kerberos ticket's maximum life span. So be sure to set the
>>>>>> maximum life span very high for long streaming jobs. The renewal time,
>>>>>> on the other hand, is not important, because Hadoop abstracts this
>>>>>> away using its own security tokens.
>>>>>>
>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on YARN, then
>>>>>> it is sufficient to authenticate the client with Kerberos. On a Flink
>>>>>> standalone cluster you need to ensure that, initially, all nodes are
>>>>>> authenticated with Kerberos using the kinit tool.
>>>>>>
>>>>>> Feel free to ask if you have more questions, and let us know about any
>>>>>> difficulties.
>>>>>>
>>>>>> Best regards,
>>>>>> Max
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I want to write a long-running (i.e. never stop it) streaming Flink
>>>>>> > application on a Kerberos-secured Hadoop/YARN cluster. My application
>>>>>> > needs to do things with files on HDFS and HBase tables on that
>>>>>> > cluster, so having the correct Kerberos tickets is very important.
>>>>>> > The stream is to be ingested from Kafka.
>>>>>> >
>>>>>> > One of the things with Kerberos is that the tickets expire after a
>>>>>> > predetermined time. My knowledge about Kerberos is very limited, so
>>>>>> > I hope you guys can help me.
>>>>>> >
>>>>>> > My question is actually quite simple: Is there a how-to somewhere on
>>>>>> > how to correctly run a long-running Flink application with Kerberos
>>>>>> > that includes a solution for the Kerberos ticket timeout?
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Niels Basjes
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
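For context on the fix discussed in this thread (porting Spark's approach into org.apache.flink.yarn.Utils#setTokensFor()): the idea is that the client, while it still holds a valid Kerberos TGT, obtains HDFS and HBase delegation tokens and ships them to the YARN containers, which then authenticate with those tokens instead of needing their own TGT. The sketch below models only the shape of that hand-off using plain Java collections; the class and method names are invented stand-ins, and a real implementation would use Hadoop's Credentials and Token classes as in the Spark pull request linked above.

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-in for Hadoop's Credentials: a named bag of opaque token bytes.
class CredentialsSketch {
    private final Map<String, byte[]> tokens = new HashMap<>();

    void addToken(String service, byte[] token) { tokens.put(service, token); }
    byte[] getToken(String service) { return tokens.get(service); }
    int numberOfTokens() { return tokens.size(); }
}

public class SetTokensForSketch {

    // Client side: while the Kerberos TGT is still valid, collect one
    // delegation token per secured service the job will touch, and attach
    // the bag to the container launch context (here: simply return it).
    static CredentialsSketch collectTokensOnClient() {
        CredentialsSketch creds = new CredentialsSketch();
        // In the real fix these bytes would come from the HDFS NameNode and
        // the HBase master, requested by the Kerberos-authenticated client.
        creds.addToken("hdfs", "hdfs-delegation-token".getBytes());
        creds.addToken("hbase", "hbase-delegation-token".getBytes());
        return creds;
    }

    public static void main(String[] args) {
        // Container side: the TaskManager reads the shipped tokens and uses
        // them for RPC authentication, so it never needs a Kerberos TGT of
        // its own -- which is exactly what the failing HBase sink lacked.
        CredentialsSketch shipped = collectTokensOnClient();
        System.out.println("Tokens shipped to container: " + shipped.numberOfTokens()); // prints 2
    }
}
```

This also explains the symptom above: HDFS access worked because Flink's YARN client already handled HDFS tokens for its own file staging, while HBase RPC failed in the container with "Failed to find any Kerberos tgt" because no HBase token was shipped.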