Hi Stefano,

That is currently a limitation of the Kerberos implementation. The
Kerberos authentication is performed only once, when the Flink cluster
is brought up. The Yarn session is then tied to that particular user's
ticket. Note that you need Hadoop version 2.6.1 or higher to run
long-running jobs, because earlier versions have a bug in the Kerberos
client that may let the ticket expire.

The workaround you already mentioned is to use a per-job Yarn cluster.
There is currently no plan to delegate the user token per job, but we
could certainly think about implementing this in the future.
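
For illustration, a per-job submission might look roughly like the
sketch below. It assumes the job is submitted by the user who should
own it, after obtaining that user's Kerberos ticket. The principal name
is hypothetical, and the container settings simply mirror the
yarn-session.sh options from your message.

```shell
# Assumption: authenticate as the user whose HDFS permissions the job needs.
# The principal name is hypothetical; use your own.
kinit stefano.baghino

# Start a dedicated Yarn cluster for this single job instead of submitting
# to the long-running session (-yn containers, -ytm TaskManager memory in MB,
# -ys slots per TaskManager, mirroring -n 2 -tm 4096 -s 3).
bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3 \
  examples/batch/WordCount.jar \
  --input hdfs:///user/stefano.baghino/hamlet.txt \
  --output hdfs:///user/stefano.baghino/hamlet.out
```

Since the cluster is brought up by the submitting user, the Kerberos
authentication should then happen with that user's ticket rather than
the service account's.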

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#kerberos

Cheers,
Max

On Sun, Mar 6, 2016 at 9:27 PM, Stefano Baghino
<stefano.bagh...@radicalbit.io> wrote:
> One last note: initially I tried to run the session as the same OS user,
> running kdestroy and then kinit with the other user, having this error.
> Trying to run the job in a different OS session, authenticating with
> Kerberos as the user who should run the job, I can't connect to the
> JobManager. I've added a second log with this error to the gist.
>
> On Sun, Mar 6, 2016 at 9:01 PM, Stefano Baghino
> <stefano.bagh...@radicalbit.io> wrote:
>>
>> In the initial description, I meant "I'm trying to access a private folder
>> of the latter", so not the service account. Sorry for the mistake.
>>
>> On Sun, Mar 6, 2016 at 8:54 PM, Stefano Baghino
>> <stefano.bagh...@radicalbit.io> wrote:
>>>
>>> Hello everybody,
>>>
>>> I'm running some tests on how Flink as a long-running YARN session
>>> handles security with Kerberos. In particular, I'm running a test where I
>>> run Flink on YARN with a service account and then deploy a job via CLI as
>>> another user; in the job I'm trying to access a private folder of the former
>>> on HDFS but the job fails due to permission issues (the user running the job
>>> is actually the one who ran Flink on YARN in the first place — the service
>>> account).
>>>
>>> I'm running Flink 1.0.0-RC5, launching the long-running session with:
>>>
>>> bin/yarn-session.sh -n 2 -tm 4096 -s 3
>>>
>>> and then running the following command:
>>>
>>> bin/flink run examples/batch/WordCount.jar \
>>> --input hdfs:///user/stefano.baghino/hamlet.txt \
>>> --output hdfs:///user/stefano.baghino/hamlet.out
>>>
>>> Here are the logs:
>>> https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78
>>>
>>> It looks like the YARN session is acting as a proxy for the user instead
>>> of receiving a delegation. Is there a way to change this behavior? Is this
>>> by design? Is there an interest in implementing the delegation (if it's not
>>> already implemented)? Otherwise, is there a workaround, apart from running
>>> one-off jobs on YARN?
>>>
>>> Thank you so much in advance.
>>>
>>> --
>>> BR,
>>> Stefano Baghino
>>>
>>> Software Engineer @ Radicalbit
>>
>>
>>
>>
>> --
>> BR,
>> Stefano Baghino
>>
>> Software Engineer @ Radicalbit
>
>
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
