Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
Ok, thank you for the very detailed explanation!

(In reply to Maximilian Michels, Sun, Mar 6, 2016 at 10:02 PM.)



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Maximilian Michels
Hi Stefano,

That is currently a limitation of the Kerberos implementation. Kerberos
authentication is performed only once, when the Flink cluster is brought
up, and the YARN session is then tied to the ticket of the user who
started it. Note that you need Hadoop 2.6.1 or higher to run long-running
jobs, because earlier versions have a bug in the Kerberos client that may
let the ticket expire.
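A quick pre-flight check for the Hadoop 2.6.1 requirement could look like the shell sketch below. It assumes GNU sort (for "sort -V" version ordering) and that the first line of "hadoop version" reads "Hadoop X.Y.Z" on the machine in question; version_at_least and hadoop_version are hypothetical helper names.

```shell
# Hypothetical helpers for checking the "Hadoop >= 2.6.1" requirement.

# Succeeds (exit status 0) if version $2 is at least the minimum version $1.
# Uses GNU "sort -V", which orders dotted version strings numerically.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Prints the version number from the first line of "hadoop version",
# assuming that line has the form "Hadoop X.Y.Z".
hadoop_version() {
  hadoop version | head -n 1 | awk '{ print $2 }'
}
```

Usage: version_at_least 2.6.1 "$(hadoop_version)" || echo "Hadoop is older than 2.6.1"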

The workaround you already mentioned is to use a per-job YARN cluster.
There is currently no plan to delegate the user token per job, but we
could certainly think about implementing this in the future.
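For reference, a per-job submission of the WordCount example from the first message in this thread might look like the shell sketch below: authenticate as the user whose data the job should access, then let the CLI bring up a dedicated YARN cluster for that one job. It mirrors the session parameters (2 containers, 4096 MB TaskManagers, 3 slots) using the "-m yarn-cluster" mode and the -yn/-ytm/-ys options of the 1.0 CLI, which are worth double-checking with "bin/flink run -h" for your version. The function name submit_per_job, the DRY_RUN switch and the principal are hypothetical.

```shell
# Hypothetical helper: authenticate as the given principal, then run WordCount
# on its own short-lived YARN cluster, so that the cluster is brought up with
# that principal's ticket. If DRY_RUN is set, commands are printed, not run.
submit_per_job() {
  principal=$1
  input=$2
  output=$3
  run=${DRY_RUN:+echo}

  # kinit prompts for a password (use "kinit -kt <keytab> <principal>" instead
  # when authenticating from a keytab).
  $run kinit "$principal" || return 1

  # -m yarn-cluster starts a dedicated YARN cluster for this single job.
  $run bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3 \
    examples/batch/WordCount.jar --input "$input" --output "$output"
}
```

Usage: DRY_RUN=1 submit_per_job alice@EXAMPLE.COM hdfs:///user/stefano.baghino/hamlet.txt hdfs:///user/stefano.baghino/hamlet.out prints the two commands without executing them.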

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#kerberos

Cheers,
Max

(In reply to Stefano Baghino, Sun, Mar 6, 2016 at 9:27 PM.)


Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
One last note: initially I stayed in the same OS user session, running
kdestroy and then kinit as the other user, and got this error. When I
instead run the job in a different OS session, authenticated with
Kerberos as the user who should run the job, I can't connect to the
JobManager at all. I've added a second log with this error to the gist.
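When switching identities with kdestroy and kinit, a small helper like the hypothetical one below can confirm which principal the local ticket cache now holds. It assumes MIT Kerberos klist output, which contains a line of the form "Default principal: user@REALM". This only shows the client-side identity; as Max explains in his reply in this thread, a running YARN session keeps the ticket it was started with.

```shell
# Hypothetical helper: read "klist" output on stdin and print the default
# principal of the ticket cache (MIT Kerberos output format assumed).
default_principal() {
  sed -n 's/^Default principal: //p'
}

# Typical use when switching users (not executed here):
#   kdestroy
#   kinit other.user@EXAMPLE.COM
#   klist | default_principal
```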

(In reply to Stefano Baghino, Sun, Mar 6, 2016 at 9:01 PM.)



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
In the initial description, I meant "I'm trying to access a private folder
of the latter", i.e. a folder of the other user, not of the service
account. Sorry for the mistake.

(In reply to Stefano Baghino, Sun, Mar 6, 2016 at 8:54 PM.)



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
Hello everybody,

I'm running some tests on how Flink, run as a long-running YARN session,
handles Kerberos security. In particular, I run Flink on YARN with a
service account and then deploy a job via the CLI as another user; in the
job I'm trying to access a private folder of the former on HDFS, but the
job fails due to permission issues (the job actually runs as the user who
started Flink on YARN in the first place, i.e. the service account).
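To see why the job is denied, one can inspect the owner and permission bits of the folder with "hdfs dfs -ls -d". The hypothetical helper below summarizes one line of that output, assuming the usual column layout (permissions, replication, owner, group, size, date, time, path) and a service account that is neither the owner, a member of the folder's group, nor the HDFS superuser, so that only the "others" bits apply to it.

```shell
# Hypothetical helper: summarize one line of "hdfs dfs -ls -d <dir>" output.
# Column layout assumed: permissions replication owner group size date time path.
describe_hdfs_dir() {
  awk '{
    other = substr($1, 8, 3)   # permission bits that apply to "others"
    # Listing a directory needs r; reaching the files inside it needs x.
    access = (other ~ /^r.x$/) ? "yes" : "no"
    printf "owner=%s group=%s others_can_read=%s\n", $3, $4, access
  }'
}
```

Usage: hdfs dfs -ls -d /user/stefano.baghino | describe_hdfs_dir. A private folder such as drwx------ reports others_can_read=no, which matches a permission error for any account outside the owner and group.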

I'm running Flink 1.0.0-RC5, launching the long-running session with:

bin/yarn-session.sh -n 2 -tm 4096 -s 3

and then running the following command:

bin/flink run examples/batch/WordCount.jar \
--input hdfs:///user/stefano.baghino/hamlet.txt \
--output hdfs:///user/stefano.baghino/hamlet.out

Here are the logs:
https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78

It looks like the YARN session is acting as a proxy for the user instead of
receiving a delegation. Is there a way to change this behavior? Is this by
design? Is there interest in implementing delegation (if it's not already
implemented)? Otherwise, is there a workaround, apart from running one-off
jobs on YARN?

Thank you so much in advance.

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit