Hello everybody,

I'm running some tests on how Flink, deployed as a long-running YARN
session, handles security with Kerberos. In particular, I start Flink on
YARN with a service account and then deploy a job via the CLI as another
user. In the job I try to access a private HDFS folder belonging to the
submitting user, but the job fails due to permission issues: the user
actually running the job is the one who started Flink on YARN in the first
place, i.e. the service account.

I'm running Flink 1.0.0-RC5, launching the long-running session with:

bin/yarn-session.sh -n 2 -tm 4096 -s 3

and then running the following command:

bin/flink run examples/batch/WordCount.jar \
--input hdfs:///user/stefano.baghino/hamlet.txt \
--output hdfs:///user/stefano.baghino/hamlet.out

Here are the logs:
https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78

It looks like the YARN session is acting as a proxy for the user instead of
receiving a delegation. Is there a way to change this behavior? Is this by
design? Is there any interest in implementing delegation, if it's not
already implemented? Otherwise, is there a workaround, apart from running
one-off jobs on YARN?

Thank you so much in advance.

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
