Re: Security Questions

Marcelo Vanzin Mon, 13 Aug 2018 10:50:44 -0700

Hi Harun,

It's been a while since I've actually run Livy, even though I wrote a
lot of the security-related code. So let me try to clarify a couple of
your questions.

On Thu, Aug 2, 2018 at 1:30 AM, Harun Zengin <[email protected]> wrote:
> My question would then be, how secure is livy? Users can inject custom
> code to run on livy, but this gives them the ability to access the
> filesystem on the host the livy server resides in.

That's correct if the session is running as the same user and on the
same host as the server. And that's the reason why the default
deployment mode for Livy sessions is "yarn cluster" mode.

That means the Livy session will be started elsewhere on the YARN
cluster. When security is also enabled in YARN, that session will be
started as the requesting user, and not as the Livy server user, so
you even get isolation between sessions.
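
As a rough sketch of what that looks like on the server side, the
fragment below sets the master and deploy mode in livy.conf. The key
names are the ones I recall from livy.conf.template; treat them as an
assumption and check them against the template shipped with your Livy
version.

```
# livy.conf (sketch; verify key names against your livy.conf.template)

# Submit sessions to YARN instead of running them on the Livy host.
livy.spark.master = yarn

# Run the Spark driver inside the cluster, not in the Livy server's JVM host.
livy.spark.deploy-mode = cluster
```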

I don't think it's possible to secure any deployment that runs Spark
in client mode, exactly because of the limitations with local OS users
that you mention.

> And in the case of using HDFS with active directory to secure the
> datasystem, so that users need to specify a kerberos key to access their
> files, how could I manage multiple principals in one server, to get this
> working?

In a proper secure Livy deployment you'd enable proxy user support;
that means Livy would be starting the session on YARN as the
requesting user, not the Livy server user. So both the OS processes
and the kerberos credentials would identify the user, not Livy, in the
Spark processes, and those would have no access to any Livy-owned data
(or data owned by other users, for that matter).
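
To make the proxy user idea concrete, here is a minimal sketch of a
client building a session-creation request for Livy's REST API with the
"proxyUser" field set. The host name, port and user name are made up
for illustration, and the request is only constructed, not sent. It
also assumes impersonation is enabled on the server (I believe the key
is livy.impersonation.enabled) and that Hadoop allows the Livy user to
act as a proxy user; in a Kerberized setup the client would additionally
have to authenticate to Livy (typically via SPNEGO), which is not shown.

```python
import json
from urllib.request import Request


def build_session_request(livy_url, proxy_user, kind="pyspark"):
    """Build (but do not send) a POST /sessions request for Livy.

    livy_url and proxy_user are illustrative inputs; "proxyUser" asks
    Livy to start the session as that user instead of the server user.
    """
    body = {"kind": kind, "proxyUser": proxy_user}
    return Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and user name.
req = build_session_request("http://livy.example.com:8998", "alice")
payload = json.loads(req.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would submit it;
whether Livy honors the requested user depends on the server and Hadoop
configuration described above.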

It's been a while, so there may still be some gaps in all this, but I
believe at least the core functionality is there to provide basic
security for sessions.

-- 
Marcelo
