Hi Harun,

It's been a while since I've actually run Livy, even though I wrote a lot of the security-related code. So let me try to clarify a couple of your questions.

On Thu, Aug 2, 2018 at 1:30 AM, Harun Zengin <[email protected]> wrote:
> My question would then be, how secure is livy? Users can inject custom
> code to run on livy, but this gives them the ability to access the
> filesystem on the host the livy server resides in.

That's correct if the session is running as the same user and on the same host as the server. And that's the reason why the default deployment mode for Livy sessions is "yarn cluster" mode. That means the Livy session will be started elsewhere on the YARN cluster. When security is also enabled in YARN, that session will be started as the requesting user, and not as the Livy server user, so this setup also gives you isolation between sessions.

I don't think it's possible to secure any deployment that runs Spark in client mode, exactly because of the limitations with local OS users that you mention.

> And in the case of using HDFS with active directory to secure the
> datasystem, so that users need to specify a kerberos key to access their
> files, how could I manage multiple principals in one server, to get this
> working?

In a proper secure Livy deployment you'd enable proxy user support; that means Livy would be starting the session on YARN as the requesting user, not the Livy server user. So in the Spark processes, both the OS user and the Kerberos credentials would identify the requesting user, not Livy, and those processes would have no access to any Livy-owned data (or data owned by other users, for that matter).

It's been a while, so there may still be some gaps in all this, but I believe at least the basic functionality for securing sessions is there.

--
Marcelo
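
P.S. In case it helps, here is a rough sketch of what a proxy-user session request could look like from the client side. It only builds the JSON body for Livy's POST /sessions call with the `proxyUser` field set; it does not contact a server. The user name "harun" and the helper name `build_session_request` are made up for illustration. The server-side settings named in the comments are written from memory, so please check them against the documentation for your Livy and Hadoop versions.

```python
import json

# Assumed server-side setup (from memory, verify for your versions):
#   livy.conf:      livy.spark.master = yarn
#                   livy.spark.deploy-mode = cluster
#                   livy.impersonation.enabled = true
#   core-site.xml:  hadoop.proxyuser.<livy-user>.hosts / .groups
#                   so that Hadoop lets the Livy user impersonate others.


def build_session_request(requesting_user, kind="pyspark"):
    """Build the JSON body for POST /sessions (hypothetical helper).

    `proxyUser` asks Livy to start the session as `requesting_user`
    rather than as the Livy server user.
    """
    if not requesting_user:
        raise ValueError("requesting_user must be non-empty")
    return {"kind": kind, "proxyUser": requesting_user}


# Serialize the body that a client would send to the Livy server.
body = json.dumps(build_session_request("harun"), sort_keys=True)
print(body)
```

On a Kerberized cluster the client would also have to authenticate to Livy itself (typically via SPNEGO), which this sketch leaves out.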
