GitHub user vanzin commented on the issue:
https://github.com/apache/spark/pull/20891
@mgaido91 I really think it's wrong to try to draw a parallel to something
like Oracle. Oracle is completely unlike Spark - it's a self-contained system
where you don't have any outside visibility except through what Oracle gives
you. Spark relies on a bunch of other systems to do things like run processes
on a cluster, store data, etc. And the things you're trying to hide here are
all visible in those different layers.
Even with Oracle, you could check whether people are running certain tools
on client machines and say "hey, user foo is connecting to Oracle". You may not
know which DB they're connecting to, and you definitely won't know what it is
that they're doing. But you also don't know that with Spark.
To go through your examples:
- User names *are not sensitive information*. You can see them in
/etc/passwd. You can see them by listing files on your filesystem, *even if
you don't have read permission on the files themselves*, or by reading the ACLs
for those files. If you want two companies to not see each other, you deploy
different clusters (or, in this case, different Spark History Servers (SHS)
reading from different event log directories, with different authentication
for each).
- The app name is arguable. But it's always been public in Spark, so people
shouldn't be using it for anything sensitive. If they are, well, they already
have a security problem right there, today, and your patch won't fix it, since
that data has already leaked. And you'd better hope that the app name was not
set on any command line, since command lines are visible to anyone who can log
into the same machine.
- Who's using the cluster. Again, not sensitive information.
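
To illustrate the first point, here is a minimal sketch (not part of the patch under discussion) of how a file's owner can be read from metadata alone on a POSIX system. The helper names `owner_of` and `demo` are hypothetical, and the sketch assumes the caller can traverse the containing directory; it never opens the file for reading.

```python
import os
import pwd
import tempfile


def owner_of(path):
    """Return the owning user name of `path` using only stat() metadata."""
    uid = os.stat(path).st_uid  # needs search permission on the directory, not read on the file
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # uid has no passwd entry; fall back to the number


def demo():
    """Create a file, strip every permission bit, and still report its owner."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "secret.dat")
        with open(path, "w") as f:
            f.write("private")
        os.chmod(path, 0o000)  # nobody may read or write the contents
        return owner_of(path)
```

Calling `demo()` returns the name of the user who created the file even though the file's mode is `000`, which is the sense in which user names are already visible through the filesystem layer.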
If you want to draw a parallel to something like Oracle, you should be
looking at the Thrift server. That one is supposed to be a multi-user service
that shouldn't leak information to users other than the one that submitted a
specific job. I have no idea whether that is the case today, but if it's not,
it would be a completely different change from what you have here.
If you still think this is important, at the very least this needs to be
opt-in. But I'm still very skeptical about the need for this at all.