GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20742#discussion_r175170426
  
    --- Diff: docs/security.md ---
    @@ -3,47 +3,291 @@ layout: global
     displayTitle: Spark Security
     title: Security
     ---
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
     
    -Spark currently supports authentication via a shared secret. Authentication can be configured to be on via the `spark.authenticate` configuration parameter. This parameter controls whether the Spark communication protocols do authentication using the shared secret. This authentication is a basic handshake to make sure both sides have the same shared secret and are allowed to communicate. If the shared secret is not identical they will not be allowed to communicate. The shared secret is created as follows:
    +# Spark RPC
     
    -* For Spark on [YARN](running-on-yarn.html) and local deployments, configuring `spark.authenticate` to `true` will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
    -* For other types of Spark deployments, the Spark parameter `spark.authenticate.secret` should be configured on each of the nodes. This secret will be used by all the Master/Workers and applications.
    +## Authentication
     
    -## Web UI
    +Spark currently supports authentication for RPC channels using a shared secret. Authentication can
    +be turned on by setting the `spark.authenticate` configuration parameter.
     
    -The Spark UI can be secured by using [javax servlet filters](http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html) via the `spark.ui.filters` setting
    -and by using [https/SSL](http://en.wikipedia.org/wiki/HTTPS) via [SSL settings](security.html#ssl-configuration).
    +The exact mechanism used to generate and distribute the shared secret is deployment-specific.
     
    -### Authentication
    +For Spark on [YARN](running-on-yarn.html) and local deployments, Spark will automatically handle
    +generating and distributing the shared secret. Each application will use a unique shared secret. In
    +the case of YARN, this feature relies on YARN RPC encryption being enabled for the distribution of
    +secrets to be secure.
     
    -A user may want to secure the UI if it has data that other users should not be allowed to see. The javax servlet filter specified by the user can authenticate the user and then once the user is logged in, Spark can compare that user versus the view ACLs to make sure they are authorized to view the UI. The configs `spark.acls.enable`, `spark.ui.view.acls` and `spark.ui.view.acls.groups` control the behavior of the ACLs. Note that the user who started the application always has view access to the UI.  On YARN, the Spark UI uses the standard YARN web application proxy mechanism and will authenticate via any installed Hadoop filters.
    +For other resource managers, `spark.authenticate.secret` must be configured on each of the nodes.
    +This secret will be shared by all the daemons and applications, so this deployment configuration is
    +not as secure as the above, especially when considering multi-tenant clusters.
     
    -Spark also supports modify ACLs to control who has access to modify a running Spark application. This includes things like killing the application or a task. This is controlled by the configs `spark.acls.enable`, `spark.modify.acls` and `spark.modify.acls.groups`. Note that if you are authenticating the web UI, in order to use the kill button on the web UI it might be necessary to add the users in the modify acls to the view acls also. On YARN, the modify acls are passed in and control who has modify access via YARN interfaces.
    -Spark allows for a set of administrators to be specified in the acls who always have view and modify permissions to all the applications. is controlled by the configs `spark.admin.acls` and `spark.admin.acls.groups`. This is useful on a shared cluster where you might have administrators or support staff who help users debug applications.
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
    +<tr>
    +  <td><code>spark.authenticate</code></td>
    +  <td>false</td>
    +  <td>Whether Spark authenticates its internal connections.</td>
    +</tr>
    +<tr>
    +  <td><code>spark.authenticate.secret</code></td>
    +  <td>None</td>
    +  <td>
    +    The secret key used authentication. See above for when this configuration should be set.
    +  </td>
    +</tr>
    +</table>
     
    -## Event Logging
    +## Encryption
     
    -If your applications are using event logging, the directory where the event logs go (`spark.eventLog.dir`) should be manually created and have the proper permissions set on it. If you want those log files secured, the permissions should be set to `drwxrwxrwxt` for that directory. The owner of the directory should be the super user who is running the history server and the group permissions should be restricted to super user group. This will allow all users to write to the directory but will prevent unprivileged users from removing or renaming a file unless they own the file or directory. The event log files will be created by Spark with permissions such that only the user and group have read and write access.
    +Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC
    +authentication must also be enabled and properly configured. AES encryption uses the
    +[Apache Commons Crypto](http://commons.apache.org/proper/commons-crypto/) library, and Spark's
    +configuration system allows access to that library's configuration for advanced users.
    +
    +There is also support for SASL-based encryption, although it should be considered deprectated. It
    --- End diff --
    
    typo: deprecated

