GitHub user markgrover opened a pull request:
https://github.com/apache/spark/pull/15971
[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI
## What changes were proposed in this pull request?
This patch adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.
This change uses this property to redact information from event logs and
YARN
logs. It also, updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.
Here's an image of the UI post-redaction:

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``
Here's the text in the event logs, post-redaction:
``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``
## How was this patch tested?
1. Unit tests are added to ensure that redaction works.
2. A YARN job reading data off of S3 with confidential information
(hadoop credential provider password) being provided in the environment
variables of driver and executor. And, afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensure that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to
read
the data.
3. The event logs were checked to make sure no mention of secret password
was
present.
4. UI environment tab was checked to make sure there was no secret
information
being displayed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/markgrover/spark master_redaction
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15971.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15971
----
commit 5dd3630d6a937ba8634054940543509d9186e68e
Author: Mark Grover <[email protected]>
Date: 2016-11-17T01:47:17Z
[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI
This commit adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.
This change uses this property to redact information from event logs and
YARN
logs. It also, updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.
For testing:
1. Unit tests are added to ensure that redaction works.
2. A YARN job reading data off of S3 with confidential information
(hadoop credential provider password) being provided in the environment
variables of driver and executor. And, afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensure that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to
read
the data.
3. The event logs were checked to make sure no mention of secret password
was
present.
4. UI environment tab was checked to make sure there was no secret
information
being displayed.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]