GitHub user markgrover opened a pull request:
https://github.com/apache/spark/pull/17047
[SPARK-19720][SPARK SUBMIT] Redact sensitive information from SparkSubmit
console
## What changes were proposed in this pull request?
This change redacts sensitive information (based on the `spark.redaction.regex`
property) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs, YARN logs, etc.
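To illustrate the idea, here is a minimal Python sketch of key-based redaction. It is not the Scala code in this patch: the `redact` helper, the pattern value, and the sample secret are all assumptions made for illustration. Only the replacement text is taken from the console output shown in this pull request.

```python
import re

# Assumed, illustrative value for spark.redaction.regex; check your own
# configuration for the pattern actually in effect.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")

# Replacement text as it appears in the console output of this pull request.
REDACTED_TEXT = "*********(redacted)"


def redact(pattern, props):
    """Return (key, value) pairs, masking the value when the key matches pattern."""
    return [(k, REDACTED_TEXT if pattern.search(k) else v) for k, v in props]


# Hypothetical properties; "hunter2" is a made-up secret.
props = [
    ("spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.authenticate", "false"),
]

for key, value in redact(REDACTION_PATTERN, props):
    print(f"({key},{value})")
```

The sketch matches on the property key rather than the value, so a property is masked whenever its name looks sensitive, whatever it contains.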
## How was this patch tested?
Testing was done manually to make sure that the console logs were not printing
any sensitive information.
Here's some output from the console:
```
Spark properties used, including those specified through
--conf and those from the properties file
/etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
There is a risk that if new print statements are added to the console output
down the road, sensitive information may still leak, since no test asserts on
the console log output. I considered it out of scope for this JIRA to write an
integration test that guards against new leaks in the future.
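For reference, a check of that kind could look roughly like the Python sketch below. It is not part of this patch and does not exercise SparkSubmit. `print_properties` and `captured_console` are hypothetical stand-ins, and the pattern and secret are made-up values. The sketch only shows the shape of the assertion: capture what is printed and verify the raw secret never appears.

```python
import contextlib
import io
import re

# Replacement text as it appears in the console output of this pull request.
REDACTED_TEXT = "*********(redacted)"


def print_properties(props, pattern):
    """Hypothetical stand-in for console printing; masks values whose key matches."""
    for key, value in props:
        shown = REDACTED_TEXT if pattern.search(key) else value
        print(f"({key},{shown})")


def captured_console(props, pattern):
    """Run print_properties and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_properties(props, pattern)
    return buffer.getvalue()


secret = "hunter2"  # made-up secret used only for this check
output = captured_console(
    [
        ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", secret),
        ("spark.authenticate", "false"),
    ],
    re.compile(r"(?i)secret|password"),  # assumed, illustrative pattern
)

# The raw secret must never reach the console; harmless values still appear.
assert secret not in output
assert "(spark.authenticate,false)" in output
```

Asserting on the absence of a known secret, rather than on the exact output, would keep such a test stable if the console format changes.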
Unit tests are being run to make sure nothing else is broken by this change.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/markgrover/spark master_redaction
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17047.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17047
----
commit 000efb1e3152f837e01ce1f80ae108c596f9baa5
Author: Mark Grover <[email protected]>
Date: 2017-02-24T01:30:05Z
[SPARK-19720][SPARK SUBMIT] Redact sensitive information from SparkSubmit
console output
This change redacts sensitive information (based on spark.redaction.regex
property) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs and yarn logs, etc.
Testing was done manually to make sure that the console logs were not
printing any
sensitive information.
Here's some output from the console:
Spark properties used, including those specified through
--conf and those from the properties file
/etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
----