[ https://issues.apache.org/jira/browse/SPARK-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685225#comment-15685225 ]
Mark Grover edited comment on SPARK-18535 at 11/22/16 12:36 AM:
----------------------------------------------------------------

I just issued a PR for this that adds a new customizable property for determining which configuration properties are sensitive.

Attached is an image from the UI with this change.

Here's the text in the YARN logs with this change:
{{HADOOP_CREDSTORE_PASSWORD -> *********(redacted)}}

Here's the text in the event logs with this change:
{code}
...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...
{code}

> Redact sensitive information from Spark logs and UI
> ---------------------------------------------------
>
>                 Key: SPARK-18535
>                 URL: https://issues.apache.org/jira/browse/SPARK-18535
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI, YARN
>    Affects Versions: 2.1.0
>            Reporter: Mark Grover
>         Attachments: redacted.png
>
>
> A Spark user may have to provide sensitive information for a Spark
> configuration property, or source an environment variable containing
> sensitive information into the executor or driver environment. A good
> example of this is reading/writing data from/to S3 using Spark. The S3
> access key and secret key can be placed in a
> [Hadoop credential provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> However, one still needs to provide the password for the credential provider
> to Spark, which is typically supplied as an environment variable in the
> driver and executor environments. This environment variable shows up in
> logs, and may also show up in the UI.
> 1. In logs, it shows up in a few places:
> 1A. Event logs, under the {{SparkListenerEnvironmentUpdate}} event.
> 1B. YARN logs, when printing the executor launch context.
> 2. In the UI, it shows up in the _Environment_ tab. It is redacted there
> only if its name contains the words "password" or "secret", and these magic
> words are
> [hardcoded|https://github.com/apache/spark/blob/a2d464770cd183daa7d727bf377bde9c21e29e6a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala#L30]
> and hence not customizable.
> This JIRA is to track the work to make sure sensitive information is redacted
> from all logs and UIs in Spark, while still being passed on to every place
> that needs it.
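To make the requested behaviour concrete, here is a minimal Python sketch of key-based redaction driven by a customizable regex instead of hardcoded words. It is not Spark's implementation: the names `redact`, `format_env_for_log` and `DEFAULT_SENSITIVE_KEY_PATTERN`, and the default pattern itself, are hypothetical. Only the placeholder text `*********(redacted)` and the property names are taken from the output quoted in the comment.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Placeholder text, copied from the redacted log output quoted in this issue.
REDACTED = "*********(redacted)"

# Hypothetical default: reproduces the hardcoded "password"/"secret" name check,
# but as a user-overridable, case-insensitive regex instead of fixed words.
DEFAULT_SENSITIVE_KEY_PATTERN = r"(?i)secret|password"


def redact(pairs: Iterable[Tuple[str, str]],
           pattern: str = DEFAULT_SENSITIVE_KEY_PATTERN) -> Dict[str, str]:
    """Return a copy of the key/value pairs with sensitive values masked.

    A key counts as sensitive if the regex matches anywhere in it. Only the
    returned copy (used for logs and UI) is masked; the caller still passes
    the original values on to the driver and executors.
    """
    regex = re.compile(pattern)
    return {key: (REDACTED if regex.search(key) else value)
            for key, value in pairs}


def format_env_for_log(env: Dict[str, str],
                       pattern: str = DEFAULT_SENSITIVE_KEY_PATTERN) -> List[str]:
    """Render an environment map as 'KEY -> value' lines with secrets masked."""
    return [f"{key} -> {value}"
            for key, value in redact(env.items(), pattern).items()]


if __name__ == "__main__":
    conf = {
        "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD": "s3cr3t",
        "spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD": "s3cr3t",
        "spark.app.name": "etl",
    }
    # Both credential-store passwords are masked; spark.app.name is kept.
    print(redact(conf.items()))
    print(format_env_for_log({"HADOOP_CREDSTORE_PASSWORD": "s3cr3t"}))
    # prints ['HADOOP_CREDSTORE_PASSWORD -> *********(redacted)']
```

Matching on the key rather than the value means nothing needs to know what a secret looks like, and because the pattern is a parameter, a deployment could widen it (for example `r"(?i)secret|password|token"`) without a code change, which is the customizability the hardcoded check lacks.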