[GitHub] spark pull request #18873: [BRANCH-2.1][BACKPORT] Fixing python 2.6 tests fo...
Github user dmvieira closed the pull request at: https://github.com/apache/spark/pull/18873 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18873: [BRANCH-2.1][BACKPORT] Fixing python 2.6 tests for jenki...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18873 Closing here. Thank you @felixcheung, @srowen, @vanzin and @HyukjinKwon.
[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18873 Hey guys, I only opened this PR because I spent a lot of time trying to fix the Jenkins tests in my last PR, when the error was actually in the test script under Python 2.6. I can close it, but I know that more people will spend more time trying to fix it again. OK, @felixcheung.
[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18873 It doesn't make sense. If you look at the failed builds, they are all failing because the test script doesn't support Python 2.6. I think your CI is running Python 2.6 on some machines and Python 2.7 or higher on others. Are you sure you want to close this PR?
[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18802 Thank you @vanzin
[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...
Github user dmvieira closed the pull request at: https://github.com/apache/spark/pull/18802
[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18873 Removed the code related to the other PR.
[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18802 I removed the test fixes from this branch and opened another PR for them: https://github.com/apache/spark/pull/18873
[GitHub] spark pull request #18873: Fixing python 2.6 tests for jenkings
GitHub user dmvieira opened a pull request: https://github.com/apache/spark/pull/18873 Fixing python 2.6 tests for jenkings ## What changes were proposed in this pull request? I was working on PR https://github.com/apache/spark/pull/18802 and the tests always failed. This PR fixes the Jenkins tests that were failing with Python 2.6. It contains some backports for Python 2.6. ## How was this patch tested? Tests pass on Jenkins. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dmvieira/spark fix-python-2.6-tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18873.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18873 commit 6905976d5fedd7e7dc9e6b578a8bbadfa675fd63 Author: Mark Grover <m...@apache.org> Date: 2016-11-28T16:59:47Z [SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI ## What changes were proposed in this pull request? This patch adds a new property called `spark.secret.redactionPattern` that allows users to specify a Scala regex to decide which Spark configuration properties and environment variables in driver and executor environments contain sensitive information. When this regex matches the property or environment variable name, its value is redacted from the environment UI and various logs like YARN and event logs. This change uses this property to redact information from event logs and YARN logs. It also updates the UI code to adhere to this property instead of hardcoding the logic to decipher which properties are sensitive.
Here's an image of the UI post-redaction: ![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png) Here's the text in the YARN logs, post-redaction: ``HADOOP_CREDSTORE_PASSWORD -> *(redacted)`` Here's the text in the event logs, post-redaction: ``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)",...`` ## How was this patch tested? 1. Unit tests are added to ensure that redaction works. 2. A YARN job read data off of S3 with confidential information (the Hadoop credential provider password) provided in the environment variables of the driver and executor. Afterwards, the logs were grepped to make sure that no mention of the secret password was present. It was also ensured that the job was able to read the data off of S3 correctly, thereby ensuring that the sensitive information was being trickled down to the right places to read the data. 3. The event logs were checked to make sure no mention of the secret password was present. 4. The UI environment tab was checked to make sure there was no secret information being displayed. Author: Mark Grover <m...@apache.org> Closes #15971 from markgrover/master_redaction. commit 7b419b4a1dcad7be02441e5e3729540022b51b4a Author: Mark Grover <m...@apache.org> Date: 2017-03-02T18:33:56Z [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console ## What changes were proposed in this pull request? This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs and yarn logs, etc. ## How was this patch tested? Testing was done manually to make sure that the console logs were not printing any sensitive information.
Here's some output from the console: ``` Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` ``` System properties: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken by this change. Author: Mark Grover <m...@apache.org>
[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18802#discussion_r131717553 --- Diff: dev/run-tests.py --- @@ -121,7 +121,7 @@ def determine_modules_to_test(changed_modules): if modules.root in modules_to_test: return [modules.root] return toposort_flatten( -{m: set(m.dependencies).intersection(modules_to_test) for m in modules_to_test}, sort=True) +dict((m, set(m.dependencies).intersection(modules_to_test)) for m in modules_to_test)) --- End diff -- I can remove it, but then the tests will fail on Jenkins.
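The diff above swaps a dict comprehension, which needs Python 2.7 or later, for a `dict()` call over a generator of pairs, which also parses on Python 2.6. A minimal sketch of that equivalence follows. The `Module` class, the `dependency_map` helper and the module names are hypothetical stand-ins, not the real objects in `dev/run-tests.py`.

```python
class Module(object):
    """Hypothetical stand-in for a build module; not the real class."""

    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)


def dependency_map(modules_to_test):
    # dict() over a generator of (key, value) pairs parses on Python 2.6,
    # unlike the {k: v for ...} comprehension syntax added in Python 2.7.
    return dict(
        (m, set(m.dependencies).intersection(modules_to_test))
        for m in modules_to_test
    )


core = Module("core")
sql = Module("sql", [core])
mllib = Module("mllib", [core, sql])

# Only dependencies that are themselves under test are kept.
deps = dependency_map([sql, mllib])
```

Both spellings build the same mapping. Note that the replacement line as quoted in the diff also drops the `sort=True` argument to `toposort_flatten`; the sketch does not cover that call.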
[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18765 Closing this PR since https://github.com/apache/spark/pull/18802 is completed
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira closed the pull request at: https://github.com/apache/spark/pull/18765
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18765#discussion_r131154059 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)" + private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r --- End diff -- I got it working there. I tested it here, and redaction in both the UI and spark-submit already works. I think you can close this pull request and focus on #18802.
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18765#discussion_r131036498 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)" + private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r --- End diff -- I opened the PR, but I don't know why Jenkins fails with an access error. It sounds like a permission issue.
[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18802 I don't know why these tests are breaking. Could someone help me? Permission denied?
[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18802 ok to test
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18765#discussion_r130733138 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)" + private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r --- End diff -- I opened another pull request with the full feature: https://github.com/apache/spark/pull/18802
[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...
GitHub user dmvieira opened a pull request: https://github.com/apache/spark/pull/18802 [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information ## What changes were proposed in this pull request? This backports SPARK-18535 and SPARK-19720 to Spark 2.1. It is a backport PR that redacts sensitive information, based on configuration, from the Spark UI and Spark Submit console logs. It is based on the PRs by Mark Grover (m...@apache.org). ## How was this patch tested? The same tests from the original PRs were applied. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dmvieira/spark feature-redact Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18802.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18802 commit 6905976d5fedd7e7dc9e6b578a8bbadfa675fd63 Author: Mark Grover <m...@apache.org> Date: 2016-11-28T16:59:47Z [SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI ## What changes were proposed in this pull request? This patch adds a new property called `spark.secret.redactionPattern` that allows users to specify a Scala regex to decide which Spark configuration properties and environment variables in driver and executor environments contain sensitive information. When this regex matches the property or environment variable name, its value is redacted from the environment UI and various logs like YARN and event logs. This change uses this property to redact information from event logs and YARN logs. It also updates the UI code to adhere to this property instead of hardcoding the logic to decipher which properties are sensitive.
Here's an image of the UI post-redaction: ![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png) Here's the text in the YARN logs, post-redaction: ``HADOOP_CREDSTORE_PASSWORD -> *(redacted)`` Here's the text in the event logs, post-redaction: ``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)",...`` ## How was this patch tested? 1. Unit tests are added to ensure that redaction works. 2. A YARN job read data off of S3 with confidential information (the Hadoop credential provider password) provided in the environment variables of the driver and executor. Afterwards, the logs were grepped to make sure that no mention of the secret password was present. It was also ensured that the job was able to read the data off of S3 correctly, thereby ensuring that the sensitive information was being trickled down to the right places to read the data. 3. The event logs were checked to make sure no mention of the secret password was present. 4. The UI environment tab was checked to make sure there was no secret information being displayed. Author: Mark Grover <m...@apache.org> Closes #15971 from markgrover/master_redaction. commit 7b419b4a1dcad7be02441e5e3729540022b51b4a Author: Mark Grover <m...@apache.org> Date: 2017-03-02T18:33:56Z [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console ## What changes were proposed in this pull request? This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs and yarn logs, etc. ## How was this patch tested? Testing was done manually to make sure that the console logs were not printing any sensitive information.
Here's some output from the console: ``` Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` ``` System properties: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18765#discussion_r130676428 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)" + private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r --- End diff -- Hi @markgrover! My intention here was only to fix this security breach by making the spark-submit redaction pattern match the UI redaction pattern. I can change it, but then it will be a new-feature backport and not a bugfix backport.
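The behaviour of the fixed pattern in the diff can be illustrated outside Spark. The sketch below is Python rather than the Scala in `Utils.scala`. Only the pattern and the replacement text come from the diff; the `redact` helper, the sample properties, and the choice to match anywhere in the key are assumptions made for illustration.

```python
import re

# Both values are taken from the quoted diff.
REDACTION_REPLACEMENT_TEXT = "*(redacted)"
SECRET_REDACTION_PATTERN = re.compile(r"(?i)secret|password")


def redact(pairs):
    """Hypothetical helper: mask values whose key matches the pattern.

    Matching anywhere in the key (re.search) is an assumption about how
    the pattern is applied, not something the diff shows.
    """
    return [
        (key, REDACTION_REPLACEMENT_TEXT)
        if SECRET_REDACTION_PATTERN.search(key)
        else (key, value)
        for key, value in pairs
    ]


# Made-up sample properties; the password value is a placeholder.
props = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.authenticate", "false"),
]
redacted = redact(props)
```

Because the pattern is hardcoded, any key that does not contain "secret" or "password" is printed unchanged, which is the limitation a configurable regex would lift.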
[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...
Github user dmvieira commented on a diff in the pull request: https://github.com/apache/spark/pull/18765#discussion_r130572412 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)" + private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r --- End diff -- But I'm following the UI logic in Spark 2.1: https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18765 @gatorsmile, please check whether it is better now.
[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/18765 I'm sorry. I was only suggesting it because it is a major issue, as described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19720 I'm using Airflow to submit jobs, and the password appears in the log whenever I use verbose mode in spark-submit.
[GitHub] spark pull request #18765: [SPARK-19720][CORE] Redact sensitive information ...
GitHub user dmvieira opened a pull request: https://github.com/apache/spark/pull/18765 [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console This change redacts sensitive information (based on the default password and secret regex) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs and yarn logs, etc. Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console: ``` Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` ``` System properties: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken by this change. Based on the original change by Mark Grover <m...@apache.org>. Closes #17047 for Spark version 2.1.2.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/dmvieira/spark branch-2.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18765 commit 9e757820af7990f37d1cb5f8cd9c989fcf815cdf Author: Mark Grover <m...@apache.org> Date: 2017-03-02T18:33:56Z [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console This change redacts sensitive information (based on the default password and secret regex) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs and yarn logs, etc. Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console: ``` Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` ``` System properties: (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) (spark.authenticate,false) (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted)) ``` There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken by this change. Based on the original change by Mark Grover <m...@apache.org>. Closes #17047 for Spark version 2.1.2.
[GitHub] spark issue #15000: [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspa...
Github user dmvieira commented on the issue: https://github.com/apache/spark/pull/15000 Hey guys, how can I do the same thing using SparkR?
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user dmvieira commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-132340568 I'm starting a third-party package, as suggested by @srowen, and I hope you enjoy it. Feel free to collaborate: https://github.com/dmvieira/spark-twitter-stream-receiver
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user dmvieira commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-131894055 So, why not improve it with this PR and then move it to a new project or package once we have thought of a better solution? We can create an issue, or you can talk with the stakeholders to discuss it.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user dmvieira commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-131885136 But without this patch you're restricting a lot of Twitter functionality inside Spark while still supporting the Twitter interface. Spark still maintains the Twitter API interface even without this patch. IMHO, if Spark doesn't want to maintain the Twitter interface, you should remove Twitter streaming as a package inside Spark.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user dmvieira commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-131880125 Hey guys, I need this patch too.