[GitHub] spark pull request #18873: [BRANCH-2.1][BACKPORT] Fixing python 2.6 tests fo...

2017-08-09 Thread dmvieira
Github user dmvieira closed the pull request at:

https://github.com/apache/spark/pull/18873


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18873: [BRANCH-2.1][BACKPORT] Fixing python 2.6 tests for jenki...

2017-08-09 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18873
  
Closing here... Thank you @felixcheung , @srowen , @vanzin and @HyukjinKwon





[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings

2017-08-08 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18873
  
Hey guys, I just opened this PR because I spent a lot of time trying to fix 
Jenkins tests in my last PR when the error was actually in the test script 
running under Python 2.6... I can close it, but I know that other contributors 
will spend more time trying to fix it again. Ok @felixcheung





[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings

2017-08-07 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18873
  
It doesn't make sense... If you look at all the failed builds, they're failing 
because the test script doesn't support Python 2.6... I think your CI is running 
Python 2.6 on some machines and Python 2.7 or higher on others. Are you sure 
that you want to close this PR?





[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...

2017-08-07 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18802
  
Thank you @vanzin





[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...

2017-08-07 Thread dmvieira
Github user dmvieira closed the pull request at:

https://github.com/apache/spark/pull/18802





[GitHub] spark issue #18873: Fixing python 2.6 tests for jenkings

2017-08-07 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18873
  
Removed the code related to the other PR.





[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...

2017-08-07 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18802
  
I removed the test fixes and added another PR: 
https://github.com/apache/spark/pull/18873 from this branch





[GitHub] spark pull request #18873: Fixing python 2.6 tests for jenkings

2017-08-07 Thread dmvieira
GitHub user dmvieira opened a pull request:

https://github.com/apache/spark/pull/18873

Fixing python 2.6 tests for jenkings

## What changes were proposed in this pull request?

I was working on PR https://github.com/apache/spark/pull/18802 and the tests 
always failed.

This PR fixes the Jenkins tests that were failing with Python 2.6. It contains 
some backports for Python 2.6.

## How was this patch tested?

Tests passing at Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dmvieira/spark fix-python-2.6-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18873.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18873


commit 6905976d5fedd7e7dc9e6b578a8bbadfa675fd63
Author: Mark Grover <m...@apache.org>
Date:   2016-11-28T16:59:47Z

[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI

## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.

This change uses this property to redact information from event logs and YARN
logs. It also updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.

Here's an image of the UI post-redaction:

![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *(redacted)``

Here's the text in the event logs, post-redaction:

``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)",...``

## How was this patch tested?
1. Unit tests are added to ensure that redaction works.
2. A YARN job was run that read data off of S3 with confidential information
(hadoop credential provider password) provided in the environment
variables of driver and executor. Afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensured that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to read
the data.
3. The event logs were checked to make sure no mention of secret password was
present.
4. UI environment tab was checked to make sure there was no secret information
being displayed.

Author: Mark Grover <m...@apache.org>

Closes #15971 from markgrover/master_redaction.

commit 7b419b4a1dcad7be02441e5e3729540022b51b4a
Author: Mark Grover <m...@apache.org>
Date:   2017-03-02T18:33:56Z

[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console

## What changes were proposed in this pull request?
This change redacts sensitive information (based on `spark.redaction.regex`
property) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs and yarn logs, etc.

## How was this patch tested?
Testing was done manually to make sure that the console logs were not printing
any sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file 
/etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```
There is a risk if new print statements were added to the console down the 
road, sensitive information may still get leaked, since there is no test that 
asserts on the console log output. I considered it out of the scope of this 
JIRA to write an integration test to make sure new leaks don't happen in the 
future.
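
The missing check described above, a test that asserts on console output, could look roughly like the following Python sketch. It is not Spark's test code: `print_properties` is a hypothetical stand-in for whatever prints properties to the console, and the secret value is made up for illustration. It needs Python 3 for `contextlib.redirect_stdout`.

```python
import io
import re
from contextlib import redirect_stdout

# Pattern and replacement text as quoted elsewhere in this thread.
SENSITIVE_KEY = re.compile(r"(?i)secret|password")
REPLACEMENT = "*(redacted)"


def print_properties(props):
    """Hypothetical console printer that masks values of sensitive keys."""
    for key, value in props:
        shown = REPLACEMENT if SENSITIVE_KEY.search(key) else value
        print("(%s,%s)" % (key, shown))


def captured_console_output(props):
    """Run the printer and return everything it wrote to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print_properties(props)
    return buf.getvalue()


secret = "hunter2"  # made-up secret, used only to check for leaks
output = captured_console_output([
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", secret),
    ("spark.authenticate", "false"),
])

# The regression check: the raw secret must never reach the console.
assert secret not in output
```

A check of this shape would fail as soon as a new print statement bypassed the masking step, which is the risk the paragraph above points out.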

Running unit tests to make sure nothing else is broken by this change.

Author: Mark Grover <m...@apache.org>

[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...

2017-08-07 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18802#discussion_r131717553
  
--- Diff: dev/run-tests.py ---
@@ -121,7 +121,7 @@ def determine_modules_to_test(changed_modules):
     if modules.root in modules_to_test:
         return [modules.root]
     return toposort_flatten(
-        {m: set(m.dependencies).intersection(modules_to_test) for m in modules_to_test}, sort=True)
+        dict((m, set(m.dependencies).intersection(modules_to_test)) for m in modules_to_test))
--- End diff --

I can remove it, but tests will fail at Jenkins
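
For context on why the diff above matters: dict comprehensions (`{k: v for ...}`) were added in Python 2.7, so a Python 2.6 interpreter rejects them with a syntax error. Passing a generator of `(key, value)` pairs to `dict()` builds the same mapping on both versions. The sketch below shows the rewrite on a made-up module graph; the module names and the `restrict_dependencies` helper are assumptions for illustration, not code from `dev/run-tests.py`.

```python
# Hypothetical module graph: module name -> names of modules it depends on.
dependencies = {
    "core": [],
    "sql": ["core"],
    "mllib": ["core", "sql"],
}

# set([...]) rather than a {...} set literal, which is also 2.7+ syntax.
modules_to_test = set(["sql", "mllib"])


def restrict_dependencies(dependencies, modules_to_test):
    """For each module under test, keep only dependencies also under test."""
    # Python 2.6-compatible replacement for:
    #   {m: set(dependencies[m]).intersection(modules_to_test) for m in modules_to_test}
    return dict(
        (m, set(dependencies[m]).intersection(modules_to_test))
        for m in modules_to_test
    )


restricted = restrict_dependencies(dependencies, modules_to_test)
```

Here `restricted` maps `"sql"` to an empty set and `"mllib"` to a set containing only `"sql"`, because `"core"` is not among the modules under test.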





[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...

2017-08-05 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18765
  
Closing this PR since https://github.com/apache/spark/pull/18802 is 
completed





[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-05 Thread dmvieira
Github user dmvieira closed the pull request at:

https://github.com/apache/spark/pull/18765





[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-03 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18765#discussion_r131154059
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)"
+  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
--- End diff --

I made it work there... I tested it here, and redaction in the UI and in 
spark-submit is already working. I think you can close this pull request and focus on #18802





[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-02 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18765#discussion_r131036498
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)"
+  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
--- End diff --

I opened the PR, but I don't know why Jenkins fails with an access error... It 
sounds like a permission issue.





[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...

2017-08-02 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18802
  
I don't know why these tests are breaking. Could someone help me?

Permission denied?





[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...

2017-08-02 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18802
  
ok to test





[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-01 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18765#discussion_r130733138
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)"
+  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
--- End diff --

I opened another pull request with the full feature: 
https://github.com/apache/spark/pull/18802





[GitHub] spark pull request #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Re...

2017-08-01 Thread dmvieira
GitHub user dmvieira opened a pull request:

https://github.com/apache/spark/pull/18802

[SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information

## What changes were proposed in this pull request?

Backporting SPARK-18535 and SPARK-19720 to spark 2.1

It's a backport PR that redacts sensitive information, based on configuration, 
from the Spark UI and Spark Submit console logs.

Using Mark Grover's (m...@apache.org) PRs as reference

## How was this patch tested?

The same tests from the original PRs were applied


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dmvieira/spark feature-redact

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18802


commit 6905976d5fedd7e7dc9e6b578a8bbadfa675fd63
Author: Mark Grover <m...@apache.org>
Date:   2016-11-28T16:59:47Z

[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI

## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.

This change uses this property to redact information from event logs and YARN
logs. It also updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.

Here's an image of the UI post-redaction:

![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *(redacted)``

Here's the text in the event logs, post-redaction:

``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*(redacted)",...``

## How was this patch tested?
1. Unit tests are added to ensure that redaction works.
2. A YARN job was run that read data off of S3 with confidential information
(hadoop credential provider password) provided in the environment
variables of driver and executor. Afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensured that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to read
the data.
3. The event logs were checked to make sure no mention of secret password was
present.
4. UI environment tab was checked to make sure there was no secret information
being displayed.

Author: Mark Grover <m...@apache.org>

Closes #15971 from markgrover/master_redaction.

commit 7b419b4a1dcad7be02441e5e3729540022b51b4a
Author: Mark Grover <m...@apache.org>
Date:   2017-03-02T18:33:56Z

[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console

## What changes were proposed in this pull request?
This change redacts sensitive information (based on `spark.redaction.regex`
property) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs and yarn logs, etc.

## How was this patch tested?
Testing was done manually to make sure that the console logs were not printing
any sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file 
/etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```
There is a risk if new print statements were added to the console down the 
road, sensitive information may still get leaked, since there is no test that 
asserts on the console log output. I considered it out of the scope of this 
JIRA to write an integration test to make sure new leaks don't happen in the 
future.

Running unit tests to make sure nothing else is broken 

[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-01 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18765#discussion_r130676428
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)"
+  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
--- End diff --

Hi @markgrover ! My intention here was only to fix this security breach by making 
the spark-submit redaction pattern similar to the UI redaction pattern. I can change it, but 
it will be a new feature backport and not a bugfix backport
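
To make the behaviour of the hard-coded pattern in the diff above concrete, here is a minimal Python sketch of the same idea: a value is masked whenever its key matches `(?i)secret|password` anywhere in the name. The two constants mirror the values in the Scala diff; the `redact` function, the sample properties, and the secret value are hypothetical and are not Spark's implementation.

```python
import re

# Values copied from the diff; everything else here is illustrative only.
REDACTION_REPLACEMENT_TEXT = "*(redacted)"
SECRET_REDACTION_PATTERN = re.compile(r"(?i)secret|password")


def redact(pairs):
    """Return (key, value) pairs, masking values whose key looks sensitive."""
    return [
        (key,
         REDACTION_REPLACEMENT_TEXT
         if SECRET_REDACTION_PATTERN.search(key)
         else value)
        for key, value in pairs
    ]


props = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.authenticate", "false"),
]

for key, value in redact(props):
    print("(%s,%s)" % (key, value))
```

Because the pattern is fixed in code, users cannot extend it to other key names, which is the difference from the configurable property discussed in this review.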





[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

2017-08-01 Thread dmvieira
Github user dmvieira commented on a diff in the pull request:

https://github.com/apache/spark/pull/18765#discussion_r130572412
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*(redacted)"
+  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
--- End diff --

But I'm following the UI logic in the Spark 2.1 branch: 
https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala





[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...

2017-07-31 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18765
  
Please @gatorsmile , check if it is better





[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...

2017-07-30 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/18765
  
I'm sorry... I was just suggesting it because it is a major issue, as described 
here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19720

I'm using Airflow to submit jobs, and the password appears in the log whenever 
I use verbose mode in spark-submit





[GitHub] spark pull request #18765: [SPARK-19720][CORE] Redact sensitive information ...

2017-07-28 Thread dmvieira
GitHub user dmvieira opened a pull request:

https://github.com/apache/spark/pull/18765

[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console

This change redacts sensitive information (based on default password and
secret regex) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs and yarn logs, etc.

Testing was done manually to make sure that the console logs were not printing
any sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file 
/etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```
There is a risk if new print statements were added to the console down the 
road, sensitive information may still get leaked, since there is no test that 
asserts on the console log output. I considered it out of the scope of this 
JIRA to write an integration test to make sure new leaks don't happen in the 
future.

Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <m...@apache.org>

Closes #17047 for Spark version 2.1.2.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dmvieira/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18765


commit 9e757820af7990f37d1cb5f8cd9c989fcf815cdf
Author: Mark Grover <m...@apache.org>
Date:   2017-03-02T18:33:56Z

[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console

This change redacts sensitive information (based on default password and
secret regex) from the Spark Submit console logs. Such sensitive information is
already being redacted from event logs and yarn logs, etc.

Testing was done manually to make sure that the console logs were not printing
any sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file 
/etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*(redacted))
```
There is a risk if new print statements were added to the console down the 
road, sensitive information may still get leaked, since there is no test that 
asserts on the console log output. I considered it out of the scope of this 
JIRA to write an integration test to make sure new leaks don't happen in the 
future.

Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <m...@apache.org>

Closes #17047 for Spark version 2.1.2.







[GitHub] spark issue #15000: [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspa...

2016-12-16 Thread dmvieira
Github user dmvieira commented on the issue:

https://github.com/apache/spark/pull/15000
  
Hey guys, how can I do the same thing using SparkR?





[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2015-08-18 Thread dmvieira
Github user dmvieira commented on the pull request:

https://github.com/apache/spark/pull/1717#issuecomment-132340568
  
I'm starting a third-party package as suggested by @srowen and I hope you 
enjoy it. Feel free to collaborate: 
https://github.com/dmvieira/spark-twitter-stream-receiver





[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2015-08-17 Thread dmvieira
Github user dmvieira commented on the pull request:

https://github.com/apache/spark/pull/1717#issuecomment-131894055
  
So, why not improve it with this PR and then move it to a new project / 
package when we think of a better solution? We can create an issue, or you 
can talk with stakeholders to discuss it.





[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2015-08-17 Thread dmvieira
Github user dmvieira commented on the pull request:

https://github.com/apache/spark/pull/1717#issuecomment-131885136
  
But without this patch you're restricting a lot of Twitter functionality 
inside Spark while still supporting the Twitter interface. Spark still maintains 
the Twitter API interface even without this patch. IMHO, if Spark doesn't want to 
maintain the Twitter interface, you should remove Twitter streaming as a package 
inside Spark





[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2015-08-17 Thread dmvieira
Github user dmvieira commented on the pull request:

https://github.com/apache/spark/pull/1717#issuecomment-131880125
  
Hey guys, I need this patch too. 

