[spark] branch branch-3.1 updated: [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar

2021-03-21 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 3889b71  [SPARK-34225][CORE] Don't encode further when a URI form 
string is passed to addFile or addJar
3889b71 is described below

commit 3889b7194a118cae9ae8330e560fb09b1c65b407
Author: Kousuke Saruta 
AuthorDate: Mon Mar 22 14:06:41 2021 +0900

[SPARK-34225][CORE] Don't encode further when a URI form string is passed 
to addFile or addJar

### What changes were proposed in this pull request?

This PR fixes an issue where `addFile` and `addJar` encode a path again even
though a URI-form string is passed.
For example, the following operation throws an exception even though the
file exists.
```
sc.addFile("file:/foo/test%20file.txt")
```

Another case is the `--files` and `--jars` options used when submitting an application.
```
bin/spark-shell --files "/foo/test file.txt"
```
The path above is transformed to URI form
[here](https://github.com/apache/spark/blob/ecf4811764f1ef91954c865a864e0bf6691f99a6/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L400)
and passed to `addFile`, so the same issue occurs.
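
To make the failure mode concrete, here is a minimal sketch of the double
encoding (it assumes Hadoop's `Path` is on the classpath; the literal paths
are illustrative):
```
import java.net.URI
import org.apache.hadoop.fs.Path

// Hadoop's Path constructor percent-encodes the string again, so the
// already-encoded "%20" becomes "%2520" and the URI no longer points
// at the existing file.
val doubleEncoded = new Path("file:/foo/test%20file.txt").toUri
println(doubleEncoded) // file:/foo/test%2520file.txt (the bug)

// The fix parses a string that is already an absolute URI containing "%"
// as-is, preserving the caller's encoding.
val preserved = new URI("file:/foo/test%20file.txt")
println(preserved)     // file:/foo/test%20file.txt
```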

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #31718 from sarutak/fix-uri-encode-double.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 0734101bb716b50aa675cee0da21a20692bb44d4)
Signed-off-by: Kousuke Saruta 
---
 .../main/scala/org/apache/spark/SparkContext.scala | 16 ++---
 .../main/scala/org/apache/spark/util/Utils.scala   | 11 ++
 .../scala/org/apache/spark/SparkContextSuite.scala | 40 ++
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 17ceb5f..4d6dec3 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1584,7 +1584,11 @@ class SparkContext(config: SparkConf) extends Logging {
   path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: 
Boolean = false
 ): Unit = {
 val uri = if (!isArchive) {
-  new Path(path).toUri
+  if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+new URI(path)
+  } else {
+new Path(path).toUri
+  }
 } else {
   Utils.resolveURI(path)
 }
@@ -1619,10 +1623,8 @@ class SparkContext(config: SparkConf) extends Logging {
   env.rpcEnv.fileServer.addFile(new File(uri.getPath))
 } else if (uri.getScheme == null) {
   schemeCorrectedURI.toString
-} else if (isArchive) {
-  uri.toString
 } else {
-  path
+  uri.toString
 }
 
 val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
@@ -1977,7 +1979,11 @@ class SparkContext(config: SparkConf) extends Logging {
 // For local paths with backslashes on Windows, URI throws an exception
 addLocalJarFile(new File(path))
   } else {
-val uri = new Path(path).toUri
+val uri = if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+  new URI(path)
+} else {
+  new Path(path).toUri
+}
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
 uri.getScheme match {
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 080d3bb..1643aa6 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2065,6 +2065,17 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /** Check whether a path is an absolute URI. */
+  def isAbsoluteURI(path: String): Boolean = {
+try {
+  val uri = new URI(path: String)
+  uri.isAbsolute
+} catch {
+  case _: URISyntaxException =>
+false
+}
+  }
+
   /** Return all non-local paths from a comma-separated list of paths. */
   def nonLocalPaths(paths: String, testWindows: Boolean = false): 
Array[String] = {
 val windows = isWindows || testWindows
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 0c0a9b8..c4bcccf 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1069,6 +1069,46 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 

[spark] branch master updated: [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar

2021-03-21 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0734101  [SPARK-34225][CORE] Don't encode further when a URI form 
string is passed to addFile or addJar
0734101 is described below

commit 0734101bb716b50aa675cee0da21a20692bb44d4
Author: Kousuke Saruta 
AuthorDate: Mon Mar 22 14:06:41 2021 +0900

[SPARK-34225][CORE] Don't encode further when a URI form string is passed 
to addFile or addJar

### What changes were proposed in this pull request?

This PR fixes an issue where `addFile` and `addJar` encode a path again even
though a URI-form string is passed.
For example, the following operation throws an exception even though the
file exists.
```
sc.addFile("file:/foo/test%20file.txt")
```

Another case is the `--files` and `--jars` options used when submitting an application.
```
bin/spark-shell --files "/foo/test file.txt"
```
The path above is transformed to URI form
[here](https://github.com/apache/spark/blob/ecf4811764f1ef91954c865a864e0bf6691f99a6/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L400)
and passed to `addFile`, so the same issue occurs.

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #31718 from sarutak/fix-uri-encode-double.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../main/scala/org/apache/spark/SparkContext.scala | 16 ++---
 .../main/scala/org/apache/spark/util/Utils.scala   | 11 ++
 .../scala/org/apache/spark/SparkContextSuite.scala | 40 ++
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 685ce55..b0a5b7c 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1584,7 +1584,11 @@ class SparkContext(config: SparkConf) extends Logging {
   path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: 
Boolean = false
 ): Unit = {
 val uri = if (!isArchive) {
-  new Path(path).toUri
+  if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+new URI(path)
+  } else {
+new Path(path).toUri
+  }
 } else {
   Utils.resolveURI(path)
 }
@@ -1619,10 +1623,8 @@ class SparkContext(config: SparkConf) extends Logging {
   env.rpcEnv.fileServer.addFile(new File(uri.getPath))
 } else if (uri.getScheme == null) {
   schemeCorrectedURI.toString
-} else if (isArchive) {
-  uri.toString
 } else {
-  path
+  uri.toString
 }
 
 val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
@@ -1977,7 +1979,11 @@ class SparkContext(config: SparkConf) extends Logging {
 // For local paths with backslashes on Windows, URI throws an exception
 (addLocalJarFile(new File(path)), "local")
   } else {
-val uri = new Path(path).toUri
+val uri = if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+  new URI(path)
+} else {
+  new Path(path).toUri
+}
 // SPARK-17650: Make sure this is a valid URL before adding it to the 
list of dependencies
 Utils.validateURL(uri)
 val uriScheme = uri.getScheme
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index eebd009..e27666b 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2063,6 +2063,17 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /** Check whether a path is an absolute URI. */
+  def isAbsoluteURI(path: String): Boolean = {
+try {
+  val uri = new URI(path: String)
+  uri.isAbsolute
+} catch {
+  case _: URISyntaxException =>
+false
+}
+  }
+
   /** Return all non-local paths from a comma-separated list of paths. */
   def nonLocalPaths(paths: String, testWindows: Boolean = false): 
Array[String] = {
 val windows = isWindows || testWindows
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 0ba2a03..42b9b0e 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1197,6 +1197,46 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 assert(sc.hadoopConfiguration.get(bufferKey).toInt === 65536,
   "spark configs have higher priority than 

[spark] branch master updated (47da944 -> f4de93e)

2021-03-21 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 47da944  [SPARK-34470][ML] VectorSlicer utilize ordering if possible
 add f4de93e  [MINOR][SQL] Spelling: filters - PushedFilers

No new revisions were added by this update.

Summary of changes:
 .../avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala | 2 +-
 .../avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala   | 2 +-
 .../org/apache/spark/sql/execution/datasources/v2/csv/CSVScan.scala | 2 +-
 .../org/apache/spark/sql/execution/datasources/v2/orc/OrcScan.scala | 2 +-
 .../spark/sql/execution/datasources/v2/parquet/ParquetScan.scala| 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala | 6 +++---
 6 files changed, 8 insertions(+), 8 deletions(-)




[GitHub] [spark-website] zhengruifeng commented on pull request #328: Add Attila Zsolt Piros to committers

2021-03-21 Thread GitBox


zhengruifeng commented on pull request #328:
URL: https://github.com/apache/spark-website/pull/328#issuecomment-803711340


   Congrats!





[GitHub] [spark-website] zhengruifeng commented on pull request #326: Add Yi Wu to committers' list

2021-03-21 Thread GitBox


zhengruifeng commented on pull request #326:
URL: https://github.com/apache/spark-website/pull/326#issuecomment-803711163


   Congrats!





[GitHub] [spark-website] zhengruifeng commented on pull request #327: Add Maciej Szymkiewicz to committers

2021-03-21 Thread GitBox


zhengruifeng commented on pull request #327:
URL: https://github.com/apache/spark-website/pull/327#issuecomment-803710949


   Congrats!





[GitHub] [spark-website] zhengruifeng commented on pull request #325: Add Kent Yao to committers

2021-03-21 Thread GitBox


zhengruifeng commented on pull request #325:
URL: https://github.com/apache/spark-website/pull/325#issuecomment-803710825


   Congrats!





[spark] branch master updated (c5fd94f -> 47da944)

2021-03-21 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c5fd94f  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 
1.2.1 in Java9+ environment
 add 47da944  [SPARK-34470][ML] VectorSlicer utilize ordering if possible

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/linalg/Vectors.scala | 56 --
 .../org/apache/spark/ml/linalg/VectorsSuite.scala  |  7 +++
 .../org/apache/spark/ml/feature/VectorSlicer.scala | 15 +++---
 3 files changed, 56 insertions(+), 22 deletions(-)




[spark] branch branch-3.0 updated: [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e28296e  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 
1.2.1 in Java9+ environment
e28296e is described below

commit e28296e1905ee34ce93847e0ee03d8b70b161ef7
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 17:59:55 2021 -0700

[SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in 
Java9+ environment

### What changes were proposed in this pull request?

This PR aims to disable a new test case that uses Hive 1.2.1 in Java 9+ test
environments.

### Why are the changes needed?

[HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113) upgraded
Datanucleus to 4.x in Hive 2.0. Datanucleus 3.x doesn't support Java 9+.

**Java 9+ Environment**
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] (hive / Test / testOnly) sbt.TestsFailedException: Tests 
unsuccessful
[error] Total time: 328 s (05:28), completed Mar 21, 2021, 5:32:39 PM
```

### Does this PR introduce _any_ user-facing change?

It fixes the UT in Java 9+ environments.

### How was this patch tested?

Manually.

```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] HiveSparkSubmitSuite:
[info] - SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark 
classloader instead of context !!! CANCELED !!! (26 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) 
was true (HiveSparkSubmitSuite.scala:344)
```
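
For reference, a minimal ScalaTest sketch of the guard pattern the patch
uses (the class name and test body are illustrative):
```
import org.apache.commons.lang3.{JavaVersion, SystemUtils}
import org.scalatest.funsuite.AnyFunSuite

class JavaVersionGuardSuite extends AnyFunSuite {
  test("behavior that depends on Hive 1.2.1 / Datanucleus 3.x") {
    // assume() cancels the test (rather than failing it) when the
    // condition is false, which is how the patch skips Java 9+.
    assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
    // ... exercise the Java 8-only code path here ...
  }
}
```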

Closes #31916 from dongjoon-hyun/SPARK-HiveSparkSubmitSuite.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit c5fd94f1197faf8a974c7d7745cdebf42b3430b9)
Signed-off-by: Dongjoon Hyun 
---
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala  | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
index 09f9f9e..f9ea4e3 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
@@ -339,6 +339,7 @@ class HiveSparkSubmitSuite
 
   test("SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark 
classloader " +
 "instead of context") {
+assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
 val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
 
 // We need to specify the metastore database location in case of conflict 
with other hive




[spark] branch branch-3.1 updated: [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 3767aac  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 
1.2.1 in Java9+ environment
3767aac is described below

commit 3767aac9772b5377f98edb906d1abaf9dd4dcab7
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 17:59:55 2021 -0700

[SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in 
Java9+ environment

### What changes were proposed in this pull request?

This PR aims to disable a new test case that uses Hive 1.2.1 in Java 9+ test
environments.

### Why are the changes needed?

[HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113) upgraded
Datanucleus to 4.x in Hive 2.0. Datanucleus 3.x doesn't support Java 9+.

**Java 9+ Environment**
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] (hive / Test / testOnly) sbt.TestsFailedException: Tests 
unsuccessful
[error] Total time: 328 s (05:28), completed Mar 21, 2021, 5:32:39 PM
```

### Does this PR introduce _any_ user-facing change?

It fixes the UT in Java 9+ environments.

### How was this patch tested?

Manually.

```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] HiveSparkSubmitSuite:
[info] - SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark 
classloader instead of context !!! CANCELED !!! (26 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) 
was true (HiveSparkSubmitSuite.scala:344)
```

Closes #31916 from dongjoon-hyun/SPARK-HiveSparkSubmitSuite.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit c5fd94f1197faf8a974c7d7745cdebf42b3430b9)
Signed-off-by: Dongjoon Hyun 
---
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala  | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
index a3bff6b..426d93b 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
@@ -340,6 +340,7 @@ class HiveSparkSubmitSuite
 
   test("SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark 
classloader " +
 "instead of context") {
+assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
 val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
 
 // We need to specify the metastore database location in case of conflict 
with other hive




[spark] branch master updated (3bc6fe4 -> c5fd94f)

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3bc6fe4  [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits 
by default
 add c5fd94f  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 
1.2.1 in Java9+ environment

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala  | 1 +
 1 file changed, 1 insertion(+)




[spark] branch master updated: [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3bc6fe4  [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits 
by default
3bc6fe4 is described below

commit 3bc6fe4e77e1791c0a20387240e93d0175e0fade
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 14:34:02 2021 -0700

[SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default

### What changes were proposed in this pull request?

This PR aims to enable `spark.hadoopRDD.ignoreEmptySplits` by default for 
Apache Spark 3.2.0.

### Why are the changes needed?

Although this is a safe improvement, it hasn't been enabled by default so
far, to avoid an explicit behavior change. This PR switches the default
explicitly in Apache Spark 3.2.0.
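
For users who depend on the old behavior, a minimal opt-out sketch (the
master URL and app name are illustrative):
```
import org.apache.spark.{SparkConf, SparkContext}

// Restore the pre-3.2 default: keep creating partitions for empty
// input splits.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("keep-empty-splits")
  .set("spark.hadoopRDD.ignoreEmptySplits", "false")
val sc = new SparkContext(conf)
```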

### Does this PR introduce _any_ user-facing change?

Yes, the behavior change is documented.

### How was this patch tested?

Pass the existing CIs.

Closes #31909 from dongjoon-hyun/SPARK-34809.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/core-migration-guide.md   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 6392431..6b1e3d0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1037,7 +1037,7 @@ package object config {
   .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for 
empty input splits.")
   .version("2.3.0")
   .booleanConf
-  .createWithDefault(false)
+  .createWithDefault(true)
 
   private[spark] val SECRET_REDACTION_PATTERN =
 ConfigBuilder("spark.redaction.regex")
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 232b9e3..e243b14 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.1 to 3.2
 
+- Since Spark 3.2, `spark.hadoopRDD.ignoreEmptySplits` is set to `true` by 
default which means Spark will not create empty partitions for empty input 
splits. To restore the behavior before Spark 3.2, you can set 
`spark.hadoopRDD.ignoreEmptySplits` to `false`.
+
 - Since Spark 3.2, `spark.eventLog.compression.codec` is set to `zstd` by 
default which means Spark will not fallback to use `spark.io.compression.codec` 
anymore.
 
 - Since Spark 3.2, `spark.storage.replication.proactive` is enabled by default 
which means Spark tries to replenish in case of the loss of cached RDD block 
replicas due to executor failures. To restore the behavior before Spark 3.2, 
you can set `spark.storage.replication.proactive` to `false`.




[spark] branch branch-3.1 updated (8dea9b8 -> d1de69f)

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8dea9b8  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and 
token
 add d1de69f  [SPARK-34813][INFRA][3.1] Remove Scala 2.13 build GitHub 
Action job from branch-3.1

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 22 --
 1 file changed, 22 deletions(-)




[spark] branch branch-2.4 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 29b981b  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and 
token
29b981b is described below

commit 29b981b3feacadabf5cef4793506c20897ccd535
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

[SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

### What changes were proposed in this pull request?

Just as we redact secrets and tokens, this PR aims to redact access keys.

### Why are the changes needed?

Access keys are also worth hiding.
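
A small demo of the updated default pattern; `[.]` matches a literal dot,
mirroring the assertions added to `UtilsSuite`:
```
// fs.s3a.access.key is redacted; access_key (underscore) is not.
val pattern = "(?i)secret|password|token|access[.]key".r
assert(pattern.findFirstIn("spark.hadoop.fs.s3a.access.key").isDefined)
assert(pattern.findFirstIn("spark.hadoop.fs.s3a.access_key").isEmpty)
```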

### Does this PR introduce _any_ user-facing change?

This will hide this information in the Spark UI (the `Spark Properties` and
`Hadoop Properties` tabs) and in logs.

### How was this patch tested?

Pass the newly updated UT.

Closes #31912 from dongjoon-hyun/SPARK-34811.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 91689df..4e5b8f0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -358,7 +358,7 @@ package object config {
 "a property key or value, the value is redacted from the environment 
UI and various logs " +
 "like YARN and event logs.")
   .regexConf
-  .createWithDefault("(?i)secret|password|token".r)
+  .createWithDefault("(?i)secret|password|token|access[.]key".r)
 
   private[spark] val STRING_REDACTION_PATTERN =
 ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 00f96b9..a1a6721 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1013,11 +1013,13 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 // Set some secret keys
 val secretKeys = Seq(
   "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+  "spark.hadoop.fs.s3a.access.key",
   "spark.my.password",
   "spark.my.sECreT")
 secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
 // Set a non-secret key
 sparkConf.set("spark.regular.property", "regular_value")
+sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
 // Set a property with a regular key but secret in the value
 sparkConf.set("spark.sensitive.property", "has_secret_in_value")
 
@@ -1028,7 +1030,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 secretKeys.foreach { key => assert(redactedConf(key) === 
Utils.REDACTION_REPLACEMENT_TEXT) }
 assert(redactedConf("spark.regular.property") === "regular_value")
 assert(redactedConf("spark.sensitive.property") === 
Utils.REDACTION_REPLACEMENT_TEXT)
-
+assert(redactedConf("spark.hadoop.fs.s3a.access.key") === 
Utils.REDACTION_REPLACEMENT_TEXT)
+assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }
 
   test("tryWithSafeFinally") {




[spark] branch branch-3.0 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a42b631  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and 
token
a42b631 is described below

commit a42b6311cffac8f7130d7d86857dfc48e0121ff7
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

[SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

### What changes were proposed in this pull request?

Just as we redact secrets and tokens, this PR aims to redact access keys.

### Why are the changes needed?

Access keys are also worth hiding.

### Does this PR introduce _any_ user-facing change?

This will hide this information in the Spark UI (the `Spark Properties` and
`Hadoop Properties` tabs) and in logs.

### How was this patch tested?

Pass the newly updated UT.

Closes #31912 from dongjoon-hyun/SPARK-34811.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 0afbb52..0227dcc 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -926,7 +926,7 @@ package object config {
 "like YARN and event logs.")
   .version("2.1.2")
   .regexConf
-  .createWithDefault("(?i)secret|password|token".r)
+  .createWithDefault("(?i)secret|password|token|access[.]key".r)
 
   private[spark] val STRING_REDACTION_PATTERN =
 ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 931eb6b..44f578a 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1024,11 +1024,13 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 // Set some secret keys
 val secretKeys = Seq(
   "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+  "spark.hadoop.fs.s3a.access.key",
   "spark.my.password",
   "spark.my.sECreT")
 secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
 // Set a non-secret key
 sparkConf.set("spark.regular.property", "regular_value")
+sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
 // Set a property with a regular key but secret in the value
 sparkConf.set("spark.sensitive.property", "has_secret_in_value")
 
@@ -1039,7 +1041,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 secretKeys.foreach { key => assert(redactedConf(key) === 
Utils.REDACTION_REPLACEMENT_TEXT) }
 assert(redactedConf("spark.regular.property") === "regular_value")
 assert(redactedConf("spark.sensitive.property") === 
Utils.REDACTION_REPLACEMENT_TEXT)
-
+assert(redactedConf("spark.hadoop.fs.s3a.access.key") === 
Utils.REDACTION_REPLACEMENT_TEXT)
+assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }
 
   test("redact sensitive information in command line args") {




[spark] branch branch-3.1 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 8dea9b8  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and 
token
8dea9b8 is described below

commit 8dea9b8071609c6690bd8f3e9d5621d7bed4786a
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

[SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

### What changes were proposed in this pull request?

Just as we redact secrets and tokens, this PR aims to redact access keys.

### Why are the changes needed?

Access keys are also worth hiding.

### Does this PR introduce _any_ user-facing change?

This will hide this information in the Spark UI (the `Spark Properties` and
`Hadoop Properties` tabs) and in logs.

### How was this patch tested?

Pass the newly updated UT.

Closes #31912 from dongjoon-hyun/SPARK-34811.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index f6de5e4..3daa9f5 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1015,7 +1015,7 @@ package object config {
 "like YARN and event logs.")
   .version("2.1.2")
   .regexConf
-  .createWithDefault("(?i)secret|password|token".r)
+  .createWithDefault("(?i)secret|password|token|access[.]key".r)
 
   private[spark] val STRING_REDACTION_PATTERN =
 ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 18ff960..208e729 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1024,11 +1024,13 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 // Set some secret keys
 val secretKeys = Seq(
   "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+  "spark.hadoop.fs.s3a.access.key",
   "spark.my.password",
   "spark.my.sECreT")
 secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
 // Set a non-secret key
 sparkConf.set("spark.regular.property", "regular_value")
+sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
 // Set a property with a regular key but secret in the value
 sparkConf.set("spark.sensitive.property", "has_secret_in_value")
 
@@ -1039,7 +1041,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 secretKeys.foreach { key => assert(redactedConf(key) === 
Utils.REDACTION_REPLACEMENT_TEXT) }
 assert(redactedConf("spark.regular.property") === "regular_value")
 assert(redactedConf("spark.sensitive.property") === 
Utils.REDACTION_REPLACEMENT_TEXT)
-
+assert(redactedConf("spark.hadoop.fs.s3a.access.key") === 
Utils.REDACTION_REPLACEMENT_TEXT)
+assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }
 
   test("redact sensitive information in command line args") {




[spark] branch master updated (2888d18 -> 3c32b54)

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2888d18  [SPARK-34784][BUILD] Upgrade Jackson to 2.12.2
 add 3c32b54  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and 
token

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)




[spark] branch branch-3.1 updated (da013d0 -> 250c820)

2021-03-21 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from da013d0  [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy 
in Pyspark
 add 250c820  [SPARK-34796][SQL][3.1] Initialize counter variable for LIMIT 
code-gen in doProduce()

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/limit.scala  | 12 
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 19 +++
 2 files changed, 27 insertions(+), 4 deletions(-)




[spark] branch master updated (c799d04 -> 2888d18)

2021-03-21 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c799d04  [SPARK-34810][TEST] Update PostgreSQL test with the latest 
results
 add 2888d18  [SPARK-34784][BUILD] Upgrade Jackson to 2.12.2

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 15 +++
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 15 +++
 pom.xml |  2 +-
 3 files changed, 15 insertions(+), 17 deletions(-)




[spark] branch branch-2.4 updated: [SPARK-26625] Add oauthToken to spark.redaction.regex

2021-03-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 7879a0c  [SPARK-26625] Add oauthToken to spark.redaction.regex
7879a0c is described below

commit 7879a0cabf6c589ba7fb2ec79ef9b2109deb6b18
Author: Vinoo Ganesh 
AuthorDate: Wed Jan 16 11:43:10 2019 -0800

[SPARK-26625] Add oauthToken to spark.redaction.regex

## What changes were proposed in this pull request?

The regex (`spark.redaction.regex`) that is used to decide which config
properties or environment settings are sensitive should also include
`oauthToken`, so that it matches `spark.kubernetes.authenticate.submission.oauthToken`.
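
A quick check of why adding `token` is enough; the match is substring-based
and case-insensitive:
```
// "oauthToken" contains "Token", which the (?i) pattern matches.
val pattern = "(?i)secret|password|token".r
assert(pattern.findFirstIn(
  "spark.kubernetes.authenticate.submission.oauthToken").isDefined)
```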

## How was this patch tested?

Simple regex addition - happy to add a test if needed.

Author: Vinoo Ganesh 

Closes #23555 from vinooganesh/vinooganesh/SPARK-26625.

(cherry picked from commit 01301d09721cc12f1cc66ab52de3da117f5d33e6)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 559b0e1..91689df 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -358,7 +358,7 @@ package object config {
 "a property key or value, the value is redacted from the environment 
UI and various logs " +
 "like YARN and event logs.")
   .regexConf
-  .createWithDefault("(?i)secret|password".r)
+  .createWithDefault("(?i)secret|password|token".r)
 
   private[spark] val STRING_REDACTION_PATTERN =
 ConfigBuilder("spark.redaction.string.regex")
