[spark] branch branch-3.1 updated: [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 3889b71  [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar
3889b71 is described below

commit 3889b7194a118cae9ae8330e560fb09b1c65b407
Author: Kousuke Saruta
AuthorDate: Mon Mar 22 14:06:41 2021 +0900

[SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar

### What changes were proposed in this pull request?

This PR fixes an issue where `addFile` and `addJar` encode a path further even though it is already in URI form. For example, the following operation throws an exception even though the file exists:
```
sc.addFile("file:/foo/test%20file.txt")
```
The `--files` and `--jars` options hit the same problem when an application is submitted:
```
bin/spark-shell --files "/foo/test file.txt"
```
The path above is transformed to URI form [here](https://github.com/apache/spark/blob/ecf4811764f1ef91954c865a864e0bf6691f99a6/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L400) and passed to `addFile`, so the same issue happens.

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #31718 from sarutak/fix-uri-encode-double.
Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
(cherry picked from commit 0734101bb716b50aa675cee0da21a20692bb44d4)
Signed-off-by: Kousuke Saruta
---
 .../main/scala/org/apache/spark/SparkContext.scala | 16 ++---
 .../main/scala/org/apache/spark/util/Utils.scala   | 11 ++
 .../scala/org/apache/spark/SparkContextSuite.scala | 40 ++
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 17ceb5f..4d6dec3 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1584,7 +1584,11 @@ class SparkContext(config: SparkConf) extends Logging {
       path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: Boolean = false
   ): Unit = {
     val uri = if (!isArchive) {
-      new Path(path).toUri
+      if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+        new URI(path)
+      } else {
+        new Path(path).toUri
+      }
     } else {
       Utils.resolveURI(path)
     }
@@ -1619,10 +1623,8 @@ class SparkContext(config: SparkConf) extends Logging {
       env.rpcEnv.fileServer.addFile(new File(uri.getPath))
     } else if (uri.getScheme == null) {
       schemeCorrectedURI.toString
-    } else if (isArchive) {
-      uri.toString
     } else {
-      path
+      uri.toString
     }
     val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
@@ -1977,7 +1979,11 @@ class SparkContext(config: SparkConf) extends Logging {
       // For local paths with backslashes on Windows, URI throws an exception
       addLocalJarFile(new File(path))
     } else {
-      val uri = new Path(path).toUri
+      val uri = if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+        new URI(path)
+      } else {
+        new Path(path).toUri
+      }
       // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies
       Utils.validateURL(uri)
       uri.getScheme match {

diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 080d3bb..1643aa6 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2065,6 +2065,17 @@ private[spark] object Utils extends Logging {
     }
   }

+  /** Check whether a path is an absolute URI. */
+  def isAbsoluteURI(path: String): Boolean = {
+    try {
+      val uri = new URI(path: String)
+      uri.isAbsolute
+    } catch {
+      case _: URISyntaxException =>
+        false
+    }
+  }
+
   /** Return all non-local paths from a comma-separated list of paths. */
   def nonLocalPaths(paths: String, testWindows: Boolean = false): Array[String] = {
     val windows = isWindows || testWindows

diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 0c0a9b8..c4bcccf 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1069,6 +1069,46 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
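The core of the fix is the new `Utils.isAbsoluteURI` guard: a string that already parses as an absolute URI and contains `%` is taken as-is instead of being run through `new Path(path).toUri`, which would percent-encode the `%` a second time. A minimal cross-language sketch of that behavior, in Python rather than the patch's Scala (the helper name here is illustrative, not Spark's):

```python
from urllib.parse import urlparse, quote

# Encoding an already-percent-encoded string a second time mangles the
# escape itself: '%20' becomes '%2520', so the file no longer resolves.
print(quote("file:/foo/test%20file.txt"))

# Sketch of the guard the patch adds: treat the string as a finished URI
# only when it parses as an absolute URI (i.e. it has a scheme) and
# contains a '%' escape; otherwise encode it normally.
def looks_like_encoded_uri(path: str) -> bool:
    return bool(urlparse(path).scheme) and "%" in path

print(looks_like_encoded_uri("file:/foo/test%20file.txt"))  # True  -> use as-is
print(looks_like_encoded_uri("/foo/test file.txt"))         # False -> build a URI from the raw path
```

This is why `sc.addFile("file:/foo/test%20file.txt")` used to fail: the `%20` was escaped again into `%2520` before the file was looked up.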
[spark] branch master updated: [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0734101  [SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar
0734101 is described below

commit 0734101bb716b50aa675cee0da21a20692bb44d4
Author: Kousuke Saruta
AuthorDate: Mon Mar 22 14:06:41 2021 +0900

[SPARK-34225][CORE] Don't encode further when a URI form string is passed to addFile or addJar

### What changes were proposed in this pull request?

This PR fixes an issue where `addFile` and `addJar` encode a path further even though it is already in URI form. For example, the following operation throws an exception even though the file exists:
```
sc.addFile("file:/foo/test%20file.txt")
```
The `--files` and `--jars` options hit the same problem when an application is submitted:
```
bin/spark-shell --files "/foo/test file.txt"
```
The path above is transformed to URI form [here](https://github.com/apache/spark/blob/ecf4811764f1ef91954c865a864e0bf6691f99a6/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L400) and passed to `addFile`, so the same issue happens.

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #31718 from sarutak/fix-uri-encode-double.
Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
---
 .../main/scala/org/apache/spark/SparkContext.scala | 16 ++---
 .../main/scala/org/apache/spark/util/Utils.scala   | 11 ++
 .../scala/org/apache/spark/SparkContextSuite.scala | 40 ++
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 685ce55..b0a5b7c 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1584,7 +1584,11 @@ class SparkContext(config: SparkConf) extends Logging {
       path: String, recursive: Boolean, addedOnSubmit: Boolean, isArchive: Boolean = false
   ): Unit = {
     val uri = if (!isArchive) {
-      new Path(path).toUri
+      if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+        new URI(path)
+      } else {
+        new Path(path).toUri
+      }
     } else {
       Utils.resolveURI(path)
     }
@@ -1619,10 +1623,8 @@ class SparkContext(config: SparkConf) extends Logging {
       env.rpcEnv.fileServer.addFile(new File(uri.getPath))
     } else if (uri.getScheme == null) {
       schemeCorrectedURI.toString
-    } else if (isArchive) {
-      uri.toString
     } else {
-      path
+      uri.toString
     }
     val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
@@ -1977,7 +1979,11 @@ class SparkContext(config: SparkConf) extends Logging {
       // For local paths with backslashes on Windows, URI throws an exception
       (addLocalJarFile(new File(path)), "local")
     } else {
-      val uri = new Path(path).toUri
+      val uri = if (Utils.isAbsoluteURI(path) && path.contains("%")) {
+        new URI(path)
+      } else {
+        new Path(path).toUri
+      }
       // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies
       Utils.validateURL(uri)
       val uriScheme = uri.getScheme

diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index eebd009..e27666b 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2063,6 +2063,17 @@ private[spark] object Utils extends Logging {
     }
   }

+  /** Check whether a path is an absolute URI. */
+  def isAbsoluteURI(path: String): Boolean = {
+    try {
+      val uri = new URI(path: String)
+      uri.isAbsolute
+    } catch {
+      case _: URISyntaxException =>
+        false
+    }
+  }
+
   /** Return all non-local paths from a comma-separated list of paths. */
   def nonLocalPaths(paths: String, testWindows: Boolean = false): Array[String] = {
     val windows = isWindows || testWindows

diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 0ba2a03..42b9b0e 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1197,6 +1197,46 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     assert(sc.hadoopConfiguration.get(bufferKey).toInt === 65536, "spark configs have higher priority than
[spark] branch master updated (47da944 -> f4de93e)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 47da944  [SPARK-34470][ML] VectorSlicer utilize ordering if possible
 add f4de93e  [MINOR][SQL] Spelling: filters - PushedFilers

No new revisions were added by this update.

Summary of changes:
 .../avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala | 2 +-
 .../avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala   | 2 +-
 .../org/apache/spark/sql/execution/datasources/v2/csv/CSVScan.scala | 2 +-
 .../org/apache/spark/sql/execution/datasources/v2/orc/OrcScan.scala | 2 +-
 .../spark/sql/execution/datasources/v2/parquet/ParquetScan.scala    | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala     | 6 +++---
 6 files changed, 8 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] zhengruifeng commented on pull request #328: Add Attila Zsolt Piros to committers
zhengruifeng commented on pull request #328:
URL: https://github.com/apache/spark-website/pull/328#issuecomment-803711340

   Congrats!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [spark-website] zhengruifeng commented on pull request #326: Add Yi Wu to committers' list
zhengruifeng commented on pull request #326:
URL: https://github.com/apache/spark-website/pull/326#issuecomment-803711163

   Congrats!
[GitHub] [spark-website] zhengruifeng commented on pull request #327: Add Maciej Szymkiewicz to committers
zhengruifeng commented on pull request #327:
URL: https://github.com/apache/spark-website/pull/327#issuecomment-803710949

   Congrats!
[GitHub] [spark-website] zhengruifeng commented on pull request #325: Add Kent Yao to committers
zhengruifeng commented on pull request #325:
URL: https://github.com/apache/spark-website/pull/325#issuecomment-803710825

   Congrats!
[spark] branch master updated (c5fd94f -> 47da944)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c5fd94f  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment
 add 47da944  [SPARK-34470][ML] VectorSlicer utilize ordering if possible

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/linalg/Vectors.scala | 56 --
 .../org/apache/spark/ml/linalg/VectorsSuite.scala  |  7 +++
 .../org/apache/spark/ml/feature/VectorSlicer.scala | 15 +++---
 3 files changed, 56 insertions(+), 22 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e28296e  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment
e28296e is described below

commit e28296e1905ee34ce93847e0ee03d8b70b161ef7
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 17:59:55 2021 -0700

[SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment

### What changes were proposed in this pull request?

This PR aims to disable a new test case that uses Hive 1.2.1 in Java 9+ test environments.

### Why are the changes needed?

[HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113) upgraded Datanucleus to 4.x in Hive 2.0, and Datanucleus 3.x doesn't support Java 9+.

**Java 9+ environment**
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error]         org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] (hive / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 328 s (05:28), completed Mar 21, 2021, 5:32:39 PM
```

### Does this PR introduce _any_ user-facing change?

Fixes the UT in Java 9+ environments.

### How was this patch tested?

Manually.
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] HiveSparkSubmitSuite:
[info] - SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark classloader instead of context !!! CANCELED !!! (26 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true (HiveSparkSubmitSuite.scala:344)
```

Closes #31916 from dongjoon-hyun/SPARK-HiveSparkSubmitSuite.
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit c5fd94f1197faf8a974c7d7745cdebf42b3430b9)
Signed-off-by: Dongjoon Hyun
---
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
index 09f9f9e..f9ea4e3 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
@@ -339,6 +339,7 @@ class HiveSparkSubmitSuite
   test("SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark classloader " +
     "instead of context") {
+    assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
     val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
     // We need to specify the metastore database location in case of conflict with other hive
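The one-line fix above uses ScalaTest's `assume` to cancel the test, rather than fail it, when the runtime cannot support the pinned Hive dependency. The same pattern in Python's `unittest`, as a hedged cross-language analog (not Spark code; the version check is a stand-in for the Java 9+ probe):

```python
import sys
import unittest

class LegacyDependencySuite(unittest.TestCase):
    def test_needs_old_runtime(self):
        # Counterpart of `assume(!SystemUtils.isJavaVersionAtLeast(JAVA_9))`:
        # skip (rather than fail) when the environment can't run the test.
        if sys.version_info >= (4, 0):  # stand-in environment check
            self.skipTest("runtime too new for the legacy dependency")
        self.assertEqual(1 + 1, 2)

# Run the suite programmatically so the outcome can be inspected:
# a skipped guard never turns into a failure.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacyDependencySuite)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```

In the ScalaTest output above this shows up as `!!! CANCELED !!!` instead of a red failure, which is exactly the point of the guard.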
[spark] branch branch-3.1 updated: [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 3767aac  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment
3767aac is described below

commit 3767aac9772b5377f98edb906d1abaf9dd4dcab7
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 17:59:55 2021 -0700

[SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment

### What changes were proposed in this pull request?

This PR aims to disable a new test case that uses Hive 1.2.1 in Java 9+ test environments.

### Why are the changes needed?

[HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113) upgraded Datanucleus to 4.x in Hive 2.0, and Datanucleus 3.x doesn't support Java 9+.

**Java 9+ environment**
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error]         org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] (hive / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 328 s (05:28), completed Mar 21, 2021, 5:32:39 PM
```

### Does this PR introduce _any_ user-facing change?

Fixes the UT in Java 9+ environments.

### How was this patch tested?

Manually.
```
$ build/sbt "hive/testOnly *.HiveSparkSubmitSuite -- -z SPARK-34772" -Phive
...
[info] HiveSparkSubmitSuite:
[info] - SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark classloader instead of context !!! CANCELED !!! (26 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true (HiveSparkSubmitSuite.scala:344)
```

Closes #31916 from dongjoon-hyun/SPARK-HiveSparkSubmitSuite.
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit c5fd94f1197faf8a974c7d7745cdebf42b3430b9)
Signed-off-by: Dongjoon Hyun
---
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
index a3bff6b..426d93b 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
@@ -340,6 +340,7 @@ class HiveSparkSubmitSuite
   test("SPARK-34772: RebaseDateTime loadRebaseRecords should use Spark classloader " +
     "instead of context") {
+    assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
     val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
     // We need to specify the metastore database location in case of conflict with other hive
[spark] branch master updated (3bc6fe4 -> c5fd94f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3bc6fe4  [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
 add c5fd94f  [SPARK-34772][TESTS][FOLLOWUP] Disable a test case using Hive 1.2.1 in Java9+ environment

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala | 1 +
 1 file changed, 1 insertion(+)
[spark] branch master updated: [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3bc6fe4  [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
3bc6fe4 is described below

commit 3bc6fe4e77e1791c0a20387240e93d0175e0fade
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 14:34:02 2021 -0700

[SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default

### What changes were proposed in this pull request?

This PR aims to enable `spark.hadoopRDD.ignoreEmptySplits` by default for Apache Spark 3.2.0.

### Why are the changes needed?

Although this is a safe improvement, it had not been enabled by default in order to avoid an explicit behavior change. This PR switches the default explicitly in Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

Yes, and the behavior change is documented.

### How was this patch tested?

Pass the existing CIs.

Closes #31909 from dongjoon-hyun/SPARK-34809.
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 6392431..6b1e3d0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1037,7 +1037,7 @@ package object config {
       .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
       .version("2.3.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)

   private[spark] val SECRET_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.regex")

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 232b9e3..e243b14 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -24,6 +24,8 @@ license: |

 ## Upgrading from Core 3.1 to 3.2

+- Since Spark 3.2, `spark.hadoopRDD.ignoreEmptySplits` is set to `true` by default which means Spark will not create empty partitions for empty input splits. To restore the behavior before Spark 3.2, you can set `spark.hadoopRDD.ignoreEmptySplits` to `false`.
+
 - Since Spark 3.2, `spark.eventLog.compression.codec` is set to `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec` anymore.

 - Since Spark 3.2, `spark.storage.replication.proactive` is enabled by default which means Spark tries to replenish in case of the loss of cached RDD block replicas due to executor failures. To restore the behavior before Spark 3.2, you can set `spark.storage.replication.proactive` to `false`.
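What the flipped default does, conceptually: with `spark.hadoopRDD.ignoreEmptySplits=true`, zero-length input splits no longer produce partitions. A toy illustration in Python (the split names and lengths are made up for the example):

```python
# Hypothetical Hadoop input splits as (name, length-in-bytes) pairs.
splits = [("part-0", 128), ("part-1", 0), ("part-2", 64), ("part-3", 0)]

ignore_empty_splits = True  # the new default since Spark 3.2

# HadoopRDD/NewHadoopRDD conceptually keep one partition per split;
# with the flag on, splits of length 0 are dropped first.
partitions = [name for name, length in splits
              if not (ignore_empty_splits and length == 0)]
print(partitions)  # ['part-0', 'part-2']
```

Fewer empty partitions means fewer no-op tasks; setting the flag back to `false` restores the old one-partition-per-split behavior.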
[spark] branch branch-3.1 updated (8dea9b8 -> d1de69f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8dea9b8  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
 add d1de69f  [SPARK-34813][INFRA][3.1] Remove Scala 2.13 build GitHub Action job from branch-3.1

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 22 --
 1 file changed, 22 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 29b981b  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
29b981b is described below

commit 29b981b3feacadabf5cef4793506c20897ccd535
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

[SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

### What changes were proposed in this pull request?

Like secrets and tokens, this PR redacts the access key as well.

### Why are the changes needed?

The access key is also worth hiding.

### Does this PR introduce _any_ user-facing change?

This hides the information in the Spark UI (`Spark Properties` and `Hadoop Properties`) and in logs.

### How was this patch tested?

Pass the newly updated UT.

Closes #31912 from dongjoon-hyun/SPARK-34811.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala         | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 91689df..4e5b8f0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -358,7 +358,7 @@ package object config {
       "a property key or value, the value is redacted from the environment UI and various logs " +
       "like YARN and event logs.")
     .regexConf
-    .createWithDefault("(?i)secret|password|token".r)
+    .createWithDefault("(?i)secret|password|token|access[.]key".r)

   private[spark] val STRING_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.string.regex")

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 00f96b9..a1a6721 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1013,11 +1013,13 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     // Set some secret keys
     val secretKeys = Seq(
       "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "spark.hadoop.fs.s3a.access.key",
       "spark.my.password",
       "spark.my.sECreT")
     secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
     // Set a non-secret key
     sparkConf.set("spark.regular.property", "regular_value")
+    sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
     // Set a property with a regular key but secret in the value
     sparkConf.set("spark.sensitive.property", "has_secret_in_value")
@@ -1028,7 +1030,8 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
     assert(redactedConf("spark.regular.property") === "regular_value")
     assert(redactedConf("spark.sensitive.property") === Utils.REDACTION_REPLACEMENT_TEXT)
-
+    assert(redactedConf("spark.hadoop.fs.s3a.access.key") === Utils.REDACTION_REPLACEMENT_TEXT)
+    assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }

   test("tryWithSafeFinally") {
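The updated default for `spark.redaction.regex` can be exercised outside Spark: the constructs it uses (`(?i)`, alternation, the `[.]` character class) behave the same in Python's `re` as in `java.util.regex`, which backs Spark's `regexConf`. A quick check of which property keys the new pattern would redact:

```python
import re

# The new default value of spark.redaction.regex, verbatim from the patch.
pattern = re.compile(r"(?i)secret|password|token|access[.]key")

checks = {
    "spark.hadoop.fs.s3a.access.key": True,   # matches access[.]key
    "spark.my.sECreT": True,                  # (?i) makes matching case-insensitive
    "spark.hadoop.fs.s3a.access_key": False,  # [.] matches a literal dot only
    "spark.regular.property": False,
}
for key, should_redact in checks.items():
    redacted = pattern.search(key) is not None
    assert redacted == should_redact
    print(f"{key}: {'redacted' if redacted else 'kept'}")
```

This mirrors the updated `UtilsSuite` cases above: `fs.s3a.access.key` is redacted while the deliberately similar `fs.s3a.access_key` is not, because `[.]` escapes the dot instead of treating it as a wildcard.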
[spark] branch branch-3.0 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new a42b631  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
a42b631 is described below

commit a42b6311cffac8f7130d7d86857dfc48e0121ff7
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

[SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

### What changes were proposed in this pull request?

Like secrets and tokens, this PR redacts the access key as well.

### Why are the changes needed?

The access key is also worth hiding.

### Does this PR introduce _any_ user-facing change?

This hides the information in the Spark UI (`Spark Properties` and `Hadoop Properties`) and in logs.

### How was this patch tested?

Pass the newly updated UT.

Closes #31912 from dongjoon-hyun/SPARK-34811.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala         | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 0afbb52..0227dcc 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -926,7 +926,7 @@ package object config {
       "like YARN and event logs.")
     .version("2.1.2")
     .regexConf
-    .createWithDefault("(?i)secret|password|token".r)
+    .createWithDefault("(?i)secret|password|token|access[.]key".r)

   private[spark] val STRING_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.string.regex")

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 931eb6b..44f578a 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1024,11 +1024,13 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     // Set some secret keys
     val secretKeys = Seq(
       "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "spark.hadoop.fs.s3a.access.key",
       "spark.my.password",
       "spark.my.sECreT")
     secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
     // Set a non-secret key
     sparkConf.set("spark.regular.property", "regular_value")
+    sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
     // Set a property with a regular key but secret in the value
     sparkConf.set("spark.sensitive.property", "has_secret_in_value")
@@ -1039,7 +1041,8 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
     assert(redactedConf("spark.regular.property") === "regular_value")
     assert(redactedConf("spark.sensitive.property") === Utils.REDACTION_REPLACEMENT_TEXT)
-
+    assert(redactedConf("spark.hadoop.fs.s3a.access.key") === Utils.REDACTION_REPLACEMENT_TEXT)
+    assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }

   test("redact sensitive information in command line args") {
[spark] branch branch-3.1 updated: [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 8dea9b8  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token
8dea9b8 is described below

commit 8dea9b8071609c6690bd8f3e9d5621d7bed4786a
Author: Dongjoon Hyun
AuthorDate: Sun Mar 21 14:08:34 2021 -0700

    [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

    ### What changes were proposed in this pull request?

    Just as we redact secrets and tokens, this PR aims to redact the access key.

    ### Why are the changes needed?

    The access key is also worth hiding.

    ### Does this PR introduce _any_ user-facing change?

    This will hide this information from the Spark UI (`Spark Properties` and `Hadoop Properties`) and from logs.

    ### How was this patch tested?

    Pass the newly updated UT.

    Closes #31912 from dongjoon-hyun/SPARK-34811.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 3c32b54a0fbdc55c503bc72a3d39d58bf99e3bfa)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala         | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index f6de5e4..3daa9f5 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1015,7 +1015,7 @@ package object config {
         "like YARN and event logs.")
       .version("2.1.2")
       .regexConf
-      .createWithDefault("(?i)secret|password|token".r)
+      .createWithDefault("(?i)secret|password|token|access[.]key".r)

   private[spark] val STRING_REDACTION_PATTERN = ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 18ff960..208e729 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1024,11 +1024,13 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     // Set some secret keys
     val secretKeys = Seq(
       "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "spark.hadoop.fs.s3a.access.key",
       "spark.my.password",
       "spark.my.sECreT")
     secretKeys.foreach { key => sparkConf.set(key, "sensitive_value") }
     // Set a non-secret key
     sparkConf.set("spark.regular.property", "regular_value")
+    sparkConf.set("spark.hadoop.fs.s3a.access_key", "regular_value")
     // Set a property with a regular key but secret in the value
     sparkConf.set("spark.sensitive.property", "has_secret_in_value")
@@ -1039,7 +1041,8 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
     assert(redactedConf("spark.regular.property") === "regular_value")
     assert(redactedConf("spark.sensitive.property") === Utils.REDACTION_REPLACEMENT_TEXT)
-
+    assert(redactedConf("spark.hadoop.fs.s3a.access.key") === Utils.REDACTION_REPLACEMENT_TEXT)
+    assert(redactedConf("spark.hadoop.fs.s3a.access_key") === "regular_value")
   }

   test("redact sensitive information in command line args") {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
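The behavior the updated test asserts can be checked with any ordinary regex engine: `access[.]key` matches a literal dot only, so `fs.s3a.access.key` is redacted while `fs.s3a.access_key` is left alone. A minimal Python sketch of key-based redaction (the pattern string is the one from the diff above; the `redact` helper and the replacement text are illustrative stand-ins, not Spark's implementation):

```python
import re

# Default value of spark.redaction.regex after this change.
PATTERN = re.compile(r"(?i)secret|password|token|access[.]key")

REPLACEMENT = "*********(redacted)"  # stand-in for Spark's replacement text

def redact(conf: dict) -> dict:
    """Replace the value of every entry whose *key* matches the pattern."""
    return {k: (REPLACEMENT if PATTERN.search(k) else v)
            for k, v in conf.items()}

conf = {
    "spark.hadoop.fs.s3a.access.key": "AKIA-example",  # matches access[.]key
    "spark.hadoop.fs.s3a.access_key": "regular_value", # '_' != literal '.'
    "spark.my.password": "sensitive_value",            # matches password
    "spark.regular.property": "regular_value",         # no match
}
redacted = redact(conf)
```

Note that, as in the Spark test, only keys are inspected here; Spark separately applies `spark.redaction.string.regex` to values.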
[spark] branch master updated (2888d18 -> 3c32b54)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2888d18  [SPARK-34784][BUILD] Upgrade Jackson to 2.12.2
     add 3c32b54  [SPARK-34811][CORE] Redact fs.s3a.access.key like secret and token

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala         | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)
[spark] branch branch-3.1 updated (da013d0 -> 250c820)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from da013d0  [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark
     add 250c820  [SPARK-34796][SQL][3.1] Initialize counter variable for LIMIT code-gen in doProduce()

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/limit.scala | 12 ++++++++----
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala   | 19 +++++++++++++++++++
 2 files changed, 27 insertions(+), 4 deletions(-)
[spark] branch master updated (c799d04 -> 2888d18)
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c799d04  [SPARK-34810][TEST] Update PostgreSQL test with the latest results
     add 2888d18  [SPARK-34784][BUILD] Upgrade Jackson to 2.12.2

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 15 +++++++--------
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 15 +++++++--------
 pom.xml                                 |  2 +-
 3 files changed, 15 insertions(+), 17 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-26625] Add oauthToken to spark.redaction.regex
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 7879a0c  [SPARK-26625] Add oauthToken to spark.redaction.regex
7879a0c is described below

commit 7879a0cabf6c589ba7fb2ec79ef9b2109deb6b18
Author: Vinoo Ganesh
AuthorDate: Wed Jan 16 11:43:10 2019 -0800

    [SPARK-26625] Add oauthToken to spark.redaction.regex

    ## What changes were proposed in this pull request?

    The regex (spark.redaction.regex) that is used to decide which config properties or environment settings are sensitive should also include oauthToken, so that it matches spark.kubernetes.authenticate.submission.oauthToken.

    ## How was this patch tested?

    Simple regex addition - happy to add a test if needed.

    Author: Vinoo Ganesh

    Closes #23555 from vinooganesh/vinooganesh/SPARK-26625.

    (cherry picked from commit 01301d09721cc12f1cc66ab52de3da117f5d33e6)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 559b0e1..91689df 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -358,7 +358,7 @@ package object config {
         "a property key or value, the value is redacted from the environment UI and various logs " +
         "like YARN and event logs.")
       .regexConf
-      .createWithDefault("(?i)secret|password".r)
+      .createWithDefault("(?i)secret|password|token".r)

   private[spark] val STRING_REDACTION_PATTERN = ConfigBuilder("spark.redaction.string.regex")
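The one-word addition works because the pattern carries the `(?i)` inline flag: the `token` alternative matches the camel-cased `Token` suffix of the Kubernetes key case-insensitively. A quick sanity check with a plain regex engine, sketched in Python (not Spark code; only the pattern strings and the config name come from the commit above):

```python
import re

# Default spark.redaction.regex before and after this commit.
old = re.compile(r"(?i)secret|password")
new = re.compile(r"(?i)secret|password|token")

key = "spark.kubernetes.authenticate.submission.oauthToken"

# Before: neither 'secret' nor 'password' occurs in the key, so it leaked.
old_match = bool(old.search(key))
# After: '(?i)token' matches the 'Token' in 'oauthToken', so it is redacted.
new_match = bool(new.search(key))
```

Because the pattern is applied with a substring search rather than a full match, any key merely *containing* secret, password, or token (in any case) is redacted, which is why no anchors are needed.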