This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3bc6fe4  [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
3bc6fe4 is described below

commit 3bc6fe4e77e1791c0a20387240e93d0175e0fade
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Sun Mar 21 14:34:02 2021 -0700

    [SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to enable `spark.hadoopRDD.ignoreEmptySplits` by default for Apache Spark 3.2.0.
    
    ### Why are the changes needed?
    
    Although this is a safe improvement, it has not been enabled by default so far in order to avoid an explicit behavior change. This PR switches the default explicitly in Apache Spark 3.2.0.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the behavior change is documented.
    
    ### How was this patch tested?
    
    Pass the existing CIs.
    
    Closes #31909 from dongjoon-hyun/SPARK-34809.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 6392431..6b1e3d0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1037,7 +1037,7 @@ package object config {
      .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
       .version("2.3.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val SECRET_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.regex")
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 232b9e3..e243b14 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.1 to 3.2
 
+- Since Spark 3.2, `spark.hadoopRDD.ignoreEmptySplits` is set to `true` by default which means Spark will not create empty partitions for empty input splits. To restore the behavior before Spark 3.2, you can set `spark.hadoopRDD.ignoreEmptySplits` to `false`.
+
 - Since Spark 3.2, `spark.eventLog.compression.codec` is set to `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec` anymore.
 
 - Since Spark 3.2, `spark.storage.replication.proactive` is enabled by default which means Spark tries to replenish in case of the loss of cached RDD block replicas due to executor failures. To restore the behavior before Spark 3.2, you can set `spark.storage.replication.proactive` to `false`.
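
As the migration-guide entry above notes, the pre-3.2 behavior can be restored by setting the flag back to `false`. A minimal sketch of doing so per-job via `spark-submit` (the application class and jar path below are placeholders for illustration, not part of this commit):

```shell
# Restore the pre-3.2 behavior: create partitions even for empty input splits.
# com.example.MyApp and my-app.jar are hypothetical placeholders.
spark-submit \
  --conf spark.hadoopRDD.ignoreEmptySplits=false \
  --class com.example.MyApp \
  my-app.jar
```

Alternatively, the same setting can be made cluster-wide by adding `spark.hadoopRDD.ignoreEmptySplits false` to `conf/spark-defaults.conf`.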
