This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 87cae7bc7870 [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
87cae7bc7870 is described below

commit 87cae7bc7870bacafc6afad99ba86a6efca2a464
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Mon Mar 25 16:06:03 2024 -0700

    [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
    
    ### What changes were proposed in this pull request?
    
    This PR aims to handle HADOOP-19097 from the Apache Spark side. We can remove this once Apache Hadoop `3.4.1` is released.
    - https://github.com/apache/hadoop/pull/6601
    
    ### Why are the changes needed?
    
    Apache Hadoop logs a warning about its own default configuration. This default value issue is fixed in Apache Hadoop `3.4.1`.
    ```
    24/03/25 14:46:21 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
    ```
    
    This change suppresses the Apache Hadoop default warning in a way that stays consistent with future Hadoop releases.
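
    Because the new value is applied with `setIfMissing`, it only takes effect when the user has not already configured the key, so explicit user overrides still win. The sketch below illustrates that semantics in plain Scala; `MiniConf` is a hypothetical stand-in for `SparkConf`, not Spark's actual implementation:

    ```scala
    import scala.collection.mutable

    // Hypothetical minimal model of SparkConf's set / setIfMissing behavior.
    class MiniConf {
      private val settings = mutable.Map[String, String]()
      def set(key: String, value: String): MiniConf = { settings(key) = value; this }
      def setIfMissing(key: String, value: String): MiniConf = {
        // Only write the default when the user has not set the key.
        if (!settings.contains(key)) settings(key) = value
        this
      }
      def get(key: String): String = settings(key)
    }

    val key = "spark.hadoop.fs.s3a.connection.establish.timeout"

    // Case 1: user did not set the key, so the 30s default is applied.
    val c1 = new MiniConf
    c1.setIfMissing(key, "30s")

    // Case 2: user set an explicit value, so setIfMissing leaves it alone.
    val c2 = new MiniConf
    c2.set(key, "60s")
    c2.setIfMissing(key, "30s")
    ```

    With the real `SparkConf`, a user-supplied setting such as `--conf spark.hadoop.fs.s3a.connection.establish.timeout=60s` should likewise take precedence over the 30s default, since it is present before `SparkContext` applies the fallback.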
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs.

    Also tested manually:
    
    **BUILD**
    ```
    $ dev/make-distribution.sh -Phadoop-cloud
    ```
    
    **BEFORE**
    ```
    scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
    ...
    24/03/25 15:50:46 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
    ```
    
    **AFTER**
    ```
    scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
    ...(ConfigurationHelper warning is gone)...
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #45710 from dongjoon-hyun/SPARK-47552.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 core/src/main/scala/org/apache/spark/SparkContext.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d519617c4095..f8f0107ed139 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -417,6 +417,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
+    // We can remove this after Apache Hadoop 3.4.1 releases
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
