This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b38b237  [SPARK-30881][SQL][DOCS] Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold
b38b237 is described below

commit b38b237ac22c08ce6aadb041b2bf1b78f5f5db75
Author: Gengliang Wang <gengliang.w...@databricks.com>
AuthorDate: Thu Feb 20 00:59:22 2020 -0800

    [SPARK-30881][SQL][DOCS] Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold
    
    ### What changes were proposed in this pull request?
    
    Revise the doc of the SQL configuration `spark.sql.sources.parallelPartitionDiscovery.threshold`.
    
    ### Why are the changes needed?
    
    The doc of the configuration `spark.sql.sources.parallelPartitionDiscovery.threshold` is inaccurate in the part "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".
    
    We should revise it to state that the configuration is effective for all file-based data sources.
    
    ### Does this PR introduce any user-facing change?
    
    No
    
    ### How was this patch tested?
    
    None. It is a doc-only change.
    
    Closes #27639 from gengliangwang/reviseParallelPartitionDiscovery.
    
    Authored-by: Gengliang Wang <gengliang.w...@databricks.com>
    Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
    (cherry picked from commit 92d5d40c8efffd90eb04308b6dad77d3d1e1be14)
    Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala        | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 612fa86..9cbaaee 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -878,8 +878,8 @@ object SQLConf {
     buildConf("spark.sql.sources.parallelPartitionDiscovery.threshold")
       .doc("The maximum number of paths allowed for listing files at driver side. If the number " +
         "of detected paths exceeds this value during partition discovery, it tries to list the " +
-        "files with another Spark distributed job. This applies to Parquet, ORC, CSV, JSON and " +
-        "LibSVM data sources.")
+        "files with another Spark distributed job. This configuration is effective only when " +
+        "using file-based sources such as Parquet, JSON and ORC.")
       .intConf
       .checkValue(parallel => parallel >= 0, "The maximum number of paths allowed for listing " +
         "files at driver side must not be negative")

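For context, the configuration documented in this diff can be set when building a SparkSession. The sketch below is illustrative only and is not part of the patch: the app name, the threshold value of 16, and the input path are arbitrary choices (the default threshold in Spark is 32).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: lower the threshold so that file listing is
// handed off to a distributed Spark job once more than 16 paths are
// detected during partition discovery (the default is 32).
val spark = SparkSession.builder()
  .appName("parallel-partition-discovery-example") // illustrative name
  .master("local[*]")
  .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "16")
  .getOrCreate()

// Per the revised doc, the setting applies to all file-based sources
// (e.g. Parquet, JSON, ORC), not only the ones previously listed.
val df = spark.read.parquet("/path/to/partitioned/table") // hypothetical path
```

This is a config-fragment sketch that requires a Spark runtime to execute; it is included only to show where the configuration key is applied.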

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
