Pavel Parkhomenko created SPARK-29004:
-----------------------------------------

             Summary: DataFrameWriter.save does not work with bucketBy [and sortBy]
                 Key: SPARK-29004
                 URL: https://issues.apache.org/jira/browse/SPARK-29004
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: Pavel Parkhomenko


bucketBy (and sortBy) does not work with DataFrameWriter.save, at least for JSON (apparently it fails for every file-based data source), despite the documentation stating:

{noformat}
This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
{noformat}

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter@bucketBy(numBuckets:Int,colName:String,colNames:String*):org.apache.spark.sql.DataFrameWriter[T]

The issue is probably here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L253

Otherwise, the documentation is wrong and bucketBy is not supported for file-based sources.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)