cloud-fan commented on a change in pull request #27930: [SQL][MINOR] Update the DataFrameWriter.bucketBy comment
URL: https://github.com/apache/spark/pull/27930#discussion_r393440110
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
 ##########
 @@ -197,8 +197,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
   }
 
   /**
-   * Buckets the output by the given columns. If specified, the output is laid out on the file
-   * system similar to Hive's bucketing scheme.
+   * Buckets the output by the given columns. Note that the output follows a Spark SQL specific
+   * bucketing scheme based on the Hive scheme.
 
 Review comment:
   I feel the original wording is clear enough, since it says "similar to".
Maybe we can add one more sentence: `..., but with a different bucket hash
function and is not compatible with Hive's bucketing.`
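
   As a rough illustration of why a different bucket hash function implies
incompatibility, the sketch below assigns the same integer keys to buckets
with two different hash functions. Both `murmurBucket` and `hashCodeBucket`
are hypothetical stand-ins written for this example, not the actual Spark or
Hive code paths; the seed `42` and the bucket count `8` are assumptions too.

```scala
import scala.util.hashing.MurmurHash3

object BucketSketch {
  // Hypothetical stand-in: Murmur3-style hash of one Int key (assumed seed 42),
  // mapped to a non-negative bucket id.
  def murmurBucket(key: Int, numBuckets: Int): Int = {
    val h = MurmurHash3.finalizeHash(MurmurHash3.mix(42, key), 4)
    Math.floorMod(h, numBuckets)
  }

  // Hypothetical stand-in: hashCode-style hash (an Int hashes to itself),
  // with the sign bit masked off before taking the remainder.
  def hashCodeBucket(key: Int, numBuckets: Int): Int =
    (key.hashCode & Int.MaxValue) % numBuckets

  def main(args: Array[String]): Unit = {
    val numBuckets = 8 // assumed bucket count, for illustration only
    (0 until 5).foreach { k =>
      println(s"key=$k murmur=${murmurBucket(k, numBuckets)} " +
        s"hashCode=${hashCodeBucket(k, numBuckets)}")
    }
  }
}
```

   Whenever the two columns of output disagree for a key, a reader that
assumes one scheme would look for that key in the wrong bucket file of data
written with the other, which is the incompatibility the extra sentence
would call out.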

