cloud-fan commented on a change in pull request #27930: [SQL][MINOR] Update the DataFrameWriter.bucketBy comment
URL: https://github.com/apache/spark/pull/27930#discussion_r393440110
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -197,8 +197,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
}
/**
- * Buckets the output by the given columns. If specified, the output is laid out on the file
- * system similar to Hive's bucketing scheme.
+ * Buckets the output by the given columns. Note that the output follows a Spark SQL specific
+ * bucketing scheme based on the Hive scheme.
Review comment:
I feel the original one is clear enough as it says "similar to". Maybe we can add one more sentence: `..., but with a different bucket hash function and is not compatible with Hive's bucketing.`
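To make the point about the different hash function concrete, here is a minimal standalone sketch. The `BucketIdSketch` object and its methods are hypothetical, written only for illustration. It assumes that, for a single `INT` bucket column, Spark assigns a row to bucket `pmod(murmur3(value, seed = 42), numBuckets)` (the partition-id expression of hash partitioning), and that Hive's original scheme (bucketing_version 1) uses `(value & Int.MaxValue) % numBuckets`. Under these assumptions the same value can land in different bucket files, which is why the two layouts are not interchangeable.

```scala
// Hypothetical sketch: compares the bucket id of a single INT bucket column
// under Spark's bucketBy scheme and Hive's original (bucketing_version 1) scheme.
object BucketIdSketch {

  // Assumed to mirror Spark's Murmur3_x86_32.hashInt, which backs the
  // Murmur3Hash expression for IntegerType values.
  def murmur3HashInt(input: Int, seed: Int): Int = {
    // Mix the 4-byte input block.
    var k1 = input * 0xcc9e2d51
    k1 = Integer.rotateLeft(k1, 15)
    k1 *= 0x1b873593

    // Combine the block with the running hash (the seed).
    var h1 = seed ^ k1
    h1 = Integer.rotateLeft(h1, 13)
    h1 = h1 * 5 + 0xe6546b64

    // Finalization mix; 4 is the input length in bytes.
    h1 ^= 4
    h1 ^= h1 >>> 16
    h1 *= 0x85ebca6b
    h1 ^= h1 >>> 13
    h1 *= 0xc2b2ae35
    h1 ^= h1 >>> 16
    h1
  }

  // Assumed Spark scheme: positive modulo of the seeded Murmur3 hash.
  def sparkBucketId(value: Int, numBuckets: Int): Int = {
    val h = murmur3HashInt(value, 42)
    ((h % numBuckets) + numBuckets) % numBuckets
  }

  // Assumed Hive v1 scheme: an INT hashes to itself, the sign bit is
  // masked off, then the result is reduced modulo the bucket count.
  def hiveBucketId(value: Int, numBuckets: Int): Int =
    (value & Int.MaxValue) % numBuckets

  def main(args: Array[String]): Unit = {
    val numBuckets = 4
    for (v <- 1 to 8) {
      println(s"value=$v spark=${sparkBucketId(v, numBuckets)} hive=${hiveBucketId(v, numBuckets)}")
    }
  }
}
```

Running `main` prints both bucket ids side by side for a few values. The sketch models only Hive's original bucketing scheme and a single integer column; multi-column and non-integer keys hash differently.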