Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11573#discussion_r55867164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/bucket.scala ---
    @@ -24,12 +27,15 @@ package org.apache.spark.sql.execution.datasources
      *
      * @param numBuckets number of buckets.
      * @param bucketColumnNames the names of the columns that used to generate the bucket id.
    - * @param sortColumnNames the names of the columns that used to sort data in each bucket.
    + * @param sortColumns (name, sort direction) of columns that used to sort data in each bucket.
      */
     private[sql] case class BucketSpec(
         numBuckets: Int,
         bucketColumnNames: Seq[String],
    -    sortColumnNames: Seq[String])
    +    sortColumns: Seq[(String, SortDirection)]) {
    --- End diff --
    
    It is great to have `SortDirection` defined explicitly. That way, when we 
create it, we know what the sort order will be.
    
    However, this change implies that we can use `descending` as the order, 
which is actually not allowed: `UnsafeKVExternalSorter` only sorts keys in 
ascending order (this sorter is used in `DynamicPartitionWriterContainer` 
when we generate bucketed files). So, for now, I think it is better to revert 
this change. In the future, we can revisit it once we store the sort direction 
in the metastore and can actually sort the rows in a bucket file in descending 
order.
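
    For illustration only, a minimal sketch of the reverted shape being 
suggested (this is not the actual Spark source; the usage values are 
hypothetical): sort columns go back to plain names, with ascending order 
implied, since the sorter can only produce ascending runs.

    ```scala
    // Sketch of the pre-change BucketSpec: only column names, ascending implied.
    case class BucketSpec(
        numBuckets: Int,
        bucketColumnNames: Seq[String],
        sortColumnNames: Seq[String])

    // Hypothetical usage: bucket by user_id, sort each bucket by ts (ascending).
    val spec = BucketSpec(8, Seq("user_id"), Seq("ts"))
    assert(spec.sortColumnNames == Seq("ts"))
    ```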


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
