[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Baike Xia (Code Review) Mon, 07 Nov 2022 18:29:47 -0800

Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )


Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 18:

> Patch Set 18:
>
> Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see Spark also using 
> Cluster by and so does Hive - 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
> https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html

Hi Manish, glad to see your comment.
In Hive and Spark, "CLUSTERED BY" is DDL syntax: it specifies the bucketing 
columns and the number of buckets when the table is created. "CLUSTER BY" is 
a separate clause in SELECT syntax: it ensures each of the N reducers gets 
non-overlapping ranges, then sorts by those ranges at the reducers. Since 
this change adds DDL support for bucketed tables, "CLUSTERED BY" is the 
matching keyword.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html
https://stackoverflow.com/questions/34495981/difference-between-cluster-by-and-clustered-by-in-hive
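
To illustrate the difference, here is a minimal sketch in Hive syntax. The 
table name, column names, and bucket count are made up for the example and 
are not taken from this patch:

```sql
-- DDL: CLUSTERED BY declares the bucketing columns and the bucket count
-- at table creation time (hypothetical table "users_bucketed").
CREATE TABLE users_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Query: CLUSTER BY distributes rows to reducers by id and sorts
-- the rows by id within each reducer.
SELECT id, name FROM users_bucketed CLUSTER BY id;
```

The first statement affects how the table data is laid out; the second only 
affects how one query's output is distributed and sorted.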


--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Manish Maheshwari <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 08 Nov 2022 02:29:35 +0000
Gerrit-HasComments: No
