Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 )
Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................

Patch Set 18:

> Patch Set 18:
>
> Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see Spark also using
> Cluster by and so does Hive -
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
> https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html

Hi Manish, glad to see your comment.

In Hive and Spark, "CLUSTERED BY" is used when a table is created, to specify the bucketing columns and the number of buckets.
In SELECT syntax, "CLUSTER BY" ensures that each of the N reducers gets a non-overlapping range of rows, and the rows are then sorted within those ranges at the reducers.
The two clauses serve different purposes, so they are not interchangeable.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html
https://stackoverflow.com/questions/34495981/difference-between-cluster-by-and-clustered-by-in-hive

--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Manish Maheshwari <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 08 Nov 2022 02:29:35 +0000
Gerrit-HasComments: No
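
As a rough illustration of the distinction drawn in the reply above, the Python sketch below models table-creation bucketing ("CLUSTERED BY ... INTO n BUCKETS") and query-time "CLUSTER BY" as two small functions. The function names, the dictionary row format, and the modulo-based hash are assumptions made purely for illustration; Hive and Spark use their own hash functions, and none of this is code from the patch.

```python
from collections import defaultdict


def bucket_rows(rows, key, num_buckets):
    """Toy model of DDL-side bucketing (CLUSTERED BY ... INTO n BUCKETS).

    Each row is assigned to a bucket by hashing the bucketing column.
    Assumption: integer keys and a toy hash (value modulo num_buckets).
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key] % num_buckets].append(row)
    return dict(buckets)


def cluster_by(rows, key, num_reducers):
    """Toy model of query-side CLUSTER BY: distribute rows, then sort per reducer."""
    # Reuse the same toy distribution step, then sort each reducer's rows by key.
    reducers = bucket_rows(rows, key, num_reducers)
    return {r: sorted(part, key=lambda row: row[key]) for r, part in reducers.items()}


rows = [{"id": i} for i in (5, 2, 7, 4, 1)]
stored = bucket_rows(rows, "id", 2)   # layout fixed when the table is written
queried = cluster_by(rows, "id", 2)   # ordering applied to one query's output
```

In this model, the first function decides how data is laid out in storage, while the second only affects how a single query distributes and orders its rows.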
