maropu commented on a change in pull request #29883:
URL: https://github.com/apache/spark/pull/29883#discussion_r495635757
##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT
table_comment after TBLPROPERTI
Partitions are created on the table, based on the columns specified.
+* **CLUSTERED BY**
+
+ Specifies bucket columns for bucketing table.
+
+* **SORTED BY**
+
+ Used to sort bucket column, we can combine with `ASC` for ascending order,
with `DESC` for descending order.
+
+* **INTO num_buckets BUCKETS**
+
+ Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
Review comment:
nit: redundant spaces found between `in` and `CLUSTERED BY`. Also, how about
`in the CLUSTERED BY clause`?
##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT
table_comment after TBLPROPERTI
Partitions are created on the table, based on the columns specified.
+* **CLUSTERED BY**
+
+ Specifies bucket columns for bucketing table.
Review comment:
`Specifies bucket column names for bucketing a table`?
##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -203,6 +218,17 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
STORED AS INPUTFORMAT
'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE TEST1(ID INT, AGE STRING)
+ CLUSTERED BY (ID)
+ INTO 4 BUCKETS
+
+--Use `CLUSTERED BY` clause to create bucket table with `SORTED BY`
+CREATE TABLE TEST2(ID INT, NAME STRING)
Review comment:
ditto
##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -203,6 +218,17 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
STORED AS INPUTFORMAT
'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE TEST1(ID INT, AGE STRING)
Review comment:
nit: To follow the format of the other examples,
`CREATE TABLE clustered_by_test1 (ID ...`?
##########
File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
##########
@@ -65,6 +68,18 @@ as any order. For example, you can write COMMENT
table_comment after TBLPROPERTI
Partitions are created on the table, based on the columns specified.
+* **CLUSTERED BY**
+
+ Specifies bucket columns for bucketing table.
+
+* **SORTED BY**
+
+ Used to sort bucket column, we can combine with `ASC` for ascending order,
with `DESC` for descending order.
Review comment:
How about rephrasing it like this? `Specifies an ordering of bucket
columns. Optionally, one can use ASC for an ascending order or DESC for a
descending order after any column names in the SORTED BY clause. If not
specified, ASC is assumed by default.`
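To illustrate the suggested wording, here is a minimal sketch of how the three clauses fit together. The table names follow the `clustered_by_test1` naming proposed in the other comment and are hypothetical; the column lists and bucket count are placeholders, not taken from the PR. If the rephrased description is accurate, the two statements should declare the same bucket ordering, though this has not been checked against a Spark build.

```sql
-- ASC written explicitly after the sort column
CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
    CLUSTERED BY (ID)
    SORTED BY (ID ASC)
    INTO 4 BUCKETS;

-- No direction given; per the suggested wording, ASC is assumed by default
CREATE TABLE clustered_by_test3 (ID INT, NAME STRING)
    CLUSTERED BY (ID)
    SORTED BY (ID)
    INTO 4 BUCKETS;
```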