dejankrak-db opened a new pull request, #49084:
URL: https://github.com/apache/spark/pull/49084
### What changes were proposed in this pull request?
This change introduces table- and view-level collation support in Spark SQL:
the CREATE TABLE, ALTER TABLE and CREATE VIEW commands can now specify a
DEFAULT COLLATION. For the CREATE commands, the default collation applies to
all the underlying columns added as part of the table/view creation. For the
ALTER TABLE command, it applies only to columns created afterwards; existing
columns are not affected, i.e. their collation remains the same.
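
The rule above can be illustrated with a small, self-contained Python model. This is only a sketch of the described semantics, not Spark code: the `Table` class, its methods and `demo` are hypothetical names, and two behaviours are assumptions not stated in this description (an explicit per-column collation taking precedence over the table default, and `UTF8_BINARY` as the fallback when no default is set).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: collation used when neither the column nor the table specifies one.
FALLBACK_COLLATION = "UTF8_BINARY"


@dataclass
class Table:
    """Toy stand-in for a catalog table; not a Spark class."""
    default_collation: Optional[str] = None
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> collation

    def add_column(self, name: str, collation: Optional[str] = None) -> None:
        # Assumption: an explicit column collation wins over the table default.
        self.columns[name] = collation or self.default_collation or FALLBACK_COLLATION

    def alter_default_collation(self, collation: str) -> None:
        # Only the default changes; existing columns keep their collation.
        self.default_collation = collation


def demo() -> Dict[str, str]:
    t = Table(default_collation="UTF8_LCASE")   # CREATE TABLE ... DEFAULT COLLATION
    t.add_column("c1")                          # picks up the table default
    t.add_column("c2", collation="UNICODE")     # explicit collation on the column
    t.alter_default_collation("UNICODE_CI")     # ALTER TABLE ... DEFAULT COLLATION
    t.add_column("c3")                          # only new columns see the new default
    return t.columns
```

In this model `c1` keeps `UTF8_LCASE` after the default is altered, which mirrors the "existing ones are not affected" behaviour described above.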
### Why are the changes needed?
Per key customer feedback during the SQL collations Private Preview, customers
would like to be able to specify a collation once for an object, instead of on
each individual column. This change adds support for table- and view-level
collations; subsequent changes will add support for other objects, such as
schema-level collations.
### Does this PR introduce _any_ user-facing change?
Yes. The change follows the SQL Ref Spec for collation support:
https://docs.google.com/document/d/1TGUU-Gyz7r7fQEVl4B-idh-uLu0RBZtFKkppzx23eww/edit?tab=t.0
The following syntax is now supported (**bold** parts denote additions):
{ { [CREATE OR] REPLACE TABLE | CREATE [EXTERNAL] TABLE [ IF NOT EXISTS ] }
table_name
[ table_specification ]
[ USING data_source ]
[ table_clauses ]
[ AS query ] }
table_specification
( { column_identifier column_type [ column_properties ] } [, ...]
[ , table_constraint ] [...] )
table_clauses
{ OPTIONS clause |
PARTITIONED BY clause |
CLUSTER BY clause |
clustered_by_clause |
LOCATION path [ WITH ( CREDENTIAL credential_name ) ] |
COMMENT table_comment |
TBLPROPERTIES clause |
**DEFAULT COLLATION table_collation_name |**
WITH { ROW FILTER clause } } [...]
CREATE [ OR REPLACE ] [ TEMPORARY ] VIEW [ IF NOT EXISTS ] view_name
[ column_list ]
[ schema_binding |
COMMENT view_comment |
TBLPROPERTIES clause |
**DEFAULT COLLATION collation_name** ] [...]
AS query
ALTER TABLE table_name
{ ADD COLUMN clause |
ALTER COLUMN clause |
DROP COLUMN clause |
RENAME COLUMN clause |
**DEFAULT COLLATION clause | …**
}
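
As a concrete illustration of the grammar above, the following Python sketch renders one example statement per command. The helper names, the `parquet` data source and the table/view names in the usage are hypothetical, and the form of the ALTER TABLE clause (`DEFAULT COLLATION` followed by a collation name) is an assumption, since the grammar above only names it as a clause.

```python
from typing import List


def create_table_sql(table: str, columns: List[str], collation: str) -> str:
    # DEFAULT COLLATION is emitted as one of the table_clauses, after USING.
    cols = ", ".join(columns)
    return f"CREATE TABLE {table} ({cols}) USING parquet DEFAULT COLLATION {collation}"


def create_view_sql(view: str, collation: str, query: str) -> str:
    # DEFAULT COLLATION precedes the AS query part of CREATE VIEW.
    return f"CREATE VIEW {view} DEFAULT COLLATION {collation} AS {query}"


def alter_table_sql(table: str, collation: str) -> str:
    # Assumed clause shape: DEFAULT COLLATION <collation_name>.
    return f"ALTER TABLE {table} DEFAULT COLLATION {collation}"
```

For example, `alter_table_sql("t", "UNICODE_CI")` yields `ALTER TABLE t DEFAULT COLLATION UNICODE_CI`.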
### How was this patch tested?
Tests for the new syntax/functionality were added as part of the change.
Also, some of the existing tests were extended/amended to cover the new DEFAULT
COLLATION for table/view objects.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]