[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mihailo Milosevic updated SPARK-47210:
--
Description:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution, following `PGSQL`
behaviour:
1. Collations set at the metadata level are implicit.
2. Collations set using the `COLLATE` expression are explicit.
3. When expressions with different collations are combined, the output
collation is determined as follows:
- if there are explicit collations and all of them are equal, that collation
is the output
- if there are multiple different explicit collations,
`COLLATION_MISMATCH.EXPLICIT` is thrown
- if there are no explicit collations and only one non-default collation,
that one is used
- if there are no explicit collations and multiple different non-default
implicit collations, `COLLATION_MISMATCH.IMPLICIT` is thrown
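The resolution rules above can be sketched in a few lines of Python. This is
only an illustration of the rules as listed, not the code in the PR: the names
`Collated`, `CollationMismatch`, `resolve_collation` and `DEFAULT_COLLATION`
are hypothetical, the collation names are placeholders, and falling back to the
default collation when no non-default collation is present is an assumption
that the description does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Assumption: placeholder name for the default collation.
DEFAULT_COLLATION = "UTF8_BINARY"


class CollationMismatch(Exception):
    """Hypothetical stand-in for the COLLATION_MISMATCH.* error classes."""

    def __init__(self, kind: str, collations: Set[str]):
        super().__init__(f"COLLATION_MISMATCH.{kind}: {sorted(collations)}")
        self.kind = kind


@dataclass(frozen=True)
class Collated:
    """An input expression: its collation and whether COLLATE set it."""
    collation: str
    explicit: bool = False


def resolve_collation(inputs: Iterable[Collated]) -> str:
    inputs = list(inputs)

    # Explicit collations take precedence over implicit ones.
    explicit = {c.collation for c in inputs if c.explicit}
    if len(explicit) == 1:
        return next(iter(explicit))
    if len(explicit) > 1:
        raise CollationMismatch("EXPLICIT", explicit)

    # No explicit collations: only non-default implicit ones matter.
    implicit = {c.collation for c in inputs if c.collation != DEFAULT_COLLATION}
    if len(implicit) == 1:
        return next(iter(implicit))
    if len(implicit) > 1:
        raise CollationMismatch("IMPLICIT", implicit)

    # Assumption: with nothing but default collations, keep the default.
    return DEFAULT_COLLATION
```

In this sketch, combining an explicitly collated input with an implicitly
collated one resolves to the explicit collation, while two different
non-default implicit collations raise the `IMPLICIT` mismatch.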
*Why are the changes needed?*
We need to be able to compare columns and values that have different
collations, and we need a way to explicitly choose the collation to use.
was:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution, following `PGSQL`
behaviour:
1. Collations set at the metadata level are implicit.
2. Collations set using the `COLLATE` expression are explicit.
3. When expressions with different collations are combined, the output
collation is determined as follows:
- if there are explicit collations and all of them are equal, that collation
is the output
- if there are multiple different explicit collations,
`COLLATION_MISMATCH.EXPLICIT` is thrown
- if there are no explicit collations and only one non-default collation,
that one is used
- if there are no explicit collations and multiple different non-default
implicit collations, `COLLATION_MISMATCH.IMPLICIT` is thrown
In addition, `INDETERMINATE_COLLATION` should only be thrown on comparison
operations, and it should be possible to combine different implicit collations
for certain operations such as concat, and possibly others in the future.
This is why I added another predefined collation id named
`INDETERMINATE_COLLATION_ID`, which means that the result is a combination of
conflicting non-default implicit collations. It currently has an id of -1, so
it fails if it ever reaches the `CollatorFactory`.
*Why are the changes needed?*
We need to be able to compare columns and values that have different
collations, and we need a way to explicitly choose the collation to use.
> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Mihailo Milosevic
> Priority: Major
> Labels: pull-request-available