[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Description: 
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution as per `PGSQL` 
behaviour:

1. Collations set at the metadata level are implicit.
2. Collations set using the `COLLATE` expression are explicit.
3. When an expression combines inputs with multiple collations, the output 
collation is resolved as follows:
 - if there are explicit collations and all of them are equal, that 
collation is the output
 - if there are multiple different explicit collations, 
`COLLATION_MISMATCH.EXPLICIT` is thrown
 - if there are no explicit collations and only a single non-default 
collation, that one is used
 - if there are no explicit collations and multiple non-default implicit ones, 
`COLLATION_MISMATCH.IMPLICIT` is thrown
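
The resolution rules above can be sketched in a few lines of Python. This is 
only an illustration of the precedence order, not the code in the PR: the 
names `Collated`, `CollationMismatch` and `resolve_collation` are hypothetical, 
`UTF8_BINARY` is assumed to be the default collation, and returning the default 
when every input uses it is an assumption that the description does not spell 
out.

```python
from dataclasses import dataclass
from typing import List

# Assumption: name of the default collation, used only for this sketch.
DEFAULT_COLLATION = "UTF8_BINARY"


class CollationMismatch(Exception):
    """Hypothetical stand-in for the COLLATION_MISMATCH error class."""

    def __init__(self, kind: str, collations: List[str]):
        super().__init__(f"COLLATION_MISMATCH.{kind}: {', '.join(collations)}")
        self.kind = kind  # "EXPLICIT" or "IMPLICIT"


@dataclass(frozen=True)
class Collated:
    """Hypothetical expression: its collation and how it was set."""

    collation: str
    explicit: bool = False  # True when set through a COLLATE expression


def resolve_collation(inputs: List[Collated]) -> str:
    """Pick the output collation for a combination of input expressions."""
    # Explicit collations take precedence over implicit ones.
    explicit = sorted({e.collation for e in inputs if e.explicit})
    if len(explicit) > 1:
        raise CollationMismatch("EXPLICIT", explicit)
    if len(explicit) == 1:
        return explicit[0]

    # No explicit collations: look at the non-default implicit ones.
    implicit = sorted(
        {e.collation for e in inputs if e.collation != DEFAULT_COLLATION}
    )
    if len(implicit) > 1:
        raise CollationMismatch("IMPLICIT", implicit)
    if len(implicit) == 1:
        return implicit[0]

    # Assumption: every input uses the default collation, so keep it.
    return DEFAULT_COLLATION


# Illustrative usage; the collation names are examples only.
winner = resolve_collation(
    [Collated("UNICODE", explicit=True), Collated("UNICODE_CI")]
)
```

In this sketch an explicit collation always wins over implicit ones, so a 
single `COLLATE` on either side of an expression is enough to resolve a 
conflict between two differently collated columns.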

*Why are the changes needed?*
We need to be able to compare columns and values with different collations, and 
to provide a way to explicitly choose the collation to use.

  was:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution as per `PGSQL` 
behaviour:

1. Collations set at the metadata level are implicit.
2. Collations set using the `COLLATE` expression are explicit.
3. When an expression combines inputs with multiple collations, the output 
collation is resolved as follows:
- if there are explicit collations and all of them are equal, that 
collation is the output
- if there are multiple different explicit collations, 
`COLLATION_MISMATCH.EXPLICIT` is thrown
- if there are no explicit collations and only a single non-default 
collation, that one is used
- if there are no explicit collations and multiple non-default implicit ones, 
`COLLATION_MISMATCH.IMPLICIT` is thrown


In addition, `INDETERMINATE_COLLATION` should only be thrown on 
comparison operations, and it should be possible to combine different implicit 
collations for certain operations such as concat, and possibly others in the 
future.
For this reason, this PR adds another predefined collation id, 
`INDETERMINATE_COLLATION_ID`, which means that the result is a combination of 
conflicting non-default implicit collations. It currently has an id of -1, so it 
fails if it ever reaches the `CollatorFactory`.


*Why are the changes needed?*
We need to be able to compare columns and values with different collations, and 
to provide a way to explicitly choose the collation to use.


> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Summary: Addition of implicit casting without indeterminate support  (was: 
Implicit casting on collated expressions)



