[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47927:
---

Assignee: Emil Ejbyfeldt

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: correctness, pull-request-available
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value".show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
> 
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+ {code}
> So when the result is passed to a UDF, the nullability after the join is not 
> respected, and we incorrectly end up with a 0 value instead of a null/None 
> value.
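>  
> As a rough illustration (a hypothetical spark-shell continuation, not part 
> of the original report): the join output itself carries the right 
> nullability, which is why the plain struct select above shows NULL, so the 
> value is lost when the struct is encoded as the UDF input.
> {code:java}
> val joined = ds1.join(ds2, ds1("value") === ds2("value"), "outer")
> joined.select(struct(ds1("value"), ds2("value"))).schema.printTreeString()
> // expected to print something like:
> // root
> //  |-- struct(value, value): struct (nullable = false)
> //  |    |-- value: integer (nullable = true)
> //  |    |-- value: integer (nullable = true)
> {code}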






[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47927.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46156
[https://github.com/apache/spark/pull/46156]

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value".show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
> 
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+ {code}
> So when the result is passed to a UDF, the nullability after the join is not 
> respected, and we incorrectly end up with a 0 value instead of a null/None 
> value.






[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48019:
---

Assignee: Gene Pang

> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Assignee: Gene Pang
>Priority: Major
>  Labels: pull-request-available
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries 
> the dictionary id is irrelevant and can even be invalid. The dictionary 
> should not be consulted for the {{null}} entries of the vector; sometimes 
> doing so causes an {{ArrayIndexOutOfBoundsException}}.
> In addition to the possible exception, copying a {{ColumnarArray}} is also 
> incorrect. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain 
> {{null}} values. However, {{copy()}} for primitive types does not take the 
> nullness of the entries into account and blindly copies all the primitive 
> values, which means the null entries get lost.
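>  
> A toy model (illustrative only, not Spark's actual {{ColumnVector}} code) of 
> the null-aware decoding this calls for:
> {code:java}
> // Null entries must bypass the dictionary: their dictionary ids are
> // meaningless and may even be out of bounds for the dictionary array.
> case class ToyVector(nulls: Array[Boolean], dictIds: Array[Int], dict: Array[Int]) {
>   def getInts(rowId: Int, count: Int): Array[Int] = {
>     val res = new Array[Int](count)
>     for (i <- 0 until count if !nulls(rowId + i)) {
>       res(i) = dict(dictIds(rowId + i)) // decode non-null entries only
>     }
>     res // null slots keep the default 0; callers must check null-ness
>   }
> }
> 
> val v = ToyVector(
>   nulls   = Array(false, true, false),
>   dictIds = Array(0, 999 /* garbage id in the null slot */, 1),
>   dict    = Array(10, 20))
> v.getInts(0, 3) // Array(10, 0, 20) -- no ArrayIndexOutOfBoundsException
> {code}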






[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48019.
-
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46254
[https://github.com/apache/spark/pull/46254]

> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Assignee: Gene Pang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries 
> the dictionary id is irrelevant and can even be invalid. The dictionary 
> should not be consulted for the {{null}} entries of the vector; sometimes 
> doing so causes an {{ArrayIndexOutOfBoundsException}}.
> In addition to the possible exception, copying a {{ColumnarArray}} is also 
> incorrect. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain 
> {{null}} values. However, {{copy()}} for primitive types does not take the 
> nullness of the entries into account and blindly copies all the primitive 
> values, which means the null entries get lost.






[jira] [Assigned] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47476:
---

Assignee: Uroš Bojanić

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what the expected behaviour for this function is when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
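>  
> As a rough sketch (illustrative only; the classes below are the public 
> ICU4J API, but this is not the Spark patch), _StringSearch_ can drive 
> collation-aware matching like so:
> {code:java}
> import com.ibm.icu.text.{Collator, RuleBasedCollator, SearchIterator, StringSearch}
> import com.ibm.icu.util.ULocale
> import java.text.StringCharacterIterator
> 
> // Secondary strength ignores case differences, similar in spirit to a
> // case-insensitive collation.
> val collator = Collator.getInstance(ULocale.ROOT).asInstanceOf[RuleBasedCollator]
> collator.setStrength(Collator.SECONDARY)
> 
> val it = new StringSearch("ss", new StringCharacterIterator("Pass bypass"), collator)
> var pos = it.first()
> while (pos != SearchIterator.DONE) {
>   println(s"match at $pos, length ${it.getMatchLength}")
>   pos = it.next()
> }
> {code}
> A StringReplace implementation could walk these match positions and splice 
> the replacement string in between them.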






[jira] [Resolved] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47476.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45704
[https://github.com/apache/spark/pull/45704]

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what the expected behaviour for this function is when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47351.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46165
[https://github.com/apache/spark/pull/46165]

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47351:
---

Assignee: Uroš Bojanić

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47350.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46158
[https://github.com/apache/spark/pull/46158]

> SplitPart (binary & lowercase collation only)
> -
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47922) Implement try_parse_json

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47922.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46141
[https://github.com/apache/spark/pull/46141]

> Implement try_parse_json
> 
>
> Key: SPARK-47922
> URL: https://issues.apache.org/jira/browse/SPARK-47922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement the try_parse_json expression, which runs parse_json on valid 
> string inputs and returns null when the input string is malformed. Note 
> that, like parse_json, this expression only supports string input types.
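>  
> For illustration, the intended behaviour (a sketch, not taken from the 
> patch):
> {code:SQL}
> -- well-formed input: parsed into a variant value
> SELECT try_parse_json('{"a": 1, "b": [2, 3]}');
> -- malformed input: yields NULL instead of raising an error
> SELECT try_parse_json('{"a": 1');
> {code}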






[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47958.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46187
[https://github.com/apache/spark/pull/46187]

> Task Scheduler may not know about executor when using LocalSchedulerBackend
> ---
>
> Key: SPARK-47958
> URL: https://issues.apache.org/jira/browse/SPARK-47958
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Davin Tjong
>Assignee: Davin Tjong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When using LocalSchedulerBackend, the task scheduler will not know about the 
> executor until a task is run, which can lead to unexpected behavior in tests.






[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47764:
---

Assignee: Bo Zhang

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Shuffle dependencies are created by shuffle map stages and consist of files 
> on disk and the corresponding references in Spark JVM heap memory. Currently 
> Spark cleans up unused shuffle dependencies through JVM GCs, and periodic 
> GCs are triggered once every 30 minutes (see ContextCleaner). However, we 
> still found cases in which the shuffle data files are too large, which makes 
> shuffle data migration slow.
>  
> We do have chances to clean up shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused on the 
> client side, the instances are still recreated on the server side. 
>  
> We might also provide the option to 1. clean up eagerly after each query 
> execution, or 2. only mark the shuffle dependencies and not migrate them at 
> node decommission.
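>  
> For reference, the 30-minute cadence mentioned above is already tunable via 
> the existing ContextCleaner knob (a sketch; the eager per-query cleanup 
> proposed here would go beyond it):
> {code:java}
> import org.apache.spark.sql.SparkSession
> 
> // Shorten the periodic GC interval so unused shuffle dependencies are
> // collected sooner than the default 30 minutes.
> val spark = SparkSession.builder()
>   .config("spark.cleaner.periodicGC.interval", "10min")
>   .getOrCreate()
> {code}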






[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47764.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45930
[https://github.com/apache/spark/pull/45930]

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Shuffle dependencies are created by shuffle map stages and consist of files 
> on disk and the corresponding references in Spark JVM heap memory. Currently 
> Spark cleans up unused shuffle dependencies through JVM GCs, and periodic 
> GCs are triggered once every 30 minutes (see ContextCleaner). However, we 
> still found cases in which the shuffle data files are too large, which makes 
> shuffle data migration slow.
>  
> We do have chances to clean up shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused on the 
> client side, the instances are still recreated on the server side. 
>  
> We might also provide the option to 1. clean up eagerly after each query 
> execution, or 2. only mark the shuffle dependencies and not migrate them at 
> node decommission.






[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47418.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46181
[https://github.com/apache/spark/pull/46181]

> Optimize string predicate expressions for UTF8_BINARY_LCASE collation
> -
>
> Key: SPARK-47418
> URL: https://issues.apache.org/jira/browse/SPARK-47418
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement the {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in 
> Spark string functions using the optimized lowercase comparison approach introduced by 
> [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to 
> the latest design and code structure imposed by [~uros-db] in 
> https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation 
> support is introduced for Spark SQL expressions. In addition, review previous 
> Jira tickets under the current parent in order to understand how 
> *StringPredicate* expressions are currently used and tested in Spark:
>  * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131]
>  * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248]
>  * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295]
> These tickets should help you understand what changes were introduced in 
> order to enable collation support for these functions. Lastly, feel free to 
> use your chosen Spark SQL Editor to play around with the existing functions 
> and learn more about how they work.
>  
> The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE 
> implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* 
> functions so that they use the optimized lowercase comparison approach (following 
> the general logic in Nikola's PR), and benchmark the results accordingly. As 
> for testing, the currently existing unit test cases and end-to-end tests 
> should already fully cover the expected behaviour of *StringPredicate* 
> expressions for all collation types. In other words, the objective of this 
> ticket is only to enhance the internal implementation, without introducing 
> any user-facing changes to Spark SQL API.
>  
> Finally, feel free to refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
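>  
> A toy sketch of the optimization direction (simplified; the real UTF8String 
> code must handle multi-byte characters and special case mappings):
> {code:java}
> // Compare lazily, one character at a time, instead of materializing full
> // lowercased copies of both strings (i.e. s.toLowerCase.endsWith(...)).
> def lowercaseEndsWith(s: String, suffix: String): Boolean = {
>   suffix.length <= s.length && (1 to suffix.length).forall { i =>
>     Character.toLowerCase(s.charAt(s.length - i)) ==
>       Character.toLowerCase(suffix.charAt(suffix.length - i))
>   }
> }
> 
> lowercaseEndsWith("SparkSQL", "sql") // true, without copying either string
> {code}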






[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47873:
---

Assignee: Stefan Kandic

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> As Hive doesn't support collations, we should write collated strings with a 
> regular string type but keep the collation in table metadata so that they 
> can be read back properly.






[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47873.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46083
[https://github.com/apache/spark/pull/46083]

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> As Hive doesn't support collations, we should write collated strings with a 
> regular string type but keep the collation in table metadata so that they 
> can be read back properly.






[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference

2024-04-23 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47956:
---

 Summary: sanity check for unresolved LCA reference
 Key: SPARK-47956
 URL: https://issues.apache.org/jira/browse/SPARK-47956
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan









[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47352.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46104
[https://github.com/apache/spark/pull/46104]

> Fix Upper, Lower, InitCap collation awareness
> -
>
> Key: SPARK-47352
> URL: https://issues.apache.org/jira/browse/SPARK-47352
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47412.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46041
[https://github.com/apache/spark/pull/46041]

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what the expected behaviour for these 
> functions is when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47412:
---

Assignee: Gideon P

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what the expected behaviour for these 
> functions is when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47411:
---

Assignee: Milan Dankovic

> StringInstr, FindInSet (all collations)
> ---
>
> Key: SPARK-47411
> URL: https://issues.apache.org/jira/browse/SPARK-47411
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringInstr* and *FindInSet* built-in 
> string functions in Spark. First confirm what the expected behaviour for 
> these functions is when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringInstr* and 
> *FindInSet* functions so that they support all collation types currently 
> supported in Spark. To understand what changes were introduced in order to 
> enable full collation support for other existing functions in Spark, take a 
> look at the Spark PRs and Jira tickets for completed tasks in this parent 
> (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
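>  
> Illustrative usage shape (assuming the COLLATE syntax used elsewhere in 
> this collation effort; the example is not from the ticket):
> {code:SQL}
> SELECT instr('SparkSQL' COLLATE UTF8_BINARY_LCASE, 'sql');
> -- 6: 'sql' matches 'SQL' under the case-insensitive collation
> {code}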






[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47411.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45643
[https://github.com/apache/spark/pull/45643]

> StringInstr, FindInSet (all collations)
> ---
>
> Key: SPARK-47411
> URL: https://issues.apache.org/jira/browse/SPARK-47411
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringInstr* and *FindInSet* built-in 
> string functions in Spark. First confirm what the expected behaviour for 
> these functions is when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringInstr* and 
> *FindInSet* functions so that they support all collation types currently 
> supported in Spark. To understand what changes were introduced in order to 
> enable full collation support for other existing functions in Spark, take a 
> look at the Spark PRs and Jira tickets for completed tasks in this parent 
> (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Resolved] (SPARK-47900) Fix check for implicit collation

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47900.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46116
[https://github.com/apache/spark/pull/46116]

> Fix check for implicit collation
> 
>
> Key: SPARK-47900
> URL: https://issues.apache.org/jira/browse/SPARK-47900
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47413:
---

Assignee: Gideon P

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what the 
> expected behaviour for these functions is when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47413.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46040
[https://github.com/apache/spark/pull/46040]

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what the 
> expected behaviour for these functions is when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable

2024-04-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47902.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46120
[https://github.com/apache/spark/pull/46120]

> Compute Current Time* expressions should be foldable
> 
>
> Key: SPARK-47902
> URL: https://issues.apache.org/jira/browse/SPARK-47902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following PR - https://github.com/apache/spark/pull/44261 - changed the 
> "compute current time" family of expressions to be unevaluable, given that 
> these expressions are supposed to be replaced with literals by the query 
> optimizer. Unevaluable implies that these expressions are not foldable, even 
> though they will be replaced by literals. 
> If these expressions were used in places that require constant folding (e.g. 
> RAND()), the new behavior would be to raise an error, which is a regression 
> compared to the behavior prior to Spark 4.0.
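>  
> An illustrative query of the affected shape (rand() requires a foldable 
> seed, and unix_timestamp() with no arguments evaluates to the current time):
> {code:SQL}
> -- Worked before Spark 4.0; raised an error once the current-time
> -- expressions became unevaluable (and thus non-foldable), until this fix.
> SELECT rand(unix_timestamp());
> {code}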






[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable

2024-04-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47902:
---

Assignee: Aleksandar Tomic

> Compute Current Time* expressions should be foldable
> 
>
> Key: SPARK-47902
> URL: https://issues.apache.org/jira/browse/SPARK-47902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> The following PR - https://github.com/apache/spark/pull/44261 - changed the 
> "compute current time" family of expressions to be unevaluable, given that 
> these expressions are supposed to be replaced with literals by the query 
> optimizer. Unevaluable implies that these expressions are not foldable, even 
> though they will be replaced by literals. 
> If these expressions were used in places that require constant folding (e.g. 
> RAND()), the new behavior would be to raise an error, which is a regression 
> compared to the behavior prior to Spark 4.0.






[jira] [Resolved] (SPARK-46935) Consolidate error documentation

2024-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46935.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44971
[https://github.com/apache/spark/pull/44971]

> Consolidate error documentation
> ---
>
> Key: SPARK-46935
> URL: https://issues.apache.org/jira/browse/SPARK-46935
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46935) Consolidate error documentation

2024-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46935:
---

Assignee: Nicholas Chammas

> Consolidate error documentation
> ---
>
> Key: SPARK-46935
> URL: https://issues.apache.org/jira/browse/SPARK-46935
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.

2024-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-47463:

Fix Version/s: 3.5.2

> An error occurred while pushing down the filter of if expression for iceberg 
> datasource.
> 
>
> Key: SPARK-47463
> URL: https://issues.apache.org/jira/browse/SPARK-47463
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
> Environment: Spark 3.5.0
> Iceberg 1.4.3
>Reporter: Zhen Wang
>Assignee: Zhen Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> Reproduce:
> {code:java}
> create table t1(c1 int) using iceberg;
> select * from
> (select if(c1 = 1, c1, null) as c1 from t1) t
> where t.c1 > 0; {code}
> Error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase 
> optimization failed with an internal error. You hit a bug in Spark or the 
> Spark plugins you use. Please, report this bug to the corresponding 
> communities or vendors, and provide the full stack trace.
>   at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:107)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
>   at 
> org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
>   at 
> org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
>   at 
> org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at 
> org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> 

[jira] [Resolved] (SPARK-47895) group by all should be idempotent

2024-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47895.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46113
[https://github.com/apache/spark/pull/46113]

> group by all should be idempotent
> -
>
> Key: SPARK-47895
> URL: https://issues.apache.org/jira/browse/SPARK-47895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>







[jira] [Assigned] (SPARK-47895) group by all should be idempotent

2024-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47895:
---

Assignee: Wenchen Fan

> group by all should be idempotent
> -
>
> Key: SPARK-47895
> URL: https://issues.apache.org/jira/browse/SPARK-47895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47895) group by all should be idempotent

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-47895:

Summary: group by all should be idempotent  (was: group by ordinal should 
be idempotent)

> group by all should be idempotent
> -
>
> Key: SPARK-47895
> URL: https://issues.apache.org/jira/browse/SPARK-47895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-47895) group by ordinal should be idempotent

2024-04-17 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47895:
---

 Summary: group by ordinal should be idempotent
 Key: SPARK-47895
 URL: https://issues.apache.org/jira/browse/SPARK-47895
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan









[jira] [Assigned] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47839:
---

Assignee: Kelvin Jiang

> Fix Aggregate bug in RewriteWithExpression
> --
>
> Key: SPARK-47839
> URL: https://issues.apache.org/jira/browse/SPARK-47839
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> The following query will fail:
> {code:SQL}
> SELECT NULLIF(id + 1, 1)
> from range(10)
> group by id
> {code}
> This is because {{NullIf}} gets rewritten to {{With}}, then 
> {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of 
> the aggregate, resulting in an invalid plan.
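>  
> Roughly, the rewrite looks like this (a sketch in pseudo-expressions; 
> `_common` is a hypothetical name for the shared subexpression):
> {code:SQL}
> -- NULLIF(id + 1, 1)
> --   => With(_common = id + 1) { CASE WHEN _common = 1 THEN NULL ELSE _common END }
> -- RewriteWithExpression pulls `_common = id + 1` into a Project below the
> -- Aggregate, but `id + 1` is not a grouping expression of GROUP BY id, so
> -- the rewritten Aggregate references a non-grouping column: an invalid plan.
> {code}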






[jira] [Resolved] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47839.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46034
[https://github.com/apache/spark/pull/46034]

> Fix Aggregate bug in RewriteWithExpression
> --
>
> Key: SPARK-47839
> URL: https://issues.apache.org/jira/browse/SPARK-47839
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following query will fail:
> {code:SQL}
> SELECT NULLIF(id + 1, 1)
> from range(10)
> group by id
> {code}
> This is because {{NullIf}} gets rewritten to {{With}}, then 
> {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of 
> the aggregate, resulting in an invalid plan.






[jira] [Resolved] (SPARK-47846) Add support for Variant schema in from_json

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47846.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46046
[https://github.com/apache/spark/pull/46046]

> Add support for Variant schema in from_json
> ---
>
> Key: SPARK-47846
> URL: https://issues.apache.org/jira/browse/SPARK-47846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding support for the variant type in the from_json expression.
> "select from_json('', 'variant')" should interpret json_string 
> as a variant type.
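A minimal usage sketch of the behavior described above (illustrative JSON input):

{code:sql}
-- Parses the JSON string and returns it as a VARIANT value:
SELECT from_json('{"a": 1, "b": [2, 3]}', 'variant');
{code}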






[jira] [Assigned] (SPARK-47846) Add support for Variant schema in from_json

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47846:
---

Assignee: Harsh Motwani

> Add support for Variant schema in from_json
> ---
>
> Key: SPARK-47846
> URL: https://issues.apache.org/jira/browse/SPARK-47846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Adding support for the variant type in the from_json expression.
> "select from_json('', 'variant')" should interpret json_string 
> as a variant type.






[jira] [Resolved] (SPARK-47360) Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations)

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47360.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46003
[https://github.com/apache/spark/pull/46003]

> Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck 
> (all collations)
> --
>
> Key: SPARK-47360
> URL: https://issues.apache.org/jira/browse/SPARK-47360
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47416) Add benchmark for stringpredicate expressions

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47416.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46078
[https://github.com/apache/spark/pull/46078]

> Add benchmark for stringpredicate expressions
> -
>
> Key: SPARK-47416
> URL: https://issues.apache.org/jira/browse/SPARK-47416
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47863) endsWith and startsWith don't work correctly for some collations

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47863.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46097
[https://github.com/apache/spark/pull/46097]

> endsWith and startsWith don't work correctly for some collations
> 
>
> Key: SPARK-47863
> URL: https://issues.apache.org/jira/browse/SPARK-47863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use 
> {*}CollationAwareUTF8String.matchAt{*}, which operates on byte offsets to 
> compare prefixes/suffixes. This is not correct, since string parts 
> (suffix/prefix) of different lengths are sometimes actually equal in the 
> context of case-insensitive and lower-case collations.
> Example test cases that highlight the problem:
> - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testContains*
> - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testEndsWith*
> The first passes, since it uses *StringSearch* directly; the second one 
> does not.
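A SQL-level illustration of the mismatch (a sketch assuming the collate(), contains() and endswith() functions available in Spark 4.0):

{code:sql}
-- Both should return true under UNICODE_CI: 'İo' and 'i̇o' compare equal
-- case-insensitively even though their byte lengths differ.
SELECT contains(collate('The İo', 'UNICODE_CI'), collate('i̇o', 'UNICODE_CI'));
SELECT endswith(collate('The İo', 'UNICODE_CI'), collate('i̇o', 'UNICODE_CI'));
{code}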






[jira] [Assigned] (SPARK-47863) endsWith and startsWith don't work correctly for some collations

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47863:
---

Assignee: Vladimir Golubev

> endsWith and startsWith don't work correctly for some collations
> 
>
> Key: SPARK-47863
> URL: https://issues.apache.org/jira/browse/SPARK-47863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use 
> {*}CollationAwareUTF8String.matchAt{*}, which operates on byte offsets to 
> compare prefixes/suffixes. This is not correct, since string parts 
> (suffix/prefix) of different lengths are sometimes actually equal in the 
> context of case-insensitive and lower-case collations.
> Example test cases that highlight the problem:
> - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testContains*
> - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testEndsWith*
> The first passes, since it uses *StringSearch* directly; the second one 
> does not.






[jira] [Assigned] (SPARK-47822) Prohibit Hash expressions from hashing Variant type

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47822:
---

Assignee: Harsh Motwani

> Prohibit Hash expressions from hashing Variant type
> ---
>
> Key: SPARK-47822
> URL: https://issues.apache.org/jira/browse/SPARK-47822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Prohibiting Hash functions from being applied to the Variant type. This is 
> because they haven't been implemented for the variant type and crash during 
> execution.
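A sketch of the intended behavior (illustrative; the exact error class is not taken from the ticket):

{code:sql}
-- Previously this could crash at execution time; with the change it should
-- be rejected during analysis instead:
SELECT hash(parse_json('{"a": 1}'));
{code}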






[jira] [Resolved] (SPARK-47822) Prohibit Hash expressions from hashing Variant type

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47822.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46017
[https://github.com/apache/spark/pull/46017]

> Prohibit Hash expressions from hashing Variant type
> ---
>
> Key: SPARK-47822
> URL: https://issues.apache.org/jira/browse/SPARK-47822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Prohibiting Hash functions from being applied to the Variant type. This is 
> because they haven't been implemented for the variant type and crash during 
> execution.






[jira] [Resolved] (SPARK-47821) Add is_variant_null expression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47821.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46011
[https://github.com/apache/spark/pull/46011]

> Add is_variant_null expression
> --
>
> Key: SPARK-47821
> URL: https://issues.apache.org/jira/browse/SPARK-47821
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Richard Chen
>Assignee: Richard Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> adds an `is_variant_null` expression, which returns whether a given variant 
> value represents a variant null (note the difference between a variant null 
> and an engine null)
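A usage sketch of the distinction (expected results shown as comments, under the assumption that casting a SQL NULL to variant yields an engine null):

{code:sql}
SELECT is_variant_null(parse_json('null'));      -- true: a variant null
SELECT is_variant_null(parse_json('{"a": 1}'));  -- false: a non-null variant
SELECT is_variant_null(cast(NULL AS variant));   -- false: an engine (SQL) NULL
{code}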






[jira] [Resolved] (SPARK-47867) Support Variant in JSON scan.

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47867.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46071
[https://github.com/apache/spark/pull/46071]

> Support Variant in JSON scan.
> -
>
> Key: SPARK-47867
> URL: https://issues.apache.org/jira/browse/SPARK-47867
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)

2024-04-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47417:
---

Assignee: Nikola Mandic

> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences (all collations)
> --
>
> Key: SPARK-47417
> URL: https://issues.apache.org/jira/browse/SPARK-47417
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)

2024-04-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47417.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45933
[https://github.com/apache/spark/pull/45933]

> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences (all collations)
> --
>
> Key: SPARK-47417
> URL: https://issues.apache.org/jira/browse/SPARK-47417
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)

2024-04-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47356.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46061
[https://github.com/apache/spark/pull/46061]

> Add support for ConcatWs & Elt (all collations)
> ---
>
> Key: SPARK-47356
> URL: https://issues.apache.org/jira/browse/SPARK-47356
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)

2024-04-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47356:
---

Assignee: Mihailo Milosevic

> Add support for ConcatWs & Elt (all collations)
> ---
>
> Key: SPARK-47356
> URL: https://issues.apache.org/jira/browse/SPARK-47356
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47420) Fix CollationSupport test output

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47420.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46058
[https://github.com/apache/spark/pull/46058]

> Fix CollationSupport test output
> 
>
> Key: SPARK-47420
> URL: https://issues.apache.org/jira/browse/SPARK-47420
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47769) Add schema_of_variant_agg expression.

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47769.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45934
[https://github.com/apache/spark/pull/45934]

> Add schema_of_variant_agg expression.
> -
>
> Key: SPARK-47769
> URL: https://issues.apache.org/jira/browse/SPARK-47769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47769) Add schema_of_variant_agg expression.

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47769:
---

Assignee: Chenhao Li

> Add schema_of_variant_agg expression.
> -
>
> Key: SPARK-47769
> URL: https://issues.apache.org/jira/browse/SPARK-47769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47463.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45589
[https://github.com/apache/spark/pull/45589]

> An error occurred while pushing down the filter of if expression for iceberg 
> datasource.
> 
>
> Key: SPARK-47463
> URL: https://issues.apache.org/jira/browse/SPARK-47463
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
> Environment: Spark 3.5.0
> Iceberg 1.4.3
>Reporter: Zhen Wang
>Assignee: Zhen Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Reproduce:
> {code:java}
> create table t1(c1 int) using iceberg;
> select * from
> (select if(c1 = 1, c1, null) as c1 from t1) t
> where t.c1 > 0; {code}
> Error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase 
> optimization failed with an internal error. You hit a bug in Spark or the 
> Spark plugins you use. Please, report this bug to the corresponding 
> communities or vendors, and provide the full stack trace.
>   at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:107)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
>   at 
> org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
>   at 
> org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
>   at 
> org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at 
> org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
>   at 
> org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: assertion failed

[jira] [Resolved] (SPARK-46810) Clarify error class terminology

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46810.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44902
[https://github.com/apache/spark/pull/44902]

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It aligns most closely to the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "category" is low impact and 
> may not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms do not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> h1. Option 3: "Error Class" and "State Class"
>  * SQL state class: 42
>  * SQL state sub-class: K01
>  * SQL state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "state class" is low impact 
> and may not show up in user-facing documentation at all. (See my side note 
> below.)

[jira] [Assigned] (SPARK-46810) Clarify error class terminology

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46810:
---

Assignee: Nicholas Chammas

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It aligns most closely to the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "category" is low impact and 
> may not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms do not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> h1. Option 3: "Error Class" and "State Class"
>  * SQL state class: 42
>  * SQL state sub-class: K01
>  * SQL state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "state class" is low impact 
> and may not show up in user-facing documentation at all. (See my side note 
> below.)

[jira] [Resolved] (SPARK-47357) Add support for Upper, Lower, InitCap (all collations)

2024-04-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47357.
-
Fix Version/s: 4.0.0
 Assignee: Mihailo Milosevic
   Resolution: Fixed

> Add support for Upper, Lower, InitCap (all collations)
> --
>
> Key: SPARK-47357
> URL: https://issues.apache.org/jira/browse/SPARK-47357
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47803) Support cast to variant.

2024-04-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47803.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45989
[https://github.com/apache/spark/pull/45989]

> Support cast to variant.
> 
>
> Key: SPARK-47803
> URL: https://issues.apache.org/jira/browse/SPARK-47803
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47765) Add SET COLLATION to parser rules

2024-04-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47765.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45946
[https://github.com/apache/spark/pull/45946]

> Add SET COLLATION to parser rules
> -
>
> Key: SPARK-47765
> URL: https://issues.apache.org/jira/browse/SPARK-47765
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47765) Add SET COLLATION to parser rules

2024-04-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47765:
---

Assignee: Mihailo Milosevic

> Add SET COLLATION to parser rules
> -
>
> Key: SPARK-47765
> URL: https://issues.apache.org/jira/browse/SPARK-47765
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47800) Add method for converting v2 identifier to table identifier

2024-04-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47800.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45985
[https://github.com/apache/spark/pull/45985]

> Add method for converting v2 identifier to table identifier
> ---
>
> Key: SPARK-47800
> URL: https://issues.apache.org/jira/browse/SPARK-47800
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Move conversion of v2 identifier object to v1 table identifier to new method.






[jira] [Assigned] (SPARK-47800) Add method for converting v2 identifier to table identifier

2024-04-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47800:
---

Assignee: Uros Stankovic

> Add method for converting v2 identifier to table identifier
> ---
>
> Key: SPARK-47800
> URL: https://issues.apache.org/jira/browse/SPARK-47800
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Minor
>  Labels: pull-request-available
>
> Move conversion of v2 identifier object to v1 table identifier to new method.






[jira] [Assigned] (SPARK-47410) refactor UTF8String and CollationFactory

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47410:
---

Assignee: Uroš Bojanić

> refactor UTF8String and CollationFactory
> 
>
> Key: SPARK-47410
> URL: https://issues.apache.org/jira/browse/SPARK-47410
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> This ticket addresses the need to refactor the {{UTF8String}} and 
> {{CollationFactory}} classes within Spark to enhance support for 
> collation-aware expressions. The goal is to improve code structure, 
> maintainability, readability, and testing coverage for collation-aware Spark 
> SQL expressions.
> The changes introduced herein should simplify the addition of new collation-aware 
> operations and ensure consistent testing across the codebase.
>  
> To further support the addition of collation support for new Spark 
> expressions, here are a couple of guidelines to follow:
>  
> // 1. Collation-aware expression implementation
> CollationSupport.java
>  * should serve as a static entry point for collation-aware expressions, 
> providing custom support
>  * for example: one by one Spark expression with corresponding collation 
> support
>  * also note that: CollationAwareUTF8String should be used for 
> collation-aware UTF8String operations & other utility methods
> CollationFactory.java
>  * should continue to serve as a static provider for high-level collation 
> interface
>  * for example: interacting with external ICU components such as Collator, 
> StringSearch, etc.
>  * also note that: no low-level / expression-specific code should generally 
> be found here
> UTF8String.java
>  * should be largely collation-unaware, and generally be used only as 
> storage, nothing else
>  * for example: don’t change this class at all (with the only one-time 
> exception of: semanticEquals/Compare)
>  * also note that: no collation-aware operation implementations (using 
> collationId) should be put here
> stringExpressions.scala / regexpExpressions.scala / other 
> “sql.catalyst.expressions” (for example: Between.scala)
>  * should only contain minimal changes in order to re-route collation-aware 
> implementations to CollationSupport
>  * for example: most changes should be in relation to: adding collationId, 
> using correct data types, replacements, etc.
>  * also note that: nullSafeEval & doGenCode should likely not introduce 
> extra branching based on collationId
>  
> // 2. Collation-aware expression testing
> CollationSuite.scala
>  * should be used for testing more general collation concepts
>  * for example: collate/collation expressions, collation names, DDL, casting, 
> aggregate, shuffle, join, etc.
>  * also note that: no extra tests should generally be added
> CollationSupportSuite.java
>  * should be used for expression unit tests, these tests should be as 
> rigorous as possible in order to cover various cases
>  * for example: unit tests that test collation-aware expression 
> implementation for various collations (binary, lowercase, ICU)
>  * also note that: these tests should generally be written after adding 
> appropriate expression support in CollationSupport.java
> CollationStringExpressionsSuite.scala / CollationRegexpExpressionsSuite.scala 
> / CollationExpressionSuite.scala
>  * should be used for expression end-to-end tests, these tests should only 
> cover crucial expression behaviour
>  * for example: SQL tests that verify query execution results, expected 
> return data types, casting, unsupported collation handling, etc.
>  * also note that: these tests should generally be written after enabling 
> appropriate expression support in stringExpressions.scala
>  
> // 3. Closing notes
>  * Carefully think about performance implications of newly added custom 
> collation-aware expression implementation
>  * for example: be very careful with extra string allocations (UTF8Strings -> 
> (Java) String -> UTF8Strings, etc.)
>  * also note that: some operations introduce very heavy performance penalties 
> (we should avoid the ones we can)
>  
>  * Make sure to test all newly added expressions completely (unit tests, 
> end-to-end tests, etc.)
>  * for example: consider edge cases, such as: empty strings, uppercase and 
> lowercase mix, different byte-length chars, etc.
>  * also note that: all similar tests should be uniform & readable and be kept 
> in one place for various expressions
>  
>  * Consider how new expressions interact with the rest of the system 
> (casting; collation support level - use correct AbstractStringType, etc.)
>  * for example: we should watch out for casting, test it 

[jira] [Resolved] (SPARK-47410) refactor UTF8String and CollationFactory

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47410.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45978
[https://github.com/apache/spark/pull/45978]

> refactor UTF8String and CollationFactory
> 
>
> Key: SPARK-47410
> URL: https://issues.apache.org/jira/browse/SPARK-47410
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This ticket addresses the need to refactor the {{UTF8String}} and 
> {{CollationFactory}} classes within Spark to enhance support for 
> collation-aware expressions. The goal is to improve code structure, 
> maintainability, readability, and testing coverage for collation-aware Spark 
> SQL expressions.
> The changes introduced herein should simplify the addition of new collation-aware 
> operations and ensure consistent testing across the codebase.
>  
> To further support the addition of collation support for new Spark 
> expressions, here are a couple of guidelines to follow:
>  
> // 1. Collation-aware expression implementation
> CollationSupport.java
>  * should serve as a static entry point for collation-aware expressions, 
> providing custom support
>  * for example: one by one Spark expression with corresponding collation 
> support
>  * also note that: CollationAwareUTF8String should be used for 
> collation-aware UTF8String operations & other utility methods
> CollationFactory.java
>  * should continue to serve as a static provider for high-level collation 
> interface
>  * for example: interacting with external ICU components such as Collator, 
> StringSearch, etc.
>  * also note that: no low-level / expression-specific code should generally 
> be found here
> UTF8String.java
>  * should be largely collation-unaware, and generally be used only as 
> storage, nothing else
>  * for example: don’t change this class at all (with the only one-time 
> exception of: semanticEquals/Compare)
>  * also note that: no collation-aware operation implementations (using 
> collationId) should be put here
> stringExpressions.scala / regexpExpressions.scala / other 
> “sql.catalyst.expressions” (for example: Between.scala)
>  * should only contain minimal changes in order to re-route collation-aware 
> implementations to CollationSupport
>  * for example: most changes should be in relation to: adding collationId, 
> using correct data types, replacements, etc.
>  * also note that: nullSafeEval & doGenCode should likely not introduce 
> extra branching based on collationId
>  
> // 2. Collation-aware expression testing
> CollationSuite.scala
>  * should be used for testing more general collation concepts
>  * for example: collate/collation expressions, collation names, DDL, casting, 
> aggregate, shuffle, join, etc.
>  * also note that: no extra tests should generally be added
> CollationSupportSuite.java
>  * should be used for expression unit tests, these tests should be as 
> rigorous as possible in order to cover various cases
>  * for example: unit tests that test collation-aware expression 
> implementation for various collations (binary, lowercase, ICU)
>  * also note that: these tests should generally be written after adding 
> appropriate expression support in CollationSupport.java
> CollationStringExpressionsSuite.scala / CollationRegexpExpressionsSuite.scala 
> / CollationExpressionSuite.scala
>  * should be used for expression end-to-end tests, these tests should only 
> cover crucial expression behaviour
>  * for example: SQL tests that verify query execution results, expected 
> return data types, casting, unsupported collation handling, etc.
>  * also note that: these tests should generally be written after enabling 
> appropriate expression support in stringExpressions.scala
>  
> // 3. Closing notes
>  * Carefully think about performance implications of newly added custom 
> collation-aware expression implementation
>  * for example: be very careful with extra string allocations (UTF8Strings -> 
> (Java) String -> UTF8Strings, etc.)
>  * also note that: some operations introduce very heavy performance penalties 
> (we should avoid the ones we can)
>  
>  * Make sure to test all newly added expressions completely (unit tests, 
> end-to-end tests, etc.)
>  * for example: consider edge cases, such as: empty strings, uppercase and 
> lowercase mix, different byte-length chars, etc.
>  * also note that: all similar tests should be uniform & readable and be kept 
> in one place for various expressions
>  
>  * Consider how new expressions interact with the rest of the system 
> (casting; 

[jira] [Resolved] (SPARK-47617) Add TPC-DS testing infrastructure for collations

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47617.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45739
[https://github.com/apache/spark/pull/45739]

> Add TPC-DS testing infrastructure for collations
> 
>
> Key: SPARK-47617
> URL: https://issues.apache.org/jira/browse/SPARK-47617
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nikola Mandic
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> As collation support grows across all SQL features and new collation types 
> are added, we need to have a reliable testing model covering as many standard 
> SQL capabilities as possible.
> We can utilize TPC-DS testing infrastructure already present in Spark. The 
> idea is to vary TPC-DS table string columns by adding multiple collations 
> with different ordering rules and case sensitivity, producing new tables. 
> These tables should yield the same results against predefined TPC-DS queries 
> for certain batches of collations. For example, when comparing query runs on 
> a table where columns are first collated as UTF8_BINARY and then as 
> UTF8_BINARY_LCASE, we should be getting the same results after converting to 
> lowercase.
> Introduce a new query suite which tests the described behavior with available 
> collations (utf8_binary and unicode) combined with case conversions 
> (lowercase, uppercase, randomized case for fuzzy testing).
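A condensed sketch of the idea (hypothetical table and column names, using the collation names from this ticket):

{code:sql}
-- The same string column under two collations; after lowercasing, queries
-- over the two variants should return the same results.
CREATE TABLE item_utf8_binary (i_brand STRING COLLATE UTF8_BINARY);
CREATE TABLE item_utf8_lcase (i_brand STRING COLLATE UTF8_BINARY_LCASE);
SELECT lower(i_brand) FROM item_utf8_binary ORDER BY 1;
SELECT lower(i_brand) FROM item_utf8_lcase ORDER BY 1;
{code}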






[jira] [Assigned] (SPARK-47617) Add TPC-DS testing infrastructure for collations

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47617:
---

Assignee: Nikola Mandic

> Add TPC-DS testing infrastructure for collations
> 
>
> Key: SPARK-47617
> URL: https://issues.apache.org/jira/browse/SPARK-47617
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nikola Mandic
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
>
> As collation support grows across all SQL features and new collation types 
> are added, we need to have a reliable testing model covering as many standard 
> SQL capabilities as possible.
> We can utilize TPC-DS testing infrastructure already present in Spark. The 
> idea is to vary TPC-DS table string columns by adding multiple collations 
> with different ordering rules and case sensitivity, producing new tables. 
> These tables should yield the same results against predefined TPC-DS queries 
> for certain batches of collations. For example, when comparing query runs on 
> a table where columns are first collated as UTF8_BINARY and then as 
> UTF8_BINARY_LCASE, we should be getting the same results after converting to 
> lowercase.
> Introduce a new query suite which tests the described behavior with available 
> collations (utf8_binary and unicode) combined with case conversions 
> (lowercase, uppercase, randomized case for fuzzy testing).






[jira] [Assigned] (SPARK-47736) Add support for AbstractArrayType

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47736:
---

Assignee: Mihailo Milosevic

> Add support for AbstractArrayType
> -
>
> Key: SPARK-47736
> URL: https://issues.apache.org/jira/browse/SPARK-47736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47736) Add support for AbstractArrayType

2024-04-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47736.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45891
[https://github.com/apache/spark/pull/45891]

> Add support for AbstractArrayType
> -
>
> Key: SPARK-47736
> URL: https://issues.apache.org/jira/browse/SPARK-47736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47809) checkExceptionInExpression should check error for each codegen mode

2024-04-10 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47809:
---

 Summary: checkExceptionInExpression should check error for each 
codegen mode
 Key: SPARK-47809
 URL: https://issues.apache.org/jira/browse/SPARK-47809
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan









[jira] [Resolved] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47001.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45146
[https://github.com/apache/spark/pull/45146]

> Pushdown Verification in Optimizer.scala should support changed data types
> --
>
> Key: SPARK-47001
> URL: https://issues.apache.org/jira/browse/SPARK-47001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When pushing a filter down through a union, the data type may not match 
> exactly if the filter was constructed using the child DataFrame's column 
> reference. This is because the union's output is updated with a StructType 
> merge, which can turn non-nullable into nullable. These are still the same 
> column despite the different nullability, so the filter should be safe to 
> push down. As it currently stands, we get an exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47274) Provide more useful context for PySpark DataFrame API errors

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47274.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45377
[https://github.com/apache/spark/pull/45377]

> Provide more useful context for PySpark DataFrame API errors
> 
>
> Key: SPARK-47274
> URL: https://issues.apache.org/jira/browse/SPARK-47274
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Errors originating from PySpark operations can be difficult to debug given 
> the limited context in the error messages. While improvements on the JVM side 
> have been made to offer detailed error contexts, PySpark errors often lack 
> this level of detail. Adding detailed context about the location in the 
> user's PySpark code where the error occurred will improve debuggability for 
> PySpark users.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47274) Provide more useful context for PySpark DataFrame API errors

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47274:
---

Assignee: Haejoon Lee

> Provide more useful context for PySpark DataFrame API errors
> 
>
> Key: SPARK-47274
> URL: https://issues.apache.org/jira/browse/SPARK-47274
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Errors originating from PySpark operations can be difficult to debug given 
> the limited context in the error messages. While improvements on the JVM side 
> have been made to offer detailed error contexts, PySpark errors often lack 
> this level of detail. Adding detailed context about the location in the 
> user's PySpark code where the error occurred will improve debuggability for 
> PySpark users.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47802) Revert mapping ( star ) to named_struct ( star )

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47802.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45987
[https://github.com/apache/spark/pull/45987]

> Revert mapping ( star ) to named_struct ( star )
> 
>
> Key: SPARK-47802
> URL: https://issues.apache.org/jira/browse/SPARK-47802
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Turning a star within parentheses into named_struct(star), as opposed to 
> ignoring the parentheses, turns out to be riskier than anticipated. Given 
> that this was done solely for consistency with (c1, c2, ...), it's best not 
> to go there at all.
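> A hedged illustration of the two readings (the example table is made up):
> {code:java}
> // Behavior being reverted: a parenthesized star was rewritten to a struct,
> // producing one struct column, i.e. named_struct('a', a, 'b', b) = {1, 2}.
> // Restored behavior: the parentheses are simply ignored, so the query
> // returns two columns, exactly like a plain SELECT *.
> spark.sql("SELECT (*) FROM VALUES (1, 2) AS t(a, b)").show()
> {code}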



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47802) Revert mapping ( star ) to named_struct ( star )

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47802:
---

Assignee: Serge Rielau

> Revert mapping ( star ) to named_struct ( star )
> 
>
> Key: SPARK-47802
> URL: https://issues.apache.org/jira/browse/SPARK-47802
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> Turning a star within parentheses into named_struct(star), as opposed to 
> ignoring the parentheses, turns out to be riskier than anticipated. Given 
> that this was done solely for consistency with (c1, c2, ...), it's best not 
> to go there at all.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47775) Support remaining scalar types in the variant spec.

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47775:
---

Assignee: Chenhao Li

> Support remaining scalar types in the variant spec.
> ---
>
> Key: SPARK-47775
> URL: https://issues.apache.org/jira/browse/SPARK-47775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47775) Support remaining scalar types in the variant spec.

2024-04-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47775.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45945
[https://github.com/apache/spark/pull/45945]

> Support remaining scalar types in the variant spec.
> ---
>
> Key: SPARK-47775
> URL: https://issues.apache.org/jira/browse/SPARK-47775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47682) Support cast from variant.

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47682.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45807
[https://github.com/apache/spark/pull/45807]

> Support cast from variant.
> --
>
> Key: SPARK-47682
> URL: https://issues.apache.org/jira/browse/SPARK-47682
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47504) Resolve AbstractDataType simpleStrings for StringTypeCollated

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47504.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45694
[https://github.com/apache/spark/pull/45694]

> Resolve AbstractDataType simpleStrings for StringTypeCollated
> -
>
> Key: SPARK-47504
> URL: https://issues.apache.org/jira/browse/SPARK-47504
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *SPARK-47296* introduced a change to fail all unsupported functions. As a 
> result, the expected *inputTypes* in *ExpectsInputTypes* had to be changed. 
> This introduced a user-facing change that prints *"STRING_ANY_COLLATION"* in 
> places where *"STRING"* was printed before when an error occurred. 
> Concretely, if we get an Int input where *StringTypeAnyCollation* was 
> expected, we throw this faulty message to users.
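> A hedged illustration of the symptom (the function choice and exact error 
> text are illustrative, not from the report):
> {code:java}
> // Passing an Int where a collation-aware string parameter is expected
> // surfaced the internal type name in the user-facing error message.
> spark.sql("SELECT startswith(42, 'a')")
> // observed: ... requires "STRING_ANY_COLLATION", got "INT" ...
> // expected: ... requires "STRING", got "INT" ...
> {code}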



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47681) Add schema_of_variant expression.

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47681.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45806
[https://github.com/apache/spark/pull/45806]

> Add schema_of_variant expression.
> -
>
> Key: SPARK-47681
> URL: https://issues.apache.org/jira/browse/SPARK-47681
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47657) Implement filter pushdown support for collation per file source

2024-04-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47657:
---

Assignee: Stefan Kandic

> Implement filter pushdown support for collation per file source
> ---
>
> Key: SPARK-47657
> URL: https://issues.apache.org/jira/browse/SPARK-47657
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47657) Implement filter pushdown support for collation per file source

2024-04-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47657.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45782
[https://github.com/apache/spark/pull/45782]

> Implement filter pushdown support for collation per file source
> ---
>
> Key: SPARK-47657
> URL: https://issues.apache.org/jira/browse/SPARK-47657
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47713) Fix a self-join failure

2024-04-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47713.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45846
[https://github.com/apache/spark/pull/45846]

> Fix a self-join failure
> ---
>
> Key: SPARK-47713
> URL: https://issues.apache.org/jira/browse/SPARK-47713
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47289) Allow extensions to log extended information in explain plan

2024-04-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47289.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45488
[https://github.com/apache/spark/pull/45488]

> Allow extensions to log extended information in explain plan
> 
>
> Key: SPARK-47289
> URL: https://issues.apache.org/jira/browse/SPARK-47289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> With session extensions, Spark planning can be extended to apply additional 
> rules and modify the execution plan. If an extension replaces a node in the 
> plan, the new node will be displayed in the plan. However, it is sometimes 
> useful for extensions to provide extended information to the end user to 
> explain the impact of the extension. For instance, an extension may 
> automatically enable or disable a feature it provides, and can surface this 
> extended information in the plan.
> The proposal is to optionally turn on extended plan information from 
> extensions. Extensions can add additional planning information via a new 
> interface that internally uses a new TreeNodeTag, say 'explainPlan'.
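> A hedged sketch of what such tagging could look like (the tag name 
> 'explainPlan' comes from the proposal; the helper and its wiring into the 
> explain output are assumptions):
> {code:java}
> import org.apache.spark.sql.catalyst.trees.TreeNodeTag
> import org.apache.spark.sql.execution.SparkPlan
>
> // The extension attaches free-form explain info to the node it rewrote;
> // explain output could then optionally render it next to the node.
> val explainPlanTag = TreeNodeTag[String]("explainPlan")
>
> def annotate(plan: SparkPlan, info: String): SparkPlan = {
>   plan.setTagValue(explainPlanTag, info)
>   plan
> }
> {code}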



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47289) Allow extensions to log extended information in explain plan

2024-04-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47289:
---

Assignee: Parth Chandra

> Allow extensions to log extended information in explain plan
> 
>
> Key: SPARK-47289
> URL: https://issues.apache.org/jira/browse/SPARK-47289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: pull-request-available
>
> With session extensions, Spark planning can be extended to apply additional 
> rules and modify the execution plan. If an extension replaces a node in the 
> plan, the new node will be displayed in the plan. However, it is sometimes 
> useful for extensions to provide extended information to the end user to 
> explain the impact of the extension. For instance, an extension may 
> automatically enable or disable a feature it provides, and can surface this 
> extended information in the plan.
> The proposal is to optionally turn on extended plan information from 
> extensions. Extensions can add additional planning information via a new 
> interface that internally uses a new TreeNodeTag, say 'explainPlan'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47081) Support Query Execution Progress Messages

2024-04-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47081.
-
Resolution: Fixed

Issue resolved by pull request 45150
[https://github.com/apache/spark/pull/45150]

> Support Query Execution Progress Messages
> -
>
> Key: SPARK-47081
> URL: https://issues.apache.org/jira/browse/SPARK-47081
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark Connect should support reporting basic query progress to the client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47081) Support Query Execution Progress Messages

2024-04-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47081:
---

Assignee: Martin Grund

> Support Query Execution Progress Messages
> -
>
> Key: SPARK-47081
> URL: https://issues.apache.org/jira/browse/SPARK-47081
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark Connect should support reporting basic query progress to the client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-04-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47210:
---

Assignee: Mihailo Milosevic

> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will 
> be:
>  - if there are explicit collations and all of them are equal, that 
> collation will be the output
>  - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
>  - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
>  - if there are no explicit collations and multiple non-default implicit 
> ones, `COLLATION_MISMATCH.IMPLICIT` will be thrown
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations, 
> and to have a way of explicitly choosing the collation we want to use.
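> A hedged sketch of the resolution rules above (the collation names are 
> illustrative):
> {code:java}
> // Two different explicit collations: COLLATION_MISMATCH.EXPLICIT is thrown.
> spark.sql("SELECT 'a' COLLATE UNICODE_CI = 'A' COLLATE UTF8_BINARY")
>
> // A single explicit collation wins over implicit and default ones.
> spark.sql("SELECT 'a' COLLATE UNICODE_CI = 'A'").show()   // true (case-insensitive)
> {code}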



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-04-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47210.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45383
[https://github.com/apache/spark/pull/45383]

> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will 
> be:
>  - if there are explicit collations and all of them are equal, that 
> collation will be the output
>  - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
>  - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
>  - if there are no explicit collations and multiple non-default implicit 
> ones, `COLLATION_MISMATCH.IMPLICIT` will be thrown
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations, 
> and to have a way of explicitly choosing the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47714) set spark.sql.legacy.timeParserPolicy to CORRECTED by default

2024-04-03 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47714:
---

 Summary: set spark.sql.legacy.timeParserPolicy to CORRECTED by 
default
 Key: SPARK-47714
 URL: https://issues.apache.org/jira/browse/SPARK-47714
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47634) Legacy support for map normalization

2024-04-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47634.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45760
[https://github.com/apache/spark/pull/45760]

> Legacy support for map normalization
> 
>
> Key: SPARK-47634
> URL: https://issues.apache.org/jira/browse/SPARK-47634
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Stevo Mitric
>Assignee: Stevo Mitric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add legacy support for creating a map without normalizing keys before 
> inserting them in `ArrayBasedMapBuilder`.
>  
> The key normalization change can be found in SPARK-47563: 
> https://issues.apache.org/jira/browse/SPARK-47563
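> A hedged illustration of what key normalization means here, assuming it 
> covers floating-point negative zero as in SPARK-47563:
> {code:java}
> // With normalization, -0.0 is folded into 0.0 before insertion, so the
> // second key becomes a duplicate and the build fails fast under the
> // default duplicate-key policy.
> spark.sql("SELECT map(0.0D, 1, -0.0D, 2)").show()
> // normalized:      DUPLICATED_MAP_KEY error (0.0 inserted twice)
> // legacy behavior: keys kept as-is, 0.0 and -0.0 treated as distinct
> {code}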



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47689) Do not wrap query execution error during data writing

2024-04-02 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47689:
---

 Summary: Do not wrap query execution error during data writing
 Key: SPARK-47689
 URL: https://issues.apache.org/jira/browse/SPARK-47689
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47551) Add variant_get expression.

2024-04-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47551.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45708
[https://github.com/apache/spark/pull/45708]

> Add variant_get expression.
> ---
>
> Key: SPARK-47551
> URL: https://issues.apache.org/jira/browse/SPARK-47551
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46840) Collation benchmarking

2024-04-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46840.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45453
[https://github.com/apache/spark/pull/45453]

> Collation benchmarking
> --
>
> Key: SPARK-46840
> URL: https://issues.apache.org/jira/browse/SPARK-46840
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46840) Collation benchmarking

2024-04-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46840:
---

Assignee: Gideon P

> Collation benchmarking
> --
>
> Key: SPARK-46840
> URL: https://issues.apache.org/jira/browse/SPARK-46840
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47569) Disallow comparing variant.

2024-04-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47569.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45726
[https://github.com/apache/spark/pull/45726]

> Disallow comparing variant.
> ---
>
> Key: SPARK-47569
> URL: https://issues.apache.org/jira/browse/SPARK-47569
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47572) Enforce Window partitionSpec is orderable.

2024-03-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47572.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45730
[https://github.com/apache/spark/pull/45730]

> Enforce Window partitionSpec is orderable.
> --
>
> Key: SPARK-47572
> URL: https://issues.apache.org/jira/browse/SPARK-47572
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.1, 3.3.4
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47572) Enforce Window partitionSpec is orderable.

2024-03-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47572:
---

Assignee: Chenhao Li

> Enforce Window partitionSpec is orderable.
> --
>
> Key: SPARK-47572
> URL: https://issues.apache.org/jira/browse/SPARK-47572
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.1, 3.3.4
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47511) Canonicalize With expressions by re-assigning IDs

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47511.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45649
[https://github.com/apache/spark/pull/45649]

> Canonicalize With expressions by re-assigning IDs
> -
>
> Key: SPARK-47511
> URL: https://issues.apache.org/jira/browse/SPARK-47511
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The current canonicalization of `With` expressions takes into account the 
> IDs of the common expressions, which come from a global, monotonically 
> increasing counter. This means that semantically identical queries containing 
> `With` expressions (e.g. `NULLIF` expressions) can canonicalize differently.
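> A hedged sketch of the symptom (`NULLIF` is rewritten into a `With` 
> expression internally):
> {code:java}
> // Each analysis pass allocates fresh common-expression IDs, so two
> // identical plans may not compare equal after canonicalization.
> val p1 = spark.sql("SELECT nullif(id, 0) FROM range(5)").queryExecution.analyzed
> val p2 = spark.sql("SELECT nullif(id, 0) FROM range(5)").queryExecution.analyzed
> println(p1.canonicalized == p2.canonicalized)   // could be false before the fix
> {code}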



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47525) Support subquery correlation joining on map attributes

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47525.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45673
[https://github.com/apache/spark/pull/45673]

> Support subquery correlation joining on map attributes
> --
>
> Key: SPARK-47525
> URL: https://issues.apache.org/jira/browse/SPARK-47525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, when a subquery is correlated on a condition like `outer_map[1] = 
> inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
> which is unsupported, so the query cannot run - for example:
>  
> {code:java}
> scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")
> scala> sql("select v1.value[0] from v v1 where v1.value[0] > (select 
> avg(v2.value[0]) from v v2 where v1.value[1] = v2.value[1])").explain
> org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
>  Unsupported subquery expression: Correlated column reference 'v1.value' 
> cannot be map type. SQLSTATE: 0A000; line 1 pos 49
> at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
> ... {code}
> However, if we rewrite the query to pull out the map access `outer_map[1]` 
> into the outer plan, it succeeds:
>  
> {code:java}
> scala> sql("""with tmp as (
> select value[0] as value0, value[1] as value1 from v
> )
> select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from 
> tmp v2 where v1.value1 = v2.value1)""").explain {code}
> Another point that can be improved is that, even if the data type supports 
> join, we still don’t need to join on the full attribute, and we can get a 
> better plan by doing the same rewrite to pull out the extract expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47525) Support subquery correlation joining on map attributes

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47525:
---

Assignee: Jack Chen

> Support subquery correlation joining on map attributes
> --
>
> Key: SPARK-47525
> URL: https://issues.apache.org/jira/browse/SPARK-47525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when a subquery is correlated on a condition like `outer_map[1] = 
> inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
> which is unsupported, so the query cannot run - for example:
>  
> {code:java}
> scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")
> scala> sql("select v1.value[0] from v v1 where v1.value[0] > (select 
> avg(v2.value[0]) from v v2 where v1.value[1] = v2.value[1])").explain
> org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
>  Unsupported subquery expression: Correlated column reference 'v1.value' 
> cannot be map type. SQLSTATE: 0A000; line 1 pos 49
> at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
> ... {code}
> However, if we rewrite the query to pull out the map access `outer_map[1]` 
> into the outer plan, it succeeds:
>  
> {code:java}
> scala> sql("""with tmp as (
> select value[0] as value0, value[1] as value1 from v
> )
> select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from 
> tmp v2 where v1.value1 = v2.value1)""").explain {code}
> Another point that can be improved is that, even if the data type supports 
> join, we still don’t need to join on the full attribute, and we can get a 
> better plan by doing the same rewrite to pull out the extract expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


