[jira] [Resolved] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48031. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46267 [https://github.com/apache/spark/pull/46267] > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to give views the ability to react to changes in query resolution in > ways other than simply failing the view. > For example, we want a view to be able to compensate for type changes by > casting the query result to the view column types, or to absorb changes in > column arity in the underlying query. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
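For illustration, view DDL with such options might look as follows; the WITH SCHEMA keywords below are an assumption for sketching purposes, not syntax confirmed by the PR:

{code:sql}
-- Fail view resolution when the underlying query's schema drifts
CREATE VIEW v_strict WITH SCHEMA BINDING AS SELECT id FROM t;

-- Compensate for type changes by casting the query result to the view column types
CREATE VIEW v_cast WITH SCHEMA COMPENSATION AS SELECT id FROM t;

-- Adopt type and column arity changes from the underlying query
CREATE VIEW v_evolve WITH SCHEMA EVOLUTION AS SELECT * FROM t;
{code}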
[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
Wenchen Fan created SPARK-48260: --- Summary: disable output committer coordination in one test of ParquetIOSuite Key: SPARK-48260 URL: https://issues.apache.org/jira/browse/SPARK-48260 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary
Wenchen Fan created SPARK-48252: --- Summary: Update CommonExpressionRef when necessary Key: SPARK-48252 URL: https://issues.apache.org/jira/browse/SPARK-48252 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48146: --- Assignee: Kelvin Jiang > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48146. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46443 [https://github.com/apache/spark/pull/46443] > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48158) XML expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48158. - Fix Version/s: 4.0.0 Assignee: Uroš Bojanić Resolution: Fixed > XML expressions (all collations) > > > Key: SPARK-48158 > URL: https://issues.apache.org/jira/browse/SPARK-48158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for *XML* built-in string functions in Spark > ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm > the expected behaviour of these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *xmlExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *XML* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class.
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
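As a sketch of the pass-through behaviour described above, the XML functions would be expected to accept collated string arguments unchanged; the collation name and SQL-level availability of these functions are assumptions here:

{code:sql}
-- Collated input strings should pass through XML schema inference and parsing untouched
SELECT schema_of_xml('<p><a>1</a></p>' COLLATE UTF8_LCASE);
SELECT from_xml('<p><a>1</a></p>' COLLATE UTF8_LCASE, 'a INT');
{code}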
[jira] [Assigned] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48222: --- Assignee: Nicholas Chammas > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48222. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46512 [https://github.com/apache/spark/pull/46512] > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47409: --- Assignee: David Milicevic > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm the expected behaviour of > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
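To illustrate the intended semantics, under a lowercase collation the trim characters would be expected to match case-insensitively; the collation name below is an assumption:

{code:sql}
-- With a lowercase collation, 'x' and 'X' should count as the same trim character
SELECT TRIM(BOTH 'x' FROM 'xXabcXx' COLLATE UTF8_LCASE);
{code}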
[jira] [Resolved] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47409. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46206 [https://github.com/apache/spark/pull/46206] > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm the expected behaviour of > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47421. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46460 [https://github.com/apache/spark/pull/46460] > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47421: --- Assignee: Uroš Bojanić > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47354. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46424 [https://github.com/apache/spark/pull/46424] > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47354: --- Assignee: Uroš Bojanić > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48186. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46458 [https://github.com/apache/spark/pull/46458] > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48186: --- Assignee: Uroš Bojanić > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48197) avoid assert error for invalid lambda function
[ https://issues.apache.org/jira/browse/SPARK-48197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48197. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46475 [https://github.com/apache/spark/pull/46475] > avoid assert error for invalid lambda function > -- > > Key: SPARK-48197 > URL: https://issues.apache.org/jira/browse/SPARK-48197 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48204) fix release script for Spark 4.0+
Wenchen Fan created SPARK-48204: --- Summary: fix release script for Spark 4.0+ Key: SPARK-48204 URL: https://issues.apache.org/jira/browse/SPARK-48204 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48161) JSON expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48161. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46462 [https://github.com/apache/spark/pull/46462] > JSON expressions (all collations) > - > > Key: SPARK-48161 > URL: https://issues.apache.org/jira/browse/SPARK-48161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48188) Consistently use normalized plan for cache
Wenchen Fan created SPARK-48188: --- Summary: Consistently use normalized plan for cache Key: SPARK-48188 URL: https://issues.apache.org/jira/browse/SPARK-48188 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47297: --- Assignee: Uroš Bojanić > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47297. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46423 [https://github.com/apache/spark/pull/46423] > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48173) CheckAnalysis should see the entire query plan
Wenchen Fan created SPARK-48173: --- Summary: CheckAnalysis should see the entire query plan Key: SPARK-48173 URL: https://issues.apache.org/jira/browse/SPARK-48173 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48143: --- Assignee: Vladimir Golubev > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48143. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46400 [https://github.com/apache/spark/pull/46400] > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
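For context, PERMISSIVE mode keeps partially-malformed records instead of failing the query, which is why per-record exception construction sits on the hot path; a minimal illustration:

{code:sql}
-- In PERMISSIVE mode (the default), a malformed field becomes NULL rather
-- than failing the query; each such field previously paid the cost of
-- constructing a full exception internally
SELECT from_csv('1,not_a_number', 'a INT, b INT', map('mode', 'PERMISSIVE'));
{code}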
[jira] [Assigned] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47267: --- Assignee: Uroš Bojanić > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47267. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46422 [https://github.com/apache/spark/pull/46422] > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
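The behaviour this ticket asks for is that strings which compare equal under a collation also hash equally; a sketch of the expectation (collation name assumed):

{code:sql}
-- Under a case-insensitive collation, 'spark' and 'SPARK' compare equal,
-- so the hash functions should produce the same value for both
SELECT hash('spark' COLLATE UTF8_LCASE) = hash('SPARK' COLLATE UTF8_LCASE);
{code}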
[jira] [Resolved] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48166. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46428 [https://github.com/apache/spark/pull/46428] > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > BadRecordException should not be used as user-facing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48166: --- Assignee: Vladimir Golubev > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > > BadRecordException should not be used as user-facing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-48027: Affects Version/s: (was: 3.5.1) (was: 3.4.3) > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, table_b is injected as table_c's runtime filter, but table_b's > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a Union, the result is None, and it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48027. - Fix Version/s: 4.0.0 Assignee: angerszhu Resolution: Fixed > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, table_b is injected as table_c's runtime filter, but table_b's > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a Union, the result is None, and it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47359. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45820 [https://github.com/apache/spark/pull/45820] > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
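As a rough illustration of what collation-aware *StringTranslate* semantics mean, here is a plain-Python sketch that emulates a case-insensitive collation with `str.casefold()`. Spark's actual implementation is expected to build on ICU's _StringSearch_, and the function name below is invented for the example:

```python
# Hypothetical sketch: translate() under a UTF8_LCASE-like collation,
# where match characters compare case-insensitively.
def translate_ci(source: str, matching: str, replace: str) -> str:
    # Map each casefolded match char to its replacement; characters in
    # `matching` beyond len(replace) are deleted (mapped to None).
    mapping = {}
    for i, ch in enumerate(matching):
        mapping.setdefault(ch.casefold(),
                           replace[i] if i < len(replace) else None)
    out = []
    for ch in source:
        key = ch.casefold()
        if key in mapping:
            if mapping[key] is not None:
                out.append(mapping[key])     # replace
        else:
            out.append(ch)                   # pass through unchanged
    return "".join(out)

# 'T' and 't' both match 't' (deleted); 'r'->'1', 'n'->'2', 'l'->'3'.
print(translate_ci("Translate", "rnlt", "123"))  # -> 1a2s3ae
```

Real ICU collations cover far more than case folding (accents, contractions, locale rules), which is why the ticket points at _StringSearch_ rather than a character-level map.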
[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47359: --- Assignee: Milan Dankovic > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48003) Hll sketch aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48003. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46241 [https://github.com/apache/spark/pull/46241] > Hll sketch aggregate support for strings with collation > --- > > Key: SPARK-48003 > URL: https://issues.apache.org/jira/browse/SPARK-48003 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47566. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45725 [https://github.com/apache/spark/pull/45725] > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47566: --- Assignee: Milan Dankovic > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48033. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46269 [https://github.com/apache/spark/pull/46269] > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, default columns that have a default of a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time.
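The ordering bug described above (constant folding running before `RuntimeReplaceable` expressions have been rewritten) can be sketched with a toy expression tree in Python. The class and function names are invented for illustration and do not correspond to Spark's real classes:

```python
# Toy model: a node that must be rewritten before it can be evaluated.
class ParseJson:                       # stands in for a RuntimeReplaceable expr
    def __init__(self, arg): self.arg = arg
    def eval(self):
        raise RuntimeError("RuntimeReplaceable: cannot evaluate directly")

class JsonToStructs:                   # the concrete replacement expression
    def __init__(self, arg): self.arg = arg
    def eval(self): return {"parsed": self.arg}

def replace_runtime_replaceable(expr):
    # The analyzer step that swaps in the real implementation.
    return JsonToStructs(expr.arg) if isinstance(expr, ParseJson) else expr

def constant_fold(expr):
    # Eagerly evaluates the default-value expression.
    return expr.eval()

expr = ParseJson("1")

# Buggy order: fold first, replace later -> fails.
try:
    constant_fold(expr)
except RuntimeError as e:
    print("fold-first:", e)

# Fixed order: replace first, then fold -> succeeds.
print("replace-first:", constant_fold(replace_runtime_replaceable(expr)))
```

The fix amounts to running the replacement step before the command constant-folds the default expression.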
[jira] [Assigned] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48033: --- Assignee: Richard Chen > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > > > Currently, default columns that have a default of a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time.
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47741: --- Assignee: Milan Stefanovic > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class.
[jira] [Resolved] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47741. - Resolution: Fixed Issue resolved by pull request 45896 [https://github.com/apache/spark/pull/45896] > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class.
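The proposed fix can be sketched in Python with a toy recursive-descent parser: catch the stack overflow (a `RecursionError` in Python, a `StackOverflowError` on the JVM) and rethrow it as a parser exception carrying an error class. All names here are hypothetical, not Spark's actual API:

```python
import sys

class ParseException(Exception):
    def __init__(self, error_class, message):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class

def parse_expr(s: str, pos: int = 0):
    # Minimal recursive-descent parser for nested parentheses around a digit.
    if s[pos] == "(":
        inner, pos = parse_expr(s, pos + 1)
        assert s[pos] == ")"
        return inner, pos + 1
    return int(s[pos]), pos + 1

def parse(s: str):
    try:
        return parse_expr(s)[0]
    except RecursionError:
        # Convert the raw overflow into a proper parser error with a class.
        raise ParseException("QUERY_TOO_COMPLEX",
                             "The statement is too complex to parse") from None

# A query nested deeply enough to blow the stack.
depth = sys.getrecursionlimit() * 2
query = "(" * depth + "1" + ")" * depth
try:
    parse(query)
except ParseException as e:
    print(e)
```

A shallow query like `"((1))"` still parses normally; only the pathological depth is converted into the classed error.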
[jira] [Resolved] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
[ https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47148. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45234 [https://github.com/apache/spark/pull/45234] > Avoid to materialize AQE ExchangeQueryStageExec on the cancellation > --- > > Key: SPARK-47148 > URL: https://issues.apache.org/jira/browse/SPARK-47148 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 4.0.0 >Reporter: Eren Avsarogullari >Assignee: Eren Avsarogullari >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on the > cancellation. This causes unnecessary stage materialization by submitting > Shuffle Job and Broadcast Job. Under normal circumstances, if the stage is > already non-materialized (a.k.a *ShuffleQueryStage.shuffleFuture* or > *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should > just be skipped without materializing it. 
> Please find a sample use-case: > *1- Stage Materialization Steps:* > When stage materialization fails: > {code:java} > 1.1- ShuffleQueryStage1 - is materialized successfully, > 1.2- ShuffleQueryStage2 - materialization failed, > 1.3- ShuffleQueryStage3 - not materialized yet, so > ShuffleQueryStage3.shuffleFuture is not initialized yet{code} > *2- Stage Cancellation Steps:* > {code:java} > 2.1- ShuffleQueryStage1 - is canceled because it is already materialized, > 2.2- ShuffleQueryStage2 - is an earlyFailedStage, so it is currently skipped > by AQE as default because it could not be materialized, > 2.3- ShuffleQueryStage3 - Problem is here: this stage is not materialized yet, > but cancellation is still attempted, which forces the stage to be > materialized first.{code}
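The intended cancellation rule can be sketched in plain Python with a lazily created materialization future. This illustrates the rule only; it is not Spark's AQE code, and the names are made up:

```python
# A stage whose materialization future is created lazily, like
# ShuffleQueryStage.shuffleFuture / BroadcastQueryStage.broadcastFuture.
class QueryStage:
    def __init__(self, name):
        self.name = name
        self._future = None            # created only when a job is submitted

    def materialize(self):
        self._future = f"{self.name}-job"

    @property
    def is_materialization_started(self):
        return self._future is not None

    def cancel(self):
        # The fix: never force a job submission just to cancel it.
        if not self.is_materialization_started:
            return f"{self.name}: skipped (never materialized)"
        return f"{self.name}: cancelled"

s1, s2, s3 = QueryStage("stage1"), QueryStage("stage2"), QueryStage("stage3")
s1.materialize()                       # stage1 ran; stage2/stage3 never started
print([s.cancel() for s in (s1, s2, s3)])
```

Only stage1, whose future exists, is actually cancelled; the others are skipped instead of triggering unnecessary shuffle or broadcast jobs.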
[jira] [Resolved] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47567. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45791 [https://github.com/apache/spark/pull/45791] > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* function so > that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47567: --- Assignee: Milan Dankovic > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* function so > that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47939. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46209 [https://github.com/apache/spark/pull/46209] > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation
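Conceptually, the fix requires positional `?` parameters to be bound even when the query is wrapped in a prefix command like DESCRIBE or EXPLAIN. A naive plain-Python sketch of positional binding follows; Spark actually substitutes parameters into the parsed plan, not into the SQL text, so this is an analogy only:

```python
# Hypothetical textual binder: replace each positional "?" with the
# corresponding literal, regardless of any command prefix.
def bind_positional(sql: str, args) -> str:
    parts = sql.split("?")
    if len(parts) - 1 != len(args):
        raise ValueError("UNBOUND_SQL_PARAMETER: argument count mismatch")
    out = [parts[0]]
    for lit, rest in zip(args, parts[1:]):
        out.append(repr(lit))          # naive literal rendering, demo only
        out.append(rest)
    return "".join(out)

# All three forms bind the same way once binding happens before analysis.
print(bind_positional("select ?", [1]))            # select 1
print(bind_positional("describe select ?", [1]))   # describe select 1
print(bind_positional("explain select ?", [1]))    # explain select 1
```

The bug was that DESCRIBE/EXPLAIN plans reached the analyzer before binding, leaving an unbound `posparameter` node behind.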
[jira] [Assigned] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47939: --- Assignee: Vladimir Golubev > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation
[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47927: --- Assignee: Emil Ejbyfeldt > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---+ > |UDF(struct(value, value, value, value))| > +---+ > | {1, 0}| > +---+ > ++ > |struct(value, value)| > ++ > | {1, NULL}| > ++ {code} > So when the result is passed to the UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value.
[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47927. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46156 [https://github.com/apache/spark/pull/46156] > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---+ > |UDF(struct(value, value, value, value))| > +---+ > | {1, 0}| > +---+ > ++ > |struct(value, value)| > ++ > | {1, NULL}| > ++ {code} > So when the result is passed to the UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value.
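The mechanism behind the wrong `{1, 0}` row can be sketched in plain Python: reading a primitive slot without consulting the null mask returns the slot's default value (0) instead of null. This is an illustration of the mechanism, not Spark's encoder code:

```python
# After the outer join, the right-side "value" field has no match, so
# its physical int slot holds the default 0 and the null mask marks it.
values = [1, 0]            # physical int slots of the struct's fields
null_mask = [False, True]  # second field is NULL (no match on the right)

def extract_buggy(i):
    return values[i]                       # ignores the null mask

def extract_fixed(i):
    return None if null_mask[i] else values[i]

print([extract_buggy(i) for i in range(2)])  # [1, 0] -- 0 masquerades as data
print([extract_fixed(i) for i in range(2)])  # [1, None]
```

The fix makes the deserialization path that feeds the UDF honor the post-join nullability, so the second field surfaces as `None` rather than `0`.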
[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48019: --- Assignee: Gene Pang > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}}. > In addition to the possible exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost.
[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48019. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46254 [https://github.com/apache/spark/pull/46254] > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}}. > In addition to the possible exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost.
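The described fix can be sketched in plain Python: when decoding a dictionary-encoded vector, null entries must be skipped entirely, since their dictionary ids are meaningless and may be out of range. The data values below are made up for illustration:

```python
# A dictionary-encoded float vector: each row stores an id into the
# dictionary, except null rows, whose id slot holds garbage.
dictionary = [10.5, 20.5]
ids = [0, 1, 7, 0]            # id 7 is garbage: that entry is null
null_mask = [False, False, True, False]

def get_floats_buggy():
    # Decodes every id, including the null row's garbage id.
    return [dictionary[i] for i in ids]          # IndexError on the null slot

def get_floats_fixed():
    # Consults the null mask first; nulls never touch the dictionary.
    return [None if null else dictionary[i]
            for i, null in zip(ids, null_mask)]

try:
    get_floats_buggy()
except IndexError:
    print("buggy decode: IndexError")
print("fixed decode:", get_floats_fixed())
```

The same mask-first rule applies to the `copy()` problem: a copy that reproduces only the primitive slots, without the null mask, turns every null into the slot's stale value.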
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47476: --- Assignee: Uroš Bojanić > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour is for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45704 [https://github.com/apache/spark/pull/45704] > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour is for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
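As a rough illustration of what a collation-aware StringReplace must do, the sketch below performs case-insensitive replacement while preserving the unmatched portions of the source string verbatim. Python's `casefold` stands in for a lowercase collation here (the index arithmetic assumes case folding preserves string length, which holds for ASCII); Spark's actual implementation builds on ICU's StringSearch instead, and the function name is illustrative:

```python
def lcase_replace(src, search, replace):
    """Replace all case-insensitive matches of `search` in `src`,
    keeping the unmatched portions of the original string intact.
    Illustrative sketch only: str.casefold() approximates a lowercase
    collation, and we assume folding does not change string length."""
    if not search:
        return src
    out, i = [], 0
    low_src, low_search = src.casefold(), search.casefold()
    while True:
        j = low_src.find(low_search, i)
        if j < 0:
            out.append(src[i:])       # tail with no further matches
            return "".join(out)
        out.append(src[i:j])          # unmatched prefix, original casing
        out.append(replace)
        i = j + len(search)           # skip past the matched region
```

A real collation-aware matcher has to locate match boundaries in the original string (ICU StringSearch reports them directly), since folding can change lengths for non-ASCII text.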
[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47351. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46165 [https://github.com/apache/spark/pull/46165] > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47351: --- Assignee: Uroš Bojanić > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47350. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46158 [https://github.com/apache/spark/pull/46158] > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47922) Implement try_parse_json
[ https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47922. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46141 [https://github.com/apache/spark/pull/46141] > Implement try_parse_json > > > Key: SPARK-47922 > URL: https://issues.apache.org/jira/browse/SPARK-47922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement try_parse_json expression that runs parse_json on valid string > inputs and returns null when the input string is malformed. Note that this > expression also only supports string input types. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
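The described semantics (parse valid JSON input, return NULL instead of failing on malformed input) can be sketched in a few lines. This only illustrates the contract: Spark's implementation parses into its VARIANT type rather than plain Python objects, and `None` stands in for SQL NULL here:

```python
import json

def try_parse_json(s):
    """Sketch of try_parse_json semantics: parse on valid input,
    return None (SQL NULL) instead of raising on malformed input.
    NULL input propagates to NULL output."""
    if s is None:
        return None
    try:
        return json.loads(s)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
```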
[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47958. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46187 [https://github.com/apache/spark/pull/46187] > Task Scheduler may not know about executor when using LocalSchedulerBackend > --- > > Key: SPARK-47958 > URL: https://issues.apache.org/jira/browse/SPARK-47958 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When using LocalSchedulerBackend, the task scheduler will not know about the > executor until a task is run, which can lead to unexpected behavior in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47764: --- Assignee: Bo Zhang > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > Shuffle dependencies are created by shuffle map stages, and consist of > files on disks and the corresponding references in Spark JVM heap memory. > Currently Spark cleans up unused shuffle dependencies through JVM GCs, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47764. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45930 [https://github.com/apache/spark/pull/45930] > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Shuffle dependencies are created by shuffle map stages, and consist of > files on disks and the corresponding references in Spark JVM heap memory. > Currently Spark cleans up unused shuffle dependencies through JVM GCs, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47418. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46181 [https://github.com/apache/spark/pull/46181] > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. 
> > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
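The observable behaviour of the optimized UTF8_BINARY_LCASE predicates can be sketched as follows, with Python's `casefold` approximating the lowercase conversion. The function names are illustrative; the actual optimization lives in Spark's UTF8String comparison code and avoids materializing fully lowercased copies:

```python
# Lowercase-collation string predicates, sketched with casefold().
# Spark's optimized version compares code points incrementally instead
# of building lowercased copies of both inputs.
def lcase_contains(left, right):
    return right.casefold() in left.casefold()

def lcase_starts_with(left, right):
    return left.casefold().startswith(right.casefold())

def lcase_ends_with(left, right):
    return left.casefold().endswith(right.casefold())
```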
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47873: --- Assignee: Stefan Kandic > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47873. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46083 [https://github.com/apache/spark/pull/46083] > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
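One way to picture the round-trip described in this ticket: demote the collated type to a plain string in the Hive schema, and record the collation in table properties so a reader can restore it. The property key and schema representation below are hypothetical, purely to illustrate the idea; they are not Spark's actual metadata format:

```python
# Hypothetical round-trip: collation info moves from the column type
# into table properties on write, and back into the type on read.
def write_schema(columns):
    """columns: {name: (base_type, collation_or_None)} -> (hive_schema, props)"""
    hive_schema, props = [], {}
    for name, (base_type, collation) in columns.items():
        hive_schema.append((name, base_type))  # Hive sees a plain type
        if collation is not None:
            # Hypothetical property key, not Spark's real one.
            props[f"spark.sql.collation.{name}"] = collation
    return hive_schema, props

def read_schema(hive_schema, props):
    """Restore collations from table properties when reading back."""
    return {name: (t, props.get(f"spark.sql.collation.{name}"))
            for name, t in hive_schema}
```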
[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference
Wenchen Fan created SPARK-47956: --- Summary: sanity check for unresolved LCA reference Key: SPARK-47956 URL: https://issues.apache.org/jira/browse/SPARK-47956 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47352. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46104 [https://github.com/apache/spark/pull/46104] > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47412. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46041 [https://github.com/apache/spark/pull/46041] > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47412: --- Assignee: Gideon P > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47411: --- Assignee: Milan Dankovic > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47411. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45643 [https://github.com/apache/spark/pull/45643] > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
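The SQL semantics that a collation-aware `instr` and `find_in_set` must preserve (1-based positions, 0 for no match) can be sketched as follows, again with Python's `casefold` standing in for a lowercase collation; the function names are illustrative and Spark's implementation uses ICU StringSearch for the matching itself:

```python
def lcase_instr(haystack, needle):
    """1-based position of the first case-insensitive match,
    0 if absent (mirroring SQL instr semantics)."""
    return haystack.casefold().find(needle.casefold()) + 1

def lcase_find_in_set(item, set_str):
    """1-based index of `item` within a comma-separated list, compared
    case-insensitively; 0 if not found or if `item` contains a comma."""
    if "," in item:
        return 0
    target = item.casefold()
    for i, elem in enumerate(set_str.split(","), start=1):
        if elem.casefold() == target:
            return i
    return 0
```

Note how the `find(...) + 1` trick maps Python's 0-based index (and its -1 "not found") directly onto SQL's 1-based convention.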
[jira] [Resolved] (SPARK-47900) Fix check for implicit collation
[ https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47900. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46116 [https://github.com/apache/spark/pull/46116] > Fix check for implicit collation > > > Key: SPARK-47900 > URL: https://issues.apache.org/jira/browse/SPARK-47900 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47413: --- Assignee: Gideon P > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what the > expected behaviour is for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47413. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46040 [https://github.com/apache/spark/pull/46040] > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *Substring* built-in string function in > Spark (including the *Right* and *Left* functions). First confirm what the > expected behaviour is for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how these functions should be used with collation in Spark SQL, and feel free to > use your chosen Spark SQL editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use cases > and implementations of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Substring*, > *Right*, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > the [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47902. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46120 [https://github.com/apache/spark/pull/46120] > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47902: --- Assignee: Aleksandar Tomic > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
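The foldable/unevaluable interaction described in SPARK-47902 can be illustrated with a toy constant folder. This is only a sketch: `Expr` and `fold` here are illustrative stand-ins, not Spark's Catalyst classes, and the real optimizer derives foldability from children rather than storing a flag.

```python
# Toy constant folder showing why "unevaluable implies not foldable" is a
# problem: a node flagged non-foldable blocks folding of any enclosing
# expression, so contexts that demand a constant argument start failing,
# even though a later rule would have replaced the node with a literal.
# Names are illustrative, not Spark's real classes.
from dataclasses import dataclass, field

@dataclass
class Expr:
    op: str                      # "lit", "add", or "current_time"
    children: list = field(default_factory=list)
    value: object = None
    foldable: bool = True        # can this subtree be reduced to a literal?

def fold(e: Expr) -> Expr:
    """Replace a foldable subtree with a literal; leave others untouched."""
    if e.op == "lit":
        return e
    kids = [fold(c) for c in e.children]
    if e.foldable and all(k.op == "lit" for k in kids) and e.op == "add":
        return Expr("lit", value=kids[0].value + kids[1].value)
    return Expr(e.op, kids, foldable=e.foldable)

# 1 + 2 folds to the literal 3...
assert fold(Expr("add", [Expr("lit", value=1), Expr("lit", value=2)])).value == 3

# ...but an expression containing a non-foldable node stays unfolded, so a
# context that requires a constant (like a seed for RAND()) would reject it.
ct = Expr("current_time", foldable=False)
assert fold(Expr("add", [ct, Expr("lit", value=1)])).op == "add"
```

The fix direction the ticket implies is to report such expressions as foldable, since the optimizer guarantees they become literals before evaluation.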
[jira] [Resolved] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46935. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44971 [https://github.com/apache/spark/pull/44971] > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46935: --- Assignee: Nicholas Chammas > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47463: Fix Version/s: 3.5.2 > An error occurred while pushing down the filter of if expression for iceberg > datasource. > > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > Reproduce: > {code:java} > create table t1(c1 int) using iceberg; > select * from > (select if(c1 = 1, c1, null) as c1 from t1) t > where t.c1 > 0; {code} > Error: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > optimization failed with an internal error. You hit a bug in Spark or the > Spark plugins you use. Please, report this bug to the corresponding > communities or vendors, and provide the full stack trace. 
> at > org.apache.spark.SparkException$.internalError(SparkException.scala:107) > at > org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536) > at > org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548) > at > org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144) > at > org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179) > at > org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238) > at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284) > at > org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580) > at > 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > at > org.apache.spark.sql.execution.datasourc
[jira] [Resolved] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47895. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46113 [https://github.com/apache/spark/pull/46113] > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47895: --- Assignee: Wenchen Fan > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47895: Summary: group by all should be idempotent (was: group by ordinal should be idempotent) > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47895) group by ordinal should be idempotent
Wenchen Fan created SPARK-47895: --- Summary: group by ordinal should be idempotent Key: SPARK-47895 URL: https://issues.apache.org/jira/browse/SPARK-47895 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47839: --- Assignee: Kelvin Jiang > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47839. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46034 [https://github.com/apache/spark/pull/46034] > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
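The NULLIF rewrite behind SPARK-47839 can be sketched in plain Python. NULLIF(x, y) naively expands to CASE WHEN x = y THEN NULL ELSE x END, which evaluates x twice; the With construct factors x out as a common expression so it is computed once. The bug was that this factored-out expression was pulled above an aggregate boundary, which is invalid; the sketch below only shows the intended semantics, not Spark's internals.

```python
# Minimal sketch of NULLIF semantics with the shared subexpression
# ("id + 1" in the reported query) evaluated exactly once.
# Illustrative only; not Spark's With/CommonExpressionRef machinery.
def nullif(x, y):
    common = x                     # evaluate the common expression once
    return None if common == y else common

# NULLIF(id + 1, 1) over ids 0..9: only id = 0 yields NULL (1 == 1).
results = [nullif(i + 1, 1) for i in range(10)]
assert results[0] is None
assert results[1:] == list(range(2, 11))
```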
[jira] [Resolved] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47846. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46046 [https://github.com/apache/spark/pull/46046] > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding support for the variant type in the from_json expression. > "select from_json('', 'variant')" should interpret json_string > as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47846: --- Assignee: Harsh Motwani > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Adding support for the variant type in the from_json expression. > "select from_json('', 'variant')" should interpret json_string > as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
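The idea behind reading JSON as a variant, as opposed to a fixed schema, resembles schema-less JSON parsing: types are discovered per value at read time instead of being declared up front. The snippet below uses Python's stdlib `json` purely as an analogy; it is not Spark's variant encoding or API.

```python
# Analogy for a "variant" schema in from_json: parse without committing to a
# fixed struct, keeping objects/arrays/scalars as dynamic values.
# Stdlib json only; not Spark's variant binary format.
import json

doc = '{"a": 1, "b": [true, null, "x"]}'
v = json.loads(doc)          # schema-less: shape is data-dependent

assert v["a"] == 1
assert v["b"] == [True, None, "x"]
assert isinstance(v, dict)   # not declared in advance, discovered on read
```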
[jira] [Resolved] (SPARK-47360) Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47360. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46003 [https://github.com/apache/spark/pull/46003] > Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck > (all collations) > -- > > Key: SPARK-47360 > URL: https://issues.apache.org/jira/browse/SPARK-47360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47416. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46078 [https://github.com/apache/spark/pull/46078] > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47863. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46097 [https://github.com/apache/spark/pull/46097] > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > *CollationAwareUTF8String.matchAt*, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffixes/prefixes) of different lengths can be equal in the context of > case-insensitive and lowercase collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains* > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith* > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47863: --- Assignee: Vladimir Golubev > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > *CollationAwareUTF8String.matchAt*, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffixes/prefixes) of different lengths can be equal in the context of > case-insensitive and lowercase collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains* > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith* > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
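The length mismatch this report describes is easy to demonstrate with Unicode case folding: "İ" (U+0130) case-folds to the two-codepoint sequence "i" + U+0307, so two strings can compare equal case-insensitively while having different lengths, and a suffix match that slices by the pattern's length picks the wrong substring. Python's `str.casefold` (a simple full case folding, used here as an approximation of a collation-aware comparison) shows the effect on the exact test strings from the report:

```python
# Why offset/length-based suffix matching fails under case-insensitive
# collations: case-folded equality does not preserve length.
s, suffix = "The İo", "i\u0307o"   # "İ" is U+0130; "i̇" is "i" + U+0307

# The string does end with the pattern under case-insensitive comparison...
assert s.casefold().endswith(suffix.casefold())

# ...but the matching suffix of s ("İo") is shorter than the pattern, so
# slicing s by len(suffix) grabs an extra character and the compare fails:
assert len("İo") != len(suffix)
assert s[-len(suffix):].casefold() != suffix.casefold()
```

This is why the fix routes the check through a collation-aware search (ICU's StringSearch) instead of comparing fixed-length prefixes/suffixes.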
[jira] [Assigned] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47822: --- Assignee: Harsh Motwani > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47822. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46017 [https://github.com/apache/spark/pull/46017] > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47821) Add is_variant_null expression
[ https://issues.apache.org/jira/browse/SPARK-47821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47821. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46011 [https://github.com/apache/spark/pull/46011] > Add is_variant_null expression > -- > > Key: SPARK-47821 > URL: https://issues.apache.org/jira/browse/SPARK-47821 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > adds a `is_variant_null` expression, which returns whether a given variant > value represents a variant null (note the difference between a variant null > and an engine null) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
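The variant-null versus engine-null distinction mentioned above is analogous to the difference between a stored JSON null and a value that is absent (or SQL NULL) altogether: a variant holding JSON null is a real, present value. The snippet below is only an analogy using stdlib `json`, not Spark's variant representation.

```python
# Analogy for variant null vs engine null: a parsed JSON null is a concrete
# stored value (None here), distinct from the value being missing entirely.
# Stdlib json only; not Spark's actual variant internals.
import json

row = {"v": json.loads("null")}   # a variant-like value holding JSON null

assert "v" in row and row["v"] is None   # present, and it *is* a null value
assert "w" not in row                    # absent: the engine-null analogue
```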
[jira] [Resolved] (SPARK-47867) Support Variant in JSON scan.
[ https://issues.apache.org/jira/browse/SPARK-47867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47867. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46071 [https://github.com/apache/spark/pull/46071] > Support Variant in JSON scan. > - > > Key: SPARK-47867 > URL: https://issues.apache.org/jira/browse/SPARK-47867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47417: --- Assignee: Nikola Mandic > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47417. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45933 [https://github.com/apache/spark/pull/45933] > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47356. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46061 [https://github.com/apache/spark/pull/46061] > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47356: --- Assignee: Mihailo Milosevic > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47420) Fix CollationSupport test output
[ https://issues.apache.org/jira/browse/SPARK-47420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47420. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46058 [https://github.com/apache/spark/pull/46058] > Fix CollationSupport test output > > > Key: SPARK-47420 > URL: https://issues.apache.org/jira/browse/SPARK-47420 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47769. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45934 [https://github.com/apache/spark/pull/45934] > Add schema_of_variant_agg expression. > - > > Key: SPARK-47769 > URL: https://issues.apache.org/jira/browse/SPARK-47769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-47769:
-----------------------------------
    Assignee: Chenhao Li

> Add schema_of_variant_agg expression.
> -------------------------------------
>
>                 Key: SPARK-47769
>                 URL: https://issues.apache.org/jira/browse/SPARK-47769
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Chenhao Li
>            Assignee: Chenhao Li
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Resolved] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-47463.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45589
[https://github.com/apache/spark/pull/45589]

> An error occurred while pushing down the filter of if expression for iceberg datasource.
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-47463
>                 URL: https://issues.apache.org/jira/browse/SPARK-47463
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>         Environment: Spark 3.5.0
> Iceberg 1.4.3
>            Reporter: Zhen Wang
>            Assignee: Zhen Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Reproduce:
> {code:java}
> create table t1(c1 int) using iceberg;
> select * from
>   (select if(c1 = 1, c1, null) as c1 from t1) t
> where t.c1 > 0;
> {code}
> Error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase optimization failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
>   at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
>   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
>   at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
>   at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
>   at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
>   at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: assertion failed
[jira] [Resolved] (SPARK-46810) Clarify error class terminology
[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-46810.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44902
[https://github.com/apache/spark/pull/44902]

> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> We use inconsistent terminology when talking about error classes. I'd like to get some clarity on that before contributing any potential improvements to this part of the documentation.
> Consider [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. It has several key pieces of hierarchical information that have inconsistent names throughout our documentation and codebase:
> * 42
> ** K01
> *** INCOMPLETE_TYPE_DEFINITION
> **** ARRAY
> **** MAP
> **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
> * [Over here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION we call that an "error class". So what exactly is a class, the 42 or the INCOMPLETE_TYPE_DEFINITION?
> * [Over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] we call K01 the "subclass". But [over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". So what exactly is a subclass?
> * [On this page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
> * Error class: 42
> * Error sub-class: K01
> * Error state: 42K01
> * Error condition: INCOMPLETE_TYPE_DEFINITION
> * Error sub-condition: ARRAY, MAP, STRUCT
> Pros:
> * This terminology seems (to me at least) the most natural and intuitive.
> * It aligns most closely to the SQL standard.
> Cons:
> * We use {{errorClass}} [all over our codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30] – literally in thousands of places – to refer to strings like INCOMPLETE_TYPE_DEFINITION.
> ** It's probably not practical to update all these usages to say {{errorCondition}} instead, so if we go with this approach there will be a divide between the terminology we use in user-facing documentation vs. what the code base uses.
> ** We can perhaps rename the existing {{error-classes.json}} to {{error-conditions.json}} but clarify the reason for this divide between code and user docs in the documentation for {{ErrorClassesJsonReader}}.
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
> * Error category: 42
> * Error sub-category: K01
> * Error state: 42K01
> * Error class: INCOMPLETE_TYPE_DEFINITION
> * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
> * We continue to use "error class" as we do today in our code base.
> * The change from calling "42" a "class" to a "category" is low impact and may not show up in user-facing documentation at all. (See my side note below.)
> Cons:
> * These terms do not align with the SQL standard.
> * We will have to retire the term "error condition", which we have [already used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md] in user-facing documentation.
> h1. Option 3: "Error Class" and "State Class"
> * SQL state class: 42
> * SQL state sub-class: K01
> * SQL state: 42K01
> * Error class: INCOMPLETE_TYPE_DEFINITION
> * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
> * We continue to use "error class" as we do today in our code base.
> * The change from
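[Editor's note] The hierarchy debated in SPARK-46810 corresponds to the nesting of entries in `error-classes.json`. A rough sketch of the shape of one entry, assuming the `subClass` and `sqlState` field names and the nesting implied by the README and JSON file linked in the issue; the message strings are placeholders, not copied from the real file:

```json
"INCOMPLETE_TYPE_DEFINITION" : {
  "message" : [ "..." ],
  "subClass" : {
    "ARRAY" : { "message" : [ "..." ] },
    "MAP" : { "message" : [ "..." ] },
    "STRUCT" : { "message" : [ "..." ] }
  },
  "sqlState" : "42K01"
}
```

Under any of the three options, only the labels for these levels change (class vs. category vs. condition); the JSON structure itself stays as it is.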