[jira] [Resolved] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48031. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46267 [https://github.com/apache/spark/pull/46267] > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to give views the ability to react to changes in query resolution in > ways other than simply failing the view. > For example, we want a view to be able to compensate for type changes by > casting the query result to the view column types, or to absorb changes in > column arity in the underlying query. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
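For illustration, view DDL with such options might look as follows; the WITH SCHEMA keywords below are an assumption for sketching purposes, not syntax confirmed by the PR:

{code:sql}
-- Fail view resolution when the underlying query's schema drifts
CREATE VIEW v_strict WITH SCHEMA BINDING AS SELECT id FROM t;

-- Compensate for type changes by casting the query result to the view column types
CREATE VIEW v_cast WITH SCHEMA COMPENSATION AS SELECT id FROM t;

-- Adopt type and column arity changes from the underlying query
CREATE VIEW v_evolve WITH SCHEMA EVOLUTION AS SELECT * FROM t;
{code}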
[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
Wenchen Fan created SPARK-48260: --- Summary: disable output committer coordination in one test of ParquetIOSuite Key: SPARK-48260 URL: https://issues.apache.org/jira/browse/SPARK-48260 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary
Wenchen Fan created SPARK-48252: --- Summary: Update CommonExpressionRef when necessary Key: SPARK-48252 URL: https://issues.apache.org/jira/browse/SPARK-48252 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48146: --- Assignee: Kelvin Jiang > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48146. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46443 [https://github.com/apache/spark/pull/46443] > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48158) XML expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48158. - Fix Version/s: 4.0.0 Assignee: Uroš Bojanić Resolution: Fixed > XML expressions (all collations) > > > Key: SPARK-48158 > URL: https://issues.apache.org/jira/browse/SPARK-48158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for *XML* built-in string functions in Spark > ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm > the expected behaviour of these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *xmlExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *XML* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class.
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
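As a sketch of the pass-through behaviour described above, the XML functions would be expected to accept collated string arguments unchanged; the collation name and SQL-level availability of these functions are assumptions here:

{code:sql}
-- Collated input strings should pass through XML schema inference and parsing untouched
SELECT schema_of_xml('<p><a>1</a></p>' COLLATE UTF8_LCASE);
SELECT from_xml('<p><a>1</a></p>' COLLATE UTF8_LCASE, 'a INT');
{code}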
[jira] [Assigned] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48222: --- Assignee: Nicholas Chammas > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48222. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46512 [https://github.com/apache/spark/pull/46512] > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47409: --- Assignee: David Milicevic > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm the expected behaviour of > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
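To illustrate the intended semantics, under a lowercase collation the trim characters would be expected to match case-insensitively; the collation name below is an assumption:

{code:sql}
-- With a lowercase collation, 'x' and 'X' should count as the same trim character
SELECT TRIM(BOTH 'x' FROM 'xXabcXx' COLLATE UTF8_LCASE);
{code}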
[jira] [Resolved] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47409. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46206 [https://github.com/apache/spark/pull/46206] > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm the expected behaviour of > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47421. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46460 [https://github.com/apache/spark/pull/46460] > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47421: --- Assignee: Uroš Bojanić > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47354. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46424 [https://github.com/apache/spark/pull/46424] > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47354: --- Assignee: Uroš Bojanić > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48186. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46458 [https://github.com/apache/spark/pull/46458] > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48186: --- Assignee: Uroš Bojanić > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48197) avoid assert error for invalid lambda function
[ https://issues.apache.org/jira/browse/SPARK-48197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48197. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46475 [https://github.com/apache/spark/pull/46475] > avoid assert error for invalid lambda function > -- > > Key: SPARK-48197 > URL: https://issues.apache.org/jira/browse/SPARK-48197 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48204) fix release script for Spark 4.0+
Wenchen Fan created SPARK-48204: --- Summary: fix release script for Spark 4.0+ Key: SPARK-48204 URL: https://issues.apache.org/jira/browse/SPARK-48204 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48161) JSON expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48161. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46462 [https://github.com/apache/spark/pull/46462] > JSON expressions (all collations) > - > > Key: SPARK-48161 > URL: https://issues.apache.org/jira/browse/SPARK-48161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48188) Consistently use normalized plan for cache
Wenchen Fan created SPARK-48188: --- Summary: Consistently use normalized plan for cache Key: SPARK-48188 URL: https://issues.apache.org/jira/browse/SPARK-48188 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47297: --- Assignee: Uroš Bojanić > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47297. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46423 [https://github.com/apache/spark/pull/46423] > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48173) CheckAnalysis should see the entire query plan
Wenchen Fan created SPARK-48173: --- Summary: CheckAnalysis should see the entire query plan Key: SPARK-48173 URL: https://issues.apache.org/jira/browse/SPARK-48173 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48143: --- Assignee: Vladimir Golubev > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48143. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46400 [https://github.com/apache/spark/pull/46400] > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
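For context, PERMISSIVE mode keeps partially-malformed records instead of failing the query, which is why per-record exception construction sits on the hot path; a minimal illustration:

{code:sql}
-- In PERMISSIVE mode (the default), a malformed field becomes NULL rather
-- than failing the query; each such field previously paid the cost of
-- constructing a full exception internally
SELECT from_csv('1,not_a_number', 'a INT, b INT', map('mode', 'PERMISSIVE'));
{code}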
[jira] [Assigned] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47267: --- Assignee: Uroš Bojanić > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47267. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46422 [https://github.com/apache/spark/pull/46422] > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
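The behaviour this ticket asks for is that strings which compare equal under a collation also hash equally; a sketch of the expectation (collation name assumed):

{code:sql}
-- Under a case-insensitive collation, 'spark' and 'SPARK' compare equal,
-- so the hash functions should produce the same value for both
SELECT hash('spark' COLLATE UTF8_LCASE) = hash('SPARK' COLLATE UTF8_LCASE);
{code}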
[jira] [Resolved] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48166. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46428 [https://github.com/apache/spark/pull/46428] > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > BadRecordException should not be used as user-facing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48166: --- Assignee: Vladimir Golubev > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > > BadRecordException should not be used as user-facing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-48027: Affects Version/s: (was: 3.5.1) (was: 3.4.3) > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, table_b is injected as table_c's runtime filter, but table_b's > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a Union, the result is None, and it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48027. - Fix Version/s: 4.0.0 Assignee: angerszhu Resolution: Fixed > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, table_b is injected as table_c's runtime filter, but table_b's > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a Union, the result is None, and it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47359. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45820 [https://github.com/apache/spark/pull/45820] > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
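As a rough illustration of what collation-aware *StringTranslate* semantics mean, here is a plain-Python sketch that emulates a case-insensitive collation with `str.casefold()`. Spark's actual implementation is expected to build on ICU's _StringSearch_, and the function name below is invented for the example:

```python
# Hypothetical sketch: translate() under a UTF8_LCASE-like collation,
# where match characters compare case-insensitively.
def translate_ci(source: str, matching: str, replace: str) -> str:
    # Map each casefolded match char to its replacement; characters in
    # `matching` beyond len(replace) are deleted (mapped to None).
    mapping = {}
    for i, ch in enumerate(matching):
        mapping.setdefault(ch.casefold(),
                           replace[i] if i < len(replace) else None)
    out = []
    for ch in source:
        key = ch.casefold()
        if key in mapping:
            if mapping[key] is not None:
                out.append(mapping[key])     # replace
        else:
            out.append(ch)                   # pass through unchanged
    return "".join(out)

# 'T' and 't' both match 't' (deleted); 'r'->'1', 'n'->'2', 'l'->'3'.
print(translate_ci("Translate", "rnlt", "123"))  # -> 1a2s3ae
```

Real ICU collations cover far more than case folding (accents, contractions, locale rules), which is why the ticket points at _StringSearch_ rather than a character-level map.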
[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47359: --- Assignee: Milan Dankovic > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48003) Hll sketch aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48003. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46241 [https://github.com/apache/spark/pull/46241] > Hll sketch aggregate support for strings with collation > --- > > Key: SPARK-48003 > URL: https://issues.apache.org/jira/browse/SPARK-48003 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47566. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45725 [https://github.com/apache/spark/pull/45725] > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47566: --- Assignee: Milan Dankovic > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48033. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46269 [https://github.com/apache/spark/pull/46269] > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, default columns that have a default of a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time.
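The ordering bug described above (constant folding running before `RuntimeReplaceable` expressions have been rewritten) can be sketched with a toy expression tree in Python. The class and function names are invented for illustration and do not correspond to Spark's real classes:

```python
# Toy model: a node that must be rewritten before it can be evaluated.
class ParseJson:                       # stands in for a RuntimeReplaceable expr
    def __init__(self, arg): self.arg = arg
    def eval(self):
        raise RuntimeError("RuntimeReplaceable: cannot evaluate directly")

class JsonToStructs:                   # the concrete replacement expression
    def __init__(self, arg): self.arg = arg
    def eval(self): return {"parsed": self.arg}

def replace_runtime_replaceable(expr):
    # The analyzer step that swaps in the real implementation.
    return JsonToStructs(expr.arg) if isinstance(expr, ParseJson) else expr

def constant_fold(expr):
    # Eagerly evaluates the default-value expression.
    return expr.eval()

expr = ParseJson("1")

# Buggy order: fold first, replace later -> fails.
try:
    constant_fold(expr)
except RuntimeError as e:
    print("fold-first:", e)

# Fixed order: replace first, then fold -> succeeds.
print("replace-first:", constant_fold(replace_runtime_replaceable(expr)))
```

The fix amounts to running the replacement step before the command constant-folds the default expression.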
[jira] [Assigned] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48033: --- Assignee: Richard Chen > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > > > Currently, default columns that have a default of a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time.
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47741: --- Assignee: Milan Stefanovic > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class.
[jira] [Resolved] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47741. - Resolution: Fixed Issue resolved by pull request 45896 [https://github.com/apache/spark/pull/45896] > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class.
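The proposed fix can be sketched in Python with a toy recursive-descent parser: catch the stack overflow (a `RecursionError` in Python, a `StackOverflowError` on the JVM) and rethrow it as a parser exception carrying an error class. All names here are hypothetical, not Spark's actual API:

```python
import sys

class ParseException(Exception):
    def __init__(self, error_class, message):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class

def parse_expr(s: str, pos: int = 0):
    # Minimal recursive-descent parser for nested parentheses around a digit.
    if s[pos] == "(":
        inner, pos = parse_expr(s, pos + 1)
        assert s[pos] == ")"
        return inner, pos + 1
    return int(s[pos]), pos + 1

def parse(s: str):
    try:
        return parse_expr(s)[0]
    except RecursionError:
        # Convert the raw overflow into a proper parser error with a class.
        raise ParseException("QUERY_TOO_COMPLEX",
                             "The statement is too complex to parse") from None

# A query nested deeply enough to blow the stack.
depth = sys.getrecursionlimit() * 2
query = "(" * depth + "1" + ")" * depth
try:
    parse(query)
except ParseException as e:
    print(e)
```

A shallow query like `"((1))"` still parses normally; only the pathological depth is converted into the classed error.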
[jira] [Resolved] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
[ https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47148. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45234 [https://github.com/apache/spark/pull/45234] > Avoid to materialize AQE ExchangeQueryStageExec on the cancellation > --- > > Key: SPARK-47148 > URL: https://issues.apache.org/jira/browse/SPARK-47148 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 4.0.0 >Reporter: Eren Avsarogullari >Assignee: Eren Avsarogullari >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on the > cancellation. This causes unnecessary stage materialization by submitting > Shuffle Job and Broadcast Job. Under normal circumstances, if the stage is > already non-materialized (a.k.a *ShuffleQueryStage.shuffleFuture* or > *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should > just be skipped without materializing it. 
> Please find a sample use-case: > *1- Stage Materialization Steps:* > When stage materialization fails: > {code:java} > 1.1- ShuffleQueryStage1 - is materialized successfully, > 1.2- ShuffleQueryStage2 - materialization failed, > 1.3- ShuffleQueryStage3 - not materialized yet, so > ShuffleQueryStage3.shuffleFuture is not initialized yet{code} > *2- Stage Cancellation Steps:* > {code:java} > 2.1- ShuffleQueryStage1 - is canceled because it is already materialized, > 2.2- ShuffleQueryStage2 - is an earlyFailedStage, so it is currently skipped > by AQE as default because it could not be materialized, > 2.3- ShuffleQueryStage3 - Problem is here: this stage is not materialized yet, > but cancellation is still attempted, which forces the stage to be > materialized first.{code}
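The intended cancellation rule can be sketched in plain Python with a lazily created materialization future. This illustrates the rule only; it is not Spark's AQE code, and the names are made up:

```python
# A stage whose materialization future is created lazily, like
# ShuffleQueryStage.shuffleFuture / BroadcastQueryStage.broadcastFuture.
class QueryStage:
    def __init__(self, name):
        self.name = name
        self._future = None            # created only when a job is submitted

    def materialize(self):
        self._future = f"{self.name}-job"

    @property
    def is_materialization_started(self):
        return self._future is not None

    def cancel(self):
        # The fix: never force a job submission just to cancel it.
        if not self.is_materialization_started:
            return f"{self.name}: skipped (never materialized)"
        return f"{self.name}: cancelled"

s1, s2, s3 = QueryStage("stage1"), QueryStage("stage2"), QueryStage("stage3")
s1.materialize()                       # stage1 ran; stage2/stage3 never started
print([s.cancel() for s in (s1, s2, s3)])
```

Only stage1, whose future exists, is actually cancelled; the others are skipped instead of triggering unnecessary shuffle or broadcast jobs.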
[jira] [Resolved] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47567. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45791 [https://github.com/apache/spark/pull/45791] > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* function so > that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47567: --- Assignee: Milan Dankovic > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* function so > that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47939. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46209 [https://github.com/apache/spark/pull/46209] > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation
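Conceptually, the fix requires positional `?` parameters to be bound even when the query is wrapped in a prefix command like DESCRIBE or EXPLAIN. A naive plain-Python sketch of positional binding follows; Spark actually substitutes parameters into the parsed plan, not into the SQL text, so this is an analogy only:

```python
# Hypothetical textual binder: replace each positional "?" with the
# corresponding literal, regardless of any command prefix.
def bind_positional(sql: str, args) -> str:
    parts = sql.split("?")
    if len(parts) - 1 != len(args):
        raise ValueError("UNBOUND_SQL_PARAMETER: argument count mismatch")
    out = [parts[0]]
    for lit, rest in zip(args, parts[1:]):
        out.append(repr(lit))          # naive literal rendering, demo only
        out.append(rest)
    return "".join(out)

# All three forms bind the same way once binding happens before analysis.
print(bind_positional("select ?", [1]))            # select 1
print(bind_positional("describe select ?", [1]))   # describe select 1
print(bind_positional("explain select ?", [1]))    # explain select 1
```

The bug was that DESCRIBE/EXPLAIN plans reached the analyzer before binding, leaving an unbound `posparameter` node behind.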
[jira] [Assigned] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47939: --- Assignee: Vladimir Golubev > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation
[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47927: --- Assignee: Emil Ejbyfeldt > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---+ > |UDF(struct(value, value, value, value))| > +---+ > | {1, 0}| > +---+ > ++ > |struct(value, value)| > ++ > | {1, NULL}| > ++ {code} > So when the result is passed to the UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value.
[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47927. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46156 [https://github.com/apache/spark/pull/46156] > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---+ > |UDF(struct(value, value, value, value))| > +---+ > | {1, 0}| > +---+ > ++ > |struct(value, value)| > ++ > | {1, NULL}| > ++ {code} > So when the result is passed to the UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value.
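The mechanism behind the wrong `{1, 0}` row can be sketched in plain Python: reading a primitive slot without consulting the null mask returns the slot's default value (0) instead of null. This is an illustration of the mechanism, not Spark's encoder code:

```python
# After the outer join, the right-side "value" field has no match, so
# its physical int slot holds the default 0 and the null mask marks it.
values = [1, 0]            # physical int slots of the struct's fields
null_mask = [False, True]  # second field is NULL (no match on the right)

def extract_buggy(i):
    return values[i]                       # ignores the null mask

def extract_fixed(i):
    return None if null_mask[i] else values[i]

print([extract_buggy(i) for i in range(2)])  # [1, 0] -- 0 masquerades as data
print([extract_fixed(i) for i in range(2)])  # [1, None]
```

The fix makes the deserialization path that feeds the UDF honor the post-join nullability, so the second field surfaces as `None` rather than `0`.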
[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48019: --- Assignee: Gene Pang > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}}. > In addition to the possible exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost.
[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48019. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46254 [https://github.com/apache/spark/pull/46254] > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}}. > In addition to the possible exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost.
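The described fix can be sketched in plain Python: when decoding a dictionary-encoded vector, null entries must be skipped entirely, since their dictionary ids are meaningless and may be out of range. The data values below are made up for illustration:

```python
# A dictionary-encoded float vector: each row stores an id into the
# dictionary, except null rows, whose id slot holds garbage.
dictionary = [10.5, 20.5]
ids = [0, 1, 7, 0]            # id 7 is garbage: that entry is null
null_mask = [False, False, True, False]

def get_floats_buggy():
    # Decodes every id, including the null row's garbage id.
    return [dictionary[i] for i in ids]          # IndexError on the null slot

def get_floats_fixed():
    # Consults the null mask first; nulls never touch the dictionary.
    return [None if null else dictionary[i]
            for i, null in zip(ids, null_mask)]

try:
    get_floats_buggy()
except IndexError:
    print("buggy decode: IndexError")
print("fixed decode:", get_floats_fixed())
```

The same mask-first rule applies to the `copy()` problem: a copy that reproduces only the primitive slots, without the null mask, turns every null into the slot's stale value.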
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47476: --- Assignee: Uroš Bojanić > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour is for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45704 [https://github.com/apache/spark/pull/45704] > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour is for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
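As a rough illustration of what a collation-aware StringReplace must do, the sketch below performs case-insensitive replacement while preserving the unmatched portions of the source string verbatim. Python's `casefold` stands in for a lowercase collation here (the index arithmetic assumes case folding preserves string length, which holds for ASCII); Spark's actual implementation builds on ICU's StringSearch instead, and the function name is illustrative:

```python
def lcase_replace(src, search, replace):
    """Replace all case-insensitive matches of `search` in `src`,
    keeping the unmatched portions of the original string intact.
    Illustrative sketch only: str.casefold() approximates a lowercase
    collation, and we assume folding does not change string length."""
    if not search:
        return src
    out, i = [], 0
    low_src, low_search = src.casefold(), search.casefold()
    while True:
        j = low_src.find(low_search, i)
        if j < 0:
            out.append(src[i:])       # tail with no further matches
            return "".join(out)
        out.append(src[i:j])          # unmatched prefix, original casing
        out.append(replace)
        i = j + len(search)           # skip past the matched region
```

A real collation-aware matcher has to locate match boundaries in the original string (ICU StringSearch reports them directly), since folding can change lengths for non-ASCII text.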
[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47351. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46165 [https://github.com/apache/spark/pull/46165] > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47351: --- Assignee: Uroš Bojanić > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47350. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46158 [https://github.com/apache/spark/pull/46158] > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47922) Implement try_parse_json
[ https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47922. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46141 [https://github.com/apache/spark/pull/46141] > Implement try_parse_json > > > Key: SPARK-47922 > URL: https://issues.apache.org/jira/browse/SPARK-47922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement try_parse_json expression that runs parse_json on valid string > inputs and returns null when the input string is malformed. Note that this > expression also only supports string input types. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
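The described semantics (parse valid JSON input, return NULL instead of failing on malformed input) can be sketched in a few lines. This only illustrates the contract: Spark's implementation parses into its VARIANT type rather than plain Python objects, and `None` stands in for SQL NULL here:

```python
import json

def try_parse_json(s):
    """Sketch of try_parse_json semantics: parse on valid input,
    return None (SQL NULL) instead of raising on malformed input.
    NULL input propagates to NULL output."""
    if s is None:
        return None
    try:
        return json.loads(s)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
```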
[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47958. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46187 [https://github.com/apache/spark/pull/46187] > Task Scheduler may not know about executor when using LocalSchedulerBackend > --- > > Key: SPARK-47958 > URL: https://issues.apache.org/jira/browse/SPARK-47958 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When using LocalSchedulerBackend, the task scheduler will not know about the > executor until a task is run, which can lead to unexpected behavior in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47764: --- Assignee: Bo Zhang > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > Shuffle dependencies are created by shuffle map stages, and consist of > files on disks and the corresponding references in Spark JVM heap memory. > Currently Spark cleans up unused shuffle dependencies through JVM GCs, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47764. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45930 [https://github.com/apache/spark/pull/45930] > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Shuffle dependencies are created by shuffle map stages, and consist of > files on disks and the corresponding references in Spark JVM heap memory. > Currently Spark cleans up unused shuffle dependencies through JVM GCs, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47418. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46181 [https://github.com/apache/spark/pull/46181] > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. 
> > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
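The observable behaviour of the optimized UTF8_BINARY_LCASE predicates can be sketched as follows, with Python's `casefold` approximating the lowercase conversion. The function names are illustrative; the actual optimization lives in Spark's UTF8String comparison code and avoids materializing fully lowercased copies:

```python
# Lowercase-collation string predicates, sketched with casefold().
# Spark's optimized version compares code points incrementally instead
# of building lowercased copies of both inputs.
def lcase_contains(left, right):
    return right.casefold() in left.casefold()

def lcase_starts_with(left, right):
    return left.casefold().startswith(right.casefold())

def lcase_ends_with(left, right):
    return left.casefold().endswith(right.casefold())
```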
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47873: --- Assignee: Stefan Kandic > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47873. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46083 [https://github.com/apache/spark/pull/46083] > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
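One way to picture the round-trip described in this ticket: demote the collated type to a plain string in the Hive schema, and record the collation in table properties so a reader can restore it. The property key and schema representation below are hypothetical, purely to illustrate the idea; they are not Spark's actual metadata format:

```python
# Hypothetical round-trip: collation info moves from the column type
# into table properties on write, and back into the type on read.
def write_schema(columns):
    """columns: {name: (base_type, collation_or_None)} -> (hive_schema, props)"""
    hive_schema, props = [], {}
    for name, (base_type, collation) in columns.items():
        hive_schema.append((name, base_type))  # Hive sees a plain type
        if collation is not None:
            # Hypothetical property key, not Spark's real one.
            props[f"spark.sql.collation.{name}"] = collation
    return hive_schema, props

def read_schema(hive_schema, props):
    """Restore collations from table properties when reading back."""
    return {name: (t, props.get(f"spark.sql.collation.{name}"))
            for name, t in hive_schema}
```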
[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference
Wenchen Fan created SPARK-47956: --- Summary: sanity check for unresolved LCA reference Key: SPARK-47956 URL: https://issues.apache.org/jira/browse/SPARK-47956 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47352. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46104 [https://github.com/apache/spark/pull/46104] > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47412. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46041 [https://github.com/apache/spark/pull/46041] > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47412: --- Assignee: Gideon P > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47411: --- Assignee: Milan Dankovic > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47411. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45643 [https://github.com/apache/spark/pull/45643] > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour is for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
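The SQL semantics that a collation-aware `instr` and `find_in_set` must preserve (1-based positions, 0 for no match) can be sketched as follows, again with Python's `casefold` standing in for a lowercase collation; the function names are illustrative and Spark's implementation uses ICU StringSearch for the matching itself:

```python
def lcase_instr(haystack, needle):
    """1-based position of the first case-insensitive match,
    0 if absent (mirroring SQL instr semantics)."""
    return haystack.casefold().find(needle.casefold()) + 1

def lcase_find_in_set(item, set_str):
    """1-based index of `item` within a comma-separated list, compared
    case-insensitively; 0 if not found or if `item` contains a comma."""
    if "," in item:
        return 0
    target = item.casefold()
    for i, elem in enumerate(set_str.split(","), start=1):
        if elem.casefold() == target:
            return i
    return 0
```

Note how the `find(...) + 1` trick maps Python's 0-based index (and its -1 "not found") directly onto SQL's 1-based convention.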
[jira] [Resolved] (SPARK-47900) Fix check for implicit collation
[ https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47900. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46116 [https://github.com/apache/spark/pull/46116] > Fix check for implicit collation > > > Key: SPARK-47900 > URL: https://issues.apache.org/jira/browse/SPARK-47900 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47413: --- Assignee: Gideon P > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what the > expected behaviour is for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47413. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46040 [https://github.com/apache/spark/pull/46040] > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *Substring* built-in string function in > Spark (including the *Right* and *Left* functions). First confirm what the > expected behaviour is for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how these functions should be used with collation in Spark SQL, and feel free to > use your chosen Spark SQL editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use cases > and implementations of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Substring*, > *Right*, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > the [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47902. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46120 [https://github.com/apache/spark/pull/46120] > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47902: --- Assignee: Aleksandar Tomic > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute > current time" family of expressions to be unevaluable, given that these > expressions are supposed to be replaced with literals by the query optimizer. Unevaluable > implies that these expressions are not foldable, even though they will be > replaced by literals. > If these expressions were used in places that require constant folding (e.g. > RAND()), the new behavior would be to raise an error, which is a regression > compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
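The foldable/unevaluable interaction described in SPARK-47902 can be illustrated with a toy constant folder. This is only a sketch: `Expr` and `fold` here are illustrative stand-ins, not Spark's Catalyst classes, and the real optimizer derives foldability from children rather than storing a flag.

```python
# Toy constant folder showing why "unevaluable implies not foldable" is a
# problem: a node flagged non-foldable blocks folding of any enclosing
# expression, so contexts that demand a constant argument start failing,
# even though a later rule would have replaced the node with a literal.
# Names are illustrative, not Spark's real classes.
from dataclasses import dataclass, field

@dataclass
class Expr:
    op: str                      # "lit", "add", or "current_time"
    children: list = field(default_factory=list)
    value: object = None
    foldable: bool = True        # can this subtree be reduced to a literal?

def fold(e: Expr) -> Expr:
    """Replace a foldable subtree with a literal; leave others untouched."""
    if e.op == "lit":
        return e
    kids = [fold(c) for c in e.children]
    if e.foldable and all(k.op == "lit" for k in kids) and e.op == "add":
        return Expr("lit", value=kids[0].value + kids[1].value)
    return Expr(e.op, kids, foldable=e.foldable)

# 1 + 2 folds to the literal 3...
assert fold(Expr("add", [Expr("lit", value=1), Expr("lit", value=2)])).value == 3

# ...but an expression containing a non-foldable node stays unfolded, so a
# context that requires a constant (like a seed for RAND()) would reject it.
ct = Expr("current_time", foldable=False)
assert fold(Expr("add", [ct, Expr("lit", value=1)])).op == "add"
```

The fix direction the ticket implies is to report such expressions as foldable, since the optimizer guarantees they become literals before evaluation.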
[jira] [Resolved] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46935. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44971 [https://github.com/apache/spark/pull/44971] > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46935: --- Assignee: Nicholas Chammas > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47463: Fix Version/s: 3.5.2 > An error occurred while pushing down the filter of if expression for iceberg > datasource. > > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > Reproduce: > {code:java} > create table t1(c1 int) using iceberg; > select * from > (select if(c1 = 1, c1, null) as c1 from t1) t > where t.c1 > 0; {code} > Error: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > optimization failed with an internal error. You hit a bug in Spark or the > Spark plugins you use. Please, report this bug to the corresponding > communities or vendors, and provide the full stack trace. 
> at > org.apache.spark.SparkException$.internalError(SparkException.scala:107) > at > org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536) > at > org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548) > at > org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144) > at > org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179) > at > org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238) > at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284) > at > org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580) > at > 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > at > org.apache.spark.sql.execution.datasourc
[jira] [Resolved] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47895. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46113 [https://github.com/apache/spark/pull/46113] > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47895: --- Assignee: Wenchen Fan > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47895: Summary: group by all should be idempotent (was: group by ordinal should be idempotent) > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47895) group by ordinal should be idempotent
Wenchen Fan created SPARK-47895: --- Summary: group by ordinal should be idempotent Key: SPARK-47895 URL: https://issues.apache.org/jira/browse/SPARK-47895 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47839: --- Assignee: Kelvin Jiang > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47839. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46034 [https://github.com/apache/spark/pull/46034] > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
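The NULLIF rewrite behind SPARK-47839 can be sketched in plain Python. NULLIF(x, y) naively expands to CASE WHEN x = y THEN NULL ELSE x END, which evaluates x twice; the With construct factors x out as a common expression so it is computed once. The bug was that this factored-out expression was pulled above an aggregate boundary, which is invalid; the sketch below only shows the intended semantics, not Spark's internals.

```python
# Minimal sketch of NULLIF semantics with the shared subexpression
# ("id + 1" in the reported query) evaluated exactly once.
# Illustrative only; not Spark's With/CommonExpressionRef machinery.
def nullif(x, y):
    common = x                     # evaluate the common expression once
    return None if common == y else common

# NULLIF(id + 1, 1) over ids 0..9: only id = 0 yields NULL (1 == 1).
results = [nullif(i + 1, 1) for i in range(10)]
assert results[0] is None
assert results[1:] == list(range(2, 11))
```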
[jira] [Resolved] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47846. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46046 [https://github.com/apache/spark/pull/46046] > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding support for the variant type in the from_json expression. > "select from_json('', 'variant')" should interpret json_string > as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47846: --- Assignee: Harsh Motwani > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Adding support for the variant type in the from_json expression. > "select from_json('', 'variant')" should interpret json_string > as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
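The idea behind reading JSON as a variant, as opposed to a fixed schema, resembles schema-less JSON parsing: types are discovered per value at read time instead of being declared up front. The snippet below uses Python's stdlib `json` purely as an analogy; it is not Spark's variant encoding or API.

```python
# Analogy for a "variant" schema in from_json: parse without committing to a
# fixed struct, keeping objects/arrays/scalars as dynamic values.
# Stdlib json only; not Spark's variant binary format.
import json

doc = '{"a": 1, "b": [true, null, "x"]}'
v = json.loads(doc)          # schema-less: shape is data-dependent

assert v["a"] == 1
assert v["b"] == [True, None, "x"]
assert isinstance(v, dict)   # not declared in advance, discovered on read
```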
[jira] [Resolved] (SPARK-47360) Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47360. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46003 [https://github.com/apache/spark/pull/46003] > Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck > (all collations) > -- > > Key: SPARK-47360 > URL: https://issues.apache.org/jira/browse/SPARK-47360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47416. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46078 [https://github.com/apache/spark/pull/46078] > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47863. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46097 [https://github.com/apache/spark/pull/46097] > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > *CollationAwareUTF8String.matchAt*, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffixes/prefixes) of different lengths can be equal in the context of > case-insensitive and lowercase collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains* > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith* > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47863: --- Assignee: Vladimir Golubev > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > *CollationAwareUTF8String.matchAt*, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffixes/prefixes) of different lengths can be equal in the context of > case-insensitive and lowercase collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains* > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith* > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
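The length mismatch this report describes is easy to demonstrate with Unicode case folding: "İ" (U+0130) case-folds to the two-codepoint sequence "i" + U+0307, so two strings can compare equal case-insensitively while having different lengths, and a suffix match that slices by the pattern's length picks the wrong substring. Python's `str.casefold` (a simple full case folding, used here as an approximation of a collation-aware comparison) shows the effect on the exact test strings from the report:

```python
# Why offset/length-based suffix matching fails under case-insensitive
# collations: case-folded equality does not preserve length.
s, suffix = "The İo", "i\u0307o"   # "İ" is U+0130; "i̇" is "i" + U+0307

# The string does end with the pattern under case-insensitive comparison...
assert s.casefold().endswith(suffix.casefold())

# ...but the matching suffix of s ("İo") is shorter than the pattern, so
# slicing s by len(suffix) grabs an extra character and the compare fails:
assert len("İo") != len(suffix)
assert s[-len(suffix):].casefold() != suffix.casefold()
```

This is why the fix routes the check through a collation-aware search (ICU's StringSearch) instead of comparing fixed-length prefixes/suffixes.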
[jira] [Assigned] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47822: --- Assignee: Harsh Motwani > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47822. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46017 [https://github.com/apache/spark/pull/46017] > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47821) Add is_variant_null expression
[ https://issues.apache.org/jira/browse/SPARK-47821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47821. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46011 [https://github.com/apache/spark/pull/46011] > Add is_variant_null expression > -- > > Key: SPARK-47821 > URL: https://issues.apache.org/jira/browse/SPARK-47821 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > adds a `is_variant_null` expression, which returns whether a given variant > value represents a variant null (note the difference between a variant null > and an engine null) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
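The variant-null versus engine-null distinction mentioned above is analogous to the difference between a stored JSON null and a value that is absent (or SQL NULL) altogether: a variant holding JSON null is a real, present value. The snippet below is only an analogy using stdlib `json`, not Spark's variant representation.

```python
# Analogy for variant null vs engine null: a parsed JSON null is a concrete
# stored value (None here), distinct from the value being missing entirely.
# Stdlib json only; not Spark's actual variant internals.
import json

row = {"v": json.loads("null")}   # a variant-like value holding JSON null

assert "v" in row and row["v"] is None   # present, and it *is* a null value
assert "w" not in row                    # absent: the engine-null analogue
```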
[jira] [Resolved] (SPARK-47867) Support Variant in JSON scan.
[ https://issues.apache.org/jira/browse/SPARK-47867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47867. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46071 [https://github.com/apache/spark/pull/46071] > Support Variant in JSON scan. > - > > Key: SPARK-47867 > URL: https://issues.apache.org/jira/browse/SPARK-47867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47417: --- Assignee: Nikola Mandic > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47417. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45933 [https://github.com/apache/spark/pull/45933] > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47356. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46061 [https://github.com/apache/spark/pull/46061] > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47356: --- Assignee: Mihailo Milosevic > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47420) Fix CollationSupport test output
[ https://issues.apache.org/jira/browse/SPARK-47420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47420. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46058 [https://github.com/apache/spark/pull/46058] > Fix CollationSupport test output > > > Key: SPARK-47420 > URL: https://issues.apache.org/jira/browse/SPARK-47420 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47769. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45934 [https://github.com/apache/spark/pull/45934] > Add schema_of_variant_agg expression. > - > > Key: SPARK-47769 > URL: https://issues.apache.org/jira/browse/SPARK-47769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-47769:
-----------------------------------
    Assignee: Chenhao Li

> Add schema_of_variant_agg expression.
> -------------------------------------
>
>                 Key: SPARK-47769
>                 URL: https://issues.apache.org/jira/browse/SPARK-47769
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Chenhao Li
>            Assignee: Chenhao Li
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Resolved] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-47463.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45589
[https://github.com/apache/spark/pull/45589]

> An error occurred while pushing down the filter of if expression for iceberg datasource.
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-47463
>                 URL: https://issues.apache.org/jira/browse/SPARK-47463
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>         Environment: Spark 3.5.0
> Iceberg 1.4.3
>            Reporter: Zhen Wang
>            Assignee: Zhen Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Reproduce:
> {code:java}
> create table t1(c1 int) using iceberg;
> select * from
>   (select if(c1 = 1, c1, null) as c1 from t1) t
> where t.c1 > 0;
> {code}
> Error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase optimization failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
>   at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
>   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
>   at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
>   at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
>   at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
>   at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
>   at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
>   at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: assertion failed
[jira] [Resolved] (SPARK-46810) Clarify error class terminology
[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-46810.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44902
[https://github.com/apache/spark/pull/44902]

> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> We use inconsistent terminology when talking about error classes. I'd like to get some clarity on that before contributing any potential improvements to this part of the documentation.
> Consider [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. It has several key pieces of hierarchical information that have inconsistent names throughout our documentation and codebase:
> * 42
> ** K01
> *** INCOMPLETE_TYPE_DEFINITION
> **** ARRAY
> **** MAP
> **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
> * [Over here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION we call that an "error class". So what exactly is a class, the 42 or the INCOMPLETE_TYPE_DEFINITION?
> * [Over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] we call K01 the "subclass". But [over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". So what exactly is a subclass?
> * [On this page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
> * Error class: 42
> * Error sub-class: K01
> * Error state: 42K01
> * Error condition: INCOMPLETE_TYPE_DEFINITION
> * Error sub-condition: ARRAY, MAP, STRUCT
> Pros:
> * This terminology seems (to me at least) the most natural and intuitive.
> * It aligns most closely to the SQL standard.
> Cons:
> * We use {{errorClass}} [all over our codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30] – literally in thousands of places – to refer to strings like INCOMPLETE_TYPE_DEFINITION.
> ** It's probably not practical to update all these usages to say {{errorCondition}} instead, so if we go with this approach there will be a divide between the terminology we use in user-facing documentation vs. what the code base uses.
> ** We can perhaps rename the existing {{error-classes.json}} to {{error-conditions.json}} but clarify the reason for this divide between code and user docs in the documentation for {{ErrorClassesJsonReader}}.
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
> * Error category: 42
> * Error sub-category: K01
> * Error state: 42K01
> * Error class: INCOMPLETE_TYPE_DEFINITION
> * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
> * We continue to use "error class" as we do today in our code base.
> * The change from calling "42" a "class" to a "category" is low impact and may not show up in user-facing documentation at all. (See my side note below.)
> Cons:
> * These terms do not align with the SQL standard.
> * We will have to retire the term "error condition", which we have [already used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md] in user-facing documentation.
> h1. Option 3: "Error Class" and "State Class"
> * SQL state class: 42
> * SQL state sub-class: K01
> * SQL state: 42K01
> * Error class: INCOMPLETE_TYPE_DEFINITION
> * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
> * We continue to use "error class" as we do today in our code base.
> * The change from
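[Editor's note] The hierarchy debated in SPARK-46810 corresponds to the nesting of entries in `error-classes.json`. A rough sketch of the shape of one entry, assuming the `subClass` and `sqlState` field names and the nesting implied by the README and JSON file linked in the issue; the message strings are placeholders, not copied from the real file:

```json
"INCOMPLETE_TYPE_DEFINITION" : {
  "message" : [ "..." ],
  "subClass" : {
    "ARRAY" : { "message" : [ "..." ] },
    "MAP" : { "message" : [ "..." ] },
    "STRUCT" : { "message" : [ "..." ] }
  },
  "sqlState" : "42K01"
}
```

Under any of the three options, only the labels for these levels change (class vs. category vs. condition); the JSON structure itself stays as it is.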