[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47927: --- Assignee: Emil Ejbyfeldt > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > >
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
> ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
> {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
>
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+
> {code}
> So when the result is passed to the UDF, the nullability after the join is not respected and we incorrectly end up with a 0 value instead of a null/None value.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47927. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46156 [https://github.com/apache/spark/pull/46156] > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > >
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
> ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
> {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
>
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+
> {code}
> So when the result is passed to the UDF, the nullability after the join is not respected and we incorrectly end up with a 0 value instead of a null/None value.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
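For context, the ticket's reproduction can be run end to end as below. This is only a hedged restatement of the code already quoted in the issue, assuming a local SparkSession named `spark` and the `spark.implicits._` import.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, udf}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds1 = Seq(1).toDS()
val ds2 = Seq.empty[Int].toDS()

// Identity UDF over the joined struct: after the outer join the right side
// should surface as None, not as the primitive default 0.
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)

val joined = ds1.join(ds2, ds1("value") === ds2("value"), "outer")
joined.select(f(struct(ds1("value"), ds2("value")))).show() // bug: {1, 0}, expected {1, NULL}
joined.select(struct(ds1("value"), ds2("value"))).show()    // correct without the UDF: {1, NULL}
{code}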
[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48019: --- Assignee: Gene Pang > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}} . > In addition to the possible Exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48019. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46254 [https://github.com/apache/spark/pull/46254] > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}} . > In addition to the possible Exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
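A hedged illustration of the read pattern the ticket describes, not the actual Spark patch: a bulk read over a dictionary-encoded vector has to consult the null mask before resolving dictionary ids, because the ids stored in null slots may be arbitrary. The method and parameter names below are assumptions made for the sketch.

{code:scala}
import org.apache.spark.sql.vectorized.ColumnVector

// Null-aware variant of a getInts-style bulk read (illustrative only).
def getIntsNullAware(vector: ColumnVector, rowId: Int, count: Int): Array[Int] = {
  val out = new Array[Int](count)
  var i = 0
  while (i < count) {
    // Skip null slots entirely: their dictionary ids are meaningless and may be
    // out of range, which is where the ArrayIndexOutOfBoundsException comes from.
    if (!vector.isNullAt(rowId + i)) {
      out(i) = vector.getInt(rowId + i) // getInt resolves the dictionary id when present
    }
    i += 1
  }
  out
}
{code}

A copy of a ColumnarArray needs the same per-entry null check; copying only the primitive values silently turns null entries into default values.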
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47476: --- Assignee: Uroš Bojanić > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45704 [https://github.com/apache/spark/pull/45704] > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
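As a rough sketch of the StringSearch-based approach the ticket suggests (not the actual Spark implementation, which operates on UTF8String), a collation-aware replace can be driven by ICU's StringSearch. The helper name and the choice of collator below are assumptions for illustration.

{code:scala}
import java.text.StringCharacterIterator
import com.ibm.icu.text.{Collator, RuleBasedCollator, SearchIterator, StringSearch}
import com.ibm.icu.util.ULocale

// Replace every collation-equal occurrence of `search` in `target`.
def collatedReplace(target: String, search: String, replacement: String,
                    collator: RuleBasedCollator): String = {
  if (target.isEmpty || search.isEmpty) return target
  val it = new StringSearch(search, new StringCharacterIterator(target), collator)
  val sb = new StringBuilder
  var last = 0
  var start = it.first()
  while (start != SearchIterator.DONE) {
    sb.append(target.substring(last, start)).append(replacement)
    last = start + it.getMatchLength
    start = it.next()
  }
  sb.append(target.substring(last)).toString
}

// Case-insensitive example: secondary strength ignores case differences.
val ci = Collator.getInstance(ULocale.ROOT).asInstanceOf[RuleBasedCollator]
ci.setStrength(Collator.SECONDARY)
// collatedReplace("Abc ABC", "abc", "x", ci) would yield "x x"
{code}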
[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47351. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46165 [https://github.com/apache/spark/pull/46165] > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47351: --- Assignee: Uroš Bojanić > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47350. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46158 [https://github.com/apache/spark/pull/46158] > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47922) Implement try_parse_json
[ https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47922. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46141 [https://github.com/apache/spark/pull/46141] > Implement try_parse_json > > > Key: SPARK-47922 > URL: https://issues.apache.org/jira/browse/SPARK-47922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement try_parse_json expression that runs parse_json on valid string > inputs and returns null when the input string is malformed. Note that this > expression also only supports string input types. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
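A hedged usage sketch of the new expression, assuming a SparkSession named `spark` on a build that already contains this change:

{code:scala}
spark.sql("""SELECT parse_json('{"a": 1}')""").show()     // parses valid input as VARIANT
spark.sql("""SELECT try_parse_json('{"a": 1}')""").show() // same result on valid input
spark.sql("""SELECT try_parse_json('not json')""").show() // NULL instead of an error
{code}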
[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47958. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46187 [https://github.com/apache/spark/pull/46187] > Task Scheduler may not know about executor when using LocalSchedulerBackend > --- > > Key: SPARK-47958 > URL: https://issues.apache.org/jira/browse/SPARK-47958 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When using LocalSchedulerBackend, the task scheduler will not know about the > executor until a task is run, which can lead to unexpected behavior in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47764: --- Assignee: Bo Zhang > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > Shuffle dependencies are created by shuffle map stages and consist of files on disk and the corresponding references in Spark JVM heap memory. Currently Spark cleans up unused shuffle dependencies through JVM GCs, and periodic GCs are triggered once every 30 minutes (see ContextCleaner). However, we still found cases in which the size of the shuffle data files is too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL queries created by Spark Connect, since we do have better control of the DataFrame instances there. Even if DataFrame instances are reused on the client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query execution, or 2. only mark the shuffle executions and not migrate them at node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47764. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45930 [https://github.com/apache/spark/pull/45930] > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Shuffle dependencies are created by shuffle map stages and consist of files on disk and the corresponding references in Spark JVM heap memory. Currently Spark cleans up unused shuffle dependencies through JVM GCs, and periodic GCs are triggered once every 30 minutes (see ContextCleaner). However, we still found cases in which the size of the shuffle data files is too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL queries created by Spark Connect, since we do have better control of the DataFrame instances there. Even if DataFrame instances are reused on the client side, on the server side the instances are still recreated. > > We might also provide the option to 1. clean up eagerly after each query execution, or 2. only mark the shuffle executions and not migrate them at node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47418. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46181 [https://github.com/apache/spark/pull/46181] > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. > > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
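A hedged sketch of the idea behind the optimized lowercase comparison, written over plain java.lang.String rather than Spark's UTF8String: compare characters in lowercase on the fly instead of materializing lowercased copies of both strings first.

{code:scala}
// startsWith under lowercase-insensitive comparison, without allocating
// s.toLowerCase / prefix.toLowerCase copies.
// Caveat: real collation-aware code must also handle one-to-many case mappings
// (e.g. 'İ' lowercases to "i" plus a combining dot); this sketch ignores that.
def startsWithLowercase(s: String, prefix: String): Boolean = {
  if (prefix.length > s.length) return false
  var i = 0
  while (i < prefix.length) {
    if (Character.toLowerCase(s.charAt(i)) != Character.toLowerCase(prefix.charAt(i))) {
      return false
    }
    i += 1
  }
  true
}
{code}

contains and endsWith follow the same pattern, with a sliding start offset or a comparison anchored at the end of the target string.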
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47873: --- Assignee: Stefan Kandic > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As hive doesn't support collations we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47873. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46083 [https://github.com/apache/spark/pull/46083] > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > As hive doesn't support collations we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference
Wenchen Fan created SPARK-47956: --- Summary: sanity check for unresolved LCA reference Key: SPARK-47956 URL: https://issues.apache.org/jira/browse/SPARK-47956 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47352. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46104 [https://github.com/apache/spark/pull/46104] > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47412. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46041 [https://github.com/apache/spark/pull/46041] > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what is the expected behaviour for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47412: --- Assignee: Gideon P > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what is the expected behaviour for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47411: --- Assignee: Milan Dankovic > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47411. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45643 [https://github.com/apache/spark/pull/45643] > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47900) Fix check for implicit collation
[ https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47900. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46116 [https://github.com/apache/spark/pull/46116] > Fix check for implicit collation > > > Key: SPARK-47900 > URL: https://issues.apache.org/jira/browse/SPARK-47900 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47413: --- Assignee: Gideon P > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47413. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46040 [https://github.com/apache/spark/pull/46040] > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47902. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46120 [https://github.com/apache/spark/pull/46120] > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute current time" family of expressions to be unevaluable, given that these expressions are supposed to be replaced with literals by the query optimizer. Unevaluable implies that these expressions are not foldable, even though they will be replaced by literals. > If these expressions were used in places that require constant folding (e.g. RAND()), the new behavior would be to raise an error, which is a regression compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable
[ https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47902: --- Assignee: Aleksandar Tomic > Compute Current Time* expressions should be foldable > > > Key: SPARK-47902 > URL: https://issues.apache.org/jira/browse/SPARK-47902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > The following PR - https://github.com/apache/spark/pull/44261 - changed the "compute current time" family of expressions to be unevaluable, given that these expressions are supposed to be replaced with literals by the query optimizer. Unevaluable implies that these expressions are not foldable, even though they will be replaced by literals. > If these expressions were used in places that require constant folding (e.g. RAND()), the new behavior would be to raise an error, which is a regression compared to the behavior prior to Spark 4.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
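A hedged illustration of the kind of query affected (not taken from the ticket), assuming a SparkSession named `spark`: rand() requires a foldable seed, so a current-time expression in that position only keeps working if it is still considered foldable before the optimizer replaces it with a literal.

{code:scala}
// unix_timestamp() is folded to a literal by the optimizer; if the underlying
// current-time expression is treated as non-foldable, the seed check fails instead.
spark.sql("SELECT rand(unix_timestamp())").show()
{code}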
[jira] [Resolved] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46935. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44971 [https://github.com/apache/spark/pull/44971] > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46935) Consolidate error documentation
[ https://issues.apache.org/jira/browse/SPARK-46935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46935: --- Assignee: Nicholas Chammas > Consolidate error documentation > --- > > Key: SPARK-46935 > URL: https://issues.apache.org/jira/browse/SPARK-46935 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47463: Fix Version/s: 3.5.2 > An error occurred while pushing down the filter of if expression for iceberg > datasource. >  > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > Reproduce:
> {code:java}
> create table t1(c1 int) using iceberg;
> select * from
> (select if(c1 = 1, c1, null) as c1 from t1) t
> where t.c1 > 0;
> {code}
> Error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase optimization failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
> at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
> at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
> at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
> at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
> at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
> at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
> at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
> at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
> at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
> at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327)
> at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580)
> at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72)
> at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164)
> at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
> at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
> at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
> at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: assertion failed
> at scala.Predef$.assert(Predef.scala:208)
> at
[jira] [Resolved] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47895. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46113 [https://github.com/apache/spark/pull/46113] > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
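For readers unfamiliar with the construct, a small hedged example of GROUP BY ALL (assuming a SparkSession named `spark`): every non-aggregate expression in the select list becomes a grouping key, and the ticket is about making the analyzer's resolution of this construct idempotent.

{code:scala}
spark.sql("""
  SELECT dept, job, count(*) AS cnt
  FROM VALUES ('eng', 'dev'), ('eng', 'dev'), ('sales', 'rep') AS t(dept, job)
  GROUP BY ALL
""").show()
// equivalent to: GROUP BY dept, job
{code}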
[jira] [Assigned] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47895: --- Assignee: Wenchen Fan > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47895) group by all should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-47895: Summary: group by all should be idempotent (was: group by ordinal should be idempotent) > group by all should be idempotent > - > > Key: SPARK-47895 > URL: https://issues.apache.org/jira/browse/SPARK-47895 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47895) group by ordinal should be idempotent
Wenchen Fan created SPARK-47895: --- Summary: group by ordinal should be idempotent Key: SPARK-47895 URL: https://issues.apache.org/jira/browse/SPARK-47895 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47839: --- Assignee: Kelvin Jiang > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47839. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46034 [https://github.com/apache/spark/pull/46034] > Fix Aggregate bug in RewriteWithExpression > -- > > Key: SPARK-47839 > URL: https://issues.apache.org/jira/browse/SPARK-47839 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following query will fail: > {code:SQL} > SELECT NULLIF(id + 1, 1) > from range(10) > group by id > {code} > This is because {{NullIf}} gets rewritten to {{With}}, then > {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of > the aggregate, resulting in an invalid plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
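To make the failure mode above concrete: NULLIF(a, b) is defined via a shared common expression (roughly CASE WHEN a = b THEN NULL ELSE a END with `a` evaluated once), and the sketch below only illustrates the shape of that rewrite; the actual plan transformation in RewriteWithExpression differs in detail.

{code:scala}
// Failing query from the ticket:
spark.sql("SELECT nullif(id + 1, 1) FROM range(10) GROUP BY id").show()

// Conceptually, the common-expression rewrite wants to compute `id + 1` once:
//   SELECT CASE WHEN _common = 1 THEN NULL ELSE _common END
//   FROM (SELECT id, id + 1 AS _common FROM range(10)) GROUP BY id
// which is invalid because `_common` is neither a grouping expression nor inside
// an aggregate function -- roughly the invalid plan the rule produced.
{code}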
[jira] [Resolved] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47846. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46046 [https://github.com/apache/spark/pull/46046] > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding support for the variant type in the from_json expression. > "select from_json('<json_string>', 'variant')" should interpret json_string as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47846) Add support for Variant schema in from_json
[ https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47846: --- Assignee: Harsh Motwani > Add support for Variant schema in from_json > --- > > Key: SPARK-47846 > URL: https://issues.apache.org/jira/browse/SPARK-47846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Adding support for the variant type in the from_json expression. > "select from_json('<json_string>', 'variant')" should interpret json_string as a variant type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
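A hedged usage sketch, assuming a SparkSession named `spark` on a build that includes this change:

{code:scala}
// The schema string 'variant' makes from_json return a VARIANT value,
// behaving like parse_json on the same input.
spark.sql("""SELECT from_json('{"a": [1, 2, {"b": true}]}', 'variant')""").show(truncate = false)
{code}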
[jira] [Resolved] (SPARK-47360) Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47360. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46003 [https://github.com/apache/spark/pull/46003] > Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck > (all collations) > -- > > Key: SPARK-47360 > URL: https://issues.apache.org/jira/browse/SPARK-47360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47416. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46078 [https://github.com/apache/spark/pull/46078] > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47863. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46097 [https://github.com/apache/spark/pull/46097] > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > {*}CollationAwareUTF8String.matchAt{*}, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffix/prefix) of different lengths are sometimes actually equal in the context of > case-insensitive and lower-case collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains*. > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith*. > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47863) endsWith and startsWith don't work correctly for some collations
[ https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47863: --- Assignee: Vladimir Golubev > endsWith and startsWith don't work correctly for some collations > > > Key: SPARK-47863 > URL: https://issues.apache.org/jira/browse/SPARK-47863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use > {*}CollationAwareUTF8String.matchAt{*}, which operates on byte offsets to > compare prefixes/suffixes. This is not correct, since string parts > (suffix/prefix) of different lengths are sometimes actually equal in the context of > case-insensitive and lower-case collations. > Example test cases that highlight the problem: > - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testContains*. > - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for > *CollationSupportSuite.testEndsWith*. > The first passes, since it uses *StringSearch* directly; the second one > does not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
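A user-facing SQL sketch of the same problem (collation syntax and expected result hedged; the ticket's repro is at the unit-test level):
{code:scala}
// Under a case-insensitive ICU collation, a suffix can match even though its byte length
// differs from the matched part of the string ("İ" vs. "i̇"), so a raw byte-offset comparison
// in matchAt is not sufficient.
spark.sql("SELECT endswith(collate('The İo', 'UNICODE_CI'), collate('i̇o', 'UNICODE_CI'))").show()
// Expected to be true once the fix lands; the byte-offset path previously returned false.
{code}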
[jira] [Assigned] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47822: --- Assignee: Harsh Motwani > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47822) Prohibit Hash expressions from hashing Variant type
[ https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47822. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46017 [https://github.com/apache/spark/pull/46017] > Prohibit Hash expressions from hashing Variant type > --- > > Key: SPARK-47822 > URL: https://issues.apache.org/jira/browse/SPARK-47822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Prohibiting Hash functions from being applied on the Variant type. This is > because they haven't been implemented on the variant type and crash during > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
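A hedged illustration of the guarded case (the exact error raised is not specified in the ticket):
{code:scala}
// Hashing a VARIANT value: after this change the query is expected to be rejected during
// analysis rather than crashing at execution time.
spark.sql("""SELECT hash(parse_json('{"a": 1}'))""").show()
{code}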
[jira] [Resolved] (SPARK-47821) Add is_variant_null expression
[ https://issues.apache.org/jira/browse/SPARK-47821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47821. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46011 [https://github.com/apache/spark/pull/46011] > Add is_variant_null expression > -- > > Key: SPARK-47821 > URL: https://issues.apache.org/jira/browse/SPARK-47821 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adds an `is_variant_null` expression, which returns whether a given variant > value represents a variant null (note the difference between a variant null > and an engine null) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
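A hedged sketch of the distinction (illustrative values, not from the ticket):
{code:scala}
// A JSON `null` parsed into a variant is a *variant* null, which is different from an engine NULL.
spark.sql("SELECT is_variant_null(parse_json('null'))").show()   // expected: true
spark.sql("SELECT is_variant_null(parse_json('13'))").show()     // expected: false
{code}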
[jira] [Resolved] (SPARK-47867) Support Variant in JSON scan.
[ https://issues.apache.org/jira/browse/SPARK-47867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47867. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46071 [https://github.com/apache/spark/pull/46071] > Support Variant in JSON scan. > - > > Key: SPARK-47867 > URL: https://issues.apache.org/jira/browse/SPARK-47867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47417: --- Assignee: Nikola Mandic > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47417. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45933 [https://github.com/apache/spark/pull/45933] > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47356. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46061 [https://github.com/apache/spark/pull/46061] > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47356: --- Assignee: Mihailo Milosevic > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47420) Fix CollationSupport test output
[ https://issues.apache.org/jira/browse/SPARK-47420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47420. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46058 [https://github.com/apache/spark/pull/46058] > Fix CollationSupport test output > > > Key: SPARK-47420 > URL: https://issues.apache.org/jira/browse/SPARK-47420 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47769. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45934 [https://github.com/apache/spark/pull/45934] > Add schema_of_variant_agg expression. > - > > Key: SPARK-47769 > URL: https://issues.apache.org/jira/browse/SPARK-47769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47769: --- Assignee: Chenhao Li > Add schema_of_variant_agg expression. > - > > Key: SPARK-47769 > URL: https://issues.apache.org/jira/browse/SPARK-47769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47463. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45589 [https://github.com/apache/spark/pull/45589] > An error occurred while pushing down the filter of if expression for iceberg > datasource. > > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Reproduce: > {code:java} > create table t1(c1 int) using iceberg; > select * from > (select if(c1 = 1, c1, null) as c1 from t1) t > where t.c1 > 0; {code} > Error: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > optimization failed with an internal error. You hit a bug in Spark or the > Spark plugins you use. Please, report this bug to the corresponding > communities or vendors, and provide the full stack trace. > at > org.apache.spark.SparkException$.internalError(SparkException.scala:107) > at > org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536) > at > org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548) > at > org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144) > at > org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179) > at > org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238) > at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284) > at > org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > 
org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.AssertionError: assertion failed
[jira] [Resolved] (SPARK-46810) Clarify error class terminology
[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46810. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44902 [https://github.com/apache/spark/pull/44902] > Clarify error class terminology > --- > > Key: SPARK-46810 > URL: https://issues.apache.org/jira/browse/SPARK-46810 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > We use inconsistent terminology when talking about error classes. I'd like to > get some clarity on that before contributing any potential improvements to > this part of the documentation. > Consider > [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. > It has several key pieces of hierarchical information that have inconsistent > names throughout our documentation and codebase: > * 42 > ** K01 > *** INCOMPLETE_TYPE_DEFINITION > ARRAY > MAP > STRUCT > What are the names of these different levels of information? > Some examples of inconsistent terminology: > * [Over > here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] > we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION > we call that an "error class". So what exactly is a class, the 42 or the > INCOMPLETE_TYPE_DEFINITION? > * [Over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] > we call K01 the "subclass". But [over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] > we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for > INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". > So what exactly is a subclass? > * [On this > page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] > we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other > places we refer to it as an "error class". > I don't think we should leave this status quo as-is. I see a couple of ways > to fix this. > h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition" > One solution is to use the following terms: > * Error class: 42 > * Error sub-class: K01 > * Error state: 42K01 > * Error condition: INCOMPLETE_TYPE_DEFINITION > * Error sub-condition: ARRAY, MAP, STRUCT > Pros: > * This terminology seems (to me at least) the most natural and intuitive. > * It aligns most closely to the SQL standard. > Cons: > * We use {{errorClass}} [all over our > codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30] > – literally in thousands of places – to refer to strings like > INCOMPLETE_TYPE_DEFINITION. > ** It's probably not practical to update all these usages to say > {{errorCondition}} instead, so if we go with this approach there will be a > divide between the terminology we use in user-facing documentation vs. what > the code base uses. 
> ** We can perhaps rename the existing {{error-classes.json}} to > {{error-conditions.json}} but clarify the reason for this divide between code > and user docs in the documentation for {{ErrorClassesJsonReader}} . > h1. Option 2: 42 becomes an "Error Category" > Another approach is to use the following terminology: > * Error category: 42 > * Error sub-category: K01 > * Error state: 42K01 > * Error class: INCOMPLETE_TYPE_DEFINITION > * Error sub-classes: ARRAY, MAP, STRUCT > Pros: > * We continue to use "error class" as we do today in our code base. > * The change from calling "42" a "class" to a "category" is low impact and > may not show up in user-facing documentation at all. (See my side note below.) > Cons: > * These terms do not align with the SQL standard. > * We will have to retire the term "error condition", which we have [already > used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md] > in user-facing documentation. > h1. Option 3: "Error Class" and "State Class" > * SQL state class: 42 > * SQL state sub-class: K01 > * SQL state: 42K01 > * Error class: INCOMPLETE_TYPE_DEFINITION > * Error sub-classes: ARRAY, MAP, STRUCT > Pros: > * We continue to use "error class" as we do today in our code base. > * The change
[jira] [Assigned] (SPARK-46810) Clarify error class terminology
[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46810: --- Assignee: Nicholas Chammas > Clarify error class terminology > --- > > Key: SPARK-46810 > URL: https://issues.apache.org/jira/browse/SPARK-46810 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > > We use inconsistent terminology when talking about error classes. I'd like to > get some clarity on that before contributing any potential improvements to > this part of the documentation. > Consider > [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. > It has several key pieces of hierarchical information that have inconsistent > names throughout our documentation and codebase: > * 42 > ** K01 > *** INCOMPLETE_TYPE_DEFINITION > ARRAY > MAP > STRUCT > What are the names of these different levels of information? > Some examples of inconsistent terminology: > * [Over > here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] > we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION > we call that an "error class". So what exactly is a class, the 42 or the > INCOMPLETE_TYPE_DEFINITION? > * [Over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] > we call K01 the "subclass". But [over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] > we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for > INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". > So what exactly is a subclass? > * [On this > page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] > we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other > places we refer to it as an "error class". > I don't think we should leave this status quo as-is. I see a couple of ways > to fix this. > h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition" > One solution is to use the following terms: > * Error class: 42 > * Error sub-class: K01 > * Error state: 42K01 > * Error condition: INCOMPLETE_TYPE_DEFINITION > * Error sub-condition: ARRAY, MAP, STRUCT > Pros: > * This terminology seems (to me at least) the most natural and intuitive. > * It aligns most closely to the SQL standard. > Cons: > * We use {{errorClass}} [all over our > codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30] > – literally in thousands of places – to refer to strings like > INCOMPLETE_TYPE_DEFINITION. > ** It's probably not practical to update all these usages to say > {{errorCondition}} instead, so if we go with this approach there will be a > divide between the terminology we use in user-facing documentation vs. what > the code base uses. > ** We can perhaps rename the existing {{error-classes.json}} to > {{error-conditions.json}} but clarify the reason for this divide between code > and user docs in the documentation for {{ErrorClassesJsonReader}} . > h1. 
Option 2: 42 becomes an "Error Category" > Another approach is to use the following terminology: > * Error category: 42 > * Error sub-category: K01 > * Error state: 42K01 > * Error class: INCOMPLETE_TYPE_DEFINITION > * Error sub-classes: ARRAY, MAP, STRUCT > Pros: > * We continue to use "error class" as we do today in our code base. > * The change from calling "42" a "class" to a "category" is low impact and > may not show up in user-facing documentation at all. (See my side note below.) > Cons: > * These terms do not align with the SQL standard. > * We will have to retire the term "error condition", which we have [already > used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md] > in user-facing documentation. > h1. Option 3: "Error Class" and "State Class" > * SQL state class: 42 > * SQL state sub-class: K01 > * SQL state: 42K01 > * Error class: INCOMPLETE_TYPE_DEFINITION > * Error sub-classes: ARRAY, MAP, STRUCT > Pros: > * We continue to use "error class" as we do today in our code base. > * The change from calling "42" a "class" to a "state class" is low impact > and may not show up in user-facing documentation at all. (See my
[jira] [Resolved] (SPARK-47357) Add support for Upper, Lower, InitCap (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47357. - Fix Version/s: 4.0.0 Assignee: Mihailo Milosevic Resolution: Fixed > Add support for Upper, Lower, InitCap (all collations) > -- > > Key: SPARK-47357 > URL: https://issues.apache.org/jira/browse/SPARK-47357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47803) Support cast to variant.
[ https://issues.apache.org/jira/browse/SPARK-47803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47803. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45989 [https://github.com/apache/spark/pull/45989] > Support cast to variant. > > > Key: SPARK-47803 > URL: https://issues.apache.org/jira/browse/SPARK-47803 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47765) Add SET COLLATION to parser rules
[ https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47765. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45946 [https://github.com/apache/spark/pull/45946] > Add SET COLLATION to parser rules > - > > Key: SPARK-47765 > URL: https://issues.apache.org/jira/browse/SPARK-47765 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47765) Add SET COLLATION to parser rules
[ https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47765: --- Assignee: Mihailo Milosevic > Add SET COLLATION to parser rules > - > > Key: SPARK-47765 > URL: https://issues.apache.org/jira/browse/SPARK-47765 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47800) Add method for converting v2 identifier to table identifier
[ https://issues.apache.org/jira/browse/SPARK-47800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47800. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45985 [https://github.com/apache/spark/pull/45985] > Add method for converting v2 identifier to table identifier > --- > > Key: SPARK-47800 > URL: https://issues.apache.org/jira/browse/SPARK-47800 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Move the conversion of a v2 identifier object to a v1 table identifier into a new method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47800) Add method for converting v2 identifier to table identifier
[ https://issues.apache.org/jira/browse/SPARK-47800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47800: --- Assignee: Uros Stankovic > Add method for converting v2 identifier to table identifier > --- > > Key: SPARK-47800 > URL: https://issues.apache.org/jira/browse/SPARK-47800 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Minor > Labels: pull-request-available > > Move the conversion of a v2 identifier object to a v1 table identifier into a new method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
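A minimal sketch of what such a conversion helper might look like (hypothetical name and placement, assuming a namespace of at most one database level; the actual method is in the linked pull request):
{code:scala}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.connector.catalog.Identifier

// Convert a v2 Identifier (name + namespace) into a v1 TableIdentifier.
def toV1TableIdentifier(ident: Identifier): TableIdentifier = ident.namespace() match {
  case Array(db) => TableIdentifier(ident.name(), Some(db))
  case Array()   => TableIdentifier(ident.name())
  case _         => throw new IllegalArgumentException(s"Cannot convert multi-part namespace: $ident")
}
{code}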
[jira] [Assigned] (SPARK-47410) refactor UTF8String and CollationFactory
[ https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47410: --- Assignee: Uroš Bojanić > refactor UTF8String and CollationFactory > > > Key: SPARK-47410 > URL: https://issues.apache.org/jira/browse/SPARK-47410 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > This ticket addresses the need to refactor the {{UTF8String}} and > {{CollationFactory}} classes within Spark to enhance support for > collation-aware expressions. The goal is to improve code structure, > maintainability, readability, and testing coverage for collation-aware Spark > SQL expressions. > The changes introduced herein should simplify the addition of new collation-aware > operations and ensure consistent testing across the codebase. > > To further support the addition of collation support for new Spark > expressions, here are a couple of guidelines to follow: > > // 1. Collation-aware expression implementation > CollationSupport.java > * should serve as a static entry point for collation-aware expressions, > providing custom support > * for example: one by one Spark expression with corresponding collation > support > * also note that: CollationAwareUTF8String should be used for > collation-aware UTF8String operations & other utility methods > CollationFactory.java > * should continue to serve as a static provider for high-level collation > interface > * for example: interacting with external ICU components such as Collator, > StringSearch, etc. > * also note that: no low-level / expression-specific code should generally > be found here > UTF8String.java > * should be largely collation-unaware, and generally be used only as > storage, nothing else > * for example: don’t change this class at all (with the only one-time > exception of: semanticEquals/Compare) > * also note that: no collation-aware operation implementations (using > collationId) should be put here > stringExpressions.scala / regexpExpressions.scala / other > “sql.catalyst.expressions” (for example: Between.scala) > * should only contain minimal changes in order to re-route collation-aware > implementations to CollationSupport > * for example: most changes should be in relation to: adding collationId, > using correct data types, replacements, etc. > * also note that: nullSafeEval & doGenCode should likely not > introduce extra branching based on collationId > > // 2. Collation-aware expression testing > CollationSuite.scala > * should be used for testing more general collation concepts > * for example: collate/collation expressions, collation names, DDL, casting, > aggregate, shuffle, join, etc. 
> * also note that: no extra tests should generally be added > CollationSupportSuite.java > * should be used for expression unit tests; these tests should be as > rigorous as possible in order to cover various cases > * for example: unit tests that test collation-aware expression > implementation for various collations (binary, lowercase, ICU) > * also note that: these tests should generally be written after adding > appropriate expression support in CollationSupport.java > CollationStringExpressionsSuite.scala / CollationRegexpExpressionsSuite.scala > / CollationExpressionSuite.scala > * should be used for expression end-to-end tests; these tests should only > cover crucial expression behaviour > * for example: SQL tests that verify query execution results, expected > return data types, casting, unsupported collation handling, etc. > * also note that: these tests should generally be written after enabling > appropriate expression support in stringExpressions.scala > > // 3. Closing notes > * Carefully think about performance implications of newly added custom > collation-aware expression implementation > * for example: be very careful with extra string allocations (UTF8Strings -> > (Java) String -> UTF8Strings, etc.) > * also note that: some operations introduce very heavy performance penalties > (we should avoid the ones we can) > > * Make sure to test all newly added expressions completely (unit tests, > end-to-end tests, etc.) > * for example: consider edge cases, such as: empty strings, uppercase and > lowercase mix, different byte-length chars, etc. > * also note that: all similar tests should be uniform & readable and be kept > in one place for various expressions > > * Consider how new expressions interact with the rest of the system > (casting; collation support level - use correct AbstractStringType, etc.) > * for example: we should watch out for casting, test it
[jira] [Resolved] (SPARK-47410) refactor UTF8String and CollationFactory
[ https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47410. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45978 [https://github.com/apache/spark/pull/45978] > refactor UTF8String and CollationFactory > > > Key: SPARK-47410 > URL: https://issues.apache.org/jira/browse/SPARK-47410 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This ticket addresses the need to refactor the {{UTF8String}} and > {{CollationFactory}} classes within Spark to enhance support for > collation-aware expressions. The goal is to improve code structure, > maintainability, readability, and testing coverage for collation-aware Spark > SQL expressions. > The changes introduced herein should simplify the addition of new collation-aware > operations and ensure consistent testing across the codebase. > > To further support the addition of collation support for new Spark > expressions, here are a couple of guidelines to follow: > > // 1. Collation-aware expression implementation > CollationSupport.java > * should serve as a static entry point for collation-aware expressions, > providing custom support > * for example: one by one Spark expression with corresponding collation > support > * also note that: CollationAwareUTF8String should be used for > collation-aware UTF8String operations & other utility methods > CollationFactory.java > * should continue to serve as a static provider for high-level collation > interface > * for example: interacting with external ICU components such as Collator, > StringSearch, etc. > * also note that: no low-level / expression-specific code should generally > be found here > UTF8String.java > * should be largely collation-unaware, and generally be used only as > storage, nothing else > * for example: don’t change this class at all (with the only one-time > exception of: semanticEquals/Compare) > * also note that: no collation-aware operation implementations (using > collationId) should be put here > stringExpressions.scala / regexpExpressions.scala / other > “sql.catalyst.expressions” (for example: Between.scala) > * should only contain minimal changes in order to re-route collation-aware > implementations to CollationSupport > * for example: most changes should be in relation to: adding collationId, > using correct data types, replacements, etc. > * also note that: nullSafeEval & doGenCode should likely not > introduce extra branching based on collationId > > // 2. Collation-aware expression testing > CollationSuite.scala > * should be used for testing more general collation concepts > * for example: collate/collation expressions, collation names, DDL, casting, > aggregate, shuffle, join, etc. 
> * also note that: no extra tests should generally be added > CollationSupportSuite.java > * should be used for expression unit tests; these tests should be as > rigorous as possible in order to cover various cases > * for example: unit tests that test collation-aware expression > implementation for various collations (binary, lowercase, ICU) > * also note that: these tests should generally be written after adding > appropriate expression support in CollationSupport.java > CollationStringExpressionsSuite.scala / CollationRegexpExpressionsSuite.scala > / CollationExpressionSuite.scala > * should be used for expression end-to-end tests; these tests should only > cover crucial expression behaviour > * for example: SQL tests that verify query execution results, expected > return data types, casting, unsupported collation handling, etc. > * also note that: these tests should generally be written after enabling > appropriate expression support in stringExpressions.scala > > // 3. Closing notes > * Carefully think about performance implications of newly added custom > collation-aware expression implementation > * for example: be very careful with extra string allocations (UTF8Strings -> > (Java) String -> UTF8Strings, etc.) > * also note that: some operations introduce very heavy performance penalties > (we should avoid the ones we can) > > * Make sure to test all newly added expressions completely (unit tests, > end-to-end tests, etc.) > * for example: consider edge cases, such as: empty strings, uppercase and > lowercase mix, different byte-length chars, etc. > * also note that: all similar tests should be uniform & readable and be kept > in one place for various expressions > > * Consider how new expressions interact with the rest of the system > (casting;
[jira] [Resolved] (SPARK-47617) Add TPC-DS testing infrastructure for collations
[ https://issues.apache.org/jira/browse/SPARK-47617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47617. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45739 [https://github.com/apache/spark/pull/45739] > Add TPC-DS testing infrastructure for collations > > > Key: SPARK-47617 > URL: https://issues.apache.org/jira/browse/SPARK-47617 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > As collation support grows across all SQL features and new collation types > are added, we need to have a reliable testing model covering as many standard > SQL capabilities as possible. > We can utilize the TPC-DS testing infrastructure already present in Spark. The > idea is to vary TPC-DS table string columns by adding multiple collations > with different ordering rules and case sensitivity, producing new tables. > These tables should yield the same results against predefined TPC-DS queries > for certain batches of collations. For example, when comparing query runs on > tables where columns are first collated as UTF8_BINARY and then as > UTF8_BINARY_LCASE, we should be getting the same results after converting to > lowercase. > Introduce a new query suite which tests the described behavior with available > collations (utf8_binary and unicode) combined with case conversions > (lowercase, uppercase, randomized case for fuzzy testing). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47617) Add TPC-DS testing infrastructure for collations
[ https://issues.apache.org/jira/browse/SPARK-47617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47617: --- Assignee: Nikola Mandic > Add TPC-DS testing infrastructure for collations > > > Key: SPARK-47617 > URL: https://issues.apache.org/jira/browse/SPARK-47617 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > > As collation support grows across all SQL features and new collation types > are added, we need to have a reliable testing model covering as many standard > SQL capabilities as possible. > We can utilize the TPC-DS testing infrastructure already present in Spark. The > idea is to vary TPC-DS table string columns by adding multiple collations > with different ordering rules and case sensitivity, producing new tables. > These tables should yield the same results against predefined TPC-DS queries > for certain batches of collations. For example, when comparing query runs on > tables where columns are first collated as UTF8_BINARY and then as > UTF8_BINARY_LCASE, we should be getting the same results after converting to > lowercase. > Introduce a new query suite which tests the described behavior with available > collations (utf8_binary and unicode) combined with case conversions > (lowercase, uppercase, randomized case for fuzzy testing). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
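A hedged DDL-level sketch of the idea (table and column names illustrative; the actual suite generates the collation variations programmatically):
{code:scala}
// The same TPC-DS string column declared under two collations; for the case-insensitive
// variant, query results should match the binary-collated run after lowercasing.
spark.sql("CREATE TABLE item_utf8 (i_brand STRING COLLATE UTF8_BINARY) USING parquet")
spark.sql("CREATE TABLE item_lcase (i_brand STRING COLLATE UTF8_BINARY_LCASE) USING parquet")
{code}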
[jira] [Assigned] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47736: --- Assignee: Mihailo Milosevic > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47736. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45891 [https://github.com/apache/spark/pull/45891] > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47809) checkExceptionInExpression should check error for each codegen mode
Wenchen Fan created SPARK-47809: --- Summary: checkExceptionInExpression should check error for each codegen mode Key: SPARK-47809 URL: https://issues.apache.org/jira/browse/SPARK-47809 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types
[ https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47001. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45146 [https://github.com/apache/spark/pull/45146] > Pushdown Verification in Optimizer.scala should support changed data types > -- > > Key: SPARK-47001 > URL: https://issues.apache.org/jira/browse/SPARK-47001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When pushing a filter down in a union, the data type may not match exactly if > the filter was constructed using the child dataframe reference. This is > because the union's output is updated with a StructType merge of the union, which > can turn non-nullable to nullable. These are still the same column despite > the different nullability, so the filter should be safe to push down. As it > currently stands, we get an exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
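A hedged repro sketch of the scenario (not the exact reproduction from the pull request):
{code:scala}
import org.apache.spark.sql.functions._

// c1 is non-nullable in df1 but nullable in df2, so the union's merged schema marks it nullable.
val df1 = spark.range(5).select(col("id").as("c1"))
val df2 = spark.range(5).select(when(col("id") > 2, col("id")).as("c1"))
val unioned = df1.union(df2)

// The filter is built from the child DataFrame's (non-nullable) column reference; pushing it
// below the union should still be allowed, since nullability is the only difference.
unioned.filter(df1("c1") > 0).show()
{code}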
[jira] [Resolved] (SPARK-47274) Provide more useful context for PySpark DataFrame API errors
[ https://issues.apache.org/jira/browse/SPARK-47274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47274. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45377 [https://github.com/apache/spark/pull/45377] > Provide more useful context for PySpark DataFrame API errors > > > Key: SPARK-47274 > URL: https://issues.apache.org/jira/browse/SPARK-47274 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Errors originating from PySpark operations can be difficult to debug with > limited context in the error messages. While improvements on the JVM side > have been made to offer detailed error contexts, PySpark errors often lack > this level of detail. Adding detailed context about the location within the > user's PySpark code where the error occurred will help debuggability for > PySpark users. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47274) Provide more useful context for PySpark DataFrame API errors
[ https://issues.apache.org/jira/browse/SPARK-47274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47274: --- Assignee: Haejoon Lee > Provide more useful context for PySpark DataFrame API errors > > > Key: SPARK-47274 > URL: https://issues.apache.org/jira/browse/SPARK-47274 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Errors originating from PySpark operations can be difficult to debug with > limited context in the error messages. While improvements on the JVM side > have been made to offer detailed error contexts, PySpark errors often lack > this level of detail. Adding detailed context about the location within the > user's PySpark code where the error occurred will help debuggability for > PySpark users. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47802) Revert mapping ( star ) to named_struct ( star )
[ https://issues.apache.org/jira/browse/SPARK-47802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47802. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45987 [https://github.com/apache/spark/pull/45987] > Revert mapping ( star ) to named_struct ( star ) > > > Key: SPARK-47802 > URL: https://issues.apache.org/jira/browse/SPARK-47802 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Turning star within parens into named_struct ( star) as opposed to ignoring > the parens turns out to be more risky than anticipated. Given that this was > done solely for consistency with ( c1, c2...) it's best to not go there at > all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47802) Revert mapping ( star ) to named_struct ( star )
[ https://issues.apache.org/jira/browse/SPARK-47802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47802: --- Assignee: Serge Rielau > Revert mapping ( star ) to named_struct ( star ) > > > Key: SPARK-47802 > URL: https://issues.apache.org/jira/browse/SPARK-47802 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > Turning star within parens into named_struct ( star) as opposed to ignoring > the parens turns out to be more risky than anticipated. Given that this was > done solely for consistency with ( c1, c2...) it's best to not go there at > all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47775) Support remaining scalar types in the variant spec.
[ https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47775: --- Assignee: Chenhao Li > Support remaining scalar types in the variant spec. > --- > > Key: SPARK-47775 > URL: https://issues.apache.org/jira/browse/SPARK-47775 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47775) Support remaining scalar types in the variant spec.
[ https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47775. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45945 [https://github.com/apache/spark/pull/45945] > Support remaining scalar types in the variant spec. > --- > > Key: SPARK-47775 > URL: https://issues.apache.org/jira/browse/SPARK-47775 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47682) Support cast from variant.
[ https://issues.apache.org/jira/browse/SPARK-47682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47682. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45807 [https://github.com/apache/spark/pull/45807] > Support cast from variant. > -- > > Key: SPARK-47682 > URL: https://issues.apache.org/jira/browse/SPARK-47682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47504) Resolve AbstractDataType simpleStrings for StringTypeCollated
[ https://issues.apache.org/jira/browse/SPARK-47504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47504. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45694 [https://github.com/apache/spark/pull/45694] > Resolve AbstractDataType simpleStrings for StringTypeCollated > - > > Key: SPARK-47504 > URL: https://issues.apache.org/jira/browse/SPARK-47504 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *SPARK-47296* introduced a change to fail all unsupported functions. Because > of this change, the expected *inputTypes* in *ExpectsInputTypes* had to be > changed. This introduced a user-facing change: error messages now print > *"STRING_ANY_COLLATION"* where they previously printed *"STRING"*. > Concretely, if an Int is supplied where *StringTypeAnyCollation* is > expected, users are shown this misleading message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
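As a hedged illustration of the message in question (the specific function and the exact error text are assumptions chosen for the example, not quotes from the ticket):
{code:java}
// startswith expects string inputs; passing an Int triggers a data-type-mismatch error.
// The issue is about how the expected type is rendered in that error: it appeared as
// "STRING_ANY_COLLATION" where users previously saw plain "STRING".
spark.sql("SELECT startswith(1, 'a')").show()
{code}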
[jira] [Resolved] (SPARK-47681) Add schema_of_variant expression.
[ https://issues.apache.org/jira/browse/SPARK-47681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47681. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45806 [https://github.com/apache/spark/pull/45806] > Add schema_of_variant expression. > - > > Key: SPARK-47681 > URL: https://issues.apache.org/jira/browse/SPARK-47681 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47657) Implement filter pushdown support for collation per file source
[ https://issues.apache.org/jira/browse/SPARK-47657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47657: --- Assignee: Stefan Kandic > Implement filter pushdown support for collation per file source > --- > > Key: SPARK-47657 > URL: https://issues.apache.org/jira/browse/SPARK-47657 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47657) Implement filter pushdown support for collation per file source
[ https://issues.apache.org/jira/browse/SPARK-47657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47657. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45782 [https://github.com/apache/spark/pull/45782] > Implement filter pushdown support for collation per file source > --- > > Key: SPARK-47657 > URL: https://issues.apache.org/jira/browse/SPARK-47657 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47713) Fix a self-join failure
[ https://issues.apache.org/jira/browse/SPARK-47713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47713. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45846 [https://github.com/apache/spark/pull/45846] > Fix a self-join failure > --- > > Key: SPARK-47713 > URL: https://issues.apache.org/jira/browse/SPARK-47713 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47289) Allow extensions to log extended information in explain plan
[ https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47289. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45488 [https://github.com/apache/spark/pull/45488] > Allow extensions to log extended information in explain plan > > > Key: SPARK-47289 > URL: https://issues.apache.org/jira/browse/SPARK-47289 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Assignee: Parth Chandra >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > With session extensions, Spark planning can be extended to apply additional > rules and modify the execution plan. If an extension replaces a node in the > plan, the new node will be displayed in the plan. However, it is sometimes > useful for extensions to provide extended information to the end user to > explain the impact of the extension. For instance, an extension may > automatically enable/disable some feature that it provides and can provide > this extended information in the plan. > The proposal is to optionally turn on extended plan information from > extensions. Extensions can add additional planning information via a new > interface that internally uses a new TreeNodeTag, say 'explainPlan'. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47289) Allow extensions to log extended information in explain plan
[ https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47289: --- Assignee: Parth Chandra > Allow extensions to log extended information in explain plan > > > Key: SPARK-47289 > URL: https://issues.apache.org/jira/browse/SPARK-47289 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Assignee: Parth Chandra >Priority: Major > Labels: pull-request-available > > With session extensions, Spark planning can be extended to apply additional > rules and modify the execution plan. If an extension replaces a node in the > plan, the new node will be displayed in the plan. However, it is sometimes > useful for extensions to provide extended information to the end user to > explain the impact of the extension. For instance, an extension may > automatically enable/disable some feature that it provides and can provide > this extended information in the plan. > The proposal is to optionally turn on extended plan information from > extensions. Extensions can add additional planning information via a new > interface that internally uses a new TreeNodeTag, say 'explainPlan'. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
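A rough sketch of the existing building blocks the description alludes to (TreeNodeTag and session extensions are existing Spark APIs; the tag name, the rule, and how the new interface ultimately exposes the information are assumptions here, not the PR's actual API):
{code:java}
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

object MyExplainInfo {
  // Hypothetical tag under which an extension records what it did to the plan.
  val explainPlanTag: TreeNodeTag[String] = TreeNodeTag[String]("explainPlan")
}

// An optimizer rule injected by an extension: it leaves the plan unchanged but attaches
// extra information that an extended explain output could later surface.
case class AnnotatePlanRule() extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    plan.setTagValue(MyExplainInfo.explainPlanTag, "my-feature: automatically enabled")
    plan
  }
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule(_ => AnnotatePlanRule())
  }
}
{code}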
[jira] [Resolved] (SPARK-47081) Support Query Execution Progress Messages
[ https://issues.apache.org/jira/browse/SPARK-47081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47081. - Resolution: Fixed Issue resolved by pull request 45150 [https://github.com/apache/spark/pull/45150] > Support Query Execution Progress Messages > - > > Key: SPARK-47081 > URL: https://issues.apache.org/jira/browse/SPARK-47081 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark Connect should support reporting basic query progress to the client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47081) Support Query Execution Progress Messages
[ https://issues.apache.org/jira/browse/SPARK-47081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47081: --- Assignee: Martin Grund > Support Query Execution Progress Messages > - > > Key: SPARK-47081 > URL: https://issues.apache.org/jira/browse/SPARK-47081 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark Connect should support reporting basic query progress to the client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47210) Addition of implicit casting without indeterminate support
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47210: --- Assignee: Mihailo Milosevic > Addition of implicit casting without indeterminate support > -- > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47210) Addition of implicit casting without indeterminate support
[ https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47210. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45383 [https://github.com/apache/spark/pull/45383] > Addition of implicit casting without indeterminate support > -- > > Key: SPARK-47210 > URL: https://issues.apache.org/jira/browse/SPARK-47210 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *What changes were proposed in this pull request?* > This PR adds automatic casting and collations resolution as per `PGSQL` > behaviour: > 1. Collations set on the metadata level are implicit > 2. Collations set using the `COLLATE` expression are explicit > 3. When there is a combination of expressions of multiple collations the > output will be: > - if there are explicit collations and all of them are equal then that > collation will be the output > - if there are multiple different explicit collations > `COLLATION_MISMATCH.EXPLICIT` will be thrown > - if there are no explicit collations and only a single type of non default > collation, that one will be used > - if there are no explicit collations and multiple non-default implicit ones > `COLLATION_MISMATCH.IMPLICIT` will be thrown > *Why are the changes needed?* > We need to be able to compare columns and values with different collations > and set a way of explicitly changing the collation we want to use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
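A hedged spark-shell illustration of the resolution rules listed above (the collation names are examples of collations available in Spark 4.0; exact error output is not reproduced here):
{code:java}
// Two equal explicit collations: allowed, and the result keeps that collation.
spark.sql("SELECT 'a' COLLATE UNICODE_CI = 'A' COLLATE UNICODE_CI").show()

// Explicit collation on one side only: the explicit collation wins over the implicit default.
spark.sql("SELECT 'a' COLLATE UNICODE_CI = 'A'").show()

// Two different explicit collations: expected to fail with COLLATION_MISMATCH.EXPLICIT.
spark.sql("SELECT 'a' COLLATE UNICODE_CI = 'A' COLLATE UTF8_BINARY").show()
{code}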
[jira] [Created] (SPARK-47714) set spark.sql.legacy.timeParserPolicy to CORRECTED by default
Wenchen Fan created SPARK-47714: --- Summary: set spark.sql.legacy.timeParserPolicy to CORRECTED by default Key: SPARK-47714 URL: https://issues.apache.org/jira/browse/SPARK-47714 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
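This only changes the default; a workload that still depends on legacy datetime parsing can opt back in per session, e.g.:
{code:java}
// Restore the pre-change behaviour (CORRECTED and EXCEPTION are the other accepted values).
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
{code}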
[jira] [Resolved] (SPARK-47634) Legacy support for map normalization
[ https://issues.apache.org/jira/browse/SPARK-47634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47634. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45760 [https://github.com/apache/spark/pull/45760] > Legacy support for map normalization > > > Key: SPARK-47634 > URL: https://issues.apache.org/jira/browse/SPARK-47634 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stevo Mitric >Assignee: Stevo Mitric >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add legacy support for creating a map without normalizing keys before > inserting in `ArrayBasedMapBuilder`. > > Key normalization change can be found in this PR: > https://issues.apache.org/jira/browse/SPARK-47563 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
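A minimal sketch of what key normalization means in practice (the lookup's expected result, and the existence of any legacy flag restoring the old behaviour, are assumptions based on the description, not verified output):
{code:java}
// With key normalization (SPARK-47563), a -0.0D key is normalized to 0.0D when it is inserted
// through ArrayBasedMapBuilder, so a lookup with 0.0D should find the entry written under -0.0D.
spark.sql("SELECT map(-0.0D, 'x')[0.0D]").show()
{code}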
[jira] [Created] (SPARK-47689) Do not wrap query execution error during data writing
Wenchen Fan created SPARK-47689: --- Summary: Do not wrap query execution error during data writing Key: SPARK-47689 URL: https://issues.apache.org/jira/browse/SPARK-47689 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47551) Add variant_get expression.
[ https://issues.apache.org/jira/browse/SPARK-47551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47551. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45708 [https://github.com/apache/spark/pull/45708] > Add variant_get expression. > --- > > Key: SPARK-47551 > URL: https://issues.apache.org/jira/browse/SPARK-47551 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46840) Collation benchmarking
[ https://issues.apache.org/jira/browse/SPARK-46840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46840. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45453 [https://github.com/apache/spark/pull/45453] > Collation benchmarking > -- > > Key: SPARK-46840 > URL: https://issues.apache.org/jira/browse/SPARK-46840 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46840) Collation benchmarking
[ https://issues.apache.org/jira/browse/SPARK-46840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46840: --- Assignee: Gideon P > Collation benchmarking > -- > > Key: SPARK-46840 > URL: https://issues.apache.org/jira/browse/SPARK-46840 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47569) Disallow comparing variant.
[ https://issues.apache.org/jira/browse/SPARK-47569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47569. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45726 [https://github.com/apache/spark/pull/45726] > Disallow comparing variant. > --- > > Key: SPARK-47569 > URL: https://issues.apache.org/jira/browse/SPARK-47569 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47572) Enforce Window partitionSpec is orderable.
[ https://issues.apache.org/jira/browse/SPARK-47572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47572. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45730 [https://github.com/apache/spark/pull/45730] > Enforce Window partitionSpec is orderable. > -- > > Key: SPARK-47572 > URL: https://issues.apache.org/jira/browse/SPARK-47572 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.1, 3.3.4 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47572) Enforce Window partitionSpec is orderable.
[ https://issues.apache.org/jira/browse/SPARK-47572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47572: --- Assignee: Chenhao Li > Enforce Window partitionSpec is orderable. > -- > > Key: SPARK-47572 > URL: https://issues.apache.org/jira/browse/SPARK-47572 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.1, 3.3.4 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47511) Canonicalize With expressions by re-assigning IDs
[ https://issues.apache.org/jira/browse/SPARK-47511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47511. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45649 [https://github.com/apache/spark/pull/45649] > Canonicalize With expressions by re-assigning IDs > - > > Key: SPARK-47511 > URL: https://issues.apache.org/jira/browse/SPARK-47511 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The current canonicalization of `With` expressions takes into account the ID > of the common expressions, which comes from a global monotonically increasing > ID. This means that queries with `With` expressions (e.g. `NULLIF` > expressions) will have inconsistent canonicalizations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
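A rough illustration of the symptom (NULLIF is used because it is rewritten into a With expression; whether sameResult actually returns false on a given build before the fix is an assumption):
{code:java}
// Two identical queries containing NULLIF; the With expressions they produce get their
// common-expression IDs from a global counter.
val q = "SELECT nullif(id, 0L) FROM range(10)"
val p1 = spark.sql(q).queryExecution.optimizedPlan
val p2 = spark.sql(q).queryExecution.optimizedPlan

// Canonicalization should make these compare as equal; with ID-dependent canonicalization
// the differing IDs could make this comparison fail.
println(p1.sameResult(p2))
{code}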
[jira] [Resolved] (SPARK-47525) Support subquery correlation joining on map attributes
[ https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47525. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45673 [https://github.com/apache/spark/pull/45673] > Support subquery correlation joining on map attributes > -- > > Key: SPARK-47525 > URL: https://issues.apache.org/jira/browse/SPARK-47525 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, when a subquery is correlated on a condition like `outer_map[1] = > inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself, > which is unsupported, so the query cannot run - for example: > > {code:java} > scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")scala> sql("select > v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 > where v1.value[1] = v2.value[1])").explain > org.apache.spark.sql.AnalysisException: > [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] > Unsupported subquery expression: Correlated column reference 'v1.value' > cannot be map type. SQLSTATE: 0A000; line 1 pos 49 > at > org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463) > ... {code} > However, if we rewrite the query to pull out the map access `outer_map[1]` > into the outer plan, it succeeds: > > {code:java} > scala> sql("""with tmp as ( > select value[0] as value0, value[1] as value1 from v > ) > select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from > tmp v2 where v1.value1 = v2.value1)""").explain{code} > Another point that can be improved is that, even if the data type supports > join, we still don’t need to join on the full attribute, and we can get a > better plan by doing the same rewrite to pull out the extract expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47525) Support subquery correlation joining on map attributes
[ https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47525: --- Assignee: Jack Chen > Support subquery correlation joining on map attributes > -- > > Key: SPARK-47525 > URL: https://issues.apache.org/jira/browse/SPARK-47525 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > Currently, when a subquery is correlated on a condition like `outer_map[1] = > inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself, > which is unsupported, so the query cannot run - for example: > > {code:java} > scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")scala> sql("select > v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 > where v1.value[1] = v2.value[1])").explain > org.apache.spark.sql.AnalysisException: > [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] > Unsupported subquery expression: Correlated column reference 'v1.value' > cannot be map type. SQLSTATE: 0A000; line 1 pos 49 > at > org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463) > ... {code} > However, if we rewrite the query to pull out the map access `outer_map[1]` > into the outer plan, it succeeds: > > {code:java} > scala> sql("""with tmp as ( > select value[0] as value0, value[1] as value1 from v > ) > select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from > tmp v2 where v1.value1 = v2.value1)""").explain{code} > Another point that can be improved is that, even if the data type supports > join, we still don’t need to join on the full attribute, and we can get a > better plan by doing the same rewrite to pull out the extract expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org