[jira] [Assigned] (SPARK-48364) Type casting for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48364: --- Assignee: Uroš Bojanić > Type casting for AbstractMapType > > > Key: SPARK-48364 > URL: https://issues.apache.org/jira/browse/SPARK-48364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48364) Type casting for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48364. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46661 [https://github.com/apache/spark/pull/46661] > Type casting for AbstractMapType > > > Key: SPARK-48364 > URL: https://issues.apache.org/jira/browse/SPARK-48364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48215. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46561 [https://github.com/apache/spark/pull/46561] > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm what the expected behaviour for this expression is when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48215: --- Assignee: Nebojsa Savic > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm what the expected behaviour for this expression is when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48305) CurrentLike - Database/Schema, Catalog, User (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48305. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46613 [https://github.com/apache/spark/pull/46613] > CurrentLike - Database/Schema, Catalog, User (all collations) > - > > Key: SPARK-48305 > URL: https://issues.apache.org/jira/browse/SPARK-48305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48175) Store collation information in metadata and not in type for SER/DE
[ https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48175: --- Assignee: Stefan Kandic > Store collation information in metadata and not in type for SER/DE > -- > > Key: SPARK-48175 > URL: https://issues.apache.org/jira/browse/SPARK-48175 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Changing serialization and deserialization of collated strings so that the > collation information is put in the metadata of the enclosing struct field - > and then read back from there during parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48175) Store collation information in metadata and not in type for SER/DE
[ https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48175. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46280 [https://github.com/apache/spark/pull/46280] > Store collation information in metadata and not in type for SER/DE > -- > > Key: SPARK-48175 > URL: https://issues.apache.org/jira/browse/SPARK-48175 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Changing serialization and deserialization of collated strings so that the > collation information is put in the metadata of the enclosing struct field - > and then read back from there during parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48308. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46619 [https://github.com/apache/spark/pull/46619] > Unify getting data schema without partition columns in FileSourceStrategy > - > > Key: SPARK-48308 > URL: https://issues.apache.org/jira/browse/SPARK-48308 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > In > [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191] > the schema of the data excluding partition columns is computed 2 times in a > slightly different way: > > {code:java} > val dataColumnsWithoutPartitionCols = > dataColumns.filterNot(partitionSet.contains) {code} > > vs > {code:java} > val readDataColumns = dataColumns > .filterNot(partitionColumns.contains) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
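The two snippets quoted above compute the same set of columns from different collections (`partitionSet` vs `partitionColumns`). As a language-agnostic illustration (plain Python, not Spark's Scala code), the unification amounts to deriving the read schema once from a single partition set:

```python
# Illustrative sketch (not Spark's implementation): compute the data schema
# without partition columns exactly once, from one source of truth.

def read_data_columns(data_columns: list[str], partition_columns: list[str]) -> list[str]:
    """Return data columns with partition columns filtered out.

    A set gives O(1) membership tests and avoids two subtly different
    filterNot computations drifting apart.
    """
    partition_set = set(partition_columns)
    return [c for c in data_columns if c not in partition_set]

cols = ["id", "value", "year", "month"]
parts = ["year", "month"]
print(read_data_columns(cols, parts))  # ['id', 'value']
```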
[jira] [Assigned] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy
[ https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48308: --- Assignee: Johan Lasperas > Unify getting data schema without partition columns in FileSourceStrategy > - > > Key: SPARK-48308 > URL: https://issues.apache.org/jira/browse/SPARK-48308 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > > In > [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191] > the schema of the data excluding partition columns is computed 2 times in a > slightly different way: > > {code:java} > val dataColumnsWithoutPartitionCols = > dataColumns.filterNot(partitionSet.contains) {code} > > vs > {code:java} > val readDataColumns = dataColumns > .filterNot(partitionColumns.contains) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48288) Add source data type to connector.Cast expression
[ https://issues.apache.org/jira/browse/SPARK-48288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48288. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46596 [https://github.com/apache/spark/pull/46596] > Add source data type to connector.Cast expression > - > > Key: SPARK-48288 > URL: https://issues.apache.org/jira/browse/SPARK-48288 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, > V2ExpressionBuilder builds a connector.Cast expression from a catalyst.Cast > expression. > The Catalyst cast carries the expression's data type, but the connector cast does not. > Since some casts are not allowed on the external engine, we need to know both the source > and target data types, so that we have fine enough granularity to block > unsupported casts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
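The motivation above can be sketched in a few lines. This is hypothetical Python, not the actual `connector.Cast` API; the field names and the example unsupported pair are assumptions. It shows why carrying the source type on the cast lets a dialect veto specific source-to-target combinations instead of rejecting all casts to a given target:

```python
from dataclasses import dataclass

# Hypothetical sketch (not Spark's API): a connector-side Cast that carries
# both source and target types, so push-down can be blocked per type pair.

@dataclass(frozen=True)
class Cast:
    expression: str
    source_type: str   # the newly added piece of information
    target_type: str

# Example pairs the external engine cannot evaluate (illustrative only).
UNSUPPORTED_PAIRS = {("timestamp", "int")}

def can_push_down(cast: Cast) -> bool:
    """Allow push-down unless this exact source->target pair is blocked."""
    return (cast.source_type, cast.target_type) not in UNSUPPORTED_PAIRS

print(can_push_down(Cast("col_a", "string", "int")))     # True
print(can_push_down(Cast("col_b", "timestamp", "int")))  # False
```

Without the source type, the dialect could only block "any cast to int", which is coarser than needed.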
[jira] [Resolved] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48252. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46552 [https://github.com/apache/spark/pull/46552] > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48252: --- Assignee: Wenchen Fan > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48172. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46588 [https://github.com/apache/spark/pull/46588] > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48277) Improve error message for ErrorClassesJsonReader.getErrorMessage
[ https://issues.apache.org/jira/browse/SPARK-48277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48277. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46584 [https://github.com/apache/spark/pull/46584] > Improve error message for ErrorClassesJsonReader.getErrorMessage > > > Key: SPARK-48277 > URL: https://issues.apache.org/jira/browse/SPARK-48277 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48160: --- Assignee: Uroš Bojanić > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48160. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46508 [https://github.com/apache/spark/pull/46508] > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48162: --- Assignee: Uroš Bojanić > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48162. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46461 [https://github.com/apache/spark/pull/46461] > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
[ https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-48271: Summary: Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER (was: support char/varchar in RowEncoder) > Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER > - > > Key: SPARK-48271 > URL: https://issues.apache.org/jira/browse/SPARK-48271 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48263) Collate function support for non UTF8_BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48263. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46574 [https://github.com/apache/spark/pull/46574] > Collate function support for non UTF8_BINARY strings > > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the default collation config is set to a collation other than > UTF8_BINARY (e.g. UTF8_BINARY_LCASE), executing the COLLATE (or > collation) expression fails because it only accepts > StringType(0) as the argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
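The failure mode can be shown with a toy model. This is an illustrative Python sketch, not Spark's type system; modeling a string type as a `(name, collation_id)` tuple is an assumption for this example. The bug is an exact-match check against `StringType(0)`; the fix is to accept a string type with any collation for the collation-name argument:

```python
# Hypothetical sketch (not Spark's code): model a string type as
# ("string", collation_id), where 0 stands for UTF8_BINARY.

def is_string_type_strict(dtype: tuple[str, int]) -> bool:
    """Buggy check: accepts only StringType(0), i.e. UTF8_BINARY."""
    return dtype == ("string", 0)

def is_string_type_any_collation(dtype: tuple[str, int]) -> bool:
    """Fixed check: accept a string type regardless of its collation."""
    return dtype[0] == "string"

# Under a non-default session collation, the collation-name literal
# carries that collation (collation id 1 here, purely illustrative).
lcase_literal = ("string", 1)
print(is_string_type_strict(lcase_literal))         # False -> COLLATE failed
print(is_string_type_any_collation(lcase_literal))  # True  -> accepted
```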
[jira] [Assigned] (SPARK-48263) Collate function support for non UTF8_BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48263: --- Assignee: Nebojsa Savic > Collate function support for non UTF8_BINARY strings > > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Assignee: Nebojsa Savic >Priority: Major > Labels: pull-request-available > > When the default collation config is set to a collation other than > UTF8_BINARY (e.g. UTF8_BINARY_LCASE), executing the COLLATE (or > collation) expression fails because it only accepts > StringType(0) as the argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48172. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46437 [https://github.com/apache/spark/pull/46437] > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed
[ https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48155. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46523 [https://github.com/apache/spark/pull/46523] > PropagateEmpty relation cause LogicalQueryStage only with broadcast without > join then execute failed > > > Key: SPARK-48155 > URL: https://issues.apache.org/jira/browse/SPARK-48155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.5.1, 3.3.4 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: > === Applying Rule > org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= > cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + > INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp))) > ! :- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 > ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 > more fields]24/05/07 09:48:55 ERROR [main] > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 > more fields] > ! 
+- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 {code} > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path.at > org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) > at > 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) > at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at
[jira] [Assigned] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed
[ https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48155: --- Assignee: angerszhu > PropagateEmpty relation cause LogicalQueryStage only with broadcast without > join then execute failed > > > Key: SPARK-48155 > URL: https://issues.apache.org/jira/browse/SPARK-48155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.5.1, 3.3.4 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: > === Applying Rule > org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= > cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + > INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, > Some(Asia/Singapore)) as timestamp))) > ! :- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 > ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 > more fields]24/05/07 09:48:55 ERROR [main] > Project [date#124, station_name#0, shipment_id#14] > +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 > AND station_type#1 IN (3,12)) > +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 > more fields] > ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 > more fields] > ! 
+- LogicalQueryStage Generate > explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), > false, [date#124], BroadcastQueryStage 0 {code} > {code:java} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path.at > org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) > at > org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) > at > 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) > at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) > at >
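The failing plan above can be sketched as a minimal query shape. This is an illustrative repro only: the table and column names are hypothetical, and whether the issue triggers depends on statistics and AQE configuration.

```sql
-- Hypothetical sketch of the AQEPropagateEmptyRelation failure mode.
-- Under AQE, the small side of the left outer join is already materialized
-- as a BroadcastQueryStage; when the other side folds to an empty
-- LocalRelation, propagating the empty relation rewrites the join into a
-- Project over the bare LogicalQueryStage, whose BroadcastExchange then
-- fails because it does not support the execute() code path.
SET spark.sql.adaptive.enabled = true;

SELECT d.date, s.shipment_id
FROM dates d                 -- small side, planned as a broadcast stage
LEFT OUTER JOIN shipments s  -- side that collapses to an empty relation
  ON CAST(d.date AS TIMESTAMP) >= CAST(from_unixtime(s.ctime) AS TIMESTAMP);
```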
[jira] [Created] (SPARK-48271) support char/varchar in RowEncoder
Wenchen Fan created SPARK-48271: --- Summary: support char/varchar in RowEncoder Key: SPARK-48271 URL: https://issues.apache.org/jira/browse/SPARK-48271 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48157. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46504 [https://github.com/apache/spark/pull/46504] > CSV expressions (all collations) > > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for *CSV* built-in string functions in Spark > ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm > what is the expected behaviour for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *csvExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
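As a sketch of the intended pass-through behaviour, collated string inputs should not change the result of the CSV expressions. The snippet below assumes the COLLATE clause and the UTF8_LCASE collation name introduced for Spark 4.0:

```sql
-- Each of these is expected to behave the same regardless of the
-- collation attached to the input string:
SELECT schema_of_csv('1,abc' COLLATE UTF8_LCASE);
SELECT from_csv('1,abc' COLLATE UTF8_LCASE, 'a INT, b STRING');
SELECT to_csv(named_struct('a', 1, 'b', 'abc' COLLATE UTF8_LCASE));
```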
[jira] [Assigned] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48157: --- Assignee: Uroš Bojanić > CSV expressions (all collations) > > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for *CSV* built-in string functions in Spark > ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm > what is the expected behaviour for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *csvExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48229) inputFile expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48229. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46503 [https://github.com/apache/spark/pull/46503] > inputFile expressions (all collations) > -- > > Key: SPARK-48229 > URL: https://issues.apache.org/jira/browse/SPARK-48229 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48265: --- Assignee: angerszhu > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code}
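The log above shows EliminateLimits merging the stacked limits into a `LocalLimit least(...)` expression that no later batch constant-folds, which is what this ticket addresses. A query with roughly the same shape (all names are illustrative only):

```sql
-- The two stacked LIMIT 21 clauses are combined by EliminateLimits into a
-- single LocalLimit least(21, 21); because the "Infer window group limit"
-- batch runs without constant folding, the limit stays a non-literal
-- expression, which can break rules that expect a foldable limit.
SELECT item_id FROM (
  SELECT item_id FROM db.table
  WHERE tz_type = 'local' AND grass_region = 'BR'
  LIMIT 21
) LIMIT 21;
```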
[jira] [Resolved] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48265. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46568 [https://github.com/apache/spark/pull/46568] > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code}
[jira] [Resolved] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48241. - Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46565 [https://github.com/apache/spark/pull/46565] > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. 
We need to retain the metadata when resolving > schema.
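A minimal repro sketch of the described failure, assuming a CSV table created directly with char/varchar columns (table and column names are illustrative):

```sql
-- The CHAR/VARCHAR metadata kept on the read schema's StringType fields is
-- dropped from the pruned required schema, so UnivocityParser's
-- "requiredSchema should be the subset of dataSchema" requirement fails.
CREATE TABLE csv_chars (c CHAR(5), v VARCHAR(10)) USING CSV;
INSERT INTO csv_chars VALUES ('abc', 'def');
SELECT c, v FROM csv_chars;  -- raised IllegalArgumentException before the fix
```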
[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48241: --- Assignee: Jiayi Liu > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. We need to retain the metadata when resolving > schema. 
[jira] [Resolved] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48206. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46492 [https://github.com/apache/spark/pull/46492] > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break.
[jira] [Assigned] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48206: --- Assignee: Kelvin Jiang > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break.
[jira] [Assigned] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48031: --- Assignee: Serge Rielau > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to provide the ability for views to react to changes in query > resolution in ways other than simply failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types, > or to adopt column arity changes into the view. >
[jira] [Resolved] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48031. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46267 [https://github.com/apache/spark/pull/46267] > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to provide the ability for views to react to changes in query > resolution in ways other than simply failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types, > or to adopt column arity changes into the view. >
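The options described above surface as a WITH SCHEMA clause on view definitions. The sketch below follows the clause names from the associated pull request and is illustrative, not authoritative; table and view names are hypothetical:

```sql
-- COMPENSATION: cast the query output back to the view's declared column
-- types instead of failing when the underlying types drift.
CREATE OR REPLACE VIEW v_comp WITH SCHEMA COMPENSATION AS
SELECT amount FROM sales;

-- EVOLUTION: let the view adopt type and column-arity changes from the
-- underlying query when it is re-resolved.
CREATE OR REPLACE VIEW v_evo WITH SCHEMA EVOLUTION AS
SELECT * FROM sales;
```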
[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
Wenchen Fan created SPARK-48260: --- Summary: disable output committer coordination in one test of ParquetIOSuite Key: SPARK-48260 URL: https://issues.apache.org/jira/browse/SPARK-48260 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary
Wenchen Fan created SPARK-48252: --- Summary: Update CommonExpressionRef when necessary Key: SPARK-48252 URL: https://issues.apache.org/jira/browse/SPARK-48252 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48146: --- Assignee: Kelvin Jiang > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code}
[jira] [Resolved] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48146. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46443 [https://github.com/apache/spark/pull/46443] > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code}
[jira] [Resolved] (SPARK-48158) XML expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48158. - Fix Version/s: 4.0.0 Assignee: Uroš Bojanić Resolution: Fixed > XML expressions (all collations) > > > Key: SPARK-48158 > URL: https://issues.apache.org/jira/browse/SPARK-48158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for *XML* built-in string functions in Spark > ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm > what is the expected behaviour for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *xmlExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *XML* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48222: --- Assignee: Nicholas Chammas > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
[ https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48222. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46512 [https://github.com/apache/spark/pull/46512] > Sync Ruby Bundler to 2.4.22 and refresh Gem lock file > - > > Key: SPARK-48222 > URL: https://issues.apache.org/jira/browse/SPARK-48222 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47409: --- Assignee: David Milicevic > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports the binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
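Under the lowercase collation, trim characters are expected to match case-insensitively. A sketch of the intended semantics, using the Spark 4.0 collation names (exact results should be confirmed against the implementation):

```sql
-- With UTF8_BINARY, only characters that are byte-equal to the trim set
-- are removed; with UTF8_LCASE, both 'x' and 'X' match the trim set.
SELECT TRIM(BOTH 'x' FROM 'xHELLOX' COLLATE UTF8_BINARY);
SELECT TRIM(BOTH 'x' FROM 'xHELLOX' COLLATE UTF8_LCASE);
```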
[jira] [Resolved] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47409. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46206 [https://github.com/apache/spark/pull/46206] > StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) > -- > > Key: SPARK-47409 > URL: https://issues.apache.org/jira/browse/SPARK-47409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTrim* built-in string function in > Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, > {*}StringTrimRight{*}). First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTrim* function so it > supports binary & lowercase collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47421. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46460 [https://github.com/apache/spark/pull/46460] > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47421) URL expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47421: --- Assignee: Uroš Bojanić > URL expressions (all collations) > > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47354. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46424 [https://github.com/apache/spark/pull/46424] > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47354) Variant expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47354: --- Assignee: Uroš Bojanić > Variant expressions (all collations) > > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48186. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46458 [https://github.com/apache/spark/pull/46458] > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48186) Add support for AbstractMapType
[ https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48186: --- Assignee: Uroš Bojanić > Add support for AbstractMapType > --- > > Key: SPARK-48186 > URL: https://issues.apache.org/jira/browse/SPARK-48186 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48197) avoid assert error for invalid lambda function
[ https://issues.apache.org/jira/browse/SPARK-48197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48197. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46475 [https://github.com/apache/spark/pull/46475] > avoid assert error for invalid lambda function > -- > > Key: SPARK-48197 > URL: https://issues.apache.org/jira/browse/SPARK-48197 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48204) fix release script for Spark 4.0+
Wenchen Fan created SPARK-48204: --- Summary: fix release script for Spark 4.0+ Key: SPARK-48204 URL: https://issues.apache.org/jira/browse/SPARK-48204 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48161) JSON expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48161. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46462 [https://github.com/apache/spark/pull/46462] > JSON expressions (all collations) > - > > Key: SPARK-48161 > URL: https://issues.apache.org/jira/browse/SPARK-48161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48188) Consistently use normalized plan for cache
Wenchen Fan created SPARK-48188: --- Summary: Consistently use normalized plan for cache Key: SPARK-48188 URL: https://issues.apache.org/jira/browse/SPARK-48188 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47297: --- Assignee: Uroš Bojanić > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47297. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46423 [https://github.com/apache/spark/pull/46423] > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48173) CheckAnalysis should see the entire query plan
Wenchen Fan created SPARK-48173: --- Summary: CheckAnalysis should see the entire query plan Key: SPARK-48173 URL: https://issues.apache.org/jira/browse/SPARK-48173 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48143: --- Assignee: Vladimir Golubev > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48143. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46400 [https://github.com/apache/spark/pull/46400] > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
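The "heavy exception construction" called out in SPARK-48143 refers to the JVM capturing a full stack trace every time an exception object is created — expensive when every malformed record in PERMISSIVE mode raises one. A minimal sketch of the usual mitigation (the class name here is hypothetical, not the actual Spark change):

```java
// Stack-trace capture dominates the cost of constructing a JVM exception.
// For expected, per-record parse failures, the four-argument Throwable
// constructor can disable both suppression and stack-trace capture.
public class CheapParseException extends Exception {
    public CheapParseException(String message) {
        // (message, cause, enableSuppression, writableStackTrace)
        super(message, null, false, false);
    }

    public static void main(String[] args) {
        CheapParseException e = new CheapParseException("malformed row");
        // No stack trace was recorded, so construction stays cheap.
        System.out.println(e.getStackTrace().length); // 0
    }
}
```

This trades debuggability for throughput, which is acceptable when the exception is part of the expected control flow (each bad row is caught and recorded) rather than a genuine error.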
[jira] [Assigned] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47267: --- Assignee: Uroš Bojanić > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47267. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46422 [https://github.com/apache/spark/pull/46422] > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
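For hash functions to "respect collation", two strings that compare equal under the collation must hash to the same value. The standard approach is to hash the collation key rather than the raw bytes; a JDK-only sketch (illustrative names, not Spark's actual implementation):

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationHash {
    // Hash the collation-key bytes instead of the raw string bytes, so that
    // strings equal under the collation produce identical hashes.
    static int collationHash(String s, Collator collator) {
        CollationKey key = collator.getCollationKey(s);
        return Arrays.hashCode(key.toByteArray());
    }

    public static void main(String[] args) {
        Collator ci = Collator.getInstance(Locale.ROOT);
        ci.setStrength(Collator.PRIMARY); // case-insensitive collation
        // "SPARK" and "spark" are equal under this collation, so their
        // hashes must agree — which hashing the raw bytes would not give.
        System.out.println(collationHash("SPARK", ci) == collationHash("spark", ci)); // true
    }
}
```

The same idea carries over to any hash algorithm (murmur3, xxhash, HLL sketches): feed it the collation key bytes, not the UTF-8 bytes of the original string.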
[jira] [Resolved] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48166. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46428 [https://github.com/apache/spark/pull/46428] > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > BadRecordException should not be used as a user-facing exception -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
[ https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48166: --- Assignee: Vladimir Golubev > Unwanted use of internal BadRecordException in VariantExpressionEvalUtils > - > > Key: SPARK-48166 > URL: https://issues.apache.org/jira/browse/SPARK-48166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > > BadRecordException should not be used as a user-facing exception -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-48027: Affects Version/s: (was: 3.5.1) (was: 3.4.3) > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, it injects table_b as table_c's runtime filter, but the table_b > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a UNION, the result is None; it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48027. - Fix Version/s: 4.0.0 Assignee: angerszhu Resolution: Fixed > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, it injects table_b as table_c's runtime filter, but the table_b > join condition is LEFT OUTER, causing table_c to miss data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the > left plan is a UNION, the result is None; it then zips left/right keys to extract from the > right side, which causes this issue > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47359. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45820 [https://github.com/apache/spark/pull/45820] > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
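As a rough illustration of what a collation-aware *StringTranslate* means, the sketch below maps characters through a translate table using JDK Collator equality (hypothetical helper, not Spark's implementation; it assumes BMP characters and one-to-one mappings, which real collations do not guarantee):

```java
import java.text.Collator;
import java.util.Locale;

public class CollationTranslate {
    // Sketch of a collation-aware translate: each character of src that is
    // collator-equal to the j-th character of matchChars is replaced by the
    // j-th character of replaceChars, or dropped if replaceChars is shorter.
    static String translate(String src, String matchChars,
                            String replaceChars, Collator collator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < src.length(); i++) {
            String ch = src.substring(i, i + 1);
            int j = 0;
            while (j < matchChars.length()
                    && collator.compare(ch, matchChars.substring(j, j + 1)) != 0) {
                j++;
            }
            if (j == matchChars.length()) {
                sb.append(ch);                     // no match: keep as-is
            } else if (j < replaceChars.length()) {
                sb.append(replaceChars.charAt(j)); // matched: replace
            }                                      // matched, no replacement: drop
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collator ci = Collator.getInstance(Locale.ROOT);
        ci.setStrength(Collator.PRIMARY); // case-insensitive matching
        // 'A' matches 'a' under this collation, so it is translated too.
        System.out.println(translate("Abc", "ab", "xy", ci)); // "xyc"
    }
}
```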
[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47359: --- Assignee: Milan Dankovic > StringTranslate (all collations) > > > Key: SPARK-47359 > URL: https://issues.apache.org/jira/browse/SPARK-47359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringTranslate* built-in string function > in Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringTranslate* function > so it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48003) Hll sketch aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48003. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46241 [https://github.com/apache/spark/pull/46241] > Hll sketch aggregate support for strings with collation > --- > > Key: SPARK-48003 > URL: https://issues.apache.org/jira/browse/SPARK-48003 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47566. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45725 [https://github.com/apache/spark/pull/45725] > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
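A collation-aware *SubstringIndex* reduces to finding delimiter occurrences under the collation rather than by byte equality. A simplified JDK-only sketch for a one-character delimiter and positive counts (illustrative only; Spark's version handles multi-character delimiters and negative counts, and real collations can match across contractions that this per-character loop cannot):

```java
import java.text.Collator;
import java.util.Locale;

public class CollationSubstringIndex {
    // Sketch of a collation-aware substring_index for a one-character
    // delimiter: returns everything before the count-th delimiter match
    // under the given collator. Positive counts and BMP characters only.
    static String substringIndex(String src, String delim, int count,
                                 Collator collator) {
        int seen = 0;
        for (int i = 0; i < src.length(); i++) {
            if (collator.compare(src.substring(i, i + 1), delim) == 0) {
                seen++;
                if (seen == count) {
                    return src.substring(0, i);
                }
            }
        }
        return src; // fewer than count delimiters: return the whole string
    }

    public static void main(String[] args) {
        Collator ci = Collator.getInstance(Locale.ROOT);
        ci.setStrength(Collator.PRIMARY); // 'X' counts as a match for 'x'
        System.out.println(substringIndex("aXbxc", "x", 2, ci)); // "aXb"
    }
}
```

The key behavioral difference from the binary version: under a case-insensitive collation, an uppercase 'X' in the input counts as an occurrence of the delimiter 'x'.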
[jira] [Assigned] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47566: --- Assignee: Milan Dankovic > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* function > so that it supports all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
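The StringSearch-based approach suggested above can be sketched with JDK-only code. This is an illustrative editor's sketch, not Spark's implementation: `java.text.Collator` stands in for ICU's `StringSearch`, and the method name `collationIndexOf` is made up. At `SECONDARY` strength the comparison ignores case differences, which approximates a lowercase collation.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationSearchSketch {
    // Collation-aware indexOf: returns the first index where `pattern`
    // matches `text` under the given Collator, or -1 if there is no match.
    // Simplification: only same-length windows are compared, so matches
    // whose collation-equivalent spelling has a different length (e.g.
    // German "ss" vs the sharp s) are missed; ICU's StringSearch handles
    // variable-length matches properly.
    static int collationIndexOf(String text, String pattern, Collator c) {
        int n = text.length(), m = pattern.length();
        for (int i = 0; i + m <= n; i++) {
            if (c.equals(text.substring(i, i + m), pattern)) {
                return i;
            }
        }
        return m == 0 ? 0 : -1;
    }

    public static void main(String[] args) {
        Collator ci = Collator.getInstance(Locale.ROOT);
        ci.setStrength(Collator.SECONDARY);  // ignore case differences
        System.out.println(collationIndexOf("Spark SQL", "sql", ci)); // 6
        System.out.println(collationIndexOf("Spark SQL", "xyz", ci)); // -1
    }
}
```

A collation-aware SubstringIndex or StringLocate would use such a search in place of plain `String.indexOf`.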
[jira] [Resolved] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48033. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46269 [https://github.com/apache/spark/pull/46269] > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, default columns whose default is a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48033: --- Assignee: Richard Chen > Support Generated Column expressions that are `RuntimeReplaceable` > -- > > Key: SPARK-48033 > URL: https://issues.apache.org/jira/browse/SPARK-48033 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Richard Chen >Assignee: Richard Chen >Priority: Major > Labels: pull-request-available > > Currently, default columns whose default is a `RuntimeReplaceable` > expression fail. > This is because the `AlterTableCommand` constant folds before replacing > expressions with the actual implementation. For example: > ``` > sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET") > sql("INSERT INTO t VALUES(DEFAULT)") > ``` > fails because `parse_json` is `RuntimeReplaceable` and is evaluated before > the analyzer inserts the correct expression into the plan. > This is especially important for Variant types because literal variants are > difficult to create - `parse_json` will likely be used the majority of the > time. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
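The ordering problem described in this ticket can be illustrated with a toy expression tree. All names below are hypothetical; this is not Spark's analyzer, just a sketch of why constant folding before replacement fails:

```java
import java.util.function.Supplier;

public class ReplaceableFoldSketch {
    // Toy expression: either a literal or a "runtime replaceable" node that
    // must be rewritten to its real implementation before evaluation.
    interface Expr { int eval(); }

    static final class Literal implements Expr {
        final int v;
        Literal(int v) { this.v = v; }
        public int eval() { return v; }
    }

    // Stand-in for Spark's RuntimeReplaceable: evaluating it directly is an
    // error; the analyzer is expected to swap in `replacement` first.
    static final class Replaceable implements Expr {
        final Supplier<Expr> replacement;
        Replaceable(Supplier<Expr> replacement) { this.replacement = replacement; }
        public int eval() {
            throw new UnsupportedOperationException("must be replaced before eval");
        }
    }

    // "Constant folding": eagerly evaluates the expression down to a literal.
    static Expr fold(Expr e) { return new Literal(e.eval()); }

    // "Replacement" rule: rewrites Replaceable nodes into their real impl.
    static Expr replace(Expr e) {
        if (e instanceof Replaceable) {
            return ((Replaceable) e).replacement.get();
        }
        return e;
    }

    public static void main(String[] args) {
        Expr parseJson = new Replaceable(() -> new Literal(1));

        // Wrong order (the bug): fold before replace -> evaluation throws.
        try {
            fold(parseJson);
        } catch (UnsupportedOperationException ex) {
            System.out.println("fold-before-replace failed");
        }

        // Right order: replace first, then folding succeeds.
        System.out.println(fold(replace(parseJson)).eval()); // 1
    }
}
```

The fix amounts to running the replacement rule before the command constant-folds the default-value expression.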
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47741: --- Assignee: Milan Stefanovic > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this error and convert it to a proper parser exception with > an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47741. - Resolution: Fixed Issue resolved by pull request 45896 [https://github.com/apache/spark/pull/45896] > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this error and convert it to a proper parser exception with > an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
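The proposed conversion can be sketched as follows. This is illustrative only: the `ParseException` class and the error-class string below are made up, not Spark's actual classes, and the "parser" is a trivial recursion counting nested parentheses.

```java
public class ParserOverflowSketch {
    // Hypothetical parser error carrying an error class, the way Spark's
    // parser exceptions do (the names here are illustrative).
    static class ParseException extends RuntimeException {
        final String errorClass;
        ParseException(String errorClass, String msg) {
            super(msg);
            this.errorClass = errorClass;
        }
    }

    // A recursive-descent step: one stack frame per level of '(' nesting.
    static int parse(String query, int i) {
        if (i < query.length() && query.charAt(i) == '(') {
            return 1 + parse(query, i + 1);
        }
        return 0;
    }

    // Wrapper that converts JVM stack exhaustion into a proper parser error
    // instead of letting a raw StackOverflowError escape to the caller.
    static int safeParse(String query) {
        try {
            return parse(query, 0);
        } catch (StackOverflowError e) {
            throw new ParseException("FAILED_TO_PARSE_TOO_COMPLEX",
                    "query is too deeply nested to parse");
        }
    }

    public static void main(String[] args) {
        System.out.println(safeParse("((x")); // 2
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) sb.append('(');
        try {
            safeParse(sb.toString());
        } catch (ParseException e) {
            System.out.println(e.errorClass);
        }
    }
}
```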
[jira] [Resolved] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
[ https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47148. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45234 [https://github.com/apache/spark/pull/45234] > Avoid to materialize AQE ExchangeQueryStageExec on the cancellation > --- > > Key: SPARK-47148 > URL: https://issues.apache.org/jira/browse/SPARK-47148 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 4.0.0 >Reporter: Eren Avsarogullari >Assignee: Eren Avsarogullari >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on > cancellation. This causes unnecessary stage materialization by submitting a > shuffle job and a broadcast job. Under normal circumstances, if the stage is > not yet materialized (i.e. *ShuffleQueryStage.shuffleFuture* or > *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should > just be skipped without materializing it. 
> Here is a sample use-case: > *1- Stage Materialization Steps:* > When stage materialization fails: > {code:java} > 1.1- ShuffleQueryStage1 - is materialized successfully, > 1.2- ShuffleQueryStage2 - materialization fails, > 1.3- ShuffleQueryStage3 - not materialized yet, so > ShuffleQueryStage3.shuffleFuture is not initialized yet{code} > *2- Stage Cancellation Steps:* > {code:java} > 2.1- ShuffleQueryStage1 - is cancelled because it is already materialized, > 2.2- ShuffleQueryStage2 - is the earlyFailedStage, so it is currently skipped > by default by AQE because it could not be materialized, > 2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized > yet, but cancellation is still attempted, which forces the stage to be > materialized first.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
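The intended behavior, skipping cancellation for stages that were never materialized, can be sketched with a lazily created future. This is a toy model, not Spark's AQE code; the class and method names are illustrative:

```java
import java.util.concurrent.CompletableFuture;

public class StageCancelSketch {
    // Toy query stage: its "shuffle future" is created lazily, only when the
    // stage is actually materialized (mirroring ShuffleQueryStage.shuffleFuture,
    // which is only initialized once materialization starts).
    static class Stage {
        private CompletableFuture<String> future;  // null until materialize()

        void materialize() {
            if (future == null) future = new CompletableFuture<>();
        }

        boolean isMaterialized() { return future != null; }

        // The fix this ticket describes: cancelling a stage whose future was
        // never created is a no-op, not a forced materialization.
        void cancel() {
            if (future != null) future.cancel(true);
        }
    }

    public static void main(String[] args) {
        Stage running = new Stage();
        running.materialize();
        running.cancel();                               // cancels the real future
        System.out.println(running.isMaterialized());   // true

        Stage pending = new Stage();
        pending.cancel();                               // skipped entirely
        System.out.println(pending.isMaterialized());   // false: cancel did not start it
    }
}
```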
[jira] [Resolved] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47567. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45791 [https://github.com/apache/spark/pull/45791] > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47567: --- Assignee: Milan Dankovic > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47939. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46209 [https://github.com/apache/spark/pull/46209] > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47939: --- Assignee: Vladimir Golubev > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
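The underlying requirement is that positional parameters get bound in the inner query even when it is wrapped by DESCRIBE or EXPLAIN. A toy binder illustrating that requirement (this is not Spark's parameter-binding code; string literals and quoting are deliberately ignored, and the names are made up):

```java
import java.util.List;

public class ParamBindSketch {
    // Minimal positional-parameter binder: replaces each '?' with the next
    // argument rendered as a SQL literal. A real binder must skip '?' inside
    // quoted strings and comments; this sketch does not.
    static String bind(String sql, List<?> args) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char ch : sql.toCharArray()) {
            if (ch == '?') {
                if (next >= args.size())
                    throw new IllegalStateException("UNBOUND_SQL_PARAMETER at pos " + out.length());
                out.append(args.get(next++));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> params = List.of(1);
        // The fix amounts to binding parameters regardless of a DESCRIBE or
        // EXPLAIN wrapper around the parameterized query:
        System.out.println(bind("select ?", params));          // select 1
        System.out.println(bind("describe select ?", params)); // describe select 1
        System.out.println(bind("explain select ?", params));  // explain select 1
    }
}
```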
[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47927: --- Assignee: Emil Ejbyfeldt > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---------------------------------------+ > |UDF(struct(value, value, value, value))| > +---------------------------------------+ > |                                 {1, 0}| > +---------------------------------------+ > +--------------------+ > |struct(value, value)| > +--------------------+ > |           {1, NULL}| > +--------------------+ {code} > So when the result is passed to a UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47927. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46156 [https://github.com/apache/spark/pull/46156] > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---------------------------------------+ > |UDF(struct(value, value, value, value))| > +---------------------------------------+ > |                                 {1, 0}| > +---------------------------------------+ > +--------------------+ > |struct(value, value)| > +--------------------+ > |           {1, NULL}| > +--------------------+ {code} > So when the result is passed to a UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48019: --- Assignee: Gene Pang > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}} . > In addition to the possible Exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48019. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46254 [https://github.com/apache/spark/pull/46254] > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}} . > In addition to the possible Exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
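Both failure modes from SPARK-48019 (dictionary lookups for null slots, and null-blind primitive copies) can be modeled with a minimal vector. This is illustrative only, not Spark's `ColumnVector` API; the class and method names are made up:

```java
public class DictVectorSketch {
    // Toy dictionary-encoded int vector: `ids` index into `dict`, and a null
    // mask marks entries whose id is meaningless (and possibly out of range).
    static class IntVector {
        final int[] dict;
        final int[] ids;
        final boolean[] isNull;
        IntVector(int[] dict, int[] ids, boolean[] isNull) {
            this.dict = dict; this.ids = ids; this.isNull = isNull;
        }

        // Buggy bulk read: consults the dictionary even for null slots, so an
        // invalid id throws ArrayIndexOutOfBoundsException.
        int[] getIntsUnsafe() {
            int[] out = new int[ids.length];
            for (int i = 0; i < ids.length; i++) out[i] = dict[ids[i]];
            return out;
        }

        // Fixed bulk read: null entries skip the dictionary entirely; callers
        // must still consult isNull before trusting out[i] (0 is filler).
        int[] getInts() {
            int[] out = new int[ids.length];
            for (int i = 0; i < ids.length; i++) {
                if (!isNull[i]) out[i] = dict[ids[i]];
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Entry 1 is null and carries an invalid dictionary id (99).
        IntVector v = new IntVector(new int[]{10, 20},
                                    new int[]{1, 99, 0},
                                    new boolean[]{false, true, false});
        try {
            v.getIntsUnsafe();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe read failed");
        }
        int[] ok = v.getInts();
        System.out.println(ok[0] + "," + ok[2]); // 20,10
    }
}
```

The `copy()` half of the bug is the same principle: a copy must carry the null mask along with the primitive values, or null entries silently become zeros.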
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47476: --- Assignee: Uroš Bojanić > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45704 [https://github.com/apache/spark/pull/45704] > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what is the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47351. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46165 [https://github.com/apache/spark/pull/46165] > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47351: --- Assignee: Uroš Bojanić > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47350. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46158 [https://github.com/apache/spark/pull/46158] > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47922) Implement try_parse_json
[ https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47922. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46141 [https://github.com/apache/spark/pull/46141] > Implement try_parse_json > > > Key: SPARK-47922 > URL: https://issues.apache.org/jira/browse/SPARK-47922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement try_parse_json expression that runs parse_json on valid string > inputs and returns null when the input string is malformed. Note that this > expression also only supports string input types. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
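The try_ wrapper semantics described above (same result on valid input, null instead of an error on malformed input) can be sketched generically. The stand-in parser below accepts only a trivial subset of JSON (a bare integer or a quoted string) and is not Spark's parse_json; all names are illustrative:

```java
import java.util.Optional;

public class TryParseSketch {
    // Stand-in "parse_json": accepts only a trivially checkable subset and
    // throws on anything else, mimicking a strict parser.
    static String parseJson(String s) {
        String t = s.trim();
        if (t.matches("-?\\d+")) return t;
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) return t;
        throw new IllegalArgumentException("malformed JSON: " + s);
    }

    // try_parse_json semantics: the strict parser's result on valid input,
    // and an empty result (SQL NULL) instead of an error on malformed input.
    static Optional<String> tryParseJson(String s) {
        try {
            return Optional.of(parseJson(s));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseJson("1").orElse("NULL"));     // 1
        System.out.println(tryParseJson("{oops").orElse("NULL")); // NULL
    }
}
```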
[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47958. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46187 [https://github.com/apache/spark/pull/46187] > Task Scheduler may not know about executor when using LocalSchedulerBackend > --- > > Key: SPARK-47958 > URL: https://issues.apache.org/jira/browse/SPARK-47958 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Davin Tjong >Assignee: Davin Tjong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When using LocalSchedulerBackend, the task scheduler will not know about the > executor until a task is run, which can lead to unexpected behavior in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47764: --- Assignee: Bo Zhang > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > > Shuffle dependencies are created by shuffle map stages; they consist of > files on disk and the corresponding references in Spark JVM heap memory. > Currently, Spark cleans up unused shuffle dependencies through JVM GC, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we have still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, the instances are still recreated on the server side. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission.
[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47764. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45930 [https://github.com/apache/spark/pull/45930] > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Shuffle dependencies are created by shuffle map stages; they consist of > files on disk and the corresponding references in Spark JVM heap memory. > Currently, Spark cleans up unused shuffle dependencies through JVM GC, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we have still found cases in which the size of the shuffle data files is > too large, which makes shuffle data migration slow. > > We do have chances to clean up shuffle dependencies, especially for SQL > queries created by Spark Connect, since we have better control of the > DataFrame instances there. Even if DataFrame instances are reused on the > client side, the instances are still recreated on the server side. > > We might also provide the option to 1. clean up eagerly after each query > execution, or 2. only mark the shuffle dependencies and not migrate them at > node decommission.
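The eager-cleanup option described in the ticket above (option 1: clean up after each query execution instead of waiting for a periodic JVM GC) can be sketched as a small bookkeeping structure. All names here are hypothetical illustrations, not Spark's actual API.

```python
class EagerShuffleCleaner:
    """Illustrative sketch: track the shuffle IDs each query creates and
    release them as soon as the query finishes, rather than waiting for a
    periodic GC of the driver-side references (cf. ContextCleaner)."""

    def __init__(self, remove_shuffle):
        # remove_shuffle: callback that frees the shuffle files for one ID
        self._remove_shuffle = remove_shuffle
        self._by_query = {}

    def register(self, query_id, shuffle_id):
        # Called when a shuffle map stage is created for a query.
        self._by_query.setdefault(query_id, set()).add(shuffle_id)

    def query_finished(self, query_id):
        # Eagerly free every shuffle the finished query produced.
        for shuffle_id in self._by_query.pop(query_id, set()):
            self._remove_shuffle(shuffle_id)
```

This per-query tracking is feasible for Spark Connect precisely because, as the ticket notes, the server recreates DataFrame instances per request, so shuffle lifetimes align with query lifetimes.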
[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47418. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46181 [https://github.com/apache/spark/pull/46181] > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement the {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using the optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. 
> > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use the optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to the Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
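The comparison structure behind the UTF8_BINARY_LCASE predicates above can be sketched in a few lines: lowercase both operands, then fall back to a plain binary comparison. This is a semantic model only; Python's str.lower() is not identical to Spark's codepoint lowercasing in some Unicode edge cases, and the actual optimization in the referenced PR avoids materializing a full lowercased copy of the longer string.

```python
def lcase_contains(left, right):
    # UTF8_BINARY_LCASE-style matching: case-insensitive via lowercasing,
    # binary (codepoint) comparison afterwards.
    return right.lower() in left.lower()

def lcase_startswith(left, right):
    return left.lower().startswith(right.lower())

def lcase_endswith(left, right):
    return left.lower().endswith(right.lower())
```

For example, `lcase_contains("Spark SQL", "sql")` matches even though the cases differ, while a UTF8_BINARY comparison would not.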
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47873: --- Assignee: Stefan Kandic > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in the table metadata to properly read > them back.
[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47873. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46083 [https://github.com/apache/spark/pull/46083] > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in the table metadata to properly read > them back.
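The schema-translation step described in the ticket above can be sketched as follows. The `string collate <NAME>` type syntax and the `collation` metadata key used here are illustrative placeholders, not Spark's actual schema representation.

```python
def to_hive_field(name, data_type):
    # Hive has no collation support, so a collated string column is written
    # with the plain 'string' type, and its collation is preserved in
    # per-field metadata so readers can restore the original type.
    prefix = "string collate "
    if data_type.startswith(prefix):
        collation = data_type[len(prefix):]
        return {"name": name, "type": "string",
                "metadata": {"collation": collation}}
    # Non-collated columns pass through unchanged.
    return {"name": name, "type": data_type, "metadata": {}}
```

The key design point is that the physical type Hive sees is a regular string, so existing Hive readers still work; only collation-aware readers consult the metadata.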
[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference
Wenchen Fan created SPARK-47956: --- Summary: sanity check for unresolved LCA reference Key: SPARK-47956 URL: https://issues.apache.org/jira/browse/SPARK-47956 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan
[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47352. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46104 [https://github.com/apache/spark/pull/46104] > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 >
[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47412. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46041 [https://github.com/apache/spark/pull/46041] > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
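Padding operates on character counts and a fill string, not on string comparison, which is why StringLPad and StringRPad can treat collated inputs as pass-through. The semantics can be sketched as follows (an illustrative model, not Spark's implementation):

```python
def lpad(s, length, pad=" "):
    # Pad on the left to exactly `length` characters; truncate when the
    # input is already longer. No comparison happens, so the collation of
    # `s` never matters -- hence "pass-through" collation support.
    if len(s) >= length:
        return s[:length]
    if not pad:  # empty pad string: nothing to fill with
        return s
    return (pad * length)[: length - len(s)] + s

def rpad(s, length, pad=" "):
    if len(s) >= length:
        return s[:length]
    if not pad:
        return s
    return s + (pad * length)[: length - len(s)]
```

For example, padding "hi" to width 5 with the fill "ab" repeats and truncates the fill: `lpad` gives "abahi" and `rpad` gives "hiaba".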
[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47412: --- Assignee: Gideon P > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what the expected behaviour is for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 