[jira] [Assigned] (SPARK-48364) Type casting for AbstractMapType

2024-05-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48364:
---

Assignee: Uroš Bojanić

> Type casting for AbstractMapType
> 
>
> Key: SPARK-48364
> URL: https://issues.apache.org/jira/browse/SPARK-48364
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48364) Type casting for AbstractMapType

2024-05-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48364.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46661
[https://github.com/apache/spark/pull/46661]

> Type casting for AbstractMapType
> 
>
> Key: SPARK-48364
> URL: https://issues.apache.org/jira/browse/SPARK-48364
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48215) DateFormatClass (all collations)

2024-05-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48215.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46561
[https://github.com/apache/spark/pull/46561]

> DateFormatClass (all collations)
> 
>
> Key: SPARK-48215
> URL: https://issues.apache.org/jira/browse/SPARK-48215
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nebojsa Savic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *DateFormatClass* built-in function in 
> Spark. First confirm what is the expected behaviour for this expression when 
> given collated strings, and then move on to implementation and testing. You 
> will find this expression in the *datetimeExpressions.scala* file, and it 
> should be considered a pass-through function with respect to collation 
> awareness. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *DateFormatClass* 
> expression so that it supports all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48215) DateFormatClass (all collations)

2024-05-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48215:
---

Assignee: Nebojsa Savic

> DateFormatClass (all collations)
> 
>
> Key: SPARK-48215
> URL: https://issues.apache.org/jira/browse/SPARK-48215
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nebojsa Savic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *DateFormatClass* built-in function in 
> Spark. First confirm what is the expected behaviour for this expression when 
> given collated strings, and then move on to implementation and testing. You 
> will find this expression in the *datetimeExpressions.scala* file, and it 
> should be considered a pass-through function with respect to collation 
> awareness. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *DateFormatClass* 
> expression so that it supports all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48305) CurrentLike - Database/Schema, Catalog, User (all collations)

2024-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48305.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46613
[https://github.com/apache/spark/pull/46613]

> CurrentLike - Database/Schema, Catalog, User (all collations)
> -
>
> Key: SPARK-48305
> URL: https://issues.apache.org/jira/browse/SPARK-48305
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48175) Store collation information in metadata and not in type for SER/DE

2024-05-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48175:
---

Assignee: Stefan Kandic

> Store collation information in metadata and not in type for SER/DE
> --
>
> Key: SPARK-48175
> URL: https://issues.apache.org/jira/browse/SPARK-48175
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> Changing serialization and deserialization of collated strings so that the 
> collation information is put in the metadata of the enclosing struct field - 
> and then read back from there during parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48175) Store collation information in metadata and not in type for SER/DE

2024-05-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48175.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46280
[https://github.com/apache/spark/pull/46280]

> Store collation information in metadata and not in type for SER/DE
> --
>
> Key: SPARK-48175
> URL: https://issues.apache.org/jira/browse/SPARK-48175
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Changing serialization and deserialization of collated strings so that the 
> collation information is put in the metadata of the enclosing struct field - 
> and then read back from there during parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy

2024-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48308.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46619
[https://github.com/apache/spark/pull/46619]

> Unify getting data schema without partition columns in FileSourceStrategy
> -
>
> Key: SPARK-48308
> URL: https://issues.apache.org/jira/browse/SPARK-48308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In 
> [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191]
>  the schema of the data excluding partition columns is computed 2 times in a 
> slightly different way:
>  
> {code:java}
> val dataColumnsWithoutPartitionCols = 
> dataColumns.filterNot(partitionSet.contains) {code}
>  
> vs 
> {code:java}
> val readDataColumns = dataColumns
>   .filterNot(partitionColumns.contains) {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48308) Unify getting data schema without partition columns in FileSourceStrategy

2024-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48308:
---

Assignee: Johan Lasperas

> Unify getting data schema without partition columns in FileSourceStrategy
> -
>
> Key: SPARK-48308
> URL: https://issues.apache.org/jira/browse/SPARK-48308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Trivial
>  Labels: pull-request-available
>
> In 
> [FileSourceStrategy,|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L191]
>  the schema of the data excluding partition columns is computed 2 times in a 
> slightly different way:
>  
> {code:java}
> val dataColumnsWithoutPartitionCols = 
> dataColumns.filterNot(partitionSet.contains) {code}
>  
> vs 
> {code:java}
> val readDataColumns = dataColumns
>   .filterNot(partitionColumns.contains) {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48288) Add source data type to connector.Cast expression

2024-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48288.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46596
[https://github.com/apache/spark/pull/46596]

> Add source data type to connector.Cast expression
> -
>
> Key: SPARK-48288
> URL: https://issues.apache.org/jira/browse/SPARK-48288
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, 
> V2ExpressionBuilder will build connector.Cast expression from catalyst.Cast 
> expression.
> Catalyst cast have expression data type, but connector cast does not have it.
> Since some casts are not allowed on external engine, we need to know source 
> and target data type, since we want finer granularity to block some 
> unsupported casts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48252.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46552
[https://github.com/apache/spark/pull/46552]

> Update CommonExpressionRef when necessary
> -
>
> Key: SPARK-48252
> URL: https://issues.apache.org/jira/browse/SPARK-48252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48252:
---

Assignee: Wenchen Fan

> Update CommonExpressionRef when necessary
> -
>
> Key: SPARK-48252
> URL: https://issues.apache.org/jira/browse/SPARK-48252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48172.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46588
[https://github.com/apache/spark/pull/46588]

> Fix escaping issues in JDBCDialects
> ---
>
> Key: SPARK-48172
> URL: https://issues.apache.org/jira/browse/SPARK-48172
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48277) Improve error message for ErrorClassesJsonReader.getErrorMessage

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48277.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46584
[https://github.com/apache/spark/pull/46584]

> Improve error message for ErrorClassesJsonReader.getErrorMessage
> 
>
> Key: SPARK-48277
> URL: https://issues.apache.org/jira/browse/SPARK-48277
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48160) XPath expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48160:
---

Assignee: Uroš Bojanić

> XPath expressions (all collations)
> --
>
> Key: SPARK-48160
> URL: https://issues.apache.org/jira/browse/SPARK-48160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48160) XPath expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48160.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46508
[https://github.com/apache/spark/pull/46508]

> XPath expressions (all collations)
> --
>
> Key: SPARK-48160
> URL: https://issues.apache.org/jira/browse/SPARK-48160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48162) Miscellaneous expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48162:
---

Assignee: Uroš Bojanić

> Miscellaneous expressions (all collations)
> --
>
> Key: SPARK-48162
> URL: https://issues.apache.org/jira/browse/SPARK-48162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48162) Miscellaneous expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48162.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46461
[https://github.com/apache/spark/pull/46461]

> Miscellaneous expressions (all collations)
> --
>
> Key: SPARK-48162
> URL: https://issues.apache.org/jira/browse/SPARK-48162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-48271:

Summary: Turn match error in RowEncoder into 
UNSUPPORTED_DATA_TYPE_FOR_ENCODER  (was: support char/varchar in RowEncoder)

> Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
> -
>
> Key: SPARK-48271
> URL: https://issues.apache.org/jira/browse/SPARK-48271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48263) Collate function support for non UTF8_BINARY strings

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48263.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46574
[https://github.com/apache/spark/pull/46574]

> Collate function support for non UTF8_BINARY strings
> 
>
> Key: SPARK-48263
> URL: https://issues.apache.org/jira/browse/SPARK-48263
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nebojsa Savic
>Assignee: Nebojsa Savic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When default collation level config is set to some collation other than 
> UTF8_BINARY (i.e. UTF8_BINARY_LCASE) and when we try to execute COLLATE (or 
> collation) expression, this will fail because it is only accepting 
> StringType(0) as argument for collation name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48263) Collate function support for non UTF8_BINARY strings

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48263:
---

Assignee: Nebojsa Savic

> Collate function support for non UTF8_BINARY strings
> 
>
> Key: SPARK-48263
> URL: https://issues.apache.org/jira/browse/SPARK-48263
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nebojsa Savic
>Assignee: Nebojsa Savic
>Priority: Major
>  Labels: pull-request-available
>
> When default collation level config is set to some collation other than 
> UTF8_BINARY (i.e. UTF8_BINARY_LCASE) and when we try to execute COLLATE (or 
> collation) expression, this will fail because it is only accepting 
> StringType(0) as argument for collation name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48172.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46437
[https://github.com/apache/spark/pull/46437]

> Fix escaping issues in JDBCDialects
> ---
>
> Key: SPARK-48172
> URL: https://issues.apache.org/jira/browse/SPARK-48172
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48155.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46523
[https://github.com/apache/spark/pull/46523]

> PropagateEmpty relation cause LogicalQueryStage only with broadcast without 
> join then execute failed
> 
>
> Key: SPARK-48155
> URL: https://issues.apache.org/jira/browse/SPARK-48155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.5.1, 3.3.4
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> 24/05/07 09:48:55 ERROR [main] PlanChangeLogger:
> === Applying Rule 
> org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation ===
>  Project [date#124, station_name#0, shipment_id#14]
>  +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
> AND station_type#1 IN (3,12))
>     +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
> more fields] 
> !      +- Join LeftOuter, ((cast(date#124 as timestamp) >= 
> cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, 
> Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + 
> INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, 
> Some(Asia/Singapore)) as timestamp)))
> !         :- LogicalQueryStage Generate 
> explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
> false, [date#124], BroadcastQueryStage 0
> !         +- LocalRelation , [shipment_id#14, station_name#5, ... 3 
> more fields]24/05/07 09:48:55 ERROR [main] 
> Project [date#124, station_name#0, shipment_id#14]
>  +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
> AND station_type#1 IN (3,12))
>     +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
> more fields]
> !      +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 
> more fields]
> !         +- LogicalQueryStage Generate 
> explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
> false, [date#124], BroadcastQueryStage 0 {code}
> {code:java}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.lang.UnsupportedOperationException: BroadcastExchange does not support 
> the execute() code path.at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
> at 
> org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
> at 
> org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526)
> at 
> org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454)
> at 
> org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453)
> at 
> org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497)
> at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50)
> at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)  
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at 

[jira] [Assigned] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48155:
---

Assignee: angerszhu

> PropagateEmpty relation cause LogicalQueryStage only with broadcast without 
> join then execute failed
> 
>
> Key: SPARK-48155
> URL: https://issues.apache.org/jira/browse/SPARK-48155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.5.1, 3.3.4
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> 24/05/07 09:48:55 ERROR [main] PlanChangeLogger:
> === Applying Rule 
> org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation ===
>  Project [date#124, station_name#0, shipment_id#14]
>  +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
> AND station_type#1 IN (3,12))
>     +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
> more fields] 
> !      +- Join LeftOuter, ((cast(date#124 as timestamp) >= 
> cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, 
> Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + 
> INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, 
> Some(Asia/Singapore)) as timestamp)))
> !         :- LogicalQueryStage Generate 
> explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
> false, [date#124], BroadcastQueryStage 0
> !         +- LocalRelation , [shipment_id#14, station_name#5, ... 3 
> more fields]24/05/07 09:48:55 ERROR [main] 
> Project [date#124, station_name#0, shipment_id#14]
>  +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
> AND station_type#1 IN (3,12))
>     +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
> more fields]
> !      +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 
> more fields]
> !         +- LogicalQueryStage Generate 
> explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
> false, [date#124], BroadcastQueryStage 0 {code}
> {code:java}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> java.lang.UnsupportedOperationException: BroadcastExchange does not support 
> the execute() code path.at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
> at 
> org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
> at 
> org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526)
> at 
> org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454)
> at 
> org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453)
> at 
> org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497)
> at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50)
> at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)  
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
> at 
> 

[jira] [Created] (SPARK-48271) support char/varchar in RowEncoder

2024-05-14 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48271:
---

 Summary: support char/varchar in RowEncoder
 Key: SPARK-48271
 URL: https://issues.apache.org/jira/browse/SPARK-48271
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48157) CSV expressions (all collations)

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48157.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46504
[https://github.com/apache/spark/pull/46504]

> CSV expressions (all collations)
> 
>
> Key: SPARK-48157
> URL: https://issues.apache.org/jira/browse/SPARK-48157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for *CSV* built-in string functions in Spark 
> ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *csvExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *CSV* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48157) CSV expressions (all collations)

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48157:
---

Assignee: Uroš Bojanić

> CSV expressions (all collations)
> 
>
> Key: SPARK-48157
> URL: https://issues.apache.org/jira/browse/SPARK-48157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for *CSV* built-in string functions in Spark 
> ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *csvExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *CSV* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48229) inputFile expressions (all collations)

2024-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48229.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46503
[https://github.com/apache/spark/pull/46503]

> inputFile expressions (all collations)
> --
>
> Key: SPARK-48229
> URL: https://issues.apache.org/jira/browse/SPARK-48229
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48265:
---

Assignee: angerszhu

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]                                           
>                                                                        +- 
> Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))
> !      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) 
> AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))               
> +- Relation db.table[,... 91 more fields] parquet
> !      :        +- Relation db.table[,... 91 more fields] parquet
> !      +- LocalLimit 21
> !         +- Project [item_id#738L]
> !            +- LocalRelation , [, ... 91 more fields]
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian 
> Products has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
> effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> NormalizeFloatingNumbers has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> ReplaceUpdateFieldsExpression has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
> Query has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has 
> no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
> PartitionPruning has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
> cannot be pushed down has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
>  GlobalLimit 21                                                               
>                                                            GlobalLimit 21
> !+- LocalLimit 21                                                             
>                                                            +- LocalLimit 
> least(, ... 2 more fields)
> !   +- LocalLimit 21                                                          
>                                                               +- Project 
> [item_id#647L]
> !      +- Project [item_id#647L]                                              
>                                                                  +- Filter 
> (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
> BR)) AND isnotnull(grass_region#735))
> !         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
> Relation db.table[,... 91 more fields] parquet
> !            +- Relation db.table[,... 91 more fields] parquet
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48265.
-
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46568
[https://github.com/apache/spark/pull/46568]

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]                                           
>                                                                        +- 
> Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))
> !      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) 
> AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))               
> +- Relation db.table[,... 91 more fields] parquet
> !      :        +- Relation db.table[,... 91 more fields] parquet
> !      +- LocalLimit 21
> !         +- Project [item_id#738L]
> !            +- LocalRelation , [, ... 91 more fields]
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian 
> Products has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
> effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> NormalizeFloatingNumbers has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> ReplaceUpdateFieldsExpression has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
> Query has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has 
> no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
> PartitionPruning has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
> cannot be pushed down has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
>  GlobalLimit 21                                                               
>                                                            GlobalLimit 21
> !+- LocalLimit 21                                                             
>                                                            +- LocalLimit 
> least(, ... 2 more fields)
> !   +- LocalLimit 21                                                          
>                                                               +- Project 
> [item_id#647L]
> !      +- Project [item_id#647L]                                              
>                                                                  +- Filter 
> (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
> BR)) AND isnotnull(grass_region#735))
> !         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
> Relation db.table[,... 91 more fields] parquet
> !            +- Relation db.table[,... 91 more fields] parquet
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48241.
-
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46565
[https://github.com/apache/spark/pull/46565]

> CSV parsing failure with char/varchar type columns
> --
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jiayi Liu
>Assignee: Jiayi Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> CSV table containing char and varchar columns will result in the following 
> error when selecting from the CSV table:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).
>     at scala.Predef$.require(Predef.scala:281)
>     at 
> org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The reason for the error is that the StringType columns in the dataSchema and 
> requiredSchema of UnivocityParser are not consistent. It is due to the 
> metadata contained in the StringType StructField of the dataSchema, which is 
> missing in the requiredSchema. We need to retain the metadata when resolving 
> schema.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48241:
---

Assignee: Jiayi Liu

> CSV parsing failure with char/varchar type columns
> --
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jiayi Liu
>Assignee: Jiayi Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> CSV table containing char and varchar columns will result in the following 
> error when selecting from the CSV table:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).
>     at scala.Predef$.require(Predef.scala:281)
>     at 
> org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The reason for the error is that the StringType columns in the dataSchema and 
> requiredSchema of UnivocityParser are not consistent. It is due to the 
> metadata contained in the StringType StructField of the dataSchema, which is 
> missing in the requiredSchema. We need to retain the metadata when resolving 
> schema.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48206.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46492
[https://github.com/apache/spark/pull/46492]

> Add tests for window expression rewrites in RewriteWithExpression
> -
>
> Key: SPARK-48206
> URL: https://issues.apache.org/jira/browse/SPARK-48206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Window expressions can be potentially problematic if we pull out a window 
> expression outside a `Window` operator. Right now this shouldn't happen but 
> we should add some tests to make sure it doesn't break.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48206:
---

Assignee: Kelvin Jiang

> Add tests for window expression rewrites in RewriteWithExpression
> -
>
> Key: SPARK-48206
> URL: https://issues.apache.org/jira/browse/SPARK-48206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Window expressions can be potentially problematic if we pull out a window 
> expression outside a `Window` operator. Right now this shouldn't happen but 
> we should add some tests to make sure it doesn't break.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48031) Add schema evolution options to views

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48031:
---

Assignee: Serge Rielau

> Add schema evolution options to views 
> --
>
> Key: SPARK-48031
> URL: https://issues.apache.org/jira/browse/SPARK-48031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> We want to provide the ability for views to react to changes in the query 
> resolution in manners differently than just failing the view.
> For example we want the view to be able to compensate for type changes by 
> casting the query result to the view column types.
> Or to adopt any type of column arity changes into a view.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48031) Add schema evolution options to views

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48031.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46267
[https://github.com/apache/spark/pull/46267]

> Add schema evolution options to views 
> --
>
> Key: SPARK-48031
> URL: https://issues.apache.org/jira/browse/SPARK-48031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We want to provide the ability for views to react to changes in the query 
> resolution in manners differently than just failing the view.
> For example we want the view to be able to compensate for type changes by 
> casting the query result to the view column types.
> Or to adopt any type of column arity changes into a view.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite

2024-05-13 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48260:
---

 Summary: disable output committer coordination in one test of 
ParquetIOSuite
 Key: SPARK-48260
 URL: https://issues.apache.org/jira/browse/SPARK-48260
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-13 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48252:
---

 Summary: Update CommonExpressionRef when necessary
 Key: SPARK-48252
 URL: https://issues.apache.org/jira/browse/SPARK-48252
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48146) Fix error with aggregate function in With child

2024-05-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48146:
---

Assignee: Kelvin Jiang

> Fix error with aggregate function in With child
> ---
>
> Key: SPARK-48146
> URL: https://issues.apache.org/jira/browse/SPARK-48146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Right now, if we have an aggregate function in the child of a With 
> expression, we fail an assertion. However, queries like this used to work:
> {code:sql}
> select
> id between cast(max(id between 1 and 2) as int) and id
> from range(10)
> group by id
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48146) Fix error with aggregate function in With child

2024-05-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48146.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46443
[https://github.com/apache/spark/pull/46443]

> Fix error with aggregate function in With child
> ---
>
> Key: SPARK-48146
> URL: https://issues.apache.org/jira/browse/SPARK-48146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Right now, if we have an aggregate function in the child of a With 
> expression, we fail an assertion. However, queries like this used to work:
> {code:sql}
> select
> id between cast(max(id between 1 and 2) as int) and id
> from range(10)
> group by id
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48158) XML expressions (all collations)

2024-05-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48158.
-
Fix Version/s: 4.0.0
 Assignee: Uroš Bojanić
   Resolution: Fixed

> XML expressions (all collations)
> 
>
> Key: SPARK-48158
> URL: https://issues.apache.org/jira/browse/SPARK-48158
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for *XML* built-in string functions in Spark 
> ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *xmlExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *XML* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48222:
---

Assignee: Nicholas Chammas

> Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
> -
>
> Key: SPARK-48222
> URL: https://issues.apache.org/jira/browse/SPARK-48222
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48222) Sync Ruby Bundler to 2.4.22 and refresh Gem lock file

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48222.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46512
[https://github.com/apache/spark/pull/46512]

> Sync Ruby Bundler to 2.4.22 and refresh Gem lock file
> -
>
> Key: SPARK-48222
> URL: https://issues.apache.org/jira/browse/SPARK-48222
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47409:
---

Assignee: David Milicevic

> StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
> --
>
> Key: SPARK-47409
> URL: https://issues.apache.org/jira/browse/SPARK-47409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: David Milicevic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTrim* built-in string function in 
> Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, 
> {*}StringTrimRight{*}). First confirm what is the expected behaviour for 
> these functions when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other other open-source DBMS, such as 
> [PostgreSQL|[https://www.postgresql.org/docs/]].
>  
> The goal for this Jira ticket is to implement the *StringTrim* function so it 
> supports binary & lowercase collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47409.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46206
[https://github.com/apache/spark/pull/46206]

> StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only)
> --
>
> Key: SPARK-47409
> URL: https://issues.apache.org/jira/browse/SPARK-47409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: David Milicevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringTrim* built-in string function in 
> Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, 
> {*}StringTrimRight{*}). First confirm what is the expected behaviour for 
> these functions when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other other open-source DBMS, such as 
> [PostgreSQL|[https://www.postgresql.org/docs/]].
>  
> The goal for this Jira ticket is to implement the *StringTrim* function so it 
> supports binary & lowercase collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47421) URL expressions (all collations)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47421.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46460
[https://github.com/apache/spark/pull/46460]

> URL expressions (all collations)
> 
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47421) URL expressions (all collations)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47421:
---

Assignee: Uroš Bojanić

> URL expressions (all collations)
> 
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47354) Variant expressions (all collations)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47354.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46424
[https://github.com/apache/spark/pull/46424]

> Variant expressions (all collations)
> 
>
> Key: SPARK-47354
> URL: https://issues.apache.org/jira/browse/SPARK-47354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47354) Variant expressions (all collations)

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47354:
---

Assignee: Uroš Bojanić

> Variant expressions (all collations)
> 
>
> Key: SPARK-47354
> URL: https://issues.apache.org/jira/browse/SPARK-47354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48186) Add support for AbstractMapType

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48186.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46458
[https://github.com/apache/spark/pull/46458]

> Add support for AbstractMapType
> ---
>
> Key: SPARK-48186
> URL: https://issues.apache.org/jira/browse/SPARK-48186
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48186) Add support for AbstractMapType

2024-05-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48186:
---

Assignee: Uroš Bojanić

> Add support for AbstractMapType
> ---
>
> Key: SPARK-48186
> URL: https://issues.apache.org/jira/browse/SPARK-48186
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48197) avoid assert error for invalid lambda function

2024-05-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48197.
-
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46475
[https://github.com/apache/spark/pull/46475]

> avoid assert error for invalid lambda function
> --
>
> Key: SPARK-48197
> URL: https://issues.apache.org/jira/browse/SPARK-48197
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48204) fix release script for Spark 4.0+

2024-05-08 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48204:
---

 Summary: fix release script for Spark 4.0+
 Key: SPARK-48204
 URL: https://issues.apache.org/jira/browse/SPARK-48204
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48161) JSON expressions (all collations)

2024-05-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48161.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46462
[https://github.com/apache/spark/pull/46462]

> JSON expressions (all collations)
> -
>
> Key: SPARK-48161
> URL: https://issues.apache.org/jira/browse/SPARK-48161
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48188) Consistently use normalized plan for cache

2024-05-08 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48188:
---

 Summary: Consistently use normalized plan for cache
 Key: SPARK-48188
 URL: https://issues.apache.org/jira/browse/SPARK-48188
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47297) Format expressions (all collations)

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47297:
---

Assignee: Uroš Bojanić

> Format expressions (all collations)
> ---
>
> Key: SPARK-47297
> URL: https://issues.apache.org/jira/browse/SPARK-47297
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47297) Format expressions (all collations)

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47297.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46423
[https://github.com/apache/spark/pull/46423]

> Format expressions (all collations)
> ---
>
> Key: SPARK-47297
> URL: https://issues.apache.org/jira/browse/SPARK-47297
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48173) CheckAnalsis should see the entire query plan

2024-05-07 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48173:
---

 Summary: CheckAnalsis should see the entire query plan
 Key: SPARK-48173
 URL: https://issues.apache.org/jira/browse/SPARK-48173
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48143:
---

Assignee: Vladimir Golubev

> UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE 
> mode
> ---
>
> Key: SPARK-48143
> URL: https://issues.apache.org/jira/browse/SPARK-48143
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> Parsing partially-malformed CSV in permissive mode is slow due to heavy 
> exception construction



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48143.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46400
[https://github.com/apache/spark/pull/46400]

> UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE 
> mode
> ---
>
> Key: SPARK-48143
> URL: https://issues.apache.org/jira/browse/SPARK-48143
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Parsing partially-malformed CSV in permissive mode is slow due to heavy 
> exception construction



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47267) Hash functions should respect collation

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47267:
---

Assignee: Uroš Bojanić

> Hash functions should respect collation
> ---
>
> Key: SPARK-47267
> URL: https://issues.apache.org/jira/browse/SPARK-47267
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> All functions in `hash_funcs` group should respec collation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47267) Hash functions should respect collation

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47267.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46422
[https://github.com/apache/spark/pull/46422]

> Hash functions should respect collation
> ---
>
> Key: SPARK-47267
> URL: https://issues.apache.org/jira/browse/SPARK-47267
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> All functions in `hash_funcs` group should respec collation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48166.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46428
[https://github.com/apache/spark/pull/46428]

> Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
> -
>
> Key: SPARK-48166
> URL: https://issues.apache.org/jira/browse/SPARK-48166
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> BadRecordException should not be used as user-facing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48166) Unwanted use of internal BadRecordException in VariantExpressionEvalUtils

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48166:
---

Assignee: Vladimir Golubev

> Unwanted use of internal BadRecordException in VariantExpressionEvalUtils
> -
>
> Key: SPARK-48166
> URL: https://issues.apache.org/jira/browse/SPARK-48166
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Minor
>  Labels: pull-request-available
>
> BadRecordException should not be used as user-facing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-48027:

Affects Version/s: (was: 3.5.1)
   (was: 3.4.3)

> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-04-28-16-38-37-510.png, 
> image-2024-04-28-16-41-08-392.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         default.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM default.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             default.table_a2
>         where grass_date = '2024-04-25'
>         select
>             loan_id
>         from
>             default.table_a1
>         where grass_date = '2024-04-24' 
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!
>  
> In this query, it inject table_b as table_c's runtime filter, but table_b 
> join condition is LEFT OUTER, causing table_c missing data.
> Caused by 
> InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handle join, since 
> left plan is a UNION< result is NONE, then zip l/r keys to extract from 
> right. Then cause this issue
> !image-2024-04-28-16-41-08-392.png|width=883,height=706!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-05-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48027.
-
Fix Version/s: 4.0.0
 Assignee: angerszhu
   Resolution: Fixed

> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-04-28-16-38-37-510.png, 
> image-2024-04-28-16-41-08-392.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         default.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM default.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             default.table_a2
>         where grass_date = '2024-04-25'
>         select
>             loan_id
>         from
>             default.table_a1
>         where grass_date = '2024-04-24' 
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!
>  
> In this query, it inject table_b as table_c's runtime filter, but table_b 
> join condition is LEFT OUTER, causing table_c missing data.
> Caused by 
> InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handle join, since 
> left plan is a UNION< result is NONE, then zip l/r keys to extract from 
> right. Then cause this issue
> !image-2024-04-28-16-41-08-392.png|width=883,height=706!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47359) StringTranslate (all collations)

2024-04-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47359.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45820
[https://github.com/apache/spark/pull/45820]

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm what is the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)

2024-04-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47359:
---

Assignee: Milan Dankovic

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm what is the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48003) Hll sketch aggregate support for strings with collation

2024-04-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48003.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46241
[https://github.com/apache/spark/pull/46241]

> Hll sketch aggregate support for strings with collation
> ---
>
> Key: SPARK-48003
> URL: https://issues.apache.org/jira/browse/SPARK-48003
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47566) SubstringIndex

2024-04-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47566.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45725
[https://github.com/apache/spark/pull/45725]

> SubstringIndex
> --
>
> Key: SPARK-47566
> URL: https://issues.apache.org/jira/browse/SPARK-47566
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *SubstringIndex* built-in string function in 
> Spark. First confirm what is the expected behaviour for these functions when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *SubstringIndex* functions 
> so that they support all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47566) SubstringIndex

2024-04-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47566:
---

Assignee: Milan Dankovic

> SubstringIndex
> --
>
> Key: SPARK-47566
> URL: https://issues.apache.org/jira/browse/SPARK-47566
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *SubstringIndex* built-in string function in 
> Spark. First confirm what is the expected behaviour for these functions when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *SubstringIndex* functions 
> so that they support all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48033.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46269
[https://github.com/apache/spark/pull/46269]

> Support Generated Column expressions that are `RuntimeReplaceable`
> --
>
> Key: SPARK-48033
> URL: https://issues.apache.org/jira/browse/SPARK-48033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Richard Chen
>Assignee: Richard Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, default columns that have a default of a `RuntimeReplaceable` 
> expression fails.
> This is because the `AlterTableCommand` constant folds before replacing 
> expressions with the actual implementation. For example:
> ```
> sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET")
> sql("INSERT INTO t VALUES(DEFAULT)")
> ```
> fails because `parse_json` is `RuntimeReplaceable` and is evaluated before 
> the analyzer inserts the correct expression into the plan
> This is especially important for Variant types because literal variants are 
> difficult to create - `parse_json` will likely be used the majority of the 
> time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48033:
---

Assignee: Richard Chen

> Support Generated Column expressions that are `RuntimeReplaceable`
> --
>
> Key: SPARK-48033
> URL: https://issues.apache.org/jira/browse/SPARK-48033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Richard Chen
>Assignee: Richard Chen
>Priority: Major
>  Labels: pull-request-available
>
> Currently, default columns that have a default of a `RuntimeReplaceable` 
> expression fails.
> This is because the `AlterTableCommand` constant folds before replacing 
> expressions with the actual implementation. For example:
> ```
> sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET")
> sql("INSERT INTO t VALUES(DEFAULT)")
> ```
> fails because `parse_json` is `RuntimeReplaceable` and is evaluated before 
> the analyzer inserts the correct expression into the plan
> This is especially important for Variant types because literal variants are 
> difficult to create - `parse_json` will likely be used the majority of the 
> time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47741:
---

Assignee: Milan Stefanovic

> Handle stack overflow when parsing query
> 
>
> Key: SPARK-47741
> URL: https://issues.apache.org/jira/browse/SPARK-47741
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Milan Stefanovic
>Assignee: Milan Stefanovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Parsing complex queries which can lead to stack overflow.
> We need to catch this exception and convert it to proper parser exc with 
> error class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47741) Handle stack overflow when parsing query

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47741.
-
Resolution: Fixed

Issue resolved by pull request 45896
[https://github.com/apache/spark/pull/45896]

> Handle stack overflow when parsing query
> 
>
> Key: SPARK-47741
> URL: https://issues.apache.org/jira/browse/SPARK-47741
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Milan Stefanovic
>Assignee: Milan Stefanovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Parsing complex queries which can lead to stack overflow.
> We need to catch this exception and convert it to proper parser exc with 
> error class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47148.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45234
[https://github.com/apache/spark/pull/45234]

> Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
> ---
>
> Key: SPARK-47148
> URL: https://issues.apache.org/jira/browse/SPARK-47148
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, SQL
>Affects Versions: 4.0.0
>Reporter: Eren Avsarogullari
>Assignee: Eren Avsarogullari
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on the 
> cancellation. This causes unnecessary stage materialization by submitting 
> Shuffle Job and Broadcast Job. Under normal circumstances, if the stage is 
> already non-materialized (a.k.a *ShuffleQueryStage.shuffleFuture* or 
> *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should 
> just be skipped without materializing it.
> Please find sample use-case:
> *1- Stage Materialization Steps:*
> When stage materialization is failed:
> {code:java}
> 1.1- ShuffleQueryStage1 - is materialized successfully,
> 1.2- ShuffleQueryStage2 - materialization is failed,
> 1.3- ShuffleQueryStage3 - Not materialized yet so 
> ShuffleQueryStage3.shuffleFuture is not initialized yet{code}
> *2- Stage Cancellation Steps:*
> {code:java}
> 2.1- ShuffleQueryStage1 - is canceled due to already materialized,
> 2.2- ShuffleQueryStage2 - is earlyFailedStage so currently, it is skipped as 
> default by AQE because it could not be materialized,
> 2.3- ShuffleQueryStage3 - Problem is here: This stage is not materialized yet 
> but currently, it is also tried to cancel and this stage requires to be 
> materialized first.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47567) StringLocate

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47567.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45791
[https://github.com/apache/spark/pull/45791]

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm what is the expected behaviour for these functions when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* functions so 
> that they support all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47567) StringLocate

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47567:
---

Assignee: Milan Dankovic

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm what is the expected behaviour for these functions when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* functions so 
> that they support all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47939.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46209
[https://github.com/apache/spark/pull/46209]

> Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER 
> error
> 
>
> Key: SPARK-47939
> URL: https://issues.apache.org/jira/browse/SPARK-47939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *Succeeds:* scala> spark.sql("select ?", Array(1)).show();
> *Fails:* spark.sql("describe select ?", Array(1)).show();
> *Fails:* spark.sql("explain select ?", Array(1)).show();
> Failures are of the form:
> org.apache.spark.sql.catalyst.ExtendedAnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` 
> and provide a mapping of the parameter to either a SQL literal or collection 
> constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 
> 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- 
> OneRowRelation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error

2024-04-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47939:
---

Assignee: Vladimir Golubev

> Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER 
> error
> 
>
> Key: SPARK-47939
> URL: https://issues.apache.org/jira/browse/SPARK-47939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> *Succeeds:* scala> spark.sql("select ?", Array(1)).show();
> *Fails:* spark.sql("describe select ?", Array(1)).show();
> *Fails:* spark.sql("explain select ?", Array(1)).show();
> Failures are of the form:
> org.apache.spark.sql.catalyst.ExtendedAnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` 
> and provide a mapping of the parameter to either a SQL literal or collection 
> constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 
> 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- 
> OneRowRelation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47927) Nullability after join not respected in UDF

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47927:
---

Assignee: Emil Ejbyfeldt

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: correctness, pull-request-available
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value".show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---+
> |UDF(struct(value, value, value, value))|
> +---+
> |                                 {1, 0}|
> +---+
> ++
> |struct(value, value)|
> ++
> |           {1, NULL}|
> ++ {code}
> So when the result is passed to UDF the null-ability after the the join is 
> not respected and we incorrectly end up with a 0 value instead of a null/None 
> value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47927) Nullability after join not respected in UDF

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47927.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46156
[https://github.com/apache/spark/pull/46156]

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value".show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---+
> |UDF(struct(value, value, value, value))|
> +---+
> |                                 {1, 0}|
> +---+
> ++
> |struct(value, value)|
> ++
> |           {1, NULL}|
> ++ {code}
> So when the result is passed to UDF the null-ability after the the join is 
> not respected and we incorrectly end up with a 0 value instead of a null/None 
> value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48019:
---

Assignee: Gene Pang

> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Assignee: Gene Pang
>Priority: Major
>  Labels: pull-request-available
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, 
> the dictionary id is irrelevant, and can also be invalid. The dictionary 
> should not be used for the {{null}} entries of the vector. Sometimes, this 
> can cause an {{ArrayIndexOutOfBoundsException}} .
> In addition to the possible Exception, copying a {{ColumnarArray}} is not 
> correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain 
> {{null}} values. However, the {{copy()}} for primitive types does not take 
> into account the null-ness of the entries, and blindly copies all the 
> primitive values. That means the null entries get lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48019.
-
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46254
[https://github.com/apache/spark/pull/46254]

> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Assignee: Gene Pang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, 
> the dictionary id is irrelevant, and can also be invalid. The dictionary 
> should not be used for the {{null}} entries of the vector. Sometimes, this 
> can cause an {{ArrayIndexOutOfBoundsException}} .
> In addition to the possible Exception, copying a {{ColumnarArray}} is not 
> correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain 
> {{null}} values. However, the {{copy()}} for primitive types does not take 
> into account the null-ness of the entries, and blindly copies all the 
> primitive values. That means the null entries get lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47476:
---

Assignee: Uroš Bojanić

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what is the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47476.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45704
[https://github.com/apache/spark/pull/45704]

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what is the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47351.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46165
[https://github.com/apache/spark/pull/46165]

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47351:
---

Assignee: Uroš Bojanić

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47350.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46158
[https://github.com/apache/spark/pull/46158]

> SplitPart (binary & lowercase collation only)
> -
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47922) Implement try_parse_json

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47922.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46141
[https://github.com/apache/spark/pull/46141]

> Implement try_parse_json
> 
>
> Key: SPARK-47922
> URL: https://issues.apache.org/jira/browse/SPARK-47922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement try_parse_json expression that runs parse_json on valid string 
> inputs and returns null when the input string is malformed. Note that this 
> expression also only supports string input types.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47958.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46187
[https://github.com/apache/spark/pull/46187]

> Task Scheduler may not know about executor when using LocalSchedulerBackend
> ---
>
> Key: SPARK-47958
> URL: https://issues.apache.org/jira/browse/SPARK-47958
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Davin Tjong
>Assignee: Davin Tjong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When using LocalSchedulerBackend, the task scheduler will not know about the 
> executor until a task is run, which can lead to unexpected behavior in tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47764:
---

Assignee: Bo Zhang

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Shuffle dependencies are created by shuffle map stages, which consists of 
> files on disks and the corresponding references in Spark JVM heap memory. 
> Currently Spark cleanup unused  shuffle dependencies through JVM GCs, and 
> periodic GCs are triggered once every 30 minutes (see ContextCleaner). 
> However, we still found cases in which the size of the shuffle data files are 
> too large, which makes shuffle data migration slow.
>  
> We do have chances to cleanup shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we do have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused in the 
> client side, on the server side the instances are still recreated. 
>  
> We might also provide the option to 1. cleanup eagerly after each query 
> executions, or 2. only mark the shuffle executions and do not migrate them at 
> node decommissions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47764.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45930
[https://github.com/apache/spark/pull/45930]

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Shuffle dependencies are created by shuffle map stages, which consists of 
> files on disks and the corresponding references in Spark JVM heap memory. 
> Currently Spark cleanup unused  shuffle dependencies through JVM GCs, and 
> periodic GCs are triggered once every 30 minutes (see ContextCleaner). 
> However, we still found cases in which the size of the shuffle data files are 
> too large, which makes shuffle data migration slow.
>  
> We do have chances to cleanup shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we do have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused in the 
> client side, on the server side the instances are still recreated. 
>  
> We might also provide the option to 1. cleanup eagerly after each query 
> executions, or 2. only mark the shuffle executions and do not migrate them at 
> node decommissions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47418.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46181
[https://github.com/apache/spark/pull/46181]

> Optimize string predicate expressions for UTF8_BINARY_LCASE collation
> -
>
> Key: SPARK-47418
> URL: https://issues.apache.org/jira/browse/SPARK-47418
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string 
> Spark functions using optimized lowercase comparison approach introduced by 
> [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to 
> the latest design and code structure imposed by [~uros-db] in 
> https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation 
> support is introduced for Spark SQL expressions. In addition, review previous 
> Jira tickets under the current parent in order to understand how 
> *StringPredicate* expressions are currently used and tested in Spark:
>  * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131]
>  * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248]
>  * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295]
> These tickets should help you understand what changes were introduced in 
> order to enable collation support for these functions. Lastly, feel free to 
> use your chosen Spark SQL Editor to play around with the existing functions 
> and learn more about how they work.
>  
> The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE 
> implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* 
> functions so that they use optimized lowercase comparison approach (following 
> the general logic in Nikola's PR), and benchmark the results accordingly. As 
> for testing, the currently existing unit test cases and end-to-end tests 
> should already fully cover the expected behaviour of *StringPredicate* 
> expressions for all collation types. In other words, the objective of this 
> ticket is only to enhance the internal implementation, without introducing 
> any user-facing changes to Spark SQL API.
>  
> Finally, feel free to refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47873:
---

Assignee: Stefan Kandic

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> As hive doesn't support collations we should write collated strings with a 
> regular string type but keep the collation in table metadata to properly read 
> them back.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47873.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46083
[https://github.com/apache/spark/pull/46083]

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> As hive doesn't support collations we should write collated strings with a 
> regular string type but keep the collation in table metadata to properly read 
> them back.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47956) sanity check for unresolved LCA reference

2024-04-23 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47956:
---

 Summary: sanity check for unresolved LCA reference
 Key: SPARK-47956
 URL: https://issues.apache.org/jira/browse/SPARK-47956
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47352.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46104
[https://github.com/apache/spark/pull/46104]

> Fix Upper, Lower, InitCap collation awareness
> -
>
> Key: SPARK-47352
> URL: https://issues.apache.org/jira/browse/SPARK-47352
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47412.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46041
[https://github.com/apache/spark/pull/46041]

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what is the expected behaviour for these 
> functions when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47412:
---

Assignee: Gideon P

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what is the expected behaviour for these 
> functions when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



  1   2   3   4   5   6   7   8   9   10   >