[jira] [Resolved] (SPARK-47581) SQL catalyst: Migrate logWarn with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47581.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45904
[https://github.com/apache/spark/pull/45904]

> SQL catalyst: Migrate logWarn with variables to structured logging framework
> 
>
> Key: SPARK-47581
> URL: https://issues.apache.org/jira/browse/SPARK-47581
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




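For context on the SPARK-47581 migration above, a hedged Scala sketch of what moving a logWarning call with variables onto the structured logging framework typically looks like. The MDC/LogKeys names are assumptions based on the Spark 4.0 framework and may differ from the exact identifiers used in the linked pull request; the surrounding class is purely illustrative.

{code:scala}
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKeys

class WorkerMonitor extends Logging {  // hypothetical class, for illustration only
  def reportFailure(executorId: String): Unit = {
    // Before: the variable is interpolated into a plain string and is invisible
    // to structured log aggregation.
    // logWarning(s"Executor $executorId failed")

    // After (sketch): the variable travels as a structured key/value pair,
    // assuming an EXECUTOR_ID key exists in LogKeys.
    logWarning(log"Executor ${MDC(LogKeys.EXECUTOR_ID, executorId)} failed")
  }
}
{code}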



[jira] [Resolved] (SPARK-47754) Postgres: Support reading multidimensional arrays

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47754.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45917
[https://github.com/apache/spark/pull/45917]

> Postgres: Support reading multidimensional arrays
> -
>
> Key: SPARK-47754
> URL: https://issues.apache.org/jira/browse/SPARK-47754
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47771) Make max_by, min_by doctests deterministic

2024-04-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-47771.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45939
[https://github.com/apache/spark/pull/45939]

> Make max_by, min_by doctests deterministic
> --
>
> Key: SPARK-47771
> URL: https://issues.apache.org/jira/browse/SPARK-47771
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47775) Support remaining scalar types in the variant spec.

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47775:
---
Labels: pull-request-available  (was: )

> Support remaining scalar types in the variant spec.
> ---
>
> Key: SPARK-47775
> URL: https://issues.apache.org/jira/browse/SPARK-47775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47775) Support remaining scalar types in the variant spec.

2024-04-08 Thread Chenhao Li (Jira)
Chenhao Li created SPARK-47775:
--

 Summary: Support remaining scalar types in the variant spec.
 Key: SPARK-47775
 URL: https://issues.apache.org/jira/browse/SPARK-47775
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Chenhao Li









[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47774:
---
Labels: pull-request-available  (was: )

> Remove redundant rules from `MimaExcludes`
> --
>
> Key: SPARK-47774
> URL: https://issues.apache.org/jira/browse/SPARK-47774
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
SPIP doc: 
https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> SPIP doc: 
> https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the 

[jira] [Created] (SPARK-47774) Remove redundant rules from `MimaExcludes`

2024-04-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47774:
-

 Summary: Remove redundant rules from `MimaExcludes`
 Key: SPARK-47774
 URL: https://issues.apache.org/jira/browse/SPARK-47774
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different 

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase. 

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different 

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase. 

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency. The design 
proposal advocates for the incorporation of the TransformSupport interface and 
its specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different 

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency. The design 
proposal advocates for the incorporation of the TransformSupport interface and 
its specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers.   (was: This SPIP outlines the 
integration of Gluten's physical plan conversion, validation, and fallback 
framework into Apache Spark. The goal is to enhance Spark's flexibility and 
robustness in executing physical plans and to leverage Gluten's performance 
optimizations. Currently, Spark lacks an official cross-platform execution 
support for physical plans. Gluten's mechanism, which employs the Substrait 
standard, can convert and optimize Spark's physical plans, thus improving 
portability, interoperability, and execution efficiency. The design proposal 
advocates for the incorporation of the TransformSupport interface and its 
specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. )

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency. The design proposal advocates for the incorporation of the 
> TransformSupport interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different operator types into a 
> Substrait-based common format. The validation phase entails a thorough 
> 

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: This SPIP outlines the integration of Gluten's physical plan 
conversion, validation, and fallback framework into Apache Spark. The goal is 
to enhance Spark's flexibility and robustness in executing physical plans and 
to leverage Gluten's performance optimizations. Currently, Spark lacks an 
official cross-platform execution support for physical plans. Gluten's 
mechanism, which employs the Substrait standard, can convert and optimize 
Spark's physical plans, thus improving portability, interoperability, and 
execution efficiency. The design proposal advocates for the incorporation of 
the TransformSupport interface and its specialized 
variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Ke Jia
>Priority: Major
>
> This SPIP outlines the integration of Gluten's physical plan conversion, 
> validation, and fallback framework into Apache Spark. The goal is to enhance 
> Spark's flexibility and robustness in executing physical plans and to 
> leverage Gluten's performance optimizations. Currently, Spark lacks an 
> official cross-platform execution support for physical plans. Gluten's 
> mechanism, which employs the Substrait standard, can convert and optimize 
> Spark's physical plans, thus improving portability, interoperability, and 
> execution efficiency. The design proposal advocates for the incorporation of 
> the TransformSupport interface and its specialized 
> variants—LeafTransformSupport, UnaryTransformSupport, and 
> BinaryTransformSupport. These are instrumental in streamlining the conversion 
> of different operator types into a Substrait-based common format. The 
> validation phase entails a thorough assessment of the Substrait plan against 
> native backends to ensure compatibility. In instances where validation does 
> not succeed, Spark's native operators will be deployed, with requisite 
> transformations to adapt data formats accordingly. The proposal emphasizes 
> the centrality of the plan transformation phase, positing it as the 
> foundational step. The subsequent validation and fallback procedures are 
> slated for consideration upon the successful establishment of the initial 
> phase.  The integration of Gluten into Spark has already shown significant 
> performance improvements with ClickHouse and Velox backends and has been 
> successfully deployed in production by several customers. 



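To make the SPIP's proposed hierarchy above more concrete, a hypothetical Scala sketch of the TransformSupport interfaces it describes. The trait names come from the SPIP text; the method names and the SubstraitRel placeholder are illustrative assumptions, not actual Spark or Gluten API.

{code:scala}
// Placeholder for a Substrait plan node; the real design would use the
// Substrait protobuf representation (assumption).
trait SubstraitRel

trait TransformSupport {
  // Convert this physical operator into the Substrait-based common format
  // (hypothetical method name).
  def convertToSubstrait(): SubstraitRel

  // Validate the converted plan against the native backend; when this returns
  // false, Spark falls back to its own operator, applying the requisite
  // data-format transformations (hypothetical method name).
  def doValidate(): Boolean
}

// Specialized variants named in the SPIP, covering the operator arities.
trait LeafTransformSupport   extends TransformSupport // e.g. scans
trait UnaryTransformSupport  extends TransformSupport // e.g. filter, project
trait BinaryTransformSupport extends TransformSupport // e.g. joins
{code}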



[jira] [Created] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)
Ke Jia created SPARK-47773:
--

 Summary: Enhancing the Flexibility of Spark's Physical Plan to 
Enable Execution on Various Native Engines
 Key: SPARK-47773
 URL: https://issues.apache.org/jira/browse/SPARK-47773
 Project: Spark
  Issue Type: Epic
  Components: SQL
Affects Versions: 3.5.1
Reporter: Ke Jia









[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47770:
--
Fix Version/s: 3.5.2
   3.4.3

> Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of 
> failing
> --
>
> Key: SPARK-47770
> URL: https://issues.apache.org/jira/browse/SPARK-47770
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.3
>
>







[jira] [Commented] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string

2024-04-08 Thread Bo Xiong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835136#comment-17835136
 ] 

Bo Xiong commented on SPARK-47759:
--

I've submitted [a fix|https://github.com/apache/spark/pull/45942].  Please help 
get it merged to the master branch.

Once that's merged, I'll submit other pull requests to patch v3.5.0 and above.  
Thanks!

> Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate 
> time string
> -
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected 
> stack trace when reading/parsing a legitimate time string. Note that we 
> manually killed the stuck app instances and the retry goes thru on the same 
> cluster (without requiring any app code change).
>  
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> 
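For reference, the failing call at the top of the stack trace can be exercised directly; a minimal Scala sketch, assuming only the spark-network-common artifact on the classpath:

{code:scala}
import org.apache.spark.network.util.JavaUtils

object TimeStringCheck {
  def main(args: Array[String]): Unit = {
    // The exact call from the stack trace above. "120s" is a legitimate time
    // string and normally parses to 120 seconds, which is why the reported
    // NumberFormatException is unexpected.
    println(JavaUtils.timeStringAsSec("120s")) // expected output: 120
  }
}
{code}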

[jira] [Resolved] (SPARK-47682) Support cast from variant.

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47682.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45807
[https://github.com/apache/spark/pull/45807]

> Support cast from variant.
> --
>
> Key: SPARK-47682
> URL: https://issues.apache.org/jira/browse/SPARK-47682
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Fix Version/s: 3.5.2
   3.5.1
Affects Version/s: 3.5.1
   (was: 4.0.0)

> Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate 
> time string
> -
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected 
> stack trace when reading/parsing a legitimate time string. Note that we 
> manually killed the stuck app instances and the retry goes thru on the same 
> cluster (without requiring any app code change).
>  
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
> at 
> 

[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a legitimate time string. Note that we 
manually killed the stuck app instances and the retry goes thru on the same 
cluster (without requiring any app code change).

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 

[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Summary: Apps being stuck after JavaUtils.timeStringAs fails to parse a 
legitimate time string  (was: Apps being stuck with an unexpected stack trace 
when reading/parsing a legitimate time string)

> Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate 
> time string
> -
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected 
> stack trace when reading/parsing a legitimate time string. Note that we 
> manually killed the stuck app instances and the retry goes thru on the same 
> cluster (without requiring any app code change).
>  
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
>  

[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a legitimate time string. Note that we 
manually killed the stuck app instances and the retry goes thru on the same 
cluster (without requiring any app code change).

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 

[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a legitimate time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Summary: Apps being stuck with an unexpected stack trace when 
reading/parsing a legitimate time string  (was: Apps being stuck with an 
unexpected stack trace when reading/parsing a time string)

> Apps being stuck with an unexpected stack trace when reading/parsing a 
> legitimate time string
> -
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally get stuck with an unexpected 
> stack trace when reading/parsing a legitimate time string. Note that we 
> manually killed the stuck app instances and the retry went through on the same 
> cluster (without requiring any app code change).
>  
> *[Stack Trace 1]* The stack trace doesn't make sense, since *120s* is a 
> legitimate time string; the app runs on emr-7.0.0 with the Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> 

[jira] [Resolved] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-47770.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45938
[https://github.com/apache/spark/pull/45938]

> Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of 
> failing
> --
>
> Key: SPARK-47770
> URL: https://issues.apache.org/jira/browse/SPARK-47770
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47772) Fix the doctest of mode function

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47772:
---
Labels: pull-request-available  (was: )

> Fix the doctest of mode function
> 
>
> Key: SPARK-47772
> URL: https://issues.apache.org/jira/browse/SPARK-47772
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47771) Make max_by, min_by doctests deterministic

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47771:
---
Labels: pull-request-available  (was: )

> Make max_by, min_by doctests deterministic
> --
>
> Key: SPARK-47771
> URL: https://issues.apache.org/jira/browse/SPARK-47771
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47771) Make max_by, min_by doctests deterministic

2024-04-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47771:
-

 Summary: Make max_by, min_by doctests deterministic
 Key: SPARK-47771
 URL: https://issues.apache.org/jira/browse/SPARK-47771
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47770:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Bug)

> Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of 
> failing
> --
>
> Key: SPARK-47770
> URL: https://issues.apache.org/jira/browse/SPARK-47770
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47770:
-

Assignee: Dongjoon Hyun

> Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of 
> failing
> --
>
> Key: SPARK-47770
> URL: https://issues.apache.org/jira/browse/SPARK-47770
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47770:
---
Labels: pull-request-available  (was: )

> Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of 
> failing
> --
>
> Key: SPARK-47770
> URL: https://issues.apache.org/jira/browse/SPARK-47770
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing

2024-04-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47770:
-

 Summary: Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return 
false instead of failing
 Key: SPARK-47770
 URL: https://issues.apache.org/jira/browse/SPARK-47770
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47589) Hive-thriftserver: Migrate logError with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47589.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45936
[https://github.com/apache/spark/pull/45936]

> Hive-thriftserver: Migrate logError with variables to structured logging 
> framework
> --
>
> Key: SPARK-47589
> URL: https://issues.apache.org/jira/browse/SPARK-47589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835080#comment-17835080
 ] 

Gengliang Wang commented on SPARK-47588:


I am working on this one

> Hive module: Migrate logInfo with variables to structured logging framework
> ---
>
> Key: SPARK-47588
> URL: https://issues.apache.org/jira/browse/SPARK-47588
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47587) Hive module: Migrate logWarn with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-47587:
--

Assignee: BingKun Pan

> Hive module: Migrate logWarn with variables to structured logging framework
> ---
>
> Key: SPARK-47587
> URL: https://issues.apache.org/jira/browse/SPARK-47587
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47586) Hive module: Migrate logError with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-47586:
--

Assignee: Haejoon Lee

> Hive module: Migrate logError with variables to structured logging framework
> 
>
> Key: SPARK-47586
> URL: https://issues.apache.org/jira/browse/SPARK-47586
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework

2024-04-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-47591:
--

Assignee: Haejoon Lee

> Hive-thriftserver: Migrate logInfo with variables to structured logging 
> framework
> -
>
> Key: SPARK-47591
> URL: https://issues.apache.org/jira/browse/SPARK-47591
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47589) Hive-thriftserver: Migrate logError with variables to structured logging framework

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47589:
---
Labels: pull-request-available  (was: )

> Hive-thriftserver: Migrate logError with variables to structured logging 
> framework
> --
>
> Key: SPARK-47589
> URL: https://issues.apache.org/jira/browse/SPARK-47589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47769) Add schema_of_variant_agg expression.

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47769:
---
Labels: pull-request-available  (was: )

> Add schema_of_variant_agg expression.
> -
>
> Key: SPARK-47769
> URL: https://issues.apache.org/jira/browse/SPARK-47769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47769) Add schema_of_variant_agg expression.

2024-04-08 Thread Chenhao Li (Jira)
Chenhao Li created SPARK-47769:
--

 Summary: Add schema_of_variant_agg expression.
 Key: SPARK-47769
 URL: https://issues.apache.org/jira/browse/SPARK-47769
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Chenhao Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47417) Ascii, Chr, Base64, UnBase64 (all collations)

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47417:
---
Labels: pull-request-available  (was: )

> Ascii, Chr, Base64, UnBase64 (all collations)
> -
>
> Key: SPARK-47417
> URL: https://issues.apache.org/jira/browse/SPARK-47417
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47410) refactor UTF8String and CollationFactory

2024-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47410:
-
Summary: refactor UTF8String and CollationFactory  (was: Refactor 
UTF8String and CollationFactory)

> refactor UTF8String and CollationFactory
> 
>
> Key: SPARK-47410
> URL: https://issues.apache.org/jira/browse/SPARK-47410
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47410) Refactor UTF8String and CollationFactory

2024-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47410:
-
Summary: Refactor UTF8String and CollationFactory  (was: StringTrimLeft, 
StringTrimRight (all collations))

> Refactor UTF8String and CollationFactory
> 
>
> Key: SPARK-47410
> URL: https://issues.apache.org/jira/browse/SPARK-47410
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47737) Bump PyArrow to 10.0.0

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47737.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45892
[https://github.com/apache/spark/pull/45892]

> Bump PyArrow to 10.0.0
> --
>
> Key: SPARK-47737
> URL: https://issues.apache.org/jira/browse/SPARK-47737
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> For richer API support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47737) Bump PyArrow to 10.0.0

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47737:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Bug)

> Bump PyArrow to 10.0.0
> --
>
> Key: SPARK-47737
> URL: https://issues.apache.org/jira/browse/SPARK-47737
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> For richer API support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47737) Bump PyArrow to 10.0.0

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47737:
-

Assignee: Haejoon Lee

> Bump PyArrow to 10.0.0
> --
>
> Key: SPARK-47737
> URL: https://issues.apache.org/jira/browse/SPARK-47737
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> For richer API support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47725) Set up the CI for pyspark-connect package

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47725:
-

Assignee: Hyukjin Kwon

> Set up the CI for pyspark-connect package
> -
>
> Key: SPARK-47725
> URL: https://issues.apache.org/jira/browse/SPARK-47725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47725) Set up the CI for pyspark-connect package

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47725.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45870
[https://github.com/apache/spark/pull/45870]

> Set up the CI for pyspark-connect package
> -
>
> Key: SPARK-47725
> URL: https://issues.apache.org/jira/browse/SPARK-47725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47767:
---
Labels: pull-request-available  (was: )

> Show offset value in TakeOrderedAndProjectExec
> --
>
> Key: SPARK-47767
> URL: https://issues.apache.org/jira/browse/SPARK-47767
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: guihuawen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Show the offset value in TakeOrderedAndProjectExec.
>  
> For example:
>  
> explain select * from test_limit_offset order by a  limit 2  offset 1;
> plan
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
> HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition 
> Cols: []]
>  
> No offset is displayed. Showing it would be more user-friendly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec

2024-04-08 Thread guihuawen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guihuawen updated SPARK-47767:
--
Description: 
Show the offset value in TakeOrderedAndProjectExec.

 

For example:

 

explain select * from test_limit_offset order by a  limit 2  offset 1;

plan

== Physical Plan ==

TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])

+- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, 
org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition 
Cols: []]

 

No offset is displayed. Showing it would be more user-friendly.

 

  was:
Show the offset value in TakeOrderedAndProjectExec.

 

For example:

 

explain select * from test_limit_offset order by a  limit 2  offset 1;

plan

== Physical Plan ==

TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])

+- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
HiveTableRelation [`spark_catalog`.`bigdata_qa`.`test_limit_offset`, 
org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: 
[]]

 

No offset is displayed. If it is displayed, it will be more user-friendly

 


> Show offset value in TakeOrderedAndProjectExec
> --
>
> Key: SPARK-47767
> URL: https://issues.apache.org/jira/browse/SPARK-47767
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: guihuawen
>Priority: Major
> Fix For: 4.0.0
>
>
> Show the offset value in TakeOrderedAndProjectExec.
>  
> For example:
>  
> explain select * from test_limit_offset order by a  limit 2  offset 1;
> plan
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
> HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition 
> Cols: []]
>  
> No offset is displayed. Showing it would be more user-friendly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47768) Data Source names unavailable when using Delta Share and Kafka SQL

2024-04-08 Thread David Perkins (Jira)
David Perkins created SPARK-47768:
-

 Summary: Data Source names unavailable when using Delta Share and 
Kafka SQL
 Key: SPARK-47768
 URL: https://issues.apache.org/jira/browse/SPARK-47768
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 3.5.1
 Environment: I'm using Spark 3.5.1 on Kubernetes with the Spark 
operator.

My project includes these dependencies:
implementation 'org.apache.spark:spark-core_2.12:3.5.1'
implementation 'org.apache.spark:spark-sql_2.12:3.5.1'
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.17.0'
sparkConnectorShadowJar 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1'
sparkConnectorShadowJar 'io.delta:delta-sharing-spark_2.12:3.1.0'
 
The `sparkConnectorShadowJar` is packaged into a shadow jar and copied onto the 
'apache/spark:3.5.1' docker image.
Reporter: David Perkins


I have a simple Spark application that reads from a CSV file via Delta 
Share and writes the contents to Kafka. When both the Delta Share and Kafka SQL 
libraries are included in the project, Spark is unable to load them by their 
format short names.

If I use one of them without the other, everything works fine. When both are 
included, then I get this root exception: ClassNotFoundException: 
deltaSharing.DefaultSource.

If I specify the source class names (
io.delta.sharing.spark.DeltaSharingDataSource, 
org.apache.spark.sql.kafka010.KafkaSourceProvider) instead of the short names, 
it works correctly.
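
A minimal sketch of the workaround mentioned above, spelling out the fully qualified provider class names instead of the short names (the share profile path, bootstrap servers, and topic are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read via Delta Sharing using the full class name instead of "deltasharing".
df = (spark.read
      .format("io.delta.sharing.spark.DeltaSharingDataSource")
      .load("/path/to/profile.share#share.schema.table"))

# Write to Kafka using the full class name instead of "kafka".
(df.selectExpr("to_json(struct(*)) AS value")
   .write
   .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
   .option("kafka.bootstrap.servers", "localhost:9092")
   .option("topic", "sink")
   .save())
```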
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec

2024-04-08 Thread guihuawen (Jira)
guihuawen created SPARK-47767:
-

 Summary: Show offset value in TakeOrderedAndProjectExec
 Key: SPARK-47767
 URL: https://issues.apache.org/jira/browse/SPARK-47767
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0, 3.4.0, 4.0.0
Reporter: guihuawen
 Fix For: 4.0.0


Show the offset value in TakeOrderedAndProjectExec.

 

For example:

 

explain select * from test_limit_offset order by a  limit 2  offset 1;

plan

== Physical Plan ==

TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])

+- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
HiveTableRelation [`spark_catalog`.`bigdata_qa`.`test_limit_offset`, 
org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: 
[]]

 

No offset is displayed. Showing it would be more user-friendly.
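
As a quick way to reproduce the plan above on a temp view (illustrative only; the exact plan text depends on the Spark version, and with the proposed change the TakeOrderedAndProject node would presumably also show offset=1):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(100).createOrReplaceTempView("test_limit_offset")

# Prints the physical plan containing a TakeOrderedAndProject node.
spark.sql(
    "SELECT * FROM test_limit_offset ORDER BY id LIMIT 2 OFFSET 1"
).explain()
```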

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0

2024-04-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-47766:

Description: 
We have some HIGH CVEs coming from hadoop-client-runtime 3.3.4 
(org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) and need to address them:

com.fasterxml.jackson.core:jackson-databind causing *CVE-2022-42003* and *CVE-2022-42004*

com.google.protobuf:protobuf-java causing *CVE-2021-22569*, *CVE-2021-22570*, *CVE-2022-3509* and *CVE-2022-3510*

net.minidev:json-smart causing *CVE-2021-31684* and *CVE-2023-1370*

org.apache.avro:avro causing *CVE-2023-39410*

org.apache.commons:commons-compress causing *CVE-2024-25710* and *CVE-2024-26308*

Most of these have been addressed in hadoop-client-runtime 3.4.0.

Is there a plan to support Hadoop 3.4.0?

  was:
I have a data pipeline set up in such a way that it reads data from a Kafka 
source, does some transformation on the data using pyspark, then writes the 
output into a sink (Kafka, Redis, etc).

 

My entire pipeline in written in SQL, so I wish to use the .sql() method to 
execute SQL on my streaming source directly.

 

However, I'm running into the issue where my watermark is not being recognized 
by the downstream query via the .sql() method.

 

```
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 
16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> print(pyspark.__version__)
3.5.1
>>> from pyspark.sql import SparkSession
>>>
>>> session = SparkSession.builder \
...     .config("spark.jars.packages", 
"org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\
...     .getOrCreate()
>>> from pyspark.sql.functions import col, from_json
>>> from pyspark.sql.types import StructField, StructType, TimestampType, 
>>> LongType, DoubleType, IntegerType
>>> schema = StructType(
...     [
...         StructField('createTime', TimestampType(), True),
...         StructField('orderId', LongType(), True),
...         StructField('payAmount', DoubleType(), True),
...         StructField('payPlatform', IntegerType(), True),
...         StructField('provinceId', IntegerType(), True),
...     ])
>>>
>>> streaming_df = session.readStream\
...     .format("kafka")\
...     .option("kafka.bootstrap.servers", "localhost:9092")\
...     .option("subscribe", "payment_msg")\
...     .option("startingOffsets","earliest")\
...     .load()\
...     .select(from_json(col("value").cast("string"), 
schema).alias("parsed_value"))\
...     .select("parsed_value.*")\
...     .withWatermark("createTime", "10 seconds")
>>>
>>> streaming_df.createOrReplaceTempView("streaming_df")
>>> session.sql("""
... SELECT
...     window.start, window.end, provinceId, sum(payAmount) as totalPayAmount
...     FROM streaming_df
...     GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')
...     ORDER BY window.start
... """)\
...   .writeStream\
...   .format("kafka") \
...   .option("checkpointLocation", "checkpoint") \
...   .option("kafka.bootstrap.servers", "localhost:9092") \
...   .option("topic", "sink") \
...   .start()
```
 
This throws exception
```
pyspark.errors.exceptions.captured.AnalysisException: Append output mode not 
supported when there are streaming aggregations on streaming 
DataFrames/DataSets without watermark; line 6 pos 4;
```
 

 


> Extend spark 3.5.1 to support hadoop-client-api 3.4.0, 
> hadoop-client-runtime-3.4.0
> --
>
> Key: SPARK-47766
> URL: https://issues.apache.org/jira/browse/SPARK-47766
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Ramakrishna
>Priority: Blocker
>  Labels: pull-request-available
>
> We have some HIGH CVEs which are coming from hadoop-client-runtime 3.3.4 and 
> hence we need to address those
>  
> com.fasterxml.jackson.core:jackson-databind              causing    
> *CVE-2022-42003* and *CVE-2022-42004*
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)
>  
>  
> com.google.protobuf:protobuf-java      
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)  causing 
> *CVE-2021-22569,* *CVE-2021-22570,* *CVE-2022-3509* and *CVE-2022-3510*
>  
> net.minidev:json-smart                                                        
>  causing *CVE-2021-31684,* *CVE-2023-1370*
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)  
>  
>  
> org.apache.avro:avro 
> 

[jira] [Created] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0

2024-04-08 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-47766:
---

 Summary: Extend spark 3.5.1 to support hadoop-client-api 3.4.0, 
hadoop-client-runtime-3.4.0
 Key: SPARK-47766
 URL: https://issues.apache.org/jira/browse/SPARK-47766
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.1
Reporter: Ramakrishna


I have a data pipeline set up in such a way that it reads data from a Kafka 
source, does some transformation on the data using pyspark, then writes the 
output into a sink (Kafka, Redis, etc).

 

My entire pipeline is written in SQL, so I wish to use the .sql() method to 
execute SQL on my streaming source directly.

 

However, I'm running into the issue where my watermark is not being recognized 
by the downstream query via the .sql() method.

 

```
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 
16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> print(pyspark.__version__)
3.5.1
>>> from pyspark.sql import SparkSession
>>>
>>> session = SparkSession.builder \
...     .config("spark.jars.packages", 
"org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\
...     .getOrCreate()
>>> from pyspark.sql.functions import col, from_json
>>> from pyspark.sql.types import StructField, StructType, TimestampType, 
>>> LongType, DoubleType, IntegerType
>>> schema = StructType(
...     [
...         StructField('createTime', TimestampType(), True),
...         StructField('orderId', LongType(), True),
...         StructField('payAmount', DoubleType(), True),
...         StructField('payPlatform', IntegerType(), True),
...         StructField('provinceId', IntegerType(), True),
...     ])
>>>
>>> streaming_df = session.readStream\
...     .format("kafka")\
...     .option("kafka.bootstrap.servers", "localhost:9092")\
...     .option("subscribe", "payment_msg")\
...     .option("startingOffsets","earliest")\
...     .load()\
...     .select(from_json(col("value").cast("string"), 
schema).alias("parsed_value"))\
...     .select("parsed_value.*")\
...     .withWatermark("createTime", "10 seconds")
>>>
>>> streaming_df.createOrReplaceTempView("streaming_df")
>>> session.sql("""
... SELECT
...     window.start, window.end, provinceId, sum(payAmount) as totalPayAmount
...     FROM streaming_df
...     GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')
...     ORDER BY window.start
... """)\
...   .writeStream\
...   .format("kafka") \
...   .option("checkpointLocation", "checkpoint") \
...   .option("kafka.bootstrap.servers", "localhost:9092") \
...   .option("topic", "sink") \
...   .start()
```
 
This throws exception
```
pyspark.errors.exceptions.captured.AnalysisException: Append output mode not 
supported when there are streaming aggregations on streaming 
DataFrames/DataSets without watermark; line 6 pos 4;
```
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47318) AuthEngine key exchange needs additional KDF round

2024-04-08 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834987#comment-17834987
 ] 

Dongjoon Hyun commented on SPARK-47318:
---

I added a target version (3.4.3) based on the dev mailing list discussion.

- https://lists.apache.org/thread/htq3hwfyh6kg28d8bq2n3v60fpn7s375

>  AuthEngine key exchange needs additional KDF round
> ---
>
> Key: SPARK-47318
> URL: https://issues.apache.org/jira/browse/SPARK-47318
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 4.0.0
>Reporter: Steve Weis
>Priority: Minor
>  Labels: pull-request-available
>
> AuthEngine implements a bespoke key exchange protocol 
> (https://github.com/apache/spark/tree/master/common/network-common/src/main/java/org/apache/spark/network/crypto) 
> based on the NNpsk0 Noise pattern and using X25519.
> The Spark code improperly uses the derived shared secret directly, which is 
> an encoded X coordinate. This should be passed into a KDF rather than used 
> directly.
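
For illustration only (this is not Spark's AuthEngine code), the pattern the ticket asks for can be sketched in Python with the `cryptography` package: run the X25519 exchange output through a KDF such as HKDF instead of using the raw X coordinate as key material. The `info` label below is a placeholder:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each side holds an X25519 key pair and exchanges public keys.
client_priv = X25519PrivateKey.generate()
server_priv = X25519PrivateKey.generate()

# The raw exchange output is just an encoded X coordinate...
raw_secret = client_priv.exchange(server_priv.public_key())

# ...so derive the actual session key through a KDF rather than using it directly.
session_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"spark-auth-engine",  # placeholder context/label
).derive(raw_secret)
```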



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47318) AuthEngine key exchange needs additional KDF round

2024-04-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47318:
--
Target Version/s: 3.4.3

>  AuthEngine key exchange needs additional KDF round
> ---
>
> Key: SPARK-47318
> URL: https://issues.apache.org/jira/browse/SPARK-47318
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 4.0.0
>Reporter: Steve Weis
>Priority: Minor
>  Labels: pull-request-available
>
> AuthEngine implements a bespoke key exchange protocol 
> (https://github.com/apache/spark/tree/master/common/network-common/src/main/java/org/apache/spark/network/crypto) 
> based on the NNpsk0 Noise pattern and using X25519.
> The Spark code improperly uses the derived shared secret directly, which is 
> an encoded X coordinate. This should be passed into a KDF rather than used 
> directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47504) Resolve AbstractDataType simpleStrings for StringTypeCollated

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47504.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45694
[https://github.com/apache/spark/pull/45694]

> Resolve AbstractDataType simpleStrings for StringTypeCollated
> -
>
> Key: SPARK-47504
> URL: https://issues.apache.org/jira/browse/SPARK-47504
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *SPARK-47296* introduced a change to fail all unsupported functions. Because 
> of this change, the expected *inputTypes* in *ExpectsInputTypes* had to be 
> changed. This introduced a user-facing change that prints 
> *"STRING_ANY_COLLATION"* in places where we previously printed *"STRING"* when an 
> error occurred. Concretely, if we get an Int input where 
> *StringTypeAnyCollation* was expected, we throw this faulty message to 
> users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47681) Add schema_of_variant expression.

2024-04-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47681.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45806
[https://github.com/apache/spark/pull/45806]

> Add schema_of_variant expression.
> -
>
> Key: SPARK-47681
> URL: https://issues.apache.org/jira/browse/SPARK-47681
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47765) Add SET COLLATION to parser rules

2024-04-08 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-47765:
-

 Summary: Add SET COLLATION to parser rules
 Key: SPARK-47765
 URL: https://issues.apache.org/jira/browse/SPARK-47765
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47764:
---
Labels: pull-request-available  (was: )

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Shuffle dependencies are created by shuffle map stages and consist of 
> files on disk plus the corresponding references in Spark JVM heap memory. 
> Currently Spark cleans up unused shuffle dependencies through JVM GCs, and 
> periodic GCs are triggered once every 30 minutes (see ContextCleaner). 
> However, we still found cases in which the shuffle data files are 
> too large, which makes shuffle data migration slow.
>  
> We do have chances to clean up shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused on the 
> client side, on the server side the instances are still recreated. 
>  
> We might also provide the option to (1) clean up eagerly after each query 
> execution, or (2) only mark the shuffle dependencies and not migrate them at 
> node decommission.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-08 Thread Bo Zhang (Jira)
Bo Zhang created SPARK-47764:


 Summary: Cleanup shuffle dependencies for Spark Connect SQL 
executions
 Key: SPARK-47764
 URL: https://issues.apache.org/jira/browse/SPARK-47764
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Bo Zhang


Shuffle dependencies are created by shuffle map stages and consist of files 
on disk plus the corresponding references in Spark JVM heap memory. Currently 
Spark cleans up unused shuffle dependencies through JVM GCs, and periodic GCs 
are triggered once every 30 minutes (see ContextCleaner). However, we still 
found cases in which the shuffle data files are too large, which makes shuffle 
data migration slow.

 

We do have chances to clean up shuffle dependencies, especially for SQL queries 
created by Spark Connect, since we have better control of the DataFrame 
instances there. Even if DataFrame instances are reused on the client side, on 
the server side the instances are still recreated. 

 

We might also provide the option to (1) clean up eagerly after each query 
execution, or (2) only mark the shuffle dependencies and not migrate them at 
node decommission.
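
Until something like that exists, one lever that is already available is the ContextCleaner's periodic GC interval. A minimal sketch (a workaround under the current behavior, not the eager cleanup proposed here):

```python
from pyspark.sql import SparkSession

# Lower the periodic GC interval (default 30min) so unreferenced shuffle
# dependencies are cleaned up sooner by the ContextCleaner.
spark = (SparkSession.builder
         .config("spark.cleaner.periodicGC.interval", "5min")
         .getOrCreate())

df = spark.range(10_000_000).repartition(200)  # creates a shuffle dependency
df.count()

# Once `df` (and its RDD lineage) becomes unreachable and a JVM GC runs,
# the ContextCleaner removes the shuffle files it produced.
del df
```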



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47286) IN operator support

2024-04-08 Thread Aleksandar Tomic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandar Tomic resolved SPARK-47286.
--
   Fix Version/s: 4.0.0
Target Version/s: 4.0.0
  Resolution: Fixed

> IN operator support
> ---
>
> Key: SPARK-47286
> URL: https://issues.apache.org/jira/browse/SPARK-47286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
> Fix For: 4.0.0
>
>
> At this point the following query works fine:
> ```
>  sql("select * from t1 where ucs_basic_lcase in ('aaa' collate 
> 'ucs_basic_lcase', 'bbb' collate 'ucs_basic_lcase')").show()
> ```
> But if we were to omit the explicit collate or even mix collations:
> ```
>   sql("select * from t1 where ucs_basic_lcase in ('aaa' collate 
> 'ucs_basic_lcase', 'bbb')").show()
> ```
> the query would still run and return invalid results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47763) Reenable Protobuf function doctests

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47763:


 Summary: Reenable Protobuf function doctests
 Key: SPARK-47763
 URL: https://issues.apache.org/jira/browse/SPARK-47763
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47746) Use column ordinals instead of prefix ordering columns in the range scan encoder

2024-04-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-47746.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45905
[https://github.com/apache/spark/pull/45905]

> Use column ordinals instead of prefix ordering columns in the range scan 
> encoder
> 
>
> Key: SPARK-47746
> URL: https://issues.apache.org/jira/browse/SPARK-47746
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Neil Ramaswamy
>Assignee: Neil Ramaswamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, the State V2 implementations do projections in their state 
> managers, and then provide some prefix (ordering) columns to the 
> RocksDBStateEncoder. However, we can avoid doing extra projection by just 
> reading the ordinals we need, in the order we need, in the state encoder.
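
A toy sketch of the idea (illustrative only, not the actual RocksDBStateEncoder API): instead of first projecting a row so the ordering columns come first and then encoding it, the encoder is handed the ordinals of the ordering columns and reads them directly, in the required order:

```python
# Hypothetical row layout: (key, value, event_time)
row = ("user-1", 42, 1712534400)
ordering_ordinals = [2, 0]  # range-scan prefix: event_time first, then key

def range_scan_prefix(row, ordinals):
    # Read the needed fields in the needed order; no intermediate projection.
    return tuple(row[i] for i in ordinals)

print(range_scan_prefix(row, ordering_ordinals))  # (1712534400, 'user-1')
```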



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47587) Hive module: Migrate logWarn with variables to structured logging framework

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47587:
---
Labels: pull-request-available  (was: )

> Hive module: Migrate logWarn with variables to structured logging framework
> ---
>
> Key: SPARK-47587
> URL: https://issues.apache.org/jira/browse/SPARK-47587
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47591:
---
Labels: pull-request-available  (was: )

> Hive-thriftserver: Migrate logInfo with variables to structured logging 
> framework
> -
>
> Key: SPARK-47591
> URL: https://issues.apache.org/jira/browse/SPARK-47591
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes

2024-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47761:
---
Labels: pull-request-available  (was: )

> Oracle: Support reading AnsiIntervalTypes
> -
>
> Key: SPARK-47761
> URL: https://issues.apache.org/jira/browse/SPARK-47761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47762.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45924
[https://github.com/apache/spark/pull/45924]

> Add pyspark.sql.connect.protobuf into setup.py
> --
>
> Key: SPARK-47762
> URL: https://issues.apache.org/jira/browse/SPARK-47762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should add them. They are missing from the PyPI package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47762:
-
Fix Version/s: 3.5.2

> Add pyspark.sql.connect.protobuf into setup.py
> --
>
> Key: SPARK-47762
> URL: https://issues.apache.org/jira/browse/SPARK-47762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> We should add them. They are missing from the PyPI package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework

2024-04-08 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834828#comment-17834828
 ] 

Haejoon Lee commented on SPARK-47591:
-

I'm working on this :) 

> Hive-thriftserver: Migrate logInfo with variables to structured logging 
> framework
> -
>
> Key: SPARK-47591
> URL: https://issues.apache.org/jira/browse/SPARK-47591
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
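
For anyone picking this up, a rough sketch of the kind of rewrite these migration tickets ask for, based on the pattern used elsewhere in the structured logging work (the imports and key names are illustrative; use whatever key object the framework actually defines):

```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKeys._  // illustrative; key constants live in the framework's key object

class SessionReporter extends Logging {
  def report(sessionId: String, opCount: Int): Unit = {
    // before: logInfo(s"Session $sessionId has $opCount active operations")
    // after: each variable is wrapped in MDC so it becomes a structured field
    logInfo(log"Session ${MDC(SESSION_ID, sessionId)} has " +
      log"${MDC(COUNT, opCount)} active operations")
  }
}
```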




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47762:


 Summary: Add pyspark.sql.connect.protobuf into setup.py
 Key: SPARK-47762
 URL: https://issues.apache.org/jira/browse/SPARK-47762
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 3.5.1, 4.0.0
Reporter: Hyukjin Kwon


We should add them. They are missing from the PyPI package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes

2024-04-08 Thread Kent Yao (Jira)
Kent Yao created SPARK-47761:


 Summary: Oracle: Support reading AnsiIntervalTypes
 Key: SPARK-47761
 URL: https://issues.apache.org/jira/browse/SPARK-47761
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834804#comment-17834804
 ] 

Uroš Bojanić commented on SPARK-47413:
--

[~gpgp] Thank you, of course! Take a look at 
[SPARK-47412|https://issues.apache.org/jira/browse/SPARK-47412] and let me know 
what you think

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what is the 
> expected behaviour for these functions when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
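
As a quick illustration of the end state this ticket aims for (the collation name and COLLATE syntax mirror the examples used elsewhere in this digest and may differ in the final release): the calls should accept collated inputs, and the results should keep the input's collation so that later comparisons remain collation-aware.

```
// Illustrative only: substring/right/left over a collated string.
sql("SELECT substring('SparkSQL' COLLATE 'ucs_basic_lcase', 1, 5)").show()
sql("SELECT right('SparkSQL' COLLATE 'ucs_basic_lcase', 3), left('SparkSQL' COLLATE 'ucs_basic_lcase', 5)").show()
```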



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834803#comment-17834803
 ] 

Uroš Bojanić commented on SPARK-47412:
--

[~gpgp] Thank you for your hard work on 
[SPARK-47413|https://issues.apache.org/jira/browse/SPARK-47413]! We'll put your 
[PR|https://github.com/apache/spark/pull/45738/] under final review, so feel 
free to move on to this ticket. This one should be relatively simple as well, 
and you've also got some experience under your belt already. Nevertheless, feel 
free to let me know if you have any questions!

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what is the expected behaviour for these 
> functions when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
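
Analogous to the other collation sub-tasks, a small illustration of the intended usage (collation name and syntax mirror the examples elsewhere in this digest; the exact names may differ): lpad/rpad should accept collated inputs and return results that carry the input's collation.

```
// Illustrative only: lpad/rpad over a collated string.
sql("SELECT lpad('abc' COLLATE 'ucs_basic_lcase', 5, '*')").show()
sql("SELECT rpad('abc' COLLATE 'ucs_basic_lcase', 5, '*')").show()
```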



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46143) pyspark.pandas read_excel implementation at version 3.4.1

2024-04-08 Thread comet (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834802#comment-17834802
 ] 

comet commented on SPARK-46143:
---

voted for this issue.

> pyspark.pandas read_excel implementation at version 3.4.1
> -
>
> Key: SPARK-46143
> URL: https://issues.apache.org/jira/browse/SPARK-46143
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.1
> Environment: pyspark 3.4.1.5.3 build 20230713.
> Running on Microsoft Fabric workspace at runtime 1.2.
> Tested the same scenario on a spark 3.4.1 standalone deployment on docker 
> documented at https://github.com/mpavanetti/sparkenv
>  
>  
>Reporter: Matheus Pavanetti
>Priority: Major
> Attachments: MicrosoftTeams-image.png, 
> image-2023-11-28-13-20-40-275.png, image-2023-11-28-13-20-51-291.png
>
>
> Hello, 
> I would like to report an issue with the pyspark.pandas implementation of the 
> read_excel function.
> The Microsoft Fabric Spark environment 1.2 (runtime) uses pyspark 3.4.1, which 
> likely bundles an older version of pandas in its pyspark.pandas 
> implementation.
> The pandas read_excel function does not expect a parameter called 
> "squeeze"; however, pyspark.pandas still implements it and passes the 
> "squeeze" parameter on to the pandas function.
>  
> !image-2023-11-28-13-20-40-275.png!
>  
> I've been digging into it for further investigation into pyspark 3.4.1 
> documentation
> [https://spark.apache.org/docs/3.4.1/api/python/_modules/pyspark/pandas/namespace.html#read_excel|https://mcas-proxyweb.mcas.ms/certificate-checker?login=false=https%3A%2F%2Fspark.apache.org.mcas.ms%2Fdocs%2F3.4.1%2Fapi%2Fpython%2F_modules%2Fpyspark%2Fpandas%2Fnamespace.html%3FMcasTsid%3D20893%23read_excel=92c0f0a0811f59386edd92fd5f3fcb0ac451ce363b3f2e01ed076f45e2b20500]
>  
> This is where I found that the "squeeze" parameter is being passed to the pandas 
> read_excel function, which does not expect it.
> It seems the parameter was deprecated as part of pyspark 3.4.0 but is still used 
> in the implementation.
>  
> !image-2023-11-28-13-20-51-291.png!
>  
> I believe this is an issue with the pyspark 3.4.1 implementation, not necessarily 
> with Fabric. However, Fabric uses this version in its 1.2 build.
>  
> For now I can work around it by downloading the Excel file from OneLake 
> to the Spark driver, loading it into memory with pandas, and then 
> converting it to a Spark DataFrame. I also made it work by downgrading the build: 
> I downloaded the pyspark build 20230713 locally, made the changes, 
> re-compiled it, and it worked. So the issue is in the pyspark 
> implementation itself; either it gets fixed there, or I downgrade to an older 
> version like 3.3.3 or try the latest 3.5.0, which is not an option on Fabric.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47412:
-
Description: 
Enable collation support for the *StringLPad* & *StringRPad* built-in string 
functions in Spark. First confirm what is the expected behaviour for these 
functions when given collated strings, then move on to the implementation that 
would enable handling strings of all collation types. Implement the 
corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
(CollationSuite) to reflect how this function should be used with collation in 
SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with 
the existing functions to learn more about how they work. In addition, look 
into the possible use-cases and implementation of similar functions within 
other open-source DBMS, such as 
[PostgreSQL|https://www.postgresql.org/docs/].

 

The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
functions so that they support all collation types currently supported in 
Spark. To understand what changes were introduced in order to enable full 
collation support for other existing functions in Spark, take a look at the 
Spark PRs and Jira tickets for completed tasks in this parent (for example: 
Contains, StartsWith, EndsWith).

 

Read more about ICU [Collation Concepts|http://example.com/] and 
[Collator|http://example.com/] class. Also, refer to the Unicode Technical 
Standard for 
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].

  was:
Enable collation support for the *Substring* built-in string function in Spark 
(including *Right* and *Left* functions). First confirm what is the expected 
behaviour for these functions when given collated strings, then move on to the 
implementation that would enable handling strings of all collation types. 
Implement the corresponding unit tests (CollationStringExpressionsSuite) and 
E2E tests (CollationSuite) to reflect how this function should be used with 
collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to 
experiment with the existing functions to learn more about how they work. In 
addition, look into the possible use-cases and implementation of similar 
functions within other open-source DBMS, such as 
[PostgreSQL|https://www.postgresql.org/docs/].

 

The goal for this Jira ticket is to implement the {*}Substring{*}, {*}Right{*}, 
and *Left* functions so that they support all collation types currently 
supported in Spark. To understand what changes were introduced in order to 
enable full collation support for other existing functions in Spark, take a 
look at the Spark PRs and Jira tickets for completed tasks in this parent (for 
example: Contains, StartsWith, EndsWith).

 

Read more about ICU [Collation Concepts|http://example.com/] and 
[Collator|http://example.com/] class. Also, refer to the Unicode Technical 
Standard for 
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].


> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *StringLPad* & *StringRPad* built-in string 
> functions in Spark. First confirm what is the expected behaviour for these 
> functions when given collated strings, then move on to the implementation 
> that would enable handling strings of all collation types. Implement the 
> corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* 
> functions so that they support all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> 

[jira] [Updated] (SPARK-47412) StringLPad, StringRPad (all collations)

2024-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47412:
-
Summary: StringLPad, StringRPad (all collations)  (was: StringLPad, 
BinaryPad, StringRPad (all collations))

> StringLPad, StringRPad (all collations)
> ---
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what is the 
> expected behaviour for these functions when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47412) StringLPad, BinaryPad, StringRPad (all collations)

2024-04-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47412:
-
Description: 
Enable collation support for the *Substring* built-in string function in Spark 
(including *Right* and *Left* functions). First confirm what is the expected 
behaviour for these functions when given collated strings, then move on to the 
implementation that would enable handling strings of all collation types. 
Implement the corresponding unit tests (CollationStringExpressionsSuite) and 
E2E tests (CollationSuite) to reflect how this function should be used with 
collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to 
experiment with the existing functions to learn more about how they work. In 
addition, look into the possible use-cases and implementation of similar 
functions within other open-source DBMS, such as 
[PostgreSQL|https://www.postgresql.org/docs/].

 

The goal for this Jira ticket is to implement the {*}Substring{*}, {*}Right{*}, 
and *Left* functions so that they support all collation types currently 
supported in Spark. To understand what changes were introduced in order to 
enable full collation support for other existing functions in Spark, take a 
look at the Spark PRs and Jira tickets for completed tasks in this parent (for 
example: Contains, StartsWith, EndsWith).

 

Read more about ICU [Collation Concepts|http://example.com/] and 
[Collator|http://example.com/] class. Also, refer to the Unicode Technical 
Standard for 
[collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].

> StringLPad, BinaryPad, StringRPad (all collations)
> --
>
> Key: SPARK-47412
> URL: https://issues.apache.org/jira/browse/SPARK-47412
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what is the 
> expected behaviour for these functions when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40782) Upgrade Jackson-databind to 2.13.4.1

2024-04-08 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834801#comment-17834801
 ] 

Ramakrishna commented on SPARK-40782:
-

Hi, this still seems to be an issue as a transitive dependency in Hadoop.

Scanner output (from org.apache.hadoop_hadoop-client-runtime-3.3.4.jar):
com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | HIGH | fixed | installed: 2.12.7 | fixed in: 2.12.7.1, 2.13.4.2 | jackson-databind: deep wrapper array nesting wrt UNWRAP_SINGLE_VALUE_ARRAYS

Is there a fix for this?
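
(For reference, one common interim mitigation is to force the patched artifact at build time; a hedged sbt sketch follows, with the version taken from the fix versions reported above. Whether this is appropriate for a given Hadoop/Spark combination is untested here.)

```
// Force a patched jackson-databind even when an older one arrives transitively
// (e.g. via hadoop-client-runtime); version number comes from the scan output above.
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.13.4.2"
```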

> Upgrade Jackson-databind to 2.13.4.1
> 
>
> Key: SPARK-40782
> URL: https://issues.apache.org/jira/browse/SPARK-40782
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
>
> #3590: Add check in primitive value deserializers to avoid deep wrapper array
>   nesting wrt `UNWRAP_SINGLE_VALUE_ARRAYS` [CVE-2022-42003]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-08 Thread Gideon P (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834800#comment-17834800
 ] 

Gideon P commented on SPARK-47413:
--

[~uros-db] Could you find me an additional ticket to work on for when I finish 
this one? 

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what is the 
> expected behaviour for these functions when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a time string. Note that we manually killed 
the stuck app instances and the retry goes through on the same cluster (without any 
code change).

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 

[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a time string. Note that we manually killed 
the stuck app instances and the retry goes through on the same cluster (without 
requiring any app code change).

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 

[jira] [Updated] (SPARK-47759) App being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Summary: App being stuck with an unexpected stack trace when 
reading/parsing a time string  (was: A Spark app being stuck with an unexpected 
stack trace when reading/parsing a time string)

> App being stuck with an unexpected stack trace when reading/parsing a time 
> string
> -
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected 
> stack trace when reading/parsing a time string.
>  
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> 

[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Summary: Apps being stuck with an unexpected stack trace when 
reading/parsing a time string  (was: App being stuck with an unexpected stack 
trace when reading/parsing a time string)

> Apps being stuck with an unexpected stack trace when reading/parsing a time 
> string
> --
>
> Key: SPARK-47759
> URL: https://issues.apache.org/jira/browse/SPARK-47759
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, pull-request-available, stuck, threadsafe
> Fix For: 3.5.0, 4.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected 
> stack trace when reading/parsing a time string.
>  
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
> legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
> runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
> must be specified as seconds (s), milliseconds (ms), microseconds (us), 
> minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
> at 
> org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
> at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
> at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
> at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
> at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
> at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at 
> 

[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a time string.

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at 

[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It's observed that our Spark apps occasionally got stuck with an unexpected 
stack trace when reading/parsing a time string.

 

*[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a 
legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 
runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time 
must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes 
(m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at 
org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at 

[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string

2024-04-08 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-47759:
-
Description: 
h2. Symptom

It has been observed that our Spark apps occasionally get stuck with an unexpected 
stack trace when reading/parsing a time string.

 

*[Stack Trace 1]* The stack trace doesn't make sense, since *120s* is a 
legitimate time string. The app runs on emr-7.0.0 with the Spark 3.5.0 
runtime.
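
Judging from the frames, the 120s being parsed is the RPC ask timeout that RpcUtils.askRpcTimeout resolves from spark.rpc.askTimeout / spark.network.timeout, whose default is 120s. Called in isolation, the same helper accepts that string without complaint, which is what makes the failure below so surprising. A minimal sketch of the happy path (this assumes only the public JavaUtils helpers from the spark-network-common module on the classpath; the class name is made up for the example and this is not the app's actual code):
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.spark.network.util.JavaUtils;

public class TimeStringParseCheck {
  public static void main(String[] args) {
    // Same helper that appears at the top of the stack trace below:
    // parse a Spark-style time string into whole seconds.
    long seconds = JavaUtils.timeStringAsSec("120s");
    System.out.println(seconds); // expected: 120

    // Lower-level variant with an explicit target unit.
    long millis = JavaUtils.timeStringAs("120s", TimeUnit.MILLISECONDS);
    System.out.println(millis); // expected: 120000
  }
}
{code}
The stack trace as captured in the app: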
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at 

[jira] [Created] (SPARK-47760) Reenable Avro function doctests

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47760:


 Summary: Reenable Avro function doctests
 Key: SPARK-47760
 URL: https://issues.apache.org/jira/browse/SPARK-47760
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org