[jira] [Resolved] (SPARK-47581) SQL catalyst: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-47581.
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45904
[https://github.com/apache/spark/pull/45904]

> SQL catalyst: Migrate logWarn with variables to structured logging framework
>
>                 Key: SPARK-47581
>                 URL: https://issues.apache.org/jira/browse/SPARK-47581
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
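The idea behind the migration above can be illustrated with a small sketch. This is NOT Spark's actual structured-logging API (Spark 4.0's framework lives in Scala); it is a hypothetical Python analogue showing the difference between interpolating variables into flat message text and carrying them as named, queryable fields:

```python
import json
import logging

# Hypothetical illustration, not Spark's API: a warning whose variables
# are emitted as stable keys in a JSON payload, so log pipelines can
# filter on fields instead of regex-matching the message text.

def log_warn_structured(logger, template, **fields):
    """Format the human-readable text, but also keep the variables
    as machine-readable fields with stable names."""
    record = {"message": template.format(**fields), **fields}
    logger.warning(json.dumps(record, sort_keys=True))
    return record

logger = logging.getLogger("demo")
rec = log_warn_structured(
    logger,
    "Task {task_id} failed after {attempts} attempts",
    task_id="t-42",
    attempts=3,
)
# rec["message"] is the familiar flat text, while rec["task_id"] and
# rec["attempts"] remain individually queryable.
```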
[jira] [Resolved] (SPARK-47754) Postgres: Support reading multidimensional arrays
[ https://issues.apache.org/jira/browse/SPARK-47754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47754.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45917
[https://github.com/apache/spark/pull/45917]

> Postgres: Support reading multidimensional arrays
>
>                 Key: SPARK-47754
>                 URL: https://issues.apache.org/jira/browse/SPARK-47754
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
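For context on what "multidimensional arrays" means on the wire: Postgres serializes a 2-D integer array as a nested brace literal such as `{{1,2},{3,4}}`, which a reader must map to nested array types. The toy parser below illustrates only that nesting; real drivers additionally handle quoting, NULLs, custom delimiters, and arbitrary element types:

```python
def parse_pg_int_array(text):
    """Parse a Postgres integer-array literal like '{{1,2},{3,4}}'
    into nested Python lists. Illustrative sketch only."""

    def parse(i):
        assert text[i] == "{"
        i += 1
        items = []
        while text[i] != "}":
            if text[i] == "{":
                sub, i = parse(i)          # recurse into a nested dimension
                items.append(sub)
            else:
                j = i
                while text[j] not in ",}":  # scan one integer element
                    j += 1
                items.append(int(text[i:j]))
                i = j
            if text[i] == ",":
                i += 1
        return items, i + 1                 # skip the closing brace

    result, _ = parse(0)
    return result

print(parse_pg_int_array("{{1,2},{3,4}}"))  # → [[1, 2], [3, 4]]
```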
[jira] [Resolved] (SPARK-47771) Make max_by, min_by doctests deterministic
[ https://issues.apache.org/jira/browse/SPARK-47771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-47771.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45939
[https://github.com/apache/spark/pull/45939]

> Make max_by, min_by doctests deterministic
>
>                 Key: SPARK-47771
>                 URL: https://issues.apache.org/jira/browse/SPARK-47771
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
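Why these doctests can be flaky: `max_by(value, ordering)` returns the value from the row with the largest ordering column, and when two rows tie on the ordering column a distributed engine may return either one, so a doctest that asserts one exact output is nondeterministic. A pure-Python sketch of the semantics (not PySpark's implementation; the example data are made up):

```python
def max_by(rows, value_col, ord_col):
    """Return rows[i][value_col] for the row with the largest
    rows[i][ord_col]. Pure-Python sketch of the aggregate's semantics."""
    best = max(rows, key=lambda r: r[ord_col])
    return best[value_col]

# Tie-free ordering values give one well-defined answer, safe to doctest:
courses = [
    {"course": "Java", "year": 2012},
    {"course": "dotNET", "year": 2013},
]
print(max_by(courses, "course", "year"))  # → dotNET

# With a tie on the ordering column, either value is a legal result, so an
# exact-output doctest would be flaky on a real engine. (Python's max()
# happens to pick the first maximum; a distributed engine makes no such
# promise.)
tied = [
    {"course": "Java", "year": 2013},
    {"course": "dotNET", "year": 2013},
]
assert max_by(tied, "course", "year") in {"Java", "dotNET"}
```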
[jira] [Updated] (SPARK-47775) Support remaining scalar types in the variant spec.
[ https://issues.apache.org/jira/browse/SPARK-47775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47775:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support remaining scalar types in the variant spec.
>
>                 Key: SPARK-47775
>                 URL: https://issues.apache.org/jira/browse/SPARK-47775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Chenhao Li
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (SPARK-47775) Support remaining scalar types in the variant spec.
Chenhao Li created SPARK-47775:
-------------------------------

             Summary: Support remaining scalar types in the variant spec.
                 Key: SPARK-47775
                 URL: https://issues.apache.org/jira/browse/SPARK-47775
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Chenhao Li
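For readers unfamiliar with the variant work: a variant value carries its own type information, so each scalar is stored with a type tag plus a payload. The sketch below shows that general tag-plus-payload idea only; it is NOT the actual variant spec's binary layout, and the tags and widths are invented for illustration:

```python
import struct

# Invented tag values and layout, purely to illustrate tagged scalar
# encoding; the real variant spec defines its own tags and payloads.
TAG_INT64, TAG_FLOAT64, TAG_BOOL = 0, 1, 2

def encode_scalar(value):
    # bool must be checked before int: isinstance(True, int) is True.
    if isinstance(value, bool):
        return struct.pack("<Bb", TAG_BOOL, int(value))
    if isinstance(value, int):
        return struct.pack("<Bq", TAG_INT64, value)
    if isinstance(value, float):
        return struct.pack("<Bd", TAG_FLOAT64, value)
    raise TypeError(f"unsupported scalar: {type(value).__name__}")

def decode_scalar(buf):
    tag = buf[0]
    if tag == TAG_BOOL:
        return bool(struct.unpack_from("<b", buf, 1)[0])
    if tag == TAG_INT64:
        return struct.unpack_from("<q", buf, 1)[0]
    if tag == TAG_FLOAT64:
        return struct.unpack_from("<d", buf, 1)[0]
    raise ValueError(f"unknown tag: {tag}")

# Round-trip: every supported scalar decodes back to itself.
for v in (42, 3.5, True):
    assert decode_scalar(encode_scalar(v)) == v
```

"Supporting remaining scalar types" then amounts to defining tags and payload layouts for the types the spec lists but the reader/writer does not yet handle.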
[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47774:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove redundant rules from `MimaExcludes`
>
>                 Key: SPARK-47774
>                 URL: https://issues.apache.org/jira/browse/SPARK-47774
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Jia updated SPARK-47773:
---------------------------
    Description:

SPIP doc: https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing

This [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] outlines the integration of Gluten's physical plan conversion, validation, and fallback framework into Apache Spark. The goal is to enhance Spark's flexibility and robustness in executing physical plans and to leverage Gluten's performance optimizations. Currently, Spark lacks official cross-platform execution support for physical plans. Gluten's mechanism, which employs the Substrait standard, can convert and optimize Spark's physical plans, thus improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport interface and its specialized variants: LeafTransformSupport, UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in streamlining the conversion of different operator types into a Substrait-based common format. The validation phase entails a thorough assessment of the Substrait plan against native backends to ensure compatibility. In instances where validation does not succeed, Spark's native operators will be deployed, with requisite transformations to adapt data formats accordingly.

The proposal emphasizes the centrality of the plan transformation phase, positing it as the foundational step. The subsequent validation and fallback procedures are slated for consideration upon the successful establishment of the initial phase. The integration of Gluten into Spark has already shown significant performance improvements with ClickHouse and Velox backends and has been successfully deployed in production by several customers.

    was: the same description without the leading "SPIP doc:" link line.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on
> Various Native Engines
>
>                 Key: SPARK-47773
>                 URL: https://issues.apache.org/jira/browse/SPARK-47773
>             Project: Spark
>          Issue Type: Epic
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Ke Jia
>            Priority: Major
[jira] [Created] (SPARK-47774) Remove redundant rules from `MimaExcludes`
Dongjoon Hyun created SPARK-47774:
----------------------------------

             Summary: Remove redundant rules from `MimaExcludes`
                 Key: SPARK-47774
                 URL: https://issues.apache.org/jira/browse/SPARK-47774
             Project: Spark
          Issue Type: Sub-task
          Components: Project Infra
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Jia updated SPARK-47773:
---------------------------
    Description:

This SPIP outlines the integration of Gluten's physical plan conversion, validation, and fallback framework into Apache Spark. The goal is to enhance Spark's flexibility and robustness in executing physical plans and to leverage Gluten's performance optimizations. Currently, Spark lacks official cross-platform execution support for physical plans. Gluten's mechanism, which employs the Substrait standard, can convert and optimize Spark's physical plans, thus improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport interface and its specialized variants: LeafTransformSupport, UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in streamlining the conversion of different operator types into a Substrait-based common format. The validation phase entails a thorough assessment of the Substrait plan against native backends to ensure compatibility. In instances where validation does not succeed, Spark's native operators will be deployed, with requisite transformations to adapt data formats accordingly.

The proposal emphasizes the centrality of the plan transformation phase, positing it as the foundational step. The subsequent validation and fallback procedures are slated for consideration upon the successful establishment of the initial phase. The integration of Gluten into Spark has already shown significant performance improvements with ClickHouse and Velox backends and has been successfully deployed in production by several customers.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on
> Various Native Engines
>
>                 Key: SPARK-47773
>                 URL: https://issues.apache.org/jira/browse/SPARK-47773
>             Project: Spark
>          Issue Type: Epic
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Ke Jia
>            Priority: Major
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
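The operator-conversion hierarchy described in the SPIP above can be sketched roughly as follows. Only the interface names (TransformSupport, LeafTransformSupport, UnaryTransformSupport, BinaryTransformSupport) come from the proposal; the SubstraitPlan type and all method signatures below are hypothetical placeholders for illustration, not the actual Gluten API.

```java
// Hedged sketch of the proposed operator-conversion hierarchy.
// SubstraitPlan and every method name here are illustrative placeholders.
interface SubstraitPlan { String describe(); }

interface TransformSupport {
    // Whether the native backend accepts this operator's Substrait form.
    boolean doValidate();
    // Convert this operator into the Substrait-based common format.
    SubstraitPlan doTransform();
}

// Leaf operators (e.g. scans) have no children to transform.
interface LeafTransformSupport extends TransformSupport { }

// Unary operators (e.g. filter, project) wrap a single child plan.
interface UnaryTransformSupport extends TransformSupport {
    SubstraitPlan transformChild();
}

// Binary operators (e.g. joins) combine two child plans.
interface BinaryTransformSupport extends TransformSupport {
    SubstraitPlan transformLeft();
    SubstraitPlan transformRight();
}

// Minimal concrete example: a filter-like unary operator.
final class DemoFilterTransform implements UnaryTransformSupport {
    public boolean doValidate() { return true; }
    public SubstraitPlan doTransform() {
        return () -> "filter(" + transformChild().describe() + ")";
    }
    public SubstraitPlan transformChild() { return () -> "scan"; }
}
```

Under the proposed flow, a plan node would be converted via doTransform() only when doValidate() reports that the native backend accepts its Substrait form; otherwise Spark would fall back to its own operator, inserting the necessary data-format transitions.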
[jira] [Created] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
Ke Jia created SPARK-47773: -- Summary: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines Key: SPARK-47773 URL: https://issues.apache.org/jira/browse/SPARK-47773 Project: Spark Issue Type: Epic Components: SQL Affects Versions: 3.5.1 Reporter: Ke Jia
[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
[ https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47770: -- Fix Version/s: 3.5.2 3.4.3 > Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of > failing > -- > > Key: SPARK-47770 > URL: https://issues.apache.org/jira/browse/SPARK-47770 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2, 3.4.3 > >
[jira] [Commented] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835136#comment-17835136 ] Bo Xiong commented on SPARK-47759: -- I've submitted [a fix|https://github.com/apache/spark/pull/45942]. Please help get it merged to the master branch. Once that's merged, I'll submit other pull requests to patch v3.5.0 and above. Thanks! > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at >
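For context on the trace above: JavaUtils.timeStringAs accepts a number with an optional unit suffix, as the exception message lists (s, ms, us, m/min, h, d). The sketch below is a simplified, illustrative re-implementation of that style of suffix-based parsing, not Spark's actual code. Since "120s" satisfies the documented format, the reported failure points at something other than the format itself (the issue is labeled threadsafe).

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified approximation of suffix-based time-string parsing, in the
// spirit of JavaUtils.timeStringAs. The unit table mirrors the suffixes
// named in the error message; this is illustrative, not Spark's code.
final class TimeStrings {
    private static final Map<String, TimeUnit> UNITS = Map.of(
            "us", TimeUnit.MICROSECONDS,
            "ms", TimeUnit.MILLISECONDS,
            "s", TimeUnit.SECONDS,
            "m", TimeUnit.MINUTES,
            "min", TimeUnit.MINUTES,
            "h", TimeUnit.HOURS,
            "d", TimeUnit.DAYS);

    // Pattern is immutable and safe to share; each call builds its own
    // Matcher, since Matcher instances are not thread-safe.
    private static final Pattern FORMAT = Pattern.compile("(-?[0-9]+)([a-z]+)?");

    static long toSeconds(String input) {
        Matcher m = FORMAT.matcher(input.toLowerCase().trim());
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse time string: " + input);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        TimeUnit unit = suffix == null ? TimeUnit.SECONDS : UNITS.get(suffix);
        if (unit == null) {
            throw new NumberFormatException("Invalid suffix: " + suffix);
        }
        return TimeUnit.SECONDS.convert(value, unit);
    }
}
```

With this kind of parser, toSeconds("120s") yields 120, which is why the "Failed to parse time string: 120s" message in the trace is unexpected for a well-formed input.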
[jira] [Resolved] (SPARK-47682) Support cast from variant.
[ https://issues.apache.org/jira/browse/SPARK-47682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47682. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45807 [https://github.com/apache/spark/pull/45807] > Support cast from variant. > -- > > Key: SPARK-47682 > URL: https://issues.apache.org/jira/browse/SPARK-47682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Fix Version/s: 3.5.2 3.5.1 Affects Version/s: 3.5.1 (was: 4.0.0) > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) > at >
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Summary: Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string (was: Apps being stuck with an unexpected stack trace when reading/parsing a legitimate time string) > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) >
[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Summary: Apps being stuck with an unexpected stack trace when reading/parsing a legitimate time string (was: Apps being stuck with an unexpected stack trace when reading/parsing a time string) > Apps being stuck with an unexpected stack trace when reading/parsing a > legitimate time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at >
[jira] [Resolved] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
[ https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47770. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45938 [https://github.com/apache/spark/pull/45938] > Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of > failing > -- > > Key: SPARK-47770 > URL: https://issues.apache.org/jira/browse/SPARK-47770 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47772) Fix the doctest of mode function
[ https://issues.apache.org/jira/browse/SPARK-47772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47772: --- Labels: pull-request-available (was: ) > Fix the doctest of mode function > > > Key: SPARK-47772 > URL: https://issues.apache.org/jira/browse/SPARK-47772 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47771) Make max_by, min_by doctests deterministic
[ https://issues.apache.org/jira/browse/SPARK-47771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47771: --- Labels: pull-request-available (was: ) > Make max_by, min_by doctests deterministic > -- > > Key: SPARK-47771 > URL: https://issues.apache.org/jira/browse/SPARK-47771 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47771) Make max_by, min_by doctests deterministic
Ruifeng Zheng created SPARK-47771: - Summary: Make max_by, min_by doctests deterministic Key: SPARK-47771 URL: https://issues.apache.org/jira/browse/SPARK-47771 Project: Spark Issue Type: Improvement Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
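[Editor's note] For context on why such doctests flake: `max_by(x, y)` is only well defined when the maximizing `y` is unique; with ties, any qualifying `x` may be returned, so the printed doctest output can vary between runs. A minimal pure-Python sketch of the same hazard (the `max_by` helper below is a hypothetical illustration, not Spark's implementation):

```python
def max_by(rows, value_col, ordering_col):
    """Return the value_col of the row with the greatest ordering_col.

    With ties on ordering_col the winner depends on input order --
    the same ambiguity that makes doctest output nondeterministic.
    """
    best = max(rows, key=lambda r: r[ordering_col])  # max() keeps the FIRST maximal element
    return best[value_col]

ties = [{"name": "a", "score": 10}, {"name": "b", "score": 10}]
# Reversing the input flips the answer; a doctest printing this would be flaky.
print(max_by(ties, "name", "score"))                   # → a
print(max_by(list(reversed(ties)), "name", "score"))   # → b
```

Making the sample data tie-free (or ordering the output) removes the ambiguity, which is presumably the spirit of the fix.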
[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
[ https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47770: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Bug) > Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of > failing > -- > > Key: SPARK-47770 > URL: https://issues.apache.org/jira/browse/SPARK-47770 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
[ https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47770: - Assignee: Dongjoon Hyun > Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of > failing > -- > > Key: SPARK-47770 > URL: https://issues.apache.org/jira/browse/SPARK-47770 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
[ https://issues.apache.org/jira/browse/SPARK-47770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47770: --- Labels: pull-request-available (was: ) > Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of > failing > -- > > Key: SPARK-47770 > URL: https://issues.apache.org/jira/browse/SPARK-47770 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47770) Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
Dongjoon Hyun created SPARK-47770: - Summary: Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing Key: SPARK-47770 URL: https://issues.apache.org/jira/browse/SPARK-47770 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47589) Hive-thriftserver: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47589. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45936 [https://github.com/apache/spark/pull/45936] > Hive-thriftserver: Migrate logError with variables to structured logging > framework > -- > > Key: SPARK-47589 > URL: https://issues.apache.org/jira/browse/SPARK-47589 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
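[Editor's note] The migration tickets above (SPARK-47581, SPARK-47589, and siblings) replace interpolated log strings with messages whose variables are tagged as named fields, so logs can be queried mechanically. Spark's actual framework lives on the Scala side; as a rough, hypothetical Python analog of the idea only:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as JSON, keeping tagged variables as fields."""
    def format(self, record):
        payload = {"level": record.levelname, "msg": record.getMessage()}
        # Variables passed via `extra` become queryable fields instead of
        # being baked into the message text by string interpolation.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

logger = logging.getLogger("structured-demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("task failed", extra={"context": {"taskId": 7, "stageId": 2}})
```

The payoff is the same in either language: `taskId` is a field you can filter on, not a substring you must regex out of free text.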
[jira] [Commented] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835080#comment-17835080 ] Gengliang Wang commented on SPARK-47588: I am working on this one > Hive module: Migrate logInfo with variables to structured logging framework > --- > > Key: SPARK-47588 > URL: https://issues.apache.org/jira/browse/SPARK-47588 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47587) Hive module: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47587: -- Assignee: BingKun Pan > Hive module: Migrate logWarn with variables to structured logging framework > --- > > Key: SPARK-47587 > URL: https://issues.apache.org/jira/browse/SPARK-47587 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47586) Hive module: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47586: -- Assignee: Haejoon Lee > Hive module: Migrate logError with variables to structured logging framework > > > Key: SPARK-47586 > URL: https://issues.apache.org/jira/browse/SPARK-47586 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47591: -- Assignee: Haejoon Lee > Hive-thriftserver: Migrate logInfo with variables to structured logging > framework > - > > Key: SPARK-47591 > URL: https://issues.apache.org/jira/browse/SPARK-47591 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47589) Hive-thriftserver: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47589: --- Labels: pull-request-available (was: ) > Hive-thriftserver: Migrate logError with variables to structured logging > framework > -- > > Key: SPARK-47589 > URL: https://issues.apache.org/jira/browse/SPARK-47589 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47769) Add schema_of_variant_agg expression.
[ https://issues.apache.org/jira/browse/SPARK-47769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47769: --- Labels: pull-request-available (was: ) > Add schema_of_variant_agg expression. > - > > Key: SPARK-47769 > URL: https://issues.apache.org/jira/browse/SPARK-47769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47769) Add schema_of_variant_agg expression.
Chenhao Li created SPARK-47769: -- Summary: Add schema_of_variant_agg expression. Key: SPARK-47769 URL: https://issues.apache.org/jira/browse/SPARK-47769 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Chenhao Li -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47417) Ascii, Chr, Base64, UnBase64 (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47417: --- Labels: pull-request-available (was: ) > Ascii, Chr, Base64, UnBase64 (all collations) > - > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47410) refactor UTF8String and CollationFactory
[ https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47410: - Summary: refactor UTF8String and CollationFactory (was: Refactor UTF8String and CollationFactory) > refactor UTF8String and CollationFactory > > > Key: SPARK-47410 > URL: https://issues.apache.org/jira/browse/SPARK-47410 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47410) Refactor UTF8String and CollationFactory
[ https://issues.apache.org/jira/browse/SPARK-47410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47410: - Summary: Refactor UTF8String and CollationFactory (was: StringTrimLeft, StringTrimRight (all collations)) > Refactor UTF8String and CollationFactory > > > Key: SPARK-47410 > URL: https://issues.apache.org/jira/browse/SPARK-47410 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47737) Bump PyArrow to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47737. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45892 [https://github.com/apache/spark/pull/45892] > Bump PyArrow to 10.0.0 > -- > > Key: SPARK-47737 > URL: https://issues.apache.org/jira/browse/SPARK-47737 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > For more rich API support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47737) Bump PyArrow to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47737: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Bug) > Bump PyArrow to 10.0.0 > -- > > Key: SPARK-47737 > URL: https://issues.apache.org/jira/browse/SPARK-47737 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > For more rich API support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47737) Bump PyArrow to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47737: - Assignee: Haejoon Lee > Bump PyArrow to 10.0.0 > -- > > Key: SPARK-47737 > URL: https://issues.apache.org/jira/browse/SPARK-47737 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > For more rich API support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47725) Set up the CI for pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47725: - Assignee: Hyukjin Kwon > Set up the CI for pyspark-connect package > - > > Key: SPARK-47725 > URL: https://issues.apache.org/jira/browse/SPARK-47725 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47725) Set up the CI for pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47725. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45870 [https://github.com/apache/spark/pull/45870] > Set up the CI for pyspark-connect package > - > > Key: SPARK-47725 > URL: https://issues.apache.org/jira/browse/SPARK-47725 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47767: --- Labels: pull-request-available (was: ) > Show offset value in TakeOrderedAndProjectExec > -- > > Key: SPARK-47767 > URL: https://issues.apache.org/jira/browse/SPARK-47767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Show the offset value in TakeOrderedAndProjectExec. > > For example: > > explain select * from test_limit_offset order by a limit 2 offset 1; > plan > == Physical Plan == > TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST|#171 ASC NULLS > FIRST], output=[a#171|#171]) > +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171|#171], > HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171|#171], Partition > Cols: []] > > No offset is displayed. If it is displayed, it will be more user-friendly > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guihuawen updated SPARK-47767: -- Description: Show the offset value in TakeOrderedAndProjectExec. For example: explain select * from test_limit_offset order by a limit 2 offset 1; plan == Physical Plan == TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST|#171 ASC NULLS FIRST], output=[a#171|#171]) +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171|#171], HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171|#171], Partition Cols: []] No offset is displayed. If it is displayed, it will be more user-friendly was: Show the offset value in TakeOrderedAndProjectExec. For example: explain select * from test_limit_offset order by a limit 2 offset 1; plan == Physical Plan == TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171]) +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], HiveTableRelation [`spark_catalog`.`bigdata_qa`.`test_limit_offset`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: []] No offset is displayed. If it is displayed, it will be more user-friendly > Show offset value in TakeOrderedAndProjectExec > -- > > Key: SPARK-47767 > URL: https://issues.apache.org/jira/browse/SPARK-47767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: guihuawen >Priority: Major > Fix For: 4.0.0 > > > Show the offset value in TakeOrderedAndProjectExec. 
> > For example: > > explain select * from test_limit_offset order by a limit 2 offset 1; > plan > == Physical Plan == > TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST|#171 ASC NULLS > FIRST], output=[a#171|#171]) > +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171|#171], > HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171|#171], Partition > Cols: []] > > No offset is displayed. If it is displayed, it will be more user-friendly > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47768) Data Source names unavailable when using Delta Share and Kafka SQL
David Perkins created SPARK-47768: - Summary: Data Source names unavailable when using Delta Share and Kafka SQL Key: SPARK-47768 URL: https://issues.apache.org/jira/browse/SPARK-47768 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 3.5.1 Environment: I'm using Spark 3.5.1 on Kubernetes with the Spark operator. My project includes these depenedencies: implementation 'org.apache.spark:spark-core_2.12:3.5.1' implementation 'org.apache.spark:spark-sql_2.12:3.5.1' implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.17.0' sparkConnectorShadowJar 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1' sparkConnectorShadowJar 'io.delta:delta-sharing-spark_2.12:3.1.0' The `sparkConnectorShadowJar` is packaged into a shadow jar and copied onto the 'apache/spark:3.5.1' docker image. Reporter: David Perkins I have a simple Spark application that is reading from a csv file via Delta Share and writing the contents to Kafka. When both the Delta Share Kafka SQL libraries are included in the project, Spark is unable to load them by their format short names. If I use one of them without the other, everything works fine. When both are included, then I get this root exception: ClassNotFoundException: deltaSharing.DefaultSource. If I specify the source class names ( io.delta.sharing.spark.DeltaSharingDataSource, org.apache.spark.sql.kafka010.KafkaSourceProvider) instead of the short names, it works correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
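[Editor's note] The report above already names the workaround: address each provider by its fully qualified class name (both class names are taken from the issue text) when the short names `deltaSharing` and `kafka` fail to resolve. A hedged sketch; the helper functions, `spark` session, and share path are assumptions for illustration:

```python
def read_delta_share(spark, profile_path, table):
    """Load a Delta Share table by provider class name.

    "io.delta.sharing.spark.DeltaSharingDataSource" is the fully
    qualified provider; per the report, the short name "deltaSharing"
    can raise ClassNotFoundException when both connector jars are
    packaged together.
    """
    return (spark.read
            .format("io.delta.sharing.spark.DeltaSharingDataSource")
            .load(f"{profile_path}#{table}"))

def write_to_kafka(df, bootstrap_servers, topic):
    """Write by provider class name instead of the short name "kafka"."""
    return (df.write
            .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
            .option("kafka.bootstrap.servers", bootstrap_servers)
            .option("topic", topic)
            .save())
```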
[jira] [Created] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec
guihuawen created SPARK-47767: - Summary: Show offset value in TakeOrderedAndProjectExec Key: SPARK-47767 URL: https://issues.apache.org/jira/browse/SPARK-47767 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0, 3.4.0, 4.0.0 Reporter: guihuawen Fix For: 4.0.0 Show the offset value in TakeOrderedAndProjectExec. For example: explain select * from test_limit_offset order by a limit 2 offset 1; plan == Physical Plan == TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171]) +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], HiveTableRelation [`spark_catalog`.`bigdata_qa`.`test_limit_offset`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: []] No offset is displayed. If it is displayed, it will be more user-friendly -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
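[Editor's note] The plan quoted above hints at the mechanics being surfaced: for `limit 2 offset 1` the physical node carries `limit=3`, i.e. the operator takes limit-plus-offset rows in order and then drops the first `offset` of them. A toy model of that semantics (not Spark code):

```python
def take_ordered_and_project(rows, key, limit, offset=0):
    """Mimic TakeOrderedAndProject: order, take limit+offset, drop offset.

    The operator only ever needs the top (limit + offset) rows, which
    is why the plan for `limit 2 offset 1` shows limit=3.
    """
    top = sorted(rows, key=key)[: limit + offset]
    return top[offset:]

rows = [5, 1, 4, 2, 3]
print(take_ordered_and_project(rows, key=lambda a: a, limit=2, offset=1))  # → [2, 3]
```

Displaying the offset alongside the combined limit, as the ticket proposes, would make plans like the one above less surprising to read.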
[jira] [Updated] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0
[ https://issues.apache.org/jira/browse/SPARK-47766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna updated SPARK-47766: Description: We have some HIGH CVEs which are coming from hadoop-client-runtime 3.3.4 and hence we need to address those com.fasterxml.jackson.core:jackson-databind causing *CVE-2022-42003* and *CVE-2022-42004* (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) com.google.protobuf:protobuf-java (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2021-22569,* *CVE-2021-22570,* *CVE-2022-3509* and *CVE-2022-3510* net.minidev:json-smart causing *CVE-2021-31684,* *CVE-2023-1370* (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) org.apache.avro:avro (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2023-39410* org.apache.commons:commons-compress causing *CVE-2024-25710, CVE-2024-26308* (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) Most of these have gone in hadoop client runtime 3.4.0 Is there a plan to support hadoop 3.4.0 ? was: I have a data pipeline set up in such a way that it reads data from a Kafka source, does some transformation on the data using pyspark, then writes the output into a sink (Kafka, Redis, etc). My entire pipeline in written in SQL, so I wish to use the .sql() method to execute SQL on my streaming source directly. However, I'm running into the issue where my watermark is not being recognized by the downstream query via the .sql() method. ``` Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import pyspark >>> print(pyspark.__version__) 3.5.1 >>> from pyspark.sql import SparkSession >>> >>> session = SparkSession.builder \ ... .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ ... 
.getOrCreate() >>> from pyspark.sql.functions import col, from_json >>> from pyspark.sql.types import StructField, StructType, TimestampType, >>> LongType, DoubleType, IntegerType >>> schema = StructType( ... [ ... StructField('createTime', TimestampType(), True), ... StructField('orderId', LongType(), True), ... StructField('payAmount', DoubleType(), True), ... StructField('payPlatform', IntegerType(), True), ... StructField('provinceId', IntegerType(), True), ... ]) >>> >>> streaming_df = session.readStream\ ... .format("kafka")\ ... .option("kafka.bootstrap.servers", "localhost:9092")\ ... .option("subscribe", "payment_msg")\ ... .option("startingOffsets","earliest")\ ... .load()\ ... .select(from_json(col("value").cast("string"), schema).alias("parsed_value"))\ ... .select("parsed_value.*")\ ... .withWatermark("createTime", "10 seconds") >>> >>> streaming_df.createOrReplaceTempView("streaming_df") >>> session.sql(""" ... SELECT ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount ... FROM streaming_df ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') ... ORDER BY window.start ... """)\ ... .writeStream\ ... .format("kafka") \ ... .option("checkpointLocation", "checkpoint") \ ... .option("kafka.bootstrap.servers", "localhost:9092") \ ... .option("topic", "sink") \ ... 
.start() ``` This throws exception ``` pyspark.errors.exceptions.captured.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark; line 6 pos 4; ``` > Extend spark 3.5.1 to support hadoop-client-api 3.4.0, > hadoop-client-runtime-3.4.0 > -- > > Key: SPARK-47766 > URL: https://issues.apache.org/jira/browse/SPARK-47766 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Ramakrishna >Priority: Blocker > Labels: pull-request-available > > We have some HIGH CVEs which are coming from hadoop-client-runtime 3.3.4 and > hence we need to address those > > com.fasterxml.jackson.core:jackson-databind causing > *CVE-2022-42003* and *CVE-2022-42004* > (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) > > > com.google.protobuf:protobuf-java > (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing > *CVE-2021-22569,* *CVE-2021-22570,* *CVE-2022-3509* and *CVE-2022-3510* > > net.minidev:json-smart > causing *CVE-2021-31684,* *CVE-2023-1370* > (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) > > > org.apache.avro:avro >
[jira] [Created] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0
Ramakrishna created SPARK-47766: --- Summary: Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0 Key: SPARK-47766 URL: https://issues.apache.org/jira/browse/SPARK-47766 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.1 Reporter: Ramakrishna I have a data pipeline set up in such a way that it reads data from a Kafka source, does some transformation on the data using pyspark, then writes the output into a sink (Kafka, Redis, etc). My entire pipeline in written in SQL, so I wish to use the .sql() method to execute SQL on my streaming source directly. However, I'm running into the issue where my watermark is not being recognized by the downstream query via the .sql() method. ``` Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import pyspark >>> print(pyspark.__version__) 3.5.1 >>> from pyspark.sql import SparkSession >>> >>> session = SparkSession.builder \ ... .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ ... .getOrCreate() >>> from pyspark.sql.functions import col, from_json >>> from pyspark.sql.types import StructField, StructType, TimestampType, >>> LongType, DoubleType, IntegerType >>> schema = StructType( ... [ ... StructField('createTime', TimestampType(), True), ... StructField('orderId', LongType(), True), ... StructField('payAmount', DoubleType(), True), ... StructField('payPlatform', IntegerType(), True), ... StructField('provinceId', IntegerType(), True), ... ]) >>> >>> streaming_df = session.readStream\ ... .format("kafka")\ ... .option("kafka.bootstrap.servers", "localhost:9092")\ ... .option("subscribe", "payment_msg")\ ... .option("startingOffsets","earliest")\ ... .load()\ ... .select(from_json(col("value").cast("string"), schema).alias("parsed_value"))\ ... .select("parsed_value.*")\ ... 
.withWatermark("createTime", "10 seconds") >>> >>> streaming_df.createOrReplaceTempView("streaming_df") >>> session.sql(""" ... SELECT ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount ... FROM streaming_df ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') ... ORDER BY window.start ... """)\ ... .writeStream\ ... .format("kafka") \ ... .option("checkpointLocation", "checkpoint") \ ... .option("kafka.bootstrap.servers", "localhost:9092") \ ... .option("topic", "sink") \ ... .start() ``` This throws exception ``` pyspark.errors.exceptions.captured.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark; line 6 pos 4; ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47318) AuthEngine key exchange needs additional KDF round
[ https://issues.apache.org/jira/browse/SPARK-47318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834987#comment-17834987 ] Dongjoon Hyun commented on SPARK-47318: --- I added a target version (3.4.3) based on the dev mailing list discussion. - https://lists.apache.org/thread/htq3hwfyh6kg28d8bq2n3v60fpn7s375 > AuthEngine key exchange needs additional KDF round > --- > > Key: SPARK-47318 > URL: https://issues.apache.org/jira/browse/SPARK-47318 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 4.0.0 >Reporter: Steve Weis >Priority: Minor > Labels: pull-request-available > > AuthEngine implements a bespoke [key exchange protocol > |[https://github.com/apache/spark/tree/master/common/network-common/src/main/java/org/apache/spark/network/crypto]|https://github.com/apache/spark/tree/master/common/network-common/src/main/java/org/apache/spark/network/crypto].] > based on the NNpsk0 Noise pattern and using X25519. > The Spark code improperly uses the derived shared secret directly, which is > an encoded X coordinate. This should be passed into a KDF rather than used > directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47318) AuthEngine key exchange needs additional KDF round
[ https://issues.apache.org/jira/browse/SPARK-47318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47318: -- Target Version/s: 3.4.3 > AuthEngine key exchange needs additional KDF round > --- > > Key: SPARK-47318 > URL: https://issues.apache.org/jira/browse/SPARK-47318 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 4.0.0 >Reporter: Steve Weis >Priority: Minor > Labels: pull-request-available > > AuthEngine implements a bespoke [key exchange protocol|https://github.com/apache/spark/tree/master/common/network-common/src/main/java/org/apache/spark/network/crypto] > based on the NNpsk0 Noise pattern and using X25519. > The Spark code improperly uses the derived shared secret directly, which is > an encoded X coordinate. This should be passed into a KDF rather than used > directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
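The ticket above asks that the raw X25519 shared secret be run through a key derivation function before use. As a minimal illustration of the extra derivation step, the sketch below implements HKDF (RFC 5869) from the Python standard library; the salt and the "spark-auth" info label are illustrative assumptions, not Spark's actual AuthEngine parameters.

```python
# Hedged sketch: derive a working key from a raw shared secret via HKDF,
# instead of using the encoded X coordinate directly. Pure stdlib.
import hashlib
import hmac


def hkdf_sha256(shared_secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    # Extract step: concentrate the secret's entropy into a fixed-size PRK.
    prk = hmac.new(salt, shared_secret, hashlib.sha256).digest()
    # Expand step: stretch the PRK into 'length' bytes bound to 'info'.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


raw_secret = bytes(32)  # stand-in for an X25519 shared secret
key = hkdf_sha256(raw_secret, salt=b"\x00" * 32, info=b"spark-auth")
```

A production implementation would use a vetted HKDF (e.g. from a crypto library) rather than hand-rolling it; the point here is only that the shared secret feeds a KDF, and the derived `key` is what gets used for encryption.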
[jira] [Resolved] (SPARK-47504) Resolve AbstractDataType simpleStrings for StringTypeCollated
[ https://issues.apache.org/jira/browse/SPARK-47504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47504. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45694 [https://github.com/apache/spark/pull/45694] > Resolve AbstractDataType simpleStrings for StringTypeCollated > - > > Key: SPARK-47504 > URL: https://issues.apache.org/jira/browse/SPARK-47504 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *SPARK-47296* introduced a change to fail all unsupported functions. Because > of this change expected *inputTypes* in *ExpectsInputTypes* had to be > changed. This change introduced a change on user side which will print > *"STRING_ANY_COLLATION"* in places where before we printed *"STRING"* when an > error occurred. Concretely if we get an input of Int where > *StringTypeAnyCollation* was expected, we will throw this faulty message for > users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47681) Add schema_of_variant expression.
[ https://issues.apache.org/jira/browse/SPARK-47681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47681. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45806 [https://github.com/apache/spark/pull/45806] > Add schema_of_variant expression. > - > > Key: SPARK-47681 > URL: https://issues.apache.org/jira/browse/SPARK-47681 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47765) Add SET COLLATION to parser rules
Mihailo Milosevic created SPARK-47765: - Summary: Add SET COLLATION to parser rules Key: SPARK-47765 URL: https://issues.apache.org/jira/browse/SPARK-47765 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
[ https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47764: --- Labels: pull-request-available (was: ) > Cleanup shuffle dependencies for Spark Connect SQL executions > - > > Key: SPARK-47764 > URL: https://issues.apache.org/jira/browse/SPARK-47764 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Priority: Major > Labels: pull-request-available > > Shuffle dependencies are created by shuffle map stages, which consists of > files on disks and the corresponding references in Spark JVM heap memory. > Currently Spark cleanup unused shuffle dependencies through JVM GCs, and > periodic GCs are triggered once every 30 minutes (see ContextCleaner). > However, we still found cases in which the size of the shuffle data files are > too large, which makes shuffle data migration slow. > > We do have chances to cleanup shuffle dependencies, especially for SQL > queries created by Spark Connect, since we do have better control of the > DataFrame instances there. Even if DataFrame instances are reused in the > client side, on the server side the instances are still recreated. > > We might also provide the option to 1. cleanup eagerly after each query > executions, or 2. only mark the shuffle executions and do not migrate them at > node decommissions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
Bo Zhang created SPARK-47764: Summary: Cleanup shuffle dependencies for Spark Connect SQL executions Key: SPARK-47764 URL: https://issues.apache.org/jira/browse/SPARK-47764 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Bo Zhang Shuffle dependencies are created by shuffle map stages and consist of files on disk plus the corresponding references in Spark JVM heap memory. Currently Spark cleans up unused shuffle dependencies through JVM GCs, and periodic GCs are triggered once every 30 minutes (see ContextCleaner). However, we still found cases in which the shuffle data files are too large, which makes shuffle data migration slow. We do have chances to clean up shuffle dependencies, especially for SQL queries created by Spark Connect, since we have better control of the DataFrame instances there. Even if DataFrame instances are reused on the client side, on the server side the instances are still recreated. We might also provide the option to 1. clean up eagerly after each query execution, or 2. only mark the shuffle executions and not migrate them at node decommission. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
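Option 1 in the ticket above (eager cleanup after each query) can be sketched with a small registry plus a context manager that releases shuffle state as soon as the query finishes, rather than waiting for a periodic GC pass. All names below are illustrative assumptions; this is not Spark's ContextCleaner API.

```python
# Hedged sketch of eager shuffle cleanup: track the shuffle ids a query
# produced and release them when the query ends, even on failure.
from contextlib import contextmanager


class ShuffleRegistry:
    def __init__(self):
        self.live = set()

    def register(self, shuffle_id: int):
        self.live.add(shuffle_id)

    def release_all(self):
        # In Spark this would delete shuffle files and drop JVM references.
        released, self.live = self.live, set()
        return released


@contextmanager
def eager_cleanup(registry: ShuffleRegistry):
    try:
        yield registry
    finally:
        registry.release_all()  # runs even if the query raises


registry = ShuffleRegistry()
with eager_cleanup(registry):
    registry.register(1)  # shuffle from map stage 1
    registry.register(2)  # shuffle from map stage 2
# registry.live is now empty: the shuffles were released at query end
```

The try/finally shape is the design point: cleanup is tied to query scope instead of GC timing, which is exactly what makes it "eager".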
[jira] [Resolved] (SPARK-47286) IN operator support
[ https://issues.apache.org/jira/browse/SPARK-47286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandar Tomic resolved SPARK-47286. -- Fix Version/s: 4.0.0 Target Version/s: 4.0.0 Resolution: Fixed > IN operator support > --- > > Key: SPARK-47286 > URL: https://issues.apache.org/jira/browse/SPARK-47286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Fix For: 4.0.0 > > > At this point the following query works fine: > ``` > sql("select * from t1 where ucs_basic_lcase in ('aaa' collate > 'ucs_basic_lcase', 'bbb' collate 'ucs_basic_lcase')").show() > ``` > But if we were to omit the explicit collate or even mix collations: > ``` > sql("select * from t1 where ucs_basic_lcase in ('aaa' collate > 'ucs_basic_lcase', 'bbb')").show() > ``` > the query would still run and return invalid results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47763) Reenable Protobuf function doctests
Hyukjin Kwon created SPARK-47763: Summary: Reenable Protobuf function doctests Key: SPARK-47763 URL: https://issues.apache.org/jira/browse/SPARK-47763 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47746) Use column ordinals instead of prefix ordering columns in the range scan encoder
[ https://issues.apache.org/jira/browse/SPARK-47746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47746. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45905 [https://github.com/apache/spark/pull/45905] > Use column ordinals instead of prefix ordering columns in the range scan > encoder > > > Key: SPARK-47746 > URL: https://issues.apache.org/jira/browse/SPARK-47746 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Neil Ramaswamy >Assignee: Neil Ramaswamy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, the State V2 implementations do projections in their state > managers, and then provide some prefix (ordering) columns to the > RocksDBStateEncoder. However, we can avoid doing extra projection by just > reading the ordinals we need, in the order we need, in the state encoder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47587) Hive module: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47587: --- Labels: pull-request-available (was: ) > Hive module: Migrate logWarn with variables to structured logging framework > --- > > Key: SPARK-47587 > URL: https://issues.apache.org/jira/browse/SPARK-47587 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47591: --- Labels: pull-request-available (was: ) > Hive-thriftserver: Migrate logInfo with variables to structured logging > framework > - > > Key: SPARK-47591 > URL: https://issues.apache.org/jira/browse/SPARK-47591 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
[ https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47761: --- Labels: pull-request-available (was: ) > Oracle: Support reading AnsiIntervalTypes > - > > Key: SPARK-47761 > URL: https://issues.apache.org/jira/browse/SPARK-47761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
[ https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47762. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45924 [https://github.com/apache/spark/pull/45924] > Add pyspark.sql.connect.protobuf into setup.py > -- > > Key: SPARK-47762 > URL: https://issues.apache.org/jira/browse/SPARK-47762 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
[ https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47762: - Fix Version/s: 3.5.2 > Add pyspark.sql.connect.protobuf into setup.py > -- > > Key: SPARK-47762 > URL: https://issues.apache.org/jira/browse/SPARK-47762 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
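The failure mode behind SPARK-47762 is that a wheel built from an explicit package list silently omits any subpackage not listed in setup.py. The sketch below illustrates that check in pure Python; the `pyspark.sql.connect.protobuf` name comes from the ticket title, while the surrounding list is an illustrative assumption, not Spark's actual setup.py contents.

```python
# Hedged sketch: an explicitly enumerated package list, as used by
# setuptools-based builds, leaves out anything not named in it.
packages = [
    "pyspark.sql.connect",
    "pyspark.sql.connect.avro",  # illustrative neighbor entry
]


def missing_from_wheel(required, listed):
    # Return the required subpackages that the built wheel would omit.
    return [p for p in required if p not in listed]


before = missing_from_wheel(["pyspark.sql.connect.protobuf"], packages)
# 'before' is non-empty: the protobuf subpackage would be left out.

packages.append("pyspark.sql.connect.protobuf")  # the change this ticket makes
after = missing_from_wheel(["pyspark.sql.connect.protobuf"], packages)
# 'after' is empty: the subpackage now ships in the wheel.
```

In a real setup.py the same effect comes from adding the name to the `packages=` argument of `setup()` (or letting `find_packages()` discover it).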
[jira] [Commented] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834828#comment-17834828 ] Haejoon Lee commented on SPARK-47591: - I'm working on this :) > Hive-thriftserver: Migrate logInfo with variables to structured logging > framework > - > > Key: SPARK-47591 > URL: https://issues.apache.org/jira/browse/SPARK-47591 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
Hyukjin Kwon created SPARK-47762: Summary: Add pyspark.sql.connect.protobuf into setup.py Key: SPARK-47762 URL: https://issues.apache.org/jira/browse/SPARK-47762 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 3.5.1, 4.0.0 Reporter: Hyukjin Kwon We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
Kent Yao created SPARK-47761: Summary: Oracle: Support reading AnsiIntervalTypes Key: SPARK-47761 URL: https://issues.apache.org/jira/browse/SPARK-47761 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834804#comment-17834804 ] Uroš Bojanić commented on SPARK-47413: -- [~gpgp] Thank you, of course! Take a look at [SPARK-47412|https://issues.apache.org/jira/browse/SPARK-47412] and let me know what you think > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834803#comment-17834803 ] Uroš Bojanić commented on SPARK-47412: -- [~gpgp] Thank you for your hard work on [SPARK-47413|https://issues.apache.org/jira/browse/SPARK-47413]! We'll put your [PR|https://github.com/apache/spark/pull/45738/] under final review, so feel free to move on to this ticket. This one should be relatively simple as well, and you've also got some experience under your belt already. Nevertheless, feel free to let me know if you have any questions! > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what is the expected behaviour for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. 
To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46143) pyspark.pandas read_excel implementation at version 3.4.1
[ https://issues.apache.org/jira/browse/SPARK-46143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834802#comment-17834802 ] comet commented on SPARK-46143: --- voted for this issue. > pyspark.pandas read_excel implementation at version 3.4.1 > - > > Key: SPARK-46143 > URL: https://issues.apache.org/jira/browse/SPARK-46143 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 > Environment: pyspark 3.4.1.5.3 build 20230713. > Running on Microsoft Fabric workspace at runtime 1.2. > Tested the same scenario on a spark 3.4.1 standalone deployment on docker > documented at https://github.com/mpavanetti/sparkenv > > >Reporter: Matheus Pavanetti >Priority: Major > Attachments: MicrosoftTeams-image.png, > image-2023-11-28-13-20-40-275.png, image-2023-11-28-13-20-51-291.png > > > Hello, > I would like to report an issue with the pyspark.pandas implementation of the > read_excel function. > The Microsoft Fabric spark environment 1.2 (runtime) uses pyspark 3.4.1, which > potentially uses an older version of pandas in its implementation of > pyspark.pandas. > The read_excel function from pandas does not expect a parameter called > "squeeze"; however, it is implemented as part of pyspark.pandas, and the > parameter "squeeze" is passed through to the pandas function. > > !image-2023-11-28-13-20-40-275.png! > > I've been digging into the pyspark 3.4.1 > documentation for further investigation: > [https://spark.apache.org/docs/3.4.1/api/python/_modules/pyspark/pandas/namespace.html#read_excel|https://mcas-proxyweb.mcas.ms/certificate-checker?login=false=https%3A%2F%2Fspark.apache.org.mcas.ms%2Fdocs%2F3.4.1%2Fapi%2Fpython%2F_modules%2Fpyspark%2Fpandas%2Fnamespace.html%3FMcasTsid%3D20893%23read_excel=92c0f0a0811f59386edd92fd5f3fcb0ac451ce363b3f2e01ed076f45e2b20500] > > This is where I found that the "squeeze" parameter is passed to the pandas > read_excel function, which does not expect it. 
> It seems like it was deprecated as part of pyspark 3.4.0 but is still used > in the implementation. > > !image-2023-11-28-13-20-51-291.png! > > I believe this is an issue with the pyspark 3.4.1 implementation, not necessarily > with Fabric; however, Fabric uses this version as its 1.2 build. > > I am able to work around it for now by downloading the Excel file from the OneLake > to the Spark driver, loading it into memory with pandas, and then > converting it to a Spark DataFrame. I also made it work by patching the build: > I downloaded the pyspark build 20230713 locally, made the changes, > re-compiled it, and it worked. So the issue is in the implementation and would have > to be fixed there, or I can downgrade to an older > version like 3.3.3 or try the latest 3.5.0, which is not an option on Fabric. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
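The bug report above boils down to a wrapper forwarding a keyword argument that the wrapped function no longer accepts. The pure-Python sketch below reproduces that failure shape and the obvious fix (accept the deprecated keyword for compatibility but do not forward it); the function names are illustrative stand-ins, not the real pandas or pyspark signatures.

```python
# Hedged sketch: newer "pandas" dropped the 'squeeze' parameter, so
# blindly forwarding it raises TypeError; the fixed wrapper swallows it.
def pandas_read_excel(path, header=0):
    # Stand-in for newer pandas.read_excel: no 'squeeze' parameter anymore.
    return {"path": path, "header": header}


def ps_read_excel(path, squeeze=False, **kwargs):
    # Buggy version: pandas_read_excel(path, squeeze=squeeze, **kwargs)
    # would raise TypeError for the unexpected keyword 'squeeze'.
    # Fixed version: keep the keyword in the signature for API
    # compatibility, but do not pass it down.
    return pandas_read_excel(path, **kwargs)


frame = ps_read_excel("book.xlsx", squeeze=True, header=0)
```

This matches the reporter's diagnosis: the parameter was deprecated in the pyspark.pandas API yet was still forwarded to pandas in the implementation.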
[jira] [Updated] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47412: - Description: Enable collation support for the *StringLPad* & *StringRPad* built-in string functions in Spark. First confirm what is the expected behaviour for these functions when given collated strings, then move on to the implementation that would enable handling strings of all collation types. Implement the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* functions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. was: Enable collation support for the *Substring* built-in string function in Spark (including *Right* and *Left* functions). First confirm what is the expected behaviour for these functions when given collated strings, then move on to the implementation that would enable handling strings of all collation types. 
Implement the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the {*}Substring{*}, {*}Right{*}, and *Left* functions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm what is the expected behaviour for these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. 
Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for >
[jira] [Updated] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47412: - Summary: StringLPad, StringRPad (all collations) (was: StringLPad, BinaryPad, StringRPad (all collations)) > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47412) StringLPad, BinaryPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47412: - Description: Enable collation support for the *Substring* built-in string function in Spark (including *Right* and *Left* functions). First confirm what is the expected behaviour for these functions when given collated strings, then move on to the implementation that would enable handling strings of all collation types. Implement the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the {*}Substring{*}, {*}Right{*}, and *Left* functions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
> StringLPad, BinaryPad, StringRPad (all collations) > -- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major
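The collation behaviour the ticket above asks contributors to confirm can be illustrated with a small sketch. This is not Spark code: the function name and the two collation identifiers (`UTF8_BINARY`, `UTF8_LCASE`) are borrowed for illustration only, and real Spark collation support goes through ICU collators rather than simple lowercasing.

```python
# Hypothetical sketch of how a string predicate such as Contains could
# change meaning under different collations. Binary collation compares
# codepoints exactly; a lowercase collation compares case-insensitively.

def contains_collated(s: str, sub: str, collation: str = "UTF8_BINARY") -> bool:
    """Return True if `s` contains `sub` under the given (simulated) collation."""
    if collation == "UTF8_BINARY":
        return sub in s                   # codepoint-exact match
    if collation == "UTF8_LCASE":
        return sub.lower() in s.lower()   # case-insensitive match
    raise ValueError(f"unknown collation: {collation}")

print(contains_collated("Spark SQL", "sql"))                # False under binary collation
print(contains_collated("Spark SQL", "sql", "UTF8_LCASE"))  # True under lowercase collation
```

Note that *Substring*, *Right*, and *Left* slice by position rather than by comparison, which is exactly why the ticket asks to first confirm the expected behaviour: the interesting question is how the collation of the input propagates to the result, not how characters compare.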
[jira] [Commented] (SPARK-40782) Upgrade Jackson-databind to 2.13.4.1
[ https://issues.apache.org/jira/browse/SPARK-40782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834801#comment-17834801 ] Ramakrishna commented on SPARK-40782: - Hi, this still seems to be an issue as a transitive dependency in Hadoop. The scanner reports: com.fasterxml.jackson.core:jackson-databind, CVE-2022-42003, HIGH, installed 2.12.7, fixed in 2.12.7.1 / 2.13.4.2 (jackson-databind: deep wrapper array nesting wrt UNWRAP_SINGLE_VALUE_ARRAYS), pulled in via org.apache.hadoop_hadoop-client-runtime-3.3.4.jar. Is there a fix for this? > Upgrade Jackson-databind to 2.13.4.1 > > > Key: SPARK-40782 > URL: https://issues.apache.org/jira/browse/SPARK-40782 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > > #3590: Add check in primitive value deserializers to avoid deep wrapper array > nesting wrt `UNWRAP_SINGLE_VALUE_ARRAYS` [CVE-2022-42003]
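For a vulnerable *direct* transitive dependency, the usual workaround is to pin the patched version in the consuming build. A minimal Maven sketch, using the coordinates and fixed version from the scanner output above (the override mechanism is standard Maven, not something this ticket prescribes):

```xml
<!-- Force the patched jackson-databind even when another dependency
     pulls in 2.12.7 transitively; 2.13.4.2 is the fixed version the
     scanner suggests for CVE-2022-42003. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.13.4.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

One caveat for the specific report above: `hadoop-client-runtime` is a shaded uber-jar, so the jackson classes it contains are bundled inside the jar itself and a `dependencyManagement` pin will not replace them. In that case the fix is an upgraded Hadoop artifact, which is likely why the scanner still flags it.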
[jira] [Commented] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834800#comment-17834800 ] Gideon P commented on SPARK-47413: -- [~uros-db] can you find me an additional ticket to work on once I finish this one? > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what the > expected behaviour for these functions is when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a time string. Note that we manually killed the stuck app instances and the retry went through on the same cluster (without any code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
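The parsing that fails above (`JavaUtils.timeStringAs` rejecting the legitimate value `120s`) follows a simple shape: a number plus an optional unit suffix, normalized to seconds. A minimal sketch of that shape, in Python rather than the Java original; the function name and unit table are illustrative, not Spark's actual implementation, and this sketch deliberately omits the shared mutable state suspected in the report (the ticket is labeled `threadsafe`, suggesting the failure is a race rather than a parsing bug):

```python
import re

# Unit suffixes mirror the error message above: s, ms, us, m/min, h, d.
_UNITS_IN_SECONDS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "m": 60.0, "min": 60.0, "h": 3600.0, "d": 86400.0,
}

def time_string_as_seconds(value: str, default_unit: str = "s") -> float:
    """Parse a time string like '120s' or '100ms' into seconds."""
    match = re.fullmatch(r"(-?\d+)\s*([a-z]+)?", value.strip().lower())
    if not match:
        raise ValueError(f"Failed to parse time string: {value}")
    number, unit = match.group(1), match.group(2) or default_unit
    if unit not in _UNITS_IN_SECONDS:
        raise ValueError(f"Failed to parse time string: {value}")
    return int(number) * _UNITS_IN_SECONDS[unit]

print(time_string_as_seconds("120s"))  # 120.0 -- a legitimate value, as the report notes
```

Since `120s` parses cleanly under this logic, the exception in the stack trace points away from the input itself and toward how the parser's state is shared across the RPC threads that call it.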
[jira] [Updated] (SPARK-47759) App being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Summary: App being stuck with an unexpected stack trace when reading/parsing a time string (was: A Spark app being stuck with an unexpected stack trace when reading/parsing a time string) > App being stuck with an unexpected stack trace when reading/parsing a time > string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a time string. > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at >
[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Summary: Apps being stuck with an unexpected stack trace when reading/parsing a time string (was: App being stuck with an unexpected stack trace when reading/parsing a time string) > Apps being stuck with an unexpected stack trace when reading/parsing a time > string > -- > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a time string. > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at >
[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a time string. *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) at
[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string
[jira] [Updated] (SPARK-47759) A Spark app being stuck with an unexpected stack trace when reading/parsing a time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom We have observed that our Spark apps occasionally get stuck with an unexpected stack trace when reading/parsing a time string. *[Stack Trace 1]* The stack trace doesn't make sense, since *120s* is a legitimate time string. The app runs on emr-7.0.0 with the Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) at
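The ticket's point is that *120s* matches the documented format (a number followed by a unit suffix), so the NumberFormatException above is unexpected. For context, here is a minimal, self-contained sketch of suffix-based time-string parsing in the spirit of JavaUtils.timeStringAsSec — the class name, regex, and suffix table are illustrative assumptions, not Spark's actual implementation (which also handles ms/us suffixes and negative values):

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of suffix-based time-string parsing; not Spark's code.
public class TimeStringSketch {
    // A run of digits followed by an optional lowercase unit suffix,
    // e.g. "120s", "5min", "2h", or a bare "90" (defaults to seconds).
    private static final Pattern TIME = Pattern.compile("([0-9]+)([a-z]+)?");

    // Multiplier from unit suffix to seconds.
    private static final Map<String, Long> SUFFIX_TO_SECONDS = Map.of(
        "s", 1L, "m", 60L, "min", 60L, "h", 3600L, "d", 86400L);

    /** Parses a time string into seconds; an unsuffixed value is seconds. */
    public static long timeStringAsSec(String str) {
        Matcher m = TIME.matcher(str.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse time string: " + str);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        if (suffix == null) {
            return value;  // no suffix: interpret as seconds
        }
        Long multiplier = SUFFIX_TO_SECONDS.get(suffix);
        if (multiplier == null) {
            throw new NumberFormatException("Invalid suffix in time string: " + str);
        }
        return value * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(timeStringAsSec("120s"));  // 120
        System.out.println(timeStringAsSec("2min"));  // 120
    }
}
```

Under this model, "120s" parses cleanly to 120 seconds every time, which is why a NumberFormatException on that exact input suggests a problem other than the input itself (e.g. a concurrency issue around the parser), rather than a genuinely malformed string.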
[jira] [Created] (SPARK-47760) Re-enable Avro function doctests
Hyukjin Kwon created SPARK-47760: Summary: Re-enable Avro function doctests Key: SPARK-47760 URL: https://issues.apache.org/jira/browse/SPARK-47760 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org