[jira] [Assigned] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
[ https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-43914:

    Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
[ https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-43914.

    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41476
[https://github.com/apache/spark/pull/41476]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major
> Fix For: 3.5.0
[jira] [Commented] (SPARK-43879) Decouple handle command and send response on server side
[ https://issues.apache.org/jira/browse/SPARK-43879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737962#comment-17737962 ]

Snoot.io commented on SPARK-43879:

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41527

> Decouple handle command and send response on server side
> Key: SPARK-43879
> URL: https://issues.apache.org/jira/browse/SPARK-43879
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
>
> SparkConnectStreamHandler handles the requests from the Connect client and
> sends the responses back to it. SparkConnectStreamHandler holds a
> StreamObserver component which is used to send the responses.
> So I think the StreamObserver should be accessible only from within
> SparkConnectStreamHandler.
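The encapsulation the issue proposes can be sketched with toy types (these are not the real Spark Connect classes; `Handler` and its members are illustrative names): the response observer is a private field of the handler, so only the handler can both process a command and send the response.

```scala
// Minimal sketch, assuming hypothetical names: the observer is private to the
// handler, so no other component can send responses through it.
trait StreamObserver[T] {
  def onNext(value: T): Unit
  def onCompleted(): Unit
}

class Handler(private val responseObserver: StreamObserver[String]) {
  // Handling the command and sending the response stay coupled to the handler.
  def handle(command: String): Unit = {
    val response = s"handled: $command" // stand-in for real command execution
    responseObserver.onNext(response)
    responseObserver.onCompleted()
  }
}

val sent = scala.collection.mutable.Buffer.empty[String]
val handler = new Handler(new StreamObserver[String] {
  def onNext(value: String): Unit = sent += value
  def onCompleted(): Unit = sent += "<completed>"
})
handler.handle("select 1")
assert(sent == Seq("handled: select 1", "<completed>"))
```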
[jira] [Created] (SPARK-44222) Upgrade `grpc` to 1.56.0
Dongjoon Hyun created SPARK-44222:

             Summary: Upgrade `grpc` to 1.56.0
                 Key: SPARK-44222
                 URL: https://issues.apache.org/jira/browse/SPARK-44222
             Project: Spark
          Issue Type: Improvement
          Components: Build, python
    Affects Versions: 3.5.0
            Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45
[ https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737958#comment-17737958 ]

Snoot.io commented on SPARK-44221:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41766

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45
[ https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-44221:

    Assignee: BingKun Pan

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
[jira] [Resolved] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45
[ https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-44221.

    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41766
[https://github.com/apache/spark/pull/41766]

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Resolved] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs
[ https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-44182.

    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41728
[https://github.com/apache/spark/pull/41728]

> Use Spark version variables in Python and Spark Connect installation docs
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Trivial
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs
[ https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-44182:

    Assignee: Dongjoon Hyun

> Use Spark version variables in Python and Spark Connect installation docs
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Trivial
[jira] [Resolved] (SPARK-44206) Dataset.selectExpr scope Session.active
[ https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-44206.

    Fix Version/s: 3.5.0, 3.4.2
         Assignee: zhuml
       Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/41759

> Dataset.selectExpr scope Session.active
> Key: SPARK-44206
> URL: https://issues.apache.org/jira/browse/SPARK-44206
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: zhuml
> Assignee: zhuml
> Priority: Major
> Fix For: 3.5.0, 3.4.2
>
> {code:java}
> // code placeholder
> val clone = spark.cloneSession()
> clone.conf.set("spark.sql.legacy.interval.enabled", "true")
> clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show()
> clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as b").show()
> {code}
> The first query executes successfully, but the second one fails,
> because selectExpr and sql use different SparkSession confs.
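The scoping pattern the fix relies on can be sketched with a toy session class (this is not the real SparkSession; the names are illustrative): parsing for a Dataset runs with that Dataset's own session installed as the active one, and the previously active session is restored afterwards.

```scala
// Minimal sketch of an active-session scope, assuming toy types.
class Session(val name: String)

object Session {
  private val active = new ThreadLocal[Session]
  def getActive: Option[Session] = Option(active.get)
  // Like a withActive helper: temporarily make `s` the active session.
  def withActive[T](s: Session)(body: => T): T = {
    val previous = active.get
    active.set(s)
    try body finally active.set(previous)
  }
}

val default = new Session("default")
val clone   = new Session("clone")

Session.withActive(default) {
  // selectExpr on a Dataset owned by `clone` should parse under `clone`'s conf:
  val seenByParser = Session.withActive(clone)(Session.getActive.map(_.name))
  assert(seenByParser == Some("clone"))
  // ...and the previously active session is restored afterwards.
  assert(Session.getActive.map(_.name) == Some("default"))
}
```

With such scoping, both queries in the reproduction above would see the cloned session's `spark.sql.legacy.interval.enabled` setting.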
[jira] [Assigned] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
[ https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell reassigned SPARK-44039:

    Assignee: BingKun Pan

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
>
> Improvements for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including:
> - When generating `GOLDEN` files, first delete the corresponding directories
> and regenerate them, to avoid committing redundant files during review. E.g.:
> suppose we write a test named `make_timestamp_ltz` for an overloaded method,
> and during review the reviewer asks for more tests of that method, so the test
> name changes in the next submission, e.g. to `make_timestamp_ltz without
> timezone`. If the old `queries/function_make_timestamp_ltz.json`,
> `queries/function_make_timestamp_ltz.proto.bin` and
> `explain-results/function_make_timestamp_ltz.explain` files are already in the
> commit, and there are many such files, we generally do not notice, so the
> stale files get committed without any impact on UT. These files are redundant.
> - Clean up and update some redundant files that were committed by mistake.
[jira] [Resolved] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
[ https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44039.

    Fix Version/s: 3.5.0
       Resolution: Fixed

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
>
> Improvements for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite, including:
> - When generating `GOLDEN` files, first delete the corresponding directories
> and regenerate them, to avoid committing redundant files during review. E.g.:
> suppose we write a test named `make_timestamp_ltz` for an overloaded method,
> and during review the reviewer asks for more tests of that method, so the test
> name changes in the next submission, e.g. to `make_timestamp_ltz without
> timezone`. If the old `queries/function_make_timestamp_ltz.json`,
> `queries/function_make_timestamp_ltz.proto.bin` and
> `explain-results/function_make_timestamp_ltz.explain` files are already in the
> commit, and there are many such files, we generally do not notice, so the
> stale files get committed without any impact on UT. These files are redundant.
> - Clean up and update some redundant files that were committed by mistake.
[jira] [Created] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45
BingKun Pan created SPARK-44221:

             Summary: Upgrade RoaringBitmap from 0.9.44 to 0.9.45
                 Key: SPARK-44221
                 URL: https://issues.apache.org/jira/browse/SPARK-44221
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.5.0
            Reporter: BingKun Pan
[jira] [Assigned] (SPARK-44161) Row as UDF inputs causes encoder errors
[ https://issues.apache.org/jira/browse/SPARK-44161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell reassigned SPARK-44161:

    Assignee: Zhen Li

> Row as UDF inputs causes encoder errors
> Key: SPARK-44161
> URL: https://issues.apache.org/jira/browse/SPARK-44161
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Assignee: Zhen Li
> Priority: Major
> Fix For: 3.5.0
>
> Ensure row inputs to udfs can be handled correctly.
[jira] [Resolved] (SPARK-44161) Row as UDF inputs causes encoder errors
[ https://issues.apache.org/jira/browse/SPARK-44161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44161.

    Fix Version/s: 3.5.0
       Resolution: Fixed

> Row as UDF inputs causes encoder errors
> Key: SPARK-44161
> URL: https://issues.apache.org/jira/browse/SPARK-44161
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Priority: Major
> Fix For: 3.5.0
>
> Ensure row inputs to udfs can be handled correctly.
[jira] [Commented] (SPARK-43203) Fix DROP table behavior in session catalog
[ https://issues.apache.org/jira/browse/SPARK-43203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737894#comment-17737894 ]

Anton Okolnychyi commented on SPARK-43203:

I unfortunately created this initially as an improvement. It is actually a bug
and a regression, which breaks DROP in custom session catalogs. Can we include
it in 3.4.2?

> Fix DROP table behavior in session catalog
> Key: SPARK-43203
> URL: https://issues.apache.org/jira/browse/SPARK-43203
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Anton Okolnychyi
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> DROP table behavior is not working correctly in 3.4.0 because we always
> invoke the V1 drop logic if the identifier looks like a V1 identifier. This
> is a big blocker for external data sources that provide custom session
> catalogs. See [here|https://github.com/apache/spark/pull/37879/files#r1170501180]
> for details.
[jira] [Updated] (SPARK-43203) Fix DROP table behavior in session catalog
[ https://issues.apache.org/jira/browse/SPARK-43203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Okolnychyi updated SPARK-43203:

    Issue Type: Bug  (was: Improvement)

> Fix DROP table behavior in session catalog
> Key: SPARK-43203
> URL: https://issues.apache.org/jira/browse/SPARK-43203
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Anton Okolnychyi
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> DROP table behavior is not working correctly in 3.4.0 because we always
> invoke the V1 drop logic if the identifier looks like a V1 identifier. This
> is a big blocker for external data sources that provide custom session
> catalogs. See [here|https://github.com/apache/spark/pull/37879/files#r1170501180]
> for details.
[jira] [Created] (SPARK-44220) Move StringConcat to sql/api
Rui Wang created SPARK-44220:

             Summary: Move StringConcat to sql/api
                 Key: SPARK-44220
                 URL: https://issues.apache.org/jira/browse/SPARK-44220
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Rui Wang
            Assignee: Rui Wang
[jira] [Updated] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs
[ https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-44182:

    Summary: Use Spark version variables in Python and Spark Connect installation docs  (was: Use Spark version placeholders in Python and Spark Connect installation docs)

> Use Spark version variables in Python and Spark Connect installation docs
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Priority: Trivial
[jira] [Created] (SPARK-44219) Add extra per-rule validation for optimization rewrites.
Yannis Sismanis created SPARK-44219:

             Summary: Add extra per-rule validation for optimization rewrites.
                 Key: SPARK-44219
                 URL: https://issues.apache.org/jira/browse/SPARK-44219
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 3.4.1, 3.4.0
            Reporter: Yannis Sismanis

Adds per-rule validation checks for the following:
1. Aggregate expressions in Aggregate plans are valid.
2. Grouping key types in Aggregate plans cannot be of type Map.
3. No dangling references have been generated.

This validation is enabled by default for all tests, or selectively using the
spark.sql.planChangeValidation=true flag.
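The per-rule validation idea can be sketched with toy types (these are not Catalyst classes; the names and the single invariant below are illustrative): after each rewrite rule runs, an invariant is checked and a failure reports which rule broke it, guarded by a flag like the one mentioned above.

```scala
// Toy sketch of per-rule validation. The check corresponds to item 3 above:
// every attribute the plan references must be produced by the plan.
case class Plan(outputs: Set[String], references: Set[String])

def noDanglingRefs(p: Plan): Boolean = p.references.subsetOf(p.outputs)

def applyWithValidation(rules: Seq[(String, Plan => Plan)], plan: Plan,
                        validate: Boolean): Plan =
  rules.foldLeft(plan) { case (current, (name, rule)) =>
    val next = rule(current)
    // Guarded by a flag, mirroring spark.sql.planChangeValidation.
    if (validate && !noDanglingRefs(next))
      throw new IllegalStateException(s"Rule $name produced dangling references")
    next
  }

// A well-behaved rule keeps references within the produced outputs:
val pruned = applyWithValidation(
  Seq("KeepReferencedColumns" -> ((p: Plan) => p.copy(outputs = p.references))),
  Plan(Set("a", "b", "c"), Set("a", "b")), validate = true)
assert(pruned == Plan(Set("a", "b"), Set("a", "b")))
```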
[jira] [Resolved] (SPARK-43631) Enable Series.interpolate with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43631.

    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41670
[https://github.com/apache/spark/pull/41670]

> Enable Series.interpolate with Spark Connect
> Key: SPARK-43631
> URL: https://issues.apache.org/jira/browse/SPARK-43631
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable Series.interpolate with Spark Connect
[jira] [Assigned] (SPARK-43631) Enable Series.interpolate with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43631:

    Assignee: Haejoon Lee

> Enable Series.interpolate with Spark Connect
> Key: SPARK-43631
> URL: https://issues.apache.org/jira/browse/SPARK-43631
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> Enable Series.interpolate with Spark Connect
[jira] [Commented] (SPARK-44193) Implement GRPC exceptions interception for conversion
[ https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737874#comment-17737874 ]

Harish Gontu commented on SPARK-44193:

Can I take up this task?

> Implement GRPC exceptions interception for conversion
> Key: SPARK-44193
> URL: https://issues.apache.org/jira/browse/SPARK-44193
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Yihong He
> Priority: Major
[jira] [Commented] (SPARK-43092) Cleanup unsupported function `dropDuplicatesWithinWatermark` from `Dataset`
[ https://issues.apache.org/jira/browse/SPARK-43092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737869#comment-17737869 ]

Harish Gontu commented on SPARK-43092:

Can I pick up this task?

> Cleanup unsupported function `dropDuplicatesWithinWatermark` from `Dataset`
> Key: SPARK-43092
> URL: https://issues.apache.org/jira/browse/SPARK-43092
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Minor
[jira] [Created] (SPARK-44218) Add improved error message formatting for assert_approx_df_equality
Amanda Liu created SPARK-44218:

             Summary: Add improved error message formatting for assert_approx_df_equality
                 Key: SPARK-44218
                 URL: https://issues.apache.org/jira/browse/SPARK-44218
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Amanda Liu

SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Created] (SPARK-44217) Add assert_approx_df_equality util function
Amanda Liu created SPARK-44217:

             Summary: Add assert_approx_df_equality util function
                 Key: SPARK-44217
                 URL: https://issues.apache.org/jira/browse/SPARK-44217
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Amanda Liu

SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Created] (SPARK-44216) Add improved error message formatting for assert_df_equality
Amanda Liu created SPARK-44216:

             Summary: Add improved error message formatting for assert_df_equality
                 Key: SPARK-44216
                 URL: https://issues.apache.org/jira/browse/SPARK-44216
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Amanda Liu

SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Created] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
Chandni Singh created SPARK-44215:

             Summary: Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
                 Key: SPARK-44215
                 URL: https://issues.apache.org/jira/browse/SPARK-44215
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.2.0
            Reporter: Chandni Singh

We still see instances of the server returning 0 {{numChunks}} in
{{mergedMetaResponse}}, which causes the executor to fail with an
{{ArithmeticException}}:

{code}
java.lang.ArithmeticException: / by zero
    at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
{code}

Here the executor doesn't fall back to fetching un-merged blocks, and this also
doesn't result in a {{FetchFailure}}, so the application fails.
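The defensive fallback the issue asks for can be sketched with toy types (these are not the real PushBasedFetchHelper API; `planFetch` and the case classes are illustrative): when the server reports zero chunks for a merged block, fall back to the original un-merged blocks instead of dividing by `numChunks`.

```scala
// Sketch, assuming hypothetical names: guard the division that currently
// throws ArithmeticException: / by zero when numChunks is 0.
sealed trait FetchPlan
case class FetchMergedChunks(avgChunkSize: Long) extends FetchPlan
case object FallbackToUnmergedBlocks extends FetchPlan

def planFetch(numChunks: Int, totalSize: Long): FetchPlan =
  if (numChunks <= 0) FallbackToUnmergedBlocks // no division, no crash
  else FetchMergedChunks(totalSize / numChunks)

assert(planFetch(0, 1024L) == FallbackToUnmergedBlocks)
assert(planFetch(4, 1024L) == FetchMergedChunks(256L))
```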
[jira] [Commented] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
[ https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737770#comment-17737770 ]

Yuming Wang commented on SPARK-44213:

Related issue ticket: SPARK-41752.

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Yuming Wang
> Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> !enabled.png!
> Disabled:
> !screenshot-1.png!
[jira] [Updated] (SPARK-44214) Add driver log live UI for K8s environment
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-44214:

    Summary: Add driver log live UI for K8s environment  (was: Add driver log UI for K8s environment)

> Add driver log live UI for K8s environment
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes, Spark Core, Web UI
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Created] (SPARK-44214) Add driver log UI for K8s environment
Dongjoon Hyun created SPARK-44214:

             Summary: Add driver log UI for K8s environment
                 Key: SPARK-44214
                 URL: https://issues.apache.org/jira/browse/SPARK-44214
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes, Spark Core, Web UI
    Affects Versions: 3.5.0
            Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
[ https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737768#comment-17737768 ]

Yuming Wang commented on SPARK-44213:

cc [~linhongliu-db]

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Yuming Wang
> Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> !enabled.png!
> Disabled:
> !screenshot-1.png!
[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
[ https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-44213:

    Description:
{code:sql}
create table tbl using parquet as select t1.id from range(10) as t1 join range(100) as t2 on t1.id = t2.id;
{code}
Enabled:
!enabled.png!
Disabled:
!screenshot-1.png!

  was:
{code:sql}
create table tbl using parquet as select t1.id from range(10) as t1 join range(100) as t2 on t1.id = t2.id;
{code}
Enabled:
Disabled:

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Yuming Wang
> Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> !enabled.png!
> Disabled:
> !screenshot-1.png!
[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
[ https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-44213:

    Attachment: screenshot-1.png

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Yuming Wang
> Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> Disabled:
[jira] [Created] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
Yuming Wang created SPARK-44213: --- Summary: CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled Key: SPARK-44213 URL: https://issues.apache.org/jira/browse/SPARK-44213 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1, 3.4.0 Reporter: Yuming Wang Attachments: enabled.png, screenshot-1.png {code:sql} create table tbl using parquet as select t1.id from range(10) as t1 join range(100) as t2 on t1.id = t2.id; {code} Enabled: Disabled: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
[ https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44213: Attachment: enabled.png > CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled > - > > Key: SPARK-44213 > URL: https://issues.apache.org/jira/browse/SPARK-44213 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.4.1 >Reporter: Yuming Wang >Priority: Major > Attachments: enabled.png, screenshot-1.png > > > {code:sql} > create table tbl using parquet as select t1.id from range(10) as t1 join > range(100) as t2 on t1.id = t2.id; > {code} > Enabled: > Disabled: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44171) Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
[ https://issues.apache.org/jira/browse/SPARK-44171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-44171: Assignee: BingKun Pan > Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some > unused error classes > - > > Key: SPARK-44171 > URL: https://issues.apache.org/jira/browse/SPARK-44171 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44171) Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
[ https://issues.apache.org/jira/browse/SPARK-44171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44171. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41721 [https://github.com/apache/spark/pull/41721] > Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some > unused error classes > - > > Key: SPARK-44171 > URL: https://issues.apache.org/jira/browse/SPARK-44171 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44182) Use Spark version placeholders in Python and Spark Connect installation docs
[ https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44182: -- Summary: Use Spark version placeholders in Python and Spark Connect installation docs (was: Use Spark 3.5.0 in Python and Spark Connect docs) > Use Spark version placeholders in Python and Spark Connect installation docs > > > Key: SPARK-44182 > URL: https://issues.apache.org/jira/browse/SPARK-44182 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44212) Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462
[ https://issues.apache.org/jira/browse/SPARK-44212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737755#comment-17737755 ] Kazuaki Ishizaki commented on SPARK-44212: -- [https://github.com/apache/spark/pull/41681#pullrequestreview-1496876723] is discussing the upgrade of netty. > Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462 > > > Key: SPARK-44212 > URL: https://issues.apache.org/jira/browse/SPARK-44212 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.1 >Reporter: Raúl Cumplido >Priority: Major > > Hi, > On the Apache Arrow project we have noticed that our nightly integration > tests with Spark started failing lately. With some investigation I've noticed > that we are defining a different version of the Java netty dependencies. We > upgraded to 4.1.94.Final due to the CVE in the title: > [https://github.com/advisories/GHSA-6mjq-h674-j845] > Our PR upgrading the version: [https://github.com/apache/arrow/issues/36209] > I have opened an issue on the Apache Arrow repository to try and fix > something else on our side but I was wondering if you would want to update > the version to solve the CVE. > > Thanks > Raúl -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44212) Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462
Raúl Cumplido created SPARK-44212: - Summary: Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462 Key: SPARK-44212 URL: https://issues.apache.org/jira/browse/SPARK-44212 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.1 Reporter: Raúl Cumplido Hi, On the Apache Arrow project we have noticed that our nightly integration tests with Spark started failing lately. With some investigation I've noticed that we are defining a different version of the Java netty dependencies. We upgraded to 4.1.94.Final due to the CVE in the title: [https://github.com/advisories/GHSA-6mjq-h674-j845] Our PR upgrading the version: [https://github.com/apache/arrow/issues/36209] I have opened an issue on the Apache Arrow repository to try and fix something else on our side but I was wondering if you would want to update the version to solve the CVE. Thanks Raúl -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44211) PySpark: SparkSession.is_stopped
[ https://issues.apache.org/jira/browse/SPARK-44211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alice Sayutina updated SPARK-44211: --- Description: Implement SparkConnectClient.is_stopped property to check if this session has been closed previously (was: Implement SparkConnectClient.is_closed() method to check if this session has been closed previously) > PySpark: SparkSession.is_stopped > > > Key: SPARK-44211 > URL: https://issues.apache.org/jira/browse/SPARK-44211 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Priority: Major > > Implement SparkConnectClient.is_stopped property to check if this session > has been closed previously -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44211) PySpark: SparkSession.is_stopped
[ https://issues.apache.org/jira/browse/SPARK-44211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alice Sayutina updated SPARK-44211: --- Summary: PySpark: SparkSession.is_stopped (was: PySpark: SparkConnectClient.is_closed() method) > PySpark: SparkSession.is_stopped > > > Key: SPARK-44211 > URL: https://issues.apache.org/jira/browse/SPARK-44211 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Priority: Major > > Implement SparkConnectClient.is_closed() method to check if this session has > been closed previously -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44211) PySpark: SparkConnectClient.is_closed() method
Alice Sayutina created SPARK-44211: -- Summary: PySpark: SparkConnectClient.is_closed() method Key: SPARK-44211 URL: https://issues.apache.org/jira/browse/SPARK-44211 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.5.0 Reporter: Alice Sayutina Implement SparkConnectClient.is_closed() method to check if this session has been closed previously -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
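The lifecycle check proposed in SPARK-44211 can be sketched in plain Python. This is a toy stand-in, not the actual PySpark implementation; the class name and internal flag are illustrative only:

```python
class MockSparkSession:
    """Toy stand-in for a session that tracks whether it was stopped."""

    def __init__(self):
        self._stopped = False

    def stop(self):
        # Idempotent: calling stop() more than once leaves the flag set.
        self._stopped = True

    @property
    def is_stopped(self) -> bool:
        """True once stop() has been called on this session."""
        return self._stopped
```

Exposing the flag as a read-only property rather than an `is_closed()` method matches the renaming visible in this ticket's summary history.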
[jira] [Created] (SPARK-44210) Strengthen type checking and better comply with Connect specifications for `levenshtein` function
BingKun Pan created SPARK-44210: --- Summary: Strengthen type checking and better comply with Connect specifications for `levenshtein` function Key: SPARK-44210 URL: https://issues.apache.org/jira/browse/SPARK-44210 Project: Spark Issue Type: Improvement Components: Connect, SQL Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44209) Expose amount of shuffle data available on the node
[ https://issues.apache.org/jira/browse/SPARK-44209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737611#comment-17737611 ] Deependra Patel commented on SPARK-44209: - I will create a pull request for this soon > Expose amount of shuffle data available on the node > --- > > Key: SPARK-44209 > URL: https://issues.apache.org/jira/browse/SPARK-44209 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Affects Versions: 3.4.1 >Reporter: Deependra Patel >Priority: Trivial > > [ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318] > doesn't have metrics like > "totalShuffleDataBytes" and "numAppsWithShuffleData"; these would be per-node > metrics published by the External Shuffle Service. > > Adding these metrics would help in - > 1. Deciding if we can decommission the node if no shuffle data is present > 2. Better live monitoring of customers' workloads to see if there is skewed > shuffle data present on the node -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44209) Expose amount of shuffle data available on the node
Deependra Patel created SPARK-44209: --- Summary: Expose amount of shuffle data available on the node Key: SPARK-44209 URL: https://issues.apache.org/jira/browse/SPARK-44209 Project: Spark Issue Type: New Feature Components: Shuffle Affects Versions: 3.4.1 Reporter: Deependra Patel [ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318] doesn't have metrics like "totalShuffleDataBytes" and "numAppsWithShuffleData"; these would be per-node metrics published by the External Shuffle Service. Adding these metrics would help in - 1. Deciding if we can decommission the node if no shuffle data is present 2. Better live monitoring of customers' workloads to see if there is skewed shuffle data present on the node -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
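The two proposed metrics can be approximated with a directory walk. A minimal sketch, assuming a hypothetical layout of one subdirectory per application under a shuffle root (the real External Shuffle Service layout and metric names may differ):

```python
import os

def shuffle_metrics(shuffle_root):
    """Compute total shuffle bytes and the number of apps holding data.

    shuffle_root is assumed to contain one subdirectory per application.
    This mirrors the proposed "totalShuffleDataBytes" and
    "numAppsWithShuffleData" metrics but is not Spark code.
    """
    total_bytes = 0
    apps_with_data = 0
    for app_id in os.listdir(shuffle_root):
        app_dir = os.path.join(shuffle_root, app_id)
        if not os.path.isdir(app_dir):
            continue
        # Sum the sizes of every file below this application's directory.
        app_bytes = sum(
            os.path.getsize(os.path.join(dirpath, name))
            for dirpath, _, names in os.walk(app_dir)
            for name in names
        )
        if app_bytes > 0:
            apps_with_data += 1
        total_bytes += app_bytes
    return {"totalShuffleDataBytes": total_bytes,
            "numAppsWithShuffleData": apps_with_data}
```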
[jira] [Updated] (SPARK-44206) Dataset.selectExpr scope Session.active
[ https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44206: -- Summary: Dataset.selectExpr scope Session.active (was: sparkSession.selectExpr scope Session.active) > Dataset.selectExpr scope Session.active > --- > > Key: SPARK-44206 > URL: https://issues.apache.org/jira/browse/SPARK-44206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhuml >Priority: Major > > {code:java} > // code placeholder > val clone = spark.cloneSession() > clone.conf.set("spark.sql.legacy.interval.enabled", "true") > clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show() > clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as > b").show() {code} > The first query executes successfully, but the second one does not, > because selectExpr and sql use different SparkSession confs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44208) assign clear error class names for some logic that directly uses exceptions
BingKun Pan created SPARK-44208: --- Summary: assign clear error class names for some logic that directly uses exceptions Key: SPARK-44208 URL: https://issues.apache.org/jira/browse/SPARK-44208 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 3.5.0 Reporter: BingKun Pan include: * ALL_FOR_PARTITION_COLUMNS_IS_NOT_ALLOWED * INVALID_COLUMN_NAME * SPECIFY_BUCKETING_IS_NOT_ALLOWED * SPECIFY_PARTITION_IS_NOT_ALLOWED * UNSUPPORTED_ADD_FILE.DIRECTORY * UNSUPPORTED_ADD_FILE.LOCAL_DIRECTORY -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44207) Where Clause throwing Resolved attribute(s) _metadata#398 missing from ... error
huizhong xu created SPARK-44207: --- Summary: Where Clause throwing Resolved attribute(s) _metadata#398 missing from ... error Key: SPARK-44207 URL: https://issues.apache.org/jira/browse/SPARK-44207 Project: Spark Issue Type: Question Components: SQL Affects Versions: 3.3.1 Reporter: huizhong xu I have two data frames called lt and rt, both with the same schema and only one row, generated separately by our own curation logic. All the columns are String, Boolean, or Timestamp. I am trying to compare them, so I join the two like this: var joinedDF = lt.join(rt, "Id") After that, I compare them first by schema and then by each column, to see what percentage of rows match. The code looks roughly like this: for (column <- lt.schema) { if (rt.columns.contains(column.name) && column.dataType == rt.schema(column.name).dataType) { var matchCount = joinedCount; if (column.dataType.typeName == "string") { matchCount = joinedDF.where(lt(column.name) <=> rt(column.name)).count } else ... On the last line, where I run the where clause, it throws an AnalysisException: Resolved attribute(s) _metadata#398 missing from ... I don't have this _metadata column anywhere in my dataframe. Searching online, people say it is a problem with the join; I tried changing the column names in rt and joinedDF, but neither works and the same error is still thrown. Can anybody help here? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
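The comparison the reporter describes can be prototyped without Spark. A plain-Python sketch that joins two row sets on a key and computes the per-column match rate (function and parameter names here are hypothetical; working on dicts sidesteps the ambiguous-attribute problem entirely):

```python
def column_match_rates(left_rows, right_rows, key="Id"):
    """Join two lists of row dicts on `key` and, for each shared column,
    return the fraction of joined rows whose values match exactly."""
    right_by_key = {row[key]: row for row in right_rows}
    shared_cols = (set(left_rows[0]) & set(right_rows[0])) - {key}
    matches = {col: 0 for col in shared_cols}
    joined = 0
    for lrow in left_rows:
        rrow = right_by_key.get(lrow[key])
        if rrow is None:
            continue  # no matching key on the right side
        joined += 1
        for col in shared_cols:
            if lrow[col] == rrow[col]:
                matches[col] += 1
    if joined == 0:
        return {}
    return {col: matches[col] / joined for col in shared_cols}
```

In Spark itself, a common workaround for "Resolved attribute(s) ... missing" errors is to alias both DataFrames before the join and refer to each column through its alias, so the analyzer can tell the two sides apart.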
[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT
[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737570#comment-17737570 ] Max Gekk commented on SPARK-43438: -- > 2. when execute sql "INSERT INTO tabtest(c1, c2) SELECT 1", the error is as > follows: > 3. when execute sql "INSERT INTO tabtest(c1, c2) SELECT 1", the error is as > follows: [~panbingkun] Where is the difference? > Should we align the logic of 1 and 2? Yep, let's try to make it consistent in any case. > Fix mismatched column list error on INSERT > -- > > Key: SPARK-43438 > URL: https://issues.apache.org/jira/browse/SPARK-43438 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > This error message is pretty bad, and common > "_LEGACY_ERROR_TEMP_1038" : { > "message" : [ > "Cannot write to table due to mismatched user specified column > size() and data column size()." > ] > }, > It can perhaps be merged with this one - after giving it an ERROR_CLASS > "_LEGACY_ERROR_TEMP_1168" : { > "message" : [ > " requires that the data to be inserted have the same number of > columns as the target table: target table has column(s) but > the inserted data has column(s), including > partition column(s) having constant value(s)." > ] > }, > Repro: > CREATE TABLE tabtest(c1 INT, c2 INT); > INSERT INTO tabtest SELECT 1; > `spark_catalog`.`default`.`tabtest` requires that the data to be inserted > have the same number of columns as the target table: target table has 2 > column(s) but the inserted data has 1 column(s), including 0 partition > column(s) having constant value(s). 
> INSERT INTO tabtest(c1) SELECT 1, 2, 3; > Cannot write to table due to mismatched user specified column size(1) and > data column size(3).; line 1 pos 24 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
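The two legacy messages could converge on a single check of the expected column count. A minimal sketch in Python; the error wording and function name are illustrative, not Spark's actual error classes:

```python
def check_insert_column_count(table, target_cols, data_cols, specified_cols=None):
    """Raise one consistent error when an INSERT's data column count
    doesn't match the target table (or the user-specified column list)."""
    # If the user listed columns explicitly, that list defines the
    # expected arity; otherwise the full target table schema does.
    expected = specified_cols if specified_cols is not None else target_cols
    if len(data_cols) != len(expected):
        raise ValueError(
            f"Cannot write to {table}: the data to be inserted has "
            f"{len(data_cols)} column(s) but {len(expected)} column(s) "
            f"are expected"
        )
```

A single message like this would cover both repros above: `INSERT INTO tabtest SELECT 1` (2 expected, 1 supplied) and `INSERT INTO tabtest(c1) SELECT 1, 2, 3` (1 expected, 3 supplied).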
[jira] [Commented] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls
[ https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737563#comment-17737563 ] Yuming Wang commented on SPARK-42260: - Remove the target version since 3.4.1 is released. > Log when the K8s Exec Pods Allocator Stalls > --- > > Key: SPARK-42260 > URL: https://issues.apache.org/jira/browse/SPARK-42260 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0, 3.4.1 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > > Sometimes if the K8s APIs are being slow the ExecutorPods allocator can stall > and it would be good for us to log this (and how long we've stalled for) so > folks can tell more clearly why Spark is unable to reach the desired target > number of executors. > > This is _somewhat_ related to SPARK-36664 which logs the time spent waiting > for executor allocation but goes a step further for K8s and logs when we've > stalled because we have too many pending pods. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls
[ https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-42260: Target Version/s: (was: 3.4.1) > Log when the K8s Exec Pods Allocator Stalls > --- > > Key: SPARK-42260 > URL: https://issues.apache.org/jira/browse/SPARK-42260 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0, 3.4.1 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > > Sometimes if the K8s APIs are being slow the ExecutorPods allocator can stall > and it would be good for us to log this (and how long we've stalled for) so > folks can tell more clearly why Spark is unable to reach the desired target > number of executors. > > This is _somewhat_ related to SPARK-36664 which logs the time spent waiting > for executor allocation but goes a step further for K8s and logs when we've > stalled because we have too many pending pods. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44025) CSV Table Read Error with CharType(length) column
[ https://issues.apache.org/jira/browse/SPARK-44025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44025: Target Version/s: (was: 3.4.1) > CSV Table Read Error with CharType(length) column > - > > Key: SPARK-44025 > URL: https://issues.apache.org/jira/browse/SPARK-44025 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 > Environment: {{apache/spark:v3.4.0 image}} >Reporter: Fengyu Cao >Priority: Major > > Problem: > # read a CSV format table > # table has a `CharType(length)` column > # read table failed with Exception: `org.apache.spark.SparkException: Job > aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor > 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct).` > > reproduce with official image: > # {{docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql}} > # {{CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV > OPTIONS ('header' = 'true', 'sep' = ';') LOCATION > "/opt/spark/examples/src/main/resources/people.csv";}} > # SELECT * FROM csv_bug; > # ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44206) sparkSession.selectExpr scope Session.active
[ https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44206: -- Summary: sparkSession.selectExpr scope Session.active (was: sparkSession.selectExpr use Session.active) > sparkSession.selectExpr scope Session.active > > > Key: SPARK-44206 > URL: https://issues.apache.org/jira/browse/SPARK-44206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhuml >Priority: Major > > {code:java} > // code placeholder > val clone = spark.cloneSession() > clone.conf.set("spark.sql.legacy.interval.enabled", "true") > clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show() > clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as > b").show() {code} > The first query executes successfully, but the second one does not, > because selectExpr and sql use different SparkSession confs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44206) sparkSession.selectExpr use Session.active
zhuml created SPARK-44206: - Summary: sparkSession.selectExpr use Session.active Key: SPARK-44206 URL: https://issues.apache.org/jira/browse/SPARK-44206 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: zhuml {code:java} // code placeholder val clone = spark.cloneSession() clone.conf.set("spark.sql.legacy.interval.enabled", "true") clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show() clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as b").show() {code} The first query executes successfully, but the second one does not, because selectExpr and sql use different SparkSession confs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44204) Add missing recordHiveCall for getPartitionNames
[ https://issues.apache.org/jira/browse/SPARK-44204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44204. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41756 [https://github.com/apache/spark/pull/41756] > Add missing recordHiveCall for getPartitionNames > > > Key: SPARK-44204 > URL: https://issues.apache.org/jira/browse/SPARK-44204 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44204) Add missing recordHiveCall for getPartitionNames
[ https://issues.apache.org/jira/browse/SPARK-44204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44204: Assignee: Cheng Pan > Add missing recordHiveCall for getPartitionNames > > > Key: SPARK-44204 > URL: https://issues.apache.org/jira/browse/SPARK-44204 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44192) Support R 4.3.1
[ https://issues.apache.org/jira/browse/SPARK-44192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44192: - Assignee: Yang Jie > Support R 4.3.1 > --- > > Key: SPARK-44192 > URL: https://issues.apache.org/jira/browse/SPARK-44192 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > https://cran.r-project.org/doc/manuals/r-release/NEWS.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44192) Support R 4.3.1
[ https://issues.apache.org/jira/browse/SPARK-44192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44192. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41754 [https://github.com/apache/spark/pull/41754] > Support R 4.3.1 > --- > > Key: SPARK-44192 > URL: https://issues.apache.org/jira/browse/SPARK-44192 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > > https://cran.r-project.org/doc/manuals/r-release/NEWS.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang reassigned SPARK-40513: --- Assignee: Yikun Jiang > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Docker >Affects Versions: 3.5.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Labels: SPIP > > This SPIP is proposed to add [Docker Official > Image(DOI)|https://github.com/docker-library/official-images] to ensure the > Spark Docker images meet the quality standards for Docker images, to provide > these Docker images for users who want to use Apache Spark via Docker image. > There are also several [Apache projects that release the Docker Official > Images|https://hub.docker.com/search?q=apache_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ download for each). > From the huge download statistics, we can see the real demands of users, and > from the support of other apache projects, we should also be able to do it. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure the > quality standards for Docker images of the Docker community. > It will also reduce the extra docker images maintenance effort (such as > frequently rebuilding, image security update) of the Apache Spark community. 
> > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40513. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 34 [https://github.com/apache/spark-docker/pull/34] > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Docker >Affects Versions: 3.5.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Labels: SPIP > Fix For: 3.5.0 > > > This SPIP is proposed to add [Docker Official > Image(DOI)|https://github.com/docker-library/official-images] to ensure the > Spark Docker images meet the quality standards for Docker images, to provide > these Docker images for users who want to use Apache Spark via Docker image. > There are also several [Apache projects that release the Docker Official > Images|https://hub.docker.com/search?q=apache_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ download for each). > From the huge download statistics, we can see the real demands of users, and > from the support of other apache projects, we should also be able to do it. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure the > quality standards for Docker images of the Docker community. > It will also reduce the extra docker images maintenance effort (such as > frequently rebuilding, image security update) of the Apache Spark community. 
> > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44175) Remove useless lib64 path link in dockerfile
[ https://issues.apache.org/jira/browse/SPARK-44175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-44175. - Fix Version/s: 3.5.0 Resolution: Fixed Resolved by https://github.com/apache/spark-docker/pull/48 > Remove useless lib64 path link in dockerfile > > > Key: SPARK-44175 > URL: https://issues.apache.org/jira/browse/SPARK-44175 > Project: Spark > Issue Type: Sub-task > Components: Spark Docker >Affects Versions: 3.5.0 >Reporter: Yikun Jiang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org