[jira] [Assigned] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]

2023-06-27 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-43914:


Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> --
>
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]

2023-06-27 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-43914.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41476
[https://github.com/apache/spark/pull/41476]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> --
>
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-43879) Decouple handle command and send response on server side

2023-06-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737962#comment-17737962
 ] 

Snoot.io commented on SPARK-43879:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41527

> Decouple handle command and send response on server side
> 
>
> Key: SPARK-43879
> URL: https://issues.apache.org/jira/browse/SPARK-43879
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> SparkConnectStreamHandler handles requests from the Connect client and sends 
> the responses back to the client. SparkConnectStreamHandler holds a 
> StreamObserver component which is used to send responses.
> So I think the StreamObserver should be accessible only through 
> SparkConnectStreamHandler.
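The proposed encapsulation can be sketched in a toy model (all names here are hypothetical stand-ins; the real code uses SparkConnectStreamHandler and a gRPC StreamObserver):

```python
# Toy sketch of keeping the response observer private to the handler,
# so command handling is decoupled from response sending.
# Names are hypothetical; not Spark's actual classes.

class ResponseObserver:
    """Stand-in for a gRPC StreamObserver."""
    def __init__(self):
        self.sent = []

    def on_next(self, response):
        self.sent.append(response)


class StreamHandler:
    def __init__(self, observer):
        self._observer = observer  # private: only the handler sends responses

    def handle_command(self, command):
        # Helpers return responses; only this method pushes them
        # to the observer, so callers never touch it directly.
        response = self._execute(command)
        self._observer.on_next(response)

    def _execute(self, command):
        return f"handled:{command}"


observer = ResponseObserver()
handler = StreamHandler(observer)
handler.handle_command("plan")
print(observer.sent)  # ['handled:plan']
```

The observer is reachable only through the handler, which is the access restriction the description asks for.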






[jira] [Created] (SPARK-44222) Upgrade `grpc` to 1.56.0

2023-06-27 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44222:
-

 Summary: Upgrade `grpc` to 1.56.0
 Key: SPARK-44222
 URL: https://issues.apache.org/jira/browse/SPARK-44222
 Project: Spark
  Issue Type: Improvement
  Components: Build, Python
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45

2023-06-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737958#comment-17737958
 ] 

Snoot.io commented on SPARK-44221:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41766

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> ---
>
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45

2023-06-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44221:


Assignee: BingKun Pan

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> ---
>
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45

2023-06-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44221.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41766
[https://github.com/apache/spark/pull/41766]

> Upgrade RoaringBitmap from 0.9.44 to 0.9.45
> ---
>
> Key: SPARK-44221
> URL: https://issues.apache.org/jira/browse/SPARK-44221
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs

2023-06-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44182.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41728
[https://github.com/apache/spark/pull/41728]

> Use Spark version variables in Python and Spark Connect installation docs
> -
>
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs

2023-06-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44182:


Assignee: Dongjoon Hyun

> Use Spark version variables in Python and Spark Connect installation docs
> -
>
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>







[jira] [Resolved] (SPARK-44206) Dataset.selectExpr scope Session.active

2023-06-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44206.
--
Fix Version/s: 3.5.0
   3.4.2
 Assignee: zhuml
   Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/41759

> Dataset.selectExpr scope Session.active
> ---
>
> Key: SPARK-44206
> URL: https://issues.apache.org/jira/browse/SPARK-44206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhuml
>Assignee: zhuml
>Priority: Major
> Fix For: 3.5.0, 3.4.2
>
>
> {code:scala}
> // code placeholder
> val clone = spark.cloneSession()
> clone.conf.set("spark.sql.legacy.interval.enabled", "true")
> clone.sql("select '2023-01-01' + INTERVAL 1 YEAR as b").show()
> clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as 
> b").show() {code}
> The first query executes successfully, but the second one fails, because 
> selectExpr and sql use different SparkSession confs.
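The underlying mismatch can be illustrated with a toy model (hypothetical classes, not Spark's API): if one code path reads the session the Dataset is bound to while the other reads a globally tracked "active" session, a cloned session's settings become invisible to the second path:

```python
# Toy model (NOT Spark's API) of why sql() and selectExpr() can see
# different configs: one reads the session's own conf, the other reads
# a globally tracked "active" session's conf.

class Session:
    active = None  # global active session, analogous to Session.active

    def __init__(self, conf=None):
        self.conf = dict(conf or {})

    def clone(self):
        # clone copies the conf, like SparkSession.cloneSession()
        return Session(self.conf)

    def sql(self, _query):
        # parses with THIS session's conf
        return self.conf.get("legacy.interval", "false")

    def select_expr(self, _expr):
        # buggy path: parses with the globally *active* session's conf
        return Session.active.conf.get("legacy.interval", "false")


spark = Session()
Session.active = spark              # the original session stays "active"
clone = spark.clone()
clone.conf["legacy.interval"] = "true"

print(clone.sql("..."))             # 'true'  -> sees the clone's setting
print(clone.select_expr("..."))     # 'false' -> reads the wrong session
```

Scoping the active session to the Dataset's own session around parsing, as the issue title suggests, makes both paths agree.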






[jira] [Assigned] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-44039:
-

Assignee: BingKun Pan

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> 
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>
> Improvements for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite include:
> - When generating `GOLDEN` files, we should first delete the corresponding 
> directories and regenerate them, to avoid committing redundant files during 
> the review process. E.g.: suppose we write a test named `make_timestamp_ltz` 
> for an overloaded method, and during review the reviewer asks for more tests 
> for that method, so the test name changes in the next commit, such as to 
> `make_timestamp_ltz without timezone`. At this point, if the 
> `queries/function_make_timestamp_ltz.json`, 
> `queries/function_make_timestamp_ltz.proto.bin`, and 
> `explain-results/function_make_timestamp_ltz.explain` files for the old name 
> are already in the commit, and there are many such files, we generally do not 
> notice the problem, which leads to incorrectly committing those files even 
> though they have no impact on the unit tests. These files are redundant.
> - Clean up and update some redundant files that were committed incorrectly.
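The delete-then-regenerate step described above can be sketched in plain Python (hypothetical paths and file layout; the real suites write `queries/` and `explain-results/` golden files):

```python
# Sketch: delete the golden directory before regenerating it, so files
# belonging to renamed or removed tests do not survive as stale
# artifacts in the commit. Paths/layout are hypothetical.
import shutil
import tempfile
from pathlib import Path

def regenerate_golden(dir_path: Path, tests: dict) -> None:
    if dir_path.exists():
        shutil.rmtree(dir_path)        # drop stale golden files first
    dir_path.mkdir(parents=True)
    for name, content in tests.items():
        (dir_path / f"function_{name}.json").write_text(content)

root = Path(tempfile.mkdtemp()) / "queries"
regenerate_golden(root, {"make_timestamp_ltz": "{}"})
# the test is renamed in a later commit:
regenerate_golden(root, {"make_timestamp_ltz_without_timezone": "{}"})
names = sorted(p.name for p in root.iterdir())
print(names)  # only the new file remains; the old one was cleaned up
```

Without the `rmtree`, the first file would still be on disk after the rename and would silently end up in the commit.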






[jira] [Resolved] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44039.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> 
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>
> Improvements for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite include:
> - When generating `GOLDEN` files, we should first delete the corresponding 
> directories and regenerate them, to avoid committing redundant files during 
> the review process. E.g.: suppose we write a test named `make_timestamp_ltz` 
> for an overloaded method, and during review the reviewer asks for more tests 
> for that method, so the test name changes in the next commit, such as to 
> `make_timestamp_ltz without timezone`. At this point, if the 
> `queries/function_make_timestamp_ltz.json`, 
> `queries/function_make_timestamp_ltz.proto.bin`, and 
> `explain-results/function_make_timestamp_ltz.explain` files for the old name 
> are already in the commit, and there are many such files, we generally do not 
> notice the problem, which leads to incorrectly committing those files even 
> though they have no impact on the unit tests. These files are redundant.
> - Clean up and update some redundant files that were committed incorrectly.






[jira] [Created] (SPARK-44221) Upgrade RoaringBitmap from 0.9.44 to 0.9.45

2023-06-27 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44221:
---

 Summary: Upgrade RoaringBitmap from 0.9.44 to 0.9.45
 Key: SPARK-44221
 URL: https://issues.apache.org/jira/browse/SPARK-44221
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Assigned] (SPARK-44161) Row as UDF inputs causes encoder errors

2023-06-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-44161:
-

Assignee: Zhen Li

> Row as UDF inputs causes encoder errors
> ---
>
> Key: SPARK-44161
> URL: https://issues.apache.org/jira/browse/SPARK-44161
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.5.0
>
>
> Ensure row inputs to udfs can be handled correctly.






[jira] [Resolved] (SPARK-44161) Row as UDF inputs causes encoder errors

2023-06-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44161.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Row as UDF inputs causes encoder errors
> ---
>
> Key: SPARK-44161
> URL: https://issues.apache.org/jira/browse/SPARK-44161
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
> Fix For: 3.5.0
>
>
> Ensure row inputs to udfs can be handled correctly.






[jira] [Commented] (SPARK-43203) Fix DROP table behavior in session catalog

2023-06-27 Thread Anton Okolnychyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737894#comment-17737894
 ] 

Anton Okolnychyi commented on SPARK-43203:
--

I unfortunately created this initially as an improvement. It is actually a bug 
and a regression, which breaks DROP in custom session catalogs. Can we include 
it in 3.4.2?

> Fix DROP table behavior in session catalog
> --
>
> Key: SPARK-43203
> URL: https://issues.apache.org/jira/browse/SPARK-43203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> DROP table behavior is not working correctly in 3.4.0 because we always 
> invoke V1 drop logic if the identifier looks like a V1 identifier. This is a 
> big blocker for external data sources that provide custom session catalogs.
> See [here|https://github.com/apache/spark/pull/37879/files#r1170501180] for 
> details.
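The dispatch problem can be sketched with a toy model (hypothetical names; Spark's actual resolution code is far more involved): choosing the V1 path purely from the identifier's shape bypasses a custom V2 session catalog entirely.

```python
# Toy model of the reported bug: taking the V1 DROP path whenever the
# identifier *looks* like a V1 identifier, instead of delegating to the
# configured session catalog. Names are hypothetical.

def looks_like_v1(identifier):
    return len(identifier.split(".")) <= 2  # e.g. "db.tbl"

def drop_table_buggy(identifier, session_catalog):
    if looks_like_v1(identifier):
        return "v1-drop"                    # custom catalog never consulted
    return session_catalog.drop(identifier)

def drop_table_fixed(identifier, session_catalog):
    # Fixed behavior: always delegate to the session catalog, which may
    # be a custom V2 implementation from an external data source.
    return session_catalog.drop(identifier)

class CustomCatalog:
    def drop(self, identifier):
        return f"v2-drop:{identifier}"

catalog = CustomCatalog()
print(drop_table_buggy("db.tbl", catalog))   # 'v1-drop'  (bug)
print(drop_table_fixed("db.tbl", catalog))   # 'v2-drop:db.tbl'
```

In the buggy path the external catalog's drop logic never runs, which is why this blocks data sources that plug in a custom session catalog.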






[jira] [Updated] (SPARK-43203) Fix DROP table behavior in session catalog

2023-06-27 Thread Anton Okolnychyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Okolnychyi updated SPARK-43203:
-
Issue Type: Bug  (was: Improvement)

> Fix DROP table behavior in session catalog
> --
>
> Key: SPARK-43203
> URL: https://issues.apache.org/jira/browse/SPARK-43203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> DROP table behavior is not working correctly in 3.4.0 because we always 
> invoke V1 drop logic if the identifier looks like a V1 identifier. This is a 
> big blocker for external data sources that provide custom session catalogs.
> See [here|https://github.com/apache/spark/pull/37879/files#r1170501180] for 
> details.






[jira] [Created] (SPARK-44220) Move StringConcat to sql/api

2023-06-27 Thread Rui Wang (Jira)
Rui Wang created SPARK-44220:


 Summary: Move StringConcat to sql/api
 Key: SPARK-44220
 URL: https://issues.apache.org/jira/browse/SPARK-44220
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Updated] (SPARK-44182) Use Spark version variables in Python and Spark Connect installation docs

2023-06-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44182:
--
Summary: Use Spark version variables in Python and Spark Connect 
installation docs  (was: Use Spark version placeholders in Python and Spark 
Connect installation docs)

> Use Spark version variables in Python and Spark Connect installation docs
> -
>
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>







[jira] [Created] (SPARK-44219) Add extra per-rule validation for optimization rewrites.

2023-06-27 Thread Yannis Sismanis (Jira)
Yannis Sismanis created SPARK-44219:
---

 Summary: Add extra per-rule validation for optimization rewrites.
 Key: SPARK-44219
 URL: https://issues.apache.org/jira/browse/SPARK-44219
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.4.1, 3.4.0
Reporter: Yannis Sismanis


Adds per-rule validation checks for the following:

1. Aggregate expressions in Aggregate plans are valid.
2. Grouping key types in Aggregate plans cannot be of type Map.
3. No dangling references have been generated.

This validation is enabled by default for all tests, or selectively via the 
spark.sql.planChangeValidation=true flag.
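The three checks could be sketched like this (toy plan representation with hypothetical field names, not Spark's Catalyst types):

```python
# Toy per-rule validation in the spirit of the three checks above.
# The dict-based plan representation and field names are hypothetical.

def validate_aggregate(plan):
    errors = []
    # 1. aggregate expressions must be valid (here: simply non-empty)
    if not plan.get("aggregate_exprs"):
        errors.append("invalid aggregate expressions")
    # 2. grouping keys must not be of Map type
    for key, key_type in plan.get("grouping_keys", []):
        if key_type == "map":
            errors.append(f"grouping key {key} has unorderable type map")
    # 3. no dangling attribute references
    defined = set(plan.get("output", []))
    for ref in plan.get("references", []):
        if ref not in defined:
            errors.append(f"dangling reference {ref}")
    return errors

plan = {
    "aggregate_exprs": ["sum(v)"],
    "grouping_keys": [("k", "map")],   # violates check 2
    "output": ["k"],
    "references": ["k", "v"],          # "v" violates check 3
}
print(validate_aggregate(plan))
```

Running such a validator after each optimizer rule, gated by a flag like the one above, localizes which rewrite introduced an invalid plan.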






[jira] [Resolved] (SPARK-43631) Enable Series.interpolate with Spark Connect

2023-06-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43631.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41670
[https://github.com/apache/spark/pull/41670]

> Enable Series.interpolate with Spark Connect
> 
>
> Key: SPARK-43631
> URL: https://issues.apache.org/jira/browse/SPARK-43631
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Enable Series.interpolate with Spark Connect






[jira] [Assigned] (SPARK-43631) Enable Series.interpolate with Spark Connect

2023-06-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43631:
-

Assignee: Haejoon Lee

> Enable Series.interpolate with Spark Connect
> 
>
> Key: SPARK-43631
> URL: https://issues.apache.org/jira/browse/SPARK-43631
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable Series.interpolate with Spark Connect






[jira] [Commented] (SPARK-44193) Implement GRPC exceptions interception for conversion

2023-06-27 Thread Harish Gontu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737874#comment-17737874
 ] 

Harish Gontu commented on SPARK-44193:
--

Can I take up this task?

> Implement GRPC exceptions interception for conversion
> -
>
> Key: SPARK-44193
> URL: https://issues.apache.org/jira/browse/SPARK-44193
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Priority: Major
>







[jira] [Commented] (SPARK-43092) Cleanup unsupported function `dropDuplicatesWithinWatermark` from `Dataset`

2023-06-27 Thread Harish Gontu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737869#comment-17737869
 ] 

Harish Gontu commented on SPARK-43092:
--

Can I pick up this task?

> Cleanup unsupported function `dropDuplicatesWithinWatermark` from `Dataset`
> 
>
> Key: SPARK-43092
> URL: https://issues.apache.org/jira/browse/SPARK-43092
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-44218) Add improved error message formatting for assert_approx_df_equality

2023-06-27 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44218:
--

 Summary: Add improved error message formatting for 
assert_approx_df_equality
 Key: SPARK-44218
 URL: https://issues.apache.org/jira/browse/SPARK-44218
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44217) Add assert_approx_df_equality util function

2023-06-27 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44217:
--

 Summary: Add assert_approx_df_equality util function
 Key: SPARK-44217
 URL: https://issues.apache.org/jira/browse/SPARK-44217
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44216) Add improved error message formatting for assert_df_equality

2023-06-27 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44216:
--

 Summary: Add improved error message formatting for 
assert_df_equality
 Key: SPARK-44216
 URL: https://issues.apache.org/jira/browse/SPARK-44216
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks

2023-06-27 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-44215:
-

 Summary: Client receives zero number of chunks in merge meta 
response which doesn't trigger fallback to unmerged blocks
 Key: SPARK-44215
 URL: https://issues.apache.org/jira/browse/SPARK-44215
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.0
Reporter: Chandni Singh


We still see instances of the server returning 0 {{numChunks}} in 
{{mergedMetaResponse}} which causes the executor to fail with 
{{ArithmeticException}}. 
{code}
java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
{code}
Here the executor doesn't fall back to fetching un-merged blocks, and this also 
doesn't result in a {{FetchFailure}}. So, the application fails.
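The missing guard can be sketched as a check before the chunk computation that currently divides by the number of chunks (hypothetical function; the real logic lives in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse):

```python
# Sketch of guarding against numChunks == 0 and falling back to the
# original un-merged blocks, instead of failing the executor with
# ArithmeticException (/ by zero). Names are hypothetical.

def plan_fetch(num_chunks, total_size, fallback_blocks):
    if num_chunks <= 0:
        # Server sent a merged-meta response with zero chunks:
        # fall back to fetching the un-merged blocks.
        return ("fallback", fallback_blocks)
    chunk_size = total_size // num_chunks   # safe: num_chunks > 0
    return ("merged", [chunk_size] * num_chunks)

print(plan_fetch(4, 100, ["b1", "b2"]))  # ('merged', [25, 25, 25, 25])
print(plan_fetch(0, 100, ["b1", "b2"]))  # ('fallback', ['b1', 'b2'])
```

With the guard, a zero-chunk response degrades to the un-merged path rather than killing the application.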






[jira] [Commented] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737770#comment-17737770
 ] 

Yuming Wang commented on SPARK-44213:
-

Related issue ticket: SPARK-41752.

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> -
>
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join 
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
>  !enabled.png! 
> Disabled:
>  !screenshot-1.png! 






[jira] [Updated] (SPARK-44214) Add driver log live UI for K8s environment

2023-06-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44214:
--
Summary: Add driver log live UI for K8s environment  (was: Add driver log 
UI for K8s environment)

> Add driver log live UI for K8s environment
> --
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-44214) Add driver log UI for K8s environment

2023-06-27 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44214:
-

 Summary: Add driver log UI for K8s environment
 Key: SPARK-44214
 URL: https://issues.apache.org/jira/browse/SPARK-44214
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Spark Core, Web UI
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737768#comment-17737768
 ] 

Yuming Wang commented on SPARK-44213:
-

cc [~linhongliu-db]

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> -
>
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join 
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
>  !enabled.png! 
> Disabled:
>  !screenshot-1.png! 






[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44213:

Description: 
{code:sql}
create table tbl using parquet as select t1.id from range(10) as t1 join 
range(100) as t2 on t1.id = t2.id;
{code}
Enabled:
 !enabled.png! 
Disabled:
 !screenshot-1.png! 

  was:
{code:sql}
create table tbl using parquet as select t1.id from range(10) as t1 join 
range(100) as t2 on t1.id = t2.id;
{code}
Enabled:

Disabled:



> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> -
>
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join 
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
>  !enabled.png! 
> Disabled:
>  !screenshot-1.png! 






[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44213:

Attachment: screenshot-1.png

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> -
>
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join 
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> Disabled:






[jira] [Created] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44213:
---

 Summary: CTAS missing the child info on UI when 
groupSQLSubExecutionEnabled is enabled
 Key: SPARK-44213
 URL: https://issues.apache.org/jira/browse/SPARK-44213
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 3.4.0
Reporter: Yuming Wang
 Attachments: enabled.png, screenshot-1.png

{code:sql}
create table tbl using parquet as select t1.id from range(10) as t1 join 
range(100) as t2 on t1.id = t2.id;
{code}
Enabled:

Disabled:







[jira] [Updated] (SPARK-44213) CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled

2023-06-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44213:

Attachment: enabled.png

> CTAS missing the child info on UI when groupSQLSubExecutionEnabled is enabled
> -
>
> Key: SPARK-44213
> URL: https://issues.apache.org/jira/browse/SPARK-44213
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: enabled.png, screenshot-1.png
>
>
> {code:sql}
> create table tbl using parquet as select t1.id from range(10) as t1 join 
> range(100) as t2 on t1.id = t2.id;
> {code}
> Enabled:
> Disabled:






[jira] [Assigned] (SPARK-44171) Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes

2023-06-27 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44171:


Assignee: BingKun Pan

> Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some 
> unused error classes
> -
>
> Key: SPARK-44171
> URL: https://issues.apache.org/jira/browse/SPARK-44171
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44171) Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes

2023-06-27 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44171.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41721
[https://github.com/apache/spark/pull/41721]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some 
> unused error classes
> -
>
> Key: SPARK-44171
> URL: https://issues.apache.org/jira/browse/SPARK-44171
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-44182) Use Spark version placeholders in Python and Spark Connect installation docs

2023-06-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44182:
--
Summary: Use Spark version placeholders in Python and Spark Connect 
installation docs  (was: Use Spark 3.5.0 in Python and Spark Connect docs)

> Use Spark version placeholders in Python and Spark Connect installation docs
> 
>
> Key: SPARK-44182
> URL: https://issues.apache.org/jira/browse/SPARK-44182
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>







[jira] [Commented] (SPARK-44212) Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462

2023-06-27 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737755#comment-17737755
 ] 

Kazuaki Ishizaki commented on SPARK-44212:
--

The upgrade of netty is being discussed in 
[https://github.com/apache/spark/pull/41681#pullrequestreview-1496876723].

> Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462
> 
>
> Key: SPARK-44212
> URL: https://issues.apache.org/jira/browse/SPARK-44212
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
>Reporter: Raúl Cumplido
>Priority: Major
>
> Hi,
> On the Apache Arrow project we have noticed that our nightly integration 
> tests with Spark recently started failing. After some investigation, I 
> noticed that we define a different version of the Java netty dependencies. 
> We upgraded to 4.1.94.Final due to the CVE in the title: 
> [https://github.com/advisories/GHSA-6mjq-h674-j845]
> Our PR upgrading the version: [https://github.com/apache/arrow/issues/36209]
> I have opened an issue on the Apache Arrow repository to try to fix 
> something else on our side, but I was wondering whether you would want to 
> update the version to solve the CVE.
>  
> Thanks
> Raúl






[jira] [Created] (SPARK-44212) Upgrade netty dependencies to 4.1.94.Final due to CVE-2023-34462

2023-06-27 Thread Jira
Raúl Cumplido created SPARK-44212:
-

 Summary: Upgrade netty dependencies to 4.1.94.Final due to 
CVE-2023-34462
 Key: SPARK-44212
 URL: https://issues.apache.org/jira/browse/SPARK-44212
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1
Reporter: Raúl Cumplido


Hi,

On the Apache Arrow project we have noticed that our nightly integration tests 
with Spark recently started failing. After some investigation, I noticed that 
we define a different version of the Java netty dependencies. We upgraded to 
4.1.94.Final due to the CVE in the title: 
[https://github.com/advisories/GHSA-6mjq-h674-j845]

Our PR upgrading the version: [https://github.com/apache/arrow/issues/36209]

I have opened an issue on the Apache Arrow repository to try to fix something 
else on our side, but I was wondering whether you would want to update the 
version to solve the CVE.

 

Thanks

Raúl






[jira] [Updated] (SPARK-44211) PySpark: SparkSession.is_stopped

2023-06-27 Thread Alice Sayutina (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Sayutina updated SPARK-44211:
---
Description: Implement SparkConnectClient.is_stopped property to check if 
this session has been closed previously  (was: Implement 
SparkConnectClient.is_closed() method to check if this session has been closed 
previously)

> PySpark: SparkSession.is_stopped
> 
>
> Key: SPARK-44211
> URL: https://issues.apache.org/jira/browse/SPARK-44211
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Major
>
> Implement SparkConnectClient.is_stopped property to check if this session 
> has been closed previously






[jira] [Updated] (SPARK-44211) PySpark: SparkSession.is_stopped

2023-06-27 Thread Alice Sayutina (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Sayutina updated SPARK-44211:
---
Summary: PySpark: SparkSession.is_stopped  (was: PySpark: 
SparkConnectClient.is_closed() method)

> PySpark: SparkSession.is_stopped
> 
>
> Key: SPARK-44211
> URL: https://issues.apache.org/jira/browse/SPARK-44211
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Major
>
> Implement SparkConnectClient.is_closed() method to check if this session 
> has been closed previously






[jira] [Created] (SPARK-44211) PySpark: SparkConnectClient.is_closed() method

2023-06-27 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-44211:
--

 Summary: PySpark: SparkConnectClient.is_closed() method
 Key: SPARK-44211
 URL: https://issues.apache.org/jira/browse/SPARK-44211
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Alice Sayutina


Implement SparkConnectClient.is_closed() method to check if this session has 
been closed previously






[jira] [Created] (SPARK-44210) Strengthen type checking and better comply with Connect specifications for `levenshtein` function

2023-06-27 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44210:
---

 Summary: Strengthen type checking and better comply with Connect 
specifications for `levenshtein` function
 Key: SPARK-44210
 URL: https://issues.apache.org/jira/browse/SPARK-44210
 Project: Spark
  Issue Type: Improvement
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-44209) Expose amount of shuffle data available on the node

2023-06-27 Thread Deependra Patel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737611#comment-17737611
 ] 

Deependra Patel commented on SPARK-44209:
-

I will create a pull request for this soon

> Expose amount of shuffle data available on the node
> ---
>
> Key: SPARK-44209
> URL: https://issues.apache.org/jira/browse/SPARK-44209
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 3.4.1
>Reporter: Deependra Patel
>Priority: Trivial
>
> [ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318]
>  doesn't have metrics like 
> "totalShuffleDataBytes" and "numAppsWithShuffleData"; these would be 
> per-node metrics published by the External Shuffle Service.
>  
> Adding these metrics would help in:
> 1. Deciding whether we can decommission the node when no shuffle data is 
> present
> 2. Better live monitoring of a customer's workload, to see whether skewed 
> shuffle data is present on the node






[jira] [Created] (SPARK-44209) Expose amount of shuffle data available on the node

2023-06-27 Thread Deependra Patel (Jira)
Deependra Patel created SPARK-44209:
---

 Summary: Expose amount of shuffle data available on the node
 Key: SPARK-44209
 URL: https://issues.apache.org/jira/browse/SPARK-44209
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Affects Versions: 3.4.1
Reporter: Deependra Patel


[ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318]
 doesn't have metrics like 
"totalShuffleDataBytes" and "numAppsWithShuffleData"; these would be per-node 
metrics published by the External Shuffle Service.
 
Adding these metrics would help in:
1. Deciding whether we can decommission the node when no shuffle data is 
present
2. Better live monitoring of a customer's workload, to see whether skewed 
shuffle data is present on the node
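The two proposed metrics are simple per-node aggregates. As a rough, 
language-agnostic sketch of what they would measure, here is a plain-Python 
illustration; the directory layout, file names, and the function itself are 
hypothetical, not the actual External Shuffle Service code:

```python
import os
import tempfile

def node_shuffle_metrics(shuffle_root):
    """Aggregate hypothetical per-node shuffle metrics from a local layout
    where each subdirectory of shuffle_root holds one app's shuffle files.
    Mirrors the idea of "totalShuffleDataBytes" / "numAppsWithShuffleData"."""
    total_bytes = 0
    apps_with_data = 0
    for app in sorted(os.listdir(shuffle_root)):
        app_dir = os.path.join(shuffle_root, app)
        # Sum the sizes of all shuffle files belonging to this app.
        app_bytes = sum(
            os.path.getsize(os.path.join(dirpath, f))
            for dirpath, _, files in os.walk(app_dir)
            for f in files
        )
        if app_bytes > 0:
            apps_with_data += 1
        total_bytes += app_bytes
    return {"totalShuffleDataBytes": total_bytes,
            "numAppsWithShuffleData": apps_with_data}

# Demo with a throwaway layout: two apps, one of which has shuffle data.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "app-1"))
    os.makedirs(os.path.join(root, "app-2"))
    with open(os.path.join(root, "app-1", "shuffle_0_0_0.data"), "wb") as f:
        f.write(b"x" * 1024)
    print(node_shuffle_metrics(root))
    # {'totalShuffleDataBytes': 1024, 'numAppsWithShuffleData': 1}
```

A zero value for both numbers would be the "safe to decommission" signal 
described in point 1 above.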






[jira] [Updated] (SPARK-44206) Dataset.selectExpr scope Session.active

2023-06-27 Thread zhuml (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuml updated SPARK-44206:
--
Summary: Dataset.selectExpr scope Session.active  (was: 
sparkSession.selectExpr scope Session.active)

> Dataset.selectExpr scope Session.active
> ---
>
> Key: SPARK-44206
> URL: https://issues.apache.org/jira/browse/SPARK-44206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhuml
>Priority: Major
>
> {code:java}
> // code placeholder
> val clone = spark.cloneSession()
> clone.conf.set("spark.sql.legacy.interval.enabled", "true")
> clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show()
> clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as 
> b").show() {code}
> The first statement executes successfully, but the second one fails.
> This is because selectExpr and sql use different SparkSession confs.






[jira] [Created] (SPARK-44208) assign clear error class names for some logic that directly uses exceptions

2023-06-27 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44208:
---

 Summary: assign clear error class names for some logic that 
directly uses exceptions
 Key: SPARK-44208
 URL: https://issues.apache.org/jira/browse/SPARK-44208
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 3.5.0
Reporter: BingKun Pan


include:
 * ALL_FOR_PARTITION_COLUMNS_IS_NOT_ALLOWED
 * INVALID_COLUMN_NAME
 * SPECIFY_BUCKETING_IS_NOT_ALLOWED
 * SPECIFY_PARTITION_IS_NOT_ALLOWED
 * UNSUPPORTED_ADD_FILE.DIRECTORY
 * UNSUPPORTED_ADD_FILE.LOCAL_DIRECTORY






[jira] [Created] (SPARK-44207) Where Clause throwing Resolved attribute(s) _metadata#398 missing from ... error

2023-06-27 Thread huizhong xu (Jira)
huizhong xu created SPARK-44207:
---

 Summary: Where Clause throwing Resolved attribute(s) _metadata#398 
missing from ... error
 Key: SPARK-44207
 URL: https://issues.apache.org/jira/browse/SPARK-44207
 Project: Spark
  Issue Type: Question
  Components: SQL
Affects Versions: 3.3.1
Reporter: huizhong xu


I have two DataFrames called lt and rt, both with the same schema and only one 
row each, generated separately by our own curation logic. All of the columns 
are String, Boolean, or Timestamp. I am trying to compare them, so I run a 
join on the two like this:

var joinedDF = lt.join(rt, "Id")

After that, I try to compare them by schema first and then column by column, 
to see what percentage of rows match.

The code is roughly like this:

for (column <- lt.schema) {
  if (rt.columns.contains(column.name) &&
      column.dataType == rt.schema(column.name).dataType) {

    var matchCount = joinedCount
    if (column.dataType.typeName == "string") {
      // null-safe equality between the lt and rt copies of the column
      matchCount = joinedDF.where(lt(column.name) <=> rt(column.name)).count
    }
    else

.

 

On the last line, where I run the where clause, it throws an 
AnalysisException: Resolved attribute(s) _metadata#398 missing from , and I 
don't even have this _metadata column anywhere in my DataFrame.

I searched online and people say it is a problem with the join. I tried 
changing the column names in rt and joinedDF, but neither works; the same 
error is still thrown. Can anybody help here?
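Setting the Spark error itself aside, the per-column match-rate the question 
is computing can be sketched in plain Python. The `lt_`/`rt_` prefixes, the 
row layout, and the function name below are hypothetical, just standing in 
for the two sides of the join:

```python
# Hypothetical rows joined on "Id": each entry pairs the left ("lt_") and
# right ("rt_") value of one column for one joined row.
joined_rows = [
    {"Id": 1, "lt_name": "alice", "rt_name": "alice",
     "lt_active": True, "rt_active": False},
]

columns = ["name", "active"]

def column_match_rates(rows, columns):
    """Fraction of joined rows where the left and right values agree,
    computed per column. Like Spark's null-safe equality (<=>), two
    missing values compare as equal here (None == None is True)."""
    rates = {}
    for col in columns:
        matches = sum(1 for r in rows if r[f"lt_{col}"] == r[f"rt_{col}"])
        rates[col] = matches / len(rows) if rows else 0.0
    return rates

print(column_match_rates(joined_rows, columns))
# "name" matches on every row (1.0); "active" differs (0.0)
```

The Spark version does the same thing one column at a time with a filtered 
count; the difference is that Spark has to resolve each column reference 
against the joined plan, which is where the reported analysis error arises.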






[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT

2023-06-27 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737570#comment-17737570
 ] 

Max Gekk commented on SPARK-43438:
--

> 2. when execute sql "INSERT INTO tabtest(c1, c2) SELECT 1", the error is as 
> follows:
> 3. when execute sql "INSERT INTO tabtest(c1, c2) SELECT 1", the error is as 
> follows:

[~panbingkun] What is the difference between the two?

> Should we align the logic of 1 and 2?

Yep, let's try to make it consistent in any case.

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  






[jira] [Commented] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls

2023-06-27 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737563#comment-17737563
 ] 

Yuming Wang commented on SPARK-42260:
-

Remove the target version since 3.4.1 is released.

> Log when the K8s Exec Pods Allocator Stalls
> ---
>
> Key: SPARK-42260
> URL: https://issues.apache.org/jira/browse/SPARK-42260
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Minor
>
> Sometimes if the K8s APIs are being slow the ExecutorPods allocator can stall 
> and it would be good for us to log this (and how long we've stalled for) so 
> folks can tell more clearly why Spark is unable to reach the desired target 
> number of executors.
>  
> This is _somewhat_ related to SPARK-36664 which logs the time spent waiting 
> for executor allocation but goes a step further for K8s and logs when we've 
> stalled because we have too many pending pods.






[jira] [Updated] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls

2023-06-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-42260:

Target Version/s:   (was: 3.4.1)

> Log when the K8s Exec Pods Allocator Stalls
> ---
>
> Key: SPARK-42260
> URL: https://issues.apache.org/jira/browse/SPARK-42260
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Minor
>
> Sometimes if the K8s APIs are being slow the ExecutorPods allocator can stall 
> and it would be good for us to log this (and how long we've stalled for) so 
> folks can tell more clearly why Spark is unable to reach the desired target 
> number of executors.
>  
> This is _somewhat_ related to SPARK-36664 which logs the time spent waiting 
> for executor allocation but goes a step further for K8s and logs when we've 
> stalled because we have too many pending pods.






[jira] [Updated] (SPARK-44025) CSV Table Read Error with CharType(length) column

2023-06-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44025:

Target Version/s:   (was: 3.4.1)

> CSV Table Read Error with CharType(length) column
> -
>
> Key: SPARK-44025
> URL: https://issues.apache.org/jira/browse/SPARK-44025
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
> Environment: {{apache/spark:v3.4.0 image}}
>Reporter: Fengyu Cao
>Priority: Major
>
> Problem:
>  # read a CSV format table
>  # table has a `CharType(length)` column
>  # read table failed with Exception:  `org.apache.spark.SparkException: Job 
> aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor 
> 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).`
>  
> reproduce with official image:
>  # {{docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql}}
>  # {{CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV 
> OPTIONS ('header' = 'true', 'sep' = ';') LOCATION 
> "/opt/spark/examples/src/main/resources/people.csv";}}
>  # SELECT * FROM csv_bug;
>  # ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).






[jira] [Updated] (SPARK-44206) sparkSession.selectExpr scope Session.active

2023-06-27 Thread zhuml (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuml updated SPARK-44206:
--
Summary: sparkSession.selectExpr scope Session.active  (was: 
sparkSession.selectExpr use Session.active)

> sparkSession.selectExpr scope Session.active
> 
>
> Key: SPARK-44206
> URL: https://issues.apache.org/jira/browse/SPARK-44206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhuml
>Priority: Major
>
> {code:java}
> //代码占位符
> val clone = spark.cloneSession()
> clone.conf.set("spark.sql.legacy.interval.enabled", "true")
> clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show()
> clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as 
> b").show() {code}
> The first statement executes successfully, but the second one fails.
> This is because selectExpr and sql use different SparkSession confs.






[jira] [Created] (SPARK-44206) sparkSession.selectExpr use Session.active

2023-06-27 Thread zhuml (Jira)
zhuml created SPARK-44206:
-

 Summary: sparkSession.selectExpr use Session.active
 Key: SPARK-44206
 URL: https://issues.apache.org/jira/browse/SPARK-44206
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: zhuml


{code:java}
// code placeholder
val clone = spark.cloneSession()
clone.conf.set("spark.sql.legacy.interval.enabled", "true")
clone.sql("select '2023-01-01'+ INTERVAL 1 YEAR as b").show()
clone.sql("select '2023-01-01' as a").selectExpr("a + INTERVAL 1 YEAR as 
b").show() {code}
The first statement executes successfully, but the second one fails.

This is because selectExpr and sql use different SparkSession confs.
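The mechanism being described, a method that resolves configuration against a 
globally active session instead of the object's own session, can be 
illustrated in plain Python. All names below are hypothetical; this is a toy 
model of the pattern, not Spark's implementation:

```python
class Session:
    active = None  # class-level "active" session, like SparkSession.active

    def __init__(self, conf=None):
        self.conf = dict(conf or {})

    def clone(self):
        # A clone copies the conf but does NOT become the active session.
        return Session(self.conf)

    def sql(self, key):
        # Correct pattern: resolve against this session's own conf.
        return self.conf.get(key, False)

    def select_expr(self, key):
        # Buggy pattern: resolve against the globally active session.
        return Session.active.conf.get(key, False)

base = Session()
Session.active = base

clone = base.clone()
clone.conf["legacy.interval.enabled"] = True

print(clone.sql("legacy.interval.enabled"))          # True: clone's own conf
print(clone.select_expr("legacy.interval.enabled"))  # False: active's conf
```

The two calls disagree exactly the way the `sql` and `selectExpr` calls in 
the repro above do: the clone's setting is visible only through the path that 
reads the clone's own conf.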






[jira] [Resolved] (SPARK-44204) Add missing recordHiveCall for getPartitionNames

2023-06-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44204.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41756
[https://github.com/apache/spark/pull/41756]

> Add missing recordHiveCall for getPartitionNames
> 
>
> Key: SPARK-44204
> URL: https://issues.apache.org/jira/browse/SPARK-44204
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44204) Add missing recordHiveCall for getPartitionNames

2023-06-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44204:


Assignee: Cheng Pan

> Add missing recordHiveCall for getPartitionNames
> 
>
> Key: SPARK-44204
> URL: https://issues.apache.org/jira/browse/SPARK-44204
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-44192) Support R 4.3.1

2023-06-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44192:
-

Assignee: Yang Jie

> Support R 4.3.1
> ---
>
> Key: SPARK-44192
> URL: https://issues.apache.org/jira/browse/SPARK-44192
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://cran.r-project.org/doc/manuals/r-release/NEWS.html






[jira] [Resolved] (SPARK-44192) Support R 4.3.1

2023-06-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44192.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41754
[https://github.com/apache/spark/pull/41754]

> Support R 4.3.1
> ---
>
> Key: SPARK-44192
> URL: https://issues.apache.org/jira/browse/SPARK-44192
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>
> https://cran.r-project.org/doc/manuals/r-release/NEWS.html






[jira] [Assigned] (SPARK-40513) SPIP: Support Docker Official Image for Spark

2023-06-27 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-40513:
---

Assignee: Yikun Jiang

> SPIP: Support Docker Official Image for Spark
> -
>
> Key: SPARK-40513
> URL: https://issues.apache.org/jira/browse/SPARK-40513
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Docker
>Affects Versions: 3.5.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>  Labels: SPIP
>
> This SPIP proposes adding a [Docker Official 
> Image (DOI)|https://github.com/docker-library/official-images] so that the 
> Spark Docker images meet Docker's quality standards and are available to 
> users who want to run Apache Spark via a Docker image.
> Several other [Apache projects already release Docker Official 
> Images|https://hub.docker.com/search?q=apache_filter=official], such 
> as [flink|https://hub.docker.com/_/flink], 
> [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], 
> [zookeeper|https://hub.docker.com/_/zookeeper], and 
> [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). 
> The large download counts show real user demand, and the precedent set by 
> other Apache projects suggests Spark can do the same.
> After support:
>  * The Dockerfile will still be maintained by the Apache Spark community and 
> reviewed by Docker.
>  * The images will be maintained by the Docker community, ensuring they meet 
> the Docker community's quality standards.
> This will also reduce the extra Docker image maintenance effort (such as 
> frequent rebuilds and image security updates) for the Apache Spark community.
>  
> SPIP DOC: 
> [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o]
> DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3]






[jira] [Resolved] (SPARK-40513) SPIP: Support Docker Official Image for Spark

2023-06-27 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40513.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 34
[https://github.com/apache/spark-docker/pull/34]

> SPIP: Support Docker Official Image for Spark
> -
>
> Key: SPARK-40513
> URL: https://issues.apache.org/jira/browse/SPARK-40513
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Docker
>Affects Versions: 3.5.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>  Labels: SPIP
> Fix For: 3.5.0
>
>
> This SPIP proposes adding a [Docker Official 
> Image (DOI)|https://github.com/docker-library/official-images] so that the 
> Spark Docker images meet Docker's quality standards and are available to 
> users who want to run Apache Spark via a Docker image.
> Several other [Apache projects already release Docker Official 
> Images|https://hub.docker.com/search?q=apache_filter=official], such 
> as [flink|https://hub.docker.com/_/flink], 
> [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], 
> [zookeeper|https://hub.docker.com/_/zookeeper], and 
> [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). 
> The large download counts show real user demand, and the precedent set by 
> other Apache projects suggests Spark can do the same.
> After support:
>  * The Dockerfile will still be maintained by the Apache Spark community and 
> reviewed by Docker.
>  * The images will be maintained by the Docker community, ensuring they meet 
> the Docker community's quality standards.
> This will also reduce the extra Docker image maintenance effort (such as 
> frequent rebuilds and image security updates) for the Apache Spark community.
>  
> SPIP DOC: 
> [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o]
> DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3]






[jira] [Resolved] (SPARK-44175) Remove useless lib64 path link in dockerfile

2023-06-27 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-44175.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark-docker/pull/48

> Remove useless lib64 path link in dockerfile
> 
>
> Key: SPARK-44175
> URL: https://issues.apache.org/jira/browse/SPARK-44175
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 3.5.0
>Reporter: Yikun Jiang
>Priority: Major
> Fix For: 3.5.0
>
>