[jira] [Created] (SPARK-46088) Add a self-contained example about creating dataframe from jdbc

2023-11-23 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46088:
---

 Summary: Add a self-contained example about creating dataframe 
from jdbc
 Key: SPARK-46088
 URL: https://issues.apache.org/jira/browse/SPARK-46088
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: BingKun Pan
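A minimal sketch of the kind of self-contained example this ticket asks for, in
PySpark (the JDBC URL, table name, and credentials below are placeholders, and a
matching JDBC driver jar is assumed to be on the classpath, e.g. via spark.jars):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-example").getOrCreate()

# Read a table through JDBC into a DataFrame; every option value here is a placeholder.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "public.my_table")
    .option("user", "username")
    .option("password", "password")
    .load()
)

df.printSchema()
df.show()
{code}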









[jira] [Updated] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator

2023-11-23 Thread Vara Bonthu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vara Bonthu updated SPARK-46054:

Description: 
*Description:*

This proposal recommends adopting [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and is used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.

*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially reported 45.

 * The primary issue at hand is that this project resides under the 
GoogleCloudPlatform GitHub organization and is exclusively moderated by a 
Google employee. Concerns have been raised by numerous users and customers 
regarding the maintenance of the repository.

 * The existing Google maintainers are constrained by limitations in terms of 
time and support, which negatively impacts both the project and its user 
community.

 

*Recent Developments:*
 * During KubeCon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the Apple 
infrastructure team engaged in discussions with Google's team, specifically 
with Marcin Wielgus. They expressed their interest in contributing the project 
to either the Kubeflow or Apache Spark community.

 * *Marcin from Google confirmed their willingness to donate the 
project to either of these communities.*

 * An adoption process has been initiated by the Kubeflow project under CNCF, 
as documented in the following thread: [Link to the 
thread|https://github.com/kubeflow/community/issues/648].

 

*Primary Goal:*
 * The primary goal is to ensure the collaborative support and adoption of 
Google's Spark Operator by the Apache Spark community, thereby avoiding the 
development of redundant tools and reducing confusion among users.

*Next Steps:*
 * *Meeting with Apache Spark Working Group Maintainers:* We propose arranging 
a meeting with the Apache Spark working group maintainers to delve deeper into 
this matter, address any questions or concerns they may have, and collectively 
work towards a decision.

 * *Establish a New Working Group:* Upon reaching an agreement, we intend to 
create a new working group comprising members from diverse organizations who 
are willing to contribute and collaborate on this initiative.

 * *Repository Transfer:* Our plan involves transferring the project repository 
from Google's organization to either the Apache or Kubeflow organization, 
aligning with the chosen community.

 * *Roadmap Development:* We will formulate a new roadmap that encompasses 
immediate issue resolution and a long-term design strategy aimed at enhancing 
performance, scalability, and security for this tool.

 
We believe that working towards one Spark Operator will benefit the Apache 
Spark community and address the current maintenance challenges. Your feedback 
and support in this matter are highly valued. Let's collaborate to ensure a 
robust and well-maintained Spark Operator for the Apache Spark community's 
benefit.

*Community members are encouraged to leave their comments or give a thumbs-up 
to express their support for adopting Google's Spark Operator as the official 
Apache Spark operator.*

 

*Proposed Authors*

Vara Bonthu (AWS)

Marcin Wielgus (Google)

 

  was:
*Description:*

This proposal recommends adopting [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and is used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.

*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially reported 45.

 * The primary issue at hand is that this project resides under the 
GoogleCloudPlatform GitHub 

[jira] [Resolved] (SPARK-46084) Refactor data type casting operation for Categorical type.

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46084.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43993
[https://github.com/apache/spark/pull/43993]

> Refactor data type casting operation for Categorical type.
> --
>
> Key: SPARK-46084
> URL: https://issues.apache.org/jira/browse/SPARK-46084
> Project: Spark
>  Issue Type: Bug
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Using official API for better performance and readability.
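A minimal sketch of the kind of Categorical casting this refactor touches (an
assumption for illustration, since the ticket does not name a specific API):

{code:python}
import pandas as pd
import pyspark.pandas as ps

psser = ps.Series(["a", "b", "a", "c"])

# Cast to a categorical dtype through the pandas-on-Spark astype API.
cat = psser.astype(pd.CategoricalDtype(categories=["a", "b", "c"]))

print(cat.dtype)           # categorical dtype with categories ['a', 'b', 'c']
print(cat.cat.categories)  # Index(['a', 'b', 'c'], dtype='object')
{code}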






[jira] [Resolved] (SPARK-46073) Remove the special resolution of UnresolvedNamespace for certain commands

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46073.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43980
[https://github.com/apache/spark/pull/43980]

> Remove the special resolution of UnresolvedNamespace for certain commands
> -
>
> Key: SPARK-46073
> URL: https://issues.apache.org/jira/browse/SPARK-46073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46083:
-

Assignee: Hyukjin Kwon

> Make SparkNoSuchElementException as a canonical error API
> -
>
> Key: SPARK-46083
> URL: https://issues.apache.org/jira/browse/SPARK-46083
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
> It should be a canonical error API, documented properly.






[jira] [Resolved] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46083.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43992
[https://github.com/apache/spark/pull/43992]

> Make SparkNoSuchElementException as a canonical error API
> -
>
> Key: SPARK-46083
> URL: https://issues.apache.org/jira/browse/SPARK-46083
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
> It should be a canonical error API, documented properly.






[jira] [Updated] (SPARK-46087) Sync PySpark dependencies in docs and dev requirements

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46087:
---
Labels: pull-request-available  (was: )

> Sync PySpark dependencies in docs and dev requirements
> --
>
> Key: SPARK-46087
> URL: https://issues.apache.org/jira/browse/SPARK-46087
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> There is an inconsistency between the docs and the dev environment. We should sync them.






[jira] [Created] (SPARK-46087) Sync PySpark dependencies in docs and dev requirements

2023-11-23 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46087:
---

 Summary: Sync PySpark dependencies in docs and dev requirements
 Key: SPARK-46087
 URL: https://issues.apache.org/jira/browse/SPARK-46087
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


There is an inconsistency between the docs and the dev environment. We should sync them.






[jira] [Comment Edited] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789328#comment-17789328
 ] 

Marc Le Bihan edited comment on SPARK-45311 at 11/24/23 5:39 AM:
-

A breakpoint in the {{catalogueJeuxDeDonnees()}} test, at 
{{org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)}}:

!JavaTypeInference_116.png!

The caller of it:

!sparkIssue_02.png!

{{OMD_ID}} is a generic type, compatible with {{CatalogueId}}.

 


was (Author: mlebihan):
A breakpoint in :
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
 

!JavaTypeInference_116.png!

The caller of it :

!sparkIssue_02.png!

{{OMD_ID}} is a generic, compatible with {{{}CatalogueId{}}}.

 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone in facing this problem. Reading it, 
> you'll see that I've attempted to debug it, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and then all along their processing, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted 
> and working before.
>  






[jira] [Comment Edited] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789328#comment-17789328
 ] 

Marc Le Bihan edited comment on SPARK-45311 at 11/24/23 5:39 AM:
-

A breakpoint in the {{catalogueJeuxDeDonnees()}} test, at 
{{org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)}}:

!JavaTypeInference_116.png!

The caller of it:

!sparkIssue_02.png!

{{OMD_ID}} is a generic type, compatible with {{CatalogueId}}.

 


was (Author: mlebihan):
A breakpoint in {{catalogueJeuxDeDonnees()}} test,  at : 
{{org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116}}
 

!JavaTypeInference_116.png!

The caller of it :

!sparkIssue_02.png!

{{OMD_ID}} is a generic, compatible with {{{}CatalogueId{}}}.

 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone in facing this problem. Reading it, 
> you'll see that I've attempted to debug it, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and then all along their processing, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted 
> and working before.
>  






[jira] [Updated] (SPARK-45356) Adjust the Maven daily test configuration

2023-11-23 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45356:
-
Summary: Adjust the Maven daily test configuration  (was: Optimize the 
Maven daily test configuration)

> Adjust the Maven daily test configuration
> -
>
> Key: SPARK-45356
> URL: https://issues.apache.org/jira/browse/SPARK-45356
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789328#comment-17789328
 ] 

Marc Le Bihan commented on SPARK-45311:
---

A breakpoint in:
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)

!JavaTypeInference_116.png!

The caller of it:

!sparkIssue_02.png!

{{OMD_ID}} is a generic type, compatible with {{CatalogueId}}.

 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone in facing this problem. Reading it, 
> you'll see that I've attempted to debug it, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and then all along their processing, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted 
> and working before.
>  






[jira] [Updated] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Le Bihan updated SPARK-45311:
--
Attachment: sparkIssue_02.png

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone in facing this problem. Reading it, 
> you'll see that I've attempted to debug it, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and then all along their processing, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted 
> and working before.
>  






[jira] [Updated] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Le Bihan updated SPARK-45311:
--
Attachment: JavaTypeInference_116.png

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
> Attachments: JavaTypeInference_116.png
>
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking over the Internet, I wasn't alone in facing this problem. Reading it, 
> you'll see that I've attempted to debug it, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and then all along their processing, so that when the encoder is at 
> any point where it has to throw an exception, it knows the field it is 
> handling in its specific call and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted 
> and working before.
>  






[jira] [Comment Edited] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789321#comment-17789321
 ] 

Marc Le Bihan edited comment on SPARK-45311 at 11/24/23 5:15 AM:
-

I've updated the  [https://gitlab.com/territoirevif/minimal-tests-spark-issue] 
testing project accordingly.
{code:java}
---
Test set: 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
---
Tests run: 6, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 8.715 s <<< 
FAILURE! - in 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
catalogueJeuxDeDonneesEtRessources  Time elapsed: 1.498 s  <<< ERROR!
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class 
[Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and 
[Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
    at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
    at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
    at org.apache.spark.sql.Encoders.bean(Encoders.scala)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvJeuxDeDonneesDataset.catalogueDataset(CatalogueDatagouvJeuxDeDonneesDataset.java:100)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvJeuxDeDonneesDataset.catalogueDataset(CatalogueDatagouvJeuxDeDonneesDataset.java:88)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueJeuxDeDonneesEtRessources(CatalogueDatagouvIT.java:161)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
    at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
    at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
    at 

[jira] [Updated] (SPARK-46058) [CORE] Add separate flag for privateKeyPassword

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46058:
---
Labels: pull-request-available  (was: )

> [CORE] Add separate flag for privateKeyPassword
> ---
>
> Key: SPARK-46058
> URL: https://issues.apache.org/jira/browse/SPARK-46058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
>
> Right now with config inheritance we support:
>  * JKS with password A, PEM with password B
>  * JKS with no password, PEM with password A
>  * JKS and PEM with no password
>  
> But we do not support the case where JKS has a password and PEM does not. If 
> we set keyPassword we will attempt to use it, and cannot set 
> `spark.ssl.rpc.keyPassword` to null. So let's make it a separate flag as the 
> easiest workaround.
>  
> This was noticed while migrating some existing deployments to the RPC SSL 
> support, where we use OpenSSL for RPC with a key that has no password.
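A sketch of the gap described above, written against SparkConf for illustration;
only keyPassword and spark.ssl.rpc.keyPassword come from the ticket text, and
spark.ssl.rpc.privateKeyPassword is the proposed flag, which does not exist yet:

{code:python}
from pyspark import SparkConf

conf = SparkConf()

# JKS keystore key protected by a password; with config inheritance this value
# also becomes the default for spark.ssl.rpc.keyPassword.
conf.set("spark.ssl.keyPassword", "jks-secret")

# The PEM private key used for RPC SSL has no password, but the inherited
# spark.ssl.rpc.keyPassword cannot be unset today:
# conf.set("spark.ssl.rpc.keyPassword", None)        # not supported

# Proposed workaround: a dedicated flag for the PEM key's password.
# conf.set("spark.ssl.rpc.privateKeyPassword", "")   # hypothetical new flag
{code}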






[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789321#comment-17789321
 ] 

Marc Le Bihan commented on SPARK-45311:
---

I've updated the  [https://gitlab.com/territoirevif/minimal-tests-spark-issue] 
testing project accordingly.

I haven't gone further in debugging yet, because I'm having trouble attaching 
snapshot sources to this project. From it, IntelliJ sees the source roots in 
the second project, where Spark 3.4.2-SNAPSHOT is compiled, and offers to attach 
them, but when that is accepted it has no effect. I'm left with 
{{JavaTypeInference.scala}} and others having no sources (methods shown as = ???). I'm 
looking on Stack Overflow in case someone has a clue about that.
{code:java}
---
Test set: 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
---
Tests run: 6, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 8.715 s <<< 
FAILURE! - in 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
catalogueJeuxDeDonneesEtRessources  Time elapsed: 1.498 s  <<< ERROR!
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class 
[Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and 
[Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:116)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.$anonfun$encoderFor$1(JavaTypeInference.scala:140)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:929)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:138)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:60)
    at 
org.apache.spark.sql.catalyst.JavaTypeInference$.encoderFor(JavaTypeInference.scala:53)
    at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:62)
    at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179)
    at org.apache.spark.sql.Encoders.bean(Encoders.scala)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvJeuxDeDonneesDataset.catalogueDataset(CatalogueDatagouvJeuxDeDonneesDataset.java:100)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvJeuxDeDonneesDataset.catalogueDataset(CatalogueDatagouvJeuxDeDonneesDataset.java:88)
    at 
fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueJeuxDeDonneesEtRessources(CatalogueDatagouvIT.java:161)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
    at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
    at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at 

[jira] [Created] (SPARK-46086) Fix potential non-atomic operation issues in ReloadingX509TrustManager

2023-11-23 Thread Yang Jie (Jira)
Yang Jie created SPARK-46086:


 Summary: Fix potential non-atomic operation issues in 
ReloadingX509TrustManager
 Key: SPARK-46086
 URL: https://issues.apache.org/jira/browse/SPARK-46086
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-46086) Fix potential non-atomic operation issues in ReloadingX509TrustManager

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46086:
---
Labels: pull-request-available  (was: )

> Fix potential non-atomic operation issues in ReloadingX509TrustManager
> --
>
> Key: SPARK-46086
> URL: https://issues.apache.org/jira/browse/SPARK-46086
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45568) WholeStageCodegenSparkSubmitSuite flakiness

2023-11-23 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45568:
---
Component/s: Tests

> WholeStageCodegenSparkSubmitSuite flakiness
> ---
>
> Key: SPARK-45568
> URL: https://issues.apache.org/jira/browse/SPARK-45568
> Project: Spark
>  Issue Type: Test
>  Components: Tests, Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
>







[jira] [Updated] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect

2023-11-23 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45751:
---
Component/s: Documentation

> The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the 
> official website is incorrect
> 
>
> Key: SPARK-45751
> URL: https://issues.apache.org/jira/browse/SPARK-45751
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
> Attachments: the default value.png, the value on the website.png
>
>







[jira] [Updated] (SPARK-45791) Rename `SparkConnectSessionHodlerSuite.scala` to `SparkConnectSessionHolderSuite.scala`

2023-11-23 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45791:
---
Component/s: Tests

> Rename `SparkConnectSessionHodlerSuite.scala` to 
> `SparkConnectSessionHolderSuite.scala`
> ---
>
> Key: SPARK-45791
> URL: https://issues.apache.org/jira/browse/SPARK-45791
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>







[jira] [Updated] (SPARK-46016) Fix pandas API support list properly

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46016:
---
Labels: pull-request-available  (was: )

> Fix pandas API support list properly
> 
>
> Key: SPARK-46016
> URL: https://issues.apache.org/jira/browse/SPARK-46016
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Currently the Supported pandas API list is not generated properly, so we should fix it.






[jira] [Updated] (SPARK-46016) Fix pandas API support list properly

2023-11-23 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46016:

Summary: Fix pandas API support list properly  (was: Correct Supported 
pandas API list)

> Fix pandas API support list properly
> 
>
> Key: SPARK-46016
> URL: https://issues.apache.org/jira/browse/SPARK-46016
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently the Supported pandas API list is not generated properly, so we should fix it.






[jira] [Updated] (SPARK-46016) Correct Supported pandas API list

2023-11-23 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46016:

Summary: Correct Supported pandas API list  (was: Fix the script for 
Supported pandas API to work properly)

> Correct Supported pandas API list
> -
>
> Key: SPARK-46016
> URL: https://issues.apache.org/jira/browse/SPARK-46016
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently the Supported pandas API list is not generated properly, so we should fix it.






[jira] [Assigned] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46082:


Assignee: Hyukjin Kwon

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should include functions.
> {code}
> 
>   
> {code}






[jira] [Resolved] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46082.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43991
[https://github.com/apache/spark/pull/43991]

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should include functions.
> {code}
> 
>   
> {code}






[jira] [Resolved] (SPARK-46067) Upgrade commons-compress to 1.25.0

2023-11-23 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46067.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43974
[https://github.com/apache/spark/pull/43974]

> Upgrade commons-compress to 1.25.0
> --
>
> Key: SPARK-46067
> URL: https://issues.apache.org/jira/browse/SPARK-46067
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://commons.apache.org/proper/commons-compress/changes-report.html#a1.25.0






[jira] [Assigned] (SPARK-46067) Upgrade commons-compress to 1.25.0

2023-11-23 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46067:


Assignee: Yang Jie

> Upgrade commons-compress to 1.25.0
> --
>
> Key: SPARK-46067
> URL: https://issues.apache.org/jira/browse/SPARK-46067
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> https://commons.apache.org/proper/commons-compress/changes-report.html#a1.25.0






[jira] [Updated] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46085:
---
Labels: pull-request-available  (was: )

> Dataset.groupingSets in Scala Spark Connect client
> --
>
> Key: SPARK-46085
> URL: https://issues.apache.org/jira/browse/SPARK-46085
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Scala Spark Connect client for SPARK-45929






[jira] [Created] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46085:


 Summary: Dataset.groupingSets in Scala Spark Connect client
 Key: SPARK-46085
 URL: https://issues.apache.org/jira/browse/SPARK-46085
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Scala Spark Connect client for SPARK-45929






[jira] [Updated] (SPARK-46084) Refactor data type casting operation for Categorical type.

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46084:
---
Labels: pull-request-available  (was: )

> Refactor data type casting operation for Categorical type.
> --
>
> Key: SPARK-46084
> URL: https://issues.apache.org/jira/browse/SPARK-46084
> Project: Spark
>  Issue Type: Bug
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Using official API for better performance and readability.






[jira] [Resolved] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46080.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43989
[https://github.com/apache/spark/pull/43989]

> Upgrade Cloudpickle to 3.0.0
> 
>
> Key: SPARK-46080
> URL: https://issues.apache.org/jira/browse/SPARK-46080
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> It includes official support of Python 3.12 
> (https://github.com/cloudpipe/cloudpickle/pull/517)






[jira] [Updated] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46083:
---
Labels: pull-request-available  (was: )

> Make SparkNoSuchElementException as a canonical error API
> -
>
> Key: SPARK-46083
> URL: https://issues.apache.org/jira/browse/SPARK-46083
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
> It should be a canonical error API, documented properly.






[jira] [Assigned] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46080:
-

Assignee: Hyukjin Kwon

> Upgrade Cloudpickle to 3.0.0
> 
>
> Key: SPARK-46080
> URL: https://issues.apache.org/jira/browse/SPARK-46080
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> It includes official support of Python 3.12 
> (https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46084) Refactor data type casting operation for Categorical type.

2023-11-23 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46084:

Description: Using official API for better performance and readability.  
(was: Using official API for better performance.)

> Refactor data type casting operation for Categorical type.
> --
>
> Key: SPARK-46084
> URL: https://issues.apache.org/jira/browse/SPARK-46084
> Project: Spark
>  Issue Type: Bug
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Using official API for better performance and readability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46083:


 Summary: Make SparkNoSuchElementException as a canonical error API
 Key: SPARK-46083
 URL: https://issues.apache.org/jira/browse/SPARK-46083
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
It should be a canonical error API, documented properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46084) Refactor data type casting operation for Categorical type.

2023-11-23 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46084:
---

 Summary: Refactor data type casting operation for Categorical type.
 Key: SPARK-46084
 URL: https://issues.apache.org/jira/browse/SPARK-46084
 Project: Spark
  Issue Type: Bug
  Components: PS
Affects Versions: 4.0.0
Reporter: Haejoon Lee


Using official API for better performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46082:
---
Labels: pull-request-available  (was: )

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should include the functions.
> {code}
> 
>   
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46082:


 Summary: Fix protobuf string representation for Pandas Functions 
API with Spark Connect
 Key: SPARK-46082
 URL: https://issues.apache.org/jira/browse/SPARK-46082
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
df = spark.range(1)
df.mapInPandas(lambda x: x, df.schema)._plan.print()
{code}

prints as below. It should include the functions.

{code}

  
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42891) Implement CoGrouped Map API

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42891:
---
Labels: pull-request-available  (was: )

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Implement CoGrouped Map API.
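
For reference, the cogrouped map API in question looks roughly like the sketch 
below in classic PySpark (column names, data, and the output schema are 
illustrative assumptions; pandas and pyarrow are required); the ticket covers 
exposing the same API through Spark Connect:

{code:python}
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([(1, 1.0), (2, 2.0), (1, 3.0)], ("id", "v1"))
df2 = spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))

def merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Each call receives the two cogrouped pandas DataFrames for one key.
    return pd.merge(left, right, on="id")

result = (
    df1.groupby("id")
       .cogroup(df2.groupby("id"))
       .applyInPandas(merge, schema="id long, v1 double, v2 string")
)
result.show()
{code}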



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46080:
--
Parent: SPARK-45981
Issue Type: Sub-task  (was: Improvement)

> Upgrade Cloudpickle to 3.0.0
> 
>
> Key: SPARK-46080
> URL: https://issues.apache.org/jira/browse/SPARK-46080
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> It includes official support of Python 3.12 
> (https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46081) Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46081:
-

Assignee: Dongjoon Hyun

> Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`
> -
>
> Key: SPARK-46081
> URL: https://issues.apache.org/jira/browse/SPARK-46081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46081) Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46081.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43990
[https://github.com/apache/spark/pull/43990]

> Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`
> -
>
> Key: SPARK-46081
> URL: https://issues.apache.org/jira/browse/SPARK-46081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46079) Install `torch` nightly only at Python 3.12 in Infra docker image

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46079:
-

Assignee: Dongjoon Hyun

> Install `torch` nightly only at Python 3.12 in Infra docker image
> -
>
> Key: SPARK-46079
> URL: https://issues.apache.org/jira/browse/SPARK-46079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46079) Install `torch` nightly only at Python 3.12 in Infra docker image

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46079.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43988
[https://github.com/apache/spark/pull/43988]

> Install `torch` nightly only at Python 3.12 in Infra docker image
> -
>
> Key: SPARK-46079
> URL: https://issues.apache.org/jira/browse/SPARK-46079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46081) Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46081:
---
Labels: pull-request-available  (was: )

> Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`
> -
>
> Key: SPARK-46081
> URL: https://issues.apache.org/jira/browse/SPARK-46081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46080:
---
Labels: pull-request-available  (was: )

> Upgrade Cloudpickle to 3.0.0
> 
>
> Key: SPARK-46080
> URL: https://issues.apache.org/jira/browse/SPARK-46080
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> It includes official support of Python 3.12 
> (https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46081) Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`

2023-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46081:
-

 Summary: Set DEDICATED_JVM_SBT_TESTS in `build_java21.yml`
 Key: SPARK-46081
 URL: https://issues.apache.org/jira/browse/SPARK-46081
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46080:


 Summary: Upgrade Cloudpickle to 3.0.0
 Key: SPARK-46080
 URL: https://issues.apache.org/jira/browse/SPARK-46080
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


It includes official support of Python 3.12 
(https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42905) pyspark.ml.stat.Correlation - Spearman Correlation method giving incorrect and inconsistent results for the same DataFrame if it has huge amount of Ties.

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42905:
---
Labels: correctness pull-request-available  (was: correctness)

> pyspark.ml.stat.Correlation - Spearman Correlation method giving incorrect 
> and inconsistent results for the same DataFrame if it has huge amount of Ties.
> -
>
> Key: SPARK-42905
> URL: https://issues.apache.org/jira/browse/SPARK-42905
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.3.0
>Reporter: dronzer
>Priority: Critical
>  Labels: correctness, pull-request-available
> Attachments: image-2023-03-23-10-51-28-420.png, 
> image-2023-03-23-10-52-11-481.png, image-2023-03-23-10-52-49-392.png, 
> image-2023-03-23-10-53-37-461.png, image-2023-03-23-10-55-26-879.png
>
>
> pyspark.ml.stat.Correlation
> Following is the scenario where the Correlation function fails to give 
> correct Spearman coefficient results.
> Tested example -> the Spark DataFrame has 2 columns, A and B.
> !image-2023-03-23-10-55-26-879.png|width=562,height=162!
> Column A has 3 distinct values and a total of 108 million rows.
> Column B has 4 distinct values and a total of 108 million rows.
> If I calculate the correlation for this DataFrame with Python pandas DF.corr, 
> it gives the correct answer, and even if I run the same code multiple times, 
> the same answer is produced. (Each column has only 3-4 distinct values.)
> !image-2023-03-23-10-53-37-461.png|width=468,height=287!
>  
> In Spark, by contrast, the Spearman correlation produces *different results* 
> for the *same DataFrame* on multiple runs (see below), even though each column 
> in this DataFrame has only 3-4 distinct values.
> !image-2023-03-23-10-52-49-392.png|width=516,height=322!
>  
> Basically, pandas DF.corr gives the same result for the same DataFrame on 
> multiple runs, which is the expected behaviour. In Spark, however, the same 
> data gives a different result; moreover, re-running the same cell with the same 
> data multiple times produces different results, so the output is inconsistent.
> Looking at the data, the only observation I could draw is the ties in the data 
> (only 3-4 distinct values over 108M rows). This scenario is not handled by the 
> Spark Correlation method, since the same data produces consistent results in 
> Python with df.corr.
> The only workaround we could find to get consistent output, matching Python, 
> in Spark is using a Pandas UDF as shown below:
> !image-2023-03-23-10-52-11-481.png|width=518,height=111!
> !image-2023-03-23-10-51-28-420.png|width=509,height=270!
>  
> We also tried the pyspark.pandas.DataFrame.corr method and it produces incorrect 
> and inconsistent results for this case too.
> Only the Pandas UDF seems to provide consistent results.
>  
> Another point to note: if I add some random noise to the data, which in turn 
> increases the number of distinct values, it again gives consistent results on 
> every run. This makes me believe that the Python version handles ties correctly 
> and gives consistent results no matter how many ties exist, whereas the PySpark 
> method is somehow not able to handle many ties in the data.
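
For illustration, one possible shape of the Pandas UDF workaround mentioned 
above is sketched below (column names, data, and the single-group trick are 
assumptions, not taken from the attached screenshots). Funnelling all rows into 
a single pandas group will not scale to 108M rows, but it shows the idea of 
letting pandas rank the ties:

{code:python}
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2), (1, 3), (2, 3), (3, 4)], ("A", "B"))

def spearman(pdf: pd.DataFrame) -> pd.DataFrame:
    # pandas uses average ranks for ties, which gives stable results.
    corr = pdf["A"].corr(pdf["B"], method="spearman")
    return pd.DataFrame({"spearman_corr": [corr]})

result = (
    df.withColumn("g", F.lit(1))   # one artificial group covering the whole frame
      .groupBy("g")
      .applyInPandas(spearman, schema="spearman_corr double")
)
result.show()
{code}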



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46079) Install `torch` nightly only at Python 3.12 in Infra docker image

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46079:
---
Labels: pull-request-available  (was: )

> Install `torch` nightly only at Python 3.12 in Infra docker image
> -
>
> Key: SPARK-46079
> URL: https://issues.apache.org/jira/browse/SPARK-46079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46079) Install `torch` nightly only at Python 3.12 in Infra docker image

2023-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46079:
-

 Summary: Install `torch` nightly only at Python 3.12 in Infra 
docker image
 Key: SPARK-46079
 URL: https://issues.apache.org/jira/browse/SPARK-46079
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46076.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43986
[https://github.com/apache/spark/pull/43986]

> Remove `unittest` deprecated alias usage for Python 3.12
> 
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46076:
--
Summary: Remove `unittest` deprecated alias usage for Python 3.12  (was: 
Remove `unittest` alias usage for Python 3.12)

> Remove `unittest` deprecated alias usage for Python 3.12
> 
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46076) Remove `unittest` alias usage for Python 3.12

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46076:
-

Assignee: Dongjoon Hyun

> Remove `unittest` alias usage for Python 3.12
> -
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46078) Upgrade `pytorch` for Python 3.12

2023-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46078:
-

 Summary: Upgrade `pytorch` for Python 3.12
 Key: SPARK-46078
 URL: https://issues.apache.org/jira/browse/SPARK-46078
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun


https://github.com/pytorch/pytorch/issues/110436



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46076) Remove `unittest` alias usage for Python 3.12

2023-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46076:
--
Parent: SPARK-45981
Issue Type: Sub-task  (was: Test)

> Remove `unittest` alias usage for Python 3.12
> -
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46077) Error in postgresql when pushing down filter by timestamp_ntz field

2023-11-23 Thread Marina Krasilnikova (Jira)
Marina Krasilnikova created SPARK-46077:
---

 Summary: Error in postgresql when pushing down filter by 
timestamp_ntz field
 Key: SPARK-46077
 URL: https://issues.apache.org/jira/browse/SPARK-46077
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Marina Krasilnikova


code to reproduce:

SparkSession sparkSession = SparkSession
.builder()
.appName("test-app")
.master("local[*]")
.config("spark.sql.timestampType", "TIMESTAMP_NTZ")
.getOrCreate();

String url = "...";

String catalogPropPrefix = "spark.sql.catalog.myc";
sparkSession.conf().set(catalogPropPrefix, JDBCTableCatalog.class.getName());
sparkSession.conf().set(catalogPropPrefix + ".url", url);

Map<String, String> options = new HashMap<>();
options.put("driver", "org.postgresql.Driver");
// options.put("pushDownPredicate", "false");  it works fine if  this line is 
uncommented

Dataset<Row> dataset = sparkSession.read()
.options(options)
.table("myc.demo.`My table`");

dataset.createOrReplaceTempView("view1");
String sql = "select * from view1 where `my date` = '2021-04-01 00:00:00'";
Dataset<Row> result = sparkSession.sql(sql);
result.show();
result.printSchema();

 

Field `my date` is of type timestamp. This code results in an 
org.postgresql.util.PSQLException syntax error, because the resulting SQL lacks 
quotes around the timestamp literal in the filter condition (something like 
"my date" = 2021-04-01T00:00).
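
A rough PySpark equivalent of the Java repro above is sketched below (the 
connection URL, catalog name, and table are placeholders, and the fully 
qualified JDBCTableCatalog class name is an assumption about what the Java 
snippet imports):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("test-app")
    .master("local[*]")
    .config("spark.sql.timestampType", "TIMESTAMP_NTZ")
    .getOrCreate()
)

url = "jdbc:postgresql://localhost:5432/demo"  # placeholder
catalog = "spark.sql.catalog.myc"
spark.conf.set(catalog, "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set(catalog + ".url", url)

dataset = (
    spark.read
    .option("driver", "org.postgresql.Driver")
    # .option("pushDownPredicate", "false")  # reported to avoid the error
    .table("myc.demo.`My table`")
)
dataset.createOrReplaceTempView("view1")
result = spark.sql("select * from view1 where `my date` = '2021-04-01 00:00:00'")
result.show()
result.printSchema()
{code}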

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46076) Remove `unittest` alias usage for Python 3.12

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46076:
---
Labels: pull-request-available  (was: )

> Remove `unittest` alias usage for Python 3.12
> -
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46076) Remove `unittest` alias usage for Python 3.12

2023-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46076:
-

 Summary: Remove `unittest` alias usage for Python 3.12
 Key: SPARK-46076
 URL: https://issues.apache.org/jira/browse/SPARK-46076
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46075) Refactor SparkConnectSessionManager to not use guava cache

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46075:
---
Labels: pull-request-available  (was: )

> Refactor SparkConnectSessionManager to not use guava cache
> --
>
> Key: SPARK-46075
> URL: https://issues.apache.org/jira/browse/SPARK-46075
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>
> Guava cache gives limited control over session eviction; for example, more 
> complex session eviction policies cannot be expressed. Refactor it to be more 
> similar to SparkConnectExecutionManager. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37660) Spark-3.2.0 Fetch Hbase Data not working

2023-11-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789235#comment-17789235
 ] 

Istvan Toth commented on SPARK-37660:
-

I have encountered this.

There are several issues:
- HBase returns the HBase region size instead of the split size, which may not 
be the same.
- HBase rounds the size to megabytes.
- Even if it didn't round to megabytes, I suspect that it only tallies HFiles, 
so for new tables the size may still be zero until the first HFile is written.

> Spark-3.2.0 Fetch Hbase Data not working
> 
>
> Key: SPARK-37660
> URL: https://issues.apache.org/jira/browse/SPARK-37660
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
> Environment: Hadoop version : hadoop-2.9.2
> HBase version : hbase-2.2.5
> Spark version : spark-3.2.0-bin-without-hadoop
> java version : jdk1.8.0_151
> scala version : scala-sdk-2.12.10
> os version : Red Hat Enterprise Linux Server release 6.6 (Santiago)
>Reporter: Bhavya Raj Sharma
>Priority: Major
>
> Below is the sample code snippet that is used to fetch data from HBase. This 
> used to work fine with spark-3.1.1.
> However, after upgrading to spark-3.2.0 it is not working. The issue is that it 
> does not throw any exception, it just doesn't fill the RDD.
>  
> {code:java}
>  
>    def getInfo(sc: SparkContext, startDate:String, cachingValue: Int, 
> sparkLoggerParams: SparkLoggerParams, zkIP: String, zkPort: String): 
> RDD[(String)] = {
> val scan = new Scan
>     scan.addFamily("family")
>     scan.addColumn("family","time")
>     val rdd = getHbaseConfiguredRDDFromScan(sc, zkIP, zkPort, "myTable", 
> scan, cachingValue, sparkLoggerParams)
>     val output: RDD[(String)] = rdd.map { row =>
>       (Bytes.toString(row._2.getRow))
>     }
>     output
>   }
>  
> def getHbaseConfiguredRDDFromScan(sc: SparkContext, zkIP: String, zkPort: 
> String, tableName: String,
>                                     scan: Scan, cachingValue: Int, 
> sparkLoggerParams: SparkLoggerParams): NewHadoopRDD[ImmutableBytesWritable, 
> Result] = {
>     scan.setCaching(cachingValue)
>     val scanString = 
> Base64.getEncoder.encodeToString(org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(scan).toByteArray)
>     val hbaseContext = new SparkHBaseContext(zkIP, zkPort)
>     val hbaseConfig = hbaseContext.getConfiguration()
>     hbaseConfig.set(TableInputFormat.INPUT_TABLE, tableName)
>     hbaseConfig.set(TableInputFormat.SCAN, scanString)
>     sc.newAPIHadoopRDD(
>       hbaseConfig,
>       classOf[TableInputFormat],
>       classOf[ImmutableBytesWritable], classOf[Result]
>     ).asInstanceOf[NewHadoopRDD[ImmutableBytesWritable, Result]]
>   }
>  
> {code}
>  
> If we fetch using the scan directly, without using newAPIHadoopRDD, it works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-46011:
--
Epic Link: SPARK-43754

> Spark Connect session heartbeat / keepalive
> ---
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-23 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-46011.
---
Resolution: Won't Fix

Decided to not add this at this point.

> Spark Connect session heartbeat / keepalive
> ---
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46075) Refactor SparkConnectSessionManager to not use guava cache

2023-11-23 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46075:
-

 Summary: Refactor SparkConnectSessionManager to not use guava cache
 Key: SPARK-46075
 URL: https://issues.apache.org/jira/browse/SPARK-46075
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski


Guava cache gives limited control over session eviction; for example, more 
complex session eviction policies cannot be expressed. Refactor it to be more 
similar to SparkConnectExecutionManager. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46074:
---
Labels: pull-request-available  (was: )

> [CONNECT][SCALA] Insufficient details in error when a UDF fails
> ---
>
> Key: SPARK-46074
> URL: https://issues.apache.org/jira/browse/SPARK-46074
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when a UDF fails, the connect client does not receive the actual 
> error that caused the failure. 
> As an example, the error message looks like -
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: 
> grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to 
> stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
> task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
> org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
> defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). 
> SQLSTATE: 39000 {code}
> In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-23 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-46074:


 Summary: [CONNECT][SCALA] Insufficient details in error when a UDF 
fails
 Key: SPARK-46074
 URL: https://issues.apache.org/jira/browse/SPARK-46074
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar


Currently, when a UDF fails, the connect client does not receive the actual 
error that caused the failure. 

As an example, the error message looks like -
{code:java}
Exception in thread "main" org.apache.spark.SparkException: 
grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage 
failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 
in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). SQLSTATE: 
39000 {code}
In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46070) Precompile regex patterns in SparkDateTimeUtils.getZoneId

2023-11-23 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-46070:


Assignee: Tanel Kiis

> Precompile regex patterns in SparkDateTimeUtils.getZoneId
> -
>
> Key: SPARK-46070
> URL: https://issues.apache.org/jira/browse/SPARK-46070
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
>  Labels: pull-request-available
>
> SparkDateTimeUtils.getZoneId uses String.replaceFirst method, that internally 
> does a Pattern.compile(regex). This method is called once for each dataset 
> row when using functions like from_utc_timestamp.
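
The change is essentially hoisting the Pattern.compile out of the per-row path. 
A minimal sketch of the same idea in Python is below (the actual pattern used by 
getZoneId is not shown in this ticket, so the regex here is an assumption; note 
also that Python's re module caches compiled patterns, so the gain is much 
larger on the JVM than in this illustration):

{code:python}
import re

# Per-call compilation, analogous to String.replaceFirst, which calls
# Pattern.compile internally on every invocation.
def normalize_zone_slow(zone_id: str) -> str:
    return re.sub(r"^(GMT|UTC)(?=[+-])", "", zone_id, count=1)

# Compile once, reuse for every row.
_ZONE_PREFIX = re.compile(r"^(GMT|UTC)(?=[+-])")

def normalize_zone_fast(zone_id: str) -> str:
    return _ZONE_PREFIX.sub("", zone_id, count=1)

print(normalize_zone_fast("GMT+08:00"))  # prints "+08:00"
{code}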



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46070) Precompile regex patterns in SparkDateTimeUtils.getZoneId

2023-11-23 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46070.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43976
[https://github.com/apache/spark/pull/43976]

> Precompile regex patterns in SparkDateTimeUtils.getZoneId
> -
>
> Key: SPARK-46070
> URL: https://issues.apache.org/jira/browse/SPARK-46070
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SparkDateTimeUtils.getZoneId uses String.replaceFirst method, that internally 
> does a Pattern.compile(regex). This method is called once for each dataset 
> row when using functions like from_utc_timestamp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46069) Support unwrap timestamp type to date type

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46069:
---
Labels: pull-request-available  (was: )

> Support unwrap timestamp type to date type
> --
>
> Key: SPARK-46069
> URL: https://issues.apache.org/jira/browse/SPARK-46069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides

2023-11-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43980:
---

Assignee: Yash Kothari

> Add support for EXCEPT in select clause, similar to what databricks provides
> 
>
> Key: SPARK-43980
> URL: https://issues.apache.org/jira/browse/SPARK-43980
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yash Kothari
>Assignee: Yash Kothari
>Priority: Major
>  Labels: pull-request-available
>
> I'm looking for a way to incorporate the {{select * except(col1, ...)}} 
> clause provided by Databricks into my workflow. I don't use Databricks and 
> would like to introduce this {{select except}} clause either as a 
> spark-package or by contributing a change to Spark.
> However, I'm unsure about how to begin this process and would appreciate any 
> guidance from the community.
> [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples]
>  
> Thank you for your assistance.
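
For illustration, the requested clause in the Databricks dialect looks like the 
sketch below (table and column names are made up; the exact syntax Spark ends up 
supporting is defined by the linked pull request):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.0)], ("id", "name", "score")
).createOrReplaceTempView("t")

# Select every column except `name`.
spark.sql("SELECT * EXCEPT (name) FROM t").show()
{code}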



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides

2023-11-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43980.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43843
[https://github.com/apache/spark/pull/43843]

> Add support for EXCEPT in select clause, similar to what databricks provides
> 
>
> Key: SPARK-43980
> URL: https://issues.apache.org/jira/browse/SPARK-43980
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yash Kothari
>Assignee: Yash Kothari
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I'm looking for a way to incorporate the {{select * except(col1, ...)}} 
> clause provided by Databricks into my workflow. I don't use Databricks and 
> would like to introduce this {{select except}} clause either as a 
> spark-package or by contributing a change to Spark.
> However, I'm unsure about how to begin this process and would appreciate any 
> guidance from the community.
> [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples]
>  
> Thank you for your assistance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46062) CTE reference node does not inherit the flag `isStreaming` from CTE definition node

2023-11-23 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-46062:


Assignee: Jungtaek Lim

> CTE reference node does not inherit the flag `isStreaming` from CTE 
> definition node
> ---
>
> Key: SPARK-46062
> URL: https://issues.apache.org/jira/browse/SPARK-46062
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>
> Looks like this is a long standing bug.
> We figured out that CTE reference node would never set the isStreaming flag 
> to true, regardless of the value for flag in CTE definition. The node cannot 
> determine the right value of isStreaming flag by itself (likewise it cannot 
> determine about resolution by itself) but it has no parameter in constructor, 
> hence always takes the default (no children, so batch one).
> This may impact some rules which behave differently depending on the isStreaming 
> flag. It would no longer be a problem once CTE reference is replaced with CTE 
> definition at some point in "optimization phase", but all rules in analyzer 
> and optimizer being triggered before the rule takes effect may be impacted.
> We probably couldn't sync the flag in real time, but we should sync the flag 
> when we mark CTE reference to be "resolved". The rule `ResolveWithCTE` will 
> be a good place to do that.
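
For context, a minimal sketch of the kind of query where the flag matters is 
below (the rate source and view name are illustrative; with the fix, the CTE 
reference inherits the streaming flag from its definition):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A streaming relation: the CTE definition over it is streaming, so the CTE
# reference should report isStreaming = true as well.
spark.readStream.format("rate").load().createOrReplaceTempView("events")

df = spark.sql("""
    WITH recent AS (SELECT value FROM events WHERE value % 2 = 0)
    SELECT * FROM recent
""")
print(df.isStreaming)  # expected: True
{code}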



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator

2023-11-23 Thread Vara Bonthu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vara Bonthu updated SPARK-46054:

Description: 
*Description:*

This proposal aims to recommend the adoption of [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.

*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially reported 45.

 * The primary issue at hand is that this project resides under the 
GoogleCloudPlatform GitHub organization and is exclusively moderated by a 
Google employee. Concerns have been raised by numerous users and customers 
regarding the maintenance of the repository.

 * The existing Google maintainers are constrained by limitations in terms of 
time and support, which negatively impacts both the project and its user 
community.

 

*Recent Developments:*
 * During Kubecon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the Apple 
infrastructure team engaged in discussions with the Google's team, specifically 
with Marcin Wielgus. They expressed their interest in contributing the project 
to either the Kubeflow or Apache Spark community.

 * *{color:#00875a}Marcin from Google confirmed their willingness to donate the 
project to either of these communities.{color}*

 * An adoption process has been initiated by the Kubeflow project under CNCF, 
as documented in the following thread: [Link to the 
thread|https://github.com/kubeflow/community/issues/648].

 

*Primary Goal:*
 * The primary goal is to ensure the collaborative support and adoption of 
Google's Spark Operator by the Apache Spark , thereby avoiding the development 
of redundant tools and reducing confusion among users.

*Next Steps:*
 * *Meeting with Apache Spark Working Group Maintainers:* We propose arranging 
a meeting with the Apache Spark working group maintainers to delve deeper into 
this matter, address any questions or concerns they may have, and collectively 
work towards a decision.

 * *Establish a New Working Group:* Upon reaching an agreement, we intend to 
create a new working group comprising members from diverse organizations who 
are willing to contribute and collaborate on this initiative.

 * *Repository Transfer:* Our plan involves transferring the project repository 
from Google's organization to either the Apache or Kubeflow organization, 
aligning with the chosen community.

 * *Roadmap Development:* We will formulate a new roadmap that encompasses 
immediate issue resolution and a long-term design strategy aimed at enhancing 
performance, scalability, and security for this tool.

 
We believe that working towards one Spark Operator will benefit the Apache 
Spark community and address the current maintenance challenges. Your feedback 
and support in this matter are highly valued. Let's collaborate to ensure a 
robust and well-maintained Spark Operator for the Apache Spark community's 
benefit.

*Community members are encouraged to leave their comments or give a thumbs-up 
to express their support for adopting Google's Spark Operator as the official 
Apache Spark operator.*

 

*Proposed Authors*

Vara Bonthu (AWS)

Andrey Velichkevich (Apple)

Chaoran Yu (Apple)

Marcin Wielgus (Google)

Rus Pandey (Apple)

 

  was:
*Description:*

This proposal aims to recommend the adoption of [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.



*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially reported 45.

 * The primary issue at hand is that 

[jira] [Resolved] (SPARK-46062) CTE reference node does not inherit the flag `isStreaming` from CTE definition node

2023-11-23 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46062.
--
Fix Version/s: 3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43966
[https://github.com/apache/spark/pull/43966]

> CTE reference node does not inherit the flag `isStreaming` from CTE 
> definition node
> ---
>
> Key: SPARK-46062
> URL: https://issues.apache.org/jira/browse/SPARK-46062
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0, 3.4.2
>
>
> Looks like this is a long standing bug.
> We figured out that CTE reference node would never set the isStreaming flag 
> to true, regardless of the value for flag in CTE definition. The node cannot 
> determine the right value of isStreaming flag by itself (likewise it cannot 
> determine about resolution by itself) but it has no parameter in constructor, 
> hence always takes the default (no children, so batch one).
> This may impact some rules which behave differently depending on the isStreaming 
> flag. It would no longer be a problem once CTE reference is replaced with CTE 
> definition at some point in "optimization phase", but all rules in analyzer 
> and optimizer being triggered before the rule takes effect may be impacted.
> We probably couldn't sync the flag in real time, but we should sync the flag 
> when we mark CTE reference to be "resolved". The rule `ResolveWithCTE` will 
> be a good place to do that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator

2023-11-23 Thread Vara Bonthu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vara Bonthu updated SPARK-46054:

Description: 
*Description:*

This proposal aims to recommend the adoption of [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.



*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially reported 45.

 * The primary issue at hand is that this project resides under the 
GoogleCloudPlatform GitHub organization and is exclusively moderated by a 
Google employee. Concerns have been raised by numerous users and customers 
regarding the maintenance of the repository.

 * The existing Google maintainers are constrained by limitations in terms of 
time and support, which negatively impacts both the project and its user 
community.

 

*Recent Developments:*
 * During Kubecon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the Apple 
infrastructure team engaged in discussions with the Google's team, specifically 
with Marcin Wielgus. They expressed their interest in contributing the project 
to either the Kubeflow or Apache Spark community.

 * *{color:#00875a}Marcin from Google confirmed their willingness to donate the 
project to either of these communities.{color}*

 * An adoption process has been initiated by the Kubeflow project under CNCF, 
as documented in the following thread: [Link to the 
thread|https://github.com/kubeflow/community/issues/648].

 

*Primary Goal:*
**The primary goal is to ensure the collaborative support and adoption of 
Google's Spark Operator by the Apache Spark (supported by Kubeflow and CNCF 
communities) , thereby avoiding the development of redundant tools and reducing 
confusion among users.

 

*Next Steps:*
 * *Meeting with Apache Spark Working Group Maintainers:* We propose arranging 
a meeting with the Apache Spark working group maintainers to delve deeper into 
this matter, address any questions or concerns they may have, and collectively 
work towards a decision.

 * *Establish a New Working Group:* Upon reaching an agreement, we intend to 
create a new working group comprising members from diverse organizations who 
are willing to contribute and collaborate on this initiative.

 * *Repository Transfer:* Our plan involves transferring the project repository 
from Google's organization to either the Apache or Kubeflow organization, 
aligning with the chosen community.

 * *Roadmap Development:* We will formulate a new roadmap that encompasses 
immediate issue resolution and a long-term design strategy aimed at enhancing 
performance, scalability, and security for this tool.

 
We believe that working towards one Spark Operator will benefit the Apache 
Spark community and address the current maintenance challenges. Your feedback 
and support in this matter are highly valued. Let's collaborate to ensure a 
robust and well-maintained Spark Operator for the Apache Spark community's 
benefit.

*Community members are encouraged to leave their comments or give a thumbs-up 
to express their support for adopting Google's Spark Operator as the official 
Apache Spark operator.*

 

*Proposed Authors*

Vara Bonthu (AWS)

Andrey Velichkevich (Apple)

Chaoran Yu (Apple)

Marcin Wielgus (Google)

Rus Pandey (Apple)


 

  was:
*Description:*

This proposal aims to recommend the adoption of [Google's Spark K8s 
Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
official Spark Operator for the Apache Spark community. The operator has gained 
significant traction among many users and organizations and used heavily in 
production environments, but challenges related to maintenance and governance 
necessitate this recommendation.



*Background:*
 * Google's Spark K8s Operator is currently in use by hundreds of users and 
organizations. However, due to maintenance issues, many of these users and 
organizations have resorted to forking the repository and implementing their 
own fixes.

 * The project boasts an impressive user base with 167 contributors, 2.5k 
likes, and endorsements from 45 organizations, as documented in the "Who is 
using" document. Notably, there are many more organizations using it than the 
initially 

[jira] [Created] (SPARK-46073) Remove the special resolution of UnresolvedNamespace for certain commands

2023-11-23 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46073:
---

 Summary: Remove the special resolution of UnresolvedNamespace for 
certain commands
 Key: SPARK-46073
 URL: https://issues.apache.org/jira/browse/SPARK-46073
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46071) TreeNode.toJSON may result in OOM when there are multiple levels of nesting of expressions.

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46071:
---
Labels: pull-request-available  (was: )

> TreeNode.toJSON may result in OOM when there are multiple levels of nesting 
> of expressions.
> ---
>
> Key: SPARK-46071
> URL: https://issues.apache.org/jira/browse/SPARK-46071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: JacobZheng
>Priority: Major
>  Labels: pull-request-available
>
> I am encountering an OOM exception when executing the following code:
> {code:scala}
> parser.parseExpression(sql).toJSON
> {code}
> This SQL is a deeply nested {*}_CaseWhen_{*}. After testing, I found that the 
> number of expressions in the JSON increases exponentially as the nesting depth 
> increases.
> Here are some examples:
> sql:
> {code:sql}
> CASE WHEN(`cost` <= 275) THEN '(270-275]' 
> ELSE '' END
> {code}
> json:
> {code:json}
> [
> {
> "class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
> "num-children":3,
> "branches":[
> {
> "product-class":"scala.Tuple2",
> "_1":[
> {
> 
> "class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
> "num-children":2,
> "left":0,
> "right":1
> },
> {
> 
> "class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
> "num-children":0,
> "nameParts":"[cost]"
> },
> {
> 
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"275",
> "dataType":"integer"
> }
> ],
> "_2":[
> {
> 
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"(270-275]",
> "dataType":"string"
> }
> ]
> }
> ],
> "elseValue":[
> {
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"",
> "dataType":"string"
> }
> ]
> },
> {
> "class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
> "num-children":2,
> "left":0,
> "right":1
> },
> {
> "class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
> "num-children":0,
> "nameParts":"[cost]"
> },
> {
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"275",
> "dataType":"integer"
> },
> {
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"(270-275]",
> "dataType":"string"
> },
> {
> "class":"org.apache.spark.sql.catalyst.expressions.Literal",
> "num-children":0,
> "value":"",
> "dataType":"string"
> }
> ]
> {code}
> The child nodes of the *_CaseWhen_* expression are stored twice in JSON.
> When *_CaseWhen_* is nested twice, the child expression of the first case 
> when is repeated 4 times, and so on.
> {code:sql}
> CASE WHEN(`cost` <= 270) THEN '(265-270]'
> ELSE 
> CASE WHEN(`cost` <= 275) THEN '(270-275]' 
> ELSE '' END END
> {code}
> Nesting the *_CaseWhen_* expression n times in this case will result in 
> 2^n+11 expressions in the json.
> The reason for this problem is that the fields of *_CaseWhen_* cannot be 
> converted to child-node indexes when the {*}_jsonFields_{*} method is executed.
> Perhaps simplifying the *_CaseWhen_* JSON a bit by overriding the *_jsonFields_* 
> method is a viable way to go.
> {code:json}
> [
> {
> "class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
> "num-children":3,
> "branches":[
> {
> "condition":0,
> "value":1
> }
> ],
> "elseValue":2
> },
> {
> "class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
> "num-children":2,
> "left":0,
> "right":1
> },
> {
> "class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
> 

[jira] [Resolved] (SPARK-46021) Support canceling future jobs belonging to a certain job group on `cancelJobGroup` call

2023-11-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46021.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43926
[https://github.com/apache/spark/pull/43926]

> Support canceling future jobs belonging to a certain job group on 
> `cancelJobGroup` call
> ---
>
> Key: SPARK-46021
> URL: https://issues.apache.org/jira/browse/SPARK-46021
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46021) Support canceling future jobs belonging to a certain job group on `cancelJobGroup` call

2023-11-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46021:
---

Assignee: Xinyi Yu

> Support canceling future jobs belonging to a certain job group on 
> `cancelJobGroup` call
> ---
>
> Key: SPARK-46021
> URL: https://issues.apache.org/jira/browse/SPARK-46021
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30385) WebUI occasionally throw IOException on stop()

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-30385:
---
Labels: pull-request-available  (was: )

> WebUI occasionally throw IOException on stop()
> --
>
> Key: SPARK-30385
> URL: https://issues.apache.org/jira/browse/SPARK-30385
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
> Environment: MacOS 10.14.6
> Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231
> Scala version 2.12.10
>Reporter: wuyi
>Assignee: Kousuke Saruta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> While using ./bin/spark-shell recently, I have occasionally seen an IOException 
> when I try to quit:
> {code:java}
> 19/12/30 17:33:21 WARN AbstractConnector:
> java.io.IOException: No such file or directory
>  at sun.nio.ch.NativeThread.signal(Native Method)
>  at 
> sun.nio.ch.ServerSocketChannelImpl.implCloseSelectableChannel(ServerSocketChannelImpl.java:292)
>  at 
> java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234)
>  at 
> java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
>  at org.eclipse.jetty.server.ServerConnector.close(ServerConnector.java:368)
>  at 
> org.eclipse.jetty.server.AbstractNetworkConnector.shutdown(AbstractNetworkConnector.java:105)
>  at org.eclipse.jetty.server.Server.doStop(Server.java:439)
>  at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
>  
>  at org.apache.spark.ui.ServerInfo.stop(JettyUtils.scala:499)
>  at org.apache.spark.ui.WebUI.$anonfun$stop$2(WebUI.scala:173)
>  at org.apache.spark.ui.WebUI.$anonfun$stop$2$adapted(WebUI.scala:173)
>  at scala.Option.foreach(Option.scala:407)
>  at org.apache.spark.ui.WebUI.stop(WebUI.scala:173)
>  at org.apache.spark.ui.SparkUI.stop(SparkUI.scala:101)
>  at org.apache.spark.SparkContext.$anonfun$stop$6(SparkContext.scala:1972)
>  at 
> org.apache.spark.SparkContext.$anonfun$stop$6$adapted(SparkContext.scala:1972)
>  at scala.Option.foreach(Option.scala:407)
>  at org.apache.spark.SparkContext.$anonfun$stop$5(SparkContext.scala:1972)
>  at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
>  at org.apache.spark.SparkContext.stop(SparkContext.scala:1972)
>  at org.apache.spark.repl.Main$.$anonfun$doMain$3(Main.scala:79)
>  at org.apache.spark.repl.Main$.$anonfun$doMain$3$adapted(Main.scala:79)
>  at scala.Option.foreach(Option.scala:407)
>  at org.apache.spark.repl.Main$.doMain(Main.scala:79)
>  at org.apache.spark.repl.Main$.main(Main.scala:58)
>  at org.apache.spark.repl.Main.main(Main.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) 
>  at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>  at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) 
>  at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> I haven't found a way to reproduce it reliably, but the likelihood increases 
> if you stay in spark-shell for a while.  
> A possible way to reproduce this is: start ./bin/spark-shell, wait for 5 
> minutes, then use :q or :quit to quit.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator

2023-11-23 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-46054:
-
Fix Version/s: (was: 4.0.0)

> SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator
> --
>
> Key: SPARK-46054
> URL: https://issues.apache.org/jira/browse/SPARK-46054
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Vara Bonthu
>Priority: Minor
>
> *Description:*
> This proposal aims to recommend the adoption of [Google's Spark K8s 
> Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the 
> official Spark Operator for the Apache Spark community. The operator has 
> gained significant traction among many users and organizations and is used 
> heavily in production environments, but challenges related to maintenance and 
> governance necessitate this recommendation.
> *Background:*
>  * Google's Spark K8s Operator is currently in use by hundreds of users and 
> organizations. However, due to maintenance issues, many of these users and 
> organizations have resorted to forking the repository and implementing their 
> own fixes.
>  * The project boasts an impressive user base with 167 contributors, 2.5k 
> likes, and endorsements from 45 organizations, as documented in the "Who is 
> using" document. Notably, there are many more organizations using it than the 
> initially reported 45.
>  * The primary issue at hand is that this project resides under the 
> GoogleCloudPlatform GitHub organization and is exclusively moderated by a 
> Google employee. Concerns have been raised by numerous users and customers 
> regarding the maintenance of the repository.
>  * The existing Google maintainers are constrained by limitations in terms of 
> time and support, which negatively impacts both the project and its user 
> community.
>  
> *Recent Developments:*
>  * During Kubecon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the 
> Apple infrastructure team engaged in discussions with Google's team, 
> specifically with Marcin Wielgus. They expressed their interest in 
> contributing the project to either the Kubeflow or Apache Spark community.
>  * *{color:#00875a}Marcin from Google confirmed their willingness to donate 
> the project to either of these communities.{color}*
>  * An adoption process has been initiated by the Kubeflow project under CNCF, 
> as documented in the following thread: [Link to the 
> thread|https://github.com/kubeflow/community/issues/648].
>  
> *Primary Goal:*
>  * The primary goal is to ensure the endorsement of one tool, collaboratively 
> supported by the Apache Spark, Kubeflow, and CNCF communities.
>  
> *Next Steps:*
>  * *Meeting with Apache Spark Working Group Maintainers:* We propose 
> arranging a meeting with the Apache Spark working group maintainers to delve 
> deeper into this matter, address any questions or concerns they may have, and 
> collectively work towards a decision.
>  * *Establish a New Working Group:* Upon reaching an agreement, we intend to 
> create a new working group comprising members from diverse organizations who 
> are willing to contribute and collaborate on this initiative.
>  * *Repository Transfer:* Our plan involves transferring the project 
> repository from Google's organization to either the Apache or Kubeflow 
> organization, aligning with the chosen community.
>  * *Roadmap Development:* We will formulate a new roadmap that encompasses 
> immediate issue resolution and a long-term design strategy aimed at enhancing 
> performance, scalability, and security for this tool.
>  
> We ({*}Proposed Authors{*}) believe that endorsing Google's Spark K8s Operator 
> as the official Spark Operator will benefit the Apache Spark community and 
> address the current maintenance challenges. Your feedback and support in this 
> matter are highly valued.
> Let's collaborate to ensure a robust and well-maintained Spark Operator for 
> the Apache Spark community's benefit.
>  
> *Community members are encouraged to leave their comments or give a thumbs-up 
> to express their support for adopting Google's Spark Operator as the official 
> Apache Spark operator.*
>  
> *Proposed Authors*
> Vara Bonthu (AWS)
> Andrey Velichkevich (Apple)
> Chaoran Yu (Apple)
> Marcin Wielgus (Google)
> Rus Pandey (Apple)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46065.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43970
[https://github.com/apache/spark/pull/43970]

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We can accept a Column object for Column.__getitem__ on a remote Session, so we 
> can optimize the existing factorize implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46065:


Assignee: Haejoon Lee

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We can accept a Column object for Column.__getitem__ on a remote Session, so we 
> can optimize the existing factorize implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46072) Missing .jars when applying code to spark-connect

2023-11-23 Thread Dmitry Kravchuk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kravchuk updated SPARK-46072:

Summary: Missing .jars when applying code to spark-connect  (was: Missing 
.jars when trying to apply code to spark-connect)

> Missing .jars when applying code to spark-connect
> -
>
> Key: SPARK-46072
> URL: https://issues.apache.org/jira/browse/SPARK-46072
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
> Environment: python 3.9
> scala 2.12
> spark 3.4.1
> hdfs 3.1.2
> hive 3.1.3
>Reporter: Dmitry Kravchuk
>Priority: Major
> Fix For: 3.4.2, 3.5.1
>
>
> I've built Spark with the following Maven command for our on-prem Hadoop cluster:
> {code:bash}
> ./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive 
> -Phive-thriftserver -DskipTests clean package
> {code}
>  
> Then I start the Connect server like this:
> {code:bash}
> ./sbin/start-connect-server.sh --packages 
> org.apache.spark:spark-connect_2.12:3.4.1
> {code}
>  
> When I try to run any code after the following command, I always get an error 
> on the connect-server side:
> {code:bash}
> ./bin/pyspark --remote "sc://localhost"
> {code}
> Error: 
> {code:bash}
>           
> /home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar
>          central: tried
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>          spark-packages: tried
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>                 ::
>                 ::          UNRESOLVED DEPENDENCIES         ::
>                 ::
>                 :: org.apache.spark#spark-connect_2.12;3.4.1: not found
>                 ::
> {code}
>  
> Where am I going wrong? I thought it was a firewall issue, but it is not, because 
> I already set the http_proxy and https_proxy variables with my own credentials.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46072) Missing .jars when trying to apply code to spark-connect

2023-11-23 Thread Dmitry Kravchuk (Jira)
Dmitry Kravchuk created SPARK-46072:
---

 Summary: Missing .jars when trying to apply code to spark-connect
 Key: SPARK-46072
 URL: https://issues.apache.org/jira/browse/SPARK-46072
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.1
 Environment: python 3.9

scala 2.12

spark 3.4.1

hdfs 3.1.2

hive 3.1.3
Reporter: Dmitry Kravchuk
 Fix For: 3.4.2, 3.5.1


I've built Spark with the following Maven command for our on-prem Hadoop cluster:
{code:bash}
./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive 
-Phive-thriftserver -DskipTests clean package
{code}
 
Then I start the Connect server like this:
{code:bash}
./sbin/start-connect-server.sh --packages 
org.apache.spark:spark-connect_2.12:3.4.1
{code}
 
When I try to run any code after the following command, I always get an error 
on the connect-server side:
{code:bash}
./bin/pyspark --remote "sc://localhost"
{code}
Error: 
{code:bash}
          
/home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar

         central: tried

          
https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom

          -- artifact 
org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:

          
https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar

         spark-packages: tried

          
https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom

          -- artifact 
org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:

          
https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar

                ::

                ::          UNRESOLVED DEPENDENCIES         ::

                ::

                :: org.apache.spark#spark-connect_2.12;3.4.1: not found

                ::
{code}
 

Where am I going wrong? I thought it was a firewall issue, but it is not, because I 
already set the http_proxy and https_proxy variables with my own credentials.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46064) EliminateEventTimeWatermark does not consider the fact that isStreaming flag can change for current child during resolution

2023-11-23 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46064.
--
Fix Version/s: 3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43971
[https://github.com/apache/spark/pull/43971]

> EliminateEventTimeWatermark does not consider the fact that isStreaming flag 
> can change for current child during resolution
> ---
>
> Key: SPARK-46064
> URL: https://issues.apache.org/jira/browse/SPARK-46064
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0, 3.4.2
>
>
> Looks like this is a long-standing bug.
> The object `EliminateEventTimeWatermark` is implemented as a rule, but it is 
> not registered in the analyzer/optimizer. Instead, it is called directly when 
> the withWatermark method is called, which means the rule is applied immediately 
> against the child, regardless of whether the child is resolved or not.
> It is not an issue for pure DataFrame API usage, because streaming sources have 
> the isStreaming flag set to true even before they are resolved, but mixing SQL 
> and the DataFrame API would expose the issue; we may not know the exact value of 
> the isStreaming flag on an unresolved node, and it is subject to change upon 
> resolution.
> We should register EliminateEventTimeWatermark as a rule in analysis (or 
> pre-optimization) instead, and not apply the elimination if the child is not 
> yet resolved.
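
A minimal sketch of the SQL-plus-DataFrame mix the description refers to 
(assumptions: the rate source and column names are illustrative, and this is not 
claimed to be an exact reproduction of the bug): withWatermark is invoked on a 
child plan that originates from SQL, which is the path where the eagerly applied 
rule can see a node whose isStreaming flag is not final yet.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Register a streaming source under a SQL-visible name (rate source is illustrative).
spark.readStream.format("rate").load().createOrReplaceTempView("events")

// Build the child plan via SQL, then attach the watermark through the DataFrame API;
// per the description, EliminateEventTimeWatermark runs eagerly at this call site
// rather than as a registered analysis rule.
val withWm = spark
  .sql("SELECT timestamp AS eventTime, value FROM events")
  .withWatermark("eventTime", "10 minutes")
{code}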



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-23 Thread Giambattista Bloisi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789058#comment-17789058
 ] 

Giambattista Bloisi commented on SPARK-45311:
-

Could you disable failsafe trimStackTrace (as explained in 
[https://stackoverflow.com/questions/42248856/how-to-get-the-full-stacktrace-of-failed-tests-in-failsafe]
 ) to get the full stack trace of the error and report it here? 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that 
> does many operations around cities, local authorities and accounting with 
> open data) where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x, and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It is working well with Spark 3.2.x and 3.3.x. But as soon as I select {*}Spark 
> 3.4.x{*}, where the encoder seems to have changed deeply, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking around on the Internet, I found I wasn't alone in facing this problem. 
> Reading the linked question, you'll see that I attempted to debug it, but my 
> Scala skills are limited.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> {color:#172b4d}By the way, if possible, the encoder and decoder functions 
> should forward a parameter as soon as the name of the field being handled is 
> known, and keep forwarding it throughout their process, so that whenever the 
> encoder has to throw an exception, it knows which field it is handling in that 
> specific call and can send a message like:{color}
> {color:#00875a}_java.util.NoSuchElementException: None.get when encoding [the 
> method or field it was targeting]_{color}
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "{*}Only expression encoders are supported for now{*}" on what was accepted 
> and working before.
>  
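
For readers trying this locally, a minimal sketch of the Encoders.bean pattern the 
report describes (assumptions: the bean shape and values are illustrative, not taken 
from the linked project, and this does not exercise the generic-type case mentioned 
in points 2 and 3):
{code:scala}
import org.apache.spark.sql.{Encoders, SparkSession}
import scala.beans.BeanProperty

// Illustrative Java-style bean; the linked project uses real domain POJOs.
class CityRecord extends Serializable {
  @BeanProperty var name: String = _
  @BeanProperty var inseeCode: String = _
}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val city = new CityRecord
city.setName("Lyon")
city.setInseeCode("69123")

// Bean encoders are resolved reflectively; this is the resolution the report says
// started throwing NoSuchElementException: None.get on 3.4.x.
val ds = spark.createDataset(java.util.Arrays.asList(city))(Encoders.bean(classOf[CityRecord]))
ds.show()
{code}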



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46070) Precompile regex patterns in SparkDateTimeUtils.getZoneId

2023-11-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46070:
---
Labels: pull-request-available  (was: )

> Precompile regex patterns in SparkDateTimeUtils.getZoneId
> -
>
> Key: SPARK-46070
> URL: https://issues.apache.org/jira/browse/SPARK-46070
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Tanel Kiis
>Priority: Major
>  Labels: pull-request-available
>
> SparkDateTimeUtils.getZoneId uses the String.replaceFirst method, which 
> internally does a Pattern.compile(regex). This method is called once per 
> dataset row when using functions like from_utc_timestamp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46071) TreeNode.toJSON may result in OOM when there are multiple levels of nesting of expressions.

2023-11-23 Thread JacobZheng (Jira)
JacobZheng created SPARK-46071:
--

 Summary: TreeNode.toJSON may result in OOM when there are multiple 
levels of nesting of expressions.
 Key: SPARK-46071
 URL: https://issues.apache.org/jira/browse/SPARK-46071
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: JacobZheng


I am encountering an OOM exception when executing the following code:
{code:scala}
parser.parseExpression(sql).toJSON
{code}
This SQL is a deeply nested {*}_CaseWhen_{*}. After testing, I found that the 
number of expressions in the JSON increases exponentially as the nesting depth 
increases.

Here are some examples:

sql:
{code:sql}
CASE WHEN(`cost` <= 275) THEN '(270-275]' 
ELSE '' END
{code}
json:
{code:json}
[
{
"class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
"num-children":3,
"branches":[
{
"product-class":"scala.Tuple2",
"_1":[
{

"class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
"num-children":2,
"left":0,
"right":1
},
{

"class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
"num-children":0,
"nameParts":"[cost]"
},
{

"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"275",
"dataType":"integer"
}
],
"_2":[
{

"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"(270-275]",
"dataType":"string"
}
]
}
],
"elseValue":[
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"",
"dataType":"string"
}
]
},
{
"class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
"num-children":2,
"left":0,
"right":1
},
{
"class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
"num-children":0,
"nameParts":"[cost]"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"275",
"dataType":"integer"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"(270-275]",
"dataType":"string"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"",
"dataType":"string"
}
]
{code}
The child nodes of the *_CaseWhen_* expression are stored twice in JSON.

When *_CaseWhen_* is nested twice, the child expression of the first case when 
is repeated 4 times, and so on.
{code:sql}
CASE WHEN(`cost` <= 270) THEN '(265-270]'
ELSE 
CASE WHEN(`cost` <= 275) THEN '(270-275]' 
ELSE '' END END
{code}
Nesting the *_CaseWhen_* expression n times in this case will result in 2^n+11 
expressions in the json.

The reason for this problem is that the fields of *_CaseWhen_* cannot be 
converted to child-node indexes when the {*}_jsonFields_{*} method is executed.

Perhaps simplifying the *_CaseWhen_* JSON a bit by overriding the *_jsonFields_* 
method is a viable way to go.
{code:json}
[
{
"class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
"num-children":3,
"branches":[
{
"condition":0,
"value":1
}
],
"elseValue":2
},
{
"class":"org.apache.spark.sql.catalyst.expressions.LessThanOrEqual",
"num-children":2,
"left":0,
"right":1
},
{
"class":"org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute",
"num-children":0,
"nameParts":"[cost]"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"275",
"dataType":"integer"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"(270-275]",
"dataType":"string"
},
{
"class":"org.apache.spark.sql.catalyst.expressions.Literal",
"num-children":0,
"value":"",
"dataType":"string"
}
]
{code}
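
Below is a small sketch, runnable in spark-shell, that illustrates the growth 
described above by parsing progressively deeper nested CASE WHEN expressions and 
counting the serialized nodes (assumption: the thresholds and labels are 
illustrative, not taken from the original query):
{code:scala}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Build an n-level nested CASE WHEN string (thresholds and labels are illustrative).
def nestedCaseWhen(depth: Int): String =
  if (depth == 0) "''"
  else s"CASE WHEN(`cost` <= ${270 + 5 * depth}) THEN 'bucket-$depth' ELSE ${nestedCaseWhen(depth - 1)} END"

// Count serialized nodes to observe the exponential growth described above.
(1 to 8).foreach { n =>
  val json = CatalystSqlParser.parseExpression(nestedCaseWhen(n)).toJSON
  val nodes = "\"class\"".r.findAllIn(json).size
  println(s"depth=$n serialized nodes=$nodes")
}
{code}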
 



--
This message was sent by Atlassian Jira

[jira] [Created] (SPARK-46070) Precompile regex patterns in SparkDateTimeUtils.getZoneId

2023-11-23 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-46070:
--

 Summary: Precompile regex patterns in SparkDateTimeUtils.getZoneId
 Key: SPARK-46070
 URL: https://issues.apache.org/jira/browse/SPARK-46070
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Tanel Kiis


SparkDateTimeUtils.getZoneId uses the String.replaceFirst method, which internally 
does a Pattern.compile(regex). This method is called once per dataset row when 
using functions like from_utc_timestamp.
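
A minimal sketch of the hoisting this issue suggests (assumption: the pattern below 
is illustrative, not the exact regex used by getZoneId): a precompiled 
java.util.regex.Pattern is reused across rows instead of recompiling the regex on 
every String.replaceFirst call.
{code:scala}
import java.util.regex.Pattern

object ZoneIdPatterns {
  // Assumption: the pattern is illustrative, not the exact one in getZoneId.
  private val signPrefix: Pattern = Pattern.compile("^([+-])")

  // Behaves like zoneId.replaceFirst("^([+-])", "GMT$1"), but the Pattern is
  // compiled once and reused for every row instead of being recompiled per call.
  def normalize(zoneId: String): String =
    signPrefix.matcher(zoneId).replaceFirst("GMT$1")
}
{code}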



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46069) Support unwrap timestamp type to date type

2023-11-23 Thread Wan Kun (Jira)
Wan Kun created SPARK-46069:
---

 Summary: Support unwrap timestamp type to date type
 Key: SPARK-46069
 URL: https://issues.apache.org/jira/browse/SPARK-46069
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wan Kun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org