[jira] [Created] (SPARK-42958) Refactor `CheckConnectJvmClientCompatibility` to compare client and avro

2023-03-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-42958:


 Summary: Refactor `CheckConnectJvmClientCompatibility` to compare 
client and avro
 Key: SPARK-42958
 URL: https://issues.apache.org/jira/browse/SPARK-42958
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-42957) `release-build.sh` should not remove SBOM artifacts

2023-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42957:
--
Priority: Critical  (was: Major)

> `release-build.sh` should not remove SBOM artifacts
> ---
>
> Key: SPARK-42957
> URL: https://issues.apache.org/jira/browse/SPARK-42957
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>







[jira] [Created] (SPARK-42957) `release-build.sh` should not remove SBOM artifacts

2023-03-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42957:
-

 Summary: `release-build.sh` should not remove SBOM artifacts
 Key: SPARK-42957
 URL: https://issues.apache.org/jira/browse/SPARK-42957
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-42956) avro functions

2023-03-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-42956:


 Summary: avro functions
 Key: SPARK-42956
 URL: https://issues.apache.org/jira/browse/SPARK-42956
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-42946) Sensitive data could still be exposed by variable substitution

2023-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-42946.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40576
[https://github.com/apache/spark/pull/40576]

> Sensitive data could still be exposed by variable substitution
> --
>
> Key: SPARK-42946
> URL: https://issues.apache.org/jira/browse/SPARK-42946
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.4.0
>
>
> Case 1 by SET syntax's key part
>  
> {code:java}
> Time taken: 0.017 seconds, Fetched 1 row(s)
> spark-sql> set ${spark.ssl.keyPassword}
>          > ;
> abc     {code}
> Case 2 by SELECT as String lit
>  
> {code:java}
> spark-sql> set spark.ssl.keyPassword;
> spark.ssl.keyPassword    *(redacted)
> Time taken: 0.009 seconds, Fetched 1 row(s)
> spark-sql> select '${spark.ssl.keyPassword}'
>          > ;
> abc
> {code}






[jira] [Assigned] (SPARK-42946) Sensitive data could still be exposed by variable substitution

2023-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-42946:


Assignee: Kent Yao

> Sensitive data could still be exposed by variable substitution
> --
>
> Key: SPARK-42946
> URL: https://issues.apache.org/jira/browse/SPARK-42946
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Case 1 by SET syntax's key part
>  
> {code:java}
> Time taken: 0.017 seconds, Fetched 1 row(s)
> spark-sql> set ${spark.ssl.keyPassword}
>          > ;
> abc     {code}
> Case 2 by SELECT as String lit
>  
> {code:java}
> spark-sql> set spark.ssl.keyPassword;
> spark.ssl.keyPassword    *(redacted)
> Time taken: 0.009 seconds, Fetched 1 row(s)
> spark-sql> select '${spark.ssl.keyPassword}'
>          > ;
> abc
> {code}






[jira] [Created] (SPARK-42955) Skip classifyException and wrap AnalysisException for SparkThrowable

2023-03-28 Thread Kent Yao (Jira)
Kent Yao created SPARK-42955:


 Summary: Skip classifyException and wrap AnalysisException for 
SparkThrowable
 Key: SPARK-42955
 URL: https://issues.apache.org/jira/browse/SPARK-42955
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Kent Yao









[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42393:
-
Description: 
There are derivative APIs which depend on the implementation of Pandas UDFs: 
Pandas Function APIs and Arrow Function APIs, as shown below:

!image-2023-03-29-11-40-44-318.png|width=576,height=225!

 

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

  was:See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].


> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Attachments: image-2023-03-29-11-40-44-318.png
>
>
> There are derivative APIs which depend on the implementation of Pandas UDFs: 
> Pandas Function APIs and Arrow Function APIs, as shown below:
> !image-2023-03-29-11-40-44-318.png|width=576,height=225!
>  
> Spark Connect Python Client (SCPC), as a client and server interface for 
> PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
> UDFs is essential for Spark Connect to reach parity with the PySpark legacy 
> API.
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].






[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42393:
-
Attachment: image-2023-03-29-11-40-44-318.png

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Attachments: image-2023-03-29-11-40-44-318.png
>
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].






[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41661:
-
Description: 
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

  was:
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark in OSS. Supporting 
PySpark UDFs is essential for Spark Connect to reach parity with the PySpark 
legacy API.


> Support for User-defined Functions in Python
> 
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
> User-defined Functions in Python consist of (pickled) Python UDFs and 
> (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
> on top of the Apache Spark™ engine. Users only have to state "what to do"; 
> PySpark, as a sandbox, encapsulates "how to do it".
> Spark Connect Python Client (SCPC), as a client and server interface for 
> PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
> UDFs is essential for Spark Connect to reach parity with the PySpark legacy 
> API.






[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41661:
-
Description: 
User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

  was:
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.


> Support for User-defined Functions in Python
> 
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> User-defined Functions in Python consist of (pickled) Python UDFs and 
> (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
> on top of the Apache Spark™ engine. Users only have to state "what to do"; 
> PySpark, as a sandbox, encapsulates "how to do it".
> Spark Connect Python Client (SCPC), as a client and server interface for 
> PySpark will eventually replace the legacy API of PySpark. Supporting PySpark 
> UDFs is essential for Spark Connect to reach parity with the PySpark legacy 
> API.
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].






[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41661:
-
Description: 
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark will eventually replace the legacy API of PySpark in OSS. Supporting 
PySpark UDFs is essential for Spark Connect to reach parity with the PySpark 
legacy API.

  was:
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark, will eventually (probably Spark 4.0) replace the legacy API of PySpark 
in both OSS. Supporting PySpark UDFs is essential for Spark Connect to reach 
parity with the PySpark legacy API.


> Support for User-defined Functions in Python
> 
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
> User-defined Functions in Python consist of (pickled) Python UDFs and 
> (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
> on top of the Apache Spark™ engine. Users only have to state "what to do"; 
> PySpark, as a sandbox, encapsulates "how to do it".
> Spark Connect Python Client (SCPC), as a client and server interface for 
> PySpark will eventually replace the legacy API of PySpark in OSS. Supporting 
> PySpark UDFs is essential for Spark Connect to reach parity with the PySpark 
> legacy API.






[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41661:
-
Description: 
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and 
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
on top of the Apache Spark™ engine. Users only have to state "what to do"; 
PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for 
PySpark, will eventually (probably Spark 4.0) replace the legacy API of PySpark 
in both OSS. Supporting PySpark UDFs is essential for Spark Connect to reach 
parity with the PySpark legacy API.

  was:
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

PySpark UDFs  mainly consist of (pickled) Python UDFs and (Arrow-optimized) 
Pandas UDFs.


> Support for User-defined Functions in Python
> 
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
> User-defined Functions in Python consist of (pickled) Python UDFs and 
> (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code 
> on top of the Apache Spark™ engine. Users only have to state "what to do"; 
> PySpark, as a sandbox, encapsulates "how to do it".
> Spark Connect Python Client (SCPC), as a client and server interface for 
> PySpark, will eventually (probably Spark 4.0) replace the legacy API of 
> PySpark in both OSS. Supporting PySpark UDFs is essential for Spark Connect 
> to reach parity with the PySpark legacy API.






[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41661:
-
Description: 
See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

PySpark UDFs  mainly consist of (pickled) Python UDFs and (Arrow-optimized) 
Pandas UDFs.

  was:Spark Connect should support Python UDFs


> Support for User-defined Functions in Python
> 
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
> PySpark UDFs  mainly consist of (pickled) Python UDFs and (Arrow-optimized) 
> Pandas UDFs.






[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42393:
-
Description: See design doc 
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> See design doc 
> [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].






[jira] [Created] (SPARK-42954) Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client

2023-03-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42954:
-

 Summary: Add `YearMonthIntervalType` to PySpark and Spark Connect 
Python Client
 Key: SPARK-42954
 URL: https://issues.apache.org/jira/browse/SPARK-42954
 Project: Spark
  Issue Type: New Feature
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-39153) When we look at spark UI or History, we can see the failed tasks first

2023-03-28 Thread jingxiong zhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jingxiong zhong resolved SPARK-39153.
-
Resolution: Not A Problem

> When we look at spark UI or History, we can see the failed tasks first
> --
>
> Key: SPARK-39153
> URL: https://issues.apache.org/jira/browse/SPARK-39153
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
> Environment: spark 3.2.0
>Reporter: jingxiong zhong
>Priority: Major
> Fix For: 3.2.0
>
>
> When a task fails, users care most about the failed tasks and the causes of 
> their errors. The current Spark UI and History Server sort the task table by 
> "Index" rather than by "Errors", so with a large number of tasks you have to 
> wait for the table to be re-sorted before the failures surface. Sorting by the 
> "Errors" column by default would make the causes of failed tasks easier to find 
> and improve the user experience.






[jira] [Resolved] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed

2023-03-28 Thread jingxiong zhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jingxiong zhong resolved SPARK-39967.
-
Resolution: Fixed

Not reproducible in newer versions

 

> Instead of using the scalar tasksSuccessful, use the successful array to 
> calculate whether the task is completed
> 
>
> Key: SPARK-39967
> URL: https://issues.apache.org/jira/browse/SPARK-39967
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3, 2.4.6
>Reporter: jingxiong zhong
>Priority: Critical
> Attachments: spark1-1.png, spark2.png, spark3-1.png
>
>
> When counting the number of successful tasks in a Spark stage, Spark uses the 
> scalar counter `tasksSuccessful`, but the actual success or failure of each task 
> is recorded in the `successful` array. Logging I added shows that the count kept 
> in `tasksSuccessful` can become inconsistent with the state stored in the 
> `successful` array. The `successful` array should be treated as the source of 
> truth.
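
A minimal sketch (plain Scala, not Spark's internal scheduler code) of the invariant argued for above: if the completed-task count is derived from the per-task `successful` array itself, a separate scalar counter cannot drift out of sync with it.

{code:scala}
// Illustrative only: `successful` stands in for the per-task boolean array the
// scheduler keeps; both counts are derived from it rather than tracked separately.
object SuccessfulArraySketch {
  def main(args: Array[String]): Unit = {
    val successful = Array(true, false, true, true)    // one flag per task in the stage

    val tasksSuccessful = successful.count(identity)   // derived count, cannot diverge
    val stageFinished   = successful.forall(identity)  // stage is done only if every task succeeded

    println(s"successful tasks = $tasksSuccessful, stage finished = $stageFinished")
  }
}
{code}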






[jira] [Created] (SPARK-42953) Impl typed map, flatMap, mapPartitions in Dataset

2023-03-28 Thread Zhen Li (Jira)
Zhen Li created SPARK-42953:
---

 Summary: Impl typed map, flatMap, mapPartitions in Dataset
 Key: SPARK-42953
 URL: https://issues.apache.org/jira/browse/SPARK-42953
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add missing typed API support in the Dataset API.
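
For reference, a short sketch of the typed operators this ticket asks the Connect Scala client to support, written against the regular (non-Connect) Dataset API; the session setup and data are illustrative only.

{code:scala}
import org.apache.spark.sql.SparkSession

object TypedDatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-ops").getOrCreate()
    import spark.implicits._

    val ds = spark.range(5).as[Long]

    val doubled      = ds.map(_ * 2)                          // typed map
    val expanded     = ds.flatMap(n => Seq(n, n + 100))       // typed flatMap
    val perPartition = ds.mapPartitions(it => it.map(_ + 1))  // typed mapPartitions

    doubled.show()
    expanded.show()
    perPartition.show()

    spark.stop()
  }
}
{code}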






[jira] [Updated] (SPARK-42952) Simplify the parameter of analyzer rule PreprocessTableCreation and DataSourceAnalysis

2023-03-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-42952:
---
Summary: Simplify the parameter of analyzer rule PreprocessTableCreation 
and DataSourceAnalysis  (was: Simplify the parameter of analysis rule 
PreprocessTableCreation and DataSourceAnalysis)

> Simplify the parameter of analyzer rule PreprocessTableCreation and 
> DataSourceAnalysis
> --
>
> Key: SPARK-42952
> URL: https://issues.apache.org/jira/browse/SPARK-42952
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Created] (SPARK-42952) Simplify the parameter of analysis rule PreprocessTableCreation and DataSourceAnalysis

2023-03-28 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42952:
--

 Summary: Simplify the parameter of analysis rule 
PreprocessTableCreation and DataSourceAnalysis
 Key: SPARK-42952
 URL: https://issues.apache.org/jira/browse/SPARK-42952
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Created] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()

2023-03-28 Thread Wei Liu (Jira)
Wei Liu created SPARK-42951:
---

 Summary: Spark Connect: Streaming DataStreamReader API except 
table()
 Key: SPARK-42951
 URL: https://issues.apache.org/jira/browse/SPARK-42951
 Project: Spark
  Issue Type: Story
  Components: Connect, Structured Streaming
Affects Versions: 3.4.0, 3.5.0
Reporter: Wei Liu









[jira] [Created] (SPARK-42950) Add exit code in SparkListenerApplicationEnd

2023-03-28 Thread Paul Laffon (Jira)
Paul Laffon created SPARK-42950:
---

 Summary: Add exit code in SparkListenerApplicationEnd
 Key: SPARK-42950
 URL: https://issues.apache.org/jira/browse/SPARK-42950
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.2
Reporter: Paul Laffon


When an application ends, the {{SparkListener}} receives a final event called 
{{SparkListenerApplicationEnd}}.

This event currently only includes a timestamp, but it would be beneficial to 
also include the exitCode of the application. This additional information would 
provide insight into whether the application succeeded or failed.
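
A minimal listener sketch for context; `exitCode` is the hypothetical field this ticket proposes, so it is only shown as a comment.

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Sketch: observe the end-of-application event. Today it carries only a timestamp;
// the proposed exit code would be read here as well.
class ApplicationEndLogger extends SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    println(s"Application ended at ${applicationEnd.time}")
    // println(s"Exit code: ${applicationEnd.exitCode}")  // hypothetical field proposed by this ticket
  }
}
{code}

The listener would be registered as usual, e.g. sparkContext.addSparkListener(new ApplicationEndLogger).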






[jira] [Created] (SPARK-42949) Simplify code for NAAJ

2023-03-28 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-42949:
-

 Summary: Simplify code for NAAJ
 Key: SPARK-42949
 URL: https://issues.apache.org/jira/browse/SPARK-42949
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan









[jira] [Resolved] (SPARK-42927) Make `o.a.spark.util.Iterators#size` as `private[spark]`

2023-03-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42927.
--
Fix Version/s: 3.4.1
   3.5.0
 Assignee: Yang Jie
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40556

> Make `o.a.spark.util.Iterators#size` as `private[spark]`
> 
>
> Key: SPARK-42927
> URL: https://issues.apache.org/jira/browse/SPARK-42927
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.4.1, 3.5.0
>
>







[jira] [Updated] (SPARK-42927) Make `o.a.spark.util.Iterators#size` as `private[spark]`

2023-03-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42927:
-
Priority: Trivial  (was: Major)

> Make `o.a.spark.util.Iterators#size` as `private[spark]`
> 
>
> Key: SPARK-42927
> URL: https://issues.apache.org/jira/browse/SPARK-42927
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Trivial
>







[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705990#comment-17705990
 ] 

Jiayi Liu commented on SPARK-42947:
---

issue fixed by https://github.com/apache/spark/pull/40577

> Spark Thriftserver LDAP should not use DN pattern if user contains domain
> -
>
> Key: SPARK-42947
> URL: https://issues.apache.org/jira/browse/SPARK-42947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiayi Liu
>Priority: Major
>
> When the LDAP provider has domain configuration, such as Active Directory, 
> the principal should not be constructed according to the DN pattern, but the 
> username containing the domain should be directly passed to the LDAP provider 
> as the principal. We can refer to the implementation of Hive LdapUtils.
> When the username contains a domain, or a domain is supplied via the 
> hive.server2.authentication.ldap.Domain configuration, constructing the 
> principal according to the DN pattern (for example, 
> uid=user@domain,dc=test,dc=com) results in the following error:
> {code:java}
> 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: Error validating the login
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
> ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_352]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_352]
>   at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
> Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
> user
>   at 
> org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   ... 8 more
> Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
> 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
> 52e, v2580]
>   at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
> ~[?:1.8.0_352]
>   at javax.naming.InitialContext.init(InitialContext.java:244) 
> ~[?:1.8.0_352]
>   at javax.naming.InitialContext.<init>(InitialContext.java:216) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
> ~[?:1.8.0_352]
>   at 
> org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> 

[jira] [Updated] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled

2023-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42937:
--
Affects Version/s: (was: 3.5.0)

> Join with subquery in condition can fail with wholestage codegen and adaptive 
> execution disabled
> 
>
> Key: SPARK-42937
> URL: https://issues.apache.org/jira/browse/SPARK-42937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.3.3, 3.4.1
>
>
> The below left outer join gets an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
> (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
> (3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
> as v1(key, value1, value2, value3, value4, value5, value6, value7, value8, 
> value9, value10);
> create or replace temp view v2 as
> select * from values
> (1, 2),
> (3, 8),
> (7, 9)
> as v2(a, b);
> create or replace temp view v3 as
> select * from values
> (3),
> (8)
> as v3(col1);
> set spark.sql.codegen.maxFields=10; -- let's make maxFields 10 instead of 100
> set spark.sql.adaptive.enabled=false;
> select *
> from v1
> left outer join v2
> on key = a
> and key in (select col1 from v3);
> {noformat}
> The join fails during predicate codegen:
> {noformat}
> 23/03/27 12:24:12 WARN Predicate: Expr codegen error and falling back to 
> interpreter mode
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.doGenCode(subquery.scala:156)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressions$2(CodeGenerator.scala:1278)
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpressions(CodeGenerator.scala:1278)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:41)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:33)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:86)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition$(HashJoin.scala:140)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition$lzycompute(BroadcastHashJoinExec.scala:40)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition(BroadcastHashJoinExec.scala:40)
> {noformat}
> It fails again after fallback to interpreter mode:
> {noformat}
> 23/03/27 12:24:12 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.eval(subquery.scala:151)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2$adapted(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$outerJoin$1(HashJoin.scala:205)
> {noformat}
> Both the predicate codegen and the evaluation fail for the same reason: 
> {{PlanSubqueries}} creates {{InSubqueryExec}} with {{shouldBroadcast=false}}. 
> The driver waits for the subquery to finish, but it's the executor that uses 
> the results of the subquery (for 

[jira] [Assigned] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled

2023-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42937:
-

Assignee: Bruce Robbins

> Join with subquery in condition can fail with wholestage codegen and adaptive 
> execution disabled
> 
>
> Key: SPARK-42937
> URL: https://issues.apache.org/jira/browse/SPARK-42937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>
> The below left outer join gets an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
> (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
> (3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
> as v1(key, value1, value2, value3, value4, value5, value6, value7, value8, 
> value9, value10);
> create or replace temp view v2 as
> select * from values
> (1, 2),
> (3, 8),
> (7, 9)
> as v2(a, b);
> create or replace temp view v3 as
> select * from values
> (3),
> (8)
> as v3(col1);
> set spark.sql.codegen.maxFields=10; -- let's make maxFields 10 instead of 100
> set spark.sql.adaptive.enabled=false;
> select *
> from v1
> left outer join v2
> on key = a
> and key in (select col1 from v3);
> {noformat}
> The join fails during predicate codegen:
> {noformat}
> 23/03/27 12:24:12 WARN Predicate: Expr codegen error and falling back to 
> interpreter mode
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.doGenCode(subquery.scala:156)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressions$2(CodeGenerator.scala:1278)
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpressions(CodeGenerator.scala:1278)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:41)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:33)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:86)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition$(HashJoin.scala:140)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition$lzycompute(BroadcastHashJoinExec.scala:40)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition(BroadcastHashJoinExec.scala:40)
> {noformat}
> It fails again after fallback to interpreter mode:
> {noformat}
> 23/03/27 12:24:12 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.eval(subquery.scala:151)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2$adapted(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$outerJoin$1(HashJoin.scala:205)
> {noformat}
> Both the predicate codegen and the evaluation fail for the same reason: 
> {{PlanSubqueries}} creates {{InSubqueryExec}} with {{shouldBroadcast=false}}. 
> The driver waits for the subquery to finish, but it's the executor that uses 
> the results of the subquery (for predicate codegen or evaluation). 

[jira] [Resolved] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled

2023-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42937.
---
Fix Version/s: 3.3.3
   3.4.1
   Resolution: Fixed

Issue resolved by pull request 40569
[https://github.com/apache/spark/pull/40569]

> Join with subquery in condition can fail with wholestage codegen and adaptive 
> execution disabled
> 
>
> Key: SPARK-42937
> URL: https://issues.apache.org/jira/browse/SPARK-42937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.3.3, 3.4.1
>
>
> The below left outer join gets an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
> (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
> (3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
> as v1(key, value1, value2, value3, value4, value5, value6, value7, value8, 
> value9, value10);
> create or replace temp view v2 as
> select * from values
> (1, 2),
> (3, 8),
> (7, 9)
> as v2(a, b);
> create or replace temp view v3 as
> select * from values
> (3),
> (8)
> as v3(col1);
> set spark.sql.codegen.maxFields=10; -- let's make maxFields 10 instead of 100
> set spark.sql.adaptive.enabled=false;
> select *
> from v1
> left outer join v2
> on key = a
> and key in (select col1 from v3);
> {noformat}
> The join fails during predicate codegen:
> {noformat}
> 23/03/27 12:24:12 WARN Predicate: Expr codegen error and falling back to 
> interpreter mode
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.doGenCode(subquery.scala:156)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressions$2(CodeGenerator.scala:1278)
>   at scala.collection.immutable.List.map(List.scala:293)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpressions(CodeGenerator.scala:1278)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:41)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:33)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:86)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.boundCondition$(HashJoin.scala:140)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition$lzycompute(BroadcastHashJoinExec.scala:40)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition(BroadcastHashJoinExec.scala:40)
> {noformat}
> It fails again after fallback to interpreter mode:
> {noformat}
> 23/03/27 12:24:12 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
> java.lang.IllegalArgumentException: requirement failed: input[0, int, false] 
> IN subquery#34 has not finished
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144)
>   at 
> org.apache.spark.sql.execution.InSubqueryExec.eval(subquery.scala:151)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2$adapted(HashJoin.scala:146)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin.$anonfun$outerJoin$1(HashJoin.scala:205)
> {noformat}
> Both the predicate codegen and the evaluation fail for the same reason: 
> {{PlanSubqueries}} creates {{InSubqueryExec}} with 

[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liu updated SPARK-42947:
--
Description: 
When the LDAP provider has domain configuration, such as Active Directory, the 
principal should not be constructed according to the DN pattern, but the 
username containing the domain should be directly passed to the LDAP provider 
as the principal. We can refer to the implementation of Hive LdapUtils.

When the username contains a domain, or a domain is supplied via the 
hive.server2.authentication.ldap.Domain configuration, constructing the 
principal according to the DN pattern (for example, 
uid=user@domain,dc=test,dc=com) results in the following error:


{code:java}
23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
user
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
52e, v2580]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
~[?:1.8.0_352]
at 
javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
~[?:1.8.0_352]
at 
javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
~[?:1.8.0_352]
at javax.naming.InitialContext.init(InitialContext.java:244) 
~[?:1.8.0_352]
at javax.naming.InitialContext.<init>(InitialContext.java:216) 
~[?:1.8.0_352]
at 
javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
~[?:1.8.0_352]
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
{code}


We should pass user@domain directly to the LDAP provider, just as HiveServer 
does.
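
An illustrative sketch (not the Thrift server's actual code) of the difference between the two binding schemes described above:

{code:scala}
object LdapPrincipalSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative only: with a DN pattern, a user that already carries a domain
    // produces a principal Active Directory rejects (LDAP error 49); passing the
    // name through unchanged gives the provider the principal it expects.
    val user      = "user@domain"
    val dnPattern = "uid=%s,dc=test,dc=com"

    val fromPattern = dnPattern.format(user)  // "uid=user@domain,dc=test,dc=com" -> bind fails
    val passThrough = user                    // hand "user@domain" to the LDAP provider directly

    println(s"DN-pattern principal:   $fromPattern")
    println(s"Pass-through principal: $passThrough")
  }
}
{code}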

  was:
When the LDAP provider has domain configuration, such as Active Directory, the 
principal should not be constructed according to the DN pattern, but the user 
containing the domain should be directly passed to the LDAP 

[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liu updated SPARK-42947:
--
Description: 
When the LDAP provider has domain configuration, such as Active Directory, the 
principal should not be constructed according to the DN pattern, but the user 
containing the domain should be directly passed to the LDAP provider as the 
principal. We can refer to the implementation of Hive LdapUtils.

When the username contains a domain, or a domain is supplied via the 
hive.server2.authentication.ldap.Domain configuration, constructing the 
principal according to the DN pattern (for example, 
uid=user@domain,dc=test,dc=com) results in the following error:


{code:java}
23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
user
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
52e, v2580]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
~[?:1.8.0_352]
at 
javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
~[?:1.8.0_352]
at 
javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
~[?:1.8.0_352]
at javax.naming.InitialContext.init(InitialContext.java:244) 
~[?:1.8.0_352]
at javax.naming.InitialContext.<init>(InitialContext.java:216) 
~[?:1.8.0_352]
at 
javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
~[?:1.8.0_352]
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
{code}


we should pass user@domain directly to the LDAP provider, just like HiveServer 
did.

  was:
When the LDAP provider includes domain configuration, such as Active Directory, 
the principal should not be constructed according to the DN pattern, but the 
user containing the domain should be directly passed to the LDAP 

[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liu updated SPARK-42947:
--
Description: 
When the LDAP provider includes domain configuration, such as Active Directory, 
the principal should not be constructed according to the DN pattern, but the 
user containing the domain should be directly passed to the LDAP provider as 
the principal. We can refer to the implementation of Hive LdapUtils.

When the username contains a domain or domain passes from 
hive.server2.authentication.ldap.Domain configuration, if we construct the 
principal according to the DN pattern (For example, 
uid=user@domain,dc=test,dc=com), we will get the following error:


{code:java}
23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
user
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
52e, v2580]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
~[?:1.8.0_352]
at 
javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
~[?:1.8.0_352]
at 
javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
~[?:1.8.0_352]
at javax.naming.InitialContext.init(InitialContext.java:244) 
~[?:1.8.0_352]
at javax.naming.InitialContext.<init>(InitialContext.java:216) 
~[?:1.8.0_352]
at 
javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
~[?:1.8.0_352]
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
{code}


we should pass user@domain directly to the LDAP provider, just like HiveServer 
did.

  was:
When the LDAP provider includes domain configuration, such as Active Directory, 
the principal should not be constructed according to the DN pattern, but the 
user containing the domain should be directly passed to the LDAP 

[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liu updated SPARK-42947:
--
Description: 
When the LDAP provider includes domain configuration, such as Active Directory, 
the principal should not be constructed according to the DN pattern, but the 
user containing the domain should be directly passed to the LDAP provider as 
the principal. We can refer to the implementation of Hive LdapUtils.

When the username contains a domain or domain passes from 
hive.server2.authentication.ldap.Domain configuration, if we construct the 
principal according to the DN pattern (For example, 
uid=user@domain,dc=test,dc=com), we will get the following error:
```
23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
 ~[libthrift-0.12.0.jar:0.12.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
user
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
52e, v2580]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
~[?:1.8.0_352]
at 
com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
~[?:1.8.0_352]
at 
javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
~[?:1.8.0_352]
at 
javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
~[?:1.8.0_352]
at javax.naming.InitialContext.init(InitialContext.java:244) 
~[?:1.8.0_352]
at javax.naming.InitialContext.<init>(InitialContext.java:216) 
~[?:1.8.0_352]
at 
javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
~[?:1.8.0_352]
at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at 
org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
 ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
... 8 more
```

we should pass user@domain directly to the LDAP provider, just like HiveServer 
did.

  was:When the LDAP provider includes domain configuration, such as Active 
Directory, the principal should not be constructed according to the DN pattern, 
but the user containing the domain should be directly passed to the LDAP 
provider as the 

[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705965#comment-17705965
 ] 

Dongjoon Hyun commented on SPARK-41006:
---

Thank you for reporting and pinging me, [~dhkold]. Let me take a look at your 
PR.

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another spark driver in the same namespace 
> where other spark app was running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          
> 9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          
> 9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          
> 3s{code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat
>  
> 

[jira] [Commented] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType

2023-03-28 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705961#comment-17705961
 ] 

Kent Yao commented on SPARK-42943:
--

issue resolved by https://github.com/apache/spark/pull/40573

> Use LONGTEXT instead of TEXT for StringType
> ---
>
> Key: SPARK-42943
> URL: https://issues.apache.org/jira/browse/SPARK-42943
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>
> MysqlDataTruncation will be thrown if the string length exceeds 65535
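
For illustration, a dialect-level mapping with this effect could look roughly like the sketch below; it uses the public JdbcDialect API and is an assumed shape, not the actual patch:

{code:scala}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

// Map Catalyst StringType to MySQL LONGTEXT (up to ~4 GB) instead of TEXT
// (64 KB), so long string values no longer raise MysqlDataTruncation on write.
object MySQLLongTextDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("LONGTEXT", java.sql.Types.LONGVARCHAR))
    case _          => None
  }
}

// JdbcDialects.registerDialect(MySQLLongTextDialect)  // register before the JDBC write
{code}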



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType

2023-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-42943.
--
   Fix Version/s: 3.5.0
Target Version/s: 3.5.0
  Resolution: Fixed

issue fixed by https://github.com/apache/spark/pull/40573

> Use LONGTEXT instead of TEXT for StringType
> ---
>
> Key: SPARK-42943
> URL: https://issues.apache.org/jira/browse/SPARK-42943
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>
> MysqlDataTruncation will be thrown if the string length exceeds 65535



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType

2023-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-42943:


Assignee: Kent Yao

> Use LONGTEXT instead of TEXT for StringType
> ---
>
> Key: SPARK-42943
> URL: https://issues.apache.org/jira/browse/SPARK-42943
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> MysqlDataTruncation will be thrown if the string length exceeds 65535



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file

2023-03-28 Thread shamim (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42127 ]


shamim deleted comment on SPARK-42127:


was (Author: JIRAUSER295634):
We are using spark 3.3.0 with hadoop 3  coming with spark.

Spark in our application is used as standalone , and we are not using HDFS file 
system.

Spark is writing on local file system.

Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we 
are getting this issue. 

 

3 Node cluster , Master running on one node and executor on 3 Node, other 
executors are not able to write , Getting MKDIR error

 

> Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
> -
>
> Key: SPARK-42127
> URL: https://issues.apache.org/jira/browse/SPARK-42127
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: shamim
>Priority: Major
>
> 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) 
> (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create 
> file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0
>  (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081)
>         at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113)
>         at 
> org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file

2023-03-28 Thread shamim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705950#comment-17705950
 ] 

shamim edited comment on SPARK-42127 at 3/28/23 11:39 AM:
--

We are using Spark 3.3.0 with the Hadoop 3 build that ships with Spark.

Spark is used in standalone mode in our application, and we are not using the HDFS file system.

Spark is writing to the local file system.

The same Spark version 3.3.0 works fine with Hadoop 2, but with Hadoop 3 we are getting this issue.

It is a 3-node cluster, with the master running on one node and executors on all 3 nodes; the other executors are not able to write and fail with the Mkdirs error.

 


was (Author: JIRAUSER295634):
We are using spark 3.3.0 with hadoop 3  coming with spark.

Spark in our application is used as standalone , and we are not using HDFS file 
system.

Spark is writing on local file system.

Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we 
are getting this issue. 

 

> Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
> -
>
> Key: SPARK-42127
> URL: https://issues.apache.org/jira/browse/SPARK-42127
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: shamim
>Priority: Major
>
> 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) 
> (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create 
> file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0
>  (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081)
>         at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113)
>         at 
> org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file

2023-03-28 Thread shamim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705950#comment-17705950
 ] 

shamim commented on SPARK-42127:


We are using Spark 3.3.0 with the Hadoop 3 build that ships with Spark.

Spark is used in standalone mode in our application, and we are not using the HDFS file system.

Spark is writing to the local file system.

The same Spark version 3.3.0 works fine with Hadoop 2, but with Hadoop 3 we are getting this issue.
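
For reference, the shape of a write that hits this path is sketched below; the data is a placeholder and the output path is the one from the stack trace in the quoted report. With a file:// target on a multi-node cluster, every executor writes its partitions to its own local filesystem, so each worker node needs to be able to create the target directory:

{code:scala}
// Placeholder sketch (not taken from the ticket): saving to a local file:// path
// from a standalone cluster. Each executor creates /var/backup/out/_temporary/...
// on its own node, so the directory must be creatable there (permissions and an
// existing parent directory) on every worker.
val rdd = spark.sparkContext.parallelize(1 to 10, 4)
rdd.map(_.toString).saveAsTextFile("file:///var/backup/out")
{code}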

 

> Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
> -
>
> Key: SPARK-42127
> URL: https://issues.apache.org/jira/browse/SPARK-42127
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: shamim
>Priority: Major
>
> 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) 
> (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create 
> file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0
>  (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081)
>         at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113)
>         at 
> org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705938#comment-17705938
 ] 

Jiayi Liu commented on SPARK-42947:
---

I will try to fix this.

> Spark Thriftserver LDAP should not use DN pattern if user contains domain
> -
>
> Key: SPARK-42947
> URL: https://issues.apache.org/jira/browse/SPARK-42947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiayi Liu
>Priority: Major
>
> When the LDAP provider includes domain configuration, such as Active 
> Directory, the principal should not be constructed according to the DN 
> pattern, but the user containing the domain should be directly passed to the 
> LDAP provider as the principal. We can refer to the implementation of Hive 
> LdapUtils.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42948) Execution plan error, unable to obtain desired results

2023-03-28 Thread miaowang (Jira)
miaowang created SPARK-42948:


 Summary: Execution plan error, unable to obtain desired results
 Key: SPARK-42948
 URL: https://issues.apache.org/jira/browse/SPARK-42948
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
 Environment: !image-2023-03-28-18-15-55-189.png!

!image-2023-03-28-18-17-08-017.png!

!image-2023-03-28-18-18-41-754.png!
Reporter: miaowang


A jar is packaged that uses SparkSession to submit Spark SQL:
{code:java}
val spark = SparkSession.builder().appName(args(0)).config("spark.sql.crossJoin.enabled", true).enableHiveSupport().getOrCreate()
spark.sql(args(1))
{code}
Execute the following SQL fragment:
{code:java}
INSERT INTO gjdw.aa partition(dt='20230327')
SELECT t1.mandt,
       t1.pur_no,
       t1.pur_item,
       t1.pur_comp_code,
       t1.pur_pur_org,
       t1.zzcoca,
       t1.zzycgdd
FROM
  (SELECT *
   FROM gjdw.aa
   WHERE 
dt=from_unixtime(unix_timestamp(date_add(from_unixtime(unix_timestamp('20230327','mmdd'),'-mm-dd'),-1),'-mm-dd'),'mmdd'))
 t1
LEFT JOIN
  (SELECT *
   FROM gjdw.aa
   WHERE dt='20230327') t ON t.pur_no = t1.pur_no
AND t.pur_item = t1.pur_item
WHERE (t.pur_no = ''
       AND t.pur_item = ''
       OR (t.pur_no IS NULL
           AND t.pur_item IS NULL)) {code}
 

Strangely, I did not get the desired result. There was data in the table, and the 
correct behavior would have been for rows to be inserted. However, no data was 
written, and the job reported no task errors. This shows up in the execution 
plan:

!image-2023-03-28-18-15-07-115.png!
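
A plain-text way to capture what the planner actually did (instead of screenshots) is sketched below. The dt filter is simplified to a literal for readability, and the parentheses spell out how the WHERE predicate groups, since AND binds tighter than OR:

{code:scala}
// Sketch only: rerun the same join with explicit predicate grouping and dump all
// plan stages as text for the Jira report.
val df = spark.sql("""
  SELECT t1.*
  FROM (SELECT * FROM gjdw.aa WHERE dt = '20230326') t1
  LEFT JOIN (SELECT * FROM gjdw.aa WHERE dt = '20230327') t
    ON t.pur_no = t1.pur_no AND t.pur_item = t1.pur_item
  WHERE (t.pur_no = '' AND t.pur_item = '')
     OR (t.pur_no IS NULL AND t.pur_item IS NULL)
""")
df.explain("extended")   // prints the parsed, analyzed, optimized and physical plans
{code}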



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liu updated SPARK-42947:
--
Summary: Spark Thriftserver LDAP should not use DN pattern if user contains 
domain  (was: Spark Thriftserver should not use dn pattern if user contains 
domain)

> Spark Thriftserver LDAP should not use DN pattern if user contains domain
> -
>
> Key: SPARK-42947
> URL: https://issues.apache.org/jira/browse/SPARK-42947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiayi Liu
>Priority: Major
>
> When the LDAP provider includes domain configuration, such as Active 
> Directory, the principal should not be constructed according to the DN 
> pattern, but the user containing the domain should be directly passed to the 
> LDAP provider as the principal. We can refer to the implementation of Hive 
> LdapUtils.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42947) Spark Thriftserver should not use dn pattern if user contains domain

2023-03-28 Thread Jiayi Liu (Jira)
Jiayi Liu created SPARK-42947:
-

 Summary: Spark Thriftserver should not use dn pattern if user 
contains domain
 Key: SPARK-42947
 URL: https://issues.apache.org/jira/browse/SPARK-42947
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Jiayi Liu


When the LDAP provider includes domain configuration, such as Active Directory, 
the principal should not be constructed according to the DN pattern, but the 
user containing the domain should be directly passed to the LDAP provider as 
the principal. We can refer to the implementation of Hive LdapUtils.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42393) Support for Pandas/Arrow Functions API

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-42393.
--
Resolution: Resolved

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API

2023-03-28 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42393:
-
Affects Version/s: (was: 3.5.0)

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-28 Thread Cedric van Eetvelde (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705918#comment-17705918
 ] 

Cedric van Eetvelde commented on SPARK-41006:
-

Could anyone check this? [~dongjoon] (Sorry if I am tagging the wrong person; I 
don't know whom to tag.)

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another spark driver in the same namespace 
> where other spark app was running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          
> 9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          
> 9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          
> 3s{code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat
> 

[jira] [Created] (SPARK-42946) Sensitive data could still be exposed by variable substitution

2023-03-28 Thread Kent Yao (Jira)
Kent Yao created SPARK-42946:


 Summary: Sensitive data could still be exposed by variable 
substitution
 Key: SPARK-42946
 URL: https://issues.apache.org/jira/browse/SPARK-42946
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.2, 3.4.0
Reporter: Kent Yao


Case 1 by SET syntax's key part

 
{code:java}
Time taken: 0.017 seconds, Fetched 1 row(s)
spark-sql> set ${spark.ssl.keyPassword}
         > ;
abc     {code}
Case 2 by SELECT as String lit

 
{code:java}
spark-sql> set spark.ssl.keyPassword;
spark.ssl.keyPassword    *(redacted)
Time taken: 0.009 seconds, Fetched 1 row(s)
spark-sql> select '${spark.ssl.keyPassword}'
         > ;
abc
{code}
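
One way to close this gap, sketched with assumed helper names (this is not the actual fix), is to apply the redaction pattern to the key before substituting it into the SQL text:

{code:scala}
import java.util.regex.Pattern

// Illustrative only: refuse to expand ${key} when the key matches the redaction
// pattern (spark.redaction.regex), so neither `set ${...}` nor `select '${...}'`
// can echo the raw secret back to the console.
def substitute(key: String, lookup: String => Option[String], redactionRegex: String): String = {
  val sensitive = Pattern.compile(redactionRegex, Pattern.CASE_INSENSITIVE).matcher(key).find()
  if (sensitive) "*********(redacted)" else lookup(key).getOrElse("${" + key + "}")
}

// substitute("spark.ssl.keyPassword", _ => Some("abc"), "secret|password|token")
// returns "*********(redacted)" instead of "abc"
{code}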



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42928) Make resolvePersistentFunction synchronized

2023-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42928:
---

Assignee: Allison Wang

> Make resolvePersistentFunction synchronized
> ---
>
> Key: SPARK-42928
> URL: https://issues.apache.org/jira/browse/SPARK-42928
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Make resolvePersistentFunction synchronized



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42928) Make resolvePersistentFunction synchronized

2023-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42928.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40557
[https://github.com/apache/spark/pull/40557]

> Make resolvePersistentFunction synchronized
> ---
>
> Key: SPARK-42928
> URL: https://issues.apache.org/jira/browse/SPARK-42928
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Make resolvePersistentFunction synchronized



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42936) Unresolved having at the end of analysis when using with LCA with the having clause that can be resolved directly by its child Aggregate

2023-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42936.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40558
[https://github.com/apache/spark/pull/40558]

> Unresolved having at the end of analysis when using with LCA with the having 
> clause that can be resolved directly by its child Aggregate
> 
>
> Key: SPARK-42936
> URL: https://issues.apache.org/jira/browse/SPARK-42936
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> select sum(value1) as total_1, total_1
> from values(1, 'name', 100, 50) AS data(id, name, value1, value2)
> having total_1 > 0
> SparkException: [INTERNAL_ERROR] Found the unresolved operator: 
> 'UnresolvedHaving (total_1#353L > cast(0 as bigint)) {code}
> To trigger the issue, the having condition needs to reference (be resolvable as) 
> an attribute in the select.
> Without the LCA {{{}total_1{}}}, the query works fine.
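
As a point of comparison, the same aggregate without the lateral column alias in the HAVING clause (a workaround sketch, not part of the fix) analyzes cleanly:

{code:scala}
// Workaround sketch: reference the aggregate expression directly instead of the
// lateral column alias total_1, which avoids the UnresolvedHaving internal error.
spark.sql("""
  SELECT sum(value1) AS total_1
  FROM VALUES (1, 'name', 100, 50) AS data(id, name, value1, value2)
  HAVING sum(value1) > 0
""").show()
{code}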



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42936) Unresolved having at the end of analysis when using with LCA with the having clause that can be resolved directly by its child Aggregate

2023-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42936:
---

Assignee: Xinyi Yu

> Unresolved having at the end of analysis when using with LCA with the having 
> clause that can be resolved directly by its child Aggregate
> 
>
> Key: SPARK-42936
> URL: https://issues.apache.org/jira/browse/SPARK-42936
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
>
> {code:java}
> select sum(value1) as total_1, total_1
> from values(1, 'name', 100, 50) AS data(id, name, value1, value2)
> having total_1 > 0
> SparkException: [INTERNAL_ERROR] Found the unresolved operator: 
> 'UnresolvedHaving (total_1#353L > cast(0 as bigint)) {code}
> To trigger the issue, the having condition needs to reference (be resolvable as) 
> an attribute in the select.
> Without the LCA {{{}total_1{}}}, the query works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42945) Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect

2023-03-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-42945:
-
Summary: Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect  (was: 
Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect)

> Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect
> ---
>
> Key: SPARK-42945
> URL: https://issues.apache.org/jira/browse/SPARK-42945
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark 
> Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42945) Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect

2023-03-28 Thread Allison Wang (Jira)
Allison Wang created SPARK-42945:


 Summary: Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark 
Connect
 Key: SPARK-42945
 URL: https://issues.apache.org/jira/browse/SPARK-42945
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Allison Wang


Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.5

2023-03-28 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705862#comment-17705862
 ] 

Yang Jie commented on SPARK-42382:
--

OK

> Upgrade `cyclonedx-maven-plugin` to 2.7.5
> -
>
> Key: SPARK-42382
> URL: https://issues.apache.org/jira/browse/SPARK-42382
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4]
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42895) ValueError when invoking any session operations on a stopped Spark session

2023-03-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-42895:
-
Issue Type: Improvement  (was: Bug)

> ValueError when invoking any session operations on a stopped Spark session
> --
>
> Key: SPARK-42895
> URL: https://issues.apache.org/jira/browse/SPARK-42895
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> If a remote Spark session is stopped, trying to invoke any session operations 
> will result in a ValueError. For example:
>  
> {code:java}
> spark.stop()
> spark.sql("select 1")
> ValueError: Cannot invoke RPC: Channel closed!
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   ...
>     return e.code() == grpc.StatusCode.UNAVAILABLE
> AttributeError: 'ValueError' object has no attribute 'code'{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org