[jira] [Created] (SPARK-42958) Refactor `CheckConnectJvmClientCompatibility` to compare client and avro
Yang Jie created SPARK-42958:

Summary: Refactor `CheckConnectJvmClientCompatibility` to compare client and avro
Key: SPARK-42958
URL: https://issues.apache.org/jira/browse/SPARK-42958
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Yang Jie

--
This message was sent by Atlassian Jira (v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42957) `release-build.sh` should not remove SBOM artifacts
[ https://issues.apache.org/jira/browse/SPARK-42957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-42957:
----------------------------------
Priority: Critical  (was: Major)

> `release-build.sh` should not remove SBOM artifacts
> --
>
> Key: SPARK-42957
> URL: https://issues.apache.org/jira/browse/SPARK-42957
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Critical
[jira] [Created] (SPARK-42957) `release-build.sh` should not remove SBOM artifacts
Dongjoon Hyun created SPARK-42957:

Summary: `release-build.sh` should not remove SBOM artifacts
Key: SPARK-42957
URL: https://issues.apache.org/jira/browse/SPARK-42957
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun
[jira] [Created] (SPARK-42956) avro functions
Yang Jie created SPARK-42956:

Summary: avro functions
Key: SPARK-42956
URL: https://issues.apache.org/jira/browse/SPARK-42956
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Yang Jie
[jira] [Resolved] (SPARK-42946) Sensitive data could still be exposed by variable substitution
[ https://issues.apache.org/jira/browse/SPARK-42946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-42946.
------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 40576
[https://github.com/apache/spark/pull/40576]

> Sensitive data could still be exposed by variable substitution
> --
>
> Key: SPARK-42946
> URL: https://issues.apache.org/jira/browse/SPARK-42946
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.4.0
>
> Case 1: via the key part of the SET syntax
> {code:java}
> Time taken: 0.017 seconds, Fetched 1 row(s)
> spark-sql> set ${spark.ssl.keyPassword};
> abc {code}
> Case 2: via a SELECT string literal
> {code:java}
> spark-sql> set spark.ssl.keyPassword;
> spark.ssl.keyPassword *(redacted)
> Time taken: 0.009 seconds, Fetched 1 row(s)
> spark-sql> select '${spark.ssl.keyPassword}';
> abc
> {code}
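The leak described above can be sketched outside Spark with a toy substitutor. This is a minimal model of the bug, not Spark's actual code: the function names and the redaction pattern are illustrative assumptions, standing in for Spark's config-display redaction and variable substitution.

```python
import re

# Toy stand-in for a "redact sensitive keys" pattern (assumption, not Spark's exact default).
REDACTION_PATTERN = re.compile(r"(?i)secret|password|token")

def show_conf(conf: dict, key: str) -> str:
    """SET-style display: values of sensitive-looking keys are redacted."""
    value = conf.get(key, "<undefined>")
    return "*(redacted)" if REDACTION_PATTERN.search(key) else value

def substitute(conf: dict, text: str) -> str:
    """Naive ${var} substitution with no redaction check -- the hole SPARK-42946 closes."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: conf.get(m.group(1), m.group(0)), text)

conf = {"spark.ssl.keyPassword": "abc"}
# Direct lookup is redacted ...
assert show_conf(conf, "spark.ssl.keyPassword") == "*(redacted)"
# ... but substituting the variable into a string literal leaks the raw value.
assert substitute(conf, "select '${spark.ssl.keyPassword}'") == "select 'abc'"
```

The point of the fix is that the substitution path must apply the same redaction rules as the display path, instead of handing back the raw configured value.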
[jira] [Assigned] (SPARK-42946) Sensitive data could still be exposed by variable substitution
[ https://issues.apache.org/jira/browse/SPARK-42946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-42946:
--------------------------------
Assignee: Kent Yao

> Sensitive data could still be exposed by variable substitution
> --
>
> Key: SPARK-42946
> URL: https://issues.apache.org/jira/browse/SPARK-42946
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
>
> Case 1: via the key part of the SET syntax
> {code:java}
> Time taken: 0.017 seconds, Fetched 1 row(s)
> spark-sql> set ${spark.ssl.keyPassword};
> abc {code}
> Case 2: via a SELECT string literal
> {code:java}
> spark-sql> set spark.ssl.keyPassword;
> spark.ssl.keyPassword *(redacted)
> Time taken: 0.009 seconds, Fetched 1 row(s)
> spark-sql> select '${spark.ssl.keyPassword}';
> abc
> {code}
[jira] [Created] (SPARK-42955) Skip classifyException and wrap AnalysisException for SparkThrowable
Kent Yao created SPARK-42955:

Summary: Skip classifyException and wrap AnalysisException for SparkThrowable
Key: SPARK-42955
URL: https://issues.apache.org/jira/browse/SPARK-42955
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Kent Yao
[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API
[ https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-42393:
---------------------------------
Description:

There are derivative APIs which depend on the implementation of Pandas UDFs: Pandas Function APIs and Arrow Function APIs, as shown below:

!image-2023-03-29-11-40-44-318.png|width=576,height=225!

Spark Connect Python Client (SCPC), as a client and server interface for PySpark, will eventually replace the legacy API of PySpark. Supporting PySpark UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

was: See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Attachments: image-2023-03-29-11-40-44-318.png
[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API
[ https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-42393:
---------------------------------
Attachment: image-2023-03-29-11-40-44-318.png

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Attachments: image-2023-03-29-11-40-44-318.png
>
> See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python
[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41661:
---------------------------------
Description:

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code on top of the Apache Spark™ engine. Users only have to state "what to do"; PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for PySpark, will eventually replace the legacy API of PySpark. Supporting PySpark UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

was: (the same text, except ending "... will eventually replace the legacy API of PySpark in OSS.")

> Support for User-defined Functions in Python
> --
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python
[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41661:
---------------------------------
Description:

User-defined Functions in Python consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code on top of the Apache Spark™ engine. Users only have to state "what to do"; PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for PySpark, will eventually replace the legacy API of PySpark. Supporting PySpark UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

was: (the same text, with the design-doc link moved to the front)

> Support for User-defined Functions in Python
> --
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python
[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41661:
---------------------------------
Description:

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code on top of the Apache Spark™ engine. Users only have to state "what to do"; PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for PySpark, will eventually replace the legacy API of PySpark in OSS. Supporting PySpark UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

was: (the same text, except ending "... will eventually (probably Spark 4.0) replace the legacy API of PySpark in both OSS.")

> Support for User-defined Functions in Python
> --
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python
[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41661:
---------------------------------
Description:

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

User-defined Functions in Python consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code on top of the Apache Spark™ engine. Users only have to state "what to do"; PySpark, as a sandbox, encapsulates "how to do it".

Spark Connect Python Client (SCPC), as a client and server interface for PySpark, will eventually (probably Spark 4.0) replace the legacy API of PySpark in both OSS. Supporting PySpark UDFs is essential for Spark Connect to reach parity with the PySpark legacy API.

was: See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub]. PySpark UDFs mainly consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs.

> Support for User-defined Functions in Python
> --
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-41661) Support for User-defined Functions in Python
[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-41661:
---------------------------------
Description:

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

PySpark UDFs mainly consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs.

was: Spark Connect should support Python UDFs

> Support for User-defined Functions in Python
> --
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API
[ https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-42393:
---------------------------------
Description:

See design doc [here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].

> Support for Pandas/Arrow Functions API
> --
>
> Key: SPARK-42393
> URL: https://issues.apache.org/jira/browse/SPARK-42393
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Created] (SPARK-42954) Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client
Ruifeng Zheng created SPARK-42954:

Summary: Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client
Key: SPARK-42954
URL: https://issues.apache.org/jira/browse/SPARK-42954
Project: Spark
Issue Type: New Feature
Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng
[jira] [Resolved] (SPARK-39153) When we look at spark UI or History, we can see the failed tasks first
[ https://issues.apache.org/jira/browse/SPARK-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong resolved SPARK-39153.
-------------------------------------
Resolution: Not A Problem

> When we look at spark UI or History, we can see the failed tasks first
> --
>
> Key: SPARK-39153
> URL: https://issues.apache.org/jira/browse/SPARK-39153
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.2.0
> Environment: spark 3.2.0
> Reporter: jingxiong zhong
> Priority: Major
> Fix For: 3.2.0
>
> When a task fails, users care most about the failed tasks and the causes of their failure. The current Spark UI and History Server sort tasks by "Index" rather than "Errors", so with a large number of tasks it takes a while to sort and locate the failures. Sorting by the "Errors" column at the beginning would surface the failure causes immediately and improve the user experience.
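The ordering the ticket asks for can be sketched in a few lines of Python. This is a toy model of the sort key, not Spark's UI code; the task records are hypothetical.

```python
# Hypothetical task rows as they might appear in the tasks table.
tasks = [
    {"index": 0, "error": None},
    {"index": 1, "error": "OutOfMemoryError"},
    {"index": 2, "error": None},
    {"index": 3, "error": "FetchFailed"},
]

# Proposed default ordering: rows with a non-empty "Errors" column first,
# then by "Index" within each group.
ordered = sorted(tasks, key=lambda t: (t["error"] is None, t["index"]))

assert [t["index"] for t in ordered] == [1, 3, 0, 2]
```

A two-part sort key works because `False` sorts before `True`, so failed tasks (where `error is None` is `False`) come first without a second pass.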
[jira] [Resolved] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
[ https://issues.apache.org/jira/browse/SPARK-39967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong resolved SPARK-39967.
-------------------------------------
Resolution: Fixed

Not reproduced in newer versions.

> Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
> --
>
> Key: SPARK-39967
> URL: https://issues.apache.org/jira/browse/SPARK-39967
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.3, 2.4.6
> Reporter: jingxiong zhong
> Priority: Critical
> Attachments: spark1-1.png, spark2.png, spark3-1.png
>
> When counting the number of successful tasks in a Spark stage, Spark uses the `tasksSuccessful` counter, but the actual success or failure of each task is recorded in the `successful` array. Logging I added shows that the count derived from `tasksSuccessful` can be inconsistent with the state stored in the `successful` array. The `successful` array should be treated as the source of truth.
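The inconsistency the reporter describes can be modeled abstractly: a scalar counter maintained separately from the per-task state can drift from it, while deriving completion from the array cannot. This is a toy illustration, not Spark's `TaskSetManager` code; all names here are hypothetical.

```python
# Per-task ground truth: task 1 actually failed.
successful = [True, False, True, True]

# Scalar counter that drifted (e.g. double-counted a resubmitted task).
tasks_successful = 4

def stage_complete_from_counter(counter: int, num_tasks: int) -> bool:
    """Completion check based on the scalar counter -- can be wrong if it drifts."""
    return counter == num_tasks

def stage_complete_from_array(successful: list) -> bool:
    """Completion check derived from the per-task array -- always consistent."""
    return all(successful)

# The drifted counter wrongly reports the stage as complete ...
assert stage_complete_from_counter(tasks_successful, len(successful))
# ... while the array correctly reflects the failed task.
assert not stage_complete_from_array(successful)
```

Deriving the count with `sum(successful)` instead of incrementing a counter removes the possibility of drift, at the cost of an O(n) scan.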
[jira] [Created] (SPARK-42953) Impl typed map, flatMap, mapPartitions in Dataset
Zhen Li created SPARK-42953:

Summary: Impl typed map, flatMap, mapPartitions in Dataset
Key: SPARK-42953
URL: https://issues.apache.org/jira/browse/SPARK-42953
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li

Add missing typed API support in the Dataset API.
[jira] [Updated] (SPARK-42952) Simplify the parameter of analyzer rule PreprocessTableCreation and DataSourceAnalysis
[ https://issues.apache.org/jira/browse/SPARK-42952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-42952:
-----------------------------------
Summary: Simplify the parameter of analyzer rule PreprocessTableCreation and DataSourceAnalysis  (was: Simplify the parameter of analysis rule PreprocessTableCreation and DataSourceAnalysis)

> Simplify the parameter of analyzer rule PreprocessTableCreation and DataSourceAnalysis
> --
>
> Key: SPARK-42952
> URL: https://issues.apache.org/jira/browse/SPARK-42952
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
[jira] [Created] (SPARK-42952) Simplify the parameter of analysis rule PreprocessTableCreation and DataSourceAnalysis
Gengliang Wang created SPARK-42952:

Summary: Simplify the parameter of analysis rule PreprocessTableCreation and DataSourceAnalysis
Key: SPARK-42952
URL: https://issues.apache.org/jira/browse/SPARK-42952
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 3.5.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
[jira] [Created] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
Wei Liu created SPARK-42951:

Summary: Spark Connect: Streaming DataStreamReader API except table()
Key: SPARK-42951
URL: https://issues.apache.org/jira/browse/SPARK-42951
Project: Spark
Issue Type: Story
Components: Connect, Structured Streaming
Affects Versions: 3.4.0, 3.5.0
Reporter: Wei Liu
[jira] [Created] (SPARK-42950) Add exit code in SparkListenerApplicationEnd
Paul Laffon created SPARK-42950:

Summary: Add exit code in SparkListenerApplicationEnd
Key: SPARK-42950
URL: https://issues.apache.org/jira/browse/SPARK-42950
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.3.2
Reporter: Paul Laffon

When an application ends, the {{SparkListener}} receives a final event called {{SparkListenerApplicationEnd}}. This event currently includes only a timestamp, but it would be beneficial to also include the exitCode of the application. This additional information would provide insight into whether the application succeeded or failed.
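The proposed shape of the event can be sketched as follows. This is a Python toy model of the idea, not Spark's actual Scala case class: the `exit_code` field and the listener class are hypothetical, and in Spark the real event carries only `time`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AppEndEvent:
    """Toy model of SparkListenerApplicationEnd: today only a timestamp,
    the ticket proposes an optional exit code (field name hypothetical)."""
    time: int
    exit_code: Optional[int] = None

class OutcomeListener:
    """Hypothetical listener that classifies the run from the final event."""
    def on_application_end(self, event: AppEndEvent) -> str:
        if event.exit_code is None:
            return "ended (outcome unknown)"   # today's behavior: timestamp only
        return "succeeded" if event.exit_code == 0 else f"failed (exit code {event.exit_code})"

listener = OutcomeListener()
assert listener.on_application_end(AppEndEvent(time=1680000000)) == "ended (outcome unknown)"
assert listener.on_application_end(AppEndEvent(time=1680000000, exit_code=0)) == "succeeded"
assert listener.on_application_end(AppEndEvent(time=1680000000, exit_code=1)).startswith("failed")
```

Making the field optional keeps old consumers working: they can ignore it, and producers that don't know the exit code simply omit it.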
[jira] [Created] (SPARK-42949) Simplify code for NAAJ
Cheng Pan created SPARK-42949:

Summary: Simplify code for NAAJ
Key: SPARK-42949
URL: https://issues.apache.org/jira/browse/SPARK-42949
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan
[jira] [Resolved] (SPARK-42927) Make `o.a.spark.util.Iterators#size` as `private[spark]`
[ https://issues.apache.org/jira/browse/SPARK-42927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-42927.
----------------------------------
Fix Version/s: 3.4.1, 3.5.0
Assignee: Yang Jie
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40556

> Make `o.a.spark.util.Iterators#size` as `private[spark]`
> --
>
> Key: SPARK-42927
> URL: https://issues.apache.org/jira/browse/SPARK-42927
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Trivial
> Fix For: 3.4.1, 3.5.0
[jira] [Updated] (SPARK-42927) Make `o.a.spark.util.Iterators#size` as `private[spark]`
[ https://issues.apache.org/jira/browse/SPARK-42927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-42927:
---------------------------------
Priority: Trivial  (was: Major)

> Make `o.a.spark.util.Iterators#size` as `private[spark]`
> --
>
> Key: SPARK-42927
> URL: https://issues.apache.org/jira/browse/SPARK-42927
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Yang Jie
> Priority: Trivial
[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705990#comment-17705990 ] Jiayi Liu commented on SPARK-42947: --- issue fixed by https://github.com/apache/spark/pull/40577 > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider has domain configuration, such as Active Directory, > the principal should not be constructed according to the DN pattern, but the > username containing the domain should be directly passed to the LDAP provider > as the principal. We can refer to the implementation of Hive LdapUtils. > When the username contains a domain or domain passes from > hive.server2.authentication.ldap.Domain configuration, if we construct the > principal according to the DN pattern (For example, > uid=user@domain,dc=test,dc=com), we will get the following error: > {code:java} > 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: Error validating the login > at > org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) > ~[libthrift-0.12.0.jar:0.12.0] > at > org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) > ~[libthrift-0.12.0.jar:0.12.0] > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) > ~[libthrift-0.12.0.jar:0.12.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_352] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_352] > at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] > Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP > user > at > org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > ... 8 more > Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - > 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data > 52e, v2580] > at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) > ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] > at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) > ~[?:1.8.0_352] > at > com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) > 
~[?:1.8.0_352] > at > javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) > ~[?:1.8.0_352] > at > javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) > ~[?:1.8.0_352] > at javax.naming.InitialContext.init(InitialContext.java:244) > ~[?:1.8.0_352] > at javax.naming.InitialContext.<init>(InitialContext.java:216) > ~[?:1.8.0_352] > at > javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) > ~[?:1.8.0_352] > at > org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > at > org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) > ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] > ... 8 more > {code} > We should pass user@domain directly to the LDAP provider, just as HiveServer does.
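The proposal above boils down to a principal-selection rule: use the UPN-style name when a domain is present, and fall back to the DN pattern otherwise. A minimal sketch of that rule, assuming a hypothetical `ldap_principal` helper and a made-up default DN pattern (this is not Spark's or Hive's actual code):

```python
def ldap_principal(user, dn_pattern="uid=%s,dc=test,dc=com", configured_domain=None):
    """Pick the principal to hand to the LDAP provider (illustrative only)."""
    # If the login already carries a domain (user@domain), pass it through
    # unchanged, as Hive's LdapUtils does.
    if "@" in user:
        return user
    # If a domain comes from hive.server2.authentication.ldap.Domain,
    # append it instead of wrapping the name in a DN.
    if configured_domain:
        return f"{user}@{configured_domain}"
    # Only a plain username gets the DN-pattern treatment.
    return dn_pattern % user

assert ldap_principal("user@domain") == "user@domain"
assert ldap_principal("user", configured_domain="corp.example.com") == "user@corp.example.com"
assert ldap_principal("user") == "uid=user,dc=test,dc=com"
```

Under this rule a DN such as uid=user@domain,dc=test,dc=com is never constructed, so Active Directory no longer rejects the bind with the data 52e (invalid credentials) error shown above.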
[jira] [Updated] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled
[ https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42937: -- Affects Version/s: (was: 3.5.0) > Join with subquery in condition can fail with wholestage codegen and adaptive > execution disabled > > > Key: SPARK-42937 > URL: https://issues.apache.org/jira/browse/SPARK-42937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Fix For: 3.3.3, 3.4.1 > > > The below left outer join gets an error: > {noformat} > create or replace temp view v1 as > select * from values > (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), > (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), > (3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) > as v1(key, value1, value2, value3, value4, value5, value6, value7, value8, > value9, value10); > create or replace temp view v2 as > select * from values > (1, 2), > (3, 8), > (7, 9) > as v2(a, b); > create or replace temp view v3 as > select * from values > (3), > (8) > as v3(col1); > set spark.sql.codegen.maxFields=10; -- let's make maxFields 10 instead of 100 > set spark.sql.adaptive.enabled=false; > select * > from v1 > left outer join v2 > on key = a > and key in (select col1 from v3); > {noformat} > The join fails during predicate codegen: > {noformat} > 23/03/27 12:24:12 WARN Predicate: Expr codegen error and falling back to > interpreter mode > java.lang.IllegalArgumentException: requirement failed: input[0, int, false] > IN subquery#34 has not finished > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144) > at > org.apache.spark.sql.execution.InSubqueryExec.doGenCode(subquery.scala:156) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201) > at scala.Option.getOrElse(Option.scala:189) > at > 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressions$2(CodeGenerator.scala:1278) > at scala.collection.immutable.List.map(List.scala:293) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpressions(CodeGenerator.scala:1278) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:41) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:33) > at > org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:73) > at > org.apache.spark.sql.catalyst.expressions.Predicate$.createCodeGeneratedObject(predicates.scala:70) > at > org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:51) > at > org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:86) > at > org.apache.spark.sql.execution.joins.HashJoin.boundCondition(HashJoin.scala:146) > at > org.apache.spark.sql.execution.joins.HashJoin.boundCondition$(HashJoin.scala:140) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition$lzycompute(BroadcastHashJoinExec.scala:40) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.boundCondition(BroadcastHashJoinExec.scala:40) > {noformat} > It fails again after fallback to interpreter mode: > {noformat} > 23/03/27 12:24:12 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7) > java.lang.IllegalArgumentException: requirement failed: input[0, int, false] > IN subquery#34 has not finished > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.execution.InSubqueryExec.prepareResult(subquery.scala:144) > at > org.apache.spark.sql.execution.InSubqueryExec.eval(subquery.scala:151) > at > 
org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:52) > at > org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2(HashJoin.scala:146) > at > org.apache.spark.sql.execution.joins.HashJoin.$anonfun$boundCondition$2$adapted(HashJoin.scala:146) > at > org.apache.spark.sql.execution.joins.HashJoin.$anonfun$outerJoin$1(HashJoin.scala:205) > {noformat} > Both the predicate codegen and the evaluation fail for the same reason: > {{PlanSubqueries}} creates {{InSubqueryExec}} with {{shouldBroadcast=false}}. > The driver waits for the subquery to finish, but it's the executor that uses > the results of the subquery (for predicate codegen or evaluation).
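The failure mode described above (result set on the driver, consumed on an executor) can be mimicked with a toy model. This is not Spark's API; the class, `update_result`, and the dict standing in for a broadcast variable are simplifications for illustration:

```python
import copy

class InSubqueryExec:
    """Toy stand-in for a subquery expression whose result is produced on
    the driver but consumed on executors (not Spark's actual class)."""
    def __init__(self, should_broadcast):
        self.should_broadcast = should_broadcast
        self.result = None  # plain field: updates stay on this copy only

    def update_result(self, rows, broadcast_env):
        # broadcast_env stands in for Spark's broadcast mechanism, which
        # every deserialized copy of the plan can read.
        if self.should_broadcast:
            broadcast_env["subquery#34"] = rows
        else:
            self.result = rows

    def prepare_result(self, broadcast_env):
        if self.result is None and self.should_broadcast:
            self.result = broadcast_env.get("subquery#34")
        if self.result is None:
            raise ValueError(
                "requirement failed: IN subquery#34 has not finished")
        return self.result

broadcast_env = {}
driver_side = InSubqueryExec(should_broadcast=False)
# The physical plan is serialized to executors before the subquery finishes:
executor_side = copy.deepcopy(driver_side)
driver_side.update_result([3, 8], broadcast_env)  # driver waits, then fills in
try:
    executor_side.prepare_result(broadcast_env)   # executor's copy never saw it
except ValueError as err:
    print(err)
```

With `should_broadcast=True`, the executor-side copy finds the rows through the shared broadcast environment instead of the lost driver-local field, which is why broadcasting the subquery result matters in this plan shape.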
[jira] [Assigned] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled
[ https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42937: Assignee: Bruce Robbins
[jira] [Resolved] (SPARK-42937) Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled
[ https://issues.apache.org/jira/browse/SPARK-42937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42937. Fix Version/s: 3.3.3, 3.4.1 Resolution: Fixed Issue resolved by pull request 40569 [https://github.com/apache/spark/pull/40569]
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: Description updated.
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider has domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: {code:java} 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more {code} we should pass user@domain directly to the LDAP provider, just like HiveServer did. was: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: {code:java} 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more {code} we should pass user@domain directly to the LDAP provider, just like HiveServer did. was: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Description: When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils. When the username contains a domain or domain passes from hive.server2.authentication.ldap.Domain configuration, if we construct the principal according to the DN pattern (For example, uid=user@domain,dc=test,dc=com), we will get the following error: ``` 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: Error validating the login at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352] Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user at 
org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580] at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtx.(LdapCtx.java:347) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352] at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352] at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352] at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352] at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352] at javax.naming.InitialContext.(InitialContext.java:216) ~[?:1.8.0_352] at javax.naming.directory.InitialDirContext.(InitialDirContext.java:101) ~[?:1.8.0_352] at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) 
~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1] ... 8 more ``` We should pass user@domain directly to the LDAP provider, just like HiveServer does. was:When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal.
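The fix the issue proposes can be sketched as a small principal-selection rule, modeled on the behavior of Hive's LdapUtils that the reporter references. This is an illustrative sketch, not the actual Spark or Hive code: the function names, the DN pattern, and the exact detection rule are assumptions.

```python
# Illustrative sketch of the proposed principal-selection logic (names and
# rules are hypothetical, modeled on Hive's LdapUtils behavior described above).

def has_domain(user: str) -> bool:
    """A user like 'user@domain' already carries its domain."""
    return "@" in user

def construct_principal(user, dn_pattern="uid=%s,dc=test,dc=com",
                        configured_domain=None):
    # If the user already contains a domain (or one is configured, e.g. via
    # hive.server2.authentication.ldap.Domain), pass it through unchanged:
    # wrapping it in the DN pattern ('uid=user@domain,dc=test,dc=com') is
    # what makes Active Directory reject the bind with "error code 49 ...
    # data 52e" (invalid credentials) in the stack trace above.
    if has_domain(user):
        return user
    if configured_domain:
        return f"{user}@{configured_domain}"
    # Plain user names still go through the DN pattern.
    return dn_pattern % user

print(construct_principal("alice"))           # DN-pattern path
print(construct_principal("alice@corp.com"))  # pass-through path
```

The point is only where the branch happens: the DN pattern remains the default, and the pass-through applies exactly when a domain is present.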
[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705965#comment-17705965 ] Dongjoon Hyun commented on SPARK-41006: --- Thank you for reporting and pinging me, [~dhkold]. Let me take a look at your PR. > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where other spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > >
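The failure above can be restated in a few lines: the driver ConfigMap name (`spark-drv-<id>-conf-map`) is derived from a resource-name prefix, so if two launches in the same namespace compute the same prefix, the second one issues a PUT against an existing ConfigMap whose `data` is immutable. The sketch below is illustrative only, not Spark's actual naming code; the function names are hypothetical, and the takeaway is just that per-launch entropy in the prefix avoids the collision.

```python
# Hypothetical model of the collision: one ConfigMap name per resource
# prefix, so unique prefixes mean no PUT against an immutable ConfigMap.
import secrets

def conf_map_name(resource_prefix: str) -> str:
    return f"{resource_prefix}-conf-map"

def unique_resource_prefix() -> str:
    # Per-launch entropy keeps prefixes distinct across apps in a namespace.
    return f"spark-drv-{secrets.token_hex(4)}"

a = conf_map_name(unique_resource_prefix())  # e.g. spark-drv-1a2b3c4d-conf-map
b = conf_map_name(unique_resource_prefix())
assert a != b  # distinct names -> each launch creates its own ConfigMap
```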
[jira] [Commented] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType
[ https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705961#comment-17705961 ] Kent Yao commented on SPARK-42943: -- issue resolved by https://github.com/apache/spark/pull/40573 > Use LONGTEXT instead of TEXT for StringType > --- > > Key: SPARK-42943 > URL: https://issues.apache.org/jira/browse/SPARK-42943 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0 > > > MysqlDataTruncation will be thrown if the string length exceeds 65535 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
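The reasoning behind the change can be made concrete: MySQL's `TEXT` stores at most 65,535 bytes, while `LONGTEXT` stores up to 2^32 - 1 bytes, and Spark cannot bound string lengths at schema time, so the widest type is the safe mapping for `StringType`. The sketch below is a hypothetical illustration of that sizing argument, not the actual JDBC dialect code.

```python
# Why TEXT truncates and LONGTEXT does not (limits per MySQL docs; the
# helper functions are illustrative, not Spark's dialect implementation).
MYSQL_TEXT_MAX = 65_535          # bytes
MYSQL_LONGTEXT_MAX = 2**32 - 1   # bytes

def mysql_type_for_string() -> str:
    # Spark cannot know value lengths up front, so default to the widest type.
    return "LONGTEXT"

def fits(column_type: str, value: str) -> bool:
    limit = {"TEXT": MYSQL_TEXT_MAX, "LONGTEXT": MYSQL_LONGTEXT_MAX}[column_type]
    return len(value.encode("utf-8")) <= limit

big = "x" * 70_000
assert not fits("TEXT", big)                  # the MysqlDataTruncation case
assert fits(mysql_type_for_string(), big)     # safe under LONGTEXT
```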
[jira] [Resolved] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType
[ https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-42943. -- Fix Version/s: 3.5.0 Target Version/s: 3.5.0 Resolution: Fixed issue fixed by https://github.com/apache/spark/pull/40573 > Use LONGTEXT instead of TEXT for StringType > --- > > Key: SPARK-42943 > URL: https://issues.apache.org/jira/browse/SPARK-42943 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0 > > > MysqlDataTruncation will be thrown if the string length exceeds 65535 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42943) Use LONGTEXT instead of TEXT for StringType
[ https://issues.apache.org/jira/browse/SPARK-42943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-42943: Assignee: Kent Yao > Use LONGTEXT instead of TEXT for StringType > --- > > Key: SPARK-42943 > URL: https://issues.apache.org/jira/browse/SPARK-42943 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > MysqlDataTruncation will be thrown if the string length exceeds 65535 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
[ https://issues.apache.org/jira/browse/SPARK-42127 ] shamim deleted comment on SPARK-42127: was (Author: JIRAUSER295634): We are using spark 3.3.0 with hadoop 3 coming with spark. Spark in our application is used as standalone , and we are not using HDFS file system. Spark is writing on local file system. Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we are getting this issue. 3 Node cluster , Master running on one node and executor on 3 Node, other executors are not able to write , Getting MKDIR error > Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file > - > > Key: SPARK-42127 > URL: https://issues.apache.org/jira/browse/SPARK-42127 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: shamim >Priority: Major > > 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) > (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create > file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0 > (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113) > at > org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238) > at > org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126) > at > org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
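The stack trace shows a `file:/` output path, and with a local filesystem every executor writes to its own node's disk, so the target directory must exist and be writable on every worker. A quick way to check a node is a probe like the one below; this is an illustrative diagnostic, not Spark code, and the probe path is hypothetical.

```python
# Illustrative per-node check for this failure mode: can the Spark user
# create the output directory and a file inside it on the local disk?
import os
import tempfile

def can_create_output_dir(path: str) -> bool:
    try:
        os.makedirs(path, exist_ok=True)
        probe = os.path.join(path, ".spark_write_probe")
        with open(probe, "w"):
            pass
        os.remove(probe)
        return True
    except OSError:
        # Same condition that surfaces as "Mkdirs failed to create file:..."
        return False

# On a node where the output root (e.g. /var/backup) is not writable by the
# Spark user, this returns False; run it on every worker to find the bad node.
target = os.path.join(tempfile.gettempdir(), "backup", "_temporary")
print(can_create_output_dir(target))
```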
[jira] [Comment Edited] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
[ https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705950#comment-17705950 ] shamim edited comment on SPARK-42127 at 3/28/23 11:39 AM: -- We are using spark 3.3.0 with hadoop 3 coming with spark. Spark in our application is used as standalone , and we are not using HDFS file system. Spark is writing on local file system. Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we are getting this issue. 3 Node cluster , Master running on one node and executor on 3 Node, other executors are not able to write , Getting MKDIR error was (Author: JIRAUSER295634): We are using spark 3.3.0 with hadoop 3 coming with spark. Spark in our application is used as standalone , and we are not using HDFS file system. Spark is writing on local file system. Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we are getting this issue. > Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file > - > > Key: SPARK-42127 > URL: https://issues.apache.org/jira/browse/SPARK-42127 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: shamim >Priority: Major > > 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) > (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create > file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0 > (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113) > at > 
org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238) > at > org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126) > at > org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
[ https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705950#comment-17705950 ] shamim commented on SPARK-42127: We are using spark 3.3.0 with hadoop 3 coming with spark. Spark in our application is used as standalone , and we are not using HDFS file system. Spark is writing on local file system. Same spark version 3.3.0 is working fine with hadoop 2. but with hadoop 3 , we are getting this issue. > Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file > - > > Key: SPARK-42127 > URL: https://issues.apache.org/jira/browse/SPARK-42127 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: shamim >Priority: Major > > 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) > (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create > file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0 > (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113) > at > org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238) > at > org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126) > at > org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705938#comment-17705938 ] Jiayi Liu commented on SPARK-42947: --- I will try to fix this. > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider includes domain configuration, such as Active > Directory, the principal should not be constructed according to the DN > pattern, but the user containing the domain should be directly passed to the > LDAP provider as the principal. We can refer to the implementation of Hive > LdapUtils. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42948) Execution plan error, unable to obtain desired results
miaowang created SPARK-42948: Summary: Execution plan error, unable to obtain desired results Key: SPARK-42948 URL: https://issues.apache.org/jira/browse/SPARK-42948 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Environment: !image-2023-03-28-18-15-55-189.png! !image-2023-03-28-18-17-08-017.png! !image-2023-03-28-18-18-41-754.png! Reporter: miaowang A jar is packaged using SparkSession to submit Spark SQL: {code:java} //SparkSession.builder().appName(args(0)).config("spark.sql.crossJoin.enabled", true).enableHiveSupport().getOrCreate() spark.sql(arg(1)) {code} Execute the following SQL fragment: {code:java} //INSERT INTO gjdw.aa partition(dt='20230327') SELECT t1.mandt, t1.pur_no, t1.pur_item, t1.pur_comp_code, t1.pur_pur_org, t1.zzcoca, t1.zzycgdd FROM (SELECT * FROM gjdw.aa WHERE dt=from_unixtime(unix_timestamp(date_add(from_unixtime(unix_timestamp('20230327','mmdd'),'-mm-dd'),-1),'-mm-dd'),'mmdd')) t1 LEFT JOIN (SELECT * FROM gjdw.aa WHERE dt='20230327') t ON t.pur_no = t1.pur_no AND t.pur_item = t1.pur_item WHERE (t.pur_no = '' AND t.pur_item = '' OR (t.pur_no IS NULL AND t.pur_item IS NULL)) {code} Strangely, I didn't get the desired result. There was data in the table, and the correct value should have data inserted. However, there was no data output, and there was no task error message for the job. This occurred in the execution plan !image-2023-03-28-18-15-07-115.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
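One detail worth flagging in the quoted SQL: the format strings (`'mmdd'`, `'-mm-dd'`) appear mangled by the digest, and Spark's datetime patterns are case-sensitive: `MM` is month-of-year while `mm` is minute-of-hour, so a lowercase pattern can silently produce `dt` values that match no partition, which would explain an empty result with no task error. Python's `strftime` has the same pitfall (`%m` month vs `%M` minute), which makes it easy to demonstrate without a Spark session:

```python
# Case-sensitivity pitfall analogous to Spark's 'MM' (month) vs 'mm'
# (minute) datetime patterns, shown with Python's %m vs %M.
from datetime import datetime

d = datetime(2023, 3, 27, 10, 5)
assert d.strftime("%m%d") == "0327"  # month + day: what a dt partition expects
assert d.strftime("%M%d") == "0527"  # minute + day: silently wrong partition key
```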
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liu updated SPARK-42947: -- Summary: Spark Thriftserver LDAP should not use DN pattern if user contains domain (was: Spark Thriftserver should not use dn pattern if user contains domain) > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > > When the LDAP provider includes domain configuration, such as Active > Directory, the principal should not be constructed according to the DN > pattern, but the user containing the domain should be directly passed to the > LDAP provider as the principal. We can refer to the implementation of Hive > LdapUtils.
[jira] [Created] (SPARK-42947) Spark Thriftserver should not use dn pattern if user contains domain
Jiayi Liu created SPARK-42947: - Summary: Spark Thriftserver should not use dn pattern if user contains domain Key: SPARK-42947 URL: https://issues.apache.org/jira/browse/SPARK-42947 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Jiayi Liu When the LDAP provider includes domain configuration, such as Active Directory, the principal should not be constructed according to the DN pattern, but the user containing the domain should be directly passed to the LDAP provider as the principal. We can refer to the implementation of Hive LdapUtils.
[jira] [Resolved] (SPARK-42393) Support for Pandas/Arrow Functions API
[ https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-42393. -- Resolution: Resolved > Support for Pandas/Arrow Functions API > -- > > Key: SPARK-42393 > URL: https://issues.apache.org/jira/browse/SPARK-42393 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 3.4.0, 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major >
[jira] [Updated] (SPARK-42393) Support for Pandas/Arrow Functions API
[ https://issues.apache.org/jira/browse/SPARK-42393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42393: - Affects Version/s: (was: 3.5.0) > Support for Pandas/Arrow Functions API > -- > > Key: SPARK-42393 > URL: https://issues.apache.org/jira/browse/SPARK-42393 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major >
[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705918#comment-17705918 ] Cedric van Eetvelde commented on SPARK-41006: - Anyone to check this? [~dongjoon] (Sorry if tagging the wrong person, I don't know who I can tag) > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where other spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat >
[jira] [Created] (SPARK-42946) Sensitive data could still be exposed by variable substitution
Kent Yao created SPARK-42946: Summary: Sensitive data could still be exposed by variable substitution Key: SPARK-42946 URL: https://issues.apache.org/jira/browse/SPARK-42946 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.2, 3.4.0 Reporter: Kent Yao Case 1 by SET syntax's key part {code:java} Time taken: 0.017 seconds, Fetched 1 row(s) spark-sql> set ${spark.ssl.keyPassword} > ; abc {code} Case 2 by SELECT as String lit {code:java} spark-sql> set spark.ssl.keyPassword; spark.ssl.keyPassword *(redacted) Time taken: 0.009 seconds, Fetched 1 row(s) spark-sql> select '${spark.ssl.keyPassword}' > ; abc {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
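Both cases reduce to the same gap: redaction is applied when a config value is *displayed* (the `SET spark.ssl.keyPassword` output above shows `*(redacted)`), but `${...}` variable substitution resolves against the raw config before any redaction applies, so the secret lands in the query text. The sketch below is a simplified model of that ordering, not Spark's implementation; the regex and function names are illustrative.

```python
# Simplified model of SPARK-42946 (not Spark's code): display-time redaction
# vs. substitution-time raw lookup.
import re

REDACTION = re.compile(r"(?i)password|secret|token|key")
conf = {"spark.ssl.keyPassword": "abc"}

def show_conf(key: str) -> str:
    # The SET display path: sensitive keys are masked before printing.
    return "*(redacted)" if REDACTION.search(key) else conf[key]

def substitute(sql: str) -> str:
    # The ${var} path: substitution reads the raw config value.
    return re.sub(r"\$\{([^}]+)\}", lambda m: conf[m.group(1)], sql)

assert show_conf("spark.ssl.keyPassword") == "*(redacted)"                # safe
assert substitute("select '${spark.ssl.keyPassword}'") == "select 'abc'"  # leak
```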
[jira] [Assigned] (SPARK-42928) Make resolvePersistentFunction synchronized
[ https://issues.apache.org/jira/browse/SPARK-42928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42928: --- Assignee: Allison Wang > Make resolvePersistentFunction synchronized > --- > > Key: SPARK-42928 > URL: https://issues.apache.org/jira/browse/SPARK-42928 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Make resolvePersistentFunction synchronized
[jira] [Resolved] (SPARK-42928) Make resolvePersistentFunction synchronized
[ https://issues.apache.org/jira/browse/SPARK-42928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42928. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40557 [https://github.com/apache/spark/pull/40557] > Make resolvePersistentFunction synchronized > --- > > Key: SPARK-42928 > URL: https://issues.apache.org/jira/browse/SPARK-42928 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > Make resolvePersistentFunction synchronized -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
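The motivation for synchronizing persistent-function resolution can be sketched generically: without mutual exclusion, two threads resolving the same unregistered function can both observe it missing and both perform the expensive load-and-register step. The example below is an illustrative model in Python, not Spark's `SessionCatalog` code; the class and field names are hypothetical. With the lock, check-then-register is atomic, so the load runs exactly once.

```python
# Illustrative model of synchronized function resolution (not Spark's code).
import threading

class FunctionRegistry:
    def __init__(self):
        self._funcs = {}
        self._lock = threading.Lock()
        self.load_count = 0  # how many times the "catalog lookup" ran

    def resolve(self, name):
        with self._lock:  # the "synchronized" part: check-then-register is atomic
            if name not in self._funcs:
                self.load_count += 1          # stands in for the expensive load
                self._funcs[name] = lambda x: x
            return self._funcs[name]

reg = FunctionRegistry()
threads = [threading.Thread(target=reg.resolve, args=("my_udf",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert reg.load_count == 1  # loaded exactly once despite 8 concurrent callers
```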
[jira] [Resolved] (SPARK-42936) Unresolved having at the end of analysis when using with LCA with the having clause that can be resolved directly by its child Aggregate
[ https://issues.apache.org/jira/browse/SPARK-42936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42936. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40558 [https://github.com/apache/spark/pull/40558] > Unresolved having at the end of analysis when using with LCA with the having > clause that can be resolved directly by its child Aggregate > > > Key: SPARK-42936 > URL: https://issues.apache.org/jira/browse/SPARK-42936 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Xinyi Yu >Assignee: Xinyi Yu >Priority: Major > Fix For: 3.4.0 > > > {code:java} > select sum(value1) as total_1, total_1 > from values(1, 'name', 100, 50) AS data(id, name, value1, value2) > having total_1 > 0 > SparkException: [INTERNAL_ERROR] Found the unresolved operator: > 'UnresolvedHaving (total_1#353L > cast(0 as bigint)) {code} > To trigger the issue, the having condition need to be (can be resolved by) an > attribute in the select. > Without the LCA {{{}total_1{}}}, the query works fine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42936) Unresolved having at the end of analysis when using with LCA with the having clause that can be resolved directly by its child Aggregate
[ https://issues.apache.org/jira/browse/SPARK-42936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-42936:
-----------------------------------
    Assignee: Xinyi Yu

> Unresolved having at the end of analysis when using with LCA with the having
> clause that can be resolved directly by its child Aggregate
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-42936
>                 URL: https://issues.apache.org/jira/browse/SPARK-42936
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Xinyi Yu
>            Assignee: Xinyi Yu
>            Priority: Major
>
> {code:java}
> select sum(value1) as total_1, total_1
> from values(1, 'name', 100, 50) AS data(id, name, value1, value2)
> having total_1 > 0
>
> SparkException: [INTERNAL_ERROR] Found the unresolved operator:
> 'UnresolvedHaving (total_1#353L > cast(0 as bigint)) {code}
> To trigger the issue, the having condition needs to be (can be resolved by)
> an attribute in the select.
> Without the LCA {{total_1}}, the query works fine.
[jira] [Updated] (SPARK-42945) Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-42945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-42945:
---------------------------------
    Summary: Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect  (was: Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect)

> Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect
> -------------------------------------------------------
>
>                 Key: SPARK-42945
>                 URL: https://issues.apache.org/jira/browse/SPARK-42945
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark
> Connect.
[jira] [Created] (SPARK-42945) Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect
Allison Wang created SPARK-42945:
------------------------------------

             Summary: Make PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect
                 Key: SPARK-42945
                 URL: https://issues.apache.org/jira/browse/SPARK-42945
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Allison Wang


Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark Connect.
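For context, a minimal sketch of toggling this behavior from PySpark. It assumes an active SparkSession bound to `spark`, and that the internal `PYSPARK_JVM_STACKTRACE_ENABLED` constant refers to the SQL conf `spark.sql.pyspark.jvmStacktrace.enabled`; treat that mapping as an assumption, not something stated in the ticket:

```python
# Sketch only: assumes an active SparkSession bound to `spark`.
# The conf below controls whether the JVM stack trace is attached to
# Python-facing exceptions; this ticket asks for the same behavior when
# `spark` is a Spark Connect session.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")
```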
[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705862#comment-17705862 ]

Yang Jie commented on SPARK-42382:
----------------------------------

OK

> Upgrade `cyclonedx-maven-plugin` to 2.7.5
> -----------------------------------------
>
>                 Key: SPARK-42382
>                 URL: https://issues.apache.org/jira/browse/SPARK-42382
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4]
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5]
[jira] [Updated] (SPARK-42895) ValueError when invoking any session operations on a stopped Spark session
[ https://issues.apache.org/jira/browse/SPARK-42895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-42895:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> ValueError when invoking any session operations on a stopped Spark session
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42895
>                 URL: https://issues.apache.org/jira/browse/SPARK-42895
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Priority: Major
>
> If a remote Spark session is stopped, trying to invoke any session operation
> will result in a ValueError. For example:
>
> {code:java}
> spark.stop()
> spark.sql("select 1")
>
> ValueError: Cannot invoke RPC: Channel closed!
>
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> ...
>     return e.code() == grpc.StatusCode.UNAVAILABLE
> AttributeError: 'ValueError' object has no attribute 'code'{code}
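The secondary AttributeError in the traceback comes from error-classification logic that assumes every exception is a gRPC error exposing a `.code()` method. The helper below is a hypothetical, dependency-free sketch of a guard that avoids that assumption; `is_retryable_unavailable` and `UNAVAILABLE_NAME` are invented names, not Spark Connect's actual code (`UNAVAILABLE_NAME` stands in for `grpc.StatusCode.UNAVAILABLE` so the sketch needs no grpc dependency):

```python
# Hypothetical guard, not Spark Connect's actual implementation: decide
# whether an exception is a retryable gRPC UNAVAILABLE error without
# assuming it has a .code() method (a plain ValueError does not).
UNAVAILABLE_NAME = "UNAVAILABLE"  # stand-in for grpc.StatusCode.UNAVAILABLE

def is_retryable_unavailable(e: Exception) -> bool:
    code = getattr(e, "code", None)  # None for ValueError -> not retryable
    if not callable(code):
        return False
    status = code()
    return getattr(status, "name", None) == UNAVAILABLE_NAME

# The error from the report is now classified instead of raising
# a secondary AttributeError:
print(is_retryable_unavailable(ValueError("Cannot invoke RPC: Channel closed!")))  # False
```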