[jira] [Reopened] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[ https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar reopened SPARK-47986:
--------------------------------------

This issue was not fully resolved by the previous pull request. A follow-up fix is here: https://github.com/apache/spark/pull/46435

> [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-47986
> URL: https://issues.apache.org/jira/browse/SPARK-47986
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.5.0, 3.5.1
> Reporter: Niranjan Jayakar
> Assignee: Niranjan Jayakar
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> When the server closes a session, usually after a cluster restart, the client is unaware of this until it receives an error.
> Once it does so, there is no way for the client to create a new session, since the stale sessions are still recorded as the default and active sessions.
> The only solution currently is to restart the Python interpreter on the client, or to reach into the session builder and change the active or default session.
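For illustration, a minimal PySpark sketch of the failure mode and the workaround described above. The endpoint URL is a placeholder, and the exact builder behaviour depends on the Spark and Spark Connect versions in use; this is a sketch, not the actual fix.

{code:python}
# Hedged sketch only: "sc://localhost:15002" is a placeholder endpoint and the
# exact behaviour depends on the Spark / Spark Connect version in use.
from pyspark.sql import SparkSession

url = "sc://localhost:15002"

# getOrCreate() returns the cached default session. If the server has already
# closed that session (e.g. after a cluster restart), operations on it fail,
# and calling getOrCreate() again still hands back the same stale object.
spark = SparkSession.builder.remote(url).getOrCreate()

try:
    spark.range(1).collect()
except Exception:
    # Workaround sketch: ask the builder for a brand-new session rather than
    # reusing the cached one. This ticket tracks making that reliable when the
    # stale session is still recorded as the default/active session.
    spark = SparkSession.builder.remote(url).create()

spark.range(1).collect()
{code}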
[jira] [Updated] (SPARK-47544) [Pyspark] SparkSession builder method is incompatible with vs code intellisense
[ https://issues.apache.org/jira/browse/SPARK-47544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar updated SPARK-47544:
-------------------------------------
    Attachment: old.mov

> [Pyspark] SparkSession builder method is incompatible with vs code intellisense
> --------------------------------------------------------------------------------
>
> Key: SPARK-47544
> URL: https://issues.apache.org/jira/browse/SPARK-47544
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Niranjan Jayakar
> Priority: Major
> Attachments: old.mov
>
> VS Code's IntelliSense is unable to recognize the methods under `SparkSession.builder`.
>
> See the attachment.
[jira] [Created] (SPARK-47544) [Pyspark] SparkSession builder method is incompatible with vs code intellisense
Niranjan Jayakar created SPARK-47544:
-------------------------------------

Summary: [Pyspark] SparkSession builder method is incompatible with vs code intellisense
Key: SPARK-47544
URL: https://issues.apache.org/jira/browse/SPARK-47544
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 4.0.0
Reporter: Niranjan Jayakar

VS Code's IntelliSense is unable to recognize the methods under `SparkSession.builder`.

See the attachment.
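A hedged illustration of the kind of descriptor pattern that commonly trips up static analysis here; the `classproperty` below is a simplified stand-in for illustration, not PySpark's actual implementation.

{code:python}
# Simplified stand-in showing why a static analyzer may not see Builder methods.
class classproperty:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, owner):
        # Without precise typing on the descriptor, tools like Pylance infer
        # the attribute's type as the descriptor itself (or Any), so
        # completions for Builder methods such as appName() are not offered.
        return self.fget(owner)


class Builder:
    def appName(self, name: str) -> "Builder":
        return self


class SparkSession:
    @classproperty
    def builder(cls) -> Builder:
        return Builder()


# IntelliSense may fail to suggest .appName(...) here, even though it works:
SparkSession.builder.appName("demo")
{code}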
[jira] [Created] (SPARK-46265) New assertions in AddArtifact RPC make the connect client incompatible with older clusters
Niranjan Jayakar created SPARK-46265:
-------------------------------------

Summary: New assertions in AddArtifact RPC make the connect client incompatible with older clusters
Key: SPARK-46265
URL: https://issues.apache.org/jira/browse/SPARK-46265
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Niranjan Jayakar

We added new assertions to the AddArtifact RPC: [https://github.com/apache/spark/commit/d9c5f9d6#diff-d4744c7abd099c57d04746140aba3c20b93f1ac011f5915f963e0a3e0758690e]

As part of this change, the RPC implementation was also updated to return the session id in its response. Since the assertion depends on the session id being present, the protocol becomes incompatible: newer Connect clients that apply this assertion cannot talk to older clusters that lack the corresponding server-side changes.
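A sketch of what a backward-compatible check could look like; the function and field names below are illustrative, not the actual client code.

{code:python}
# Illustrative sketch of a lenient response check; names are hypothetical.
def validate_add_artifacts_response(response_session_id: str, client_session_id: str) -> None:
    # Older servers predate the session_id field and return an empty value,
    # so an unconditional assertion breaks compatibility with them. Only
    # compare when the server actually populated the field.
    if response_session_id and response_session_id != client_session_id:
        raise RuntimeError(
            f"Server returned session id {response_session_id!r}, "
            f"expected {client_session_id!r}"
        )
{code}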
[jira] [Created] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
Niranjan Jayakar created SPARK-46074:
-------------------------------------

Summary: [CONNECT][SCALA] Insufficient details in error when a UDF fails
Key: SPARK-46074
URL: https://issues.apache.org/jira/browse/SPARK-46074
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar

Currently, when a UDF fails, the connect client does not receive the actual error that caused the failure.

As an example, the error message looks like:

{code:java}
Exception in thread "main" org.apache.spark.SparkException: grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). SQLSTATE: 39000
{code}

In this case, the actual error was a {{java.lang.NoClassDefFoundError}}.
[jira] [Created] (SPARK-44816) Cryptic error message when UDF associated class is not found
Niranjan Jayakar created SPARK-44816:
-------------------------------------

Summary: Cryptic error message when UDF associated class is not found
Key: SPARK-44816
URL: https://issues.apache.org/jira/browse/SPARK-44816
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar

When a Dataset API is used that either requires or is modeled as a UDF, the class defining the UDF/function must first be uploaded to the service using the `addArtifact()` API.

When this is not done, an error is thrown. However, the error message is cryptic and does not make the problem clear.

Improve this error message to make it clear that an expected class was not found.
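A hedged PySpark example of the expected flow. The ticket refers to the Scala `addArtifact()` API; the Python counterpart used below is `SparkSession.addArtifacts()`, and the endpoint, module name, and UDF are placeholders for illustration.

{code:python}
# Hedged sketch: the endpoint and "my_udfs.py" are placeholders for a module
# that defines functions the UDF needs on the executors.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Upload the module the UDF depends on before running it. Skipping this step
# is what triggers the cryptic "class not found"-style error this ticket asks
# to improve.
spark.addArtifacts("my_udfs.py", pyfile=True)

import my_udfs  # hypothetical module shipped above

plus_one = udf(my_udfs.plus_one, "long")
spark.range(3).select(plus_one("id")).show()
{code}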
[jira] [Created] (SPARK-44291) [CONNECT][SCALA] range query returns incorrect schema
Niranjan Jayakar created SPARK-44291:
-------------------------------------

Summary: [CONNECT][SCALA] range query returns incorrect schema
Key: SPARK-44291
URL: https://issues.apache.org/jira/browse/SPARK-44291
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.4.1
Reporter: Niranjan Jayakar

The following code on Spark Connect produces the following output.

Code:
{code:java}
val df = spark.range(3)
df.show()
df.printSchema()
{code}

Output:
{code:java}
+---+
| id|
+---+
|  0|
|  1|
|  2|
+---+

root
 |-- value: long (nullable = true)
{code}

The mismatch is that one shows the column as "id" while the other shows it as "value".
[jira] [Created] (SPARK-43457) [PYTHON][CONNECT] user agent should include the OS and Python versions
Niranjan Jayakar created SPARK-43457:
-------------------------------------

Summary: [PYTHON][CONNECT] user agent should include the OS and Python versions
Key: SPARK-43457
URL: https://issues.apache.org/jira/browse/SPARK-43457
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar

Including the OS and Python versions in the user agent improves tracking of how Spark Connect is used across Python versions and the different platforms it is used from.
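An illustrative sketch of folding platform information into a user agent string; the exact format the client ends up sending is defined by the implementation, not by this example.

{code:python}
# Illustrative only; the base value and format are examples.
import platform


def build_user_agent(base: str = "_SPARK_CONNECT_PYTHON") -> str:
    # Append OS and Python version so server-side telemetry can break down
    # usage by platform and interpreter version.
    return (
        f"{base} os/{platform.system().lower()} "
        f"python/{platform.python_version()}"
    )


print(build_user_agent())  # e.g. "_SPARK_CONNECT_PYTHON os/linux python/3.11.4"
{code}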
[jira] [Created] (SPARK-43456) [SCALA][CONNECT] user agent should include the OS and Python versions
Niranjan Jayakar created SPARK-43456:
-------------------------------------

Summary: [SCALA][CONNECT] user agent should include the OS and Python versions
Key: SPARK-43456
URL: https://issues.apache.org/jira/browse/SPARK-43456
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar

Including the OS and Python versions in the user agent improves tracking of how Spark Connect is used across Python versions and the different platforms it is used from.
[jira] [Created] (SPARK-43192) Spark connect's user agent validations are too restrictive
Niranjan Jayakar created SPARK-43192:
-------------------------------------

Summary: Spark connect's user agent validations are too restrictive
Key: SPARK-43192
URL: https://issues.apache.org/jira/browse/SPARK-43192
Project: Spark
Issue Type: Bug
Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar

The current restrictions on the allowed character set and length are too restrictive:

https://github.com/apache/spark/blob/cac6f58318bb84d532f02d245a50d3c66daa3e4b/python/pyspark/sql/connect/client.py#L274-L275
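A sketch of what a more permissive check might look like; the character class and length cap below are illustrative values, not the limits the client actually enforces.

{code:python}
# Illustrative relaxation; the 2048-char cap and printable-ASCII rule below
# are example values, not the client's real limits.
import re

MAX_USER_AGENT_LEN = 2048
USER_AGENT_PATTERN = re.compile(r"^[\x20-\x7e]+$")  # any printable ASCII


def check_user_agent(user_agent: str) -> str:
    if len(user_agent) > MAX_USER_AGENT_LEN:
        raise ValueError(f"user_agent exceeds {MAX_USER_AGENT_LEN} characters")
    if not USER_AGENT_PATTERN.match(user_agent):
        raise ValueError("user_agent must be printable ASCII")
    return user_agent
{code}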
[jira] [Created] (SPARK-43172) Expose host and bearer tokens from the spark connect client
Niranjan Jayakar created SPARK-43172:
-------------------------------------

Summary: Expose host and bearer tokens from the spark connect client
Key: SPARK-43172
URL: https://issues.apache.org/jira/browse/SPARK-43172
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar

The `SparkConnectClient` class takes in a connection string to connect with the Spark Connect service. As part of setting up the connection, it parses the connection string.

Expose the parsed host and bearer token as part of the class, so they may be accessed by consumers.
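A sketch of the requested accessors; the class and attribute names here are illustrative rather than the actual `SparkConnectClient` internals, and the parsing is simplified.

{code:python}
# Illustrative sketch; the real client parses and exposes this differently.
from typing import Optional


class ConnectionInfo:
    """Parses connection strings of the form sc://host:port/;token=...;use_ssl=true."""

    def __init__(self, connection_string: str):
        rest = connection_string.removeprefix("sc://")
        netloc, _, param_str = rest.partition("/;")
        self._host = netloc.split(":")[0]
        params = dict(
            kv.split("=", 1) for kv in param_str.split(";") if "=" in kv
        )
        self._token = params.get("token")

    @property
    def host(self) -> str:
        return self._host

    @property
    def token(self) -> Optional[str]:
        return self._token


info = ConnectionInfo("sc://example.com:15002/;token=abc123")
print(info.host, info.token)  # example.com abc123
{code}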
[jira] [Updated] (SPARK-42502) scala: accept user_agent in spark connect's connection string
[ https://issues.apache.org/jira/browse/SPARK-42502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar updated SPARK-42502:
-------------------------------------
    Description:
Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.

This is already done for the Python client: https://github.com/apache/spark/commit/b887d3de954ae5b2482087fe08affcc4ac60c669

  was:
Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.

> scala: accept user_agent in spark connect's connection string
> --------------------------------------------------------------
>
> Key: SPARK-42502
> URL: https://issues.apache.org/jira/browse/SPARK-42502
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.3.2
> Reporter: Niranjan Jayakar
> Assignee: Niranjan Jayakar
> Priority: Major
> Fix For: 3.4.0
>
> Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.
> Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.
> This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.
> This is already done for the Python client: https://github.com/apache/spark/commit/b887d3de954ae5b2482087fe08affcc4ac60c669
[jira] [Created] (SPARK-42502) scala: accept user_agent in spark connect's connection string
Niranjan Jayakar created SPARK-42502:
-------------------------------------

Summary: scala: accept user_agent in spark connect's connection string
Key: SPARK-42502
URL: https://issues.apache.org/jira/browse/SPARK-42502
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar
Assignee: Niranjan Jayakar
Fix For: 3.4.0

Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.
[jira] [Updated] (SPARK-42477) python: accept user_agent in spark connect's connection string
[ https://issues.apache.org/jira/browse/SPARK-42477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar updated SPARK-42477:
-------------------------------------
    Summary: python: accept user_agent in spark connect's connection string  (was: accept user_agent in spark connect's connection string)

> python: accept user_agent in spark connect's connection string
> ---------------------------------------------------------------
>
> Key: SPARK-42477
> URL: https://issues.apache.org/jira/browse/SPARK-42477
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.3.2
> Reporter: Niranjan Jayakar
> Assignee: Niranjan Jayakar
> Priority: Major
> Fix For: 3.4.0
>
> Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.
> Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.
> This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.
[jira] [Resolved] (SPARK-42498) reduce spark connect service retry time
[ https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar resolved SPARK-42498.
--------------------------------------
    Resolution: Abandoned

> reduce spark connect service retry time
> ----------------------------------------
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.3.2
> Reporter: Niranjan Jayakar
> Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>
> Currently, 15 retries with the current backoff strategy result in the client sitting in the retry loop for ~400 seconds in the worst case. This means applications and users using the Spark Connect client will hang for more than 6 minutes with no response.
[jira] [Updated] (SPARK-42498) reduce spark connect service retry time
[ https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar updated SPARK-42498:
-------------------------------------
    Summary: reduce spark connect service retry time  (was: make spark connect retries configurable)

> reduce spark connect service retry time
> ----------------------------------------
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.3.2
> Reporter: Niranjan Jayakar
> Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>
> Currently, 15 retries with the current backoff strategy result in the client sitting in the retry loop for ~400 seconds in the worst case. This means applications and users using the Spark Connect client will hang for more than 6 minutes with no response.
[jira] [Updated] (SPARK-42498) make spark connect retries configurable
[ https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niranjan Jayakar updated SPARK-42498:
-------------------------------------
    Summary: make spark connect retries configurable  (was: reduce spark connect service retry time)

> make spark connect retries configurable
> ----------------------------------------
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.3.2
> Reporter: Niranjan Jayakar
> Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>
> Currently, 15 retries with the current backoff strategy result in the client sitting in the retry loop for ~400 seconds in the worst case. This means applications and users using the Spark Connect client will hang for more than 6 minutes with no response.
[jira] [Created] (SPARK-42498) reduce spark connect service retry time
Niranjan Jayakar created SPARK-42498:
-------------------------------------

Summary: reduce spark connect service retry time
Key: SPARK-42498
URL: https://issues.apache.org/jira/browse/SPARK-42498
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar

https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411

Currently, 15 retries with the current backoff strategy result in the client sitting in the retry loop for ~400 seconds in the worst case. This means applications and users using the Spark Connect client will hang for more than 6 minutes with no response.
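A back-of-the-envelope sketch of how the worst-case time in the retry loop adds up under an exponential backoff policy; the policy values below are placeholders, not the client's actual defaults.

{code:python}
# The defaults below are illustrative placeholders, not the real retry policy.
def worst_case_wait_seconds(
    max_retries: int = 15,
    initial_backoff_ms: float = 50.0,
    backoff_multiplier: float = 4.0,
    max_backoff_ms: float = 30_000.0,
) -> float:
    total_ms = 0.0
    backoff = initial_backoff_ms
    for _ in range(max_retries):
        total_ms += min(backoff, max_backoff_ms)
        backoff *= backoff_multiplier
    return total_ms / 1000.0


# With the placeholder values above this lands in the hundreds of seconds,
# which is the order of magnitude the ticket describes (~400 s).
print(worst_case_wait_seconds())
{code}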
[jira] [Created] (SPARK-42477) accept user_agent in spark connect's connection string
Niranjan Jayakar created SPARK-42477:
-------------------------------------

Summary: accept user_agent in spark connect's connection string
Key: SPARK-42477
URL: https://issues.apache.org/jira/browse/SPARK-42477
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar

Currently, the Spark Connect service's {{client_type}} attribute (which is really the user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user agent, which in turn allows visibility into and measurement of integrations and usage of Spark Connect.
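A hedged example of what passing a user agent through the connection string looks like from the Python client once this is supported; the endpoint and application name are placeholders.

{code:python}
# "sc://localhost:15002" and "my_partner_app" are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://localhost:15002/;user_agent=my_partner_app")
    .getOrCreate()
)
{code}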
[jira] [Created] (SPARK-42106) [Pyspark] Hide parameters when re-printing user provided remote URL in REPL
Niranjan Jayakar created SPARK-42106:
-------------------------------------

Summary: [Pyspark] Hide parameters when re-printing user provided remote URL in REPL
Key: SPARK-42106
URL: https://issues.apache.org/jira/browse/SPARK-42106
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar

The Spark Connect client is initialized in the PySpark REPL by using the {{--remote}} option. The option takes a Spark Connect endpoint URL.

The URL may contain auth tokens as URL parameters or query parameters. Hide these values when the URL is re-printed as part of the REPL start-up.
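A sketch of redacting parameter values before echoing a `--remote` URL; it masks every parameter value, which may be broader than what the actual implementation does, and the sample URL is a placeholder.

{code:python}
# Illustrative sketch: masks all ";key=value" parameters in an sc:// URL.
def redact_remote_url(url: str) -> str:
    base, sep, param_str = url.partition("/;")
    if not sep:
        return url
    redacted = []
    for kv in param_str.split(";"):
        key, eq, _ = kv.partition("=")
        redacted.append(f"{key}=*****" if eq else kv)
    return base + sep + ";".join(redacted)


print(redact_remote_url("sc://host:15002/;token=secret;user_id=alice"))
# sc://host:15002/;token=*****;user_id=*****
{code}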