[jira] [Updated] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45274:
---
Labels: pull-request-available  (was: )

> Implementation of a new DAG drawing approach to avoid fork
> ---
>
> Key: SPARK-45274
> URL: https://issues.apache.org/jira/browse/SPARK-45274
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork

2023-09-21 Thread Kent Yao (Jira)
Kent Yao created SPARK-45274:


 Summary: Implementation of a new DAG drawing approach to avoid fork
 Key: SPARK-45274
 URL: https://issues.apache.org/jira/browse/SPARK-45274
 Project: Spark
  Issue Type: Improvement
  Components: UI
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Assigned] (SPARK-45270) Upgrade `Volcano` to 1.8.0

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45270:
-

Assignee: Dongjoon Hyun

> Upgrade `Volcano` to 1.8.0
> --
>
> Key: SPARK-45270
> URL: https://issues.apache.org/jira/browse/SPARK-45270
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45270) Upgrade `Volcano` to 1.8.0

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45270.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43050
[https://github.com/apache/spark/pull/43050]

> Upgrade `Volcano` to 1.8.0
> --
>
> Key: SPARK-45270
> URL: https://issues.apache.org/jira/browse/SPARK-45270
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45269) Use Java 21-jre in K8s Dockerfile

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45269.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43048
[https://github.com/apache/spark/pull/43048]

> Use Java 21-jre in K8s Dockerfile
> -
>
> Key: SPARK-45269
> URL: https://issues.apache.org/jira/browse/SPARK-45269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45269) Use Java 21-jre in K8s Dockerfile

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45269:
-

Assignee: Dongjoon Hyun

> Use Java 21-jre in K8s Dockerfile
> -
>
> Key: SPARK-45269
> URL: https://issues.apache.org/jira/browse/SPARK-45269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45273) Http header Attack【HttpSecurityFilter】

2023-09-21 Thread chenyu (Jira)
chenyu created SPARK-45273:
--

 Summary: Http header Attack【HttpSecurityFilter】
 Key: SPARK-45273
 URL: https://issues.apache.org/jira/browse/SPARK-45273
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: chenyu


There is an HTTP Host header attack vulnerability in the target URL.
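For background, a generic sketch of the usual mitigation for this class of vulnerability: validate the incoming Host header against an explicit allowlist and reject anything else. This is plain illustrative Python, not Spark's actual HttpSecurityFilter, and the host names are hypothetical.

{code:python}
# Generic Host-header validation sketch; the concrete fix in Spark's
# HttpSecurityFilter may look different.
ALLOWED_HOSTS = {"spark-master.example.com", "localhost"}  # hypothetical

def is_trusted_host(host_header: str) -> bool:
    # Drop any port component before comparing against the allowlist.
    host = host_header.split(":", 1)[0].strip().lower()
    return host in ALLOWED_HOSTS

def check_request_host(host_header: str) -> None:
    # A real filter would typically answer with 400/403 instead of raising.
    if not is_trusted_host(host_header):
        raise ValueError("Untrusted Host header: " + host_header)
{code}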






[jira] [Updated] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43655:
---
Labels: pull-request-available  (was: )

> Enable NamespaceParityTests.test_get_index_map
> --
>
> Key: SPARK-43655
> URL: https://issues.apache.org/jira/browse/SPARK-43655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Enable NamespaceParityTests.test_get_index_map






[jira] [Updated] (SPARK-43877) Fix behavior difference for compare binary functions.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43877:

Epic Link: SPARK-39375

> Fix behavior difference for compare binary functions.
> -
>
> Key: SPARK-43877
> URL: https://issues.apache.org/jira/browse/SPARK-43877
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> In [https://github.com/apache/spark/pull/41362], we added `result = 
> result.fillna(False)` to fill the gap between pandas and the pandas API on 
> Spark, but it should be fixed internally on the Spark Connect side. Please 
> refer to the reproducible code below:
>  
> {code:java}
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.sql.utils import pyspark_column_op
> pser = pd.Series([None, None, None])
> psser = ps.from_pandas(pser)
> pyspark_column_op("__ge__")(psser, psser)
> # Wrong result:
> #  0    None
> #  1    None
> #  2    None
> #  dtype: object
> # Expected result:
> pser > pser
> #  0    False
> #  1    False
> #  2    False
> #  dtype: bool{code}
>  
>  
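For reference, a minimal sketch of the interim workaround referenced above, applied to the same repro; per the issue, the proper fix should instead land on the Spark Connect side.

{code:python}
import pandas as pd
import pyspark.pandas as ps
from pyspark.sql.utils import pyspark_column_op

psser = ps.from_pandas(pd.Series([None, None, None]))

# Interim workaround from the PR above: fill the spurious nulls so the result
# matches the expected pandas output shown in the repro.
result = pyspark_column_op("__ge__")(psser, psser)
result = result.fillna(False)
{code}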






[jira] [Updated] (SPARK-43877) Fix behavior difference for compare binary functions.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43877:

Parent: (was: SPARK-42497)
Issue Type: Improvement  (was: Sub-task)

> Fix behavior difference for compare binary functions.
> -
>
> Key: SPARK-43877
> URL: https://issues.apache.org/jira/browse/SPARK-43877
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> In [https://github.com/apache/spark/pull/41362], we added `result = 
> result.fillna(False)` to fill the gap between pandas and the pandas API on 
> Spark, but it should be fixed internally on the Spark Connect side. Please 
> refer to the reproducible code below:
>  
> {code:java}
> import pandas as pd
> import pyspark.pandas as ps
> from pyspark.sql.utils import pyspark_column_op
> pser = pd.Series([None, None, None])
> psser = ps.from_pandas(pser)
> pyspark_column_op("__ge__")(psser, psser)
> # Wrong result:
> #  0    None
> #  1    None
> #  2    None
> #  dtype: object
> # Expected result:
> pser > pser
> #  0    False
> #  1    False
> #  2    False
> #  dtype: bool{code}
>  
>  






[jira] [Updated] (SPARK-45209) Flame Graph Support For Executor Thread Dump Page

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45209:
---
Labels: pull-request-available  (was: )

> Flame Graph Support For Executor Thread Dump Page
> -
>
> Key: SPARK-45209
> URL: https://issues.apache.org/jira/browse/SPARK-45209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Comment Edited] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck

2023-09-21 Thread Bo Xiong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843
 ] 

Bo Xiong edited comment on SPARK-45227 at 9/22/23 6:12 AM:
---

I've submitted [a fix|https://github.com/apache/spark/pull/43021].  Please help 
get it merged.

If possible, please also help patch v3.3.1 and above.  Thanks!


was (Author: JIRAUSER302302):
I've submitted a fix.  Please help get it merged.

If possible, please also help patch v3.3.1 and above.  Thanks!

> Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an 
> executor process randomly gets stuck
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, pull-request-available, 
> race-condition, stuck, threadsafe
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote} * Another task that's sent to the executor but didn't get launched 
> since the single-threaded dispatcher was stuck (presumably in an "infinite 
> loop" as explained later).
> {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}* Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spar

[jira] [Comment Edited] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck

2023-09-21 Thread Bo Xiong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843
 ] 

Bo Xiong edited comment on SPARK-45227 at 9/22/23 6:11 AM:
---

I've submitted a fix.  Please help get it merged.

If possible, please also help patch v3.3.1 and above.  Thanks!


was (Author: JIRAUSER302302):
I've submitted a fix.  Please help get it merged.

If possible, please also help patch v3.3.1 and above.

 

Thanks,

Bo

> Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an 
> executor process randomly gets stuck
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, pull-request-available, 
> race-condition, stuck, threadsafe
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote} * Another task that's sent to the executor but didn't get launched 
> since the single-threaded dispatcher was stuck (presumably in an "infinite 
> loop" as explained later).
> {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}* Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/193082670

[jira] [Commented] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck

2023-09-21 Thread Bo Xiong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767843#comment-17767843
 ] 

Bo Xiong commented on SPARK-45227:
--

I've submitted a fix.  Please help get it merged.

If possible, please also help patch v3.3.1 and above.

 

Thanks,

Bo

> Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an 
> executor process randomly gets stuck
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, pull-request-available, 
> race-condition, stuck, threadsafe
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote} * Another task that's sent to the executor but didn't get launched 
> since the single-threaded dispatcher was stuck (presumably in an "infinite 
> loop" as explained later).
> {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}* Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> org.apache.spark.rpc.net

[jira] [Resolved] (SPARK-43623) Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43623.
-
Resolution: Duplicate

> Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup.
> ---
>
> Key: SPARK-43623
> URL: https://issues.apache.org/jira/browse/SPARK-43623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup.






[jira] [Updated] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck

2023-09-21 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Summary: Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend 
where an executor process randomly gets stuck  (was: Fix an issue where an 
executor process randomly gets stuck, by making 
CoarseGrainedExecutorBackend.taskResources thread-safe)

> Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an 
> executor process randomly gets stuck
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, pull-request-available, 
> race-condition, stuck, threadsafe
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote} * Another task that's sent to the executor but didn't get launched 
> since the single-threaded dispatcher was stuck (presumably in an "infinite 
> loop" as explained later).
> {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}* Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apa

[jira] [Updated] (SPARK-45272) Remove Scala version specific comments, and scala-2.13 profile usage

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45272:
---
Labels: pull-request-available  (was: )

> Remove Scala version specific comments, and scala-2.13 profile usage
> 
>
> Key: SPARK-45272
> URL: https://issues.apache.org/jira/browse/SPARK-45272
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation, Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-44113 applied some changes directly from 
> {{dev/change-scala-version.sh}}. We should clean them up.






[jira] [Updated] (SPARK-45269) Use Java 21-jre in K8s Dockerfile

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45269:
--
Summary: Use Java 21-jre in K8s Dockerfile  (was: Use 21-jre in K8s 
Dockerfile)

> Use Java 21-jre in K8s Dockerfile
> -
>
> Key: SPARK-45269
> URL: https://issues.apache.org/jira/browse/SPARK-45269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-43159) Refine `column_op` to use lambda function instead of Column API.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-43159.
-
Resolution: Won't Fix

> Refine `column_op` to use lambda function instead of Column API.
> 
>
> Key: SPARK-43159
> URL: https://issues.apache.org/jira/browse/SPARK-43159
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Refining `column_op(Column.__eq__)(left, right)` to use a lambda function such 
> as `column_op(lambda x, y: x.__eq__(y))(left, right)`






[jira] [Created] (SPARK-45272) Remove Scala version specific comments, and scala-2.13 profile usage

2023-09-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45272:


 Summary: Remove Scala version specific comments, and scala-2.13 
profile usage
 Key: SPARK-45272
 URL: https://issues.apache.org/jira/browse/SPARK-45272
 Project: Spark
  Issue Type: Improvement
  Components: Build, Documentation, Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


SPARK-44113 applied some changes directly from {{dev/change-scala-version.sh}}. 
We should clean them up.






[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43711:
---
Labels: pull-request-available  (was: )

> Support `pyspark.ml.feature.Bucketizer` and 
> `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.
> --
>
> Key: SPARK-43711
> URL: https://issues.apache.org/jira/browse/SPARK-43711
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, MLlib
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Repro: run `DataFramePlotParityTests.test_compute_hist_multi_columns` or `
> SeriesPlotMatplotlibParityTests.test_kde_plot`






[jira] [Updated] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors

2023-09-21 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-45271:

Summary: Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some 
unused method in QueryCompilationErrors  (was: Merge _LEGACY_ERROR_TEMP_1113 
into UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in 
QueryCompilationErrors)

> Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused 
> method in QueryCompilationErrors
> 
>
> Key: SPARK-45271
> URL: https://issues.apache.org/jira/browse/SPARK-45271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in QueryCompilationErrors

2023-09-21 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-45271:
---

 Summary: Merge _LEGACY_ERROR_TEMP_1113 into 
UNSUPPORTED_FEATURE.TABLE_OPERATION & delete some unused method in 
QueryCompilationErrors
 Key: SPARK-45271
 URL: https://issues.apache.org/jira/browse/SPARK-45271
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-42965) metadata mismatch for StructField when running some tests.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42965:

Parent: (was: SPARK-42497)
Issue Type: Improvement  (was: Sub-task)

> metadata mismatch for StructField when running some tests.
> --
>
> Key: SPARK-42965
> URL: https://issues.apache.org/jira/browse/SPARK-42965
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> For some reason, the metadata of `StructField` is different in a few tests 
> when using Spark Connect. However, the function works properly.
> For example, when running `python/run-tests --testnames 
> 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops 
> BinaryOpsParityTests.test_add'` it complains `AssertionError: 
> ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), 
> False))], [StructField('bool', LongType(), False)])` because the metadata 
> differs by something like `\{'__autoGeneratedAlias': 'true'}`, but the fields 
> have the same name, type and nullability, so the function itself works fine.
> Therefore, we have temporarily added a branch for Spark Connect in the code 
> so that we can create InternalFrame properly to provide more pandas APIs in 
> Spark Connect. If a clear cause is found, we may need to revert it back to 
> its original state.
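To make the mismatch concrete, here is a small standalone sketch (plain PySpark types, no Connect session needed): `StructField` equality includes the metadata dict, so two fields that agree on name, type and nullability still compare unequal when one of them carries an auto-generated alias.

{code:python}
from pyspark.sql.types import LongType, StructField

# Same name, type and nullability; only the metadata differs, as in the
# `__autoGeneratedAlias` case described above.
with_meta = StructField("bool", LongType(), False,
                        metadata={"__autoGeneratedAlias": "true"})
without_meta = StructField("bool", LongType(), False)

print(with_meta == without_meta)  # False: metadata participates in equality
print((with_meta.name, with_meta.dataType, with_meta.nullable) ==
      (without_meta.name, without_meta.dataType, without_meta.nullable))  # True
{code}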






[jira] [Updated] (SPARK-42965) metadata mismatch for StructField when running some tests.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42965:

Epic Link: SPARK-39375

> metadata mismatch for StructField when running some tests.
> --
>
> Key: SPARK-42965
> URL: https://issues.apache.org/jira/browse/SPARK-42965
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> For some reason, the metadata of `StructField` is different in a few tests 
> when using Spark Connect. However, the function works properly.
> For example, when running `python/run-tests --testnames 
> 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops 
> BinaryOpsParityTests.test_add'` it complains `AssertionError: 
> ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), 
> False))], [StructField('bool', LongType(), False)])` because the metadata 
> differs by something like `\{'__autoGeneratedAlias': 'true'}`, but the fields 
> have the same name, type and nullability, so the function itself works fine.
> Therefore, we have temporarily added a branch for Spark Connect in the code 
> so that we can create InternalFrame properly to provide more pandas APIs in 
> Spark Connect. If a clear cause is found, we may need to revert it back to 
> its original state.






[jira] [Updated] (SPARK-45270) Upgrade `Volcano` to 1.8.0

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45270:
---
Labels: pull-request-available  (was: )

> Upgrade `Volcano` to 1.8.0
> --
>
> Key: SPARK-45270
> URL: https://issues.apache.org/jira/browse/SPARK-45270
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45270) Upgrade `Volcano` to 1.8.0

2023-09-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45270:
-

 Summary: Upgrade `Volcano` to 1.8.0
 Key: SPARK-45270
 URL: https://issues.apache.org/jira/browse/SPARK-45270
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-45269) Use 21-jre in K8s Dockerfile

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45269:
---
Labels: pull-request-available  (was: )

> Use 21-jre in K8s Dockerfile
> 
>
> Key: SPARK-45269
> URL: https://issues.apache.org/jira/browse/SPARK-45269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45269) Use 21-jre in K8s Dockerfile

2023-09-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45269:
-

 Summary: Use 21-jre in K8s Dockerfile
 Key: SPARK-45269
 URL: https://issues.apache.org/jira/browse/SPARK-45269
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-44113) Make Scala 2.13+ as default Scala version

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44113.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43008
[https://github.com/apache/spark/pull/43008]

> Make Scala 2.13+ as default Scala version
> -
>
> Key: SPARK-44113
> URL: https://issues.apache.org/jira/browse/SPARK-44113
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45268) python function categories should be consistent with SQL function groups

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45268:
---
Labels: pull-request-available  (was: )

> python function categories should be consistent with SQL function groups
> 
>
> Key: SPARK-45268
> URL: https://issues.apache.org/jira/browse/SPARK-45268
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45268) python function categories should be consistent with SQL function groups

2023-09-21 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-45268:
-

 Summary: python function categories should be consistent with SQL 
function groups
 Key: SPARK-45268
 URL: https://issues.apache.org/jira/browse/SPARK-45268
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-45251) Add client_type field for FetchErrorDetails

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45251.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43031
[https://github.com/apache/spark/pull/43031]

> Add client_type field for FetchErrorDetails
> ---
>
> Key: SPARK-45251
> URL: https://issues.apache.org/jira/browse/SPARK-45251
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45251) Add client_type field for FetchErrorDetails

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45251:


Assignee: Yihong He

> Add client_type field for FetchErrorDetails
> ---
>
> Key: SPARK-45251
> URL: https://issues.apache.org/jira/browse/SPARK-45251
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-45253) Correct the group of `ShiftLeft` and `ArraySize`

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45253:


Assignee: Ruifeng Zheng

> Correct the group of `ShiftLeft` and `ArraySize`
> 
>
> Key: SPARK-45253
> URL: https://issues.apache.org/jira/browse/SPARK-45253
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45253) Correct the group of `ShiftLeft` and `ArraySize`

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45253.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43033
[https://github.com/apache/spark/pull/43033]

> Correct the group of `ShiftLeft` and `ArraySize`
> 
>
> Key: SPARK-45253
> URL: https://issues.apache.org/jira/browse/SPARK-45253
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43433.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42994
[https://github.com/apache/spark/pull/42994]

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations
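For context, a small pandas sketch of the upstream change being matched, mirroring the linked whatsnew entry: in pandas 2.x, `nth` acts as a filtration, returning the selected rows with their original index rather than a reduction indexed by the group key.

{code:python}
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})

# pandas >= 2.0: returns the matching rows of the original frame, keeping the
# original index and the grouping column.
df.groupby("a").nth(1)
#    a    b
# 1  1  2.0
# 4  2  5.0
{code}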






[jira] [Assigned] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43433:


Assignee: Haejoon Lee

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations






[jira] [Updated] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43433:
---
Labels: pull-request-available  (was: )

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations






[jira] [Commented] (SPARK-44444) Enabled ANSI mode by default

2023-09-21 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767804#comment-17767804
 ] 

Dongjoon Hyun commented on SPARK-44444:
---

+1

> Enabled ANSI mode by default
> 
>
> Key: SPARK-44444
> URL: https://issues.apache.org/jira/browse/SPARK-44444
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> To avoid data issues.






[jira] [Commented] (SPARK-44442) Drop mesos support

2023-09-21 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767803#comment-17767803
 ] 

Dongjoon Hyun commented on SPARK-44442:
---

+1 for the removal. We need a discussion on the dev mailing list as the final 
step.

> Drop mesos support
> --
>
> Key: SPARK-44442
> URL: https://issues.apache.org/jira/browse/SPARK-44442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://spark.apache.org/docs/latest/running-on-mesos.html]
>  
> _Note_: Apache Mesos support is deprecated as of Apache Spark 3.2.0. It 
> will be removed in a future version.






[jira] [Resolved] (SPARK-45093) AddArtifacts should give proper error messages if it fails

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45093.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42949
[https://github.com/apache/spark/pull/42949]

> AddArtifacts should give proper error messages if it fails
> --
>
> Key: SPARK-45093
> URL: https://issues.apache.org/jira/browse/SPARK-45093
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I've been trying to do some testing of UDFs using code in another module, so 
> AddArtifacts is necessary.
>  
> I got the following error:
>  
>  
> {code:java}
> Traceback (most recent call last):
>   File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in 
> 
>     spark.addArtifacts("udf2_support.py", pyfile=True)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py",
>  line 744, in addArtifacts
>     self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, 
> file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py",
>  line 1582, in add_artifacts
>     self._artifact_manager.add_artifacts(*path, pyfile=pyfile, 
> archive=archive, file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 283, in add_artifacts
>     self._request_add_artifacts(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 259, in _request_add_artifacts
>     response: proto.AddArtifactsResponse = self._retrieve_responses(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 256, in _retrieve_responses
>     return self._stub.AddArtifacts(requests, metadata=self._metadata)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 1246, in __call__
>     return _end_unary_response_blocking(state, call, False, None)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 910, in _end_unary_response_blocking
>     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated 
> with:
>         status = StatusCode.UNKNOWN
>         details = "Exception iterating requests!"
>         debug_error_string = "None"
> {code}
>  
> This doesn't give any clue about what actually happened.
> Only after considerable investigation did I find the problem: I was specifying 
> the wrong path and the artifact failed to upload. Specifically, ArtifactManager 
> doesn't read the file immediately, but rather creates an iterator object which 
> incrementally generates the requests to send. This iterator is passed to grpc's 
> stream_unary to consume and actually send, and while grpc catches the error 
> (see above), it suppresses the underlying exception.
> I think we should improve the PySpark user experience. One possible fix is to 
> wrap ArtifactManager._create_requests with an iterator wrapper that logs the 
> throwable to the Spark Connect logger, so that the user would see something 
> like the following, at least when debug mode is on.
>  
> {code:java}
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/Users/alice.sayutina/udf2_support.py' {code}
>  
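A minimal sketch of such an iterator wrapper, for illustration only (the 
function name and logger name are assumptions, not the actual PySpark 
implementation):

{code:python}
import logging

logger = logging.getLogger("pyspark.sql.connect")


def logged_requests(requests):
    """Yield AddArtifacts requests, logging any exception raised while producing them."""
    try:
        for request in requests:
            yield request
    except Exception:
        # Surface the real cause (e.g. FileNotFoundError) in the client log
        # before grpc replaces it with the generic
        # "Exception iterating requests!" error.
        logger.exception("AddArtifacts request generation failed")
        raise
{code}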



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45093) AddArtifacts should give proper error messages if it fails

2023-09-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45093:


Assignee: Alice Sayutina

> AddArtifacts should give proper error messages if it fails
> --
>
> Key: SPARK-45093
> URL: https://issues.apache.org/jira/browse/SPARK-45093
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
>
> I've been trying to do some testing of UDFs using code in another module, so 
> AddArtifact is necessary.
>  
> I got the following error:
>  
>  
> {code:java}
> Traceback (most recent call last):
>   File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in 
> 
>     spark.addArtifacts("udf2_support.py", pyfile=True)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py",
>  line 744, in addArtifacts
>     self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, 
> file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py",
>  line 1582, in add_artifacts
>     self._artifact_manager.add_artifacts(*path, pyfile=pyfile, 
> archive=archive, file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 283, in add_artifacts
>     self._request_add_artifacts(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 259, in _request_add_artifacts
>     response: proto.AddArtifactsResponse = self._retrieve_responses(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 256, in _retrieve_responses
>     return self._stub.AddArtifacts(requests, metadata=self._metadata)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 1246, in __call__
>     return _end_unary_response_blocking(state, call, False, None)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 910, in _end_unary_response_blocking
>     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated 
> with:
>         status = StatusCode.UNKNOWN
>         details = "Exception iterating requests!"
>         debug_error_string = "None"
> {code}
>  
> This doesn't give any clue about what actually happened.
> Only after considerable investigation did I find the problem: I was specifying 
> the wrong path and the artifact failed to upload. Specifically, ArtifactManager 
> doesn't read the file immediately, but rather creates an iterator object which 
> incrementally generates the requests to send. This iterator is passed to grpc's 
> stream_unary to consume and actually send, and while grpc catches the error 
> (see above), it suppresses the underlying exception.
> I think we should improve the PySpark user experience. One possible fix is to 
> wrap ArtifactManager._create_requests with an iterator wrapper that logs the 
> throwable to the Spark Connect logger, so that the user would see something 
> like the following, at least when debug mode is on.
>  
> {code:java}
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/Users/alice.sayutina/udf2_support.py' {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45257) Enable spark.eventLog.compress by default

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45257.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43036
[https://github.com/apache/spark/pull/43036]

> Enable spark.eventLog.compress by default
> -
>
> Key: SPARK-45257
> URL: https://issues.apache.org/jira/browse/SPARK-45257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45257) Enable spark.eventLog.compress by default

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45257:
-

Assignee: Dongjoon Hyun

> Enable spark.eventLog.compress by default
> -
>
> Key: SPARK-45257
> URL: https://issues.apache.org/jira/browse/SPARK-45257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41086) Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE

2023-09-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41086.
-
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 43010
[https://github.com/apache/spark/pull/43010]

> Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE
> --
>
> Key: SPARK-41086
> URL: https://issues.apache.org/jira/browse/SPARK-41086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> SECOND_FUNCTION_ARGUMENT_NOT_INTEGER
> _LEGACY_ERROR_TEMP_1104



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45267) Change the default value for `numeric_only`.

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45267:
---
Labels: pull-request-available  (was: )

> Change the default value for `numeric_only`.
> 
>
> Key: SPARK-45267
> URL: https://issues.apache.org/jira/browse/SPARK-45267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> To follow Pandas 2.0.0 and above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45267) Change the default value for `numeric_only`.

2023-09-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-45267:

Summary: Change the default value for `numeric_only`.  (was: Changed the 
default value for `numeric_only`.)

> Change the default value for `numeric_only`.
> 
>
> Key: SPARK-45267
> URL: https://issues.apache.org/jira/browse/SPARK-45267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> To follow Pandas 2.0.0 and above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45267) Changed the default value for `numeric_only`.

2023-09-21 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-45267:
---

 Summary: Changed the default value for `numeric_only`.
 Key: SPARK-45267
 URL: https://issues.apache.org/jira/browse/SPARK-45267
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


To follow Pandas 2.0.0 and above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45244) Correct spelling in VolcanoTestsSuite

2023-09-21 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You reassigned SPARK-45244:
-

Assignee: Binjie Yang

> Correct spelling in VolcanoTestsSuite
> -
>
> Key: SPARK-45244
> URL: https://issues.apache.org/jira/browse/SPARK-45244
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.5.0
>Reporter: Binjie Yang
>Assignee: Binjie Yang
>Priority: Minor
>  Labels: pull-request-available
>
> Typo in the method name checkAnnotaion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45244) Correct spelling in VolcanoTestsSuite

2023-09-21 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You resolved SPARK-45244.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43026
[https://github.com/apache/spark/pull/43026]

> Correct spelling in VolcanoTestsSuite
> -
>
> Key: SPARK-45244
> URL: https://issues.apache.org/jira/browse/SPARK-45244
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.5.0
>Reporter: Binjie Yang
>Assignee: Binjie Yang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Typo in the method name checkAnnotaion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45266:
---
Labels: pull-request-available  (was: )

> Refactor ResolveFunctions analyzer rule to delay making lateral join when 
> table arguments are used
> --
>
> Key: SPARK-45266
> URL: https://issues.apache.org/jira/browse/SPARK-45266
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45191) InMemoryTableScanExec simpleStringWithNodeId adds columnar info

2023-09-21 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You resolved SPARK-45191.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42967
[https://github.com/apache/spark/pull/42967]

> InMemoryTableScanExec simpleStringWithNodeId adds columnar info
> ---
>
> Key: SPARK-45191
> URL: https://issues.apache.org/jira/browse/SPARK-45191
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> InMemoryTableScanExec supports both row-based and columnar input and output, 
> depending on the cache serializer. It would be more user-friendly to show the 
> columnar info so users can tell whether it is columnar in/out.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44307) Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44307:
---
Labels: pull-request-available  (was: )

> Bloom filter is not added for left outer join if the left side table is 
> smaller than broadcast threshold.
> -
>
> Key: SPARK-44307
> URL: https://issues.apache.org/jira/browse/SPARK-44307
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.4.1
>Reporter: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> In the case of a left outer join, a shuffle join is used even if the left side 
> table is small enough to be broadcast. This follows from the semantics of a 
> left outer join: broadcasting the left side would produce wrong results. 
> However, the bloom filter injection does not take this into account. While 
> injecting the bloom filter, if the left side is smaller than the broadcast 
> threshold, the bloom filter is not added, on the assumption that the left side 
> will be broadcast and no bloom filter is needed. As a result, the bloom filter 
> optimization is missed for left outer joins with a small left side and a huge 
> right-side table.
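An illustrative reproduction of the scenario, as a minimal sketch (the table 
names and sizes are made up for demonstration):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A left table small enough to fall under the broadcast threshold and a much
# larger right table.
small_left = spark.range(10).withColumnRenamed("id", "k")
huge_right = spark.range(1000000).withColumnRenamed("id", "k")

# For a left outer join the left side must not be broadcast, so a shuffle join
# is planned - and, per this report, the runtime bloom filter is skipped too.
joined = small_left.join(huge_right, on="k", how="left")
joined.explain()
{code}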



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45163) Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_TABLE_OPERATION and refactor some logic

2023-09-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-45163:
---

Assignee: BingKun Pan

> Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into 
> UNSUPPORTED_TABLE_OPERATION and refactor some logic
> 
>
> Key: SPARK-45163
> URL: https://issues.apache.org/jira/browse/SPARK-45163
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45163) Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_TABLE_OPERATION and refactor some logic

2023-09-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45163.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42917
[https://github.com/apache/spark/pull/42917]

> Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into 
> UNSUPPORTED_TABLE_OPERATION and refactor some logic
> 
>
> Key: SPARK-45163
> URL: https://issues.apache.org/jira/browse/SPARK-45163
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used

2023-09-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45266:
-

 Summary: Refactor ResolveFunctions analyzer rule to delay making 
lateral join when table arguments are used
 Key: SPARK-45266
 URL: https://issues.apache.org/jira/browse/SPARK-45266
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-45264) Configurable error when generating Python docs

2023-09-21 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767764#comment-17767764
 ] 

Haejoon Lee edited comment on SPARK-45264 at 9/22/23 12:28 AM:
---

Currently the PySpark documentation build requires installing the latest Pandas 
version specified from 
[https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101]
 to generate [Supported pandas API 
page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api].

The currently supported Pandas version is 2.1.0, so we should install Pandas 
2.1.0 instead of 2.0.3 when building the documentation to get the proper 
supported API list.


was (Author: itholic):
Currently the PySpark documentation build requires installing the latest Pandas 
version specified from 
[https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101]
 to generate [Supported pandas API 
page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api].

> Configurable error when generating Python docs
> --
>
> Key: SPARK-45264
> URL: https://issues.apache.org/jira/browse/SPARK-45264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> {{cd python/docs}}
> {{make html }}
>  
> Gives a Configuration error:
> There is a programmable error in your configuration file:
> ImportError: Warning: Latest version of pandas (2.1.0) is required to 
> generate the documentation; however, your version was 2.0.3
> make: *** [html] Error 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs

2023-09-21 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767764#comment-17767764
 ] 

Haejoon Lee commented on SPARK-45264:
-

Currently the PySpark documentation build requires installing the latest Pandas 
version specified from 
[https://github.com/apache/spark/blob/master/python/pyspark/pandas/supported_api_gen.py#L101]
 to generate [Supported pandas API 
page|https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html#supported-pandas-api].

> Configurable error when generating Python docs
> --
>
> Key: SPARK-45264
> URL: https://issues.apache.org/jira/browse/SPARK-45264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> {{cd python/docs}}
> {{make html }}
>  
> Gives a Configuration error:
> There is a programmable error in your configuration file:
> ImportError: Warning: Latest version of pandas (2.1.0) is required to 
> generate the documentation; however, your version was 2.0.3
> make: *** [html] Error 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43152) User-defined output metadata path (_spark_metadata)

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43152:
---
Labels: pull-request-available  (was: )

> User-defined output metadata path (_spark_metadata)
> ---
>
> Key: SPARK-43152
> URL: https://issues.apache.org/jira/browse/SPARK-43152
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wojciech Indyk
>Priority: Major
>  Labels: pull-request-available
>
> Currently the path of the output metadata is hardcoded: the metadata is saved 
> in the output path, in the _spark_metadata folder. This constrains the 
> structure of paths and could easily be relaxed by making the output metadata 
> path configurable. It would help with issues like [changing output directory of 
> spark streaming 
> job|https://kb.databricks.com/en_US/streaming/file-sink-streaming], [two jobs 
> writing to the same output 
> path|https://issues.apache.org/jira/browse/SPARK-30542] or [partition 
> discovery|https://stackoverflow.com/questions/61904732/is-it-possible-to-change-location-of-spark-metadata-folder-in-spark-structured/61905158].
>  It would also help with separating metadata from data in the path structure.
> The main target of the change is the getMetadataLogPath method in 
> FileStreamSink. It has access to sqlConf, so this method can override the 
> default _spark_metadata path if one is defined in the config. Introducing a 
> configurable metadata path also requires reconsidering the meaning of the 
> hasMetadata method in FileStreamSink.
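A small sketch of the constraint being described, using made-up paths (for 
illustration only, not the proposed API):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Today the file sink always writes its metadata under
# <output path>/_spark_metadata; there is no option to point it elsewhere.
stream = spark.readStream.format("rate").load()
query = (stream.writeStream
         .format("parquet")
         .option("checkpointLocation", "/tmp/chk")
         .start("/tmp/out"))  # metadata ends up in /tmp/out/_spark_metadata
query.stop()
{code}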



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44838) Enhance raise_error() to exploit the new error framework

2023-09-21 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767762#comment-17767762
 ] 

Hudson commented on SPARK-44838:


User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/42985

> Enhance raise_error() to exploit the new error framework
> 
>
> Key: SPARK-44838
> URL: https://issues.apache.org/jira/browse/SPARK-44838
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> raise_error() and assert_true() do not presently utilize the new error 
> framework.
> We want to generalize raise_error() to take an error class, sqlstate and 
> message parameters as arguments to compose a well-formed error condition.
> The existing assert_true() and raise_error() versions should return an error 
> class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45265) Support Hive 4.0 metastore

2023-09-21 Thread Attila Zsolt Piros (Jira)
Attila Zsolt Piros created SPARK-45265:
--

 Summary: Support Hive 4.0 metastore
 Key: SPARK-45265
 URL: https://issues.apache.org/jira/browse/SPARK-45265
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Attila Zsolt Piros
Assignee: Attila Zsolt Piros


Although Hive 4.0.0 is still in beta, I would like to work on this, as Hive 
4.0.0 will support the pushdown of partition column filters with VARCHAR/CHAR 
types.

For details please see HIVE-26661: Support partition filter for char and 
varchar types on Hive metastore



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs

2023-09-21 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767758#comment-17767758
 ] 

Ruifeng Zheng commented on SPARK-45264:
---

I think we already support 2.1.0? [~itholic] 

> Configurable error when generating Python docs
> --
>
> Key: SPARK-45264
> URL: https://issues.apache.org/jira/browse/SPARK-45264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> {{cd python/docs}}
> {{make html }}
>  
> Gives a Configuration error:
> There is a programmable error in your configuration file:
> ImportError: Warning: Latest version of pandas (2.1.0) is required to 
> generate the documentation; however, your version was 2.0.3
> make: *** [html] Error 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45262) Improve the description for `LIKE`

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45262:
---
Labels: pull-request-available  (was: )

> Improve the description for `LIKE`
> --
>
> Key: SPARK-45262
> URL: https://issues.apache.org/jira/browse/SPARK-45262
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> The description of `LIKE` says:
> {code}
> ... in order to match "\abc", the pattern should be "\\abc"
> {code}
> but in Spark SQL shell:
> {code:sql}
> spark-sql (default)> SELECT c FROM t;
> \abc
> spark-sql (default)> SELECT c LIKE "\\abc" FROM t;
> [INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape 
> character is not allowed to precede 'a'.
> spark-sql (default)> SELECT c LIKE "abc" FROM t;
> true
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44937) Add SSL/TLS support for RPC and Shuffle communications

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44937:
---
Labels: pull-request-available  (was: )

> Add SSL/TLS support for RPC and Shuffle communications
> --
>
> Key: SPARK-44937
> URL: https://issues.apache.org/jira/browse/SPARK-44937
> Project: Spark
>  Issue Type: New Feature
>  Components: Block Manager, Security, Shuffle, Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
>
> Add support for SSL/TLS based communication for Spark RPCs and block 
> transfers - providing an alternative to the existing encryption / 
> authentication implementation documented at 
> [https://spark.apache.org/docs/latest/security.html#spark-rpc-communication-protocol-between-spark-processes]
> This is a superset of the functionality discussed in 
> https://issues.apache.org/jira/browse/SPARK-6373



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45252) `sbt doc` execution failed.

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45252:
-

Assignee: Yang Jie

> `sbt doc` execution failed.
> ---
>
> Key: SPARK-45252
> URL: https://issues.apache.org/jira/browse/SPARK-45252
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> run 
>  
> {code:java}
> build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl 
> -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano
> {code}
>  
> {code:java}
> [info] Main Scala API documentation successful.
> [error] sbt.inc.Doc$JavadocGenerationFailed
> [error]         at 
> sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51)
> [error]         at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62)
> [error]         at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57)
> [error]         at sbt.inc.Doc$.go$1(Doc.scala:73)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81)
> [error]         at 
> sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85)
> [error]         at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68)
> [error]         at 
> sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178)
> [error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
> [error]         at 
> sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63)
> [error]         at sbt.std.Transform$$anon$4.work(Transform.scala:69)
> [error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:283)
> [error]         at 
> sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24)
> [error]         at sbt.Execute.work(Execute.scala:292)
> [error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:283)
> [error]         at 
> sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
> [error]         at 
> sbt.CompletionService$$anon$2.call(CompletionService.scala:65)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [error]         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [error]         at java.lang.Thread.run(Thread.java:750)
> [error] sbt.inc.Doc$JavadocGenerationFailed
> [error]         at 
> sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51)
> [error]         at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62)
> [error]         at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57)
> [error]         at sbt.inc.Doc$.go$1(Doc.scala:73)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81)
> [error]         at 
> sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85)
> [error]         at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68)
> [error]         at 
> sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178)
> [error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
> [error]         at 
> sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63)
> [error]         at sbt.std.Transform$$anon$4.work(Transform.scala:69)
> [error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:283)
> [error]         at 
> sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24)
> [error]         at sbt.Execute.work(Execute.scala:292)
> [error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:283)
> [error]         at 
> sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
> [error]         at 
> sbt.CompletionService$$anon$2.call(CompletionService.scala:65)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.ThreadPoolExecutor.r

[jira] [Resolved] (SPARK-45252) `sbt doc` execution failed.

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45252.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43032
[https://github.com/apache/spark/pull/43032]

> `sbt doc` execution failed.
> ---
>
> Key: SPARK-45252
> URL: https://issues.apache.org/jira/browse/SPARK-45252
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> run 
>  
> {code:java}
> build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl 
> -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano
> {code}
>  
> {code:java}
> [info] Main Scala API documentation successful.
> [error] sbt.inc.Doc$JavadocGenerationFailed
> [error]         at 
> sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51)
> [error]         at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62)
> [error]         at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57)
> [error]         at sbt.inc.Doc$.go$1(Doc.scala:73)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81)
> [error]         at 
> sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85)
> [error]         at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68)
> [error]         at 
> sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178)
> [error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
> [error]         at 
> sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63)
> [error]         at sbt.std.Transform$$anon$4.work(Transform.scala:69)
> [error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:283)
> [error]         at 
> sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24)
> [error]         at sbt.Execute.work(Execute.scala:292)
> [error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:283)
> [error]         at 
> sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
> [error]         at 
> sbt.CompletionService$$anon$2.call(CompletionService.scala:65)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [error]         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [error]         at java.lang.Thread.run(Thread.java:750)
> [error] sbt.inc.Doc$JavadocGenerationFailed
> [error]         at 
> sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cachedJavadoc$1(Doc.scala:51)
> [error]         at sbt.inc.Doc$$anonfun$cachedJavadoc$2.run(Doc.scala:41)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$prepare$1(Doc.scala:62)
> [error]         at sbt.inc.Doc$$anonfun$prepare$5.run(Doc.scala:57)
> [error]         at sbt.inc.Doc$.go$1(Doc.scala:73)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5(Doc.scala:82)
> [error]         at sbt.inc.Doc$.$anonfun$cached$5$adapted(Doc.scala:81)
> [error]         at 
> sbt.util.Tracked$.$anonfun$inputChangedW$1(Tracked.scala:220)
> [error]         at sbt.inc.Doc$.sbt$inc$Doc$$$anonfun$cached$1(Doc.scala:85)
> [error]         at sbt.inc.Doc$$anonfun$cached$7.run(Doc.scala:68)
> [error]         at 
> sbt.Defaults$.$anonfun$docTaskSettings$4(Defaults.scala:2178)
> [error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
> [error]         at 
> sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:63)
> [error]         at sbt.std.Transform$$anon$4.work(Transform.scala:69)
> [error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:283)
> [error]         at 
> sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:24)
> [error]         at sbt.Execute.work(Execute.scala:292)
> [error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:283)
> [error]         at 
> sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
> [error]         at 
> sbt.CompletionService$$anon$2.call(CompletionService.scala:65)
> [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [error]         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [erro

[jira] [Resolved] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45263.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43040
[https://github.com/apache/spark/pull/43040]

> Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
> 
>
> Key: SPARK-45263
> URL: https://issues.apache.org/jira/browse/SPARK-45263
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45263:
-

Assignee: Dongjoon Hyun

> Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
> 
>
> Key: SPARK-45263
> URL: https://issues.apache.org/jira/browse/SPARK-45263
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45264) Configurable error when generating Python docs

2023-09-21 Thread Allison Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767743#comment-17767743
 ] 

Allison Wang commented on SPARK-45264:
--

[~podongfeng] do we have a way to bypass this pandas version error when 
generating the documentation?

> Configurable error when generating Python docs
> --
>
> Key: SPARK-45264
> URL: https://issues.apache.org/jira/browse/SPARK-45264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> {{cd python/docs}}
> {{make html }}
>  
> Gives a Configuration error:
> There is a programmable error in your configuration file:
> ImportError: Warning: Latest version of pandas (2.1.0) is required to 
> generate the documentation; however, your version was 2.0.3
> make: *** [html] Error 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45264) Configurable error when generating Python docs

2023-09-21 Thread Allison Wang (Jira)
Allison Wang created SPARK-45264:


 Summary: Configurable error when generating Python docs
 Key: SPARK-45264
 URL: https://issues.apache.org/jira/browse/SPARK-45264
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


{{cd python/docs}}

{{make html }}

 

Gives a Configuration error:
There is a programmable error in your configuration file:

ImportError: Warning: Latest version of pandas (2.1.0) is required to generate 
the documentation; however, your version was 2.0.3

make: *** [html] Error 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45263:
--
Affects Version/s: (was: 3.5.0)
   (was: 3.4.1)

> Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
> 
>
> Key: SPARK-45263
> URL: https://issues.apache.org/jira/browse/SPARK-45263
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45263:
--
Affects Version/s: 3.5.0

> Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
> 
>
> Key: SPARK-45263
> URL: https://issues.apache.org/jira/browse/SPARK-45263
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45261:
-

Assignee: Dongjoon Hyun

> Fix EventLogFileWriters to handle `none` codec case
> ---
>
> Key: SPARK-45261
> URL: https://issues.apache.org/jira/browse/SPARK-45261
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45261.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43038
[https://github.com/apache/spark/pull/43038]

> Fix EventLogFileWriters to handle `none` codec case
> ---
>
> Key: SPARK-45261
> URL: https://issues.apache.org/jira/browse/SPARK-45261
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45263:
---
Labels: pull-request-available  (was: )

> Make EventLoggingListenerSuite independent from spark.eventLog.compress conf
> 
>
> Key: SPARK-45263
> URL: https://issues.apache.org/jira/browse/SPARK-45263
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.4.1, 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45263) Make EventLoggingListenerSuite independent from spark.eventLog.compress conf

2023-09-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45263:
-

 Summary: Make EventLoggingListenerSuite independent from 
spark.eventLog.compress conf
 Key: SPARK-45263
 URL: https://issues.apache.org/jira/browse/SPARK-45263
 Project: Spark
  Issue Type: Test
  Components: Spark Core
Affects Versions: 3.4.1, 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45220) Refine docstring of `DataFrame.join`

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45220:
---
Labels: pull-request-available  (was: )

> Refine docstring of `DataFrame.join`
> 
>
> Key: SPARK-45220
> URL: https://issues.apache.org/jira/browse/SPARK-45220
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of `DataFrame.join`.
> The examples should also include: left join, left anti join, join on multiple 
> columns and column names, join on multiple conditions
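A rough sketch of the kinds of examples that could be added (data and column 
names are made up for illustration):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])
right = spark.createDataFrame([(1, "x"), (3, "y")], ["id", "w"])

left.join(right, on="id", how="left").show()        # left join
left.join(right, on="id", how="left_anti").show()   # left anti join
left.join(right, on=["id"], how="inner").show()     # join on a list of column names
left.join(right, (left.id == right.id) & (left.v != right.w), "inner").show()  # multiple conditions
{code}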



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45262) Improve the description for `LIKE`

2023-09-21 Thread Max Gekk (Jira)
Max Gekk created SPARK-45262:


 Summary: Improve the description for `LIKE`
 Key: SPARK-45262
 URL: https://issues.apache.org/jira/browse/SPARK-45262
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


The description of `LIKE` says:
{code}
... in order to match "\abc", the pattern should be "\\abc"
{code}
but in Spark SQL shell:
{code:sql}
spark-sql (default)> SELECT c FROM t;
\abc
spark-sql (default)> SELECT c LIKE "\\abc" FROM t;
[INVALID_FORMAT.ESC_IN_THE_MIDDLE] The format is invalid: '\\abc'. The escape 
character is not allowed to precede 'a'.
spark-sql (default)> SELECT c LIKE "abc" FROM t;
true
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45261:
---
Labels: pull-request-available  (was: )

> Fix EventLogFileWriters to handle `none` codec case
> ---
>
> Key: SPARK-45261
> URL: https://issues.apache.org/jira/browse/SPARK-45261
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45261) Fix EventLogFileWriters to handle `none` codec case

2023-09-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45261:
-

 Summary: Fix EventLogFileWriters to handle `none` codec case
 Key: SPARK-45261
 URL: https://issues.apache.org/jira/browse/SPARK-45261
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45260) Refine docstring of count_distinct

2023-09-21 Thread Allison Wang (Jira)
Allison Wang created SPARK-45260:


 Summary: Refine docstring of count_distinct
 Key: SPARK-45260
 URL: https://issues.apache.org/jira/browse/SPARK-45260
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of the function `count_distinct` (e.g. provide examples 
with groupBy).
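A possible groupBy example for the docstring, as a minimal sketch (the data is 
made up for illustration):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", 1), ("a", 1), ("a", 2), ("b", 3)], ["key", "value"]
)
# Count the distinct values per group.
df.groupBy("key").agg(F.count_distinct("value").alias("n_distinct")).show()
{code}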



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45259) Refine docstring of `count`

2023-09-21 Thread Allison Wang (Jira)
Allison Wang created SPARK-45259:


 Summary: Refine docstring of `count`
 Key: SPARK-45259
 URL: https://issues.apache.org/jira/browse/SPARK-45259
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of the function `count` (e.g. provide examples with groupBy).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45258) Refine docstring of `sum`

2023-09-21 Thread Allison Wang (Jira)
Allison Wang created SPARK-45258:


 Summary: Refine docstring of `sum`
 Key: SPARK-45258
 URL: https://issues.apache.org/jira/browse/SPARK-45258
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of the function `sum` (e.g. provide examples with groupBy).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-21 Thread Faiz Halde (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-45255:
---
Description: 
java 1.8, sbt 1.9, scala 2.12

 

I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know connect does a bunch of shading during assembly, so it could be related 
to that. This application is not started via spark-submit or anything, and it 
is not run under a `SPARK_HOME` either (I guess that's the whole point of the 
connect client).

 

EDIT

Not sure if it's the right mitigation, but explicitly adding guava worked; now 
I am in a second territory of error.

{{Sep 21, 2023 8:21:59 PM 
org.sparkproject.connect.client.io.grpc.NameResolverRegistry 
getDefaultRegistry}}
{{WARNING: No NameResolverProviders found via ServiceLoader, including for DNS. 
This is probably due to a broken build. If using ProGuard, check your 
configuration}}
{{Exception in thread "main" 
org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException:
 
org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException:
 No functional channel service provider found. Try adding a dependency on the 
grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}}
{{    at 
org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}}
{{    at 
org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}}
{{    at 
org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}}
{{    at 
org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}}
{{    at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}}
{{    at scala.Option.getOrElse(Option.scala:189)}}
{{    at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: 
org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException:
 No functional channel service provider found. Try adding a dependency on the 
grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}}
{{    at 
org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:179)}}
{{    at 
org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:155)}}
{{    at 
org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilder(Grpc.java:101)}}
{{    at 
org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilderForAddress(Grpc.java:111)}}
{{    at 
org.apache.spark.sql.connect.client.SparkConnectClient$Configuration.createChannel(SparkConnectClient.scala:633)}}
{{    at 
org.apache.spark.sql.connect.client.SparkConnectClient$Configu

[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-21 Thread Faiz Halde (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-45255:
---
Description: 
java 1.8, sbt 1.9, scala 2.12

 

I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?

  was:
I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?


> Spark connect client failing with java.lang.NoClassDefFoundError
> 
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> java 1.8, sbt 1.9, scala 2.12
>  
> I have a very simple repo with the following dependency in `build.sbt`
> ```
> {{libraryDependencies ++= Seq("org.apache.spark" %% 
> "spark-connect-client-jvm" % "3.5.0")}}
> ```
> A simple application
> ```
> {{object Main extends App {}}
> {{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
> {{}}}
> ```
> But when I run it, I get the following error
>  
> ```
> {{Exception in thread "main" ja

[jira] [Assigned] (SPARK-44113) Make Scala 2.13+ as default Scala version

2023-09-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44113:
-

Assignee: Yang Jie

> Make Scala 2.13+ as default Scala version
> -
>
> Key: SPARK-44113
> URL: https://issues.apache.org/jira/browse/SPARK-44113
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45257) Enable spark.eventLog.compress by default

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45257:
---
Labels: pull-request-available  (was: )

> Enable spark.eventLog.compress by default
> -
>
> Key: SPARK-45257
> URL: https://issues.apache.org/jira/browse/SPARK-45257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45257) Enable spark.eventLog.compress by default

2023-09-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45257:
-

 Summary: Enable spark.eventLog.compress by default
 Key: SPARK-45257
 URL: https://issues.apache.org/jira/browse/SPARK-45257
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
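
The ticket has no description yet. For orientation only, the flag in question is the one users currently turn on per application; a sketch (not text from the ticket, and the event log directory is hypothetical):

```
import org.apache.spark.sql.SparkSession

object EventLogCompressSketch extends App {
  // Sketch: enabling event log compression explicitly today; the proposal is to
  // make "true" the default value of spark.eventLog.compress.
  val spark = SparkSession.builder()
    .appName("event-log-compress-example")
    .master("local[*]")
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "/tmp/spark-events")   // hypothetical directory
    .config("spark.eventLog.compress", "true")
    .getOrCreate()

  spark.stop()
}
```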






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-21 Thread Faiz Halde (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-45255:
---
Issue Type: Bug  (was: New Feature)

> Spark connect client failing with java.lang.NoClassDefFoundError
> 
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> I have a very simple repo with the following dependency in `build.sbt`
> ```
> {{libraryDependencies ++= Seq("org.apache.spark" %% 
> "spark-connect-client-jvm" % "3.5.0")}}
> ```
> A simple application
> ```
> {{object Main extends App {}}
> {{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
> {{}}}
> ```
> But when I run it, I get the following error
>  
> ```
> {{Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
> {{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
> {{    at Main$delayedInit$body.apply(Main.scala:3)}}
> {{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
> {{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
> {{    at 
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
> {{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
> {{    at scala.collection.immutable.List.foreach(List.scala:431)}}
> {{    at scala.App.main(App.scala:80)}}
> {{    at scala.App.main$(App.scala:78)}}
> {{    at Main$.main(Main.scala:3)}}
> {{    at Main.main(Main.scala)}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
> {{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
> {{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
> {{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
> {{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
> {{    ... 11 more}}
> ```
> I know the connect client does a bunch of shading during assembly, so it could 
> be related to that. This application is not started via spark-submit or 
> anything, nor is it run under a `SPARK_HOME` (I guess that's the whole point of 
> the connect client).
>  
> I followed the doc exactly as described. Can somebody help?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-09-21 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:59 PM:
--

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava/failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira 
issue; I do not have time to create a pull request. You can apply the patch by 
navigating inside the source folder and running:
{{patch -p1 < spark-3.5.0.patch}}

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> org.apache.spark.storage.BlockManagerId$.getCachedBlockM

[jira] [Updated] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-09-21 Thread Sebastian Daberdaku (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Daberdaku updated SPARK-45201:

Attachment: spark-3.5.0.patch

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146)
>     at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127)
>     at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536)
>     at org.apache.spark.SparkContext.(SparkContext.scala:625)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093)
>     at 
> org

[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-09-21 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:57 PM:
--

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava-failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. (I am adding the patch file [^spark-3.5.0.patch] to this Jira 
issue; I do not have time to create a pull request.)

Second, the spark-connect-common jar produced by make-distribution is redundant 
and was the cause of the class loading issues. Removing it resolves all these 
issues I had. 
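
A quick way to check for the double relocation described above is to list the relevant class entries in the built jars; a minimal sketch (the jar path is hypothetical):

```
import java.util.jar.JarFile

object RelocationCheck extends App {
  // hypothetical path to a jar produced by make-distribution
  val jarPath = "dist/jars/spark-connect_2.12-3.5.0.jar"

  val jar = new JarFile(jarPath)
  val entries = jar.entries()
  // One relocation pass should leave exactly one relocated copy of this class;
  // a doubled prefix from two passes shows up as an unexpected entry name.
  while (entries.hasMoreElements) {
    val name = entries.nextElement().getName
    if (name.contains("InternalFutureFailureAccess")) println(name)
  }
  jar.close()
}
```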


was (Author: JIRAUSER302265):
After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava-failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. (I am adding the patch file to this Jira issue; I do not have 
time to create a pull request.)

Second, the spark-connect-common jar produced by make-distribution is redundant 
and was the cause of the class loading issues. Removing it resolves all these 
issues I had. 

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoade

[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-09-21 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 9/21/23 4:56 PM:
--

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava-failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. (I am adding the patch file to this Jira issue; I do not have 
time to create a pull request.)

Second, the spark-connect-common jar produced by make-distribution is redundant 
and was the cause of the class loading issues. Removing it resolves all these 
issues I had. 


was (Author: JIRAUSER302265):
After spending hours analyzing the project pom files, I discovered that by 
simply deleting the spark-connect-common jar, all class loading issues are gone. 
I hope this might be useful to others as well.

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.g

[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-21 Thread Faiz Halde (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-45255:
---
Description: 
I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?

  was:
I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?

BTW, it did work if I copied the exact shading rules into my project, but I 
wonder if that's the right thing to do?


> Spark connect client failing with java.lang.NoClassDefFoundError
> 
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> I have a very simple repo with the following dependency in `build.sbt`
> ```
> {{libraryDependencies ++= Seq("org.apache.spark" %% 
> "spark-connect-client-jvm" % "3.5.0")}}
> ```
> A simple application
> ```
> {{object Main extends App {}}
> {{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
> {{}}}
> ```
> But when I run it, I get the follow

[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-09-21 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767622#comment-17767622
 ] 

Sebastian Daberdaku commented on SPARK-45201:
-

After spending hours analyzing the project pom files, I discovered that by 
simply deleting the spark-connect-common jar, all class loading issues are gone. 
I hope this might be useful to others as well.

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146)
>     at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127)
>     at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536)
>     at org.apache.spark.SparkContext.(SparkContext.scala:625)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
>     at 
> org.apache.spark.sql.SparkSession$B

[jira] [Updated] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45256:
---
Labels: pull-request-available  (was: )

> Arrow DurationWriter fails when vector is at capacity
> -
>
> Key: SPARK-45256
> URL: https://issues.apache.org/jira/browse/SPARK-45256
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 3.5.1
>Reporter: Sander Goos
>Priority: Major
>  Labels: pull-request-available
>
> The DurationWriter fails if more values are written than the initial capacity 
> of the DurationVector (4032). Fix by using the `setSafe` method instead of 
> `set`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity

2023-09-21 Thread Sander Goos (Jira)
Sander Goos created SPARK-45256:
---

 Summary: Arrow DurationWriter fails when vector is at capacity
 Key: SPARK-45256
 URL: https://issues.apache.org/jira/browse/SPARK-45256
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 3.4.0, 3.4.2, 3.5.1
Reporter: Sander Goos


The DurationWriter fails if more values are written than the initial capacity 
of the DurationVector (4032). Fix by using the `setSafe` method instead of `set`.
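
A minimal sketch of the underlying Arrow Java API difference (not Spark's own ArrowWriter code): `set` assumes the value fits in the already-allocated buffers, while `setSafe` grows them first.

```
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.DurationVector
import org.apache.arrow.vector.types.TimeUnit
import org.apache.arrow.vector.types.pojo.{ArrowType, FieldType}

object DurationVectorSketch extends App {
  val allocator = new RootAllocator(Long.MaxValue)
  val fieldType = FieldType.nullable(new ArrowType.Duration(TimeUnit.MICROSECOND))
  val vector = new DurationVector("d", fieldType, allocator)
  vector.allocateNew()

  // Write past the initial capacity: setSafe reallocates as needed,
  // whereas set(i, value) would fail once i exceeds the allocated slots.
  for (i <- 0 until 5000) {
    vector.setSafe(i, i.toLong)
  }
  vector.setValueCount(5000)
  println(vector.getValueCount)

  vector.close()
  allocator.close()
}
```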



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45240) Implement Error Enrichment for Python Client

2023-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45240:
---
Labels: pull-request-available  (was: )

> Implement Error Enrichment for Python Client
> 
>
> Key: SPARK-45240
> URL: https://issues.apache.org/jira/browse/SPARK-45240
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-21 Thread Faiz Halde (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-45255:
---
Description: 
I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?

BTW, it did work if I copied the exact shading rules into my project, but I 
wonder if that's the right thing to do?
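
For what it is worth, a sketch of what "copying the exact shading rules" could look like with the sbt-assembly plugin; the renamed prefixes mirror the ones visible in the stack traces above, and this is not the full rule set Spark itself uses:

```
// build.sbt fragment -- assumes the sbt-assembly plugin is enabled
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename(
    "com.google.common.**" -> "org.sparkproject.connect.client.com.google.common.@1"
  ).inAll,
  ShadeRule.rename(
    "io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1"
  ).inAll
)
```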

  was:
I have a very simple repo with the following dependency in `build.sbt`

```

{{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" 
% "3.5.0")}}

```

A simple application

```

{{object Main extends App {}}
{{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
{{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
{{}}}

```

But when I run it, I get the following error

 

```

{{Exception in thread "main" java.lang.NoClassDefFoundError: 
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
{{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
{{    at Main$delayedInit$body.apply(Main.scala:3)}}
{{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
{{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
{{    at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
{{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
{{    at scala.collection.immutable.List.foreach(List.scala:431)}}
{{    at scala.App.main(App.scala:80)}}
{{    at scala.App.main$(App.scala:78)}}
{{    at Main$.main(Main.scala:3)}}
{{    at Main.main(Main.scala)}}
{{Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
{{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
{{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
{{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
{{    ... 11 more}}

```

I know the connect client does a bunch of shading during assembly, so it could be 
related to that. This application is not started via spark-submit or anything, 
nor is it run under a `SPARK_HOME` (I guess that's the whole point of the 
connect client).

 

I followed the doc exactly as described. Can somebody help?

BTW, it did work if I copied the exact shading rules into my project, but I 
wonder if that's the right thing to do?


> Spark connect client failing with java.lang.NoClassDefFoundError
> 
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> I have a very simple repo with the following dependency in `build.sbt`
> ```
> {{libraryDependencies ++= Seq("org.apache.spark" %% 
> "spark-connect-client-jvm" % "3.5.0")}}
> ```
> A simple application
> ```
> {{object Main extends App {}}
> {{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{   s.read.json("
