[jira] [Commented] (SPARK-44149) Support DataFrame Merge API
[ https://issues.apache.org/jira/browse/SPARK-44149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818746#comment-17818746 ]

Hussein Awala commented on SPARK-44149:
---------------------------------------

Is this duplicated by SPARK-46207, which was fixed by [#44119|https://github.com/apache/spark/pull/44119]? Or is it a different kind of Merge support?

> Support DataFrame Merge API
> ---------------------------
>
>                 Key: SPARK-44149
>                 URL: https://issues.apache.org/jira/browse/SPARK-44149
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Huaxin Gao
>            Priority: Major
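For context, a rough sketch of how the builder-style merge flow introduced by SPARK-46207 / PR #44119 is used from PySpark. The table name, source DataFrame, column names, and the source/target aliasing in the condition are illustrative assumptions; the exact builder method names should be checked against the 4.0.0 API docs.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("merge-sketch").getOrCreate()

# Hypothetical source DataFrame carrying updates for an existing "target" table.
updates = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Sketch of the mergeInto builder added by SPARK-46207 (Spark 4.0.0);
# the "source"/"target" qualifiers in the join condition are assumptions.
(
    updates.alias("source")
    .mergeInto("target", col("target.id") == col("source.id"))
    .whenMatched()
    .updateAll()
    .whenNotMatched()
    .insertAll()
    .merge()
)
{code}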
[jira] [Commented] (SPARK-45324) pyspark 3.5.0 missing in pypi
[ https://issues.apache.org/jira/browse/SPARK-45324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768931#comment-17768931 ]

Hussein Awala commented on SPARK-45324:
---------------------------------------

I found this in the announcement email:
> (Please note: the PyPi upload is pending due to a size limit request; we're
> actively following up here [https://github.com/pypi/support/issues/3175]
> with the PyPi organization)
The PyPI support issue was resolved 3 hours ago, so the package should be pushed within the next 24 hours.

> pyspark 3.5.0 missing in pypi
> -----------------------------
>
>                 Key: SPARK-45324
>                 URL: https://issues.apache.org/jira/browse/SPARK-45324
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: raphaelauv
>            Priority: Major
>
> pyspark 3.5.0 is not present on PyPI -> [https://pypi.org/project/pyspark/#history]
> version 3.4.1 is currently the latest
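As a convenience while the upload is pending, the release can be polled through PyPI's public JSON API; this is just a quick check, not part of the Spark release tooling.

{code:python}
import json
from urllib.request import urlopen

# Query PyPI's JSON API (https://pypi.org/pypi/<project>/json) and check
# whether the 3.5.0 release files have been uploaded yet.
with urlopen("https://pypi.org/pypi/pyspark/json") as resp:
    data = json.load(resp)

print("3.5.0 on PyPI:", "3.5.0" in data.get("releases", {}))
{code}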
[jira] [Commented] (SPARK-45324) pyspark 3.5.0 missing in pypi
[ https://issues.apache.org/jira/browse/SPARK-45324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768927#comment-17768927 ]

Hussein Awala commented on SPARK-45324:
---------------------------------------

I haven't found any CI job calling `release-build.sh finalize`; it looks like the push script is executed manually by one of the PMC members, and this step was simply missed.
[jira] [Commented] (SPARK-45324) pyspark 3.5.0 missing in pypi
[ https://issues.apache.org/jira/browse/SPARK-45324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768923#comment-17768923 ]

Hussein Awala commented on SPARK-45324:
---------------------------------------

I was about to report the same issue: the published documentation and the release notes mention PySpark 3.5.0 features, but the package has not been published to PyPI.
[jira] [Commented] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes
[ https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685897#comment-17685897 ]

Hussein Awala commented on SPARK-34645:
---------------------------------------

I am facing a similar problem with Spark 3.2.1 and JDK 8. I'm running the jobs in client mode on arm64 nodes, and in 10% of these jobs, after the executor pods and the created PVCs are deleted, the driver pod stays stuck in the Running state with this log:
{code:java}
23/02/08 13:04:38 INFO SparkUI: Stopped Spark web UI at http://172.17.45.51:4040
23/02/08 13:04:38 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
23/02/08 13:04:38 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
23/02/08 13:04:38 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed.
23/02/08 13:04:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/02/08 13:04:39 INFO MemoryStore: MemoryStore cleared
23/02/08 13:04:39 INFO BlockManager: BlockManager stopped
23/02/08 13:04:39 INFO BlockManagerMaster: BlockManagerMaster stopped
23/02/08 13:04:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/02/08 13:04:39 INFO SparkContext: Successfully stopped SparkContext
{code}
JDK:
{code:java}
root@***:/# java -version | tail -n3
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_362-b09)
OpenJDK 64-Bit Server VM (Temurin)(build 25.362-b09, mixed mode)
{code}
I tried:
 * running without the conf _spark.kubernetes.driver.reusePersistentVolumeClaim_ and without the PVC at all
 * applying the patch [https://github.com/apache/spark/commit/457b75ea2bca6b5811d61ce9f1d28c94b0dde3a2] proposed by [~mickayg] on Spark 3.2.1
 * upgrading to 3.2.3

but I still have the same problem. I didn't find any relevant fix in the Spark 3.3.0 and 3.3.1 release notes except the Kubernetes client upgrade. Do you have any tips for investigating the issue? A minimal sketch of this kind of setup is included after this message.

> [K8S] Driver pod stuck in Running state after job completes
> ------------------------------------------------------------
>
>                 Key: SPARK-34645
>                 URL: https://issues.apache.org/jira/browse/SPARK-34645
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.2
>         Environment: Kubernetes:
> {code:java}
> Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
> {code}
>            Reporter: Andy Grove
>            Priority: Major
>
> I am running automated benchmarks in k8s, using spark-submit in cluster mode, so the driver runs in a pod.
> When running with Spark 3.0.1 and 3.1.1 everything works as expected and I see the Spark context being shut down after the job completes.
> However, when running with Spark 3.0.2 I do not see the context get shut down and the driver pod is stuck in the Running state indefinitely.
> This is the output I see after job completion with 3.0.1 and 3.1.1; this output does not appear with 3.0.2. With 3.0.2 there is no output at all after the job completes.
> {code:java}
> 2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040
> 2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting down all executors
> 2021-03-05 20:09:24,600 INFO k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
> 2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
> 2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared
> 2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped
> 2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 2021-03-05 20:09:24,752 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped SparkContext
> 2021-03-05 20:09:24,768 INFO util.ShutdownHookManager:
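For anyone trying to reproduce this, a minimal PySpark sketch of the kind of client-mode-on-Kubernetes setup described in the comment above, with PVC reuse explicitly disabled to rule it out. The master URL, namespace, and image are placeholders, and the config keys are the standard spark.kubernetes.* options; this is not the reporter's actual job.

{code:python}
from pyspark.sql import SparkSession

# Minimal client-mode session against a Kubernetes cluster; the master URL,
# namespace, and container image below are placeholders.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc")
    .config("spark.kubernetes.namespace", "spark-jobs")
    .config("spark.kubernetes.container.image", "my-spark:3.2.1")
    # Disable PVC reuse/ownership, as tried in the comment above, to rule it out:
    .config("spark.kubernetes.driver.reusePersistentVolumeClaim", "false")
    .config("spark.kubernetes.driver.ownPersistentVolumeClaim", "false")
    .getOrCreate()
)

try:
    spark.range(10).count()
finally:
    # Stop the context explicitly so the driver JVM can exit cleanly.
    spark.stop()
{code}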
[jira] [Created] (SPARK-42060) Support overriding the driver/executors container name
Hussein Awala created SPARK-42060:
--------------------------------------

             Summary: Support overriding the driver/executors container name
                 Key: SPARK-42060
                 URL: https://issues.apache.org/jira/browse/SPARK-42060
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.4.0
            Reporter: Hussein Awala

When we don't provide a pod template for the driver/executor pods, or we provide a pod template without a container name, Spark uses {{spark-kubernetes-driver}} as the container name for the driver pod and {{spark-kubernetes-executor}} as the container name for the executor pods. I suggest adding two new configs, {{spark.kubernetes.driver.container.name}} and {{spark.kubernetes.executor.container.name}}, to override the default container names.
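A sketch of how the proposed options might be used from PySpark if they were added. The two *.container.name keys come from the proposal above and do not exist in released Spark versions; the existing workaround (a pod template file that sets the container name, referenced via {{spark.kubernetes.driver.podTemplateFile}} / {{spark.kubernetes.executor.podTemplateFile}}) is noted in the comments. All other values are placeholders.

{code:python}
from pyspark.sql import SparkSession

# NOTE: the two *.container.name keys below are the options PROPOSED in this
# ticket; they are not available in released Spark versions. Today the same
# effect requires a pod template file that sets the container name, passed via
# spark.kubernetes.driver.podTemplateFile / spark.kubernetes.executor.podTemplateFile.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc")                 # placeholder API server
    .config("spark.kubernetes.container.image", "my-spark:3.4.0")   # placeholder image
    .config("spark.kubernetes.driver.container.name", "driver")     # proposed option
    .config("spark.kubernetes.executor.container.name", "executor") # proposed option
    .getOrCreate()
)
{code}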