Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Hi Artemis, thanks for your input; to answer your questions: > You may want to ask yourself why it is necessary to change the jar packages during runtime. I have a long-running orchestrator process, which executes multiple spark jobs, currently on a single VM/driver, some of those jobs might

Re: spark jobs don't require the master/worker to startup?

2022-03-09 Thread Sean Owen
You can run Spark in local mode and not require any standalone master or worker. Are you sure you're not using local mode? Are you sure the daemons aren't running? What is the Spark master you pass? On Wed, Mar 9, 2022 at 7:35 PM wrote: > What I tried to say is, I didn't start spark
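A quick way to see which mode a pyspark session is actually running in is to check the master URL on the live context. A minimal sketch; the standalone URL is the one quoted later in this thread and the app name is made up:

    from pyspark.sql import SparkSession

    # Inside the pyspark shell a session already exists as `spark`; in a
    # standalone script one can be built explicitly. local[*] needs no
    # master/worker daemons at all.
    spark = SparkSession.builder.master("local[*]").appName("mode-check").getOrCreate()

    # Prints "local[*]" for local mode, or e.g. "spark://127.0.0.1:7077"
    # when a standalone master is really being used.
    print(spark.sparkContext.master)

    spark.stop()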

Re: spark jobs don't require the master/worker to startup?

2022-03-09 Thread capitnfrakass
What I tried to say is, I didn't start the spark master/worker at all for a standalone deployment, but I can still log in to pyspark to run the job. I don't know why. $ ps -efw|grep spark $ netstat -ntlp Neither of the outputs above shows any spark-related info. And this machine is managed by myself, I

Re: RebaseDateTime with dynamicAllocation

2022-03-09 Thread Andreas Weise
Okay, found the root cause. Our k8s image got some changes, including a mess with some jar dependencies around com.fasterxml.jackson ... Sorry for the inconvenience. An earlier log line in the driver contained that info... [2022-03-09 21:54:25,163] ({task-result-getter-3}

Re: RebaseDateTime with dynamicAllocation

2022-03-09 Thread Andreas Weise
Full trace doesn't provide any further details. It looks like this: Py4JJavaError: An error occurred while calling o337.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 18.0 failed 4 times, most recent failure: Lost task 1.3 in stage 18.0 (TID 220)

Re: CPU usage from Event log

2022-03-09 Thread Artemis User
I am not sure what column/properties you are referring to.  But the event log in Spark deals with application-level "events", not JVM-level metrics.  To retrieve the JVM metrics, you need to use the REST API provided in Spark.  Please see https://spark.apache.org/docs/latest/monitoring.html
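A minimal sketch of pulling per-executor metrics from that REST API, assuming the driver UI is reachable at localhost:4040 (the history server exposes the same API); the exact fields returned vary with the Spark version and metrics configuration:

    import requests

    BASE = "http://localhost:4040/api/v1"  # assumed driver UI address

    # Take the first (usually only) application served by this UI.
    app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

    # Per-executor summary: totalCores, totalDuration (task time, ms),
    # totalGCTime (ms), etc. come from the executors endpoint.
    for ex in requests.get(f"{BASE}/applications/{app_id}/executors").json():
        print(ex["id"], ex.get("totalCores"), ex.get("totalDuration"), ex.get("totalGCTime"))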

Re: RebaseDateTime with dynamicAllocation

2022-03-09 Thread Sean Owen
Doesn't quite seem the same. What is the rest of the error -- why did the class fail to initialize? On Wed, Mar 9, 2022 at 10:08 AM Andreas Weise wrote: > Hi, > > When playing around with spark.dynamicAllocation.enabled I face the > following error after the first round of executors have been

RebaseDateTime with dynamicAllocation

2022-03-09 Thread Andreas Weise
Hi, When playing around with spark.dynamicAllocation.enabled I face the following error after the first round of executors have been killed. Py4JJavaError: An error occurred while calling o337.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 18.0
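For reference, a minimal sketch of the kind of setup being described. The property names are standard dynamic-allocation settings, but the values, the app name and the shuffle-tracking option are illustrative and not taken from the thread; the master/deploy mode is assumed to come from spark-submit:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dyn-alloc-demo")  # made-up name
             .config("spark.dynamicAllocation.enabled", "true")
             # commonly needed when no external shuffle service is available (e.g. on k8s)
             .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
             .config("spark.dynamicAllocation.minExecutors", "1")
             .config("spark.dynamicAllocation.maxExecutors", "4")
             .getOrCreate())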

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Artemis User
This is indeed a JVM issue, not a Spark issue.  You may want to ask yourself why it is necessary to change the jar packages during runtime.  Changing packages doesn't mean the classes get reloaded. There is no way to reload the same class unless you customize the classloader of Spark.  I also

Re: spark jobs don't require the master/worker to startup?

2022-03-09 Thread Artemis User
To be specific: 1. Check the log files on both the master and the worker and see if there are any errors. 2. If you are not running your browser on the same machine as the Spark cluster, please use the host's external IP instead of the localhost IP when launching the worker. Hope this helps... -- ND On 3/9/22

CPU usage from Event log

2022-03-09 Thread Prasad Bhalerao
Hi, I am trying to calculate the CPU utilization of an Executor (JVM-level CPU usage) using the Event log. Can someone please help me with this? 1) Which columns/properties to select 2) The correct formula to derive CPU usage Has anyone done anything similar to this? We have many pipelines and those are

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Sean, I understand you might be sceptical about adding this functionality to (py)spark. I'm curious: * would an error/warning on updating configuration that currently cannot take effect (it requires a JVM restart) be reasonable? * what do you think about the workaround in the issue? Cheers -

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Sean Owen
Unfortunately this opens a lot more questions and problems than it solves. What if you take something off the classpath, for example? Change a class? On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła wrote: > Thanks Sean, > To be clear, if you prefer to change the label on this issue from bug to >

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Thanks Sean, To be clear, if you prefer to change the label on this issue from bug to sth else, feel free to do so, no strong opinions on my end. What happens to the classpath, whether spark uses some classloader magic, is probably an implementation detail. That said, it's definitely not intuitive

Re: spark jobs don't require the master/worker to startup?

2022-03-09 Thread Sean Owen
Did it start successfully? What do you mean ports were not opened? On Wed, Mar 9, 2022 at 3:02 AM wrote: > Hello > > I have spark 3.2.0 deployed in localhost as the standalone mode. > I even didn't run the start master and worker command: > > start-master.sh > start-worker.sh

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Sean Owen
That isn't a bug - you can't change the classpath once the JVM is executing. On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła wrote: > Hi, > My use case is that, I have a long running process (orchestrator) with > multiple tasks, some tasks might require extra spark dependencies. It seems > once

[SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Hi, My use case is that I have a long-running process (orchestrator) with multiple tasks; some tasks might require extra spark dependencies. It seems once the spark context is started it's not possible to update `spark.jars.packages`? I have reported an issue at
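A minimal sketch of the behaviour being reported, with an illustrative Maven coordinate: a second getOrCreate() simply hands back the existing session, and the extra packages are never resolved onto the already-running JVM's classpath:

    from pyspark.sql import SparkSession

    # First session fixes the JVM classpath for the lifetime of the process.
    # local[*] is used here only to make the sketch self-contained.
    spark = SparkSession.builder.master("local[*]").appName("orchestrator").getOrCreate()

    # A later task wants an extra dependency (coordinate is illustrative):
    spark2 = (SparkSession.builder
              .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.0")
              .getOrCreate())

    print(spark is spark2)  # True - the existing session is reused; no new jars are fetched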

spark jobs don't require the master/worker to startup?

2022-03-09 Thread capitnfrakass
Hello I have spark 3.2.0 deployed on localhost in standalone mode. I didn't even run the start master and worker commands: start-master.sh start-worker.sh spark://127.0.0.1:7077 And the ports (such as 7077) were not opened there. But I can still log in to pyspark to run the jobs.