Hello,
I am using Spark 2.2.1 with the standalone resource manager.
I have a streaming job in which, from time to time, jobs are aborted with the
following exception. The underlying cause varies, e.g.
FileNotFoundException, NullPointerException, etc.:
org.apache.spark.SparkException: Job aborted due to stage failure:
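If the failures are transient (a file briefly missing, a flaky external
call), one knob worth checking is the task retry limit, so a stage is not
aborted after the first few attempts. A minimal sketch; Spark's default is
4, the 8 below is purely illustrative:

    import org.apache.spark.SparkConf

    // Allow more task attempts before a stage (and the job) is aborted.
    // Spark's default is 4; the 8 here is purely illustrative.
    val conf = new SparkConf()
      .setAppName("resilient-streaming-job")
      .set("spark.task.maxFailures", "8")

Note this only papers over transient errors; a deterministic
FileNotFoundException or NullPointerException will still exhaust the
retries and abort the job.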
I have a client application which launches multiple jobs in a Spark cluster
using SparkLauncher. I am using *standalone* *cluster mode*. Launching jobs
has worked fine so far. I use launcher.startApplication() to launch.
But now I have a requirement to check the state of my driver process. I
added a
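In case it helps, a minimal sketch of observing the driver's state:
startApplication() returns a SparkAppHandle whose listener fires on every
state transition (the jar path, main class, and master URL below are
placeholders):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")  // placeholder
      .setMainClass("com.example.MyJob")   // placeholder
      .setMaster("spark://master:7077")    // placeholder
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener {
        // Fires on CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED,
        // KILLED, and LOST transitions.
        override def stateChanged(h: SparkAppHandle): Unit =
          println(s"Driver state: ${h.getState}")
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    // Block until the driver reaches a terminal state.
    while (!handle.getState.isFinal) Thread.sleep(1000)

handle.getState can also be polled at any time instead of listening.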
Hello,
We are using spark-jobserver to spawn jobs in a Spark cluster. We have
recently faced issues with zombie jobs in the cluster. This normally
happens when a job accesses external resources such as Kafka/C* and
something goes wrong while consuming them. For example, if suddenly a topic
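One defensive pattern (a sketch of a hypothetical helper, not something
spark-jobserver itself provides) is to run each blocking action under a job
group with a watchdog timeout, and cancel the group when the external
source stalls:

    import java.util.concurrent.TimeoutException
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._
    import org.apache.spark.SparkContext

    // Hypothetical helper: run a blocking Spark action under a named job
    // group and cancel the whole group if it misses its deadline, so it
    // cannot linger as a zombie when Kafka/C* stops responding.
    def runWithDeadline[T](sc: SparkContext, group: String,
                           deadline: FiniteDuration)(body: => T): T = {
      val work = Future {
        // setJobGroup is thread-local, so set it on the thread that
        // actually runs the action.
        sc.setJobGroup(group, s"watchdog: $group", interruptOnCancel = true)
        body
      }
      try Await.result(work, deadline)
      catch {
        case _: TimeoutException =>
          sc.cancelJobGroup(group) // kill the stuck stages
          throw new RuntimeException(s"job group '$group' cancelled after $deadline")
      }
    }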
Hi,
I have a Spark cluster running in client mode, and I programmatically
submit jobs to it. Under the hood I am using spark-submit.
If my cluster is overloaded and I start a context, the driver JVM keeps
waiting for executors. The executors are in a waiting state because the
cluster does
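For what it's worth, a sketch of bounding both the resource ask and the
wait. The property names are real Spark settings and the values are
illustrative; note the two scheduler settings only bound when scheduling
starts with whatever has registered, they do not make the driver give up:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("programmatic-submit")
      // Don't demand more cores than the cluster can realistically spare.
      .set("spark.cores.max", "4")
      // Start scheduling once 80% of requested resources have registered...
      .set("spark.scheduler.minRegisteredResourcesRatio", "0.8")
      // ...or after 60 seconds at the latest.
      .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "60s")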
On Fri, Mar 24, 2017 at 2:21 PM, Yong Zhang <java8...@hotmail.com> wrote:
> I have never experienced worker OOM myself, and I very rarely see it
> reported online, so my guess is that you will have to generate a heap dump
> file and analyze it.
>
> Yong
Thank you for the response.
Yes, I am sure because the driver was working fine. Only 2 workers went
down with OOM.
Regards,
Behroz
On Fri, Mar 24, 2017 at 2:12 PM, Yong Zhang wrote:
> I am not 100% sure, but normally an OOM on the "dispatcher-event-loop"
> thread means the driver itself ran out of memory.
Hello,
Spark version: 1.6.2
Hadoop: 2.6.0
Cluster:
All VMs are deployed on AWS.
1 Master (t2.large)
1 Secondary Master (t2.large)
5 Workers (m4.xlarge)
Zookeeper (t2.large)
Recently, 2 of our workers went down with an out-of-memory exception:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
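If it happens again, a sketch of getting the heap dump Yong suggested out
of the worker daemons; the dump path and memory value are illustrative:

    # conf/spark-env.sh on each worker
    export SPARK_DAEMON_MEMORY=2g
    export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=/var/log/spark/worker.hprof"

The .hprof file can then be opened in Eclipse MAT or jvisualvm to see what
filled the heap.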