Streaming job, catch exceptions

2019-05-11 Thread Behroz Sikander
Hello, I am using Spark 2.2.1 with the standalone resource manager. I have a streaming job where, from time to time, jobs are aborted due to the following exception. The reasons differ, e.g. FileNotFound/NullPointerException etc.: org.apache.spark.SparkException: Job aborted due to stage failure:
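A minimal sketch of one way to keep a single batch failure from aborting the whole application, assuming the DStream API on Spark 2.2.x; the socket source, port, and workload below are placeholders, not from the thread:

    import org.apache.spark.{SparkConf, SparkException}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("resilient-streaming-job")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source

    lines.foreachRDD { rdd =>
      try {
        // The action triggers the job; after task retries are exhausted,
        // the failure surfaces here as org.apache.spark.SparkException.
        println(s"processed ${rdd.count()} records")
      } catch {
        case e: SparkException =>
          // Log and skip the failed batch instead of letting the
          // exception kill the streaming application.
          System.err.println(s"Batch failed, skipping: ${e.getMessage}")
      }
    }

    ssc.start()
    ssc.awaitTermination()

Failures inside transformations are retried by Spark on the executors first; only a job that has exhausted its retries reaches this catch block on the driver.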

[SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread Behroz Sikander
I have a client application which launches multiple jobs in a Spark cluster using SparkLauncher. I am using *standalone* *cluster mode*. Launching jobs has worked fine so far. I use launcher.startApplication() to launch. But now I have a requirement to check the state of my driver process. I added a
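For context, the pattern described looks roughly like this sketch (the jar path, main class, and master URL are placeholders). The thread's complaint is that in standalone cluster mode the stateChanged callbacks are never received, presumably because the driver runs remotely on the cluster rather than in the launcher's child process:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")   // placeholder
      .setMainClass("com.example.Main")     // placeholder
      .setMaster("spark://master:7077")     // placeholder
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit =
          println(s"Driver state changed: ${h.getState}")
        override def infoChanged(h: SparkAppHandle): Unit =
          println(s"App info changed, id: ${h.getAppId}")
      })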

Properly stop applications or jobs within the application

2018-03-05 Thread Behroz Sikander
Hello, We are using spark-jobserver to spawn jobs in our Spark cluster. We have recently faced issues with zombie jobs in the Spark cluster. This normally happens when a job is accessing external resources like Kafka/C* and something goes wrong while consuming them. For example, if suddenly a topic
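One way to make such jobs stoppable from a supervising thread, as a hedged sketch (the group id and workload are illustrative): tag the work with a job group and cancel the group when the external resource misbehaves. Passing interruptOnCancel = true also interrupts the executor task threads, which helps when tasks are blocked on Kafka/C*:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("stoppable-job"))

    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        // setJobGroup is per-thread, so it must be set in the thread
        // that actually submits the job.
        sc.setJobGroup("ingest-topic-A", "consume topic A", interruptOnCancel = true)
        sc.parallelize(1 to 1000000).map(_ * 2).count() // placeholder workload
      }
    })
    worker.start()

    Thread.sleep(5000) // give the job time to start (sketch only)
    // Cancel just that group; the SparkContext and the rest of the
    // application keep running.
    sc.cancelJobGroup("ingest-topic-A")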

Programmatically get status of job (WAITING/RUNNING)

2017-10-30 Thread Behroz Sikander
Hi, I have a Spark cluster running in client mode. I programmatically submit jobs to the Spark cluster; under the hood, I am using spark-submit. If my cluster is overloaded and I start a context, the driver JVM keeps waiting for executors. The executors are in a waiting state because the cluster does
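One option, offered as an assumption about the deployment rather than anything from the thread: the standalone master's web UI also serves a JSON view (by default on port 8080 under /json/) that lists each application together with its state, e.g. WAITING or RUNNING. A rough sketch; the host and port are placeholders, and a real client should parse the JSON properly (e.g. with json4s) instead of regex-scanning it:

    import scala.io.Source

    val masterJson = "http://spark-master:8080/json/" // assumed master UI address
    val body = Source.fromURL(masterJson).mkString

    // Crude scan for any application reported as WAITING.
    val waitingApp = "\"state\"\\s*:\\s*\"WAITING\"".r.findFirstIn(body).isDefined
    println(if (waitingApp) "an application is WAITING for executors"
            else "no WAITING applications")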

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Behroz Sikander
On Fri, Mar 24, 2017 at 2:21 PM, Yong Zhang <java8...@hotmail.com> wrote: > I never experienced worker OOM and very rarely see this online, so my guess > is that you have to generate the heap dump file to analyze it. > > Yong
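To act on that advice, a hedged sketch: HotSpot's -XX:+HeapDumpOnOutOfMemoryError flag writes an .hprof file on OOM for offline analysis (e.g. with Eclipse MAT). Set via SparkConf as below it covers the driver and executor JVMs; for the worker daemon itself, which is the process crashing in this thread, the same flags would go into SPARK_WORKER_OPTS in spark-env.sh. Dump paths are illustrative:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.driver.extraJavaOptions",
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof")
      .set("spark.executor.extraJavaOptions",
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof")

Note that in client mode the driver option has to be passed via spark-submit --conf or spark-defaults.conf, since the driver JVM is already running by the time SparkConf is read.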

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Behroz Sikander
Thank you for the response. Yes, I am sure, because the driver was working fine. Only 2 workers went down with OOM. Regards, Behroz On Fri, Mar 24, 2017 at 2:12 PM, Yong Zhang wrote: > I am not 100% sure, but normally a "dispatcher-event-loop" OOM means a > driver OOM.

[Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-23 Thread Behroz Sikander
Hello, Spark version: 1.6.2 Hadoop: 2.6.0 Cluster: all VMs are deployed on AWS. 1 Master (t2.large), 1 Secondary Master (t2.large), 5 Workers (m4.xlarge), Zookeeper (t2.large). Recently, 2 of our workers went down with an out-of-memory exception: > java.lang.OutOfMemoryError: GC overhead limit