Yes.

If a task fails repeatedly (4 times by default), Spark assumes that
there is a problem with the job, fails it, and notifies the user. In
exchange, the engine can go on to serve other jobs with its available
resources.
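
For reference, the retry limit is spark.task.maxFailures, which defaults
to 4. A minimal sketch of raising it, assuming you build the SparkConf
yourself (the app name and batch interval below are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// spark.task.maxFailures = number of task attempts before Spark fails the
// stage (and with it the job). The default is 4.
val conf = new SparkConf()
  .setAppName("my-streaming-app")
  .set("spark.task.maxFailures", "8")
val ssc = new StreamingContext(conf, Seconds(10))

Raising it only buys you more retries; it does not fix whatever is
throwing.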

I would try the following until things improve:

1. Figure out what's wrong with the job that's failing
2. Make exception handling more functional:
http://oneeyedmen.com/failure-is-not-an-option-part-4.html (Kotlin, but the
ideas still apply)

Why do #2? Because it forces you to decide, for each exception, whether
your code should handle it, whether Spark should handle it, or whether it
should cause job failure (the current behavior). A rough sketch of what
that can look like follows.
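
Purely as an illustration (this is not from the article above, and the
parsing step is made up), here is one way to do that in Scala, using Try
to handle the exceptions you expect and rethrowing the ones you don't:

import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.SparkSession

object SafeParseExample {
  // Hypothetical per-record step: parse a line into an Int.
  def process(line: String): Try[Int] = Try(line.trim.toInt)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("safe-parse-example").getOrCreate()
    val lines = spark.sparkContext.parallelize(Seq("1", "2", "oops", "4"))

    // Decide per exception class: handle it here, or rethrow so that
    // Spark's retry-then-fail behavior applies only to the truly unexpected.
    val parsed = lines.flatMap { line =>
      process(line) match {
        case Success(n)                        => Some(n)     // good record
        case Failure(_: NumberFormatException) => None        // expected bad input: drop (or log) it
        case Failure(other)                    => throw other // unexpected: let the task fail
      }
    }

    parsed.collect().foreach(println)
    spark.stop()
  }
}

The point is that the decision about which failures are recoverable is
made explicitly in your code, instead of every exception silently counting
toward the task-failure limit.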

By analogy to the halting problem
<https://en.wikipedia.org/wiki/Chaitin%27s_constant#Relationship_to_the_halting_problem>,
I believe that expecting a program to handle all possible exceptional
states is unreasonable.

Jm2c
Jason


On Tue, May 21, 2019 at 9:30 AM bsikander <behro...@gmail.com> wrote:

> Umm, I am not sure if I got this fully.
>
> Is it a design decision to not have context.stop() right after
> awaitTermination throws an exception?
>
> So, the idea is that if a task fails after n tries (default 4), Spark
> should fail fast and let the user know? Is this correct?
>
>
> As you mentioned, there are many error classes, and the chances of getting
> an exception are quite high. If the above is correct, then it makes it
> really hard to keep the job up and running all the time, especially in
> streaming cases.
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-- 
Thanks,
Jason
