Correction: The Driver manages the Tasks; the resource manager serves up resources to the Driver or Task.
On Tue, May 21, 2019 at 9:11 AM Jason Nerothin <jasonnerot...@gmail.com> wrote:

> The behavior is a deliberate design decision by the Spark team.
>
> If Spark were to "fail fast", it would prevent the system from recovering
> from many classes of errors that are in principle recoverable (for example,
> if two otherwise unrelated jobs cause a garbage-collection spike on the
> same node). Checkpointing and similar features have been added to support
> high availability and platform resilience.
>
> As regards the more general topic of exception handling and recovery, I
> side with Bruce Eckel <https://www.artima.com/intv/handcuffs.html> and
> (for Java) Josh Bloch (see Effective Java, Exception Handling). The
> Scala/functional community is similarly opinionated against using
> exceptions for explicit control flow. (scala.util.Try is useful for
> wrapping libraries that don't share this opinion.)
>
> Higher-level design thoughts:
>
> I recommend reading (at least) Chapter 15 of Chambers & Zaharia's *Spark:
> The Definitive Guide*. The Spark engine makes some assumptions about
> execution boundaries being managed by Spark (that the Spark Jobs get
> broken into Tasks on the Executor and are managed by the resource
> manager). If multiple threads were executing within a given Task, I would
> expect things like data exchange/shuffle to become unpredictable.
>
> Said a different way: Spark is a micro-batch architecture, even when using
> the streaming APIs. The Spark Application is assumed to be relatively
> lightweight (the goal is to parallelize execution across big data, after
> all).
>
> You might also look at the way the Apache Livy
> <https://livy.incubator.apache.org/> team is implementing their solution.
>
> HTH
> Jason
>
>
> On Tue, May 21, 2019 at 6:04 AM bsikander <behro...@gmail.com> wrote:
>
>> Ok, I found the reason.
>>
>> In my QueueStream example, I have a while(true) loop that keeps adding
>> RDDs, and my awaitTermination() call is after the while loop. Since the
>> while loop never exits, awaitTermination() is never reached, and the
>> exceptions are never reported.
>>
>> The above was only a problem with the example code I used to demonstrate
>> my issue.
>>
>> My real problem is due to the shutdown behavior of Spark. Spark Streaming
>> does the following:
>>
>> - context.start() triggers the pipeline and context.awaitTermination()
>> blocks the current thread; whenever an exception is reported,
>> awaitTermination() throws it. Since we generally have no code after
>> awaitTermination(), the shutdown hooks get called, and they stop the
>> Spark context.
>>
>> - I am using spark-jobserver. When an exception is reported from
>> awaitTermination(), jobserver catches it and updates the status of the
>> job in the database, but the driver process keeps running, because the
>> main thread in the driver is waiting for an Akka actor belonging to
>> jobserver to shut down. Since that actor never shuts down, the driver
>> keeps running and nothing ever executes context.stop(). Since
>> context.stop() is never executed, the job scheduler and generator keep
>> running, and the job keeps going.
>>
>> This implicit behavior of Spark, where it relies on shutdown hooks to
>> close the context, is a bit strange. I believe that as soon as an
>> exception is reported, Spark should just execute context.stop(). The
>> current behavior can have serious consequences, e.g. data loss. I will
>> fix it on my side, though.
>>
>> What is your opinion on stopping the context as soon as an exception is
>> raised?
>>
>
> --
> Thanks,
> Jason
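A few sketches to make the points in the thread concrete. First, on the
checkpointing mention above: a minimal recovery-oriented setup using
StreamingContext.getOrCreate. This is only a sketch, assuming a durable
checkpoint directory; the path, app name, and batch interval are
illustrative, not prescriptive.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative value only: the checkpoint directory would normally
    // live on durable storage such as HDFS or S3.
    val checkpointDir = "hdfs:///tmp/streaming-checkpoints"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("CheckpointedApp")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define the DStream graph here ...
      ssc
    }

    // On a clean start this builds a fresh context; after a driver
    // failure it rebuilds the context (and the DStream graph) from the
    // checkpoint data instead of recreating it from scratch.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())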
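On scala.util.Try for libraries that signal errors by throwing: a small
sketch that turns the exception into a value rather than control flow.
parseRecord here is a hypothetical stand-in for any third-party call.

    import scala.util.{Failure, Success, Try}

    // Hypothetical library call that signals errors by throwing.
    def parseRecord(raw: String): Int = raw.trim.toInt

    val result: Try[Int] = Try(parseRecord("not-a-number"))

    result match {
      case Success(n)  => println(s"parsed $n")
      case Failure(ex) => println(s"recovered from: ${ex.getMessage}")
    }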
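On the QueueStream pitfall described above: with the feeder loop on the
main thread, awaitTermination() is never reached, so nothing observes the
exceptions the pipeline reports. One way around it, sketched under
local-mode assumptions (master, batch interval, and data are arbitrary),
is to feed the queue from a separate thread so the main thread is free to
block in awaitTermination():

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamDemo")
    val ssc = new StreamingContext(conf, Seconds(1))

    val queue = new mutable.Queue[RDD[Int]]()
    ssc.queueStream(queue).print()
    ssc.start()

    // Feed the queue off the main thread so the main thread can block in
    // awaitTermination() and observe any exception the pipeline reports.
    val feeder = new Thread(() => {
      while (true) {
        queue.synchronized { queue += ssc.sparkContext.makeRDD(1 to 10) }
        Thread.sleep(500)
      }
    })
    feeder.setDaemon(true)
    feeder.start()

    ssc.awaitTermination() // now actually reached; re-throws reported exceptions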
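Finally, on the closing question about stopping the context as soon as an
exception is raised: a sketch of doing that explicitly with a
try/catch/finally around awaitTermination(), rather than relying on JVM
shutdown hooks. The flags passed to stop() are one reasonable choice, not
the only one.

    import org.apache.spark.streaming.StreamingContext

    def runAndStopOnFailure(ssc: StreamingContext): Unit = {
      try {
        ssc.start()
        ssc.awaitTermination() // re-throws the first exception the pipeline reports
      } catch {
        case e: Throwable =>
          // Record the failure here (roughly what spark-jobserver does
          // with job status), then fall through to the shutdown below.
          System.err.println(s"Streaming job failed: ${e.getMessage}")
      } finally {
        // Stop the streaming context *and* the underlying SparkContext
        // deterministically, instead of hoping a shutdown hook runs.
        ssc.stop(stopSparkContext = true, stopGracefully = false)
      }
    }

Passing stopGracefully = false assumes that once the job is considered
failed, waiting for in-flight batches is not wanted; flip it if completing
the last batches matters more than stopping promptly.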
--
Thanks,
Jason