Correction: The Driver manages the Tasks; the resource manager serves up resources to the Driver or Task.
On Tue, May 21, 2019 at 9:11 AM Jason Nerothin <jasonnerot...@gmail.com> wrote:

> The behavior is a deliberate design decision by the Spark team.
>
> If Spark were to "fail fast", it would prevent the system from recovering
> from many classes of errors that are in principle recoverable (for example,
> if two otherwise unrelated jobs cause a garbage-collection spike on the
> same node). Checkpointing and similar features have been added to support
> high availability and platform resilience.
>
> As regards the more general topic of exception handling and recovery, I
> side with Bruce Eckel <https://www.artima.com/intv/handcuffs.html> and
> (for Java) Josh Bloch (see Effective Java, Exception Handling). The
> Scala/functional community is similarly opinionated against using
> exceptions for explicit control flow. (scala.util.Try is useful for
> wrapping libraries that don't share this opinion.)
>
> Higher-level design thoughts:
>
> I recommend reading (at least) Chapter 15 of Chambers & Zaharia's *Spark:
> The Definitive Guide*. The Spark engine makes some assumptions about
> execution boundaries being managed by Spark (that the Spark Jobs get
> broken into Tasks on the Executor and are managed by the resource
> manager). If multiple threads were executing within a given Task, I would
> expect things like data exchange/shuffle to become unpredictable.
>
> Said a different way: Spark is a micro-batch architecture, even when using
> the streaming APIs. The Spark Application is assumed to be relatively
> lightweight (the goal is to parallelize execution across big data, after
> all).
>
> You might also look at the way the Apache Livy
> <https://livy.incubator.apache.org/> team is implementing their solution.
>
> HTH
> Jason
>
>
> On Tue, May 21, 2019 at 6:04 AM bsikander <behro...@gmail.com> wrote:
>
>> Ok, I found the reason.
>>
>> In my QueueStream example, I have a while(true) loop that keeps adding
>> RDDs, and my awaitTermination() call is after the while loop. Since the
>> while loop never exits, awaitTermination() is never reached, and the
>> exceptions are never reported.
>>
>> The above was only a problem with the example code I used to demonstrate
>> my issue.
>>
>> My real problem is due to the shutdown behavior of Spark. Spark Streaming
>> does the following:
>>
>> - context.start() triggers the pipeline and context.awaitTermination()
>> blocks the current thread; whenever an exception is reported,
>> awaitTermination() throws it. Since we generally have no code after
>> awaitTermination(), the shutdown hooks get called, and they stop the
>> Spark context.
>>
>> - I am using spark-jobserver. When an exception is reported from
>> awaitTermination(), jobserver catches it and updates the status of the
>> job in the database, but the driver process keeps running, because the
>> main thread in the driver is waiting for an Akka actor belonging to
>> jobserver to shut down. Since that actor never shuts down, the driver
>> keeps running and nothing ever executes context.stop(). Since
>> context.stop() is never executed, the job scheduler and generator keep
>> running, and the job keeps going.
>>
>> This implicit behavior of Spark, where it relies on shutdown hooks to
>> close the context, is a bit strange. I believe that as soon as an
>> exception is reported, Spark should just execute context.stop(). The
>> current behavior can have serious consequences, e.g. data loss. I will
>> fix it on my side, though.
>>
>> What is your opinion on stopping the context as soon as an exception is
>> raised?
>>
>
> --
> Thanks,
> Jason
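A few sketches to make the points in the thread concrete. First, on the
checkpointing mention above: a minimal recovery-oriented setup using
StreamingContext.getOrCreate. This is only a sketch, assuming a durable
checkpoint directory; the path, app name, and batch interval are
illustrative, not prescriptive.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative value only: the checkpoint directory would normally
    // live on durable storage such as HDFS or S3.
    val checkpointDir = "hdfs:///tmp/streaming-checkpoints"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("CheckpointedApp")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define the DStream graph here ...
      ssc
    }

    // On a clean start this builds a fresh context; after a driver
    // failure it rebuilds the context (and the DStream graph) from the
    // checkpoint data instead of recreating it from scratch.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())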
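On scala.util.Try for libraries that signal errors by throwing: a small
sketch that turns the exception into a value rather than control flow.
parseRecord here is a hypothetical stand-in for any third-party call.

    import scala.util.{Failure, Success, Try}

    // Hypothetical library call that signals errors by throwing.
    def parseRecord(raw: String): Int = raw.trim.toInt

    val result: Try[Int] = Try(parseRecord("not-a-number"))

    result match {
      case Success(n)  => println(s"parsed $n")
      case Failure(ex) => println(s"recovered from: ${ex.getMessage}")
    }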
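On the QueueStream pitfall described above: with the feeder loop on the
main thread, awaitTermination() is never reached, so nothing observes the
exceptions the pipeline reports. One way around it, sketched under
local-mode assumptions (master, batch interval, and data are arbitrary),
is to feed the queue from a separate thread so the main thread is free to
block in awaitTermination():

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamDemo")
    val ssc = new StreamingContext(conf, Seconds(1))

    val queue = new mutable.Queue[RDD[Int]]()
    ssc.queueStream(queue).print()
    ssc.start()

    // Feed the queue off the main thread so the main thread can block in
    // awaitTermination() and observe any exception the pipeline reports.
    val feeder = new Thread(() => {
      while (true) {
        queue.synchronized { queue += ssc.sparkContext.makeRDD(1 to 10) }
        Thread.sleep(500)
      }
    })
    feeder.setDaemon(true)
    feeder.start()

    ssc.awaitTermination() // now actually reached; re-throws reported exceptions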
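Finally, on the closing question about stopping the context as soon as an
exception is raised: a sketch of doing that explicitly with a
try/catch/finally around awaitTermination(), rather than relying on JVM
shutdown hooks. The flags passed to stop() are one reasonable choice, not
the only one.

    import org.apache.spark.streaming.StreamingContext

    def runAndStopOnFailure(ssc: StreamingContext): Unit = {
      try {
        ssc.start()
        ssc.awaitTermination() // re-throws the first exception the pipeline reports
      } catch {
        case e: Throwable =>
          // Record the failure here (roughly what spark-jobserver does
          // with job status), then fall through to the shutdown below.
          System.err.println(s"Streaming job failed: ${e.getMessage}")
      } finally {
        // Stop the streaming context *and* the underlying SparkContext
        // deterministically, instead of hoping a shutdown hook runs.
        ssc.stop(stopSparkContext = true, stopGracefully = false)
      }
    }

Passing stopGracefully = false assumes that once the job is considered
failed, waiting for in-flight batches is not wanted; flip it if completing
the last batches matters more than stopping promptly.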
--
Thanks,
Jason