Hm, but you already only have to define it in one place, rather than on each transformation. I thought you wanted exception handling at each transformation?
Or do you want it once for all actions? You can enclose all actions in a
try-catch block, I suppose, to write exception handling code once. You can
easily write a Scala construct that takes a function and logs exceptions it
throws, and the function you pass can invoke an RDD action. So you can
refactor that way too.

On Wed, Mar 11, 2015 at 2:39 PM, Michal Klos <michal.klo...@gmail.com> wrote:
> Is there a way to have the exception handling go lazily along with the
> definition?
>
> e.g. we define it on the RDD, but then our exception handling code gets
> triggered on that first action... without us having to define it on the
> first action? (That RDD code is boilerplate and we want to just have it in
> many, many projects.)
>
> m
>
> On Wed, Mar 11, 2015 at 10:08 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Handling exceptions this way means handling errors on the driver side,
>> which may or may not be what you want. You can also write functions
>> with exception handling inside, which could make more sense in some
>> cases (like, to ignore bad records or count them or something).
>>
>> If you want to handle errors at every step on the driver side, you
>> have to force RDDs to materialize to see if they "work". You can do
>> that with .count() or .take(1).length > 0. But to avoid recomputing
>> the RDD then, it needs to be cached. So there is a big non-trivial
>> overhead to approaching it this way.
>>
>> If you go this way, consider materializing only a few key RDDs in your
>> flow, not every one.
>>
>> The most natural thing is indeed to handle exceptions where the action
>> occurs.
>>
>> On Wed, Mar 11, 2015 at 1:51 PM, Michal Klos <michal.klo...@gmail.com>
>> wrote:
>> > Hi Spark Community,
>> >
>> > We would like to define exception handling behavior on RDD
>> > instantiation / build. Since the RDD is lazily evaluated, it seems like
>> > we are forced to put all exception handling in the first action call?
>> >
>> > This is an example of something that would be nice:
>> >
>> > def myRDD = {
>> >   Try {
>> >     sc.textFile(...)
>> >   } match {
>> >     case Success(rdd) => rdd
>> >     case Failure(e)   => // handle ...
>> >   }
>> > }
>> >
>> > myRDD.reduceByKey(...) // don't need to worry about that exception here
>> >
>> > The reason being that we want to try to avoid having to copy-paste
>> > exception handling boilerplate on every first action. We would love to
>> > define this once somewhere for the RDD build code and just re-use it.
>> >
>> > Is there a best practice for this? Are we missing something here?
>> >
>> > thanks,
>> > Michal
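The construct described above (a function that takes an action-like block and logs any exception it throws, so the handling is written once) could be sketched as follows. `SafeAction` and its signature are hypothetical, not a Spark API; the body parameter is generic, so an RDD action such as `count()` can be passed in:

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: wraps any block (typically an RDD action),
// logging failures in one place instead of at every call site.
object SafeAction {
  def apply[T](label: String)(body: => T): Option[T] =
    Try(body) match {
      case Success(v) => Some(v)
      case Failure(e) =>
        // In real code this would go to a proper logger.
        Console.err.println(s"Action '$label' failed: ${e.getMessage}")
        None
    }
}

// With Spark, the body would be an RDD action, e.g.:
//   SafeAction("wordcount") { rdd.reduceByKey(_ + _).count() }
val ok  = SafeAction("sum")  { (1 to 10).sum }     // Some(55)
val bad = SafeAction("boom") { sys.error("oops") } // None, after logging
```

If you follow the materialize-early route from the quoted reply instead, the same wrapper can enclose a forcing action such as `rdd.cache(); rdd.count()`, at the cost of caching overhead noted above.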