Hm, but you already only have to define it in one place, rather than
on each transformation. I thought you wanted exception handling at
each transformation?

Or do you want it once for all actions? You can enclose all actions in
a try-catch block, I suppose, to write the exception handling code once.
You can easily write a Scala construct that takes a function and logs
exceptions it throws, and the function you pass can invoke an RDD
action. So you can refactor that way too.
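For example, a minimal sketch of such a construct (the name `runLogged` is
hypothetical, not anything from the Spark API): it takes any block by name,
runs it inside a Try, and logs the failure in one place, so each call site
just wraps its action:

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: run any block (e.g. an RDD action) and log any
// exception it throws, returning a Try so the caller can still react.
def runLogged[T](label: String)(body: => T): Try[T] =
  Try(body) match {
    case f @ Failure(e) =>
      // Swap println for your logging framework of choice.
      println(s"[$label] failed: ${e.getMessage}")
      f
    case s => s
  }
```

Call sites then look like `runLogged("daily count")(rdd.count())`, and the
exception handling is defined once rather than at every first action.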

On Wed, Mar 11, 2015 at 2:39 PM, Michal Klos <michal.klo...@gmail.com> wrote:
> Is there a way to have the exception handling go lazily along with the
> definition?
>
> e.g... we define it on the RDD but then our exception handling code gets
> triggered on that first action... without us having to define it on the
> first action? (e.g. that RDD code is boilerplate and we want to just have it
> in many many projects)
>
> m
>
> On Wed, Mar 11, 2015 at 10:08 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Handling exceptions this way means handling errors on the driver side,
>> which may or may not be what you want. You can also write functions
>> with exception handling inside, which could make more sense in some
>> cases (like, to ignore bad records or count them or something).
>>
>> If you want to handle errors at every step on the driver side, you
>> have to force RDDs to materialize to see if they "work". You can do
>> that with .count() or .take(1).length > 0. But to avoid recomputing
>> the RDD then, it needs to be cached. So there is a big non-trivial
>> overhead to approaching it this way.
>>
>> If you go this way, consider materializing only a few key RDDs in your
>> flow, not every one.
>>
>> The most natural thing is indeed to handle exceptions where the action
>> occurs.
>>
>>
>> On Wed, Mar 11, 2015 at 1:51 PM, Michal Klos <michal.klo...@gmail.com>
>> wrote:
>> > Hi Spark Community,
>> >
>> > We would like to define exception handling behavior on RDD instantiation
>> > /
>> > build. Since the RDD is lazily evaluated, it seems like we are forced to
>> > put
>> > all exception handling in the first action call?
>> >
>> > This is an example of something that would be nice:
>> >
>> > def myRDD = {
>> >   Try {
>> >     sc.textFile(...)
>> >   } match {
>> >     case Success(rdd) => rdd
>> >     case Failure(e) => // Handle ...
>> >   }
>> > }
>> >
>> > myRDD.reduceByKey(...) //don't need to worry about that exception here
>> >
>> > The reason being that we want to try to avoid having to copy paste
>> > exception
>> > handling boilerplate on every first action. We would love to define this
>> > once somewhere for the RDD build code and just re-use.
>> >
>> > Is there a best practice for this? Are we missing something here?
>> >
>> > thanks,
>> > Michal
>
>
