I use a SparkListener to collect information about task failures related to
my RDD.

To do so, for every submitted stage I check whether the stage computes an
RDD that is a dependency of my target RDD (including the target RDD itself).
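Below is a minimal sketch of that lineage check. It assumes a hypothetical `RddNode` type that carries only an id and its parents. In Spark itself the parents would come from `rdd.dependencies.map(_.rdd)` and the id from `rdd.id`. Matching stages by RDD id rather than by name is also an assumption of this sketch.

```scala
// Hypothetical stand-in for an RDD: only the id and the parent links matter here.
final case class RddNode(id: Int, parents: Seq[RddNode])

object Lineage {
  // Collect the ids of `target` and of every RDD it transitively depends on.
  def lineageIds(target: RddNode): Set[Int] = {
    @annotation.tailrec
    def loop(stack: List[RddNode], seen: Set[Int]): Set[Int] = stack match {
      case Nil                                    => seen
      case node :: rest if seen.contains(node.id) => loop(rest, seen) // already visited (shared parent)
      case node :: rest                           => loop(node.parents.toList ++ rest, seen + node.id)
    }
    loop(List(target), Set.empty)
  }

  // A stage is relevant if the RDD it computes is part of the target's lineage.
  def isRelevantStage(stageRddId: Int, lineage: Set[Int]): Boolean =
    lineage.contains(stageRddId)
}
```

Computing the id set once up front keeps the per-stage check a single set lookup.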

Then, for every task that ends, I check whether the task belongs to a stage I
care about, and if so I collect any errors for that task. For this I already
have to reach past the public Spark API, since I currently cannot pattern
match on taskEnd.reason due to the private nature of ExceptionFailure and
friends.
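Here is a sketch of the task-end bookkeeping. It assumes a hypothetical `EndReason` hierarchy loosely modelled on Spark's `TaskEndReason`, because the real `ExceptionFailure` and friends could not be matched on in the version discussed. `FailureCollector` and its `onTaskEnd(stageId, reason)` method are illustrative names, not Spark's listener signature.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical mirror of task end reasons (not Spark's actual classes).
sealed trait EndReason
case object Succeeded extends EndReason
final case class ExceptionFailed(className: String, description: String) extends EndReason
final case class FetchFailed(shuffleId: Int) extends EndReason
final case class OtherFailure(message: String) extends EndReason

// Collects readable error messages for tasks in the stages we care about.
final class FailureCollector(relevantStages: Set[Int]) {
  private val errors = ArrayBuffer.empty[String]

  // Intended to be called once per finished task (analogous to a listener callback).
  def onTaskEnd(stageId: Int, reason: EndReason): Unit =
    if (relevantStages.contains(stageId)) {
      val message: Option[String] = reason match {
        case Succeeded                  => None
        case ExceptionFailed(cls, desc) => Some(s"$cls: $desc")
        case FetchFailed(shuffleId)     => Some(s"fetch failed for shuffle $shuffleId")
        case OtherFailure(msg)          => Some(msg)
      }
      message.foreach(m => errors += s"stage $stageId: $m")
    }

  def collected: List[String] = errors.toList
}
```

With a sealed hierarchy the compiler warns when a case is missing, which is exactly what matching on the private Spark classes would give if they were public.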

All of this is simply so that I can give the user a useful error message
explaining why the calculation failed (as opposed to "fetch failed more than
4 times").


On Fri, Nov 29, 2013 at 3:09 PM, Koert Kuipers <[email protected]> wrote:

> In 0.9-SNAPSHOT, StageInfo has been changed so that the stage itself is no
> longer accessible.
>
> However, the stage contains the RDD, which is necessary to tie this
> StageInfo to an RDD. Now all we have is the rddName. Is the rddName
> guaranteed to be unique, and can it be relied upon to identify RDDs?
>
