> > When we time an action it includes all the transformations timings too,
> > and it is not clear which transformation takes how long. Is there a way of
> > timing each transformation separately?

Not really. Even though you may logically specify several different transformations within your Spark job, transformations within a single stage will typically get pipelined into a single transformation, so separate timing information for each logical transformation no longer makes sense and is not available.

The best you are going to be able to do is stage- and task-level information. Stages are defined by shuffle boundaries, and a task is a unit of work within a stage that operates on a particular RDD partition.

On Tue, Jan 7, 2014 at 9:00 AM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> When we time an action it includes all the transformations timings too,
> and it is not clear which transformation takes how long. Is there a way of
> timing each transformation separately?
>
> Also, does spark provide a way of more detailed progress reporting, broken
> to transformation steps? For example, can the web ui progress report be
> broken into transformation steps, can we give each transformation step a
> name?
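
P.S. To make the pipelining point concrete, here is a rough analogy in plain Python. It is not Spark code and does not use any Spark API; `add_one`, `double`, and `run_task` are made-up names, and the lazy `map` calls only stand in for two transformations that land in the same stage. It sketches why the calls to the two functions end up interleaved element by element, so the only timing that comes out naturally is the one for the whole pass over a partition.

```python
import time

calls = []  # records the order in which the functions actually run


def add_one(x):
    calls.append("add_one")
    return x + 1


def double(x):
    calls.append("double")
    return x * 2


def run_task(partition):
    """Hypothetical 'task': two maps fused into one pass over one partition."""
    start = time.perf_counter()
    step1 = map(add_one, partition)  # lazy, nothing runs yet
    step2 = map(double, step1)       # still lazy
    result = list(step2)             # the "action" pulls each element through both maps
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = run_task([1, 2, 3])
print(result)  # [4, 6, 8]
print(calls)   # ['add_one', 'double', 'add_one', 'double', 'add_one', 'double']
```

Because each element passes through `add_one` and then `double` before the next element is touched, there is no point in time at which "the first transformation" has finished and "the second" has started. Only `elapsed` for the whole task is meaningful, which is the same reason the timing you get from Spark is per stage and per task.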
