Okay, so out of 164 stages, 163 are skipped. But how can 41405 tasks be skipped if the total is only 19788?

On Wed, Mar 16, 2016 at 6:31 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

> It's not just if the RDD is explicitly cached, but also if the map outputs
> for stages have been materialized into shuffle files and are still
> accessible through the map output tracker. Because of that, explicitly
> caching RDD actions often gains you little or nothing, since even without a
> call to cache() or persist() the prior computation will largely be reused
> and stages will show up as skipped -- i.e. no need to recompute that stage.
>
> On Tue, Mar 15, 2016 at 5:50 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> If RDD is cached, this RDD is only computed once and the stages for
>> computing this RDD in the following jobs are skipped.
>>
>> On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Spark UI Completed Jobs section shows below information, what is the
>>> skipped value shown for Stages and Tasks below.
>>>
>>> Job_ID:                                   11
>>> Description:                              count
>>> Submitted:                                2016/03/14 15:35:32
>>> Duration:                                 1.4 min
>>> Stages (Succeeded/Total):                 164/164 *(163 skipped)*
>>> Tasks (for all stages): Succeeded/Total:  19841/19788 *(41405 skipped)*
>>>
>>> Thanks,
>>> Prabhu Joseph
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
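
To make the reuse described in the thread concrete, here is a toy model in plain Python. It is not Spark code: `Stage`, `ToyScheduler`, `materialized` and the task counts are all hypothetical names and numbers made up for illustration. It only sketches the idea that a stage whose map outputs are still available does not need to run again. It also assumes, for the sake of the example, that skipped tasks are tallied separately from the tasks that actually ran, which is one way a skipped count can end up larger than the run total.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Stage:
    # Hypothetical stand-in for a stage: an id, a task count, upstream stages.
    stage_id: int
    num_tasks: int
    parents: List["Stage"] = field(default_factory=list)


def collect(stage: Stage, acc: Dict[int, Stage]) -> Dict[int, Stage]:
    # Gather the final stage and every ancestor reachable from it.
    if stage.stage_id not in acc:
        acc[stage.stage_id] = stage
        for parent in stage.parents:
            collect(parent, acc)
    return acc


class ToyScheduler:
    """Toy scheduler plus 'map output tracker'; an assumption, not Spark's logic."""

    def __init__(self) -> None:
        # Ids of stages whose shuffle outputs are still available.
        self.materialized: Set[int] = set()

    def run_job(self, final_stage: Stage) -> Dict[str, int]:
        ran: Dict[int, Stage] = {}

        def submit(stage: Stage, is_final: bool) -> None:
            if stage.stage_id in ran:
                return
            if not is_final and stage.stage_id in self.materialized:
                return  # outputs already exist, so neither it nor its parents run
            for parent in stage.parents:
                submit(parent, False)
            ran[stage.stage_id] = stage
            if not is_final:
                self.materialized.add(stage.stage_id)

        submit(final_stage, True)
        everything = collect(final_stage, {})
        skipped = [s for sid, s in everything.items() if sid not in ran]
        return {
            "stages_run": len(ran),
            "stages_skipped": len(skipped),
            "tasks_run": sum(s.num_tasks for s in ran.values()),
            "tasks_skipped": sum(s.num_tasks for s in skipped),
        }


# Two shuffle stages feed two separate jobs (for example, two actions).
map1 = Stage(0, 100)
map2 = Stage(1, 50, [map1])
sched = ToyScheduler()
first = sched.run_job(Stage(2, 10, [map2]))   # nothing materialized yet
second = sched.run_job(Stage(3, 10, [map2]))  # upstream outputs are reused
print(first)
print(second)
```

In this sketch the second job runs only its final stage (10 tasks) while the two upstream stages (150 tasks) are counted as skipped, even though nothing was explicitly cached.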