I've had good success troubleshooting this kind of issue with the Spark Web UI, which shows a breakdown of all stages and tasks, including the RDDs involved and any cached data. Additional information about this tool can be found at http://spark.apache.org/docs/latest/monitoring.html.
On Thu, Mar 10, 2016 at 1:31 PM, souri datta <souri.isthe...@gmail.com> wrote:
> Hi,
> Currently I am trying to optimize my Spark application, and in that
> process I am trying to figure out whether, at any stage in the code, I am
> recomputing a large RDD (so that I can optimize it by
> persisting/checkpointing it).
>
> Is there any indication in the event logs that tells us about an RDD being
> computed?
> If anyone has done similar analysis, can you please share how you went
> about it?
>
> Thanks in advance,
> Souri