Hi, I'm confused about the stage times reported in the Spark UI (Spark 1.1.0) for a Spark Streaming job. I'm hoping somebody can shed some light on it.
Let's go through an example. On the /stages page, stage #232 is reported to have lasted 18 seconds:

232  runJob at RDDFunctions.scala:23  2014/12/08 15:06:25  18 s  12/12
http://localhost:24040/stages/stage?id=232&attempt=0

When I click through to the details, I see [1]:

Total time across all tasks: 42 s
Aggregated metrics by executor:
  Executor1: 19 s
  Executor2: 24 s

Summing the individual task times actually gives 40.009 s.

My questions:

- What is the time reported on the overview page (the 18 s)?
- What is the relation between the time reported on the overview page and the times on the detail page?
- My Spark Streaming job is reported to be taking 3m24s, and (I think) there's only one stage in my job. How does the per-stage timing relate to the timings reported on the 'streaming' page (e.g. 'last batch')?
- Is there a way to relate a streaming batch to the stages executed to complete that batch?

As they stand, the numbers don't seem to add up.

Thanks,
Gerard.

[1] https://drive.google.com/file/d/0BznIWnuWhoLlMkZubzY2dTdOWDQ
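P.S. My working assumption (please correct me if this is wrong) is that tasks run in parallel across executors, so the sum of per-task durations can legitimately exceed the wall-clock duration of the stage. A minimal self-contained Scala sketch of that effect, with a hypothetical 2-thread pool standing in for two executors (nothing here is Spark code):

```scala
// Sketch: 4 tasks of ~100 ms each on a 2-thread pool.
// Wall-clock time for the "stage" is ~200 ms, but the summed
// task durations come to ~400 ms -- analogous to 18 s wall-clock
// vs. 42 s "total time across all tasks" in the UI.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

// One simulated task: sleep, then report its own duration in ms.
def task(millis: Long): Long = {
  val t0 = System.nanoTime()
  Thread.sleep(millis)
  (System.nanoTime() - t0) / 1000000
}

val wallStart = System.nanoTime()
val durations = Await.result(
  Future.sequence(Seq(100L, 100L, 100L, 100L).map(d => Future(task(d)))),
  10.seconds)
val wallMs = (System.nanoTime() - wallStart) / 1000000
val summedMs = durations.sum

println(s"wall-clock: ~$wallMs ms, summed task time: ~$summedMs ms")
```

If that's right, 18 s would be the stage's wall-clock duration and 42 s the sum over tasks, but it still doesn't explain the 40.009 s I get by hand, nor the streaming-page numbers.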