Hi Sean,

Thanks for replying, but my question was about multiple stages running the same line of code, not about multiple stages in general. Yes, a single job can have multiple stages, but as far as I know they should not be repeated if you're caching/persisting your intermediate outputs.
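To be concrete about what I mean by persisting intermediate outputs, I'm doing something along these lines (names and paths here are made up, just a sketch):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical names/paths, only to illustrate the caching pattern.
val spark = SparkSession.builder.appName("MySparkJob").getOrCreate()

val raw = spark.read.parquet("s3://my-bucket/input")   // hypothetical path
val intermediate = raw.filter("status = 'ok'")

// Persisted, so a later action should not recompute the lineage above it.
intermediate.persist(StorageLevel.MEMORY_AND_DISK)

intermediate.count()   // first action materializes the cache
intermediate.count()   // later actions read from the cache; I would not
                       // expect the upstream stages to run again
```

So given that pattern, I don't see where repeated stages would come from.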
My question is: why am I seeing multiple stages running the same line of code? As I understand it, a stage is a grouping of operations that can be executed without shuffling data or invoking a new action. Stages are divided into tasks, and tasks are the units that are executed in parallel, so the same line of code can run on different executors. Or is this assumption wrong?

Thanks,
Joe

On Thu, 2022-04-21 at 09:14 -0500, Sean Owen wrote:
> A job can have multiple stages for sure. One action triggers a job.
> This seems normal.
>
> On Thu, Apr 21, 2022, 9:10 AM Joe <j...@net2020.org> wrote:
> > Hi,
> > When looking at the application UI (in Amazon EMR) I'm seeing one job
> > for my particular line of code, for example:
> > 64 Running count at MySparkJob.scala:540
> >
> > When I click into the job and go to stages I can see over 100 stages
> > running the same line of code (stages are active, pending, or completed):
> > 190 Pending count at MySparkJob.scala:540
> > ...
> > 162 Active count at MySparkJob.scala:540
> > ...
> > 108 Completed count at MySparkJob.scala:540
> > ...
> >
> > I'm not sure what that means. I thought that a stage was a logical
> > operation boundary and that you could have only one stage in a job
> > (unless you executed the same dataset+action many times on purpose),
> > and that tasks were the ones replicated across partitions. But here
> > I'm seeing many stages running, each with the same line of code?
> >
> > I don't have a situation where my code is re-processing the same set
> > of data many times; all intermediate sets are persisted.
> > I'm not sure if the EMR UI display is wrong or if Spark stages are
> > not what I thought they were?
> > Thanks,
> >
> > Joe
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
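P.S. To make my mental model above concrete, here is the kind of thing I had in mind (a simplified sketch, names made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical example of my one-action / one-shuffle understanding.
val spark = SparkSession.builder.appName("StageExample").getOrCreate()
val df = spark.range(1000000L)

// groupBy introduces a shuffle, so I would expect this single count()
// action to produce one job with two stages: one before the shuffle
// boundary and one after it. Each stage then runs as many parallel
// tasks as the data has partitions.
val counted = df.groupBy(col("id") % 10).count()
counted.count()   // one action -> one job -> a couple of stages -> many tasks
```

That is why a hundred-plus stages under a single count() action surprised me.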