[ https://issues.apache.org/jira/browse/SPARK-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nirav patel updated SPARK-15845:
--------------------------------
    Summary: Expose metrics for sub-task transformations and action  (was: Expose metrics for sub-task steps )

> Expose metrics for sub-task transformations and action
> -------------------------------------------------------
>
>                 Key: SPARK-15845
>                 URL: https://issues.apache.org/jira/browse/SPARK-15845
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: nirav patel
>
> Spark optimizes DAG processing by efficiently selecting stage boundaries.
> This makes a Spark stage a sequence of multiple transformations and zero or
> one action. As a result, the stage that Spark is currently running can
> internally be a series of (map -> shuffle -> map -> map -> collect). Notice
> that it goes past the shuffle dependency and includes the next
> transformations and actions in the same stage. So any task of this stage is
> essentially doing all those transformations/actions as a unit, and there is
> no further visibility inside it. Basically, network read, populating
> partitions, compute, shuffle write, shuffle read, compute, and writing final
> partitions to disk ALL happen within one stage! This means every task of
> that stage performs all those operations on a single partition as a unit.
> This takes away a huge amount of visibility into the user's transformations
> and actions: which one is taking longer, which one is a resource
> bottleneck, and which one is failing.
> The Spark UI just shows that it is currently running some action stage. If
> the job fails at that point, the Spark UI just says the action failed, but
> in fact it could be any stage in that lazy chain of evaluation. Looking at
> executor logs gives some insight, but that is not always straightforward.
> I think we need more visibility into what is happening underneath a task
> (the series of Spark transformations/actions that comprise a stage) so we
> can easily troubleshoot as well as find bottlenecks and optimize our DAG.
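
To make the request concrete, here is a minimal sketch in plain Python of what per-transformation metrics inside a single task could look like. It does not use Spark and is not how Spark is implemented. The names `StepMetrics`, `instrument`, and `run_task` are hypothetical, and the sketch assumes every step is a simple one-to-one map over the records of one partition.

```python
import time
from collections import defaultdict


class StepMetrics:
    """Per-step record counts and elapsed seconds (hypothetical helper)."""

    def __init__(self):
        self.records = defaultdict(int)
        self.seconds = defaultdict(float)


def instrument(name, func, upstream, metrics):
    """Lazily apply func to each upstream item and charge its cost to `name`.

    Only the time spent inside func is measured here; time spent pulling
    items from upstream is charged to the upstream steps' own wrappers.
    """
    for item in upstream:
        start = time.perf_counter()
        result = func(item)
        metrics.seconds[name] += time.perf_counter() - start
        metrics.records[name] += 1
        yield result


def run_task(partition, steps):
    """Run a fused chain of (name, func) map steps over one partition."""
    metrics = StepMetrics()
    stream = iter(partition)
    for name, func in steps:
        stream = instrument(name, func, stream, metrics)
    # Materializing the list plays the role of the action: nothing runs
    # until this point, and then every step runs as one pipelined unit.
    output = list(stream)
    return output, metrics


steps = [
    ("parse", int),
    ("square", lambda x: x * x),
    ("add_one", lambda x: x + 1),
]
output, metrics = run_task(["1", "2", "3"], steps)
print(output)  # [2, 5, 10]
for name, _ in steps:
    print(name, metrics.records[name], metrics.seconds[name])
```

The chain still executes as a single unit, as described in the issue, but each step now reports its own record count and time, so a slow or failing step can be identified by name.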