planga82 commented on a change in pull request #25598: 
[SPARK-28542][DOCS][WebUI] Stages Tab
URL: https://github.com/apache/spark/pull/25598#discussion_r319268163
 
 

 ##########
 File path: docs/web-ui.md
 ##########
 @@ -94,9 +94,76 @@ This page displays the details of a specific job identified 
by its job ID.
 </p>
 
 ## Stages Tab
+
 The Stages tab displays a summary page that shows the current state of all 
stages of all jobs in
-the Spark application, and, when you click on a stage, a details page for that 
stage. The details
-page shows the event timeline, DAG visualization, and all tasks for the stage.
+the Spark application.
+
+At the beginning of the page is the summary with the count of all stages by 
status (active, pending, completed, skipped, and failed).
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages 
header" width="30%">
+</p>
+
+In [Fair scheduling 
mode](job-scheduling.html#scheduling-within-an-application) there is a table 
that displays [pool 
properties](job-scheduling.html#configuring-pool-properties).
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail2.png" title="Pool properties" alt="Pool 
properties">
+</p>
+
+After that are the details of stages per status (active, pending, 
completed, skipped, failed). In active stages, it's possible to kill the stage 
with the kill button. The failure reason is shown only for failed stages. 
Clicking on the description opens the task detail.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail3.png" title="Stages detail" alt="Stages 
detail">
+</p>
+
+### Stage detail
+The summary is at the beginning of the page with information like Total time 
across all tasks, [Locality level summary](tuning.html#data-locality), 
[Shuffle Read Size / Records](rdd-programming-guide.html#shuffle-operations) 
and Associated Job Ids.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail4.png" title="Stage header" alt="Stage 
header" width="30%">
+</p>
+
+There is also the visual representation of the directed acyclic graph (DAG) of 
this stage, where vertices represent the RDDs or DataFrames and the edges 
represent an operation to be applied.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail5.png" title="Stage DAG" alt="Stage DAG" 
width="50%">
+</p>
+
+Summary metrics for all tasks are represented in a table and in a timeline:
+* **[Tasks deserialization 
time](configuration.html#compression-and-serialization)**
+* **Duration of tasks**
+* **GC time**
+* **Result serialization time** is the time spent serializing the task result 
on an executor before sending it back to the driver
+* **Getting result time** is the time that the driver spends fetching task 
results from workers
+* **Scheduler delay** includes the time to ship the task from the scheduler to 
executors, and the time to send the task result from the executors to the 
scheduler
 
 Review comment:
   Yes, I think it includes return time but it's not true. 
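
To make the discussion of what scheduler delay covers more concrete, here is a rough sketch of one way the value could be derived from the other metrics in the list. It assumes that scheduler delay is whatever remains of a task's wall-clock duration after subtracting executor run time, deserialization time, result serialization time, and getting result time. The function and parameter names are hypothetical and are not Spark's API; all values are taken to be in milliseconds.

```python
def scheduler_delay_ms(duration, executor_run_time, deserialize_time,
                       result_serialization_time, getting_result_time):
    """Estimate scheduler delay as the unexplained remainder of a task's duration.

    Assumption (not confirmed by the text above): the delay is the total
    wall-clock duration minus every other measured component, clamped at
    zero so that clock skew between hosts cannot produce a negative value.
    """
    # Time on the executor that is measured but is not the task body itself.
    executor_overhead = deserialize_time + result_serialization_time
    remainder = (duration - executor_run_time
                 - executor_overhead - getting_result_time)
    return max(0, remainder)


# A task that took 1000 ms overall, of which 800 ms is otherwise accounted for.
print(scheduler_delay_ms(1000, 700, 50, 20, 30))
```

Under this assumption, both the time to ship the task to the executor and the time to send the result back end up in the remainder, since neither is measured separately.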
