ron8hu commented on a change in pull request #31611:
URL: https://github.com/apache/spark/pull/31611#discussion_r589860730



##########
File path: docs/monitoring.md
##########
@@ -479,11 +479,27 @@ can be identified by their `[attempt-id]`. In the API listed below, when running
     <td><code>/applications/[app-id]/stages/[stage-id]</code></td>
     <td>
       A list of all attempts for the given stage.
+        <br><code>?details=true</code> lists all attempts with the task data for the given stage.
+        <br><code>?withSummaries=true</code> lists the task metrics distribution and executor metrics distribution of each attempt.
+        <br><code>?quantiles=0.1,0.25,0.5,0.75,1.0</code> summarizes the metrics with the given quantiles. The query parameter <code>quantiles</code> takes effect only when <code>withSummaries=true</code>. The default value is <code>0.0,0.25,0.5,0.75,1.0</code>.

Review comment:
       The default quantiles value is 0.0,0.25,0.5,0.75,1.0 so that it is consistent with the Spark web UI. The 1.0 quantile is useful for showing the maximal metric value, so users can easily identify a bottleneck from the REST call alone. To my understanding, some downstream products, such as LinkedIn's Dr. Elephant (https://github.com/linkedin/dr-elephant), compute the ratio of the quantile-1.0 value to the quantile-0.5 value to decide how skewed/bottlenecked the load is among the tasks/executors of a stage.
   If a user does not like the default, they can always override it by explicitly specifying the quantile values they want.
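   For illustration, here is a minimal client-side sketch of that skew heuristic (not part of this PR). The History Server address, app/stage ids, and the JSON field names (`taskMetricsDistributions`, `executorRunTime`, `quantiles`, `attemptId`) are my assumptions about the response shape, not something this doc change specifies:

```python
# Sketch only: query one stage with summaries enabled and flag skew via the
# max/median ratio, in the style described above. Host, ids, and field names
# below are assumptions for illustration.
import requests

BASE = "http://localhost:18080/api/v1"   # assumed History Server address
APP_ID = "app-20210301120000-0000"       # hypothetical application id
STAGE_ID = 3                             # hypothetical stage id

resp = requests.get(
    f"{BASE}/applications/{APP_ID}/stages/{STAGE_ID}",
    params={"withSummaries": "true", "quantiles": "0.0,0.25,0.5,0.75,1.0"},
)
resp.raise_for_status()

# The stage endpoint returns a list of attempts for the given stage.
for attempt in resp.json():
    dist = attempt.get("taskMetricsDistributions")
    if not dist:
        continue
    quantiles = dist["quantiles"]         # e.g. [0.0, 0.25, 0.5, 0.75, 1.0]
    run_times = dist["executorRunTime"]   # values aligned with `quantiles`
    median = run_times[quantiles.index(0.5)]
    maximum = run_times[quantiles.index(1.0)]
    if median > 0:
        # Dr. Elephant-style heuristic: a large max/median ratio suggests
        # the stage's load is skewed across its tasks.
        print(f"attempt {attempt['attemptId']}: skew ratio = {maximum / median:.2f}")
```

   Note that with the default quantiles, the 0.5 and 1.0 values needed for this ratio are always present in the response, which is exactly why keeping 1.0 in the default matters.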



