ron8hu commented on a change in pull request #31611:
URL: https://github.com/apache/spark/pull/31611#discussion_r589860730
##########
File path: docs/monitoring.md
##########
@@ -479,11 +479,27 @@ can be identified by their `[attempt-id]`. In the API listed below, when running
     <td><code>/applications/[app-id]/stages/[stage-id]</code></td>
     <td>
       A list of all attempts for the given stage.
+      <br><code>?details=true</code> lists all attempts with the task data for the given stage.
+      <br><code>?withSummaries=true</code> lists the task metrics distribution and executor metrics distribution of each attempt.
+      <br><code>?quantiles=0.1,0.25,0.5,0.75,1.0</code> summarizes the metrics with the given quantiles. The <code>quantiles</code> parameter takes effect only when <code>withSummaries=true</code>. Its default value is <code>0.0,0.25,0.5,0.75,1.0</code>.
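
For illustration only (not part of the proposed doc change), here is a minimal sketch of how these query parameters combine in a single request against the history server's REST API. The host, port, application id, and stage id below are placeholders:

```python
import json
from urllib.request import urlopen

# Placeholder history-server endpoint and ids; adjust for your deployment.
base = "http://localhost:18080/api/v1"
url = (f"{base}/applications/app-20210301123456-0000/stages/3"
       "?details=true&withSummaries=true&quantiles=0.1,0.25,0.5,0.75,1.0")

# Each element is one attempt of the stage, including task data and
# metric distributions because of the query parameters above.
attempts = json.loads(urlopen(url).read())
print(attempts[0]["status"])
```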
Review comment:
The default quantiles value is 0.0,0.25,0.5,0.75,1.0 so that it stays consistent with the Spark web UI. The quantile 1.0 value shows the maximum metric value, so users can easily identify a bottleneck from a single REST call. To my understanding, some downstream products, such as LinkedIn's Dr. Elephant (https://github.com/linkedin/dr-elephant), compute the ratio of the quantile-1.0 value to the quantile-0.5 value to decide how skewed/bottlenecked the load is among the tasks/executors of a stage, as sketched below. If users do not like the default, they can always override it by explicitly specifying the quantile values they want.
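
A hedged sketch of that skew heuristic, assuming the summary response exposes per-quantile values as parallel lists (the field layout here is illustrative, not the exact Spark REST schema):

```python
def skew_ratio(quantiles, values, low=0.5, high=1.0):
    """Ratio of the high-quantile value to the median-quantile value.

    A ratio near 1.0 means the metric is evenly distributed across
    tasks; a large ratio suggests a skewed/bottlenecked stage.
    """
    dist = dict(zip(quantiles, values))
    if dist[low] == 0:
        return float("inf")
    return dist[high] / dist[low]

# Example: task run times (ms) at the default quantiles 0.0,0.25,0.5,0.75,1.0.
print(skew_ratio([0.0, 0.25, 0.5, 0.75, 1.0],
                 [120, 340, 400, 450, 4000]))  # -> 10.0, heavily skewed
```

Note that the heuristic only works if quantile 1.0 is part of the response, which is why keeping it in the default matters.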