dilipbiswal commented on a change in pull request #25658: [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan
URL: https://github.com/apache/spark/pull/25658#discussion_r320338379
##########
File path: docs/web-ui.md
##########
@@ -363,6 +363,42 @@ number of written shuffle records, total data size, etc.
 Clicking the 'Details' link on the bottom displays the logical plans and the physical plan, which illustrate how Spark parses, analyzes, optimizes and performs the query.

+### SQL metrics
+
+The metrics of SQL operators show in the block of operators. The SQL metrics can be useful when
+we want to dive into the execution details of each operator, for example, how many rows are output
+after a Filter operator. The related metrics are different for each type of operator, for example
+Exchange has the metrics called "shuffle bytes writte total" which shows the number of bytes written
+by shuffle.
+
+Here is the list of some SQL metrics:
+
+<table class="table">
+<tr><th>SQL metrics</th><th>Meaning</th><th>Operators</th></tr>
+<tr><td> <code>number of output rows</code> </td><td> the number of output rows of the operator </td><td> Aggregate operators, Join operators, Sample, Range, Scan operators, Filter, etc.</td>></tr>
+<tr><td> <code>data size</code> </td><td> the size of broadcasted/shuffled/collected data of the operator </td><td> BroadcastExchange, ShuffleExchange, Subquery </td></tr>
+<tr><td> <code>time to collect</code> </td><td> the time spent to collect data </td><td> BroadcastExchange, Subquery </td></tr>
+<tr><td> <code>scan time</code> </td><td> the time spent to scan data </td><td> ColumnarBatchScan, FileSourceScan </td></tr>
+<tr><td> <code>metadata time</code> </td><td> the time spent on getting metadata like number of partitions, number of files </td><td> FileSourceScan </td></tr>
+<tr><td> <code>shuffle bytes written</code> </td><td> number of bytes written </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>shuffle records written</code> </td><td> number of records written </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>shuffle write time</code> </td><td> the time on shuffle writing </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>

Review comment:
Nit: the time spent on writing shuffle data?
