zhengruifeng commented on a change in pull request #25349:
[SPARK-28538][UI][WIP] Document SQL page
URL: https://github.com/apache/spark/pull/25349#discussion_r311332004
##########
File path: docs/monitoring.md
##########
@@ -40,6 +40,100 @@ To view the web UI after the fact, set
`spark.eventLog.enabled` to true before s
application. This configures Spark to log Spark events that encode the
information displayed
in the UI to persisted storage.
+## Web UI Tabs
+The web UI provides an overview of the Spark cluster and is composed of the following tabs:
+
+### Jobs Tab
+The Jobs tab displays a summary page of all jobs in the Spark application and a detailed page
+for each job. The summary page shows high-level information, such as the status, duration, and
+progress of all jobs and the overall event timeline. When you click on a job on the summary
+page, you see the detailed page for that job. The detailed page further shows the event timeline,
+DAG visualization, and all stages of the job.
+
+### Stages Tab
+The Stages tab displays a summary page that shows the current state of all stages of all jobs in
+the Spark application, and, when you click on a stage, a detailed page for that stage. The details
+page shows the event timeline, DAG visualization, and all tasks for the stage.
+
+### Storage Tab
+The Storage tab displays the persisted RDDs, if any, in the application. The summary page shows
+the storage levels, sizes, and partitions of all RDDs, and the detailed page shows the sizes and
+executors used for all partitions in an RDD.
+
+### Environment Tab
+The Environment tab displays the values for the different environment and configuration variables,
+including JVM, Spark, and system properties.
+
+### Executors Tab
+The Executors tab displays summary information about the executors that were created for the
+application, including memory and disk usage and task and shuffle information. The Storage Memory
+column shows the amount of memory used and reserved for caching data.
+
+### SQL Tab
+If the application executes Spark SQL queries, the SQL tab displays information, such as the duration,
+jobs, and physical and logical plans for the queries. Here we include a basic example to illustrate
+this tab:
+{% highlight scala %}
+scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
+df: org.apache.spark.sql.DataFrame = [count: int, name: string]
+
+scala> df.count
+res0: Long = 3
+
+scala> df.createGlobalTempView("df")
+
+scala> spark.sql("select name,sum(count) from global_temp.df group by name").show
++----+----------+
+|name|sum(count)|
++----+----------+
+|andy|         3|
+| bob|         2|
++----+----------+
+{% endhighlight %}
+
+<p style="text-align: center;">
+ <img src="img/webui-sql-tab.png"
+ title="SQL tab"
+ alt="SQL tab"
+ width="80%" />
+ <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
+
+Now the above three dataframe/SQL operators are shown in the list. If we click the
+'show at \<console\>: 24' link of the last query, we will see the DAG of the job.
+
+<p style="text-align: center;">
+ <img src="img/webui-sql-dag.png"
+ title="SQL DAG"
+ alt="SQL DAG"
+ width="50%" />
+ <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
+
+We can see the detailed information of each stage. The first block 'WholeStageCodegen'
+compiles multiple operators ('LocalTableScan' and 'HashAggregate') together into a single Java
+function to improve performance, and metrics like number of rows and spill size are listed in
+the block. The second block 'Exchange' shows the metrics on the shuffle exchange, including
+number of written shuffle records, total data size, etc.
+
+
+<p style="text-align: center;">
+ <img src="img/webui-sql-plan.png"
Review comment:
Hi @dongjoon-hyun, I ran the example locally and saved the screenshots as PNG files.