LuciferYang commented on code in PR #55787:
URL: https://github.com/apache/spark/pull/55787#discussion_r3225033192


##########
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala:
##########
@@ -74,6 +75,11 @@ private[v1] class SqlResource extends BaseAppResource {
    * Server-side DataTables endpoint for SQL executions listing.
    * Accepts DataTables server-side parameters (start, length, order, search)
    * and returns paginated results with recordsTotal/recordsFiltered counts.
+   *
+   * When `groupSubExecution=true` (default = 
`spark.ui.sql.groupSubExecutionEnabled`),

Review Comment:
   The actual config (from 
`core/src/main/scala/org/apache/spark/internal/config/UI.scala`) is 
`spark.ui.groupSQLSubExecutionEnabled`. Fix the docstring so users don't waste 
time searching for a non-existent key.



##########
sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala:
##########
@@ -215,6 +215,69 @@ class SqlResourceWithActualMetricsSuite
     }
   }
 
+  test("SPARK-56811: sqlTable groups sub-executions under their root 
execution") {
+    // CACHE TABLE produces a root execution plus an inner sub-execution that
+    // shares its rootExecutionId. This is the canonical case where the SQL
+    // listing should fold the sub row under the root rather than flattening 
it.
+    spark.sql("CREATE OR REPLACE TEMP VIEW spark_56811 AS SELECT id FROM 
RANGE(10)")
+      .collect()
+    spark.sql("CACHE TABLE spark_56811_cached AS SELECT * FROM 
spark_56811").collect()
+
+    eventually(timeout(10.seconds), interval(1.second)) {
+      val baseUrl = spark.sparkContext.ui.get.webUrl +
+        
s"/api/v1/applications/${spark.sparkContext.applicationId}/sql/sqlTable"
+
+      // Grouping ON: roots only, with subExecutions embedded on the root that
+      // owns a sub-execution.
+      val groupedUrl = new URI(
+        s"$baseUrl?start=0&length=100&draw=1&groupSubExecution=true").toURL
+      val (groupedCode, groupedOpt, _) = getContentAndCode(groupedUrl)
+      assert(groupedCode === HttpServletResponse.SC_OK)
+      val groupedJson = JsonMethods.parse(groupedOpt.get)
+      val groupedRecordsTotal = (groupedJson \ "recordsTotal").extract[Long]
+      val groupedRecordsFiltered = (groupedJson \ 
"recordsFiltered").extract[Long]
+      val groupedRows = (groupedJson \ "aaData").children
+      assert(groupedRecordsTotal === groupedRows.size,
+        "with no filter, recordsTotal should match returned root count")
+      assert(groupedRecordsFiltered === groupedRows.size,
+        "with no filter, recordsFiltered should match returned root count")
+      groupedRows.foreach { row =>

Review Comment:
   This asserts that **every** grouped row satisfies `id == rootExecutionId`, 
but the backend (line 133-134 of `SqlResource.scala`) deliberately surfaces 
orphan sub-executions (whose parent is missing from the filtered set) as roots. 
Those orphans have `id != rootExecutionId`. The test passes today only because 
`CACHE TABLE` happens not to produce orphans **and** the test runs with no 
filter (so every parent is present); the assertion encodes a property that 
holds only in the unfiltered case but is documented as a universal "grouped row 
is a root" invariant.



##########
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala:
##########
@@ -94,9 +103,12 @@ private[v1] class SqlResource extends BaseAppResource {
         .filter(_.nonEmpty)
       val needsFilter = searchValue.isDefined || statusFilter.isDefined
 
+      // Always load all execs once. The KVStore-level pagination optimization

Review Comment:
   ```
   // Always load all execs once. We need the full set to (a) identify orphan
   // sub-executions whose root is filtered out and (b) count root rows for
   // `recordsTotal`. `sqlStore.executionsList()` is already a full
   // materialization, so there is no separate "KVStore-pagination" path being
   // disabled here.
   ```



##########
sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala:
##########
@@ -215,6 +215,69 @@ class SqlResourceWithActualMetricsSuite
     }
   }
 
+  test("SPARK-56811: sqlTable groups sub-executions under their root 
execution") {
+    // CACHE TABLE produces a root execution plus an inner sub-execution that
+    // shares its rootExecutionId. This is the canonical case where the SQL
+    // listing should fold the sub row under the root rather than flattening 
it.
+    spark.sql("CREATE OR REPLACE TEMP VIEW spark_56811 AS SELECT id FROM 
RANGE(10)")

Review Comment:
   `CACHE TABLE spark_56811_cached` leaves a cached table behind that persists 
into subsequent tests in this suite. Wrap with `withTempView` and an explicit 
`UNCACHE TABLE`, or add a `try { ... } finally { spark.sql("UNCACHE TABLE IF 
EXISTS spark_56811_cached") }` block.



##########
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala:
##########
@@ -125,26 +148,48 @@ private[v1] class SqlResource extends BaseAppResource {
       val start = Option(uriParams.getFirst("start")).map(_.toInt).getOrElse(0)
       val length = 
Option(uriParams.getFirst("length")).map(_.toInt).getOrElse(20)
 
-      val page = if (needsFilter) {
-        // Filter/search: sort and paginate in memory
-        val sorted = sortExecs(filteredExecs, sortCol, sortDir)
-        if (length > 0) sorted.slice(start, start + length) else sorted
-      } else {
-        // No filter: use KVStore-level pagination for efficiency
-        // KVStore returns in insertion order; sort in memory for the page
-        val execs = sqlStore.executionsList()
-        val sorted = sortExecs(execs, sortCol, sortDir)
-        if (length > 0) sorted.slice(start, start + length) else sorted
+      val sortedRoots = sortExecs(rootRows, sortCol, sortDir)
+      val page = if (length > 0) sortedRoots.slice(start, start + length) else 
sortedRoots
+
+      // Convert to Java-compatible row data; embed sub-executions when 
grouping
+      val aaData = page.map { exec =>

Review Comment:
   A single CACHE TABLE on a large nested DAG, or a deeply-nested CTAS, can 
produce dozens of sub-executions under one root. The current design embeds 
*every* sub of every visible root into the same JSON response, so a page of 20 
roots could carry hundreds of sub rows even if the user never expands any 
disclosure. For most workloads this is fine, but worth noting in the PR 
description — and worth considering a follow-up that lazy-loads sub rows on 
first expand (a separate `?rootId=N` endpoint) if a user reports payload size 
issues.



##########
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala:
##########
@@ -125,26 +148,48 @@ private[v1] class SqlResource extends BaseAppResource {
       val start = Option(uriParams.getFirst("start")).map(_.toInt).getOrElse(0)
       val length = 
Option(uriParams.getFirst("length")).map(_.toInt).getOrElse(20)
 
-      val page = if (needsFilter) {
-        // Filter/search: sort and paginate in memory
-        val sorted = sortExecs(filteredExecs, sortCol, sortDir)
-        if (length > 0) sorted.slice(start, start + length) else sorted
-      } else {
-        // No filter: use KVStore-level pagination for efficiency
-        // KVStore returns in insertion order; sort in memory for the page
-        val execs = sqlStore.executionsList()
-        val sorted = sortExecs(execs, sortCol, sortDir)
-        if (length > 0) sorted.slice(start, start + length) else sorted
+      val sortedRoots = sortExecs(rootRows, sortCol, sortDir)
+      val page = if (length > 0) sortedRoots.slice(start, start + length) else 
sortedRoots
+
+      // Convert to Java-compatible row data; embed sub-executions when 
grouping
+      val aaData = page.map { exec =>
+        val row = execToRow(exec)
+        if (groupSubExec) {
+          val subs = subsByRoot.getOrElse(exec.executionId, Seq.empty)
+          if (subs.nonEmpty) {
+            // Sort subs by id ascending so they appear in chronological order
+            val subRows = new 
java.util.ArrayList[java.util.LinkedHashMap[String, Object]]()
+            sortExecs(subs, "id", "asc").foreach(s => 
subRows.add(execToRow(s)))
+            row.put("subExecutions", subRows)
+          }
+        }
+        row
       }
 
-      // Convert to Java-compatible row data
-      val aaData = page.map(execToRow)
+      // Counts: when grouping, totals reflect root-only counts so DataTables 
shows
+      // "Showing X to Y of Z entries" matching the rows the user actually 
sees.
+      val recordsTotal = if (groupSubExec) {

Review Comment:
   When `groupSubExec && needsFilter`, this does a fresh `O(N)` Set 
construction + `O(N)` count on `allExecs`. The work is the same *flavor* as the 
filtered-set partition above (line 132-135) but on a different domain (all 
execs vs. filtered execs), so it can't be folded directly. Consider 
precomputing both partitions (filtered + unfiltered) up front behind a single 
helper, or memoizing `allRootIds` so the unfiltered count and the filtered set 
both consult the same lazily-built data.



##########
sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala:
##########
@@ -215,6 +215,69 @@ class SqlResourceWithActualMetricsSuite
     }
   }
 
+  test("SPARK-56811: sqlTable groups sub-executions under their root 
execution") {

Review Comment:
   The orphan branch on line 134 of `SqlResource.scala` is one of the most 
subtle pieces of new logic — sub appears under a root **unless** the root is 
filtered out, in which case the sub surfaces. The new test doesn't exercise 
this path. Worth a second test case using a `search[value]` that matches the 
sub but not the root (or a status filter that produces the same shape).
   



##########
sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/allexecutionspage.js:
##########
@@ -131,6 +138,91 @@ $(document).ready(function () {
       '<table id="sql-table" class="table table-striped compact cell-border" ' 
+
       'style="width:100%"></table>';
 
+    var columns = [
+      {
+        data: "id", name: "id", title: "ID",
+        render: function (data, type) {
+          if (type !== "display") return data;
+          var basePath = uiRoot + appBasePath;
+          return '<a href="' + basePath + '/SQL/execution/?id=' + data + '">' +
+            data + '</a>';
+        }
+      },
+      {
+        data: "queryId", name: "queryId", title: "Query ID",
+        orderable: false,
+        render: function (data, type) {
+          if (type !== "display" || !data) return data || "";
+          return '<span title="' + data + '">' + data.substring(0, 8) + 
'...</span>';
+        }
+      },
+      {
+        data: "status", name: "status", title: "Status",
+        render: function (data, type) {
+          if (type !== "display") return data;
+          return statusBadge(data);
+        }
+      },
+      {
+        data: "description", name: "description", title: "Description",
+        render: function (data, type, row) {
+          if (type !== "display") return data || "";
+          return descriptionHtml({ id: row.id, description: data });
+        }
+      },
+      {
+        data: "submissionTime", name: "submissionTime", title: "Submitted",
+        render: function (data, type) {
+          if (type !== "display") return data;
+          return formatDateSql(data);
+        }
+      },
+      {
+        data: "duration", name: "duration", title: "Duration",
+        render: function (data, type) {
+          if (type !== "display") return data;
+          return formatDurationSql(data);
+        }
+      },
+      {
+        data: "jobIds", name: "jobIds", title: "Succeeded Jobs",
+        orderable: false,
+        render: function (data, type) {
+          if (type !== "display") return (data || []).join(",");
+          return jobIdLinks(data || []);
+        }
+      },
+      {
+        data: "errorMessage", name: "errorMessage", title: "Error Message",
+        orderable: false,
+        render: function (data, type) {
+          if (type !== "display" || !data) return data || "";
+          if (data.length > 100) {
+            return '<span title="' + escapeHtml(data) + '">' +
+              escapeHtml(data.substring(0, 100)) + '...</span>';
+          }
+          return escapeHtml(data);
+        }
+      }
+    ];
+    if (groupSubExecEnabled) {
+      // Trailing "Sub Executions" column matching the SPARK-41752 / 4.1 
layout:
+      // shows "+N sub" when the root has children, blank otherwise. Click to
+      // expand a child row containing the sub-execution rows.
+      columns.push({
+        data: null, name: "subExecutions", title: "Sub Executions",
+        orderable: false, searchable: false,
+        className: "sub-exec-toggle",
+        render: function (data, type, row) {
+          if (type !== "display") return "";
+          var subs = row.subExecutions || [];
+          if (subs.length === 0) return "";
+          return '<a href="#" class="toggle-sub-exec">' +

Review Comment:
   The disclosure is a `<a href="#">` with a click handler — functionally a 
button. Screen readers see it as a generic link with no expanded/collapsed 
state. Worth adding:
   - `role="button"` (so AT announces it as a control, not navigation),
   - `aria-expanded="false"` on render, toggled to `"true"` / `"false"` in the 
click handler alongside the `shown` class,
   - preferably `aria-controls=` referencing the generated child row id.
   
   Existing components in the Spark UI are already inconsistent with such 
accessibility standards, and we can unify and standardize this in subsequent 
iterations.



##########
sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/allexecutionspage.js:
##########
@@ -131,6 +138,91 @@ $(document).ready(function () {
       '<table id="sql-table" class="table table-striped compact cell-border" ' 
+
       'style="width:100%"></table>';
 
+    var columns = [
+      {
+        data: "id", name: "id", title: "ID",
+        render: function (data, type) {
+          if (type !== "display") return data;
+          var basePath = uiRoot + appBasePath;
+          return '<a href="' + basePath + '/SQL/execution/?id=' + data + '">' +
+            data + '</a>';
+        }
+      },
+      {
+        data: "queryId", name: "queryId", title: "Query ID",
+        orderable: false,
+        render: function (data, type) {

Review Comment:
   `data` (queryId) is interpolated into both a `title` attribute and visible 
text without `escapeHtml`. The other rendered columns (`description`, 
`errorMessage`) use `escapeHtml`. Today queryId is a UUID, so this is safe in 
practice, but the inconsistency is the kind of thing that bites later when a 
schema changes. Consider wrapping in `escapeHtml(data)` to match the rest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to