Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/24353 )
Change subject: IMPALA-12956: Add profile summary output ...................................................................... Patch Set 4: (9 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/24353/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/24353/4//COMMIT_MSG@24 PS4, Line 24: option. It'd be nice to add tests on failed/cancelled queries, especially queries that failed due to insufficient memory and already had some progress. http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@239 PS4, Line 239: nit: use 4-space indent http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@245 PS4, Line 245: static string ProgressString(int64_t completed, int64_t total) { : stringstream ss; : ss << completed << " / " << total << " (" << std::setw(4); : if (completed == 0 || total == 0) { : ss << "0%)"; : } else { : ss << (100.0 * completed / static_cast<double>(total)) << "%)"; : } : return ss.str(); : } This duplicates ImpalaHttpHandler::ProgressToString(). Let's move it to util files (e.g. be/src/util/summary-util.[h/cc]) and share it instead. http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@281 PS4, Line 281: if (!GetTimelineDuration(summary_node, "Query Timeline", &duration)) { To be consistent with what we show in the WebUI, we should try using (end_time_us - start_time_us) first. If "End Time" is missing in the profile, using "Query Timeline" is a good fallback. http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@304 PS4, Line 304: static bool ParseBytesValue(const string& value, int64_t* bytes) { Can we reuse ParseUtil::ParseMemSpec()? https://github.com/apache/impala/blob/11b5fd6c1f5201af5710342e9c793b586664d6b1/be/src/util/parse-util.cc#L39 http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@378 PS4, Line 378: // The profile log does not keep total scan ranges, but finished queries are 100% done. We can sum values in the "File Formats" row for the total scan ranges. E.g. this has 2 scan ranges: "File Formats: PARQUET/GZIP:1 PARQUET/SNAPPY:1". This has 150 scan ranges: "File Formats: PARQUET/Unknown(Skipped):150" http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@390 PS4, Line 390: if (query_state == "FINISHED" Can we handle failed/cancelled queries here? The progress is not that helpful for completed queries. http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@461 PS4, Line 461: GetInfoString(summary_node, "End Time"), "End Time" could be missing in failed/cancelled queries. We show an empty string here instead of "N/A", which is inconsistent with other fields. How about adding a default parameter for GetInfoString and use SUMMARY_VALUE_UNAVAILABLE here? http://gerrit.cloudera.org:8080/#/c/24353/4/be/src/util/impala-profile-tool.cc@471 PS4, Line 471: query_state, "Query Status" is also helpful. "Query State" just shows EXCEPTION for failed queries. "Query Status" shows more details, e.g. "Session closed", "AnalysisException: ...", "Query xxx:xxx expired due to client inactivity (timeout is 5m)". We can truncate it when it's too long like sql statement. -- To view, visit http://gerrit.cloudera.org:8080/24353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35da958106ffb35b14ef867da50ed23ffda0a45b Gerrit-Change-Number: 24353 Gerrit-PatchSet: 4 Gerrit-Owner: Aleksandr Efimov <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Thu, 28 May 2026 06:25:09 +0000 Gerrit-HasComments: Yes
