Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/23154 )
Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default ...................................................................... Patch Set 31: > Builds have been passing for both V1 and V2 since late January > I saw some cases where I am concerned. > from runtime_filters_mt_dop.test: > row_regex: RowsReturned: total=7.30K > How can this pass with the old profile? Hi Csaba, thank you for noticing this. This inconsistency seems to have occurred in runtime_filters_mt_dop.test and spilling.test, in about ~20 lines. While updating the patchsets and breaking them up, only those 2 file changes were missed. I have verified it again now and will make sure to update them. > I disagree with the order - the patch should be leaned down to not include > trivial test changes before being merged. I've pulled the newline-only changes out into a separate patch and posted it ahead of this change, so the main change shrinks for review. The remaining .* changes I've kept here, since they're tied to the total additions (see below). > There may be some tricky exceptions, but replacing most of the existing lines > seems fairly mechanical: > row_regex: RowsRead: (?:2.43K \(2433\)|total=7.30K \(7300\)) > -> > match_counter(RowsRead, 2433) 1. On "fairly mechanical": There are 1891 RUNTIME_PROFILE sections in .test files, so several thousand lines contain row_regex. Of them, only about ~150 counter changes are in the form suggested here, and these have been updated in this patch. But a substantial portion (~few thousand lines) are row_regex or line matches like this: NumModifiedRows: 0 NumDeletedRows: 1 row_regex: Per-Host Resource Estimates: Memory=10MB row_regex: Cluster Memory Admitted: 50.00 MB Rewriting all of them at once is hard, as they do not follow this form. Many of them also depend on the pretty-printed representation of the profile, so they cannot directly be converted to match_counter or a similar approach. 2. For the counter lines, matching a specific instance's value isn't more accurate than the total. For distributed counters like RowsRead/RowsReturned/Files Rejected, work distribution across instances isn't fixed, any instance can process an arbitrary share, so an individual instance's value isn't a stable assertion (that's why the older tests effectively just checked that some value appeared across instances). The meaningful invariant is the total. aggregation(SUM, RowsRead): 7300 is both more meaningful and already version-agnostic; match_counter against a per-instance 2433 would assert something not guaranteed run-to-run. I.e. Instance level matching of 8 vs total match of 23 files row_regex: Files rejected: (?:8 \(8\)|total=23 \(23\)) aggregation(SUM, Files rejected): 23 So the proposal is the hybrid: aggregation(SUM) for counter checks now (covers the common case, no flexible regex); version-flexible regex kept only for genuine special cases (isolating one node's value where SUM across nodes adds noise); and a small has_counter-style helper added as a separate follow-up for those node-isolation cases. That preserves the test inventory this change already validated, rather than rewriting a thousand more tests and re-touching every converted line a second time. A full move to a thrift-based matcher now would mean reworking essentially every profile test again for no functional gain, and I'd rather not block enabling V2 on that. If reviewers feel the larger approach is required before merge, I'd like to scope it explicitly as a separate effort and get a second committer's read, happy to pull in Joe / Michael / Kurt so we converge on direction. > For non-aggregated text profiles this doesn't really need Thrift parsing, the > exact values can be parsed from text using values in (). This may be actually > a more backport friendly way to do it, as in older branches some tests may > still use beeswax client that doesn't have Thrift profile. > For aggregated profile Thrift is needed, but where we use aggregated profiles > we can also assume using HS2 protocol that has Thrift. Agreed that for V1 the exact values are in the parens, and aggregation(SUM) already uses exactly that on the V1 path, no thrift needed there. One clarification on the thrift path for V2, re your draft (https://gerrit.cloudera.org/#/c/24407/): it resolves counters from the non-aggregated thrift structure, which carries per-instance profile subtrees. In V2 those subtrees no longer exist, that's where the memory consumption / performance win comes from for the aggregated profile. The per-instance values do survive as arrays inside each TAggCounter, not as instance nodes, so matching is possible against the values list rather than an instance subtree: Fragment F00 AGGREGATION_NODE (id=1) TRuntimeProfileNode(name='AGGREGATION_NODE (id=1)', num_children=2, counters=[TCounter(name='InactiveTotalTime', unit=5, value=0), TCounter(name='PeakMemoryUsage', unit=3, value=37910656), TCounter(name='RowsReturned', unit=0, value=2000404), TCounter(name='RowsReturnedRate', unit=1, value=2428389), TCounter(name='TotalTime', unit=5, value=822274640)] [(mean values, not instance level)] ... aggregated=TAggregatedRuntimeProfileNode(num_instances=3, input_profiles=None, counters=[TAggCounter(name='InactiveTotalTime', unit=5, has_value=[True, True, True], values=[0, 0, 0]), TAggCounter(name='PeakMemoryUsage', unit=3, has_value=[True, True, True], ...), TAggCounter(name='RowsReturned', unit=0, has_value=[True, True, True], values=[...])]) In traditional profile, those per-instance values live under per-instance Fragment Instance nodes instead, and there is no aggregated array: Fragment F01 Instance 6545360911b7328e:0107004000000002 (host=surya-Precision-5560:27000) AGGREGATION_NODE (id=2) TRuntimeProfileNode(name='AGGREGATION_NODE (id=2)', num_children=2, counters=[TCounter(name='InactiveTotalTime', unit=5, value=0), TCounter(name='PeakMemoryUsage', unit=3, value=16384), TCounter(name='RowsReturned', unit=0, value=1), TCounter(name='RowsReturnedRate', unit=1, value=11), TCounter(name='TotalTime', unit=5, value=88872949)]) Two costs make me want this later rather than now. The thrift profile is a flattened node tree, so resolving a counter is a linear walk over TRuntimeProfileNodes plus a scan of each node's counter list, not faster than the current approach, which compiles all row_regex + aggregation patterns into one regex and does a single pass over the text. And mixing in thrift means fetching and searching both the text and thrift profiles per query, which is less clean and much slower than the current approach. So a thrift matcher is buildable as the has_counter follow-up, but it doesn't pay for itself for the residual handful of lines and isn't needed for the common case aggregation(SUM) already covers. > Another case that seems easily separable are .* removals like in > alter-table.test > most were removed in https://gerrit.cloudera.org/#/c/23864/ , but there are > still many around - these have nothing to do with switching to aggregated > profile AFAIK The bulk of the .* removals already went in via 23864. The ones remaining here aren't pure cleanup, they're entangled with the new total additions, e.g. row_regex: .RowsRead: 2.43K . becomes row_regex: RowsRead: (?:2.43K|total=7.30K) so the .* removal and the total change are the same edit. That's why I've kept these in this patch rather than separating them. The newline-only changes I have split out and posted ahead of this change so they can merge quickly. https://gerrit.cloudera.org/24491 -- To view, visit http://gerrit.cloudera.org:8080/23154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8 Gerrit-Change-Number: 23154 Gerrit-PatchSet: 31 Gerrit-Owner: Surya Hebbar <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aleksandr Efimov <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Mon, 22 Jun 2026 13:01:09 +0000 Gerrit-HasComments: No
