Aleksandr Efimov has posted comments on this change. ( http://gerrit.cloudera.org:8080/24429 )
Change subject: IMPALA-15090: Make Query Profile Redaction DOM to DOM ...................................................................... Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/24429/5/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24429/5/be/src/service/query-profile-redaction.cc@404 PS5, Line 404: if (value.IsDouble()) return std::to_string(value.GetDouble()).size(); This does not match the serialization that the size limit is meant to bound. std::to_string(double) uses fixed six-decimal formatting, while RapidJSON Writer emits enough digits for the JSON number. Real profile JSON already has DOUBLE_VALUE counters such as 3.1631581491853288 and 0.00013492935534459787; this estimator would count those as "3.163158" / "0.000135". Since this value gates both RedactSourceJson() and the later CreateQueryProfileToolExecutorForProfile() call, the DOM path can accept a profile whose actual serialized JSON is over the configured limit. Could we use RapidJSON Writer with a counting output stream instead? That would avoid materializing the string while preserving JsonToString()'s exact serialization rules. Please also add double/exponent coverage to the estimator test. http://gerrit.cloudera.org:8080/#/c/24429/5/be/src/service/query-profile-redaction.cc@561 PS5, Line 561: json_value->SetString( This looks like it can keep a second copy of almost every string/key in the redacted DOM. RedactSourceJson() first CopyFrom()s the full source DOM, then this path calls SetString() for every string value and every member name even when ApplyAliasMap() made no replacement. With RapidJSON's pool allocator the previous string storage from the CopyFrom() is not reclaimed per value, so unchanged strings/keys are retained and copied again. On the tpcds_72 profile fixture this recursive walk would touch ~23.8K string values and ~40.6K member names once the alias map is non-empty. Could we skip SetString() when the text is unchanged, and ideally precompute the sorted replacement entries once instead of rebuilding/sorting them inside every ApplyAliasMap() call? -- To view, visit http://gerrit.cloudera.org:8080/24429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8b8ad721a5531e9d257cf6e54b83ea3ee9734039 Gerrit-Change-Number: 24429 Gerrit-PatchSet: 5 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aleksandr Efimov <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Fri, 19 Jun 2026 14:43:41 +0000 Gerrit-HasComments: Yes
