Gokul Kolady has posted comments on this change. ( http://gerrit.cloudera.org:8080/24282 )
Change subject: IMPALA-14961: Query Profile Redaction ...................................................................... Patch Set 4: (20 comments) http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h File be/src/service/query-profile-redaction.h: http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@30 PS3, Line 30: const std::string_view& profi > Nit: declare this as const std::string_view& Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@31 PS3, Line 31: const std::string_vie > Nit: declare this as const std::string_view& Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@36 PS3, Line 36: std::map<std::string, std::string> alias_to_original_; > This map is sorted in a couple different places. Declare it as a std::map Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@50 PS3, Line 50: > This is called in a single place so you could also embed `boost::algorithm: Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@49 PS3, Line 49: static constexpr const char* REDACTED_SQL_STATEMENT = "[REDACTED_SQL_STATEMENT]"; : : // Escapes a string so it can be safely matched within JSON-serialized text. : s > Eliminate this function entirely and call boost::trim_copy directly. As wr Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@55 PS3, Line 55: writer.String(input.data(), static_cast<rapidjson::SizeType>(input.size())); > I think instead of regex search we could rely on extracting hostnames based Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@55 PS3, Line 55: ic_cast<rapidjson > Use const string_view&. Also on lines 66, 116, 130, 165, 175, 343, 433, 44 Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@70 PS3, Line 70: } : return alias_to_original; : } : : sta > We're doing 3 copies here L70, L72 and L73. One copy is unavoidable so we s Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@71 PS3, Line 71: return alias_to_original; : } : : sta > This code will not handle the situation where either the first or last (but Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@99 PS3, Line 99: ; > Use .cbegin() and .cend() here. Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@100 PS3, Line 100: vector<string> host_sections = > Since ordering is needed, use a set instead of unordered_set so that sortin Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@117 PS3, Line 117: size_t idx = 0; : for (const string& token : sorted_tokens) { > Instead of sorting a vector, can tokens be a std::multiset or std::set? Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@128 PS3, Line 128: vector<std::pair<string, string>> entries, string text) { : std::sort(entries.begin(), entries.end(), : [](const std::pair<string, string>& a, const std::pair<string, string>& b) { : return a.first.size() > b.first.size(); : }); : for (const auto& entry : entries) { : > No need for this function, call boost::algorithm::replace_all or boost::alg Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@136 PS3, Line 136: } : return text; : } : : // Applies a full alias map to a text blob. : static string ApplyAliasMap(const unordered_map<string, string>& alias_map, string text) { : return ApplySortedReplacements( : vector<std::pair<string, string>>(alias_map.cbegin(), alias_map.cend()), : > This function is only called from one place, move the code to ApplySortedRe Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@183 PS3, Line 183: if (node.IsArray()) { > return boost::algorithm::trim_copy which will enable return value optimizat Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@335 PS3, Line 335: CollectInfoStringValuesByKeys(source_json, {"Sql Statement"}); : for (const string& sql_statement : sql_statements) { : boost::algorithm::replace_all( : redacted, JsonEscapeString(sql_statement), REDAC > Declare table_set and column_set as type std::set<string> and the sorting w Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@344 PS3, Line 344: nst std::regex user_kv_re( > A shared_ptr doesn't provide value here. Pass in an unordered_map<string, Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@376 PS3, Line 376: static string UnredactTextWithAliases( : const string_view& text, const map<string, string>& alias_to_original) { : if (alias_to_original.empty()) return text; : return ApplySortedReplacements(vector<std::pai > It does not seem possible for this else case to happen. Same on lines 388, Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@441 PS3, Line 441: > Ensure this function does not get called twice by adding a DCHECK: Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@455 PS3, Line 455: > Ensure the Redact function has already been called by adding a DCHECK: Done -- To view, visit http://gerrit.cloudera.org:8080/24282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32 Gerrit-Change-Number: 24282 Gerrit-PatchSet: 4 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 13 May 2026 21:53:42 +0000 Gerrit-HasComments: Yes
