Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/24282 )
Change subject: IMPALA-14961: Query Profile Redaction ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc@74 PS4, Line 74: static const std::regex IP_RE( : R"(\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}" > No need to have a separate declaration for static functions. Done http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc@96 PS4, Line 96: alias_to_original.reserve(original_to_alias.size()); : for (const auto& entry : original_to_alias) { > Declare all regexs after line 49 like this: Done http://gerrit.cloudera.org:8080/#/c/24282/5/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/5/be/src/service/query-profile-redaction.cc@51 PS5, Line 51: static const std::regex HOST_WITH_PORT_RE( : R"(\b([A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)*)\:(\d{2,5})\b)", regex::optimize); : // Matches the analyzed query subsection embedded in the textual plan. : static const std::regex ANALYZED_RE( : R"(Analyzed query:\s*([\s\S]*?)\n\nF\d+:PLAN FRAGMENT)", regex::optimize); : // Matches fully-qualified identifiers with at least one dot (e.g. db.tbl.col). : static const std::regex QUALIFIED_ID_RE( : R"(\b([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+)\b)", regex::optimize); : // Matches table tokens following FROM/JOIN clauses. : static const std::regex FROM_JOIN_TABLE_RE( : R"(\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_\.]*)\b)", : regex::optimize | regex::icase); : // Matches snake_case identifiers that are candidates for column tokens. : static const std::regex SNAKE_CASE_ID_RE( : R"(\b([A-Za-z_][A-Za-z0-9_]*_[A-Za-z0-9_]*)\b)", regex::optimize); : // Matches e-mail addresses. : static const std::regex EMAIL_RE( : R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})", : regex::optimize | regex::nosubs); : // Matches user/userid key-value pairs like user=alice. : static const std::regex USER_KV_RE( : R"(\b(?:user|uid)=([A-Za-z0-9._@-]+)\b)", regex::optimize | regex::icase); : // Matches IPv4 addresses. : static const std::regex IP_RE( : R"(\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}" Don't use the std:: prefix unless absolutely necessary to avoid compiler confusion. http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@106 PS6, Line 106: unordered_set<string_view> unique_views; I'm not sure what the purpose is of this intermediate unordered_set. Why not just directly emplace strings into the results vector? http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@118 PS6, Line 118: std::sort(results.begin(), results.end()); According to the comment at https://gerrit.cloudera.org/c/24282/4..5/be/src/service/query-profile-redaction.cc#b80, the results do not need to be sorted. Is this sort necessary? http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@307 PS6, Line 307: vector<string> host_tokens; Now that BuildApplyAndTrackAliases is a templatized function, there is no need for this second vector. Return seen_host_tokens instead. http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@392 PS6, Line 392: std::sort(table_tokens.begin(), table_tokens.end()); : std::sort(column_tokens.begin(), column_tokens.end()); Is sorting required? -- To view, visit http://gerrit.cloudera.org:8080/24282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32 Gerrit-Change-Number: 24282 Gerrit-PatchSet: 6 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 14 May 2026 21:27:08 +0000 Gerrit-HasComments: Yes
