Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24282 )

Change subject: IMPALA-14961: Query Profile Redaction
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc@74
PS4, Line 74: static const std::regex IP_RE(
            :     R"(\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
> No need to have a separate declaration for static functions.
Done


http://gerrit.cloudera.org:8080/#/c/24282/4/be/src/service/query-profile-redaction.cc@96
PS4, Line 96:   alias_to_original.reserve(original_to_alias.size());
            :   for (const auto& entry : original_to_alias) {
> Declare all regexs after line 49 like this:
Done


http://gerrit.cloudera.org:8080/#/c/24282/5/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/5/be/src/service/query-profile-redaction.cc@51
PS5, Line 51: static const std::regex HOST_WITH_PORT_RE(
            :     
R"(\b([A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)*)\:(\d{2,5})\b)", 
regex::optimize);
            : // Matches the analyzed query subsection embedded in the textual 
plan.
            : static const std::regex ANALYZED_RE(
            :     R"(Analyzed query:\s*([\s\S]*?)\n\nF\d+:PLAN FRAGMENT)", 
regex::optimize);
            : // Matches fully-qualified identifiers with at least one dot 
(e.g. db.tbl.col).
            : static const std::regex QUALIFIED_ID_RE(
            :     
R"(\b([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+)\b)", 
regex::optimize);
            : // Matches table tokens following FROM/JOIN clauses.
            : static const std::regex FROM_JOIN_TABLE_RE(
            :     R"(\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_\.]*)\b)",
            :     regex::optimize | regex::icase);
            : // Matches snake_case identifiers that are candidates for column 
tokens.
            : static const std::regex SNAKE_CASE_ID_RE(
            :     R"(\b([A-Za-z_][A-Za-z0-9_]*_[A-Za-z0-9_]*)\b)", 
regex::optimize);
            : // Matches e-mail addresses.
            : static const std::regex EMAIL_RE(
            :     R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})",
            :     regex::optimize | regex::nosubs);
            : // Matches user/userid key-value pairs like user=alice.
            : static const std::regex USER_KV_RE(
            :     R"(\b(?:user|uid)=([A-Za-z0-9._@-]+)\b)", regex::optimize | 
regex::icase);
            : // Matches IPv4 addresses.
            : static const std::regex IP_RE(
            :     R"(\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
Don't use the std:: prefix unless absolutely necessary to avoid compiler 
confusion.


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@106
PS6, Line 106:   unordered_set<string_view> unique_views;
I'm not sure what the purpose is of this intermediate unordered_set.  Why not 
just directly emplace strings into the results vector?


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@118
PS6, Line 118:   std::sort(results.begin(), results.end());
According to the comment at 
https://gerrit.cloudera.org/c/24282/4..5/be/src/service/query-profile-redaction.cc#b80,
 the results do not need to be sorted.  Is this sort necessary?


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@307
PS6, Line 307:   vector<string> host_tokens;
Now that BuildApplyAndTrackAliases is a templatized function, there is no need 
for this second vector.  Return seen_host_tokens instead.


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@392
PS6, Line 392:   std::sort(table_tokens.begin(), table_tokens.end());
             :   std::sort(column_tokens.begin(), column_tokens.end());
Is sorting required?



--
To view, visit http://gerrit.cloudera.org:8080/24282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32
Gerrit-Change-Number: 24282
Gerrit-PatchSet: 6
Gerrit-Owner: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 14 May 2026 21:27:08 +0000
Gerrit-HasComments: Yes

Reply via email to