Abhishek Rawat has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24282 )

Change subject: IMPALA-14961: Query Profile Redaction
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@74
PS6, Line 74: static const std::regex IP_RE(
We should probably redact IPV6 addresses also.


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@50
PS6, Line 50: // Matches hostnames followed by a port (e.g. 
coordinator.example.com:22000).
            : static const std::regex HOST_WITH_PORT_RE(
            :     
R"(\b([A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)*)\:(\d{2,5})\b)", 
regex::optimize);
            : // Matches the analyzed query subsection embedded in the textual 
plan.
            : static const std::regex ANALYZED_RE(
            :     R"(Analyzed query:\s*([\s\S]*?)\n\nF\d+:PLAN FRAGMENT)", 
regex::optimize);
            : // Matches fully-qualified identifiers with at least one dot 
(e.g. db.tbl.col).
            : static const std::regex QUALIFIED_ID_RE(
            :     
R"(\b([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+)\b)", 
regex::optimize);
            : // Matches table tokens following FROM/JOIN clauses.
            : static const std::regex FROM_JOIN_TABLE_RE(
            :     R"(\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_\.]*)\b)",
            :     regex::optimize | regex::icase);
            : // Matches snake_case identifiers that are candidates for column 
tokens.
            : static const std::regex SNAKE_CASE_ID_RE(
            :     R"(\b([A-Za-z_][A-Za-z0-9_]*_[A-Za-z0-9_]*)\b)", 
regex::optimize);
            : // Matches e-mail addresses.
            : static const std::regex EMAIL_RE(
            :     R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})",
            :     regex::optimize | regex::nosubs);
            : // Matches user/userid key-value pairs like user=alice.
            : static const std::regex USER_KV_RE(
            :     R"(\b(?:user|uid)=([A-Za-z0-9._@-]+)\b)", regex::optimize | 
regex::icase);
            : // Matches IPv4 addresses.
            : static const std::regex IP_RE(
            :     R"(\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
            :     R"((?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b)",
            :     regex::optimize | regex::nosubs);
unit tests for theses regexes would be good.


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@98
PS6, Line 98:     alias_to_original[entry.second] = entry.first;
we guarantee aliases are unique?


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@118
PS6, Line 118:   std::sort(results.begin(), results.end());
I think I missed why we need to sort the matched results?


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@126
PS6, Line 126:   unordered_set<string> seen_tokens;
seen_tokens is unecessary as you can look up in the alias_map.


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@132
PS6, Line 132: std::stringstream alias;
             :     alias << prefix << "_" << std::setfill('0') << std::setw(3) 
<< (++idx);
We could avoid stringstream altogether and do string manipulation using 
append() ?


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@134
PS6, Line 134:     alias_map[token] = alias.str();
when possible use emplace.



--
To view, visit http://gerrit.cloudera.org:8080/24282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32
Gerrit-Change-Number: 24282
Gerrit-PatchSet: 6
Gerrit-Owner: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 14 May 2026 22:25:08 +0000
Gerrit-HasComments: Yes

Reply via email to