Gokul Kolady has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24282 )

Change subject: IMPALA-14961: Query Profile Redaction
......................................................................


Patch Set 4:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h
File be/src/service/query-profile-redaction.h:

http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@30
PS3, Line 30: const std::string_view& profi
> Nit: declare this as const std::string_view&
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@31
PS3, Line 31: const std::string_vie
> Nit: declare this as const std::string_view&
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@36
PS3, Line 36:   std::map<std::string, std::string> alias_to_original_;
> This map is sorted in a couple different places.  Declare it as a std::map
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@50
PS3, Line 50:
> This is called in a single place so you could also embed `boost::algorithm:
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@49
PS3, Line 49: static constexpr const char* REDACTED_SQL_STATEMENT = 
"[REDACTED_SQL_STATEMENT]";
            :
            : // Escapes a string so it can be safely matched within 
JSON-serialized text.
            : s
> Eliminate this function entirely and call boost::trim_copy directly.  As wr
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@55
PS3, Line 55:   writer.String(input.data(), 
static_cast<rapidjson::SizeType>(input.size()));
> I think instead of regex search we could rely on extracting hostnames based
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@55
PS3, Line 55: ic_cast<rapidjson
> Use const string_view&.  Also on lines 66, 116, 130, 165, 175, 343, 433, 44
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@70
PS3, Line 70: }
            :   return alias_to_original;
            : }
            :
            : sta
> We're doing 3 copies here L70, L72 and L73. One copy is unavoidable so we s
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@71
PS3, Line 71:   return alias_to_original;
            : }
            :
            : sta
> This code will not handle the situation where either the first or last (but
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@99
PS3, Line 99: ;
> Use .cbegin() and .cend() here.
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@100
PS3, Line 100:   vector<string> host_sections =
> Since ordering is needed, use a set instead of unordered_set so that sortin
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@117
PS3, Line 117:   size_t idx = 0;
             :   for (const string& token : sorted_tokens) {
> Instead of sorting a vector, can tokens be a std::multiset or std::set?
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@128
PS3, Line 128:     vector<std::pair<string, string>> entries, string text) {
             :   std::sort(entries.begin(), entries.end(),
             :       [](const std::pair<string, string>& a, const 
std::pair<string, string>& b) {
             :         return a.first.size() > b.first.size();
             :       });
             :   for (const auto& entry : entries) {
             :
> No need for this function, call boost::algorithm::replace_all or boost::alg
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@136
PS3, Line 136:   }
             :   return text;
             : }
             :
             : // Applies a full alias map to a text blob.
             : static string ApplyAliasMap(const unordered_map<string, string>& 
alias_map, string text) {
             :   return ApplySortedReplacements(
             :       vector<std::pair<string, string>>(alias_map.cbegin(), 
alias_map.cend()),
             :
> This function is only called from one place, move the code to ApplySortedRe
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@183
PS3, Line 183:   if (node.IsArray()) {
> return boost::algorithm::trim_copy which will enable return value optimizat
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@335
PS3, Line 335:       CollectInfoStringValuesByKeys(source_json, {"Sql 
Statement"});
             :   for (const string& sql_statement : sql_statements) {
             :     boost::algorithm::replace_all(
             :         redacted, JsonEscapeString(sql_statement), REDAC
> Declare table_set and column_set as type std::set<string> and the sorting w
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@344
PS3, Line 344: nst std::regex user_kv_re(
> A shared_ptr doesn't provide value here.  Pass in an unordered_map<string,
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@376
PS3, Line 376: static string UnredactTextWithAliases(
             :     const string_view& text, const map<string, string>& 
alias_to_original) {
             :   if (alias_to_original.empty()) return text;
             :   return ApplySortedReplacements(vector<std::pai
> It does not seem possible for this else case to happen.  Same on lines 388,
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@441
PS3, Line 441:
> Ensure this function does not get called twice by adding a DCHECK:
Done


http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@455
PS3, Line 455:
> Ensure the Redact function has already been called by adding a DCHECK:
Done



--
To view, visit http://gerrit.cloudera.org:8080/24282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32
Gerrit-Change-Number: 24282
Gerrit-PatchSet: 4
Gerrit-Owner: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 13 May 2026 21:53:42 +0000
Gerrit-HasComments: Yes

Reply via email to