Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24282 )

Change subject: IMPALA-14961: Query Profile Redaction
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc
File be/src/service/query-profile-redaction.cc:

http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@166
PS6, Line 166: for (size_t nonce = 0;; ++nonce) {
We should probably add a boundary limit here (maybe 10,000) to avoid a 
potential infinite loop. If it exceeds the boundary, it would be good to return 
an error Status and fail the operation. Making the limit configurable could be 
nicer


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@244
PS6, Line 244:   if (node.HasMember("info_strings") && 
node["info_strings"].IsArray()) {
             :     for (const auto& entry : node["info_strings"].GetArray()) {
             :       const auto [key, value] = ParseInfoStringEntry(entry);
             :       if (key == nullptr || value == nullptr) continue;
             :       if (strcmp(key, "Select Columns") == 0 || strcmp(key, 
"Where Columns") == 0
             :           || strcmp(key, "Join Columns") == 0) {
             :         if (*value != '\0') contexts.push_back(value);
             :       } else if (strcmp(key, "Analyzed query") == 0) {
             :         if (*value != '\0') contexts.push_back(value);
             :       } else if (strcmp(key, "Plan") == 0) {
             :         string analyzed_query = 
ExtractAnalyzedQueryFromPlanText(value);
             :         if (!analyzed_query.empty()) 
contexts.push_back(analyzed_query);
             :       }
             :     }
             :   }
             :
             :   for (auto it = node.MemberBegin(); it != node.MemberEnd(); 
++it) {
             :     auto child_contexts = 
CollectIdentifierContextsFromJsonProfile(it->value);
             :     contexts.insert(
             :         contexts.end(), 
std::make_move_iterator(child_contexts.begin()),
             :         std::make_move_iterator(child_contexts.end()));
             :   }
It seems we process the info_strings node twice here. Also, calling HasMember 
and then node["info_strings"] before the for loop results in redundant 
searches. We can combine this into a single loop to save some searches and 
prevent the double recursion if the order doesn't matter:
"
  for (auto it = node.MemberBegin(); it != node.MemberEnd(); ++it) {
    if (strcmp(it->name.GetString(), "info_strings") == 0) {
      if (it->value.IsArray()) {
        for (const auto& entry : it->value.GetArray()) {
          ...
        }
      }
      continue;
    }
    auto child_contexts = CollectIdentifierContextsFromJsonProfile(it->value);
    contexts.insert(
        contexts.end(), std::make_move_iterator(child_contexts.begin()),
        std::make_move_iterator(child_contexts.end()));
  }
"


http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@405
PS6, Line 405:   if (!sql_statements.empty()) {
             :     const string& sql_statement = sql_statements.front();
             :     boost::algorithm::replace_all(
             :         redacted, JsonEscapeString(sql_statement), 
REDACTED_SQL_STATEMENT);
             :   }
Do we expect all the 'Sql Statement' values to be exactly the same? I think we 
should use a for loop to replace all of them just to be safe. Also, do you 
think we should deduplicate the results inside CollectInfoStringValuesByKeys?



--
To view, visit http://gerrit.cloudera.org:8080/24282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32
Gerrit-Change-Number: 24282
Gerrit-PatchSet: 6
Gerrit-Owner: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 14 May 2026 23:16:03 +0000
Gerrit-HasComments: Yes

Reply via email to