Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/24282 )
Change subject: IMPALA-14961: Query Profile Redaction ...................................................................... Patch Set 3: (30 comments) http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h File be/src/service/query-profile-redaction.h: http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@30 PS3, Line 30: std::string_view profile_text Nit: declare this as const std::string_view& http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.h@31 PS3, Line 31: std::string_view text Nit: declare this as const std::string_view& http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.h File be/src/service/query-profile-redaction.h: http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.h@25 PS2, Line 25: > It may be cleaner to define a class that encompasses these functions. Then Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.h@27 PS2, Line 27: > use std::string, same comment for lines 28 and 30 Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.h@28 PS2, Line 28: s QueryProfileRedactor { > If possible, use std::string_view instead of const string&. Same comment f Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.h@37 PS2, Line 37: }; > What is the need for this function? Can't the original profile be used ins This function performs partial unredaction so the results from the LLM show the original values from the query profile. http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@35 PS2, Line 35: #include <rapidjson/document.h> > add using namespace std; Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@46 PS2, Line 46: > General style feedback: Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@50 PS2, Line 50: static string Trim(string_view s) { : return boost::algorithm::trim_copy(string(s)); : } : : // Heuristically checks whether a token looks like a hostname-like identifier. : static bool IsLikelyHostnameToken(string_view token) { : if (token.empty()) return false; : > use boost::trim instead of writing a trim function Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@62 PS2, Line 62: return false; : } : > No need for these variables, just return true from the if statements on lin Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@73 PS2, Line 73: escaped = boost::algorithm::erase_last_copy(escaped, "\""); > Typically escaping strings means to add a backslash before control characte Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@79 PS2, Line 79: static unordered_map<string, string> BuildAliasReverseMap( > No, rapidjson::Writer::String(...) escapes embedded quotes inside the paylo Ack http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@87 PS2, Line 87: > Return a new unordered_map instead of having a pointer passed in. Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@97 PS2, Line 97: if (!token.empty()) unique.insert(token); > Unless ordering is needed, use unordered_set instead of set. Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@100 PS2, Line 100: in(), results.en > Shouldn't need this static_cast. May need to define the type of group_inde Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@104 PS2, Line 104: // Adds hostname-like regex matches to the provided deduplicated token set. > We use a vector here in order to maintain consistent ordering for matches f Ack http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@109 PS2, Line 109: ameToken(token)) host_to > Avoid raw pointers as function inputs in favor of returning an unordered_se Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@121 PS2, Line 121: std::stringstream alias; > Could use for (int idx=0; idx < tokens.size(); idx++) instead of declaring Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@129 PS2, Line 129: static string ReplaceAllOccurrences( : string_view token, string_view replacement, string text) { : if (token.empty()) return text; : boost::algorithm::replace_all(text, string(token), string(replacement)); : return text; : } : : // Sorts replacement entries by descending token length to avoid partial overlap issues. : s > Instead of a custom function, use boost functions to replace all occurrence Done http://gerrit.cloudera.org:8080/#/c/24282/2/be/src/service/query-profile-redaction.cc@289 PS2, Line 289: tor it(context.c > Avoid this static cast, define i with a type of size_t Done http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@49 PS3, Line 49: // Trims leading and trailing whitespace from input text. : static string Trim(string_view s) { : return boost::algorithm::trim_copy(string(s)); : } Eliminate this function entirely and call boost::trim_copy directly. As written now, it makes a copy of the underlying string from the string_view. This duplicate copy is unnecessary. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@71 PS3, Line 71: if (escaped.size() >= 2 && escaped.front() == '"' && escaped.back() == '"') { : escaped = boost::algorithm::erase_first_copy(escaped, "\""); : escaped = boost::algorithm::erase_last_copy(escaped, "\""); : } This code will not handle the situation where either the first or last (but not both) double quotes is missing. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@99 PS3, Line 99: unique.begin(), unique.end() Use .cbegin() and .cend() here. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@100 PS3, Line 100: std::sort(results.begin(), results.end()); Since ordering is needed, use a set instead of unordered_set so that sorting happens automatically at insert time. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@117 PS3, Line 117: vector<string> sorted_tokens = tokens; : std::sort(sorted_tokens.begin(), sorted_tokens.end()); Instead of sorting a vector, can tokens be a std::multiset or std::set? http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@128 PS3, Line 128: // Replaces every occurrence of a token with a replacement string. : static string ReplaceAllOccurrences( : string_view token, string_view replacement, string text) { : if (token.empty()) return text; : boost::algorithm::replace_all(text, string(token), string(replacement)); : return text; : } No need for this function, call boost::algorithm::replace_all or boost::algorithm::replace_all_copy directly instead. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@183 PS3, Line 183: return Trim(m[1].str()); return boost::algorithm::trim_copy which will enable return value optimization to avoid an unnecessary string create. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@335 PS3, Line 335: vector<string> table_tokens(table_set.cbegin(), table_set.cend()); : vector<string> column_tokens(column_set.cbegin(), column_set.cend()); : std::sort(table_tokens.begin(), table_tokens.end()); : std::sort(column_tokens.begin(), column_tokens.end()); Declare table_set and column_set as type std::set<string> and the sorting will happen automatically at insert time. http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@441 PS3, Line 441: Status QueryProfileRedactor::Redact(string_view profile_text) { Ensure this function does not get called twice by adding a DCHECK: DCHECK(redacted_profile_text_.empty()) << "Cannot call Redact function more than once"; http://gerrit.cloudera.org:8080/#/c/24282/3/be/src/service/query-profile-redaction.cc@455 PS3, Line 455: return UnredactTextWithAliases(text, alias_to_original_); Ensure the Redact function has already been called by adding a DCHECK: DCHECK(!redacted_profile_text_.empty()) << "Redact function has not been called, no profile to unredact"; -- To view, visit http://gerrit.cloudera.org:8080/24282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32 Gerrit-Change-Number: 24282 Gerrit-PatchSet: 3 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 13 May 2026 19:09:30 +0000 Gerrit-HasComments: Yes
