Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/24282 )
Change subject: IMPALA-14961: Query Profile Redaction ...................................................................... Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc File be/src/service/query-profile-redaction.cc: http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@166 PS6, Line 166: for (size_t nonce = 0;; ++nonce) { We should probably add a boundary limit here (maybe 10,000) to avoid a potential infinite loop. If it exceeds the boundary, it would be good to return an error Status and fail the operation. Making the limit configurable could be nicer http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@244 PS6, Line 244: if (node.HasMember("info_strings") && node["info_strings"].IsArray()) { : for (const auto& entry : node["info_strings"].GetArray()) { : const auto [key, value] = ParseInfoStringEntry(entry); : if (key == nullptr || value == nullptr) continue; : if (strcmp(key, "Select Columns") == 0 || strcmp(key, "Where Columns") == 0 : || strcmp(key, "Join Columns") == 0) { : if (*value != '\0') contexts.push_back(value); : } else if (strcmp(key, "Analyzed query") == 0) { : if (*value != '\0') contexts.push_back(value); : } else if (strcmp(key, "Plan") == 0) { : string analyzed_query = ExtractAnalyzedQueryFromPlanText(value); : if (!analyzed_query.empty()) contexts.push_back(analyzed_query); : } : } : } : : for (auto it = node.MemberBegin(); it != node.MemberEnd(); ++it) { : auto child_contexts = CollectIdentifierContextsFromJsonProfile(it->value); : contexts.insert( : contexts.end(), std::make_move_iterator(child_contexts.begin()), : std::make_move_iterator(child_contexts.end())); : } It seems we process the info_strings node twice here. Also, calling HasMember and then node["info_strings"] before the for loop results in redundant searches. We can combine this into a single loop to save some searches and prevent the double recursion if the order doesn't matter: " for (auto it = node.MemberBegin(); it != node.MemberEnd(); ++it) { if (strcmp(it->name.GetString(), "info_strings") == 0) { if (it->value.IsArray()) { for (const auto& entry : it->value.GetArray()) { ... } } continue; } auto child_contexts = CollectIdentifierContextsFromJsonProfile(it->value); contexts.insert( contexts.end(), std::make_move_iterator(child_contexts.begin()), std::make_move_iterator(child_contexts.end())); } " http://gerrit.cloudera.org:8080/#/c/24282/6/be/src/service/query-profile-redaction.cc@405 PS6, Line 405: if (!sql_statements.empty()) { : const string& sql_statement = sql_statements.front(); : boost::algorithm::replace_all( : redacted, JsonEscapeString(sql_statement), REDACTED_SQL_STATEMENT); : } Do we expect all the 'Sql Statement' values to be exactly the same? I think we should use a for loop to replace all of them just to be safe. Also, do you think we should deduplicate the results inside CollectInfoStringValuesByKeys? -- To view, visit http://gerrit.cloudera.org:8080/24282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0c5b4911a64888f319f212155df6e08c1800b32 Gerrit-Change-Number: 24282 Gerrit-PatchSet: 6 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 14 May 2026 23:16:03 +0000 Gerrit-HasComments: Yes
