Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/24283 )
Change subject: IMPALA-14962: Query Profile Parser and Section Retrieval Interface ...................................................................... Patch Set 3: (29 comments) http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools-test.cc File be/src/service/query-profile-parsing-tools-test.cc: http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools-test.cc@57 PS3, Line 57: return Substitute("$0/$1", impala_home, TESTDATA_BASE_RELATIVE_PATH); Use the std::filesystem functions/classes instead of directly creating paths with Substitute. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h File be/src/service/query-profile-parsing-tools.h: http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@31 PS3, Line 31: struct ParsedProfile { What is the purpose of this struct? If it is for clarity, use a typedef. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@34 PS3, Line 34: : class ImpalaProfileParser { : public: : Status Parse(const std::string& profile_text, ParsedProfile* out); : : private: : rapidjson::Value ConvertJsonProfile( : const rapidjson::Value& profile, rapidjson::Document::AllocatorType& alloc); : void AddChild(rapidjson::Value& parent_obj, const char* key, rapidjson::Value& value, : rapidjson::Document::AllocatorType& alloc); : }; Since this class consists of a single public function and no class members, it is unnecessary to have a class. Declare the Parse function stand-alone with ConvertJsonProfile and AddChild marked static. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@68 PS3, Line 68: static The static keyword means something different when used in the context of a class. Unless there is a specific reason for the static keyword, remove it. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc File be/src/service/query-profile-parsing-tools.cc: http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@50 PS3, Line 50: namespace { No need for anonymous namespace, use static keyword on function declarations instead. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@55 PS3, Line 55: std:: Add using namespace std; above line 41 and remove all the "std::" prefixes. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@79 PS3, Line 79: "contents" Eliminate magic strings with a static const string declaration. For example: static const string CONTENTS_KEY = "contents". http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@100 PS3, Line 100: auto Nit: use const auto& if possible. Same comment for all other places where a const auto& iterator is not used. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@109 PS3, Line 109: profile["info_strings"].IsArray() Does there need to be an else clause that returns an error if info_strings exists but is not an array? http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@113 PS3, Line 113: Value value("", alloc); This potentially performs an unnecessary object instantiation. Instead, add an else clause to the if statement on line 114 that calls AddChild: if (entry.HasMember("value")) { AddChild(out, key, CloneValue(entry["value"], alloc), alloc); } else { AddChild(out, key, Value("", alloc), alloc); } http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@119 PS3, Line 119: profile["child_profiles"].IsArray() Does there need to be an else clause that returns an error if child_profiles exists but is not an array? http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@133 PS3, Line 133: void ImpalaProfileParser::AddChild( Nit: a comment explaining the purpose of this function would be nice to have. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@159 PS3, Line 159: "QUERY" Use the TStmtType.QUERY constant instead. Same comment on line 163. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@168 PS3, Line 168: if (query_key_.empty()) { : for (auto it = parsed_.parsed.MemberBegin(); it != parsed_.parsed.MemberEnd(); : ++it) { : const string key = it->name.GetString(); : if (key.find("Query (id=") == 0) { : query_key_ = key; : break; : } : } : } What is the purpose of this second time looping through parsed_.parsed? http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@189 PS3, Line 189: Value empty(rapidjson::kObjectType); : return empty; Directly return the object instantiation: return Value(rapidjson::kObjectType); http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@205 PS3, Line 205: auto merge_per_node_profiles = [&](Value* value) { : if (value == nullptr || !value->IsObject() || value->HasMember("Per Node Profiles")) : return; : Value pnp = GetPerNodeProfiles(alloc); : if (pnp.IsObject() && !pnp.ObjectEmpty()) { : value->AddMember("Per Node Profiles", pnp, alloc); : } : }; Declaring a separate function variable is unnecessary. Just move the code from this function to line 216. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@219 PS3, Line 219: Value empty(rapidjson::kObjectType); : return empty; Directly return the object instantiation: return Value(rapidjson::kObjectType); http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@234 PS3, Line 234: static const std::regex FID_RE(R"(F\d+)"); Move the declaration of FID_RE to the top of this file and add flag to make it run faster: static const regex FID_RE(R"(F\d+)", regex::optimize); Same comment for other places where std::regex is declared. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@266 PS3, Line 266: Value QueryProfileToolAccessor::GetFragment( This function loops through fragments three times. Is it possible to reduce the number of times that looping happens? http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@317 PS3, Line 317: Value empty(rapidjson::kObjectType); : return empty; Directly return the object instantiation: return Value(rapidjson::kObjectType); http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@409 PS3, Line 409: Value empty(rapidjson::kObjectType); : return empty; Directly return the object instantiation: return Value(rapidjson::kObjectType); http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@418 PS3, Line 418: auto add_option = [&](const string& key, const string& value) { : if (key.empty()) return; : if (obj.HasMember(key.c_str())) { : obj[key.c_str()].SetString( : value.c_str(), static_cast<rapidjson::SizeType>(value.size()), alloc); : } else { : obj.AddMember(Value(key.c_str(), alloc), Value(value.c_str(), alloc), alloc); : } : No need for a separate function variable here. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@540 PS3, Line 540: // Checks whether a key corresponds to a SCAN node. : bool QueryProfileToolAccessor::IsScanNodeKey(const string& key) { : static const std::regex SCAN_NODE_RE( : R"((^\d+:\s*SCAN\b)|(\b[A-Z_]*SCAN_NODE\b))", std::regex::icase); : return std::regex_search(key, SCAN_NODE_RE); : } Once the regex definition has been moved to the top, this function can be eliminated. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@547 PS3, Line 547: // Extracts query id from a canonical "Query (id=...)" key. : string QueryProfileToolAccessor::ExtractQueryId(const string& query_key) { : static const std::regex QUERY_ID_RE(R"(^Query \(id=([^)]+)\):?$)"); : std::smatch m; : if (std::regex_match(query_key, m, QUERY_ID_RE)) return m[1].str(); : return string(); : } Once the regex definition has been moved to the top, this function can be eliminated. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@683 PS3, Line 683: (^F\d+(?:\b|:)) The outer parentheses create an unnecessary capture group. Define the regex like this: static const regex FRAG_SHORT_RE(R"^F\d+(\b|:)", regex::optimize | regex::nosubs); http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@705 PS3, Line 705: void QueryProfileToolAccessor::GetAllFragments( Since this function is called multiple times, would it be better to get all the fragments in the constructor and store them on a class member variable? http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@853 PS3, Line 853: namespace { Don't use an anonymous namespace. Declare the function static instead. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@906 PS3, Line 906: Status CreateQueryProfileToolExecutorForProfile( Nit: This function is also part of the public API. Add a comment stating that. http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@944 PS3, Line 944: // Public API: executes one tool method against a profile and returns JSON output. : Status GetQueryProfileToolOutputForProfile(const string& profile_text, : const string& tool_name, const string& tool_args_json, string* tool_output_json) { : QueryProfileToolExecutor tool_executor; : RETURN_IF_ERROR(CreateQueryProfileToolExecutorForProfile(profile_text, &tool_executor)); : return tool_executor(tool_name, tool_args_json, tool_output_json); : } This function appears to only exist for testing purposes. Move it to query-profile-parsing-tools-test.cc. If this function is part of the public API, it's very inefficient since it parses the entire query profile each time it is called. -- To view, visit http://gerrit.cloudera.org:8080/24283 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9c9e735aa089bb243b07af421553002a465a88 Gerrit-Change-Number: 24283 Gerrit-PatchSet: 3 Gerrit-Owner: Gokul Kolady <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Gokul Kolady <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 13 May 2026 22:39:21 +0000 Gerrit-HasComments: Yes
