Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24283 )

Change subject: IMPALA-14962: Query Profile Parser and Section Retrieval 
Interface
......................................................................


Patch Set 3:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools-test.cc
File be/src/service/query-profile-parsing-tools-test.cc:

http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools-test.cc@57
PS3, Line 57: return Substitute("$0/$1", impala_home, 
TESTDATA_BASE_RELATIVE_PATH);
Use the std::filesystem functions/classes instead of directly creating paths 
with Substitute.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h
File be/src/service/query-profile-parsing-tools.h:

http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@31
PS3, Line 31: struct ParsedProfile {
What is the purpose of this struct?  If it is for clarity, use a typedef.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@34
PS3, Line 34:
            : class ImpalaProfileParser {
            :  public:
            :   Status Parse(const std::string& profile_text, ParsedProfile* 
out);
            :
            :  private:
            :   rapidjson::Value ConvertJsonProfile(
            :       const rapidjson::Value& profile, 
rapidjson::Document::AllocatorType& alloc);
            :   void AddChild(rapidjson::Value& parent_obj, const char* key, 
rapidjson::Value& value,
            :       rapidjson::Document::AllocatorType& alloc);
            : };
Since this class consists of a single public function and no class members, it 
is unnecessary to have a class.  Declare the Parse function stand-alone with 
ConvertJsonProfile and AddChild marked static.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.h@68
PS3, Line 68: static
The static keyword means something different when used in the context of a 
class.  Unless there is a specific reason for the static keyword, remove it.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc
File be/src/service/query-profile-parsing-tools.cc:

http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@50
PS3, Line 50: namespace {
No need for anonymous namespace, use static keyword on function declarations 
instead.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@55
PS3, Line 55: std::
Add using namespace std; above line 41 and remove all the "std::" prefixes.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@79
PS3, Line 79: "contents"
Eliminate magic strings with a static const string declaration.  For example:
static const string CONTENTS_KEY = "contents".


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@100
PS3, Line 100: auto
Nit:  use const auto& if possible.  Same comment for all other places where a 
const auto& iterator is not used.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@109
PS3, Line 109: profile["info_strings"].IsArray()
Does there need to be an else clause that returns an error if info_strings 
exists but is not an array?


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@113
PS3, Line 113:       Value value("", alloc);
This potentially performs an unnecessary object instantiation.  Instead, add an 
else clause to the if statement on line 114 that calls AddChild:

if (entry.HasMember("value")) {
  AddChild(out, key, CloneValue(entry["value"], alloc), alloc);
} else {
  AddChild(out, key, Value("", alloc), alloc);
}


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@119
PS3, Line 119: profile["child_profiles"].IsArray()
Does there need to be an else clause that returns an error if child_profiles 
exists but is not an array?


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@133
PS3, Line 133: void ImpalaProfileParser::AddChild(
Nit: a comment explaining the purpose of this function would be nice to have.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@159
PS3, Line 159: "QUERY"
Use the TStmtType.QUERY constant instead.  Same comment on line 163.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@168
PS3, Line 168:     if (query_key_.empty()) {
             :       for (auto it = parsed_.parsed.MemberBegin(); it != 
parsed_.parsed.MemberEnd();
             :            ++it) {
             :         const string key = it->name.GetString();
             :         if (key.find("Query (id=") == 0) {
             :           query_key_ = key;
             :           break;
             :         }
             :       }
             :     }
What is the purpose of this second time looping through parsed_.parsed?


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@189
PS3, Line 189:     Value empty(rapidjson::kObjectType);
             :     return empty;
Directly return the object instantiation:  return Value(rapidjson::kObjectType);


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@205
PS3, Line 205:     auto merge_per_node_profiles = [&](Value* value) {
             :       if (value == nullptr || !value->IsObject() || 
value->HasMember("Per Node Profiles"))
             :         return;
             :       Value pnp = GetPerNodeProfiles(alloc);
             :       if (pnp.IsObject() && !pnp.ObjectEmpty()) {
             :         value->AddMember("Per Node Profiles", pnp, alloc);
             :       }
             :     };
Declaring a separate function variable is unnecessary.  Just move the code from 
this function to line 216.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@219
PS3, Line 219:     Value empty(rapidjson::kObjectType);
             :     return empty;
Directly return the object instantiation:  return Value(rapidjson::kObjectType);


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@234
PS3, Line 234:       static const std::regex FID_RE(R"(F\d+)");
Move the declaration of FID_RE to the top of this file and add flag to make it 
run faster:
static const regex FID_RE(R"(F\d+)", regex::optimize);

Same comment for other places where std::regex is declared.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@266
PS3, Line 266: Value QueryProfileToolAccessor::GetFragment(
This function loops through fragments three times.  Is it possible to reduce 
the number of times that looping happens?


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@317
PS3, Line 317:       Value empty(rapidjson::kObjectType);
             :       return empty;
Directly return the object instantiation:  return Value(rapidjson::kObjectType);


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@409
PS3, Line 409:     Value empty(rapidjson::kObjectType);
             :     return empty;
Directly return the object instantiation:  return Value(rapidjson::kObjectType);


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@418
PS3, Line 418:     auto add_option = [&](const string& key, const string& 
value) {
             :       if (key.empty()) return;
             :       if (obj.HasMember(key.c_str())) {
             :         obj[key.c_str()].SetString(
             :             value.c_str(), 
static_cast<rapidjson::SizeType>(value.size()), alloc);
             :       } else {
             :         obj.AddMember(Value(key.c_str(), alloc), 
Value(value.c_str(), alloc), alloc);
             :       }
             :
No need for a separate function variable here.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@540
PS3, Line 540: // Checks whether a key corresponds to a SCAN node.
             : bool QueryProfileToolAccessor::IsScanNodeKey(const string& key) {
             :     static const std::regex SCAN_NODE_RE(
             :         R"((^\d+:\s*SCAN\b)|(\b[A-Z_]*SCAN_NODE\b))", 
std::regex::icase);
             :     return std::regex_search(key, SCAN_NODE_RE);
             : }
Once the regex definition has been moved to the top, this function can be 
eliminated.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@547
PS3, Line 547: // Extracts query id from a canonical "Query (id=...)" key.
             : string QueryProfileToolAccessor::ExtractQueryId(const string& 
query_key) {
             :     static const std::regex QUERY_ID_RE(R"(^Query 
\(id=([^)]+)\):?$)");
             :     std::smatch m;
             :     if (std::regex_match(query_key, m, QUERY_ID_RE)) return 
m[1].str();
             :     return string();
             : }
Once the regex definition has been moved to the top, this function can be 
eliminated.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@683
PS3, Line 683: (^F\d+(?:\b|:))
The outer parentheses create an unnecessary capture group.  Define the regex 
like this:

static const regex FRAG_SHORT_RE(R"^F\d+(\b|:)", regex::optimize | 
regex::nosubs);


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@705
PS3, Line 705: void QueryProfileToolAccessor::GetAllFragments(
Since this function is called multiple times, would it be better to get all the 
fragments in the constructor and store them on a class member variable?


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@853
PS3, Line 853: namespace {
Don't use an anonymous namespace.  Declare the function static instead.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@906
PS3, Line 906: Status CreateQueryProfileToolExecutorForProfile(
Nit: This function is also part of the public API.  Add a comment stating that.


http://gerrit.cloudera.org:8080/#/c/24283/3/be/src/service/query-profile-parsing-tools.cc@944
PS3, Line 944: // Public API: executes one tool method against a profile and 
returns JSON output.
             : Status GetQueryProfileToolOutputForProfile(const string& 
profile_text,
             :     const string& tool_name, const string& tool_args_json, 
string* tool_output_json) {
             :   QueryProfileToolExecutor tool_executor;
             :   
RETURN_IF_ERROR(CreateQueryProfileToolExecutorForProfile(profile_text, 
&tool_executor));
             :   return tool_executor(tool_name, tool_args_json, 
tool_output_json);
             : }
This function appears to only exist for testing purposes.  Move it to 
query-profile-parsing-tools-test.cc.

If this function is part of the public API, it's very inefficient since it 
parses the entire query profile each time it is called.



--
To view, visit http://gerrit.cloudera.org:8080/24283
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9c9e735aa089bb243b07af421553002a465a88
Gerrit-Change-Number: 24283
Gerrit-PatchSet: 3
Gerrit-Owner: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Gokul Kolady <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 13 May 2026 22:39:21 +0000
Gerrit-HasComments: Yes

Reply via email to