Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21683 )

Change subject: IMPALA-13304: Include aggregate instance-level metrics within 
experimental profile(V2)
......................................................................


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/21683/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21683/2//COMMIT_MSG@80
PS2, Line 80: }
            :
            : When no. of instances <= 5 -
> Once the JSON's structure is refined after some reviews, I will add the tes
Done


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h@940
PS6, Line 940:   typedef std::map<std::string, AggEventSequence> 
EventSequenceMap;
             :   EventSequenceMap event_sequence_map_;
This will duplicate EventSequenceMap typedef in L786. Maybe rename this into 
AggEventSequenceMap ?
Please also put comment what the key is, both here and for typedef in L786.


http://gerrit.cloudera.org:8080/#/c/21683/2/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/21683/2/be/src/util/runtime-profile.h@940
PS2, Line 940: typedef std::map<std::string, AggEventSequence> EventSequenceMap;
             :   EventSequenceMap event_sequence_map_;
> I had done it for consistency. I will remove it in the next patchset, if no
Ack


http://gerrit.cloudera.org:8080/#/c/21683/2/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/21683/2/be/src/util/runtime-profile.cc@2927
PS2, Line 2927:   // If number of instances is more than 5, aggregate of event 
timestamps (i.e. Average,
              :   // Minimum and Maximum) is included after bucketing the 
instance timestamps into
              :   // 5 divisions each spanning 20%.
> To be consistent with the 'ToJson' method for the traditional profile above
Ack


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2952
PS6, Line 2952:     vector<vector<vector<int64_t>>> timestamps_vec;
              :     vector<vector<Value>> labels_vec;
Please put comment what each index in vector represent.
For example, timestamps_vec is a 3-level nested vector. What does each level 
represent? Maybe like this:

    // Vector of timestamps addressed as 
[unique_event_sequence][instances][event_timestamps]
    vector<vector<vector<int64_t>>> timestamps_vec;
    // Vector of labels addressed as [unique_event_sequence][event_labels]
    vector<vector<Value>> labels_vec;


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2968
PS6, Line 2968:           
timestamps_vec.emplace_back(event_sequence_contents.timestamps);
This does not seem to consult against event_sequence_contents.label_idxs, so 
you might get different ordering vs labels_vec?


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2985
PS6, Line 2985: if (timestamps_vec[es_idx].size() > BUCKET_SIZE) {
Please put comment right after this line describing what this big branch does. 
Something like this maybe:
"This is a big query profile with many AggEventSequence. Aggregate them into a 
bucket of 5."

nit: might be worth it to contain the branch content into its own function.


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2988
PS6, Line 2988:             int64_t min = timestamps_vec[es_idx][0][event_idx];
              :             int64_t max = min
Please rename to min_ts and max_ts to avoid confusion with min and max function.


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@3021
PS6, Line 3021:               if (inst_count_list[division_idx] == 0) {
Put comment on why this branch is needed. I assume this is to anticipate for 
missing events/timestamps?


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@3034
PS6, Line 3034:             Value event(kObjectType);
              :             event.AddMember("label", 
labels_vec[es_idx][event_idx], allocator);
Move this below, after L3040.


http://gerrit.cloudera.org:8080/#/c/21683/6/tests/observability/test_profile_tool.py
File tests/observability/test_profile_tool.py:

http://gerrit.cloudera.org:8080/#/c/21683/6/tests/observability/test_profile_tool.py@84
PS6, Line 84: )
> flake8: E501 line too long (91 > 90 characters)
You can replace the string with variable to workaround the "line to long" issue.



--
To view, visit http://gerrit.cloudera.org:8080/21683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Gerrit-Change-Number: 21683
Gerrit-PatchSet: 6
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Wed, 16 Oct 2024 18:42:58 +0000
Gerrit-HasComments: Yes

Reply via email to