Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16220 )

Change subject: IMPALA-9989 Improve admission control pool stats logging
......................................................................


Patch Set 12:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@57
PS12, Line 57: percentage
nit: fraction_of_pool_total_mem


http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@78
PS12, Line 78: reported
logged


http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@81
PS12, Line 81: dequeued due to memory exhaustion
did you mean to say that the query timed out in queue?


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.h@462
PS12, Line 462: ResetMemConsumedForAllMemTrackers
can this be moved to the test class since its a test only functionality that a 
friend class can implement


http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc@411
PS1, Line 411:
             : // Update the memory consumption related fields in pool_stats.
             : void MemTracker::UpdatePoolStatsForMemoryConsumed(
             :     int64_t mem_consumed, TPoolStats& pool_stats) {
             :   if (pool_stats.min_memory_consumed > mem_consume
> multi-line if statements need curly braces.
nit: can you address this in your next patch


http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc@568
PS1, Line 568:
             :   if (bytes_freed_by_last_gc_metric_ != NULL) {
             :     bytes_freed_by_last_gc_metric_->SetValue(pre_gc_consumption 
- curr_consumption);
             :   }
             :   return curr_consumption > max_consumption;
             : }
             :
             : // Recursively append info about this memory tracker and all its 
children to ss.
             : void MemTracker::GetAllMemTracker(std::stringstream& ss, int 
indent) {
             :   ss << std::string(indent, ' ') << " MemTracker: label=" << 
label_;
             :
             :   if (!pool_name_.empty()) {
             :     ss << ", pool_name =" << pool_name_;
             :   }
             :
             :   if (is_query_mem_tracker_) {
             :     ss << ", qid=" << PrintId(query_id_);
             :   }
             :
             :   ss << std::endl;
             :   indent += 3;
             :   for (MemTracker* child : child_trackers_) {
             :     child->GetAllMemTracker(ss, indent);
             :   }
             : }
             :
             : // Return a debug string for all memory trackers reachable from 
the root memory
             : /
> These two new methods are for debugging purpose which I found very useful.
Historically we have avoided adding methods only for debugging probably to 
avoid adding dead code.


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc@134
PS12, Line 134:     if (consumption_->current_value() !=  0) {
              :       int x = 0;
              :       x++;
              :     }
??


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc@441
PS12, Line 441: query_ids
nit: maybe rename this since this is no longer just the id


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@539
PS12, Line 539: typedef boost::unordered_map<std::string, TPoolStats> 
RemoteStatsMap;
nit: can you add a comment about what the key is


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@619
PS12, Line 619:     void DebugPoolStatsForConsumedMemory(
              :         std::stringstream& ss, const TPoolStats& stats) const;
              :     std::string DebugPoolStats(const TPoolStats& stats) const;
nit: maybe add a short method comment


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@722
PS12, Line 722: /// The list of topN queries in pool/host when this request 
could not be admitted
also add context as to under which queuing conditions this will be populated.


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@723
PS12, Line 723: topN_queries
nit: I feel like calling this topN_queries makes it confusing since those can 
be top among pool or host depending on the not_Admitted_reason. We can probably 
make this more generic by calling it not_admitted_details and write the comment 
appropriately describing the context about when this is populated. This would 
also simplify the method comment for CanAdmitRequest and 
HasAvailableMemResources.
What do you think?


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@853
PS12, Line 853: and topN_queries specifies the top 5 queries utilizing the 
memory most.
needs more context like most among what?


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@1037
PS12, Line 1037:   typedef std::tuple<int64_t, string, TUniqueId, const 
PoolStats*> Item;
nit: can you use a struct instead, that makes the code way more readable with 
var names providing the context.


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@1039
PS12, Line 1039: std::stringstream& ss,
               :       std::vector<Item>& listOfTopNs, std::vector<int>& indices
comment about what the input variables are. particularly i dont know what 
indices is unless i read the implementation.


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@265
PS12, Line 265: DebugPoolStatsForConsumedMemory
there is a lot of similarities in this with 
DebugHeavyMemoryQueriesForAPoolInHost, might be able to re-use some sections


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@320
PS12, Line 320: // Return a string reporting top 5 queries with most memory 
consumed among all
              : // pools in local host.
nit: can you add a short example of the output string


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@343
PS12, Line 343:   int items = (listOfTopNs.size() >= 5) ? 5 : 
listOfTopNs.size();
              :   // Keep first 'items' items and remove the rest.
              :   listOfTopNs.erase(listOfTopNs.begin() + items, 
listOfTopNs.end());
nit: can use resize()


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@785
PS12, Line 785: consumed_memory_details_by_pool
is this used anywhere?


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@793
PS12, Line 793: DebugTopNQueriesForAllHostsInPool
this method will give us the top memory consuming query at a particular host. 
What we need here is the top5 queries that are consuming the most memory from 
the pool.

To give a more concrete example of why this will return, if we have 3 hosts and 
2 running queries in this pool. Assume that query1 has cluster_mem_to_admit = 
10GB and query2 has cluster_mem_to_admit = 7GB, BUT currently query 2 is using 
more 3GB on host2 whereas query1 is using 2GB.

So this method would return list of [query2 > query1] whereas what we would 
have needed is [query1 > query2] and given enough queries we might lose the 
required top5 queries from the list.

So instead of looking through the remote_stats and local stats, we need to 
order the queries by cluster_mem_to_admit


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@814
PS12, Line 814: at this host
we need info about the queries on the host where this check failed not the 
local host


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@1207
PS12, Line 1207: queue_node.not_admitted_reason + " top memory consuming 
queries: " + queue_node.topN_queries)
nit: we prefer to use Substitute from google's string tools for something like 
this which is more efficient


http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/util/container-util.h
File be/src/util/container-util.h:

http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/util/container-util.h@122
PS12, Line 122:
nit: do we need other operators except the operator>() used by std::greater?



--
To view, visit http://gerrit.cloudera.org:8080/16220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Gerrit-Change-Number: 16220
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 30 Jul 2020 03:26:11 +0000
Gerrit-HasComments: Yes

Reply via email to