Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16220 )
Change subject: IMPALA-9989 Improve admission control pool stats logging ...................................................................... Patch Set 12: (23 comments) http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@57 PS12, Line 57: percentage nit: fraction_of_pool_total_mem http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@78 PS12, Line 78: reported logged http://gerrit.cloudera.org:8080/#/c/16220/12//COMMIT_MSG@81 PS12, Line 81: dequeued due to memory exhaustion did you mean to say that the query timed out in queue? http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.h@462 PS12, Line 462: ResetMemConsumedForAllMemTrackers can this be moved to the test class since its a test only functionality that a friend class can implement http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc@411 PS1, Line 411: : // Update the memory consumption related fields in pool_stats. : void MemTracker::UpdatePoolStatsForMemoryConsumed( : int64_t mem_consumed, TPoolStats& pool_stats) { : if (pool_stats.min_memory_consumed > mem_consume > multi-line if statements need curly braces. nit: can you address this in your next patch http://gerrit.cloudera.org:8080/#/c/16220/1/be/src/runtime/mem-tracker.cc@568 PS1, Line 568: : if (bytes_freed_by_last_gc_metric_ != NULL) { : bytes_freed_by_last_gc_metric_->SetValue(pre_gc_consumption - curr_consumption); : } : return curr_consumption > max_consumption; : } : : // Recursively append info about this memory tracker and all its children to ss. : void MemTracker::GetAllMemTracker(std::stringstream& ss, int indent) { : ss << std::string(indent, ' ') << " MemTracker: label=" << label_; : : if (!pool_name_.empty()) { : ss << ", pool_name =" << pool_name_; : } : : if (is_query_mem_tracker_) { : ss << ", qid=" << PrintId(query_id_); : } : : ss << std::endl; : indent += 3; : for (MemTracker* child : child_trackers_) { : child->GetAllMemTracker(ss, indent); : } : } : : // Return a debug string for all memory trackers reachable from the root memory : / > These two new methods are for debugging purpose which I found very useful. Historically we have avoided adding methods only for debugging probably to avoid adding dead code. http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc@134 PS12, Line 134: if (consumption_->current_value() != 0) { : int x = 0; : x++; : } ?? http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/runtime/mem-tracker.cc@441 PS12, Line 441: query_ids nit: maybe rename this since this is no longer just the id http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@539 PS12, Line 539: typedef boost::unordered_map<std::string, TPoolStats> RemoteStatsMap; nit: can you add a comment about what the key is http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@619 PS12, Line 619: void DebugPoolStatsForConsumedMemory( : std::stringstream& ss, const TPoolStats& stats) const; : std::string DebugPoolStats(const TPoolStats& stats) const; nit: maybe add a short method comment http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@722 PS12, Line 722: /// The list of topN queries in pool/host when this request could not be admitted also add context as to under which queuing conditions this will be populated. http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@723 PS12, Line 723: topN_queries nit: I feel like calling this topN_queries makes it confusing since those can be top among pool or host depending on the not_Admitted_reason. We can probably make this more generic by calling it not_admitted_details and write the comment appropriately describing the context about when this is populated. This would also simplify the method comment for CanAdmitRequest and HasAvailableMemResources. What do you think? http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@853 PS12, Line 853: and topN_queries specifies the top 5 queries utilizing the memory most. needs more context like most among what? http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@1037 PS12, Line 1037: typedef std::tuple<int64_t, string, TUniqueId, const PoolStats*> Item; nit: can you use a struct instead, that makes the code way more readable with var names providing the context. http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.h@1039 PS12, Line 1039: std::stringstream& ss, : std::vector<Item>& listOfTopNs, std::vector<int>& indices comment about what the input variables are. particularly i dont know what indices is unless i read the implementation. http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@265 PS12, Line 265: DebugPoolStatsForConsumedMemory there is a lot of similarities in this with DebugHeavyMemoryQueriesForAPoolInHost, might be able to re-use some sections http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@320 PS12, Line 320: // Return a string reporting top 5 queries with most memory consumed among all : // pools in local host. nit: can you add a short example of the output string http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@343 PS12, Line 343: int items = (listOfTopNs.size() >= 5) ? 5 : listOfTopNs.size(); : // Keep first 'items' items and remove the rest. : listOfTopNs.erase(listOfTopNs.begin() + items, listOfTopNs.end()); nit: can use resize() http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@785 PS12, Line 785: consumed_memory_details_by_pool is this used anywhere? http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@793 PS12, Line 793: DebugTopNQueriesForAllHostsInPool this method will give us the top memory consuming query at a particular host. What we need here is the top5 queries that are consuming the most memory from the pool. To give a more concrete example of why this will return, if we have 3 hosts and 2 running queries in this pool. Assume that query1 has cluster_mem_to_admit = 10GB and query2 has cluster_mem_to_admit = 7GB, BUT currently query 2 is using more 3GB on host2 whereas query1 is using 2GB. So this method would return list of [query2 > query1] whereas what we would have needed is [query1 > query2] and given enough queries we might lose the required top5 queries from the list. So instead of looking through the remote_stats and local stats, we need to order the queries by cluster_mem_to_admit http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@814 PS12, Line 814: at this host we need info about the queries on the host where this check failed not the local host http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/scheduling/admission-controller.cc@1207 PS12, Line 1207: queue_node.not_admitted_reason + " top memory consuming queries: " + queue_node.topN_queries) nit: we prefer to use Substitute from google's string tools for something like this which is more efficient http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/util/container-util.h File be/src/util/container-util.h: http://gerrit.cloudera.org:8080/#/c/16220/12/be/src/util/container-util.h@122 PS12, Line 122: nit: do we need other operators except the operator>() used by std::greater? -- To view, visit http://gerrit.cloudera.org:8080/16220 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781 Gerrit-Change-Number: 16220 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 30 Jul 2020 03:26:11 +0000 Gerrit-HasComments: Yes
