Qifan Chen has uploaded a new patch set (#36). ( http://gerrit.cloudera.org:8080/16220 )
Change subject: IMPALA-9989 Improve admission control pool stats logging
......................................................................

IMPALA-9989 Improve admission control pool stats logging

This work addresses a limitation in the admission controller by
appending the last known memory consumption statistics for a pool to
the existing memory exhaustion message. The statistics are logged in
impalad.INFO when a query is queued or times out due to memory pressure
in the pool or on the host. The statistics can also appear in the query
profile.

The BNF of the new memory consumption statistics is as follows.

  topN_query_stats ::=
    queries: a list of query IDs and the memory consumed, for up to 5
      queries with the highest memory consumption
    total_consumed: total memory consumed by these topN queries
    fraction_of_pool_total_mem: total memory consumed divided by the
      pool's memory usage (if feasible to report)

  all_query_stats ::=
    num_running: the total number of running queries
    min: the minimum memory consumption of all running queries
    max: the maximum memory consumption of all running queries
    pool_total_mem: the total memory consumption of all running queries
    average: the average memory consumption of all running queries
      (if feasible to report)

  pool_stats ::= <pool_name> ":" <topN_query_stats> <all_query_stats>

  stats_on_host ::= "Stats for host " <host> List of <pool_stats>

  aggregated_pool_stats ::=
    "Aggregated stats for pool " <pool_name> <topN_query_stats>

  memory_consumption_statistics ::= <stats_on_host> | <aggregated_pool_stats>

stats_on_host describes the memory consumption of every pool on a host
and is useful in analyzing memory exhaustion on that host.
aggregated_pool_stats describes the memory consumption of a set of
queries in a pool, aggregated across all hosts, and is useful in
analyzing memory exhaustion in that pool.

Example of stats_on_host for pools root.queueB and root.queueC on host
host1:25000:
Stats for host host1:25000
  pool_name=root.queueB:
    topN_query_stats: queries=[
      id=0000000000000001:0000000000000004, consumed=20.00 MB,
      id=0000000000000001:0000000000000003, consumed=19.00 MB,
      id=0000000000000001:0000000000000002, consumed=8.00 MB
    ], total_consumed=47.00 MB fraction_of_pool_total_mem=0.47
    all_query_stats: num_running=4, min=5.00 MB, max=20.00 MB,
      pool_total_mem=100.00 MB, average=25.00 MB
  pool_name=root.queueC:
    topN_query_stats: queries=[
      id=0000000000000002:0000000000000000, consumed=18.00 MB,
      id=0000000000000002:0000000000000001, consumed=12.00 MB
    ], total_consumed=30.00 MB fraction_of_pool_total_mem=0.06
    all_query_stats: num_running=40, min=10.00 MB, max=200.00 MB,
      pool_total_mem=500.00 MB, average=12.50 MB

Example of aggregated_pool_stats over all hosts for pool root.queueC:

Aggregated stats for pool root.queueC:
  topN_query_stats: queries=[
    id=0000000000000002:0000000000000001, consumed=32.00 MB,
    id=0000000000000002:0000000000000004, consumed=26.00 MB,
    id=0000000000000002:0000000000000000, consumed=21.00 MB,
    id=0000000000000002:0000000000000002, consumed=17.00 MB,
    id=0000000000000002:000000000000000e, consumed=9.00 MB
  ], total_consumed=105.00 MB fraction_of_pool_total_mem=0.82

When a query request is queued due to memory exhaustion, the
memory_consumption_statistics above are logged if the logging level is
2 or higher. When a query request times out due to memory exhaustion,
the memory_consumption_statistics are reported if the logging level is
1 or higher.

Testing:
1. Added a new test TopNQueryCheck in admission-controller-test.cc to
   verify that the topN query memory consumption details are reported
   correctly.
2. Added two new tests in test_admission_controller.py to simulate
   queries being queued and then timing out due to pool or host memory
   pressure.
3. Added a new test TopN in mem-tracker-test.cc to verify that the topN
   query memory consumption details are computed correctly from a mem
   tracker hierarchy.
4. Ran core tests successfully.

Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
---
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/util/container-util.h
M common/thrift/StatestoreService.thrift
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_admission_controller.py
10 files changed, 914 insertions(+), 47 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16220/36
--
To view, visit http://gerrit.cloudera.org:8080/16220

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Gerrit-Change-Number: 16220
Gerrit-PatchSet: 36
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
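The relationship between the topN_query_stats and all_query_stats fields described in the commit message can be sketched in C++ as follows. This is a minimal, hypothetical illustration, not the MemTracker or AdmissionController code in this patch. It assumes each running query's memory consumption is already available as a flat list of (query id, bytes) pairs. The names QueryMem, PoolMemStats, and ComputePoolMemStats are invented for this sketch. Following the root.queueB example, where 47.00 MB / 100.00 MB = 0.47, fraction_of_pool_total_mem is computed as total_consumed / pool_total_mem.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-query record (not an Impala type).
struct QueryMem {
  std::string id;    // query id, e.g. "0000000000000001:0000000000000004"
  int64_t consumed;  // bytes currently consumed by the query
};

// Hypothetical container mirroring topN_query_stats and all_query_stats.
struct PoolMemStats {
  // topN_query_stats
  std::vector<QueryMem> top_n;              // highest consumption first
  int64_t total_consumed = 0;               // sum over top_n
  double fraction_of_pool_total_mem = 0.0;  // stays 0.0 if pool_total_mem == 0
  // all_query_stats
  int64_t num_running = 0;
  int64_t min = 0;
  int64_t max = 0;
  int64_t pool_total_mem = 0;  // sum over all running queries
  double average = 0.0;        // stays 0.0 if num_running == 0
};

// Computes the stats for one pool from its running queries, keeping up to
// n queries (5 in the commit message) in top_n.
PoolMemStats ComputePoolMemStats(std::vector<QueryMem> queries,
                                 std::size_t n = 5) {
  PoolMemStats s;
  s.num_running = static_cast<int64_t>(queries.size());
  if (queries.empty()) return s;

  // all_query_stats: one pass over every running query.
  s.min = queries.front().consumed;
  s.max = queries.front().consumed;
  for (const QueryMem& q : queries) {
    s.min = std::min(s.min, q.consumed);
    s.max = std::max(s.max, q.consumed);
    s.pool_total_mem += q.consumed;
  }
  s.average = static_cast<double>(s.pool_total_mem) / s.num_running;

  // topN_query_stats: sort only the first k positions by consumption.
  const std::size_t k = std::min(n, queries.size());
  const auto top_end = queries.begin() + static_cast<std::ptrdiff_t>(k);
  std::partial_sort(queries.begin(), top_end, queries.end(),
                    [](const QueryMem& a, const QueryMem& b) {
                      return a.consumed > b.consumed;
                    });
  s.top_n.assign(queries.begin(), top_end);
  for (const QueryMem& q : s.top_n) s.total_consumed += q.consumed;
  if (s.pool_total_mem > 0) {
    s.fraction_of_pool_total_mem =
        static_cast<double>(s.total_consumed) / s.pool_total_mem;
  }
  return s;
}
```

std::partial_sort orders only the first k positions, which takes roughly M log N comparisons for M running queries instead of fully sorting the list. The aggregation across hosts used by aggregated_pool_stats is not modeled here.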