Qifan Chen has uploaded a new patch set (#37). ( 
http://gerrit.cloudera.org:8080/16220 )

Change subject: IMPALA-9989 Improve admission control pool stats logging
......................................................................

IMPALA-9989 Improve admission control pool stats logging

This change addresses a limitation in the admission controller by
appending the last known memory consumption statistics for a pool
to the existing memory exhaustion message. The statistics are
logged in impalad.INFO when a query is queued or timed out due to
memory pressure in the pool or on the host. The statistics can also be
part of the query profile.

The format of the new memory consumption statistics, in BNF-like
notation, is as follows.

  topN_query_stats ::=
    queries: a list of query IDs and memory consumed for up to 5 queries
             with the highest memory consumption
    total_consumed: total memory consumed by these topN queries
    fraction_of_pool_total_mem: total_consumed divided by the pool's
                                memory usage (if feasible to report)

  all_query_stats ::=
    num_running: the total number of queries running
    min: the minimal memory consumption of all running queries
    max: the maximal memory consumption of all running queries
    pool_total_mem: the total memory consumption of all running queries
    average: the average memory consumption of all running queries
             (if feasible to report)

  pool_stats ::=
             <pool_name> ":"
             <topN_query_stats>
             <all_query_stats>

  stats_on_host ::=
     "Stats for host " <host>
     List of <pool_stats>

  aggregated_pool_stats ::=
        "Aggregated stats for pool " <pool_name>
        <topN_query_stats>

  memory_consumption_statistics ::=
             <stats_on_host> | <aggregated_pool_stats>

stats_on_host describes the memory consumption of every pool on
a host and is useful in analyzing memory exhaustion on that host.
aggregated_pool_stats describes the memory consumption of a set of
queries in a pool, aggregated over all hosts, and is useful in
analyzing memory exhaustion in that pool.

Example of stats_on_host for pools root.queueB and root.queueC on
host host1:25000:

Stats for host host1:25000
   pool_name=root.queueB:
      topN_query_stats:
         queries=[
            id=0000000000000001:0000000000000004, consumed=20.00 MB,
            id=0000000000000001:0000000000000003, consumed=19.00 MB,
            id=0000000000000001:0000000000000002, consumed=8.00 MB
         ],
         total_consumed=47.00 MB
         fraction_of_pool_total_mem=0.47
      all_query_stats:
         num_running=4,
         min=5.00 MB,
         max=20.00 MB,
         pool_total_mem=100.00 MB,
         average=25.00 MB
   pool_name=root.queueC:
      topN_query_stats:
         queries=[
            id=0000000000000002:0000000000000000, consumed=18.00 MB,
            id=0000000000000002:0000000000000001, consumed=12.00 MB
         ],
         total_consumed=30.00 MB
         fraction_of_pool_total_mem=0.06
      all_query_stats:
         num_running=40,
         min=10.00 MB,
         max=200.00 MB,
         pool_total_mem=500.00 MB,
         average=12.50 MB
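
To make the layout above concrete, here is a minimal Python sketch that
computes the topN_query_stats and all_query_stats fields for one pool and
renders them in roughly the same shape. It is not the code from this change
(which lives in the C++ backend). The function names, the input mapping of
query id to bytes, the tie-breaking rule, and the handling of an empty pool
are all assumptions made for illustration.

```python
from typing import Dict, List

MB = 1024 * 1024


def fmt_mb(num_bytes: float) -> str:
    """Format a byte count like the example output, e.g. '20.00 MB'."""
    return f"{num_bytes / MB:.2f} MB"


def render_pool_stats(pool_name: str, consumed: Dict[str, int],
                      top_n: int = 5) -> str:
    """Render one <pool_stats> entry from per-query consumption in bytes.

    Hypothetical helper: `consumed` maps query id -> bytes consumed.
    """
    # Largest consumers first; ties broken by query id (an assumption)
    # so that the output is deterministic.
    ranked = sorted(consumed.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[:top_n]
    top_total = sum(mem for _, mem in top)
    pool_total = sum(consumed.values())

    lines: List[str] = [
        f"   pool_name={pool_name}:",
        "      topN_query_stats:",
        "         queries=[",
    ]
    entries = [f"            id={qid}, consumed={fmt_mb(mem)}"
               for qid, mem in top]
    if entries:
        lines.append(",\n".join(entries))
    lines.append("         ],")
    lines.append(f"         total_consumed={fmt_mb(top_total)}")
    # The fraction is reported only when the pool total is non-zero.
    if pool_total > 0:
        lines.append("         fraction_of_pool_total_mem="
                     f"{top_total / pool_total:.2f}")

    lines.append("      all_query_stats:")
    lines.append(f"         num_running={len(consumed)},")
    lines.append(f"         min={fmt_mb(min(consumed.values(), default=0))},")
    lines.append(f"         max={fmt_mb(max(consumed.values(), default=0))},")
    # The average is reported only when at least one query is running.
    if consumed:
        lines.append(f"         pool_total_mem={fmt_mb(pool_total)},")
        lines.append(f"         average={fmt_mb(pool_total / len(consumed))}")
    else:
        lines.append(f"         pool_total_mem={fmt_mb(pool_total)}")
    return "\n".join(lines)
```

Calling render_pool_stats("root.queueB", {...}) with a mapping of query ids
to byte counts returns the text for one pool. A stats_on_host report would
prepend the "Stats for host <host>" line and concatenate one such entry per
pool.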

Example of aggregated_pool_stats over all hosts for pool root.queueC:

Aggregated stats for pool root.queueC:
   topN_query_stats:
      queries=[
         id=0000000000000002:0000000000000001, consumed=32.00 MB,
         id=0000000000000002:0000000000000004, consumed=26.00 MB,
         id=0000000000000002:0000000000000000, consumed=21.00 MB,
         id=0000000000000002:0000000000000002, consumed=17.00 MB,
         id=0000000000000002:000000000000000e, consumed=9.00 MB
      ],
      total_consumed=105.00 MB
      fraction_of_pool_total_mem=0.82
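
The aggregation over hosts can be sketched in a few lines of Python: sum
each query's consumption across all hosts, then keep the largest N. This is
an illustration only, not the implementation in this change. The function
name, the nested-dict input, and the use of the summed per-query
consumption as the pool total are assumptions.

```python
from collections import Counter
from typing import Dict

MB = 1024 * 1024


def aggregate_pool_top_n(per_host: Dict[str, Dict[str, int]],
                         n: int = 5) -> dict:
    """Sum each query's consumption over all hosts, then keep the n largest.

    Hypothetical helper: `per_host` maps host -> {query id -> bytes} for
    a single pool.
    """
    totals: Counter = Counter()
    for host_queries in per_host.values():
        # Counter.update adds the values of matching keys together.
        totals.update(host_queries)

    # Largest consumers first; ties broken by query id (an assumption).
    top = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    total_consumed = sum(mem for _, mem in top)
    # Assumption: the pool total is the sum over all queries seen.
    pool_total = sum(totals.values())

    stats = {"queries": top, "total_consumed": total_consumed}
    # The fraction is reported only when the pool total is non-zero.
    if pool_total > 0:
        stats["fraction_of_pool_total_mem"] = total_consumed / pool_total
    return stats
```

A query that runs on several hosts contributes once to the ranking, with
its per-host consumption added up. That is why an id can show a larger
figure in the aggregated report than in any single stats_on_host report.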

When a query request is queued due to memory exhaustion, the above
memory_consumption_statistics are logged when the logging level is
set to 2 or higher.

When a query request times out due to memory exhaustion, the above
memory_consumption_statistics are reported when the logging level is
set to 1 or higher.
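
The two thresholds can be summarized as a small lookup, sketched below in
Python. The event names and the function are hypothetical and exist only
to restate the rule. The level is treated as an opaque integer, and how it
is configured in impalad is not shown here.

```python
# Minimum logging level at which the statistics are emitted for each
# event, as described in the text. The event names are hypothetical.
REQUIRED_LEVEL = {
    "queued": 2,
    "timed_out": 1,
}


def should_report_stats(event: str, level: int) -> bool:
    """Return True if memory_consumption_statistics should be emitted."""
    if event not in REQUIRED_LEVEL:
        raise ValueError(f"unknown event: {event!r}")
    return level >= REQUIRED_LEVEL[event]
```

At level 1, a timeout produces the statistics but a queued query does not.
At level 2 or higher, both events do.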

Testing:
1. Added a new test TopNQueryCheck in admission-controller-test.cc to
   verify that the topN query memory consumption details are reported
   correctly.
2. Added two new tests in test_admission_controller.py to simulate
   queries being queued and then timed out due to pool or host memory
   pressure.
3. Added a new test TopN in mem-tracker-test.cc to
   verify that the topN query memory consumption details are computed
   correctly from a mem tracker hierarchy.
4. Ran Core tests successfully.

Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
---
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/util/container-util.h
M common/thrift/StatestoreService.thrift
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_admission_controller.py
10 files changed, 916 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16220/37
--
To view, visit http://gerrit.cloudera.org:8080/16220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Gerrit-Change-Number: 16220
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
