Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans
......................................................................

PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a
batch from an Avro scan. The post-IMPALA-3905 algorithm is a poor fit
for wide tables where each Avro block holds only a small number of rows.

Optimise memory transfer for selective scans: don't attach unused
decompression buffers to the output batch. Combined with the previous
change, this dramatically reduces the amount of memory transferred out
of scanner threads for selective scans of wide tables.
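The decompression-buffer change can be illustrated with a minimal sketch. It assumes that output rows may hold pointers into a decompressed block (for example, string data), so a buffer must live as long as any batch that references it; when a selective predicate leaves no rows from a block, nothing references the buffer and it can stay with the scanner instead of being transferred. `DecompressionBuffer`, `OutputBatch` and `FinishBlock()` are hypothetical names, not Impala's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer holding one decompressed Avro block.
struct DecompressionBuffer {
  std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for an output row batch that takes ownership of
// the buffers backing the rows it contains.
struct OutputBatch {
  std::vector<std::unique_ptr<DecompressionBuffer>> attached_buffers;
};

// Called when the scanner is done with a decompressed block.
// rows_from_block: rows from this block that passed the predicates and
// were added to 'batch'. Returns the bytes transferred to the batch.
std::size_t FinishBlock(std::unique_ptr<DecompressionBuffer> buffer,
                        int rows_from_block, OutputBatch* batch,
                        std::unique_ptr<DecompressionBuffer>* spare) {
  assert(buffer != nullptr);
  assert(rows_from_block >= 0);
  if (rows_from_block > 0) {
    // Output rows may point into this buffer, so it must travel with the batch.
    const std::size_t bytes = buffer->data.size();
    batch->attached_buffers.push_back(std::move(buffer));
    return bytes;
  }
  // No output row references the buffer: keep it in the scanner for reuse
  // (this frees any spare buffer previously held) instead of attaching it.
  *spare = std::move(buffer);
  return 0;
}
```

For a highly selective scan, most blocks take the second branch, so little or no decompressed data leaves the scanner thread.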

Cap the maximum row batch queue size at 5 * the number of active
scanner threads, so that num_scanner_threads gives better control over
memory consumption. On typical server configurations with 24+ cores,
this does not significantly reduce the default, except under high
concurrency or in low-memory environments, where the number of scanner
threads is limited. We should evaluate reducing the default further,
or otherwise better controlling memory consumption, in a follow-up
based on experiments.
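A minimal sketch of the cap, assuming the queue capacity is simply the smaller of the configured default and 5 batches per active scanner thread. `MaxRowBatchQueueSize()` and `kBatchesPerScannerThread` are hypothetical names, and how Impala derives the default itself is not shown here.

```cpp
#include <algorithm>
#include <cassert>

// Batches allowed per active scanner thread, per the cap described above.
constexpr int kBatchesPerScannerThread = 5;

// Hypothetical helper: capacity of the row batch queue given the configured
// default and the number of active scanner threads.
int MaxRowBatchQueueSize(int default_max_batches, int active_scanner_threads) {
  assert(default_max_batches > 0);
  assert(active_scanner_threads > 0);
  return std::min(default_max_batches,
                  kBatchesPerScannerThread * active_scanner_threads);
}
```

With a hypothetical default of 100 batches, 24 scanner threads leave the default unchanged (min(100, 120) = 100), while 4 threads cap the queue at 20 batches, which is why limiting num_scanner_threads now also limits queued memory.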

This change also makes some observability improvements, adding
counters that will make issues like this easier to diagnose:
* Add counters to give some insight into the row batch queue.
* Don't create AverageScannerThreadConcurrency for the MT scan node,
  where it's not actually used.
* Track the row batch queue's memory consumption against a sub-tracker.
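The queue counters and the sub-tracker can be sketched together, assuming a tracker hierarchy in which bytes consumed by a child are also charged to every ancestor, and a bounded blocking queue that counts how often producers or consumers had to wait. `SimpleMemTracker`, `RowBatchQueue`, `QueuedBatch` and the counter names are hypothetical; they are not the counters or classes added by this change.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical tracker: consumption recorded against a child is also
// charged to all of its ancestors.
class SimpleMemTracker {
 public:
  explicit SimpleMemTracker(SimpleMemTracker* parent = nullptr)
      : parent_(parent) {}
  void Consume(std::int64_t bytes) {
    for (SimpleMemTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_.fetch_add(bytes);
    }
  }
  void Release(std::int64_t bytes) { Consume(-bytes); }
  std::int64_t consumption() const { return consumption_.load(); }

 private:
  SimpleMemTracker* parent_;
  std::atomic<std::int64_t> consumption_{0};
};

// Hypothetical queued item; only its memory footprint matters here.
struct QueuedBatch {
  std::int64_t bytes;
};

// Bounded blocking queue that charges queued bytes to a sub-tracker and
// counts puts and how many puts/gets found the queue full/empty.
class RowBatchQueue {
 public:
  RowBatchQueue(std::size_t capacity, SimpleMemTracker* tracker)
      : capacity_(capacity), tracker_(tracker) {}

  void Put(QueuedBatch batch) {
    std::unique_lock<std::mutex> l(lock_);
    if (queue_.size() >= capacity_) ++num_blocked_puts_;
    not_full_.wait(l, [this] { return queue_.size() < capacity_; });
    tracker_->Consume(batch.bytes);
    queue_.push_back(batch);
    ++num_puts_;
    not_empty_.notify_one();
  }

  QueuedBatch Get() {
    std::unique_lock<std::mutex> l(lock_);
    if (queue_.empty()) ++num_blocked_gets_;
    not_empty_.wait(l, [this] { return !queue_.empty(); });
    QueuedBatch batch = queue_.front();
    queue_.pop_front();
    tracker_->Release(batch.bytes);
    not_full_.notify_one();
    return batch;
  }

  std::int64_t num_puts() const {
    std::lock_guard<std::mutex> l(lock_);
    return num_puts_;
  }
  std::int64_t num_blocked_puts() const {
    std::lock_guard<std::mutex> l(lock_);
    return num_blocked_puts_;
  }
  std::int64_t num_blocked_gets() const {
    std::lock_guard<std::mutex> l(lock_);
    return num_blocked_gets_;
  }

 private:
  const std::size_t capacity_;
  SimpleMemTracker* const tracker_;
  mutable std::mutex lock_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::deque<QueuedBatch> queue_;
  std::int64_t num_puts_ = 0;
  std::int64_t num_blocked_puts_ = 0;
  std::int64_t num_blocked_gets_ = 0;
};
```

Using a child tracker lets a profile report the queue's memory separately, while the parent's total (for example, the scan node's) still includes it.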

Testing:
Ran the repro in the JIRA. Memory consumption was reduced from ~500MB
to ~220MB on my system.

TODO:
- Run a stress test on a single node against Avro
- Run exhaustive tests

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/blocking-queue.h
13 files changed, 272 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/2
--
To view, visit http://gerrit.cloudera.org:8080/10550

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong
