Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10550 )
Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans
......................................................................

PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a batch from an Avro scan. The post-IMPALA-3905 algorithm is bad for wide tables where there are only a small number of rows per Avro block.

Optimise memory transfer for selective scans - don't attach unused decompression buffers to the output batch. Combined with the previous change, this dramatically reduces the amount of memory transferred out of scanner threads for selective scans of wide tables.

Cap the maximum row batch queue size at 5 * the number of active scanner threads. This means that num_scanner_threads gives better control over memory consumption. It does not reduce the default significantly on typical server configurations that would have 24+ cores except under high concurrency or low memory environments where the number of scanner threads is limited. We should evaluate reducing the default further or otherwise better controlling memory consumption in a follow-up, based on experiments.

Includes some observability improvements including additional counters that will help diagnose issues like this more easily:
* Add counters to give some insight into row batch queue.
* Don't create AverageScannerThreadConcurrency for MT scan node where it's not actually used.
* Track the row batch queue memory consumption against a sub-tracker

Testing:
Ran the repro in the JIRA. Memory consumption was reduced from ~500MB to ~220MB on my system.

TODO:
- running stress test on a single node against Avro
- Running exhaustive tests

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/blocking-queue.h
13 files changed, 272 insertions(+), 82 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/2
--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <[email protected]>
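
Illustrative note on the queue cap: the rule "cap the maximum row batch queue size at 5 * the number of active scanner threads" could be sketched roughly as below. This is not code from the patch. The helper name EffectiveQueueCap, the constant name kBatchesPerScannerThread, and the floor of one batch are assumptions made for illustration; only the factor of 5 comes from the change message.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constant name; the factor 5 is taken from the change message.
constexpr int64_t kBatchesPerScannerThread = 5;

// Hypothetical helper, not the function used in the patch.
// Returns the number of row batches the queue may hold, given the
// configured maximum and the number of currently active scanner threads.
int64_t EffectiveQueueCap(int64_t configured_max, int64_t active_scanner_threads) {
  // Assumption: always leave room for at least one batch so that a
  // producer can make progress even when no thread is counted as active.
  const int64_t per_thread_cap =
      std::max<int64_t>(1, kBatchesPerScannerThread * active_scanner_threads);
  // The per-thread cap can only lower the configured maximum, never raise it.
  return std::min(configured_max, per_thread_cap);
}
```

Under these assumptions, a query limited to 4 scanner threads with a configured maximum of 200 batches would be capped at 20, while a host running 24 or more threads with a small configured maximum would keep that configured value.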
