Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10977 )
Change subject: IMPALA-7296: bytes limit for row batch queue
......................................................................

IMPALA-7296: bytes limit for row batch queue

https://goo.gl/N9LgQt summarises the memory problems I'm trying to solve
here.

Limit the number of enqueued row batches to a number of bytes, instead
of limiting the total number of batches. This helps avoid pathologically
high memory consumption for wide rows where the # batches limit does
not effectively limit the memory consumption.

The bytes limit only lowers the effective capacity of the queue for
wider rows, typically 150 bytes or wider. These are the cases when we
want to reduce the queue's capacity. E.g. on a system with 10 disks, the
previous sizing gave a queue of 100 batches. If we assume rows with
10x16 byte columns, then 100 batches is ~16MB of data.

Remove RowBatchQueueCapacity counter that is less relevant now and was
not correctly initialised.

Testing:
Added some basic unit tests.

Add regression test that fails reliably before this change.

Ran exhaustive build.

Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681
---
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/runtime/row-batch-queue.cc
M be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue-test.cc
M be/src/util/blocking-queue.h
M tests/common/test_dimensions.py
M tests/query_test/test_mem_usage_scaling.py
8 files changed, 204 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/10977/2
--
To view, visit http://gerrit.cloudera.org:8080/10977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681
Gerrit-Change-Number: 10977
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <[email protected]>
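
The sizing argument in the commit message (100 batches of rows with 10x16 byte columns is ~16MB) can be illustrated with a minimal sketch of a queue bounded by both item count and total bytes. This is not the code in be/src/util/blocking-queue.h. The class name ByteLimitedQueue, the helper fill_count, the 1024 rows per batch, and the 8 MiB byte limit are all assumptions chosen for illustration; the message does not state the actual limit. The rule of always admitting into an empty queue is likewise a design choice of this sketch, not something taken from the patch.

```python
import threading
from collections import deque

# Assumed for illustration only; the commit message does not state these.
ROWS_PER_BATCH = 1024            # assumed rows per batch
BYTES_LIMIT = 8 * 1024 * 1024    # hypothetical byte limit, not the patch's value


class ByteLimitedQueue:
    """Blocking FIFO bounded by item count and by total bytes (sketch)."""

    def __init__(self, max_items, max_bytes):
        self._max_items = max_items
        self._max_bytes = max_bytes
        self._items = deque()
        self._bytes = 0
        self._cond = threading.Condition()

    def _has_room(self, nbytes):
        # Always admit into an empty queue so that a single item larger
        # than max_bytes cannot block the producer forever.
        if not self._items:
            return True
        return (len(self._items) < self._max_items
                and self._bytes + nbytes <= self._max_bytes)

    def put(self, item, nbytes, timeout=None):
        """Enqueue item; return False if no room appeared within timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_room(nbytes), timeout):
                return False
            self._items.append((item, nbytes))
            self._bytes += nbytes
            self._cond.notify_all()
            return True

    def get(self, timeout=None):
        """Dequeue the oldest item; return None if empty after timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._items), timeout):
                return None
            item, nbytes = self._items.popleft()
            self._bytes -= nbytes
            self._cond.notify_all()
            return item


def fill_count(queue, batch_bytes):
    """Count how many equally sized batches the queue accepts without blocking."""
    n = 0
    while queue.put(n, batch_bytes, timeout=0):
        n += 1
    return n


wide_batch = 10 * 16 * ROWS_PER_BATCH   # 160-byte rows
narrow_batch = 16 * ROWS_PER_BATCH      # 16-byte rows

print(100 * wide_batch)                                          # bytes held by 100 wide batches
print(fill_count(ByteLimitedQueue(100, BYTES_LIMIT), wide_batch))
print(fill_count(ByteLimitedQueue(100, BYTES_LIMIT), narrow_batch))
```

Under these assumed numbers, 100 wide batches hold 16,384,000 bytes, which matches the ~16MB figure in the message. The byte limit is what stops the wide-row queue from filling to 100 batches, while the narrow-row queue still reaches the count limit, so only the wide-row case sees a reduced effective capacity.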
