Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9449 )
Change subject: IMPALA-6585: increase test_low_mem_limit_q21 limit
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9449/1/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/9449/1/tests/query_test/test_mem_usage_scaling.py@133
PS1, Line 133: 300
> before the scan change, this was at 187. Do we know why it has gone up so m
It looks like it's the 3 scans of lineitem that are using most of the memory. They are somewhat over-reserving memory; e.g. here's the extreme case in scan node 5, where PeakReservation is 72MB but PeakUsedReservation is only 16MB, so ~56MB is reserved but never used:

{noformat}
HDFS_SCAN_NODE (id=5):(Total: 88.007ms, non-child: 88.007ms, % non-child: 100.00%)
...
   - PeakMemoryUsage: 77.19 MB (80938074)
...
  Buffer pool:
     - AllocTime: 4.000ms
     - CumulativeAllocationBytes: 16.12 MB (16908288)
     - CumulativeAllocations: 5 (5)
     - PeakReservation: 72.00 MB (75497472)
     - PeakUnpinnedBytes: 0
     - PeakUsedReservation: 16.00 MB (16777216)
     - ReadIoBytes: 0
     - ReadIoOps: 0 (0)
     - ReadIoWaitTime: 0.000ns
     - WriteIoBytes: 0
     - WriteIoOps: 0 (0)
     - WriteIoWaitTime: 0.000ns
{noformat}

For reference, the plan only shows a 32MB mem-reservation for this node:

{noformat}
|  05:SCAN HDFS [tpch_parquet.lineitem l3, RANDOM]
|  |  partitions=1/1 files=3 size=193.71MB
|  |  predicates: l3.l_receiptdate > l3.l_commitdate
|  |  stored statistics:
|  |    table: rows=6001215 size=193.71MB
|  |    columns: all
|  |  extrapolated-rows=disabled
|  |  mem-estimate=320.00MB mem-reservation=32.00MB
|  |  tuple-ids=6 row-size=68B cardinality=600122
{noformat}

So it looks like the scan is able to get the "ideal" reservation at startup, when there's plenty of memory, and ends up holding more than it needs. Then at some point the non-buffer-pool memory from the scans and exchanges exceeds 20% of the total query memory (with a 300MB limit, that leaves only ~60MB outside the buffer pool) and runs into the reserved memory, causing the query failure. So the root cause of the difference is that we're reserving more memory upfront, whereas the old code only allocated I/O buffers on demand and therefore had a lower peak.

One interesting thing is that the query would succeed with a lower memory limit if it didn't try to get the ideal reservation. That suggests one solution: only increase the reservation to the "ideal" amount once we know that we need it for Parquet.
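Roughly the shape I have in mind is sketched below. To be clear, this is not our actual BufferPool/ScanNode API; every name here is hypothetical, and the constants just mirror the profile above (16MB actually used vs. 72MB reserved upfront):

{noformat}
#include <algorithm>
#include <cstdint>
#include <iostream>

// Numbers taken from the scan node 5 profile above.
constexpr int64_t kMinReservation = 16LL << 20;    // 16MB: enough to start the scan
constexpr int64_t kIdealReservation = 72LL << 20;  // 72MB: what the scan reserved upfront

// Hypothetical stand-in for a scan node's buffer pool client.
struct ScanReservation {
  int64_t reserved = kMinReservation;
  int64_t used = 0;

  // Grow the reservation on demand, capped at the ideal amount, instead of
  // claiming the ideal amount at startup. Returns false if growing would
  // exceed the memory still available to the query.
  bool EnsureCapacity(int64_t needed_bytes, int64_t available_bytes) {
    if (used + needed_bytes > reserved) {
      int64_t grow_to = std::min(used + needed_bytes, kIdealReservation);
      if (grow_to - reserved > available_bytes) return false;
      reserved = grow_to;
    }
    used += needed_bytes;
    return true;
  }
};

int main() {
  ScanReservation scan;
  // The profile shows a PeakUsedReservation of only 16MB, so with on-demand
  // growth the scan never claims the other 56MB of "ideal" reservation.
  scan.EnsureCapacity(16LL << 20, /*available_bytes=*/0);
  std::cout << "reserved " << (scan.reserved >> 20) << "MB instead of "
            << (kIdealReservation >> 20) << "MB upfront\n";
  return 0;
}
{noformat}

Under a scheme like that, the 56MB gap between the ideal and used reservation is only claimed when the Parquet reader actually needs it, so the non-reserved 20% of query memory wouldn't get squeezed by reservation that is never used.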
--
To view, visit http://gerrit.cloudera.org:8080/9449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c721a154e7f8fbb19d043e03fd001990be3f5fd
Gerrit-Change-Number: 9449
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 26 Feb 2018 19:44:28 +0000
Gerrit-HasComments: Yes