Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5648: fix count(*) mem estimate regression
......................................................................

IMPALA-5648: fix count(*) mem estimate regression

The metadata-only scan doesn't allocate I/O buffers, contrary to an
assumption of the memory estimation code in the planner.

This fix also sets a floor on the memory estimate, to avoid estimating
0 bytes. 1MB seems like a reasonable approximation: I ran metadata-only
scans on a few different data sizes and saw numbers from 128KB to 1MB.

The estimate is now much closer to actual consumption (it was 80MB
before):

[localhost:21000] > select count(*) from tpch_parquet.lineitem; summary;
Query: select count(*) from tpch_parquet.lineitem
Query submitted at: 2017-08-23 11:58:29 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=cb4b8d41fc838c9a:c5496ff300000000
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
Fetched 1 row(s) in 0.13s
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| 03:AGGREGATE | 1      | 168.49us | 168.49us | 1     | 1          | 28.00 KB  | 10.00 MB      | FINALIZE              |
| 02:EXCHANGE  | 1      | 30.11ms  | 30.11ms  | 3     | 1          | 0 B       | 0 B           | UNPARTITIONED         |
| 01:AGGREGATE | 3      | 2.05us   | 6.14us   | 3     | 1          | 20.00 KB  | 10.00 MB      |                       |
| 00:SCAN HDFS | 3      | 4.58ms   | 4.72ms   | 3     | 6.00M      | 128.00 KB | 1.00 MB       | tpch_parquet.lineitem |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+

Testing:
Updated affected planner tests.

Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
3 files changed, 21 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7783/3
--
To view, visit http://gerrit.cloudera.org:8080/7783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
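
For readers without the diff at hand, the two ideas in the commit message (skip the I/O buffer term for metadata-only scans, then clamp the result to a 1MB floor) can be sketched roughly as below. This is not the code in HdfsScanNode.java: the class, method, and parameter names are hypothetical, and the 80MB input only reuses the "before" figure quoted in the commit message as an illustrative I/O buffer size.

```java
// Hypothetical sketch; none of these names come from the Impala source.
final class ScanMemEstimateSketch {
    // Floor on the estimate. 1MB is the value named in the commit message.
    static final long MIN_ESTIMATE_BYTES = 1024L * 1024L;

    /**
     * Returns a per-host memory estimate for a scan.
     *
     * @param metadataOnly  true if the scan only reads file metadata
     * @param ioBufferBytes memory assumed for I/O buffers in a regular scan
     * @param otherBytes    any other memory the scan is assumed to need
     */
    static long estimateBytes(boolean metadataOnly, long ioBufferBytes, long otherBytes) {
        long estimate = otherBytes;
        // A metadata-only scan allocates no I/O buffers, so leave that term out.
        if (!metadataOnly) {
            estimate += ioBufferBytes;
        }
        // Never report less than the floor, which also rules out a 0-byte estimate.
        return Math.max(estimate, MIN_ESTIMATE_BYTES);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Metadata-only scan: the I/O buffer term is dropped and the floor applies.
        System.out.println(estimateBytes(true, 80 * mb, 0));
        // Regular scan: the I/O buffer term dominates.
        System.out.println(estimateBytes(false, 80 * mb, 0));
    }
}
```

Applying the floor last, after all terms are summed, keeps the clamp independent of how the individual terms are computed.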
