[ https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pooja Nilangekar resolved IMPALA-7749. -------------------------------------- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Merge aggregation node memory estimate is incorrectly influenced by limit > ------------------------------------------------------------------------- > > Key: IMPALA-7749 > URL: https://issues.apache.org/jira/browse/IMPALA-7749 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend > Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0 > Reporter: Tim Armstrong > Assignee: Pooja Nilangekar > Priority: Critical > Fix For: Impala 3.1.0 > > > In the below query the estimate for node ID 3 is too low. If you remove the > limit it is correct. > {noformat} > [localhost:21000] default> set explain_level=2; explain select l_orderkey, > l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5; > EXPLAIN_LEVEL set to 2 > Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from > tpch.lineitem group by 1, 2, 3 limit 5 > +-------------------------------------------------------------------------------------------+ > | Explain String > | > +-------------------------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4 > | > | Per-Host Resource Estimates: Memory=450MB > | > | > | > | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | > | | Per-Host Resources: mem-estimate=0B mem-reservation=0B > thread-reservation=1 | > | PLAN-ROOT SINK > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | | > | > | 04:EXCHANGE [UNPARTITIONED] > | > | | limit: 5 > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | | tuple-ids=1 row-size=28B cardinality=5 > | > | | in pipelines: 03(GETNEXT) > | > | | > | > | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 > instances=3 | > | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB > thread-reservation=1 | > | 03:AGGREGATE [FINALIZE] > | > | | output: count:merge(*) > | > | | group by: l_orderkey, l_partkey, l_linenumber > | > | | limit: 5 > | > | | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB > thread-reservation=0 | > | | tuple-ids=1 row-size=28B cardinality=5 > | > | | in pipelines: 03(GETNEXT), 00(OPEN) > | > | | > | > | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)] > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | | tuple-ids=1 row-size=28B cardinality=6001215 > | > | | in pipelines: 00(GETNEXT) > | > | | > | > | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 > | > | Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB > thread-reservation=2 | > | 01:AGGREGATE [STREAMING] > | > | | output: count(*) > | > | | group by: l_orderkey, l_partkey, l_linenumber > | > | | mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | tuple-ids=1 row-size=28B cardinality=6001215 > | > | | in pipelines: 00(GETNEXT) > | > | | > | > | 00:SCAN HDFS [tpch.lineitem, RANDOM] > | > | partitions=1/1 files=1 size=718.94MB > | > | stored statistics: > | > | table: rows=6001215 size=718.94MB > | > | columns: all > | > | extrapolated-rows=disabled max-scan-range-rows=1068457 > | > | mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1 > | > | tuple-ids=0 row-size=20B cardinality=6001215 > | > | in pipelines: 00(GETNEXT) > | > +-------------------------------------------------------------------------------------------+ > {noformat} > The bug is that we use cardinality_ to cap the number of distinct values, but > cardinality_ is capped at the output limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org