[ https://issues.apache.org/jira/browse/DRILL-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gautam Parai updated DRILL-7231:
--------------------------------
    Description:
The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.

One of the fixes for DRILL-7148 introduced a change in DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when not all columns in the group-by key have NDV statistics. However, the fix was incorrect and instead caused it to use the guesstimated NDV even when statistics were present.

Because the regression caused the NDV to be estimated as 0.1 * input_row_count, the join cardinality for TPCDS-21 was severely underestimated: 400M * 15 / Max(400K, 15) = 150.

  was:The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.

> TPCDS-21 regresses after fix for DRILL-7148
> -------------------------------------------
>
>                 Key: DRILL-7231
>                 URL: https://issues.apache.org/jira/browse/DRILL-7231
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The join rowcount regresses a lot after changes made for DRILL-7148. This
> affects several TPC-DS queries.
> One of the fixes for DRILL-7148 introduced a change in
> DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when
> not all columns in the group-by key have NDV statistics. However, the fix was
> incorrect and instead caused it to use the guesstimated NDV even when
> statistics were present.
> Because the regression caused the NDV to be estimated as 0.1 * input_row_count,
> the join cardinality for TPCDS-21 was severely underestimated:
> 400M * 15 / Max(400K, 15) = 150.
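
For readers following the arithmetic, here is a minimal Python sketch of the two estimates the description refers to. It is not Drill's code: the function names, the way per-column NDVs are combined (a product capped at the row count), and the join formula `left_rows * right_rows / max(left_ndv, right_ndv)` are illustrative assumptions modelled on the expression quoted in the description.

```python
from typing import Mapping, Optional, Sequence

# The 0.1 * input_row_count guess named in the issue description.
GUESS_FRACTION = 0.1


def distinct_row_count(input_row_count: float,
                       group_by_cols: Sequence[str],
                       ndv_stats: Mapping[str, Optional[float]]) -> float:
    """Intended behaviour as described: guess only when some key column has no NDV."""
    ndvs = [ndv_stats.get(col) for col in group_by_cols]
    if any(ndv is None for ndv in ndvs):
        return GUESS_FRACTION * input_row_count
    # Assumption (not from the issue): combine per-column NDVs as a product,
    # capped at the input row count.
    combined = 1.0
    for ndv in ndvs:
        combined *= ndv
    return min(combined, input_row_count)


def regressed_distinct_row_count(input_row_count: float) -> float:
    """Regressed behaviour as described: the guess is used even when statistics exist."""
    return GUESS_FRACTION * input_row_count


def join_row_count(left_rows: float, right_rows: float,
                   left_ndv: float, right_ndv: float) -> float:
    """Join estimate in the shape quoted in the description (an assumed formula)."""
    return left_rows * right_rows / max(left_ndv, right_ndv)


if __name__ == "__main__":
    left_rows, right_rows, right_ndv = 400e6, 15, 15
    guessed_ndv = regressed_distinct_row_count(left_rows)
    print(guessed_ndv)
    print(join_row_count(left_rows, right_rows, guessed_ndv, right_ndv))
```

With a 400M-row input the guessed NDV is 0.1 * 400M = 40M, and 400M * 15 / max(40M, 15) comes to 150, which matches the figure reported for TPCDS-21. Evaluating the expression with 400K as the larger NDV, as it is written in the description, gives 15,000 instead.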