[ 
https://issues.apache.org/jira/browse/DRILL-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai updated DRILL-7231:
--------------------------------
    Description: 
The join rowcount regresses a lot after changes made for DRILL-7148. This 
affects several TPC-DS queries.

One of the fixes for DRILL-7148 introduced a change in 
DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when 
not all columns in the group-by key have NDV statistics. However, the fix was 
incorrect and instead caused it to use the guesstimated NDV even when 
statistics were present.
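
The intended behavior described above can be sketched roughly as follows. 
This is an illustrative Python sketch, not Drill's actual Java code: the 
function name estimate_distinct_rowcount, the ndv_stats mapping, and the way 
per-column NDVs are combined (a product capped at the input row count) are 
all assumptions made for the example.

```python
from math import prod

# Fallback guess from the description: 0.1 * input_row_count
GUESS_FRACTION = 0.1


def estimate_distinct_rowcount(input_row_count, group_by_cols, ndv_stats):
    """Hypothetical sketch of the intended NDV selection logic.

    ndv_stats maps column name -> NDV for the columns that have statistics.
    """
    missing = [c for c in group_by_cols if c not in ndv_stats]
    if missing:
        # Use the guess only when some group-by column lacks NDV statistics.
        return GUESS_FRACTION * input_row_count
    # Assumption: combine per-column NDVs as a product capped at the row count.
    combined = prod(ndv_stats[c] for c in group_by_cols)
    return min(combined, input_row_count)
```

The regression corresponds to taking the fallback branch even when the 
missing list is empty.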

Since the NDV was estimated as 0.1 * input_row_count because of the 
regression, the join cardinality for TPCDS-21 was severely underestimated: 
400M * 15 / Max(40M, 15) = 150.
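
The arithmetic of the regressed estimate can be reproduced with a short 
Python sketch. The function name join_cardinality is hypothetical, and the 
formula (product of the input row counts divided by the larger NDV) is read 
off the expression in the description rather than taken from Drill's code. 
Treating 15 as both the row count and the NDV of the smaller input is also 
an assumption.

```python
def join_cardinality(left_rows, right_rows, left_ndv, right_ndv):
    """Equi-join estimate: |L| * |R| / max(NDV_L, NDV_R)."""
    return left_rows * right_rows / max(left_ndv, right_ndv)


left_rows = 400_000_000   # 400M rows, from the description
right_rows = 15           # from the description
right_ndv = 15            # assumption: second argument of Max in the description

# Regressed behavior: NDV guessed as 0.1 * input_row_count despite statistics.
guessed_left_ndv = 0.1 * left_rows

regressed_estimate = join_cardinality(
    left_rows, right_rows, guessed_left_ndv, right_ndv
)
print(regressed_estimate)
```

Because the guessed NDV grows with the input row count, the estimate 
collapses to roughly right_rows / 0.1 regardless of how large the bigger 
input is.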

  was:The join rowcount regresses a lot after changes made for DRILL-7148. This 
affects several TPC-DS queries.


> TPCDS-21 regresses after fix for DRILL-7148
> -------------------------------------------
>
>                 Key: DRILL-7231
>                 URL: https://issues.apache.org/jira/browse/DRILL-7231
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
