[ https://issues.apache.org/jira/browse/HIVE-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877945#comment-16877945 ]

Jesus Camacho Rodriguez edited comment on HIVE-21928 at 7/3/19 4:08 PM:
------------------------------------------------------------------------

[~kgyrtkirk], can you take a look at https://github.com/apache/hive/pull/700 ?
The current patch scales the NDV (number of distinct values) of all columns
involved in AND clauses proportionally to the reduction in the number of rows.
As you can see from the q file changes, the net effect is an increase in the
estimated number of rows for joins that follow another join.
bq. I think the following continue block should be removed; even though the
rowcount is not changed, the affectedcolumns might have; is there any reason I
don't see why we should do it?
If you consider each column independent, then we could skip... but maybe this
should only be done once we compute the reduction ratio per column, as discussed
above, since currently we just scale the NDV of all columns involved in the
predicate proportionally. I will create a follow-up.
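
A minimal sketch of the proportional scaling described above, and of why it raises the estimate for a following join. This is not the code from the pull request: the class and method names (NdvScalingSketch, scaleNdv, estimateJoinRows) are hypothetical, and the join estimate is the textbook formula rows(L) * rows(R) / max(ndv(L), ndv(R)), assumed here only for illustration.

```java
// Hypothetical illustration; names and formulas are assumptions, not Hive's code.
final class NdvScalingSketch {
    private NdvScalingSketch() {}

    /** Scales a column's NDV by the same ratio by which the filter reduced the row count. */
    static long scaleNdv(long ndv, long rowsBefore, long rowsAfter) {
        if (rowsBefore <= 0 || rowsAfter >= rowsBefore) {
            return ndv; // no reduction in rows, so leave the NDV untouched
        }
        if (rowsAfter <= 0) {
            return 0L; // no rows survive, so no distinct values survive either
        }
        double ratio = (double) rowsAfter / rowsBefore;
        long scaled = Math.round(ndv * ratio);
        // Keep at least one distinct value, and never more than the surviving rows.
        return Math.max(1L, Math.min(scaled, rowsAfter));
    }

    /** Textbook equi-join estimate: |L| * |R| / max(ndv(L), ndv(R)). */
    static long estimateJoinRows(long leftRows, long rightRows, long leftNdv, long rightNdv) {
        long denominator = Math.max(1L, Math.max(leftNdv, rightNdv));
        return Math.multiplyExact(leftRows, rightRows) / denominator;
    }
}
```

Under these assumptions, a filter that keeps 500 of 1000 rows turns an NDV of 100 into 50. Joining those 500 rows with a 1000-row input whose join key has 20 distinct values then gives 500 * 1000 / 50 = 10000 rows with the scaled NDV, against 500 * 1000 / 100 = 5000 with the unscaled one, which matches the direction of the q file changes.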





> Fix for statistics annotation in nested AND expressions
> -------------------------------------------------------
>
>                 Key: HIVE-21928
>                 URL: https://issues.apache.org/jira/browse/HIVE-21928
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: HIVE-21928.01.patch, HIVE-21928.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Discovered while working on HIVE-21867. Predicates with nested AND
> expressions may produce different stats even when the predicates are
> basically the same from a stats estimation standpoint.
> For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.
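
The expectation in the description can be sketched as follows, assuming independent conjuncts whose selectivities multiply and a literal true with selectivity 1. The types and names (AndSelectivitySketch, Leaf, And, estimateRows) are hypothetical and do not mirror Hive's expression or statistics classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the expected invariant; not Hive's implementation.
final class AndSelectivitySketch {
    private AndSelectivitySketch() {}

    interface Pred {}

    /** A leaf predicate with an assumed, already-known selectivity in [0, 1]. */
    static final class Leaf implements Pred {
        final double selectivity;
        Leaf(double selectivity) { this.selectivity = selectivity; }
    }

    /** A conjunction; children may themselves be conjunctions (nested AND). */
    static final class And implements Pred {
        final List<Pred> children;
        And(Pred... children) { this.children = Arrays.asList(children); }
    }

    /** The literal true filters nothing, so its selectivity is 1. */
    static final Leaf TRUE = new Leaf(1.0);

    /** Multiplies the selectivities of all conjuncts, recursing into nested ANDs. */
    static double selectivity(Pred p) {
        if (p instanceof Leaf) {
            return ((Leaf) p).selectivity;
        }
        double result = 1.0;
        for (Pred child : ((And) p).children) {
            result *= selectivity(child);
        }
        return result;
    }

    static long estimateRows(long inputRows, Pred p) {
        return Math.round(inputRows * selectivity(p));
    }
}
```

With this model, a leaf standing in for x=5 and the same leaf wrapped as AND(x=5, true, true), or as AND(AND(x=5, true), true), all yield the same row estimate, which is the behavior the issue asks for.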



