[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez reassigned HIVE-20332:
----------------------------------------------


> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour 
> incremental rebuild
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20332
>                 URL: https://issues.apache.org/jira/browse/HIVE-20332
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> Currently, we do not expose stats over {{ROW__ID.writeId}} to the optimizer.
> Even if we did, we always assume a uniform distribution of column values,
> which can easily lead to overestimating the number of rows read when we
> filter on {{ROW__ID.writeId}} for materialized views (think of one large
> transaction for MV creation followed by small ones for incremental
> maintenance). This overestimation can prevent incremental view maintenance
> from being triggered, because the cost of the incremental plan is
> overestimated (we think we will read more rows than we actually do). This
> could be fixed by introducing histograms that better reflect the
> distribution of column values.
> Until then, we will use a config variable that sets the selectivity of
> filter conditions on {{ROW__ID}} during cost calculation. Setting that
> variable to a low value will favour incremental rebuild over full rebuild.
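
To make the overestimation concrete, here is a rough sketch of the arithmetic, not Hive's actual cost model. The table contents, the function names, and the 0.001 override value are all hypothetical; the uniform model used here simply assumes that every distinct writeId holds the same number of rows.

```python
# Hypothetical table: writeId 1 is the large transaction that existed when the
# MV was created; writeIds 2..5 are small incremental transactions.
rows_per_write_id = {1: 1_000_000, 2: 100, 3: 100, 4: 100, 5: 100}
mv_snapshot_write_id = 1  # the MV already reflects everything up to this writeId


def actual_rows_read(rows_by_id, snapshot_id):
    """Rows that really satisfy ROW__ID.writeId > snapshot_id."""
    return sum(n for wid, n in rows_by_id.items() if wid > snapshot_id)


def uniform_estimate(rows_by_id, snapshot_id):
    """Estimate assuming each distinct writeId holds the same number of rows."""
    total_rows = sum(rows_by_id.values())
    matching_ids = sum(1 for wid in rows_by_id if wid > snapshot_id)
    return total_rows * matching_ids / len(rows_by_id)


def fixed_selectivity_estimate(rows_by_id, selectivity):
    """Estimate using a configured selectivity for the ROW__ID filter."""
    return sum(rows_by_id.values()) * selectivity


actual = actual_rows_read(rows_per_write_id, mv_snapshot_write_id)
uniform = uniform_estimate(rows_per_write_id, mv_snapshot_write_id)
# 0.001 is an assumed value for the config variable, chosen only for illustration.
configured = fixed_selectivity_estimate(rows_per_write_id, 0.001)

print(f"actual rows read:           {actual}")
print(f"uniform-distribution guess: {uniform:.0f}")
print(f"fixed-selectivity guess:    {configured:.0f}")
```

In this made-up case the uniform assumption overestimates the rows read by roughly three orders of magnitude, while a low fixed selectivity keeps the estimate close to the real figure, which is what lets the incremental plan win on cost.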



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)