[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2019-04-02 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20382.01.patch, HIVE-20382.02.patch, 
> HIVE-20382.02.patch, HIVE-20382.patch, HIVE-20382.patch
>
>
> Currently, we do not expose statistics over ROW__ID.writeId to the optimizer
> (this should be fixed by HIVE-20313). Even if we did, we always assume a
> uniform distribution of the column values, which can easily lead to
> overestimating the number of rows read when we filter on ROW__ID.writeId
> for materialized views (think of one large transaction for MV creation
> followed by small ones for incremental maintenance). This overestimation
> can prevent incremental view maintenance from being triggered, because the
> cost of the incremental plan is overestimated (we think we will read more
> rows than we actually do). This could be fixed by introducing histograms
> that better reflect the distribution of the column values.
> Until both fixes are implemented, we will use a config variable that
> multiplies the estimated cost of the rebuild plan, so that incremental
> rebuild can be favoured over full rebuild.
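
The overestimation and the proposed multiplier can be illustrated with a
small, self-contained sketch. This is not Hive code: the function names, the
toy cost model (rows read times a per-row overhead) and the factor value 0.1
are assumptions made purely for illustration, and the sketch assumes the
factor scales the estimated cost of the incremental plan.

```python
from __future__ import annotations


def uniform_estimate(rows_per_write_id: dict[int, int], last_write_id: int) -> float:
    """Rows estimated for `writeId > last_write_id` when rows are assumed
    to be spread evenly over the distinct write IDs."""
    total_rows = sum(rows_per_write_id.values())
    matching_ids = sum(1 for w in rows_per_write_id if w > last_write_id)
    return total_rows * matching_ids / len(rows_per_write_id)


def actual_rows(rows_per_write_id: dict[int, int], last_write_id: int) -> int:
    """Rows that really satisfy `writeId > last_write_id`."""
    return sum(n for w, n in rows_per_write_id.items() if w > last_write_id)


def choose_rebuild(incremental_cost: float, full_cost: float, factor: float) -> str:
    """Pick the cheaper plan after scaling the incremental estimate.

    `factor` is a hypothetical stand-in for the config variable; a value
    below 1.0 biases the choice towards the incremental plan.
    """
    return "incremental" if incremental_cost * factor < full_cost else "full"


# One large transaction creates the MV data, ten small ones follow.
rows = {1: 1_000_000, **{w: 100 for w in range(2, 12)}}

estimated = uniform_estimate(rows, last_write_id=1)  # 910000.0
real = actual_rows(rows, last_write_id=1)            # 1000

# Toy cost model (assumption): incremental maintenance costs 1.5 units per
# row read, a full rebuild costs 1 unit per row in the table.
full_cost = float(sum(rows.values()))
incremental_cost = estimated * 1.5

print(choose_rebuild(incremental_cost, full_cost, factor=1.0))  # full
print(choose_rebuild(incremental_cost, full_cost, factor=0.1))  # incremental
```

With the uniform assumption the incremental plan appears to read 910,000
rows although only 1,000 rows qualify, so the unscaled comparison picks the
full rebuild. Scaling the estimate compensates for this until better
statistics are available.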



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2019-04-02 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Attachment: HIVE-20382.02.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.01.patch, HIVE-20382.02.patch, 
> HIVE-20382.02.patch, HIVE-20382.patch, HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2019-04-01 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Attachment: HIVE-20382.02.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.01.patch, HIVE-20382.02.patch, 
> HIVE-20382.patch, HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2019-04-01 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Attachment: HIVE-20382.01.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.01.patch, HIVE-20382.patch, HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2018-08-14 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Attachment: HIVE-20382.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.patch, HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2018-08-13 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Description: 
Currently, we do not expose statistics over ROW\_\_ID.writeId to the optimizer
(this should be fixed by HIVE-20313). Even if we did, we always assume a
uniform distribution of the column values, which can easily lead to
overestimating the number of rows read when we filter on ROW\_\_ID.writeId for
materialized views (think of one large transaction for MV creation followed by
small ones for incremental maintenance). This overestimation can prevent
incremental view maintenance from being triggered, because the cost of the
incremental plan is overestimated (we think we will read more rows than we
actually do). This could be fixed by introducing histograms that better
reflect the distribution of the column values.

Until both fixes are implemented, we will use a config variable that
multiplies the estimated cost of the rebuild plan, so that incremental
rebuild can be favoured over full rebuild.

  was:
Currently, we do not expose statistics over ROW__ID.writeId to the optimizer
(this should be fixed by HIVE-20313). Even if we did, we always assume a
uniform distribution of the column values, which can easily lead to
overestimating the number of rows read when we filter on ROW__ID.writeId for
materialized views (think of one large transaction for MV creation followed by
small ones for incremental maintenance). This overestimation can prevent
incremental view maintenance from being triggered, because the cost of the
incremental plan is overestimated (we think we will read more rows than we
actually do). This could be fixed by introducing histograms that better
reflect the distribution of the column values.

Until both fixes are implemented, we will use a config variable that
multiplies the estimated cost of the rebuild plan, so that incremental
rebuild can be favoured over full rebuild.


> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2018-08-13 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Attachment: HIVE-20382.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2018-08-13 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Status: Patch Available  (was: In Progress)

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.patch





[jira] [Updated] (HIVE-20382) Materialized views: Introduce heuristic to favour incremental rebuild

2018-08-13 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20382:
---
Description: 
Currently, we do not expose statistics over ROW__ID.writeId to the optimizer
(this should be fixed by HIVE-20313). Even if we did, we always assume a
uniform distribution of the column values, which can easily lead to
overestimating the number of rows read when we filter on ROW__ID.writeId for
materialized views (think of one large transaction for MV creation followed by
small ones for incremental maintenance). This overestimation can prevent
incremental view maintenance from being triggered, because the cost of the
incremental plan is overestimated (we think we will read more rows than we
actually do). This could be fixed by introducing histograms that better
reflect the distribution of the column values.

Until both fixes are implemented, we will use a config variable that
multiplies the estimated cost of the rebuild plan, so that incremental
rebuild can be favoured over full rebuild.

> Materialized views: Introduce heuristic to favour incremental rebuild
> -
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major


