[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-17 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14018:
--
Labels: TODOC2.1.1 TODOC2.2  (was: )

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Labels: TODOC2.1.1, TODOC2.2
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead 
> of just dividing incoming number of rows by 2). However, as the distribution 
> of values of the columns is considered uniform, we might end up heavily 
> underestimating/overestimating the resulting number of rows.
> This issue is to add a factor that multiplies the IN clause estimation so we 
> can alleviate this problem. The solution is not very elegant, but it is the 
> best we can do until we have histograms to improve our estimate.
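
To illustrate the description above, here is a minimal sketch of a uniform-distribution IN estimate scaled by a tunable factor. It is not taken from the attached patches: the function name, the parameter names (`num_rows`, `ndv`, `in_list_size`, `in_factor`) and the clamping to [0, 1] are assumptions made for this example only.

```python
def estimate_in_rows(num_rows: int, ndv: int, in_list_size: int,
                     in_factor: float = 1.0) -> int:
    """Estimate rows passing `col IN (...)` under a uniform-value assumption.

    Hypothetical helper for illustration; names are not from Hive's code.
    num_rows     -- rows entering the filter
    ndv          -- number of distinct values of the column
    in_list_size -- number of literals in the IN list
    in_factor    -- user-supplied multiplier applied to the estimate
    """
    if num_rows <= 0 or ndv <= 0:
        return 0
    # Uniform assumption: every distinct value matches num_rows / ndv rows.
    uniform_selectivity = min(1.0, in_list_size / ndv)
    # Scale by the factor, keeping the selectivity within [0, 1]
    # (the clamp is an assumption of this sketch).
    scaled = min(1.0, max(0.0, uniform_selectivity * in_factor))
    return round(num_rows * scaled)


if __name__ == "__main__":
    # Same filter, three different factors.
    for factor in (0.5, 1.0, 2.0):
        print(factor, estimate_in_rows(1_000_000, 100, 5, factor))
```

A factor above 1 raises the estimate when the listed values are known to be frequent; a factor below 1 lowers it when they are rare. Either way it is a blunt correction for skew that a histogram would capture directly.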



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-17 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Resolution: Fixed
Fix Version/s: 2.1.1, 2.2.0
Status: Resolved  (was: Patch Available)

Pushed to master, branch-2.1. Thanks for reviewing [~ashutoshc]!

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-16 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Status: Open  (was: Patch Available)

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-16 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Attachment: HIVE-14018.1.patch

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-16 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Attachment: HIVE-14018.patch

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Attachments: HIVE-14018.patch


[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

2016-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14018:
---
Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> --
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Attachments: HIVE-14018.patch