[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-14018:
----------------------------------
    Labels: TODOC2.1.1 TODOC2.2  (was: )

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>              Labels: TODOC2.1.1, TODOC2.2
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead
> of just dividing incoming number of rows by 2). However, as the distribution
> of values of the columns is considered uniform, we might end up heavily
> underestimating/overestimating the resulting number of rows.
> This issue is to add a factor that multiplies the IN clause estimation so we
> can alleviate this problem. The solution is not very elegant, but it is the
> best we can do until we have histograms to improve our estimate.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
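For readers of the archive, the estimation described in the issue can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hive's actual code: the function names, the `factor` parameter, and the clamping to the input row count are all hypothetical, chosen only to show how a multiplicative factor could correct a uniform-distribution estimate.

```python
def halved_estimate(num_rows):
    """Older heuristic mentioned in the issue: divide incoming rows by 2."""
    return num_rows // 2


def estimate_in_rows(num_rows, ndv, in_list_size, factor=1.0):
    """Estimate rows surviving `col IN (v1, ..., vk)`.

    Assumes values are uniformly distributed over `ndv` distinct values,
    so each listed value is expected to match num_rows / ndv rows.
    `factor` is a hypothetical tuning knob standing in for the
    multiplier this issue proposes.
    """
    if num_rows <= 0 or ndv <= 0:
        return 0
    # Uniform assumption: selectivity is the share of distinct values listed.
    selectivity = min(1.0, in_list_size / ndv)
    # Scale the estimate up or down to compensate for skewed data.
    estimate = num_rows * selectivity * factor
    # Assumption: never report more rows than were received, nor fewer than 0.
    return int(max(0, min(num_rows, round(estimate))))


if __name__ == "__main__":
    print(halved_estimate(1000))                       # old heuristic
    print(estimate_in_rows(1000, 100, 5))              # uniform estimate
    print(estimate_in_rows(1000, 100, 5, factor=2.0))  # corrected upward
```

With skewed data, where a few frequent values dominate the column, a factor above 1 raises the estimate and a factor below 1 lowers it. As the issue notes, this is a stopgap until histograms are available.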
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.1
                   2.2.0
           Status: Resolved  (was: Patch Available)

Pushed to master, branch-2.1. Thanks for reviewing [~ashutoshc]!

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Status: Open  (was: Patch Available)

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Attachment: HIVE-14018.1.patch

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Attachment: HIVE-14018.patch

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.patch
[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable
[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.patch