[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-08 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969923#comment-16969923
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985234/HIVE-22238.10.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19344/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19344/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19344/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12985234/HIVE-22238.10.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12985234 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch, 
> HIVE-22238.07.patch, HIVE-22238.09.patch, HIVE-22238.10.patch, 
> HIVE-22238.10.patch, HIVE-22238.10.patch, HIVE-22238.10.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969799#comment-16969799
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985234/HIVE-22238.10.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17667 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19340/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19340/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19340/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12985234 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch, 
> HIVE-22238.07.patch, HIVE-22238.09.patch, HIVE-22238.10.patch, 
> HIVE-22238.10.patch, HIVE-22238.10.patch, HIVE-22238.10.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969792#comment-16969792
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
38s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
11s{color} | {color:blue} ql in master has 1548 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
5s{color} | {color:red} root: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
24s{color} | {color:red} ql generated 1 new + 1548 unchanged - 0 fixed = 1549 
total (was 1548) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19340/dev-support/hive-personality.sh
 |
| git revision | master / 358e5a9 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19340/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19340/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19340/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19340/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19340/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969397#comment-16969397
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985225/HIVE-22238.10.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17667 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestPartitionManagement.testPartitionDiscoveryTransactionalTable
 (batchId=224)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19332/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19332/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19332/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12985225 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch, 
> HIVE-22238.07.patch, HIVE-22238.09.patch, HIVE-22238.10.patch, 
> HIVE-22238.10.patch, HIVE-22238.10.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969390#comment-16969390
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
47s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
20s{color} | {color:blue} ql in master has 1548 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
9s{color} | {color:red} root: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
31s{color} | {color:red} ql generated 1 new + 1548 unchanged - 0 fixed = 1549 
total (was 1548) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19332/dev-support/hive-personality.sh
 |
| git revision | master / 358e5a9 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19332/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19332/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19332/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19332/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19332/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969082#comment-16969082
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985116/HIVE-22238.10.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 17573 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomNonExistent
 (batchId=285)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighBytesRead 
(batchId=285)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryElapsedTime
 (batchId=285)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19327/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19327/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19327/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12985116 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch, 
> HIVE-22238.07.patch, HIVE-22238.09.patch, HIVE-22238.10.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-07 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969071#comment-16969071
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
5s{color} | {color:blue} ql in master has 1550 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
43s{color} | {color:red} ql: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
0s{color} | {color:red} root: The patch generated 3 new + 103 unchanged - 0 
fixed = 106 total (was 103) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
20s{color} | {color:red} ql generated 1 new + 1550 unchanged - 0 fixed = 1551 
total (was 1550) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19327/dev-support/hive-personality.sh
 |
| git revision | master / 4cfb7a0 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19327/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19327/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19327/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19327/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19327/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-06 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968518#comment-16968518
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985059/HIVE-22238.09.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19316/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19316/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19316/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-11-06 17:15:48.258
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-19316/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-11-06 17:15:48.262
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   d0bd071..4cfb7a0  master -> origin/master
+ git reset --hard HEAD
HEAD is now at d0bd071 HIVE-22292: Implement Hypothetical-Set Aggregate 
Functions (Krisztian Kasa, reviewed by Jesus Camacho Rodriguez)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 4cfb7a0 HIVE-22311: Propagate min/max column values from 
statistics to the optimizer for timestamp type (Jesus Camacho Rodriguez, 
reviewed by Miklos Gergely)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-11-06 17:15:50.484
+ rm -rf ../yetus_PreCommit-HIVE-Build-19316
+ mkdir ../yetus_PreCommit-HIVE-Build-19316
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-19316
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-19316/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_interval_mapjoin.q.out:240
Falling back to three-way merge...
Applied patch to 
'ql/src/test/results/clientpositive/llap/vector_interval_mapjoin.q.out' with 
conflicts.
error: patch failed: 
ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out:259
Falling back to three-way merge...
Applied patch to 
'ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out' with 
conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:368: trailing whitespace.

/data/hiveptest/working/scratch/build.patch:370: trailing whitespace.
  
/data/hiveptest/working/scratch/build.patch:374: trailing whitespace.
  
/data/hiveptest/working/scratch/build.patch:412: trailing whitespace.
st.type_name='galaxy class' 
/data/hiveptest/working/scratch/build.patch:429: trailing whitespace.
st.type_name='galaxy class' 
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_interval_mapjoin.q.out:240
Falling back to three-way merge...
Applied patch to 
'ql/src/test/results/clientpositive/llap/vector_interval_mapjoin.q.out' with 
conflicts.
error: patch failed: 
ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out:259
Falling back to three-way merge...
Applied patch to 
'ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out' with 
conflicts.
U ql/src/test/results/clientpositive/llap/vector_interval_mapjoin.q.out
U ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out
warning: squelched 32 whitespace errors
warning: 37 lines add whitespace errors.
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-19316
+ exit 1
'
{noformat}

This 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-06 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968280#comment-16968280
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12985034/HIVE-22238.07.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 189 failed/errored test(s), 17572 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join6] 
(batchId=46)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[estimate_pkfk_push]
 (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=182)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_inner_join]
 (batchId=182)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query23] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query6] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query11] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query12] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query13] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query15] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query17] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query18] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query19] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query1] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query1b] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query20] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query25] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query26] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query27] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query29] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query2] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query30] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query31] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query32] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query33] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query34] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query36] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query38] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query3] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query40] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query42] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query43] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query45] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query46] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query47] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query48] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query49] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query4] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query50] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query51] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query52] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query53] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query54] 
(batchId=300)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query55] 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-06 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968272#comment-16968272
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
11s{color} | {color:blue} ql in master has 1550 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m  
0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 3 new + 102 unchanged - 0 
fixed = 105 total (was 102) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
0s{color} | {color:red} root: The patch generated 3 new + 102 unchanged - 0 
fixed = 105 total (was 102) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
21s{color} | {color:red} ql generated 1 new + 1550 unchanged - 0 fixed = 1551 
total (was 1550) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19311/dev-support/hive-personality.sh
 |
| git revision | master / d0bd071 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19311/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19311/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19311/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19311/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19311/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-05 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968004#comment-16968004
 ] 

Jesus Camacho Rodriguez commented on HIVE-22238:


+1

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-05 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967967#comment-16967967
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984978/HIVE-22238.06.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17570 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19302/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19302/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19302/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984978 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.06.patch, HIVE-22238.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-11-05 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967958#comment-16967958
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
49s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
14s{color} | {color:blue} ql in master has 1550 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 6 new + 101 unchanged - 1 
fixed = 107 total (was 102) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
0s{color} | {color:red} root: The patch generated 6 new + 101 unchanged - 1 
fixed = 107 total (was 102) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
22s{color} | {color:red} ql generated 1 new + 1550 unchanged - 0 fixed = 1551 
total (was 1550) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19302/dev-support/hive-personality.sh
 |
| git revision | master / 65a5935 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19302/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19302/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19302/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19302/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19302/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-31 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964335#comment-16964335
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984454/HIVE-22238.05.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17549 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19232/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19232/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19232/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984454 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-31 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964321#comment-16964321
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
34s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
7s{color} | {color:blue} ql in master has 1545 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
39s{color} | {color:red} ql: The patch generated 1 new + 93 unchanged - 0 fixed 
= 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
1s{color} | {color:red} root: The patch generated 1 new + 93 unchanged - 0 
fixed = 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
24s{color} | {color:red} ql generated 1 new + 1545 unchanged - 0 fixed = 1546 
total (was 1545) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19232/dev-support/hive-personality.sh
 |
| git revision | master / 244de3b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19232/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19232/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19232/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19232/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19232/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-30 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963590#comment-16963590
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984346/HIVE-22238.05.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17545 tests 
executed
*Failed tests:*
{noformat}
TestStatsReplicationScenariosACIDNoAutogather - did not produce a TEST-*.xml 
file (likely timed out) (batchId=255)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19219/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19219/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19219/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984346 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-30 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963582#comment-16963582
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
6s{color} | {color:blue} ql in master has 1545 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
43s{color} | {color:red} ql: The patch generated 1 new + 93 unchanged - 0 fixed 
= 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
58s{color} | {color:red} root: The patch generated 1 new + 93 unchanged - 0 
fixed = 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
12s{color} | {color:red} ql generated 1 new + 1545 unchanged - 0 fixed = 1546 
total (was 1545) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19219/dev-support/hive-personality.sh
 |
| git revision | master / 6a154ee |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19219/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19219/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19219/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19219/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19219/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962810#comment-16962810
 ] 

Zoltan Haindrich commented on HIVE-22238:
-

unrelated tests are falling...attached the same patch the 3rd time
[~jcamachorodriguez] could you please take a look?

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch, HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-30 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962760#comment-16962760
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984290/HIVE-22238.05.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 17538 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithMiniLlapArrow - did not produce a TEST-*.xml file (likely timed 
out) (batchId=284)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=364)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19205/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19205/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19205/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984290 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch, HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-30 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962755#comment-16962755
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
0s{color} | {color:blue} ql in master has 1545 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
38s{color} | {color:red} ql: The patch generated 1 new + 93 unchanged - 0 fixed 
= 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
52s{color} | {color:red} root: The patch generated 1 new + 93 unchanged - 0 
fixed = 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  
5s{color} | {color:red} ql generated 1 new + 1545 unchanged - 0 fixed = 1546 
total (was 1545) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19205/dev-support/hive-personality.sh
 |
| git revision | master / 305e710 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19205/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19205/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19205/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19205/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19205/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-29 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962191#comment-16962191
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984247/HIVE-22238.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19193/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19193/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19193/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12984247/HIVE-22238.05.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984247 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-29 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962079#comment-16962079
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984247/HIVE-22238.05.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.insertOverwriteCreateAcid 
(batchId=353)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19191/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19191/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19191/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984247 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch, HIVE-22238.05.patch, 
> HIVE-22238.05.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-29 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962069#comment-16962069
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
56s{color} | {color:blue} ql in master has 1545 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 1 new + 93 unchanged - 0 fixed 
= 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
54s{color} | {color:red} root: The patch generated 1 new + 93 unchanged - 0 
fixed = 94 total (was 93) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
12s{color} | {color:red} ql generated 1 new + 1545 unchanged - 0 fixed = 1546 
total (was 1545) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19191/dev-support/hive-personality.sh
 |
| git revision | master / aceb8b6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19191/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19191/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19191/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19191/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19191/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-28 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961527#comment-16961527
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984172/HIVE-22238.04.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 17549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join33] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer9] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join45] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join47] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin47] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_2] (batchId=97)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_2] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqualcolumnrefs]
 (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] 
(batchId=43)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby_groupingset_bug]
 (batchId=186)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[reopt_dpp] 
(batchId=181)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[reopt_semijoin]
 (batchId=186)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[retry_failure_reorder]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists]
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_mapjoin]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=184)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=194)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=111)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19180/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19180/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19180/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984172 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-28 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961516#comment-16961516
 ] 

Hive QA commented on HIVE-22238:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
5s{color} | {color:blue} ql in master has 1545 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
52s{color} | {color:red} ql: The patch generated 1 new + 883 unchanged - 0 
fixed = 884 total (was 883) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
5s{color} | {color:red} root: The patch generated 1 new + 900 unchanged - 0 
fixed = 901 total (was 900) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
11s{color} | {color:red} ql generated 3 new + 1545 unchanged - 0 fixed = 1548 
total (was 1545) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AccurateEstimatesCheckerHook.java:[line 61] |
|  |  Dead store to currCS in 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.xxx1(Operator,
 ColStatistics)  At 
StatsRulesProcFactory.java:org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.xxx1(Operator,
 ColStatistics)  At StatsRulesProcFactory.java:[line 2294] |
|  |  Dead store to o in 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.xxx1(Operator,
 ColStatistics)  At 
StatsRulesProcFactory.java:org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$JoinStatsRule.xxx1(Operator,
 ColStatistics)  At StatsRulesProcFactory.java:[line 2295] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19180/dev-support/hive-personality.sh
 |
| git revision | master / aceb8b6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19180/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19180/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19180/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19180/yetus/new-findbugs-ql.html
 |
| modules | C: ql . itests U: . |
| Console output | 

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961081#comment-16961081
 ] 

Zoltan Haindrich commented on HIVE-22238:
-

written some tests to check how well a particular solution works
I've added 2 small "tweaks":
* column stats are marked if they are used in a filter expr
* "is null" filters are not considered as affected by the filter if the number 
of nulls is 0 in the columnstat

there is  one test which doesnt pass with accurate estimates on the second 
join; because the first join has a .01 selectivity from the 2 0.1 branches; but 
min is used during estimation so it kept as .1 - I think this is okay.

I've uploaded a new patch to run hiveqa - will clean up the patch for the next 
run

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch, HIVE-22238.04.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-22 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957156#comment-16957156
 ] 

Jesus Camacho Rodriguez commented on HIVE-22238:


Let's work on the more robust solution. It is not worth to trade a wrong 
estimation for another, especially since overestimation will cause regressions 
in mapjoin transformation. If we have identified that this is due to filter 
conditions on the join keys accounted for more than once when selectivity is 
computed, we can focus on that specific issue. At least for the 
`getSelectivitySimpleTree` method (no binary operators below), it seems the fix 
is not too complicated and will improve our estimates.

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957002#comment-16957002
 ] 

Zoltan Haindrich commented on HIVE-22238:
-

[~jcamachorodriguez]: yes; things like that have crossed my mind...but those 
things are not readily available...that's why I didn't choosen that path.
right now we have an underestimation bug; which this ticket could change into a 
"sometimes" overestimation issue.
This new overestimation in some cases could be improved upon later.

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-21 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956461#comment-16956461
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12983619/HIVE-22238.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19093/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19093/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19093/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12983619/HIVE-22238.03.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12983619 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-21 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956422#comment-16956422
 ] 

Jesus Camacho Rodriguez commented on HIVE-22238:


[~kgyrtkirk], `getSelectivitySimpleTree` looks for the TS that is below that 
operator. Does it find it or do we go into logic for multiple operators? If it 
does, maybe we should skip the predicates that have already been accounted for 
on PK side (filter conditions on join keys) from the estimate. Does that make 
sense? Skipping any reduction performed by a join seems too radical (for 
instance, if we filter by year but joined by any other key, we will not predict 
any reduction due to join).

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-21 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956408#comment-16956408
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12983619/HIVE-22238.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 17545 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=16)
org.apache.hadoop.hive.metastore.security.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
 (batchId=292)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19091/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19091/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19091/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12983619 - PreCommit-HIVE-Build

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-21 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956387#comment-16956387
 ] 

Hive QA commented on HIVE-22238:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
8s{color} | {color:blue} ql in master has 1547 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-19091/dev-support/hive-personality.sh
 |
| git revision | master / c9850b4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-19091/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations
> what happened was:
> * optimization was able to push the filter to the other side of the join
> * as a result the incoming data was already filtered
> * scaling down by the PK selectiviy - was actually already there...but a new 
> "scaling" happened



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948296#comment-16948296
 ] 

Zoltan Haindrich commented on HIVE-22238:
-

I went after this - but forgot to write an update here...so what's happening is 
somewhat both, now I think that the rescaling is accurate and I agree with the 
logic...but when calcite pushes the filter predicates to the other branch as 
well it ends up downscaling by the same factor again - hence my patch have 
solved some case...I'll try to get back to this sooner than later :)
my current idea is to somehow identify that the FK column in question is not 
filtered so far - so that we may downscale it by the PK factor

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-10-02 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943328#comment-16943328
 ] 

Jesus Camacho Rodriguez commented on HIVE-22238:


[~kgyrtkirk], reading the description of the issue, it seems this is expected 
since the FK input may be filtered, e.g., by a Filter condition in that input, 
while the FK-PK join may filter it again based on the condition in the PK side?

> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-09-24 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937115#comment-16937115
 ] 

Hive QA commented on HIVE-22238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981202/HIVE-22238.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 193 failed/errored test(s), 17000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[updateBasicStats] 
(batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_3]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_4]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_5]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_rebuild_dummy]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_time_window]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=172)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query23] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query11] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query12] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query13] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query15] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query17] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query18] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query19] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query1] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query1b] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query20] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query21] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query22] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query24] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query25] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query26] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query27] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query29] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query30] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query31] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query32] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query33] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query34] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query36] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query37] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query38] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query39] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query3] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query40] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query42] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query43] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query45] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query46] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query47] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query48] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query49] 
(batchId=297)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query4] 
(batchId=297)

[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

2019-09-24 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937089#comment-16937089
 ] 

Hive QA commented on HIVE-22238:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  5m 
55s{color} | {color:blue} ql in master has 1567 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18706/dev-support/hive-personality.sh
 |
| git revision | master / 48ae7ef |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18706/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> PK/FK selectivity estimation underscales estimations
> 
>
> Key: HIVE-22238
> URL: https://issues.apache.org/jira/browse/HIVE-22238
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-22238.01.patch
>
>
> at [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operators rownum is scaled according to pkfkselectivity
> however [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling it by that amount will count in estimation already used when 
> parentstats was calculated...so depending on the number of upstream joins - 
> this may lead to severe underestimations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)