[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-01-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296867#comment-14296867
 ] 

Hive QA commented on HIVE-9392:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695176/HIVE-9392.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7407 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2568/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2568/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2568/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695176 - PreCommit-HIVE-TRUNK-Build

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 0.15.0

 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-01-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297825#comment-14297825
 ] 

Prasanth Jayachandran commented on HIVE-9392:
-

There is another case where data size becomes 0. I am suspecting it to be 
caused by HIVE-9512.

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 0.15.0

 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-01-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296348#comment-14296348
 ] 

Prasanth Jayachandran commented on HIVE-9392:
-

[~mmokhtar] The data size 0 issue should be fixed in the v2 of the patch. 

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 0.15.0

 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-01-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288383#comment-14288383
 ] 

Hive QA commented on HIVE-9392:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12693780/HIVE-9392.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7346 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2481/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2481/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2481/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12693780 - PreCommit-HIVE-TRUNK-Build

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 0.15.0

 Attachments: HIVE-9392.1.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-01-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286824#comment-14286824
 ] 

Prasanth Jayachandran commented on HIVE-9392:
-

This patch uses the tag number in fully qualified column name to disambiguate 
between same key column names. 

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 0.15.0

 Attachments: HIVE-9392.1.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)