[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.6.patch

patch without q files updated.

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.5.patch

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch, HIVE-9392.5.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: (was: HIVE-9392.4.patch)

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.4.patch

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: (was: HIVE-9392.01.patch)

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.3.patch

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.4.patch

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.01.patch

After discussing with [~jpullokkaran], we assume that this patch will solve the 
problem. And we already tried TPCDS 70,89 to confirm

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in HashMap  
 joinedColStats, the key used which is the ColStatistics.fqColName is 
 duplicated between join column in the same vertex, as a result distinctVals 
 ends up having duplicated values which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)