[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.6.patch patch without q files updated. JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.5.patch JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch, HIVE-9392.5.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: (was: HIVE-9392.4.patch) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.4.patch JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: (was: HIVE-9392.01.patch) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.3.patch JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.4.patch JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.01.patch After discussing with [~jpullokkaran], we assume that this patch will solve the problem. And we already tried TPCDS 70,89 to confirm JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)