[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Affects Version/s: 2.1.0 > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Fix Version/s: 2.2.0 > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Status: Patch Available (was: Open) > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Status: Open (was: Patch Available) > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Attachment: HIVE-14053.02.patch > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Status: Patch Available (was: Open) > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14053: --- Attachment: HIVE-14053.01.patch > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-14053.01.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carter Shanklin updated HIVE-14053: --- Priority: Minor (was: Major) > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Priority: Minor > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example that uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. > With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys the denormalization and the where condition > is unnecessary and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)