[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-20 Pengcheng Xiong (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-14053:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

pushed to master. Thanks [~ashutoshc] for the review.

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch
>
>
> HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive.
> With the right driver in place, tools like Tableau can perform join
> elimination, and queries can run much faster.
> Some gaps remain: currently, getAttributes() in HiveDatabaseMetaData does not
> work quite right for keys. In particular, primary keys are by definition not
> null, and the metadata should reflect this to enable better join elimination.
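A minimal sketch of the metadata rule above, written as a client-side check in Java. It assumes the driver implements the standard JDBC DatabaseMetaData.getPrimaryKeys() and getColumns() calls. It checks getColumns() rather than the getAttributes() call named in the report. The class and method names are hypothetical. The sample values in main are illustrative and were not captured from Hive.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: flags primary-key columns that metadata reports as nullable.
class PkNullabilityCheck {

    // Pure rule: every primary-key column should be reported as
    // DatabaseMetaData.columnNoNulls. Returns the PK columns that are not,
    // in the iteration order of nullableByColumn.
    static List<String> pkColumnsReportedNullable(Set<String> pkColumns,
                                                  Map<String, Integer> nullableByColumn) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Integer> e : nullableByColumn.entrySet()) {
            if (pkColumns.contains(e.getKey())
                    && e.getValue() != DatabaseMetaData.columnNoNulls) {
                offenders.add(e.getKey());
            }
        }
        return offenders;
    }

    // Gathers the inputs through the standard JDBC metadata calls.
    static List<String> check(DatabaseMetaData meta, String schema, String table)
            throws SQLException {
        Set<String> pk = new HashSet<>();
        try (ResultSet rs = meta.getPrimaryKeys(null, schema, table)) {
            while (rs.next()) {
                pk.add(rs.getString("COLUMN_NAME"));
            }
        }
        Map<String, Integer> nullable = new LinkedHashMap<>();
        try (ResultSet rs = meta.getColumns(null, schema, table, "%")) {
            while (rs.next()) {
                nullable.put(rs.getString("COLUMN_NAME"), rs.getInt("NULLABLE"));
            }
        }
        return pkColumnsReportedNullable(pk, nullable);
    }

    public static void main(String[] args) {
        // Hypothetical getColumns() result for the TPC-H orders table.
        Map<String, Integer> reported = new LinkedHashMap<>();
        reported.put("o_orderkey", DatabaseMetaData.columnNullable);
        reported.put("o_custkey", DatabaseMetaData.columnNullable);
        System.out.println(pkColumnsReportedNullable(Set.of("o_orderkey"), reported));
    }
}
```

With those sample values, main prints [o_orderkey]. Against a live connection, check(connection.getMetaData(), "tpch_bin_flat_orc_2", "orders") would apply the same rule to the driver's real output.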
> This example uses the TPC-H schema and its constraints: it sums
> l_extendedprice grouped by l_shipmode, so it should need nothing beyond the
> lineitem table.
> With all the constraints in place, Tableau generates this query:
> {code}
> SELECT `lineitem`.`l_shipmode` AS `l_shipmode`,
>   SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok`
> FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem`
>   JOIN `tpch_bin_flat_orc_2`.`orders` `orders`
>     ON (`lineitem`.`l_orderkey` = `orders`.`o_orderkey`)
>   JOIN `tpch_bin_flat_orc_2`.`customer` `customer`
>     ON (`orders`.`o_custkey` = `customer`.`c_custkey`)
>   JOIN `tpch_bin_flat_orc_2`.`nation` `nation`
>     ON (`customer`.`c_nationkey` = `nation`.`n_nationkey`)
> WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL))
>     AND (NOT (`lineitem`.`l_suppkey` IS NULL)))
>   AND ((NOT (`lineitem`.`l_partkey` IS NULL))
>     AND (NOT (`lineitem`.`l_suppkey` IS NULL))))
>   AND (NOT (`nation`.`n_regionkey` IS NULL)))
> {code}
> Since these are primary keys, the joins (the denormalization) and the WHERE
> condition are unnecessary, and this sort of query could run much faster by
> accessing only the lineitem table.
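The join-elimination argument can be sketched as a small Java routine. This is a hypothetical illustration, not Hive's or Tableau's optimizer, and it uses three rules:

- An inner join from a foreign-key column to a primary key is dropped only when the constraint is marked RELY, the foreign-key column is known to be non-null, and nothing else reads the parent table.
- An IS NOT NULL filter on a column already known to be non-null is treated as always true.
- The not-null set in main is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: FK-to-PK inner-join elimination driven by constraint
// and nullability metadata. Columns are written "table.column".
class JoinEliminationSketch {

    // Inner join ON childCol = parentCol, where childCol is a foreign key
    // referencing the primary key parentCol; rely = constraint is trusted.
    static final class FkJoin {
        final String childCol;
        final String parentCol;
        final boolean rely;

        FkJoin(String childCol, String parentCol, boolean rely) {
            this.childCol = childCol;
            this.parentCol = parentCol;
            this.rely = rely;
        }
    }

    static String table(String col) {
        return col.substring(0, col.indexOf('.'));
    }

    // Returns the tables the query still needs after eliminating joins.
    //   used      - columns read by SELECT / GROUP BY
    //   isNotNull - columns filtered with "col IS NOT NULL" in WHERE
    //   notNull   - columns the metadata reports as never null
    static Set<String> neededTables(Set<String> tables, List<FkJoin> joins,
                                    Set<String> used, Set<String> isNotNull,
                                    Set<String> notNull) {
        Set<String> referenced = new LinkedHashSet<>(used);
        // "col IS NOT NULL" on a column known to be non-null is always true,
        // so that filter is dropped; any other filter still reads its column.
        for (String col : isNotNull) {
            if (!notNull.contains(col)) {
                referenced.add(col);
            }
        }
        Set<String> remaining = new LinkedHashSet<>(tables);
        List<FkJoin> live = new ArrayList<>(joins);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (FkJoin j : live) {
                String parent = table(j.parentCol);
                boolean readElsewhere = referenced.stream()
                        .anyMatch(c -> table(c).equals(parent))
                    || live.stream().anyMatch(o -> o != j
                        && (table(o.childCol).equals(parent)
                            || table(o.parentCol).equals(parent)));
                // Each child row matches exactly one parent row when the FK is
                // trusted, the FK column is non-null and the PK is unique, so an
                // unread parent table can be dropped without changing results.
                if (j.rely && notNull.contains(j.childCol) && !readElsewhere) {
                    remaining.remove(parent);
                    live.remove(j);
                    changed = true;
                    break; // restart the scan: live was modified
                }
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<FkJoin> joins = List.of(
                new FkJoin("lineitem.l_orderkey", "orders.o_orderkey", true),
                new FkJoin("orders.o_custkey", "customer.c_custkey", true),
                new FkJoin("customer.c_nationkey", "nation.n_nationkey", true));
        Set<String> tables = Set.of("lineitem", "orders", "customer", "nation");
        Set<String> used = Set.of("lineitem.l_shipmode", "lineitem.l_extendedprice");
        Set<String> filters = Set.of("lineitem.l_partkey", "lineitem.l_suppkey",
                "nation.n_regionkey");
        // Hypothetical metadata: the FK join columns and n_regionkey are non-null.
        Set<String> notNull = Set.of("lineitem.l_orderkey", "orders.o_custkey",
                "customer.c_nationkey", "nation.n_regionkey");
        System.out.println(neededTables(tables, joins, used, filters, notNull));
    }
}
```

With those assumed inputs, main prints [lineitem]. Suppose instead that nation.n_regionkey is not known to be non-null. Its filter then keeps nation referenced, and the joins leading to it cannot be removed either. This illustrates why the nullability metadata matters.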



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-20 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Affects Version/s: 2.1.0

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-20 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Fix Version/s: 2.2.0

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-19 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Status: Patch Available  (was: Open)

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-19 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Status: Open  (was: Patch Available)

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-19 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Attachment: HIVE-14053.02.patch

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Status: Patch Available  (was: Open)

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Pengcheng Xiong (JIRA)

Pengcheng Xiong updated HIVE-14053:
---
Attachment: HIVE-14053.01.patch

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-14053.01.patch


[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-06-17 Carter Shanklin (JIRA)

Carter Shanklin updated HIVE-14053:
---
Priority: Minor  (was: Major)

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Priority: Minor