[jira] [Updated] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-6348: -- Labels: sub-query (was: ) > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug > Reporter: Gunther Hagleitner > Priority: Minor > Labels: sub-query > > select * from (select * from foo order by c asc) bar order by c desc; > in Hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the subquery unless you use 'limit '. We could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
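The redundancy described above — and the 'limit' exception — can be seen with a toy model outside Hive; a minimal Python sketch (illustrative only, not Hive optimizer code):

```python
# Hypothetical illustration: an outer ORDER BY alone determines the final
# order, so an inner sort with no limit is wasted work the optimizer can drop.
data = [3, 1, 2]

inner_then_outer = sorted(sorted(data), reverse=True)  # subquery sorts asc, outer re-sorts desc
outer_only = sorted(data, reverse=True)                # inner sort eliminated
assert inner_then_outer == outer_only == [3, 2, 1]     # same result, one sort fewer

# The LIMIT exception: an inner sort + limit changes WHICH rows survive,
# so that inner sort cannot be removed.
with_limit = sorted(sorted(data)[:2], reverse=True)    # keep the 2 smallest, then sort desc
assert with_limit == [2, 1] != outer_only[:2]
```

This is why barring inner ORDER BY outright at the semantic level would be too aggressive: it is only redundant when no limit depends on it.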
[jira] [Commented] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757755#comment-15757755 ] Pengcheng Xiong commented on HIVE-14053: Yes, I noticed that, and I am wondering what the difference between the two is. Sounds like a duplicate? > Hive should report that primary keys can't be null. > --- > > Key: HIVE-14053 > URL: https://issues.apache.org/jira/browse/HIVE-14053 > Project: Hive > Issue Type: Bug > Reporter: Carter Shanklin > Assignee: Pengcheng Xiong > Priority: Minor > Attachments: HIVE-14053.01.patch > > > HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. > With the right driver in place, tools like Tableau can do join elimination > and queries can run much faster. > Some gaps remain; currently getAttributes() in HiveDatabaseMetaData doesn't > work quite right for keys. In particular, primary keys by definition are not > null and the metadata should reflect this for improved join elimination. > In this example, which uses the TPC-H schema and its constraints, we sum > l_extendedprice and group by l_shipmode. This query should not use more than > just the lineitem table. 
> With all the constraints in place, Tableau generates this query: > {code} > SELECT `lineitem`.`l_shipmode` AS `l_shipmode`, > SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok` > FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem` > JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = > `orders`.`o_orderkey`) > JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = > `customer`.`c_custkey`) > JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = > `nation`.`n_nationkey`) > WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT > (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS > NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL)))) AND (NOT > (`nation`.`n_regionkey` IS NULL))) > {code} > Since these are the primary keys, the denormalization and the WHERE condition > are unnecessary, and this sort of query can be a lot faster by just accessing > the lineitem table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757465#comment-15757465 ] Ashutosh Chauhan edited comment on HIVE-14053 at 12/17/16 7:27 PM: --- There is another column called NULLABLE which can also be populated, like IS_NULLABLE, in the result set. It would be good to update that too. was (Author: ashutoshc): There is another column called NULLABLE which can also be populated like IS_NULLABLE in resultset. Good to update that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757465#comment-15757465 ] Ashutosh Chauhan commented on HIVE-14053: - There is another column called NULLABLE which can also be populated like IS_NULLABLE in resultset. Good to update that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
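For context on the NULLABLE vs. IS_NULLABLE discussion: java.sql.DatabaseMetaData reports nullability twice in getColumns() results — the integer NULLABLE column (columnNoNulls = 0, columnNullable = 1, columnNullableUnknown = 2) and the string IS_NULLABLE column ("NO", "YES", or empty when unknown). A hypothetical Python sketch of the mapping a driver should apply (nullability_metadata is an illustrative stand-in, not Hive's actual code):

```python
# JDBC constants from java.sql.DatabaseMetaData; the values are fixed by the JDBC spec.
COLUMN_NO_NULLS, COLUMN_NULLABLE, COLUMN_NULLABLE_UNKNOWN = 0, 1, 2

def nullability_metadata(is_primary_key, declared_nullable=True):
    """Hypothetical helper: both metadata columns a driver should report.
    Primary-key columns are NOT NULL by definition, regardless of declaration."""
    if is_primary_key or not declared_nullable:
        return {"NULLABLE": COLUMN_NO_NULLS, "IS_NULLABLE": "NO"}
    return {"NULLABLE": COLUMN_NULLABLE, "IS_NULLABLE": "YES"}

# A PK column must report both fields as not-null for join elimination to kick in.
assert nullability_metadata(is_primary_key=True) == {"NULLABLE": 0, "IS_NULLABLE": "NO"}
assert nullability_metadata(is_primary_key=False) == {"NULLABLE": 1, "IS_NULLABLE": "YES"}
```

Keeping the two fields in sync is the point of the comment above: populating IS_NULLABLE alone leaves clients that read the integer NULLABLE column seeing the wrong answer.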
[jira] [Updated] (HIVE-15459) Fix unit test failures on master
[ https://issues.apache.org/jira/browse/HIVE-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-15459: Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master. > Fix unit test failures on master > > > Key: HIVE-15459 > URL: https://issues.apache.org/jira/browse/HIVE-15459 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 2.2.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 2.2.0 > > Attachments: HIVE-15459.patch > > > Golden file updates missed in HIVE-15397 and HIVE-15192 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
[ https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757151#comment-15757151 ] Hive QA commented on HIVE-15339: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12843717/HIVE-15339.6.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10791 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,delete_where_non_partitioned.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_interval_mapjoin.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=150) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery] (batchId=84) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias] (batchId=84) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2626/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2626/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2626/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12843717 - PreCommit-HIVE-Build > Batch metastore calls to get column stats for fields needed in > FilterSelectivityEstimator > - > > Key: HIVE-15339 > URL: https://issues.apache.org/jira/browse/HIVE-15339 > Project: Hive > Issue Type: Improvement > Reporter: Rajesh Balamohan > Assignee: Rajesh Balamohan > Priority: Minor > Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch, > HIVE-15339.4.patch, HIVE-15339.5.patch, HIVE-15339.6.patch > > > Based on the query pattern, {{FilterSelectivityEstimator}} gets column statistics > from the metastore in multiple calls. For instance, in the following query, it > ends up getting individual column statistics for flights multiple times. > When the table has a large number of partitions, getting statistics for columns > via multiple calls can be very expensive. This would adversely impact the > overall compilation time. 
The following query took 14 seconds to compile. > {noformat} > SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`, > YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok` > FROM `flights` as `flights` > JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`) > JOIN `airports` as `source_airport` ON (`flights`.`origin` = > `source_airport`.`iata`) > JOIN `airports` as `dest_airport` ON (`flights`.`dest` = > `dest_airport`.`iata`) > GROUP BY YEAR(`flights`.`dateofflight`); > {noformat} > It may be helpful to club together all columns that need statistics and fetch these > details in a single remote call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
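The suggested fix — collect every column the estimator will need, then fetch stats in one round-trip and cache them — can be sketched as follows. This is a hypothetical Python model: get_stats_batched and SelectivityEstimator are illustrative stand-ins, not the real Hive/metastore API:

```python
calls = []  # records how many remote round-trips we make

def get_stats_batched(table, columns):
    """Stand-in for a batched metastore call: one round-trip for all columns."""
    calls.append(tuple(columns))
    return {c: {"ndv": 100, "num_nulls": 0} for c in columns}  # fake stats

class SelectivityEstimator:
    """Collects needed columns up front, fetches stats once, and serves
    every later selectivity lookup from the cached result."""
    def __init__(self, table, needed_columns):
        # De-duplicate: the same column may appear in several predicates.
        self.stats = get_stats_batched(table, sorted(set(needed_columns)))

    def selectivity(self, column):
        # 1/NDV heuristic for an equality predicate; no further remote calls.
        return 1.0 / self.stats[column]["ndv"]

est = SelectivityEstimator("flights", ["flightnum", "dateofflight", "flightnum"])
est.selectivity("flightnum")
est.selectivity("dateofflight")
assert len(calls) == 1  # one remote call, however many predicates need stats
```

The payoff grows with partition count: each avoided call was itself a per-partition aggregation on the metastore side.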
[jira] [Updated] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
[ https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-15339: Attachment: HIVE-15339.6.patch Attaching the .6 patch, which addresses the following: 1. Added the {{ExtendedCBOProfile.JOIN_REORDERING}} option in CalcitePlanner. 2. Removed virtual columns from the set of columns needed for stats gathering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
[ https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-15339: Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15016) Run tests with Hadoop 3.0.0-alpha1
[ https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756877#comment-15756877 ] Steve Loughran commented on HIVE-15016: --- If you check out Hadoop trunk, all you need to do is make a build with the declared version changed. {code} mvn install -DskipTests -Ddeclared.hadoop.version=2.11 {code} This *does not* change the version numbers enough to bring up HDFS; all it does is trick Hive into thinking it knows about Hadoop 3. The real fix will have to be in Hive & cherry-picked into the Spark fork. > Run tests with Hadoop 3.0.0-alpha1 > -- > > Key: HIVE-15016 > URL: https://issues.apache.org/jira/browse/HIVE-15016 > Project: Hive > Issue Type: Task > Components: Hive > Reporter: Sergio Peña > Assignee: Sergio Peña > Attachments: Hadoop3Upstream.patch > > > Hadoop 3.0.0-alpha1 was released back in Sep/16 to allow other components to run > tests against this new version before GA. > We should start running tests with Hive to validate compatibility against > Hadoop 3.0. > NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 > GA is released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15445) Subquery failing with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15445: --- Reporter: Aswathy Chellammal Sreekumar (was: Jesus Camacho Rodriguez) > Subquery failing with ClassCastException > > > Key: HIVE-15445 > URL: https://issues.apache.org/jira/browse/HIVE-15445 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Aswathy Chellammal Sreekumar >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15445.patch > > > To reproduce: > {code:sql} > CREATE TABLE table_7 (int_col INT); > SELECT > (t1.int_col) * (t1.int_col) AS int_col > FROM ( > SELECT > MIN(NULL) OVER () AS int_col > FROM table_7 > ) t1 > WHERE > (False) NOT IN (SELECT > False AS boolean_col > FROM ( > SELECT > MIN(NULL) OVER () AS int_col > FROM table_7 > ) tt1 > WHERE > (t1.int_col) = (tt1.int_col)); > {code} > The problem seems to be in the method that tries to resolve the subquery > column _MIN(NULL)_. It checks the column inspector and ends up returning a > constant expression instead of a column expression for _min(null)_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14496) Enable Calcite rewriting with materialized views
[ https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756709#comment-15756709 ] Lefty Leverenz commented on HIVE-14496: --- Doc note: This adds *hive.materializedview.rewriting* to HiveConf.java, so it needs to be documented in the wiki in Configuration Properties and perhaps also in the new DDL section for materialized views that will be created for HIVE-14497. * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Added a TODOC2.2 label. > Enable Calcite rewriting with materialized views > > > Key: HIVE-14496 > URL: https://issues.apache.org/jira/browse/HIVE-14496 > Project: Hive > Issue Type: Sub-task > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14496.01.patch, HIVE-14496.02.patch, > HIVE-14496.03.patch, HIVE-14496.04.patch, HIVE-14496.05.patch, > HIVE-14496.07.patch, HIVE-14496.08.patch, HIVE-14496.09.patch, > HIVE-14496.10.patch, HIVE-14496.patch > > > Calcite already supports query rewriting using materialized views. We will > use it to support this feature in Hive. > In order to do that, we need to register the existing materialized views with > Calcite view service and enable the materialized views rewriting rules. > We should include a HiveConf flag to completely disable query rewriting using > materialized views if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14497) Fine control for using materialized views in rewriting
[ https://issues.apache.org/jira/browse/HIVE-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756700#comment-15756700 ] Lefty Leverenz commented on HIVE-14497: --- Doc note: This needs to be documented with a new section in the DDL wikidoc, perhaps after Create/Drop/Alter View. * [Hive DDL | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-HiveDataDefinitionLanguage] Added a TODOC2.2 label. > Fine control for using materialized views in rewriting > -- > > Key: HIVE-14497 > URL: https://issues.apache.org/jira/browse/HIVE-14497 > Project: Hive > Issue Type: Sub-task > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Labels: TODOC2.2 > Fix For: 2.2.0 > > > Follow-up of HIVE-14495. Since the number of materialized views in the system > might grow very large, and query rewriting using materialized views might be > very expensive, we need to include a mechanism to enable/disable materialized > views for query rewriting. > Thus, we should extend the CREATE MATERIALIZED VIEW statement as follows: > {code:sql} > CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name > [BUILD DEFERRED] > [ENABLE REWRITE] -- NEW! > [COMMENT materialized_view_comment] > [ >[ROW FORMAT row_format] >[STORED AS file_format] > | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > ] > [LOCATION hdfs_path] > [TBLPROPERTIES (property_name=property_value, ...)] > AS select_statement; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14496) Enable Calcite rewriting with materialized views
[ https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-14496: -- Labels: TODOC2.2 (was: ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14497) Fine control for using materialized views in rewriting
[ https://issues.apache.org/jira/browse/HIVE-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-14497: -- Labels: TODOC2.2 (was: ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14053) Hive should report that primary keys can't be null.
[ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756655#comment-15756655 ] Hive QA commented on HIVE-14053: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12843706/HIVE-14053.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10800 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) TestEmbeddedThriftBinaryCLIService - did not produce a TEST-*.xml file (likely timed out) (batchId=211) TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file (likely timed out) (batchId=211) TestOperationLoggingAPIWithMr - did not produce a TEST-*.xml file (likely timed out) (batchId=211) TestThriftCLIServiceWithBinary - did not produce a TEST-*.xml file (likely timed out) (batchId=211) TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=150) 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery] (batchId=84) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias] (batchId=84) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2625/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2625/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2625/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12843706 - PreCommit-HIVE-Build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way
[ https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756630#comment-15756630 ]

Lefty Leverenz commented on HIVE-15147:
---------------------------------------

Doc note: When this is merged to master, two new configuration parameters will need to be documented in the wiki (*hive.llap.io.encode.alloc.size* and *hive.llap.io.encode.slice.row.count*).

* [Configuration Properties -- LLAP | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

We don't have a TODOC label for branch master-15147.

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> -----------------------------------------------------------------------
>
>                 Key: HIVE-15147
>                 URL: https://issues.apache.org/jira/browse/HIVE-15147
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-15147.01.WIP.noout.patch, HIVE-15147.02.WIP.noout.patch, HIVE-15147.04.WIP.noout.patch, HIVE-15147.05.WIP.noout.patch, HIVE-15147.WIP.noout.patch
>
> The primary goal for the first pass is caching text files. Nothing would prevent other formats from using the same path, in principle, although, as was originally done with ORC, it may be better to have native caching support optimized for each particular format.
> Given that caching pure text is not smart, and we already have an ORC-encoded cache that is columnar due to ORC file structure, we will transform data into columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was compressed with some poor compression codec, such as csv. Using the original IF and serde, as well as an ORC writer (with some heavyweight optimizations disabled, potentially), we can "uncompress" the csv/whatever data into its "original" ORC representation, then cache it efficiently, by column, and also reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we slice the file horizontally, to avoid caching entire columns). As with ORC uncompressed files, the specific offsets don't really matter as long as they are consistent between reads. The problem is that the file offsets will actually need to be propagated to the new reader from the original inputformat. Row counts are easier to use, but there's a problem of how to actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has been evicted or is otherwise missing, "all the columns" have to be read for the corresponding slice to cache and read that one column. The vague plan is to handle this implicitly, similarly to how the ORC reader handles CB-RG overlaps - it will just so happen that a missing column in the disk range list to retrieve will expand the disk-range-to-read into the whole horizontal slice of the file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, the entire file has to be re-read. Gzipped text is a ridiculous feature, so this is by design.
> 4) In future, it would be possible to also build some form of metadata/indexes for this cached data to do PPD, etc. This is out of scope for now.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
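Point 2 above - a missing column forcing a re-read of the whole horizontal slice in a row-based format - can be sketched roughly as follows. This is an illustrative sketch only; the class and method names (`SliceCache`, `slicesToRead`) are hypothetical and are not the actual LLAP cache API:

```java
import java.util.*;

// Hypothetical sketch: a per-slice column cache for a row-based format.
// Because rows interleave all columns on disk, a single missing column
// cannot be fetched alone - the entire slice must be re-read.
class SliceCache {
    // sliceId -> (columnId -> cached column data)
    private final Map<Integer, Map<Integer, byte[]>> cache = new HashMap<>();

    void put(int slice, int column, byte[] data) {
        cache.computeIfAbsent(slice, s -> new HashMap<>()).put(column, data);
    }

    void evict(int slice, int column) {
        Map<Integer, byte[]> cols = cache.get(slice);
        if (cols != null) cols.remove(column);
    }

    // Returns the slice ids that must be fully re-read from disk because at
    // least one requested column is missing there - mirroring how a missing
    // column in the disk-range list expands into the whole horizontal slice.
    List<Integer> slicesToRead(List<Integer> slices, List<Integer> columns) {
        List<Integer> toRead = new ArrayList<>();
        for (int slice : slices) {
            Map<Integer, byte[]> cols =
                cache.getOrDefault(slice, Collections.emptyMap());
            for (int c : columns) {
                if (!cols.containsKey(c)) {
                    toRead.add(slice);
                    break;
                }
            }
        }
        return toRead;
    }
}
```

Slices whose requested columns are all cached are served from memory; any slice with even one evicted column lands in the re-read list, which is the implicit "expand to the whole slice" behavior described in the comment.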
[jira] [Commented] (HIVE-15261) Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1
[ https://issues.apache.org/jira/browse/HIVE-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756607#comment-15756607 ]

Baranenko Nikolay commented on HIVE-15261:
------------------------------------------

Hello,

I tried to run apache-hive-2.1.1-bin and hit this error too :-(

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)
	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:558)
	... 9 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1654)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3367)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3406)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3386)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3640)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1652)
	... 23 more
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:169)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:136)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:95)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:476)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:278)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:599)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:564)
	at
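The root cause is the last frame: Hive 2.1.1's ShimLoader only knows the Hadoop major versions it ships shims for, so the pre-release string "3.0.0-alpha1" is rejected. The check is roughly equivalent to the following simplified sketch (illustrative only - see ShimLoader.java in the Hive source for the real logic, which also handles the legacy 0.x/1.x lines):

```java
// Simplified sketch of the version check behind the IllegalArgumentException
// above. Hive 2.1.1 has no shim for Hadoop 3.x, so any version string whose
// first dotted component is not a recognized major version is rejected.
class HadoopVersionCheck {
    static String getMajorVersion(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException("Illegal Hadoop version: " + version);
        }
        switch (parts[0]) {
            case "2":
                // Hadoop 2.x reuses the "0.23" shim line in Hive.
                return "0.23";
            default:
                throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + version);
        }
    }
}
```

In other words, this is a version-compatibility issue rather than a broken install: Hive 2.1.1 against Hadoop 3.0.0-alpha1 cannot work until shims for Hadoop 3 exist, which is what this issue tracks.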
[jira] [Commented] (HIVE-15016) Run tests with Hadoop 3.0.0-alpha1
[ https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756602#comment-15756602 ]

Baranenko Nikolay commented on HIVE-15016:
------------------------------------------

Hello,

I tried to run apache-hive-2.1.1-bin and hit this error too :-(

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)
	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:558)
	... 9 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1654)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3367)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3406)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3386)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3640)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1652)
	... 23 more
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:169)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:136)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:95)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:476)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:278)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)