[jira] [Updated] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs
[ https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17191: Attachment: HIVE-17191.1.patch > Add InterfaceAudience and InterfaceStability annotations for StorageHandler > APIs > > > Key: HIVE-17191 > URL: https://issues.apache.org/jira/browse/HIVE-17191 > Project: Hive > Issue Type: Sub-task > Components: StorageHandler >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17191.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
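For context, the annotations in question are Hadoop-style Java annotations placed on public API types. A minimal, self-contained sketch of what annotating a StorageHandler-facing interface looks like (the annotation types below are simplified stand-ins defined locally, not the real org.apache.hadoop.hive.common.classification classes):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotatedApiSketch {
    // Simplified stand-ins for Hadoop's InterfaceAudience/InterfaceStability.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}

    // A StorageHandler-style API marked as public but still evolving:
    // external implementors may use it, but signatures may change
    // between minor releases.
    @Public
    @Evolving
    interface StorageHandlerApi {
        String getStorageFormatName();
    }

    public static void main(String[] args) {
        // The annotations are retained at runtime, so tooling can audit them.
        System.out.println(
            StorageHandlerApi.class.isAnnotationPresent(Public.class)); // prints true
    }
}
```

The value of the annotations is purely declarative: they tell third-party StorageHandler authors which interfaces are safe to implement and how stable they are across releases.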
[jira] [Updated] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs
[ https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17191: Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104476#comment-16104476 ] Hive QA commented on HIVE-16759: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879231/HIVE16759.4.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10994 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6163/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6163/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6163/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879231 - PreCommit-HIVE-Build > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch, HIVE16759.4.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
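The proposed HIVE-16759 change is easy to picture as one extra field on the serialized event payload. A hypothetical sketch (class and field names are illustrative; the real HMS notification message classes differ):

```java
public class CreateTableMessageSketch {
    // Hypothetical event payload; real HMS messages are JSON-serialized
    // message classes, but the point is the extra tableType field.
    final String db;
    final String table;
    final String tableType; // e.g. "MANAGED_TABLE" or "VIRTUAL_VIEW"

    public CreateTableMessageSketch(String db, String table, String tableType) {
        this.db = db;
        this.table = table;
        this.tableType = tableType;
    }

    // With tableType carried in the event, consumers can distinguish
    // views from tables without a follow-up metastore call.
    public boolean isView() {
        return "VIRTUAL_VIEW".equals(tableType);
    }

    public static void main(String[] args) {
        CreateTableMessageSketch ev =
            new CreateTableMessageSketch("default", "v1", "VIRTUAL_VIEW");
        System.out.println(ev.isView()); // prints true
    }
}
```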
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.4.patch > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch, HIVE-17139.4.patch > > > CASE WHEN and IF evaluation in Hive's vectorized engine is not optimal: in the current implementation all of the conditional and else expressions are evaluated for every row. The optimized approach is to update the selected array of the batch after the conditional expression is executed, so the else expression processes only the selected rows instead of all of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
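The selected-array technique described in HIVE-17139 above can be sketched in plain Java (a simplified model of the idea, not Hive's actual VectorizedRowBatch/VectorExpression API):

```java
import java.util.Arrays;

public class SelectedVectorIf {
    // Computes out[i] = cond[i] ? thenVal[i] : elseVal[i], but each branch
    // is evaluated only over its selected rows, mimicking the optimization:
    // the condition is evaluated once, the batch's selected rows are split,
    // and the else branch never touches rows the then branch claimed.
    static long[] evaluateIf(boolean[] cond, long[] thenVal, long[] elseVal) {
        int n = cond.length;
        long[] out = new long[n];
        int[] selTrue = new int[n];
        int[] selFalse = new int[n];
        int nTrue = 0, nFalse = 0;
        // Step 1: evaluate the condition and split the selected array.
        for (int i = 0; i < n; i++) {
            if (cond[i]) selTrue[nTrue++] = i; else selFalse[nFalse++] = i;
        }
        // Step 2: the THEN expression runs only over rows where cond is true.
        for (int j = 0; j < nTrue; j++) out[selTrue[j]] = thenVal[selTrue[j]];
        // Step 3: the ELSE expression runs only over the remaining rows.
        for (int j = 0; j < nFalse; j++) out[selFalse[j]] = elseVal[selFalse[j]];
        return out;
    }

    public static void main(String[] args) {
        boolean[] cond = {true, false, true};
        long[] t = {10, 11, 12};
        long[] e = {20, 21, 22};
        System.out.println(Arrays.toString(evaluateIf(cond, t, e))); // [10, 21, 12]
    }
}
```

In the unoptimized form, both branch loops would run over all n rows and the condition would pick a value afterwards; the win here is that each branch expression does work proportional only to its share of the batch.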
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: (was: HIVE-17139.4.patch) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.4.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17190) Don't store bitvectors for unpartitioned table
[ https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104467#comment-16104467 ] Gopal V commented on HIVE-17190: Does doing bitvectors for flat tables help with a multiple insert into merging? > Don't store bitvectors for unpartitioned table > -- > > Key: HIVE-17190 > URL: https://issues.apache.org/jira/browse/HIVE-17190 > Project: Hive > Issue Type: Test > Components: Metastore, Statistics >Affects Versions: 3.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-17190.patch > > > Since current ones can't be intersected, there is no advantage of storing > them for unpartitioned tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
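Why bitvectors only pay off for partitioned tables: FM/HLL-style NDV sketches can be merged by unioning their bitvectors, which is exactly what aggregating per-partition stats into table-level stats needs; an unpartitioned table has nothing to merge. A toy illustration (real Hive uses FM/HLL estimator classes, not a single long; this only models the union operation):

```java
public class BitvectorUnionSketch {
    // Toy FM-style register: the union of two sketches is their bitwise OR.
    static long union(long sketchA, long sketchB) {
        return sketchA | sketchB;
    }

    public static void main(String[] args) {
        long part1 = 0b0111; // per-partition NDV sketches
        long part2 = 0b1100;
        // Merging partition-level sketches into a table-level sketch is the
        // operation that justifies storing the bitvectors; for an
        // unpartitioned table the table-level stats exist directly.
        System.out.println(Long.toBinaryString(union(part1, part2))); // 1111
    }
}
```

Intersection (needed to answer "distinct values common to two column stats") is not supported by these sketches, which is the limitation the issue description refers to.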
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104464#comment-16104464 ] Gopal V commented on HIVE-16965: LGTM - +1 > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, > HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set 
hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16811: --- Status: Patch Available (was: Open) > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch > > > Currently, join ordering completely bails out in the absence of statistics, which can lead to bad plans such as cross joins. > e.g., the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and let it come up with a join order that is at least better than a cross join. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
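One common fallback when stats are absent (an assumption for illustration, not necessarily what this patch implements) is to derive a crude row-count estimate from raw data size and a guessed average row width, which is enough to keep the join-ordering algorithm from bailing out entirely:

```java
public class StatsEstimateSketch {
    // Hypothetical fallback estimator: rows ≈ total bytes / assumed row width.
    // Both parameter names are illustrative, not Hive's actual config keys.
    static long estimateRowCount(long totalDataSizeBytes, int estimatedRowWidthBytes) {
        if (estimatedRowWidthBytes <= 0) return 1;
        // Never report zero rows: a zero estimate would make every join
        // look equally cheap and defeat the ordering heuristic.
        return Math.max(1, totalDataSizeBytes / estimatedRowWidthBytes);
    }

    public static void main(String[] args) {
        System.out.println(estimateRowCount(1_000_000, 100)); // 10000
    }
}
```

Even a rough estimate like this gives the optimizer a relative ordering between tables, which is all it needs to avoid placing a cross join at the bottom of the plan.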
[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16811: --- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage
[ https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104419#comment-16104419 ] Hive QA commented on HIVE-15665: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879225/HIVE-15665.08.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=142) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6162/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6162/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6162/ 
Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879225 - PreCommit-HIVE-Build > LLAP: OrcFileMetadata objects in cache can impact heap usage > > > Key: HIVE-15665 > URL: https://issues.apache.org/jira/browse/HIVE-15665 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Sergey Shelukhin > Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, > HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, > HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, > HIVE-15665.patch > > > OrcFileMetadata internally has filestats, stripestats etc which are allocated > in heap. On large data sets, this could have an impact on the heap usage and > the memory usage by different executors in LLAP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17190) Don't store bitvectors for unpartitioned table
[ https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned HIVE-17190: --- Assignee: Ashutosh Chauhan -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17190) Don't store bitvectors for unpartitioned table
[ https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-17190: Attachment: HIVE-17190.patch Initial patch for testing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs
[ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li reassigned HIVE-17193: - > HoS: don't combine map works that are targets of different DPPs > --- > > Key: HIVE-17193 > URL: https://issues.apache.org/jira/browse/HIVE-17193 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-591) create new type of join ( 1 row for a given key from multiple tables) (UNIQUEJOIN)
[ https://issues.apache.org/jira/browse/HIVE-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated HIVE-591: -- Description: It will be useful to support a new type of join: say: . select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where The semantics are that for a given key only 1 row is created - nulls are present for the tables which do not contain a row for that key. There is no limit on the number of tables; the number of keys should be the same as the number of tables. was: It will be useful to support a new type of join: say: select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where The semantics are that for a given key only 1 row is created - nulls are present for the the tables which do not contain a row for that key. There is no limit on the number of tables, the number of keys should be the same as the number of tables. > create new type of join ( 1 row for a given key from multiple tables) > (UNIQUEJOIN) > -- > > Key: HIVE-591 > URL: https://issues.apache.org/jira/browse/HIVE-591 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Emil Ibrishimov > Fix For: 0.5.0 > > Attachments: HIVE-591.1.patch, HIVE-591.2.patch, HIVE-591.3.patch, > HIVE-591.4.patch > > > It will be useful to support a new type of join: > say: > . > select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where > The semantics are that for a given key only 1 row is created - nulls are > present for the tables which do not contain a row for that key. > There is no limit on the number of tables; the number of keys should be the > same as the number of tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
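The described semantics, one output row per key with nulls for tables missing that key, can be modeled with ordinary maps (an illustrative model only; Hive's actual UNIQUEJOIN syntax and implementation differ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class UniqueJoinSketch {
    // For each distinct key across all tables, emit exactly one row holding
    // each table's value for that key, or null when the table lacks the key.
    // Each input map models one table keyed by its join key.
    static Map<String, List<String>> uniqueJoin(List<Map<String, String>> tables) {
        TreeSet<String> keys = new TreeSet<>();
        for (Map<String, String> t : tables) keys.addAll(t.keySet());
        TreeMap<String, List<String>> result = new TreeMap<>();
        for (String k : keys) {
            List<String> row = new ArrayList<>();
            for (Map<String, String> t : tables) row.add(t.get(k)); // null if absent
            result.put(k, row);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("k1", "a1", "k2", "a2");
        Map<String, String> b = Map.of("k2", "b2", "k3", "b3");
        System.out.println(uniqueJoin(List.of(a, b)));
    }
}
```

Note how this differs from a regular full outer join: each table contributes at most one row per key, so no key can fan out into multiple output rows, and the number of key expressions necessarily matches the number of tables.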
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104380#comment-16104380 ] Rui Li commented on HIVE-16948: --- Thinking more about this, I find a bug in combining equivalent works. If two map works contain the same operators but will be pruned by different DPP sinks, they can't be combined. E.g., let's slightly change the above example into: {code} explain select * from (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a join (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b on a.key=b.key; {code} The two map works for {{srcpart}} still get combined. However, they need to be pruned by different values: {{src.key}} and {{src.value}} respectively. In the current implementation we'll probably get incorrect results. > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: 
Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator >
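The fix Rui Li describes amounts to including the DPP pruning source in the equivalence check before combining map works. A hedged sketch of that check (class and field names here are hypothetical, not Hive's actual SparkCompiler internals):

```java
import java.util.Set;

public class CombineEquivalentWorks {
    // Simplified model of a map work: its operator-tree signature plus the
    // set of expressions whose DPP sinks will prune it (hypothetical fields).
    public static final class MapWork {
        final String operatorTreeSignature;
        final Set<String> dppPruningExprs;

        public MapWork(String sig, Set<String> dpp) {
            this.operatorTreeSignature = sig;
            this.dppPruningExprs = dpp;
        }
    }

    // Two works may be combined only if the operator trees AND the DPP
    // pruning sources both match; matching trees alone is not sufficient,
    // which is exactly the bug described in the comment above.
    public static boolean canCombine(MapWork a, MapWork b) {
        return a.operatorTreeSignature.equals(b.operatorTreeSignature)
            && a.dppPruningExprs.equals(b.dppPruningExprs);
    }

    public static void main(String[] args) {
        MapWork scan1 = new MapWork("TS[srcpart]->RS", Set.of("src.key"));
        MapWork scan2 = new MapWork("TS[srcpart]->RS", Set.of("src.value"));
        // Same operator tree, but pruned by different DPP values: not combinable.
        System.out.println(canCombine(scan1, scan2)); // prints false
    }
}
```

In the example query, the two `srcpart` scans have identical operator trees but are pruned by `src.key` and `src.value` respectively; a check like the one above would keep them separate.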
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104374#comment-16104374 ] Rui Li commented on HIVE-16948: --- [~kellyzly], I'm not talking about the 3 places. Here's an example: {noformat} set hive.cbo.enable=false; explain select * from (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a join (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) b on a.key=b.key; STAGE DEPENDENCIES: Stage-2 is a root stage Stage-1 depends on stages: Stage-2 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark DagName: lirui_20170728110559_4c2bc0ba-ab9a-428b-bf09-23f1b19b068f:16 Vertices: Map 8 Map Operator Tree: TableScan alias: src Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: key (type: string) outputColumnNames: _col0 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 1 Execution mode: vectorized Map 9 Map Operator Tree: TableScan alias: src Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: key (type: string) outputColumnNames: _col0 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash 
outputColumnNames: _col0 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 5 Execution mode: vectorized Stage: Stage-1 Spark Edges: Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1) Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1) Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1) DagName: lirui_20170728110559_4c2bc0ba-ab9a-428b-bf09-23f1b19b068f:15 Vertices: Map 1 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: ds (type: string) sort order: + Map-reduce partition columns: ds (type: string) Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE value expressions: key (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: src Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Reduce
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.8.patch Last patch mysteriously failed in build. Recreated one after code refresh. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, > HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set 
hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool
[ https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104361#comment-16104361 ] Hive QA commented on HIVE-17167: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879216/HIVE-17167.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11030 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=236) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=207) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6161/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6161/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6161/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12879216 - PreCommit-HIVE-Build > Create metastore specific configuration tool > > > Key: HIVE-17167 > URL: https://issues.apache.org/jira/browse/HIVE-17167 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17167.patch > > > As part of making the metastore a separately releasable module we need > configuration tools that are specific to that module. It cannot use or > extend HiveConf as that is in hive common. But it must take a HiveConf > object and be able to operate on it. > The best way to achieve this is using Hadoop's Configuration object (which > HiveConf extends) together with enums and static methods. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
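The design sketched in the issue description (Hadoop-style configuration plus enums and static methods) can be illustrated roughly as below. This is a hypothetical sketch, not Hive's actual API: the class, enum, keys, and defaults are invented, and java.util.Properties stands in for Hadoop's Configuration so the example is self-contained.

```java
import java.util.Properties;

// Sketch of the enum-plus-static-accessors configuration pattern proposed in
// HIVE-17167. All names here are illustrative stand-ins.
class MetastoreConfSketch {
    // Each enum constant bundles a key with its default, so the metastore
    // module needs no compile-time dependency on HiveConf.
    enum ConfVars {
        SERVER_PORT("metastore.server.port", "9083"),
        TRY_DIRECT_SQL("metastore.try.direct.sql", "true");

        final String key;
        final String defaultVal;

        ConfVars(String key, String defaultVal) {
            this.key = key;
            this.defaultVal = defaultVal;
        }
    }

    // Static accessors operate on whatever configuration object is passed in,
    // falling back to the enum's default when the key is unset.
    static String getVar(Properties conf, ConfVars var) {
        return conf.getProperty(var.key, var.defaultVal);
    }

    static int getIntVar(Properties conf, ConfVars var) {
        return Integer.parseInt(getVar(conf, var));
    }
}
```

Because the accessors only need the get/set surface of the configuration object, a HiveConf instance (which extends Hadoop's Configuration) can still be passed in by callers that have one.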
[jira] [Commented] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
[ https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104272#comment-16104272 ] Hive QA commented on HIVE-17164: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879215/HIVE-17164.02.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions] (batchId=75) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_ptf_part_simple] (batchId=151) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=292) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6160/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6160/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6160/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests 
failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879215 - PreCommit-HIVE-Build > Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default) > --- > > Key: HIVE-17164 > URL: https://issues.apache.org/jira/browse/HIVE-17164 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch > > > Add disk storage backing. Turn hive.vectorized.execution.ptf.enabled on by > default. > Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the > maximum number of vectorized row batches to buffer in memory before spilling to > disk. > Add hive.vectorized.testing.reducer.batch.size parameter to have the Tez > Reducer make small batches for making a lot of key group batches that cause > memory buffering and disk storage backing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs
[ https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17192: --- > Add InterfaceAudience and InterfaceStability annotations for Stats Collection > APIs > -- > > Key: HIVE-17192 > URL: https://issues.apache.org/jira/browse/HIVE-17192 > Project: Hive > Issue Type: Sub-task > Components: Statistics >Reporter: Sahil Takiar >Assignee: Sahil Takiar > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs
[ https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17191: --- > Add InterfaceAudience and InterfaceStability annotations for StorageHandler > APIs > > > Key: HIVE-17191 > URL: https://issues.apache.org/jira/browse/HIVE-17191 > Project: Hive > Issue Type: Sub-task > Components: StorageHandler >Reporter: Sahil Takiar >Assignee: Sahil Takiar > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
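For readers unfamiliar with the annotations these sub-tasks propose to apply, the pattern looks roughly like this. The @interface declarations below are local stand-ins (Hive's real annotations live in org.apache.hadoop.hive.common.classification), and StorageHandlerLike is a hypothetical interface, not Hive's actual HiveStorageHandler.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch of audience/stability markers on a StorageHandler-style API.
class AnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}     // intended for external consumers

    @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}   // may change incompatibly between minor releases

    // A public-but-evolving extension point, analogous to what HIVE-17191
    // proposes for the StorageHandler interfaces.
    @Public
    @Evolving
    interface StorageHandlerLike {
        String getInputFormatClassName();
    }
}
```

The value of the annotations is documentary: third-party implementors can see at a glance which interfaces are safe to build against and which may still shift between releases.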
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104115#comment-16104115 ] Hive QA commented on HIVE-16965: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879247/HIVE-16965.7.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6159/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6159/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6159/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-27 23:55:16.127 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6159/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-27 23:55:16.129 + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive e15b2de..61d8b7c master -> origin/master + git reset --hard HEAD HEAD is now at e15b2de HIVE-17168 Create separate module for stand alone metastore (Alan Gates, reviewed by Vihang Karajgaonkar) + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruning.java Removing ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_mapjoin_only.q Removing ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_mapjoin_only.q.out + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 61d8b7c HIVE-17087: Remove unnecessary HoS DPP trees during map-join conversion (Sahil Takiar, reviewed by Liyun Zhang, Rui Li) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-27 23:55:21.989 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordSource.java: No such file or directory error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValueInputMerger.java: No such file or directory error: a/ql/src/test/results/clientpositive/llap/llap_smb.q.out: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12879247 - PreCommit-HIVE-Build > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, > HIVE-16965.6.patch, HIVE-16965.7.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104108#comment-16104108 ] Hive QA commented on HIVE-16998: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879205/HIVE16998.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=158) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6158/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6158/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6158/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879205 - PreCommit-HIVE-Build > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17184) Unexpected new line in beeline output when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104106#comment-16104106 ] Vihang Karajgaonkar commented on HIVE-17184: Test failures are unrelated. [~pvary] Can you please review? > Unexpected new line in beeline output when running with -f option > - > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-17184.01.patch > > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17189 started by Vihang Karajgaonkar. -- > Fix backwards incompatibility in HiveMetaStoreClient > > > Key: HIVE-17189 > URL: https://issues.apache.org/jira/browse/HIVE-17189 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-17189.01.patch > > > HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and > {{alter partition}} commands. However, it changes the signature of @public > interface of MetastoreClient and removes some methods which breaks backwards > compatibility. This can be fixed easily by re-introducing the removed methods > and making them call into newly added method > {{alter_table_with_environment_context}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-17189: --- Status: Patch Available (was: In Progress) > Fix backwards incompatibility in HiveMetaStoreClient > > > Key: HIVE-17189 > URL: https://issues.apache.org/jira/browse/HIVE-17189 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-17189.01.patch > > > HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and > {{alter partition}} commands. However, it changes the signature of @public > interface of MetastoreClient and removes some methods which breaks backwards > compatibility. This can be fixed easily by re-introducing the removed methods > and making them call into newly added method > {{alter_table_with_environment_context}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-17189: --- Attachment: HIVE-17189.01.patch > Fix backwards incompatibility in HiveMetaStoreClient > > > Key: HIVE-17189 > URL: https://issues.apache.org/jira/browse/HIVE-17189 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-17189.01.patch > > > HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and > {{alter partition}} commands. However, it changes the signature of @public > interface of MetastoreClient and removes some methods which breaks backwards > compatibility. This can be fixed easily by re-introducing the removed methods > and making them call into newly added method > {{alter_table_with_environment_context}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
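The compatibility fix described in the issue (re-introduce the removed methods and have them delegate to the new environment-context variant) can be sketched as follows. Class and method names are heavily simplified stand-ins, not Hive's exact MetaStoreClient signatures.

```java
// Hypothetical shim: the removed method returns with its old signature and
// simply forwards to the new variant with a null environment context.
class MetaStoreClientSketch {
    static class Table {
        final String name;
        Table(String name) { this.name = name; }
    }

    static class EnvironmentContext {}

    private String lastAltered;

    // New method added by HIVE-12730 (signature simplified for the sketch).
    void alterTableWithEnvironmentContext(String db, String tbl, Table newTable,
            EnvironmentContext ctx) {
        lastAltered = db + "." + tbl + " -> " + newTable.name;
    }

    // Re-introduced old method: existing callers keep compiling and working
    // without needing to know about environment contexts.
    void alterTable(String db, String tbl, Table newTable) {
        alterTableWithEnvironmentContext(db, tbl, newTable, null);
    }

    String getLastAltered() { return lastAltered; }
}
```

The delegation keeps a single code path for the actual alter logic, so the restored methods cost nothing to maintain.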
[jira] [Updated] (HIVE-17008) DbNotificationListener should skip failed events
[ https://issues.apache.org/jira/browse/HIVE-17008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Burkert updated HIVE-17008: --- Attachment: HIVE-17008.2.patch > DbNotificationListener should skip failed events > > > Key: HIVE-17008 > URL: https://issues.apache.org/jira/browse/HIVE-17008 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Dan Burkert >Assignee: Dan Burkert > Attachments: HIVE-17008.0.patch, HIVE-17008.1.patch, > HIVE-17008.2.patch > > > When dropping a non-existent database, the HMS will still fire registered > {{DROP_DATABASE}} event listeners. This results in an NPE when the listeners > attempt to deref the {{null}} database parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
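The failure mode described above (listeners dereferencing a null database when a non-existent database is dropped) suggests a guard like the one below. All names here are illustrative, not the actual DbNotificationListener API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: skip notifying listeners when the DROP_DATABASE event carries no
// database object, instead of letting them NPE on the null parameter.
class NotificationSketch {
    interface Listener {
        void onDropDatabase(String dbName);
    }

    private final List<Listener> listeners = new ArrayList<>();

    void register(Listener l) { listeners.add(l); }

    // Returns true if listeners were actually notified.
    boolean fireDropDatabase(String dbNameOrNull) {
        if (dbNameOrNull == null) {
            // The database did not exist; there is nothing meaningful to
            // report, so skip the event rather than crash the listeners.
            return false;
        }
        for (Listener l : listeners) {
            l.onDropDatabase(dbNameOrNull);
        }
        return true;
    }
}
```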
[jira] [Comment Edited] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104042#comment-16104042 ] Sahil Takiar edited comment on HIVE-16998 at 7/27/17 11:11 PM: --- You'll probably need to rebase this too, since I just merged HIVE-17087. was (Author: stakiar): You'll probably need to rebase this too, since I just merged HIVE-16923. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()
[ https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104059#comment-16104059 ] Mithun Radhakrishnan commented on HIVE-17169: - An additional reason to avoid {{KeyProvider::getMetadata()}} is that the HDFS might be set up to disallow this call for all but HDFS super-users. The {{EncryptionZone}} instance already provides what we need. > Avoid extra call to KeyProvider::getMetadata() > -- > > Key: HIVE-17169 > URL: https://issues.apache.org/jira/browse/HIVE-17169 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17169.1.patch > > > Here's the code from {{Hadoop23Shims}}: > {code:title=Hadoop23Shims.java|borderStyle=solid} > @Override > public int comparePathKeyStrength(Path path1, Path path2) throws > IOException { > EncryptionZone zone1, zone2; > zone1 = hdfsAdmin.getEncryptionZoneForPath(path1); > zone2 = hdfsAdmin.getEncryptionZoneForPath(path2); > if (zone1 == null && zone2 == null) { > return 0; > } else if (zone1 == null) { > return -1; > } else if (zone2 == null) { > return 1; > } > return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName()); > } > private int compareKeyStrength(String keyname1, String keyname2) throws > IOException { > KeyProvider.Metadata meta1, meta2; > if (keyProvider == null) { > throw new IOException("HDFS security key provider is not configured > on your server."); > } > meta1 = keyProvider.getMetadata(keyname1); > meta2 = keyProvider.getMetadata(keyname2); > if (meta1.getBitLength() < meta2.getBitLength()) { > return -1; > } else if (meta1.getBitLength() == meta2.getBitLength()) { > return 0; > } else { > return 1; > } > } > } > {code} > It turns out that {{EncryptionZone}} already has the cipher's bit-length > stored in a member variable. One shouldn't need an additional name-node call > ({{KeyProvider::getMetadata()}}) only to fetch it again. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
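The optimization described in the issue (compare key strengths using information the encryption-zone lookup already returned, instead of a second KeyProvider::getMetadata() round trip) can be sketched like this. Zone is a toy stand-in for HDFS's EncryptionZone; the field name keyBitLength is an assumption for illustration, not the real HDFS API.

```java
// Sketch: once the two zone lookups are done, the comparison needs no
// further name-node calls, because the bit length travels with the zone.
class KeyStrengthSketch {
    static class Zone {
        final String keyName;
        final int keyBitLength; // carried with the zone; no extra RPC needed

        Zone(String keyName, int keyBitLength) {
            this.keyName = keyName;
            this.keyBitLength = keyBitLength;
        }
    }

    // Mirrors comparePathKeyStrength's null handling: a null zone means the
    // path is unencrypted, which sorts as the weakest option.
    static int compareZoneKeyStrength(Zone z1, Zone z2) {
        if (z1 == null && z2 == null) return 0;
        if (z1 == null) return -1;
        if (z2 == null) return 1;
        return Integer.compare(z1.keyBitLength, z2.keyBitLength);
    }
}
```

Beyond saving an RPC, this sidesteps the permission issue Mithun raises, since no KeyProvider access is required at all.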
[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()
[ https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17169: Status: Patch Available (was: Open) Submitting patch for tests. > Avoid extra call to KeyProvider::getMetadata() > -- > > Key: HIVE-17169 > URL: https://issues.apache.org/jira/browse/HIVE-17169 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17169.1.patch > > > Here's the code from {{Hadoop23Shims}}: > {code:title=Hadoop23Shims.java|borderStyle=solid} > @Override > public int comparePathKeyStrength(Path path1, Path path2) throws > IOException { > EncryptionZone zone1, zone2; > zone1 = hdfsAdmin.getEncryptionZoneForPath(path1); > zone2 = hdfsAdmin.getEncryptionZoneForPath(path2); > if (zone1 == null && zone2 == null) { > return 0; > } else if (zone1 == null) { > return -1; > } else if (zone2 == null) { > return 1; > } > return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName()); > } > private int compareKeyStrength(String keyname1, String keyname2) throws > IOException { > KeyProvider.Metadata meta1, meta2; > if (keyProvider == null) { > throw new IOException("HDFS security key provider is not configured > on your server."); > } > meta1 = keyProvider.getMetadata(keyname1); > meta2 = keyProvider.getMetadata(keyname2); > if (meta1.getBitLength() < meta2.getBitLength()) { > return -1; > } else if (meta1.getBitLength() == meta2.getBitLength()) { > return 0; > } else { > return 1; > } > } > } > {code} > It turns out that {{EncryptionZone}} already has the cipher's bit-length > stored in a member variable. One shouldn't need an additional name-node call > ({{KeyProvider::getMetadata()}}) only to fetch it again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()
[ https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17169: Attachment: (was: HIVE-17169.branch-2.2.patch) > Avoid extra call to KeyProvider::getMetadata() > -- > > Key: HIVE-17169 > URL: https://issues.apache.org/jira/browse/HIVE-17169 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17169.1.patch > > > Here's the code from {{Hadoop23Shims}}: > {code:title=Hadoop23Shims.java|borderStyle=solid} > @Override > public int comparePathKeyStrength(Path path1, Path path2) throws > IOException { > EncryptionZone zone1, zone2; > zone1 = hdfsAdmin.getEncryptionZoneForPath(path1); > zone2 = hdfsAdmin.getEncryptionZoneForPath(path2); > if (zone1 == null && zone2 == null) { > return 0; > } else if (zone1 == null) { > return -1; > } else if (zone2 == null) { > return 1; > } > return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName()); > } > private int compareKeyStrength(String keyname1, String keyname2) throws > IOException { > KeyProvider.Metadata meta1, meta2; > if (keyProvider == null) { > throw new IOException("HDFS security key provider is not configured > on your server."); > } > meta1 = keyProvider.getMetadata(keyname1); > meta2 = keyProvider.getMetadata(keyname2); > if (meta1.getBitLength() < meta2.getBitLength()) { > return -1; > } else if (meta1.getBitLength() == meta2.getBitLength()) { > return 0; > } else { > return 1; > } > } > } > {code} > It turns out that {{EncryptionZone}} already has the cipher's bit-length > stored in a member variable. One shouldn't need an additional name-node call > ({{KeyProvider::getMetadata()}}) only to fetch it again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104048#comment-16104048 ] Janaki Lahorani commented on HIVE-16998: Thanks [~stakiar]. I will rebase. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104042#comment-16104042 ] Sahil Takiar commented on HIVE-16998: - You'll probably need to rebase this too, since I just merged HIVE-16923. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
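For context, the switch proposed in HIVE-16998 would sit alongside the existing HoS DPP settings and be toggled the same way. A sketch of the intended usage (the second property name is a guess at what the patch introduces, shown only for illustration):

```sql
-- Enable dynamic partition pruning for Hive on Spark overall...
SET hive.spark.dynamic.partition.pruning=true;

-- ...but restrict it to map-joins, where the operator tree is not split
-- in two and the non-partitioned side's subquery is not run twice.
-- NOTE: this property name is illustrative of the proposed config key,
-- not confirmed by the messages above.
SET hive.spark.dynamic.partition.pruning.map.join.only=true;
```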
[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17087: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the reviews, [~lirui] and [~kellyzly]. Committed to master. > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > 
Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 >
[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104030#comment-16104030 ] Mithun Radhakrishnan commented on HIVE-17188: - P.S. I've added clarification in the JIRA description. We've had a rash of JIRAs with anaemic descriptions recently. I hope this version is more clear. > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > Note: The problem being addressed here isn't so much with the size of the > hundreds of Partition objects, but the cruft that builds with the > PersistenceManager, in the JDO layer, as confirmed through memory-profiling. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17188: Description: For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem. Note: The problem being addressed here isn't so much the size of the hundreds of Partition objects as the cruft that builds up in the PersistenceManager, in the JDO layer, as confirmed through memory-profiling. (Raising this on behalf of [~cdrome] and [~thiruvel].) was: For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem. (Raising this on behalf of [~cdrome] and [~thiruvel].) > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. > Note: The problem being addressed here isn't so much the size of the > hundreds of Partition objects as the cruft that builds up in the > PersistenceManager, in the JDO layer, as confirmed through memory-profiling. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
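The mitigation HIVE-17188 describes — flushing the PersistenceManager periodically during a large batch so JDO-level state does not accumulate — follows the shape below. This is an illustrative sketch only: `Pm` is a stand-in for `javax.jdo.PersistenceManager`, and the flush interval is arbitrary, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushSketch {
  // Stand-in for javax.jdo.PersistenceManager; only the two calls used
  // by this sketch are modeled.
  interface Pm {
    void makePersistent(Object o);
    void flush();
  }

  // Persist a batch, flushing every `flushEvery` objects so the JDO
  // layer's bookkeeping is released as the batch progresses.
  // Returns the total number of flush() calls made.
  static int addPartitions(Pm pm, List<Object> parts, int flushEvery) {
    int flushes = 0;
    for (int i = 0; i < parts.size(); i++) {
      pm.makePersistent(parts.get(i));
      if ((i + 1) % flushEvery == 0) {
        pm.flush();
        flushes++;
      }
    }
    pm.flush(); // final flush for the tail of the batch
    return flushes + 1;
  }

  public static void main(String[] args) {
    final int[] flushCalls = {0};
    Pm pm = new Pm() {
      public void makePersistent(Object o) { /* no-op stand-in */ }
      public void flush() { flushCalls[0]++; }
    };
    List<Object> parts = new ArrayList<>();
    for (int i = 0; i < 10; i++) parts.add(new Object());
    addPartitions(pm, parts, 4);
    System.out.println("flush() called " + flushCalls[0] + " times");
  }
}
```

Per the description, it is this periodic flushing — not shrinking the in-memory `List` of partitions — that keeps the JDO layer's accumulated state bounded.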
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Attachment: HIVE-17006.01.patch > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.patch, > HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104017#comment-16104017 ] liyunzhang_intel commented on HIVE-16948: - [~lirui]: {quote} Is it possible the reduce works only contain one DPP sink? {quote} there are 3 conditions to remove dpp sink: 1. SparkRemoveDynamicPruningBySize 2. SparkCompiler#runCycleAnalysisForPartitionPruning 3. SparkMapJoinOptimizer(HIVE-17087) If i use 1 condition to remove dpp sink, can you give one example to show to remove 1 and remain another? > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL 
Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator >
[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104014#comment-16104014 ] Mithun Radhakrishnan commented on HIVE-17188: - @[~vihangk1]: Thank you for your attention. :] bq. Can you please update the patch with HIVE specific JIRA number and description of this JIRA as per our convention? Sorry, it's been a while, so perhaps you could clarify for me. My memory of the convention is that patches are named {{HIVE-..patch}}. If the patch is a port to another branch, then it's {{HIVE-..patch}}. From perusing the JIRAs included in [the Hive 2.2 release|https://issues.apache.org/jira/projects/HIVE/versions/12335837], this seems like the format of choice. Could you please clarify what I'm missing? bq. You can add a line in the description where this patch was cherry-picked from, if you like. This is a port from Yahoo's internal production branch. The commit dates back to April of 2014. :] bq. If there are hundreds of partitions being added, aren't they already in memory in the {{List}} parts object? A fair question. :] I can try to answer this, although [~cdrome] and [~thiruvel] are really the experts on this one. The problem being addressed here isn't so much the size of the hundreds of {{Partition}} objects, but the cruft that builds up in the {{PersistenceManager}}, in the JDO layer, as confirmed through memory-profiling. Our larger commit also plugged leaks from neglecting to call {{Query::close()}}, etc. It looks like those have independently been solved already. > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory.
Flushing the {{PersistenceManager}} alleviates the > problem. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Attachment: (was: HIVE-17006.01.patch) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Attachment: HIVE-17006.01.patch Fixing the initialization order, other minor changes. I can observe the cache working on a small LLAP cluster, seemingly without errors. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations
[ https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104008#comment-16104008 ] Sahil Takiar commented on HIVE-17129: - [~spena] what are your thoughts on marking {{MetaStoreEventListener}}, {{ListenerEvent}}, and the classes under {{org.apache.hadoop.hive.metastore.events}} as Public APIs. Do we expect Hive users to use these APIs, or even other Apache projects? > Increase usage of InterfaceAudience and InterfaceStability annotations > --- > > Key: HIVE-17129 > URL: https://issues.apache.org/jira/browse/HIVE-17129 > Project: Hive > Issue Type: Improvement >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a > while ago to mark certain classes as available for public use. However, they > were only added to a few classes. The annotations are largely missing for > major APIs such as the SerDe and UDF APIs. We should update these interfaces > to use these annotations. > When done in conjunction with HIVE-17130, we should have an automated way to > prevent backwards incompatible changes to Hive APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
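The annotation scheme discussed in HIVE-17129 marks a class with an audience marker (who may use it) and a stability marker (how much its signature may change). A self-contained illustration of the pattern, with stand-in annotations — the real ones live in Hadoop's and Hive's `classification` packages, not here:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationSketch {
  // Stand-ins modeled on InterfaceAudience.Public / InterfaceStability.Evolving;
  // illustrative only, not the real Hadoop/Hive annotation types.
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

  // How a public-facing API class (e.g. a SerDe or StorageHandler
  // interface) would be marked under this scheme.
  @Public @Evolving
  static class ExampleApi {}

  public static void main(String[] args) {
    // Runtime retention lets compatibility-checking tools discover the markers.
    System.out.println(ExampleApi.class.getAnnotations().length);
  }
}
```

With runtime retention, automated checkers (as envisioned in HIVE-17130) can enumerate annotated classes and flag backwards-incompatible changes to anything marked public.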
[jira] [Assigned] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned HIVE-17189: -- > Fix backwards incompatibility in HiveMetaStoreClient > > > Key: HIVE-17189 > URL: https://issues.apache.org/jira/browse/HIVE-17189 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and > {{alter partition}} commands. However, it changes the signature of the public > interface of MetastoreClient and removes some methods, which breaks backwards > compatibility. This can be fixed easily by re-introducing the removed methods > and making them call into the newly added method > {{alter_table_with_environment_context}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
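The fix HIVE-17189 describes — re-introducing the removed methods as thin wrappers that delegate to the newly added, wider method — looks like this in miniature. The names, parameters, and bodies below are illustrative only, not the actual IMetaStoreClient signatures:

```java
public class CompatSketch {
  // The new, wider method (with an extra environment-context argument),
  // standing in for alter_table_with_environment_context.
  static String alterTable(String db, String tbl, String envContext) {
    return db + "." + tbl + (envContext == null ? "" : "+ctx");
  }

  // The re-introduced legacy overload: it simply delegates to the new
  // method, so callers compiled against the old API keep working.
  @Deprecated
  static String alterTable(String db, String tbl) {
    return alterTable(db, tbl, null);
  }

  public static void main(String[] args) {
    System.out.println(alterTable("default", "t1"));
  }
}
```

Because the legacy overload is pure delegation, the two code paths cannot diverge, which is what makes this kind of compatibility fix "easy" in the sense the description uses.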
[jira] [Commented] (HIVE-17184) Unexpected new line in beeline output when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103998#comment-16103998 ] Hive QA commented on HIVE-17184: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879204/HIVE-17184.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6157/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6157/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6157/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12879204 - PreCommit-HIVE-Build > Unexpected new line in beeline output when running with -f option > - > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-17184.01.patch > > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103956#comment-16103956 ] Vihang Karajgaonkar commented on HIVE-17188: Hi Mithun, thanks for providing the patch. Can you please update the patch with the HIVE-specific JIRA number and description of this JIRA as per our convention? You can add a line in the description where this patch was cherry-picked from, if you like. Also, I'm wondering how the patch alleviates the problem. If there are hundreds of partitions being added, aren't they already in memory in the {{List parts}} object? If you have any stats to share, that would be great. E.g., before --> running out of memory at X number of partitions ; after --> running out of memory at X+Y number of partitions. Thanks! > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.7.patch Fixed the assert to compare paths and not the objects. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, > HIVE-16965.6.patch, HIVE-16965.7.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set 
hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
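The one-line fix note above ("Fixed the assert to compare paths and not the objects") is the standard identity-vs-equality pitfall, sketched here with java.nio paths for illustration (the actual HIVE-16965 patch concerns Hadoop Path objects inside the SMB join code, but the distinction is the same):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCompareSketch {
  // Identity comparison: true only when both references point at the very
  // same object -- the buggy form of the assert.
  static boolean sameObject(Path a, Path b) {
    return a == b;
  }

  // Value comparison: true whenever the two paths are equal -- the fixed form.
  static boolean samePath(Path a, Path b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    Path a = Paths.get("warehouse/orc_a/y=2000/q=1");
    Path b = Paths.get("warehouse/orc_a/y=2000/q=1");
    System.out.println(sameObject(a, b)); // distinct objects
    System.out.println(samePath(a, b));   // same path value
  }
}
```

Two independently constructed paths to the same location fail the identity check but pass the equality check, which is why an `assert` written with `==` can fire (or silently pass) incorrectly.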
[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103932#comment-16103932 ] Janaki Lahorani commented on HIVE-16759: Thanks [~spena]. I have uploaded HIVE16759.4.patch after rebasing. Job #6163 with the new patch is pending. > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch, HIVE16759.4.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
[ https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103916#comment-16103916 ] Daniel Dai commented on HIVE-17115: --- [~erik.fang], I find if SerDe.initialize throw exception, the create table statement would also fail as it will go through the same MetaStoreUtils.getDeserializer code. Do you know how this table is created and why we don't see exception at time of creation? > MetaStoreUtils.getDeserializer doesn't catch the > java.lang.ClassNotFoundException > - > > Key: HIVE-17115 > URL: https://issues.apache.org/jira/browse/HIVE-17115 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1 >Reporter: Erik.fang >Assignee: Erik.fang > Attachments: HIVE-17115.1.patch, HIVE-17115.patch > > > Suppose we create a table with Custom SerDe, then call > HiveMetaStoreClient.getSchema(String db, String tableName) to extract the > metadata from HiveMetaStore Service > the thrift client hangs there with exception in HiveMetaStore Service's log, > such as > {code:java} > Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/util/Bytes > at > org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184) > at > org.apache.hadoop.hive.hbase.HBaseSerDeParameters.(HBaseSerDeParameters.java:73) > at > org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117) > at > org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53) > at > org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636) > at 
sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.util.Bytes > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
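The hang reported above comes from a NoClassDefFoundError escaping MetaStoreUtils.getDeserializer and killing the metastore handler thread before a response is sent. A minimal sketch of the kind of guard the issue title implies is below; the class and method names (DeserializerGuard, getDeserializerSafely) are illustrative stand-ins, not Hive's actual API:

```java
// Sketch only: wrap deserializer construction so that linkage errors (e.g. a
// missing HBase jar) are reported instead of propagating out of the handler.
public class DeserializerGuard {

    // Stand-in for MetaStoreUtils.getDeserializer: returns null and records an
    // error message instead of letting NoClassDefFoundError kill the thread.
    public static Object getDeserializerSafely(String serdeClassName, StringBuilder error) {
        try {
            Class<?> serdeClass = Class.forName(serdeClassName);
            return serdeClass.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // The SerDe class, or one of its dependencies, is absent from the
            // metastore classpath -- report it rather than hang the client.
            error.append("SerDe class not found: ").append(e.getMessage());
            return null;
        } catch (ReflectiveOperationException e) {
            error.append("Could not instantiate SerDe: ").append(e);
            return null;
        }
    }
}
```

Note that catching NoClassDefFoundError (an Error, not an Exception) is what distinguishes this case: a plain `catch (Exception e)` around initialization would miss it.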
[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103910#comment-16103910 ] Sergio Peña commented on HIVE-16759: Something failed with the patch. Could you rebase your patch? Btw, +1 on the patch. > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch, HIVE16759.4.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17188: Status: Patch Available (was: Open) Submitting, to run tests. > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17188: Attachment: HIVE-17188.1.patch Here's the patch ported for {{master/}}. I wonder if it's better to flush at an interval, instead of for *every* partition. > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
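The interval-flush idea floated in the comment above can be sketched as follows. This is an illustrative stand-in, not the attached patch: the Flushable interface stands in for javax.jdo.PersistenceManager, and the class and parameter names are hypothetical.

```java
import java.util.List;

// Sketch: instead of flushing the persistence layer after *every* partition,
// flush every N partitions to bound heap growth without paying a per-row cost.
public class BatchedPartitionAdder {

    // Stand-in for the flush() method of javax.jdo.PersistenceManager.
    public interface Flushable { void flush(); }

    // Returns how many flushes were issued, purely for illustration.
    public static int addPartitions(List<String> partitions, Flushable pm, int flushInterval) {
        int flushes = 0;
        int pending = 0;
        for (String part : partitions) {
            // ... make the partition persistent here (omitted in this sketch) ...
            pending++;
            if (pending >= flushInterval) {
                pm.flush();  // push queued state to the database, freeing heap
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {  // flush the tail of the batch
            pm.flush();
            flushes++;
        }
        return flushes;
    }
}
```

The flush interval trades memory for round-trips: interval 1 reproduces flush-per-partition, while a large interval approaches the original out-of-memory behavior.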
[jira] [Commented] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
[ https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103878#comment-16103878 ] Hive QA commented on HIVE-17164: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879215/HIVE-17164.02.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions] (batchId=75) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_ptf_part_simple] (batchId=151) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6156/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6156/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6156/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879215 - PreCommit-HIVE-Build > Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default) > --- > > Key: HIVE-17164 > URL: https://issues.apache.org/jira/browse/HIVE-17164 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch > > > Add disk storage backing. Turn hive.vectorized.execution.ptf.enabled on by > default. > Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the > maximum number of vectorized row batch to buffer in memory before spilling to > disk. > Add hive.vectorized.testing.reducer.batch.size parameter to have the Tez > Reducer make small batches for making a lot of key group batches that cause > memory buffering and disk storage backing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103871#comment-16103871 ] Sahil Takiar commented on HIVE-16998: - [~janulatha], some minor comments on the changes to {{HiveConf}}, other than that, LGTM. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17188: --- > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.6.patch Added a better assert as suggested by Gopal. [~sershe][~gopalv] Can you please review. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, HIVE-16965.6.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set 
hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103843#comment-16103843 ] Peter Cseh commented on HIVE-15767: --- The problem is that we're not setting the _proper_ mapreduce.job.credentials.binary, but [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L235], we're passing every property from the HiveConf conf to the configuration for Spark. If HiveCLI is called from the Oozie LauncherMapper, that HiveConf will contain the "mapreduce.job.credentials.binary" property for the LauncherMapper, e.g. /yarn/nm/usercache/systest/appcache/application_1501079366372_0045/container_1501079366372_0045_01_01/container_tokens This property has to be there so HiveCLI can access the tokens properly. Passing this folder to the Spark driver is problematic, as the driver will often be executed on another machine in the cluster, where it won't be able to read this file as it's not there. There are a couple of ways to define the location of the container_tokens file, and YARN takes care of Spark getting the correct location on the node the driver will be executed on. 
> Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch, > HIVE-15767.1.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
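The fix direction discussed in the comment above — don't forward the launcher-local token path into the Spark configuration — can be sketched as a filtered copy. This is an illustrative stand-in for the copy loop in RemoteHiveSparkClient, not its actual code; the class name SparkConfCopier is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: copy Hive properties into the Spark configuration, dropping the
// node-local mapreduce.job.credentials.binary path, since the container_tokens
// file it points at does not exist on the node where the driver runs.
public class SparkConfCopier {
    static final String CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";

    public static Map<String, String> copyForSpark(Map<String, String> hiveProps) {
        Map<String, String> sparkConf = new HashMap<>();
        for (Map.Entry<String, String> e : hiveProps.entrySet()) {
            if (CREDENTIALS_BINARY.equals(e.getKey())) {
                continue;  // launcher-local path; let YARN localize tokens instead
            }
            sparkConf.put(e.getKey(), e.getValue());
        }
        return sparkConf;
    }
}
```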
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Status: Open (was: Patch Available) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Status: Patch Available (was: Open) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: (was: HIVE-8472.branch-2.2.patch) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16759: --- Attachment: HIVE16759.4.patch Resolved merge conflicts. > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch, HIVE16759.4.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103742#comment-16103742 ] Hive QA commented on HIVE-16998: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879205/HIVE16998.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_25] (batchId=84) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=158) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge1] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge3] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_diff_fs] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[quotedid_smb] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_combine_equivalent_work] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_column_buckets] (batchId=168) 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6155/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6155/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6155/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 19 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879205 - PreCommit-HIVE-Build > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. 
> Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
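The config gating described in this issue — only apply dynamic partition pruning when either DPP is enabled for all joins, or the join is a map-join and a narrower "map-joins only" flag is set — can be sketched as a pure predicate. The class and flag names here are hypothetical, not the HiveConf keys the patch actually adds.

```java
// Sketch of the proposed gating logic for HoS dynamic partition pruning.
public class DppGate {

    // dppEnabled:        master switch for DPP on Hive-on-Spark
    // dppForMapJoinOnly: narrower switch restricting DPP to map-joins
    // isMapJoin:         whether the candidate join is a map-join
    public static boolean shouldApplyDpp(boolean dppEnabled,
                                         boolean dppForMapJoinOnly,
                                         boolean isMapJoin) {
        if (!dppEnabled) {
            return false;
        }
        // Map-joins never split the operator tree, so they are always safe to
        // prune; other joins only qualify when the narrow flag is off.
        return isMapJoin || !dppForMapJoinOnly;
    }
}
```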
[jira] [Commented] (HIVE-17187) WebHCat SPNEGO support is incomplete
[ https://issues.apache.org/jira/browse/HIVE-17187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103740#comment-16103740 ] Eric Yang commented on HIVE-17187: -- See [the blog|https://developer.ibm.com/hadoop/2016/05/12/hbase-rest-gateway-security/] written by IBM about SPNEGO for the HBase REST API. It is a good reference for implementing SPNEGO properly, with doAs calls using a service principal instead of a proxy user with the SPNEGO credential. > WebHCat SPNEGO support is incomplete > - > > Key: HIVE-17187 > URL: https://issues.apache.org/jira/browse/HIVE-17187 > Project: Hive > Issue Type: Bug > Components: WebHCat >Affects Versions: 1.2.1 >Reporter: Eric Yang > > [Some online > documentation|https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/spnego_setup_for_webhcat.html] > describes how to set up WebHCat with SPNEGO support. However, multiple > services could use SPNEGO on the same host. For example, the HBase REST API > can also be set up to use the HTTP principal for SPNEGO support. When the HTTP > principal is shared among services, Hadoop proxy user settings cannot identify > whether a doAs call with the HTTP principal was invoked by the HBase REST API > or by WebHCat. Ideally, WebHCat should keep track of its own service principal, > independent of the SPNEGO principal, to ensure that the SPNEGO principal is only > given authentication access. The SPNEGO principal should not be used in the proxy > user settings to grant authorization access. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103723#comment-16103723 ] Sergey Shelukhin commented on HIVE-17006: - load_dyn_part5 may be related (need to dbl check), the rest are unrelated. [~prasanth_j] do you want to review? A lot of the code for metadata cache is the same as in HIVE-15665, so only parts of the patch need separate review > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage
[ https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-15665: Attachment: HIVE-15665.08.patch The same patch; looks like QA didn't trigger. > LLAP: OrcFileMetadata objects in cache can impact heap usage > > > Key: HIVE-15665 > URL: https://issues.apache.org/jira/browse/HIVE-15665 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Sergey Shelukhin > Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, > HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, > HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, > HIVE-15665.patch > > > OrcFileMetadata internally has filestats, stripestats etc which are allocated > in heap. On large data sets, this could have an impact on the heap usage and > the memory usage by different executors in LLAP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17167) Create metastore specific configuration tool
[ https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17167: -- Attachment: HIVE-17167.patch A patch with a MetastoreConf class. This class is not itself instantiated. It contains an enum that defines the conf values and a set of static methods that operate on Hadoop Configuration objects to read and write the values. It honors existing Hive configuration values (e.g. "hive.metastore.rawstore.impl") while allowing metastore-specific values (e.g. "metastore.rawstore.impl"). Using Hadoop's Configuration class assures that a HiveConf object can be read from and written to using MetastoreConf methods. It also allows operations on plain Configuration objects, which are passed through many of Hive's interfaces. > Create metastore specific configuration tool > > > Key: HIVE-17167 > URL: https://issues.apache.org/jira/browse/HIVE-17167 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17167.patch > > > As part of making the metastore a separately releasable module we need > configuration tools that are specific to that module. It cannot use or > extend HiveConf as that is in hive common. But it must take a HiveConf > object and be able to operate on it. > The best way to achieve this is using Hadoop's Configuration object (which > HiveConf extends) together with enums and static methods. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
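The enum-plus-static-methods pattern described in the comment above can be sketched as follows. A plain Map stands in for Hadoop's Configuration; the two rawstore keys are the ones named in the comment, while the class names, default value, and everything else are illustrative, not the attached patch.

```java
import java.util.Map;

// Sketch of the MetastoreConf idea: each conf value carries both its
// metastore-specific key and its legacy Hive key, and the getter checks the
// metastore key first, then falls back to the Hive key, then to a default.
public class MetastoreConfSketch {

    public enum ConfVars {
        RAW_STORE_IMPL("metastore.rawstore.impl",
                       "hive.metastore.rawstore.impl",
                       "org.apache.hadoop.hive.metastore.ObjectStore");

        final String metastoreName;
        final String hiveName;
        final String defaultVal;

        ConfVars(String metastoreName, String hiveName, String defaultVal) {
            this.metastoreName = metastoreName;
            this.hiveName = hiveName;
            this.defaultVal = defaultVal;
        }
    }

    // Metastore-specific key wins; the legacy Hive key is honored as a fallback.
    public static String getVar(Map<String, String> conf, ConfVars var) {
        String v = conf.get(var.metastoreName);
        if (v == null) {
            v = conf.get(var.hiveName);
        }
        return v == null ? var.defaultVal : v;
    }
}
```

Because the getter only reads string keys, the same pattern works against HiveConf, a plain Configuration, or any other key-value store, which is the portability property the comment highlights.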
[jira] [Updated] (HIVE-17167) Create metastore specific configuration tool
[ https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17167: -- Status: Patch Available (was: Open) > Create metastore specific configuration tool > > > Key: HIVE-17167 > URL: https://issues.apache.org/jira/browse/HIVE-17167 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17167.patch > > > As part of making the metastore a separately releasable module we need > configuration tools that are specific to that module. It cannot use or > extend HiveConf as that is in hive common. But it must take a HiveConf > object and be able to operate on it. > The best way to achieve this is using Hadoop's Configuration object (which > HiveConf extends) together with enums and static methods. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool
[ https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103661#comment-16103661 ] Alan Gates commented on HIVE-17167: --- [~vihangk1], I'm not sure what in SchemaTool you are suggesting we use. It looked fairly different from what I was thinking, but I might be missing something. > Create metastore specific configuration tool > > > Key: HIVE-17167 > URL: https://issues.apache.org/jira/browse/HIVE-17167 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As part of making the metastore a separately releasable module we need > configuration tools that are specific to that module. It cannot use or > extend HiveConf as that is in hive common. But it must take a HiveConf > object and be able to operate on it. > The best way to achieve this is using Hadoop's Configuration object (which > HiveConf extends) together with enums and static methods. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool
[ https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103657#comment-16103657 ] ASF GitHub Bot commented on HIVE-17167: --- GitHub user alanfgates opened a pull request: https://github.com/apache/hive/pull/211 HIVE-17167 Create metastore specific configuration tool You can merge this pull request into a Git repository by running: $ git pull https://github.com/alanfgates/hive hive17167 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/211.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #211 > Create metastore specific configuration tool > > > Key: HIVE-17167 > URL: https://issues.apache.org/jira/browse/HIVE-17167 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As part of making the metastore a separately releasable module we need > configuration tools that are specific to that module. It cannot use or > extend HiveConf as that is in hive common. But it must take a HiveConf > object and be able to operate on it. > The best way to achieve this is using Hadoop's Configuration object (which > HiveConf extends) together with enums and static methods. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
[ https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17164: Attachment: HIVE-17164.02.patch > Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default) > --- > > Key: HIVE-17164 > URL: https://issues.apache.org/jira/browse/HIVE-17164 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch > > > Add disk storage backing. Turn hive.vectorized.execution.ptf.enabled on by > default. > Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the > maximum number of vectorized row batches to buffer in memory before spilling to > disk. > Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez > Reducer make small batches, producing many key group batches that exercise > memory buffering and disk storage backing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-15863. Resolution: Duplicate > Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC > timezone > -- > > Key: HIVE-15863 > URL: https://issues.apache.org/jira/browse/HIVE-15863 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Related to CALCITE-1623. > At query preparation time, Calcite uses a Calendar to hold the value of DATE, > TIME, TIMESTAMP literals. It assumes that Calendar has a UTC (GMT) time zone, > and bad things might happen if it does not. Currently, we pass the Calendar > object with user timezone from Hive. We need to pass it with UTC timezone and > make the inverse conversion when we go back from Calcite to Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17168) Create separate module for stand alone metastore
[ https://issues.apache.org/jira/browse/HIVE-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17168: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Patch committed. Thanks Vihang for the review. > Create separate module for stand alone metastore > > > Key: HIVE-17168 > URL: https://issues.apache.org/jira/browse/HIVE-17168 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 3.0.0 > > Attachments: HIVE-17168.patch > > > We need to create a separate maven module for the stand alone metastore. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103618#comment-16103618 ] Hive QA commented on HIVE-16759: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879190/HIVE16759.3.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6154/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6154/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6154/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-27 18:01:22.772 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6154/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-27 18:01:22.775 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 0f7c33d HIVE-17088: HS2 WebUI throws a NullPointerException when opened (Sergio Pena, reviewed by Aihua Xu) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 0f7c33d HIVE-17088: HS2 WebUI throws a NullPointerException when opened (Sergio Pena, reviewed by Aihua Xu) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-27 18:01:25.483 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java:41 error: itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12879190 - PreCommit-HIVE-Build > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. 
> This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103612#comment-16103612 ] Hive QA commented on HIVE-17148: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879177/HIVE-17148.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[semijoin5] (batchId=15) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=157) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=128) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/6153/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6153/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6153/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879177 - PreCommit-HIVE-Build > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Attachments: HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be caused by an incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. As a result, it is filtering out all the rows if > any column mentioned in the COALESCE has a null value. > Please find the query plan below: > {code} > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) > default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] > <-Select Operator [SEL_2] (rows=1 width=4) > Output:["_col0","_col1"] >
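The faulty rewrite is easy to demonstrate outside Hive: COALESCE(a1, b1) = a2 can be true while b1 is NULL, so a derived predicate that requires both COALESCE arguments to be non-null drops matching rows. A minimal Java sketch of the intended semantics (hypothetical helper and values from the repro above, not Hive code):

```java
public class CoalescePredicate {
    // Simplified COALESCE over two values: first non-null argument, else null.
    static String coalesce(String a, String b) {
        return a != null ? a : b;
    }

    public static void main(String[] args) {
        // Row from the reproduction: ct1 = ('1', NULL), ct2 = ('1').
        String a1 = "1", b1 = null, a2 = "1";

        // Intended join condition: matches even though b1 is null.
        boolean joinMatches = a2.equals(coalesce(a1, b1));

        // Predicate produced by the bad plan: requires both columns non-null.
        boolean badFilter = (a1 != null) && (b1 != null);

        System.out.println(joinMatches); // true
        System.out.println(badFilter);   // false -> the row is wrongly filtered out
    }
}
```

The inferred `IS NOT NULL` filters are only sound for columns that must be non-null for the condition to hold; with COALESCE, only the case where all arguments are null can be pruned.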
[jira] [Commented] (HIVE-17039) Implement optimization rewritings that rely on database SQL constraints
[ https://issues.apache.org/jira/browse/HIVE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103600#comment-16103600 ] Jesus Camacho Rodriguez commented on HIVE-17039: [~sershe], currently they are not, but we already have different options to enforce them vs rely on them for optimization purposes (other RDBMS can make this distinction too). > Implement optimization rewritings that rely on database SQL constraints > --- > > Key: HIVE-17039 > URL: https://issues.apache.org/jira/browse/HIVE-17039 > Project: Hive > Issue Type: New Feature > Components: Logical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez > > Hive already has support to declare multiple SQL constraints (PRIMARY KEY, > FOREIGN KEY, UNIQUE, and NOT NULL). Although these constraints cannot be > currently enforced on the data, they can be made available to the optimizer > by using the 'RELY' keyword. > This ticket is an umbrella for all the rewriting optimizations based on SQL > constraints that we will be including in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.5.patch Somehow last patch did not trigger a run. Retrying. [~sershe][~gopalv] can you please review? > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set 
hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Status: Patch Available (was: In Progress) > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Status: In Progress (was: Patch Available) > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-17184: --- Summary: Unexpected new line in beeline output when running with -f option (was: Unexpected new line in beeline when running with -f option) > Unexpected new line in beeline output when running with -f option > - > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-17184.01.patch > > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17184) Unexpected new line in beeline when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-17184: --- Status: Patch Available (was: Open) > Unexpected new line in beeline when running with -f option > -- > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-17184.01.patch > > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Attachment: HIVE16998.4.patch Generated using git show --full-index --no-prefix --no-renames > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, > HIVE16998.4.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17184) Unexpected new line in beeline when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-17184: --- Attachment: HIVE-17184.01.patch > Unexpected new line in beeline when running with -f option > -- > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-17184.01.patch > > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17186) `double` type constant operation loses precision
[ https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103559#comment-16103559 ] Gopal V commented on HIVE-17186: bq. Is there any way for Hive to fix this? No, this is the {{0.1+0.2 != 0.3}} problem with IEEE 754 arithmetic. Decimal and 0.1BD + 0.2BD wouldn't cause these rounding errors. > `double` type constant operation loses precision > > > Key: HIVE-17186 > URL: https://issues.apache.org/jira/browse/HIVE-17186 > Project: Hive > Issue Type: Bug >Reporter: Dongjoon Hyun > > This might be an issue where Hive loses precision and generates a wrong > result when handling *double* constant operations. This was reported in the > following environment. > *ENVIRONMENT* > https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql > *SQL* > {code} > hive> explain select l_discount from lineitem where l_discount between 0.06 - > 0.01 and 0.06 + 0.01 limit 10; > OK > Plan not optimized by CBO. > Stage-0 >Fetch Operator > limit:10 > Stage-1 > Map 1 vectorized > File Output Operator [FS_9] > compressed:false > Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE > Column stats: COMPLETE > table:{"input > format:":"org.apache.hadoop.mapred.TextInputFormat","output > format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} > Limit [LIM_8] >Number of rows:10 >Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE > Column stats: COMPLETE >Select Operator [OP_7] > outputColumnNames:["_col0"] > Statistics:Num rows: 294854 Data size: 2358832 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator [FIL_6] > predicate:l_discount BETWEEN 0.049996 AND > 0.06999 (type: boolean) > Statistics:Num rows: 294854 Data size: 2358832 > Basic stats: COMPLETE Column stats: COMPLETE > TableScan [TS_0] > alias:lineitem > Statistics:Num rows: 589709 Data size: > 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE 
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - > 0.01 and 0.06 + 0.01 limit 10; > OK > 0.06 > Time taken: 314.923 seconds, Fetched: 1 row(s) > {code} > Hive excludes 0.07, contrary to users' intuition. Also, this > difference confuses some users because they believe that Hive's result > is the correct one. Is there any way for Hive to fix this? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
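The predicate bounds in the plan above come from ordinary IEEE 754 double arithmetic at constant-folding time. A self-contained Java sketch (assuming nothing about Hive internals; class name is illustrative) shows both the drift and how exact decimal arithmetic, as with Hive's DECIMAL type, avoids it:

```java
import java.math.BigDecimal;

public class DoubleDrift {
    public static void main(String[] args) {
        // double: neither 0.06, 0.01, nor their sum/difference is exactly
        // representable in binary, so the folded bounds drift off 0.07/0.05.
        System.out.println(0.06 + 0.01); // 0.06999999999999999
        System.out.println(0.06 - 0.01); // 0.049999999999999996

        // BigDecimal performs exact decimal arithmetic on the literals.
        BigDecimal hi = new BigDecimal("0.06").add(new BigDecimal("0.01"));
        BigDecimal lo = new BigDecimal("0.06").subtract(new BigDecimal("0.01"));
        System.out.println(hi); // 0.07
        System.out.println(lo); // 0.05
    }
}
```

This is why a BETWEEN over double bounds can silently exclude the endpoint 0.07 while decimal literals (0.06BD + 0.01BD) would not.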
[jira] [Commented] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103555#comment-16103555 ] Jesus Camacho Rodriguez commented on HIVE-16614: [~ashutoshc], this should be ready to review, could you take a look at https://reviews.apache.org/r/61188/ ? Thanks > Support "set local time zone" statement > --- > > Key: HIVE-16614 > URL: https://issues.apache.org/jira/browse/HIVE-16614 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16614.01.patch, HIVE-16614.02.patch, > HIVE-16614.03.patch, HIVE-16614.patch > > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of default time zone displacements, which are transparently > applied when converting between timezone-unaware types and timezone-aware > types and, in Hive's case, are also used to shift a timezone aware type to a > different time zone, depending on configuration. > SQL also provides that the default time zone displacement be settable at a > session level, so that clients can access a database simultaneously from > different time zones and see time values in their own time zone. > Currently the time zone displacement is fixed and is set based on the system > time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be > more convenient for users if they have the ability to set their time zone of > choice. > SQL defines "set time zone" with 2 ways of specifying the time zone, first > using an interval and second using the special keyword LOCAL. > Examples: > • set time zone '-8:00'; > • set time zone LOCAL; > LOCAL means to set the current default time zone displacement to the > session's original default time zone displacement. > Reference: SQL:2011 section 19.4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17186) `double` type constant operation loses precision
[ https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103551#comment-16103551 ] Andrew Sherman commented on HIVE-17186: --- This looks like an artifact of floating point arithmetic: {noformat} double d1 = 0.06D; double d2 = 0.01D; double d3 = d1 + d2; double d4 = d1 - d2; System.out.println("d3 = " + d3); System.out.println("d4 = " + d4); {noformat} gives {noformat} d3 = 0.06999 d4 = 0.049996 {noformat} > `double` type constant operation loses precision > > > Key: HIVE-17186 > URL: https://issues.apache.org/jira/browse/HIVE-17186 > Project: Hive > Issue Type: Bug >Reporter: Dongjoon Hyun > > This might be an issue where Hive loses precision and generates a wrong > result when handling *double* constant operations. This was reported in the > following environment. > *ENVIRONMENT* > https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql > *SQL* > {code} > hive> explain select l_discount from lineitem where l_discount between 0.06 - > 0.01 and 0.06 + 0.01 limit 10; > OK > Plan not optimized by CBO. 
> Stage-0 >Fetch Operator > limit:10 > Stage-1 > Map 1 vectorized > File Output Operator [FS_9] > compressed:false > Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE > Column stats: COMPLETE > table:{"input > format:":"org.apache.hadoop.mapred.TextInputFormat","output > format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} > Limit [LIM_8] >Number of rows:10 >Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE > Column stats: COMPLETE >Select Operator [OP_7] > outputColumnNames:["_col0"] > Statistics:Num rows: 294854 Data size: 2358832 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator [FIL_6] > predicate:l_discount BETWEEN 0.049996 AND > 0.06999 (type: boolean) > Statistics:Num rows: 294854 Data size: 2358832 > Basic stats: COMPLETE Column stats: COMPLETE > TableScan [TS_0] > alias:lineitem > Statistics:Num rows: 589709 Data size: > 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE > hive> select max(l_discount) from lineitem where l_discount between 0.06 - > 0.01 and 0.06 + 0.01 limit 10; > OK > 0.06 > Time taken: 314.923 seconds, Fetched: 1 row(s) > {code} > Hive excludes 0.07, contrary to users' intuition. Also, this > difference confuses some users because they believe that Hive's result > is the correct one. Is there any way for Hive to fix this? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
[ https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17164: Status: Patch Available (was: Open) > Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default) > --- > > Key: HIVE-17164 > URL: https://issues.apache.org/jira/browse/HIVE-17164 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17164.01.patch > > > Add disk storage backing. Turn hive.vectorized.execution.ptf.enabled on by > default. > Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the > maximum number of vectorized row batches to buffer in memory before spilling to > disk. > Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez > Reducer produce small batches, creating many key group batches that exercise > memory buffering and disk storage backing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
[ https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17164: Attachment: HIVE-17164.01.patch > Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default) > --- > > Key: HIVE-17164 > URL: https://issues.apache.org/jira/browse/HIVE-17164 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17164.01.patch > > > Add disk storage backing. Turn hive.vectorized.execution.ptf.enabled on by > default. > Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the > maximum number of vectorized row batches to buffer in memory before spilling to > disk. > Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez > Reducer produce small batches, creating many key group batches that exercise > memory buffering and disk storage backing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15217) Add watch mode to llap status tool
[ https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103544#comment-16103544 ] Prasanth Jayachandran commented on HIVE-15217: -- [~leftylev] Thanks for the reminder again! Updated the "LLAP Status" section in the wiki with all command options for the llap status tool. > Add watch mode to llap status tool > -- > > Key: HIVE-15217 > URL: https://issues.apache.org/jira/browse/HIVE-15217 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-15217.1.patch, HIVE-15217.2.patch, > HIVE-15217.3.patch > > > There is a few seconds of overhead for launching the llap status command. To avoid > it, we can add a "watch" mode to the llap status tool that refreshes the status at a > configured interval. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103523#comment-16103523 ] Janaki Lahorani commented on HIVE-16998: Addressed comments from [~stakiar]. DPP for all joins: hive.spark.dynamic.partition.pruning is true DPP for map joins: hive.spark.dynamic.partition.pruning is false and hive.spark.dynamic.partition.pruning.map.join.only is true Fixed bug: remove unnecessary pruning sink Fixed comments. Will upload to RB after test results. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17185) TestHiveMetaStoreStatsMerge.testStatsMerge is failing
[ https://issues.apache.org/jira/browse/HIVE-17185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103519#comment-16103519 ] Ashutosh Chauhan commented on HIVE-17185: - [~pxiong] Can you please take a look? > TestHiveMetaStoreStatsMerge.testStatsMerge is failing > - > > Key: HIVE-17185 > URL: https://issues.apache.org/jira/browse/HIVE-17185 > Project: Hive > Issue Type: Test > Components: Metastore, Test >Affects Versions: 3.0.0 >Reporter: Ashutosh Chauhan > > Likely because of HIVE-16997 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17184) Unexpected new line in beeline when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned HIVE-17184: -- > Unexpected new line in beeline when running with -f option > -- > > Key: HIVE-17184 > URL: https://issues.apache.org/jira/browse/HIVE-17184 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > When running in -f mode on BeeLine I see an extra new line getting added at > the end of the results. > {noformat} > vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null > +--+---+ > | test.id | test.val | > +--+---+ > | 1| one | > | 2| two | > | 1| three | > +--+---+ > vihang-MBP:bin vihang$ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
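One plausible source for this class of bug is an output path that prints a newline after every row and then an unconditional final newline. A hypothetical sketch of the usual fix pattern, joining rows rather than appending an extra terminator (not BeeLine's actual rendering code):

```python
def render_table(rows):
    # Joining with "\n" and adding exactly one terminator avoids the stray
    # blank line produced by "print each row, then print a final newline".
    return "\n".join(rows) + "\n"

table = ["+--+---+", "| 1| one |", "+--+---+"]
rendered = render_table(table)
```

A renderer built this way emits the same output for `-f` and `-e` invocations, which is the consistency the report asks for.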
[jira] [Commented] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103513#comment-16103513 ] Aihua Xu commented on HIVE-17179: - +1. > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Attachment: HIVE16998.3.patch > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11681) sometimes when query mr job progress, stream closed exception will happen
[ https://issues.apache.org/jira/browse/HIVE-11681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103506#comment-16103506 ] frank luo commented on HIVE-11681: -- https://issues.apache.org/jira/browse/HADOOP-13809 is a similar case. I believe they are all related to https://bugs.openjdk.java.net/browse/JDK-6947916, whose fix hasn't been released. I am able to recreate it with Oracle JDK 1.8.0_131. > sometimes when query mr job progress, stream closed exception will happen > - > > Key: HIVE-11681 > URL: https://issues.apache.org/jira/browse/HIVE-11681 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1 >Reporter: wangwenli > > sometimes the HiveServer will throw the exception below: > 2015-08-28 05:05:44,107 | FATAL | Thread-82995 | error parsing conf > mapred-default.xml | > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2404) > java.io.IOException: Stream closed > at > java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:84) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:160) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2902) > at > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:302) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1426) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2807) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117) > at > 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243) > at > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347) > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) > at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2246) > at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2234) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2305) > at > org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2258) > at > org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2175) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:854) > at > org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2069) > at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:477) > at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:467) > at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:187) > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580) > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1612) > at > org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596) > at > org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:289) > at > 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72) > after analysis, we found the root cause; below are the steps to reproduce the issue: > 1. open one beeline
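The linked JDK bug describes a cached jar-file input stream being closed by one reader while another is still consuming it; the second reader then fails with "Stream closed" mid-parse, as in the `Configuration.loadResource` trace above. A minimal, deterministic illustration of that failure class (a Python stand-in for the Java `IOException`; not Hive's or the JDK's code):

```python
import io

# One component holds a stream; another closes the underlying resource
# (as a shared JarFile cache entry can be closed) before the read finishes.
shared = io.BytesIO(b"<configuration></configuration>")
shared.close()

try:
    shared.read()
    read_failed = False
except ValueError:  # Python's analogue of java.io.IOException: Stream closed
    read_failed = True
```

The fix for this class of bug is to give each reader its own stream (or its own copy of the resource) rather than sharing one closable handle across independent consumers.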
[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103489#comment-16103489 ] Janaki Lahorani commented on HIVE-16759: Thanks [~vihangk1]. I attached the patch again. > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, > HIVE16759.3.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
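Once the table type is carried in the notification payload, a consumer can distinguish views from tables with a simple check. A hypothetical sketch, where the event shape and the `tableType` field name are assumptions for illustration (the actual HMS message format is defined by the patch); `VIRTUAL_VIEW` and `MANAGED_TABLE` are Hive's `TableType` enum names:

```python
def is_view_event(event):
    # "tableType" is an assumed payload key; the patch's goal is to expose
    # this information so consumers can tell views from tables.
    return event.get("tableType") == "VIRTUAL_VIEW"

create_view = {"eventType": "CREATE_TABLE", "tableType": "VIRTUAL_VIEW"}
create_table = {"eventType": "CREATE_TABLE", "tableType": "MANAGED_TABLE"}
```

Without the field, a consumer would have to call back into the metastore for every table event just to learn whether the object was a view.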