Build failed in Jenkins: HIVE-TRUNK-JAVA8 #76
See http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-JAVA8/76/
Started by timer
Building in workspace http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-JAVA8/ws/
git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
git config remote.origin.url https://git-wip-us.apache.org/repos/asf/hive.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/hive.git
git --version # timeout=10
git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/hive.git +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
ERROR: Error fetching remote repo 'origin'
Archiving artifacts
Recording test results
[jira] [Created] (HIVE-10881) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10881:
---

Summary: The bucket number is not respected in insert overwrite.
Key: HIVE-10881
URL: https://issues.apache.org/jira/browse/HIVE-10881
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Priority: Critical

When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression.

Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput(
  data string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput1(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput2(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Then I inserted the following data into the buckettestinput table:
firstinsert1
firstinsert2
firstinsert3
firstinsert4
firstinsert5
firstinsert6
firstinsert7
firstinsert8
secondinsert1
secondinsert2
secondinsert3
secondinsert4
secondinsert5
secondinsert6
secondinsert7
secondinsert8

set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);

Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141)
{noformat}

The debug information related to insert overwrite:
{noformat}
0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1
0: jdbc:hive2://localhost:1 select * from buckettestinput where data like 'first%';
INFO  : Number of reduce tasks determined at compile time: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapred.reduce.tasks=<number>
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100%
INFO  : Ended Job = job_local107155352_0001
INFO  : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
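The failure mode above is easier to follow with the bucket-assignment rule in mind: Hive routes each row to bucket hash(clustering column) mod numBuckets, and with hive.enforce.bucketing=true the reducer count is pinned to the bucket count so each bucket gets its own output file. A minimal Java sketch of that rule, using String.hashCode as a stand-in for Hive's own hashing (so the exact bucket numbers are illustrative only):

```java
import java.util.Arrays;
import java.util.List;

public class BucketSketch {
    // Simplified bucket assignment: hash the clustering column and take it
    // modulo the declared bucket count. Hive's real logic lives elsewhere
    // (e.g. its ObjectInspector-based hashing); this is an approximation
    // for illustration, not Hive's exact hash.
    public static int bucketFor(String data, int numBuckets) {
        return (data.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "firstinsert1", "firstinsert2", "firstinsert3", "firstinsert4");
        int numBuckets = 2;
        for (String row : rows) {
            System.out.println(row + " -> bucket " + bucketFor(row, numBuckets));
        }
        // With hive.enforce.bucketing=true, each of the 2 buckets should be
        // written by its own reducer into its own file; the bug reported
        // here is that only numFiles=1 is produced.
    }
}
```

The SemanticException fires because the bucket map join validator compares the declared bucket count (2) against the number of files it actually finds (1).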
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/
---

(Updated June 1, 2015, 4:19 p.m.)

Review request for hive.

Changes
---
Rebased the previous patch on HIVE-10788.

Bugs: HIVE-10427
    https://issues.apache.org/jira/browse/HIVE-10427

Repository: hive-git

Description
---
Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out that all I need to do is loosen the type checking.

Diffs (updated)
---
  data/files/customers.txt PRE-CREATION
  data/files/nested_orders.txt PRE-CREATION
  data/files/orders.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5
  ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION
  ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION
  ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION
  ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/34393/diff/

Testing
---
All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q

Thanks,
Chao Sun
Re: Review Request 34522: HIVE-10748 Replace StringBuffer with StringBuilder where possible
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34522/#review85989
---

Ship it!

Looks good. StringBuilder is not being used in places where multiple threads could access shared data.

- Sergio Pena

On May 30, 2015, 2:43 a.m., Alexander Pivovarov wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34522/
---

(Updated May 30, 2015, 2:43 a.m.)

Review request for hive and Chao Sun.

Bugs: HIVE-10748
    https://issues.apache.org/jira/browse/HIVE-10748

Repository: hive-git

Description
---
HIVE-10748 Replace StringBuffer with StringBuilder where possible

Diffs
---
  common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/TezJsonParser.java 6d6bbc2ee2bca67645356089046a39a3b6969df0
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveBaseChar.java 012c28b1a0024b7292a97076f42de1097dae6b2a
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveVarchar.java 309d0427da3f17a85d16da0e0dca46ad29a1c48e
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatException.java 265d08dec6d3e260adfadfe7f629592ebeb5039d
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java 2947c4333b925e0beabd8a85b188419a4d71a2e3
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java eae91cbd79ebb47e59263e8e47b8acdb457d576d
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java 3c2548635b95509da8cbdf474149c01da0662bbb
  hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 329e5da5c4675ad3d5f57fbdbddfc5ea168a6dbe
  jdbc/src/java/org/apache/hive/jdbc/HivePreparedStatement.java 8a0671fc28c4e8326df068f7de5cf278c863e362
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 52147bcbd0bd214b62e52d4ed2a6775e04a94143
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 835015f249684820a9f0eb453d3316a98af52e00
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7b48b8b87a0c54f482c32e460930978b691bcdb5
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java a9d2dbf1f7ddccaf71ce06a14e9681ab559186bb
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezJobMonitor.java 4423cd1a9960c68b74788f41e386bea105cee4eb
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 4a16b4c196c7080b1ec64d8ffdc25f359698b4d6
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java c5f03d94672a80849400e51a238bcec1db56659d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java acd9bf5017ca23578616a5bd9b902d2c2abed1ef
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbce4ef1c985b8f2987df413aed0ab087051
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 0de74882f3b92aa979c1960ac64023d3c750b876
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java e477f04d83715f5488e72bddd8527728faeb6789
  ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java 7108a47676a6a8e2765f098c1799d08e587db58e
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 086d9a2b1740a8dc8560667c19826b7dff6cb75b
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 9e197331bffb8db4b02aa5d5d842d68d55f7001a
  ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 8dff2fcee46a4d366bef559576348e9ea8ef6336
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 87a25480740df061e0918228d71dd9ec8e08a275
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9b1f704c682c82d85675c5de17f3965403
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java b8e18eafb67307c9b974194de28482fa8a7c6f2a
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 847d75199d6d614bd17ea852a4e3e87bf6911be7
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java f26225a72c34252c8fdf615bd34b59532376c5de
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java f3f7d95ef90f3e4f1beacecb4d681030bd69a231
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 19fe952f5e84755d1e7a8b752997c084dab339b9
  service/src/java/org/apache/hive/service/auth/HttpAuthUtils.java 3ef55779a6bde85193ca63ec9683cf9f67d6a39d

Diff: https://reviews.apache.org/r/34522/diff/

Testing
---

Thanks,
Alexander Pivovarov
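For context on the change under review: StringBuffer synchronizes every call, while StringBuilder offers the same API without locking, which is why it is the preferred choice when the buffer never escapes a single thread — the situation Sergio describes above. A small illustrative sketch (the helper name is made up, not from the patch):

```java
public class BuilderDemo {
    // StringBuffer acquires a monitor on every append; StringBuilder does
    // not. When the buffer is confined to one thread, StringBuilder is a
    // drop-in, lock-free replacement with an identical method surface.
    public static String joinWithBuilder(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        if (sb.length() > 0) {
            sb.setLength(sb.length() - 1); // drop the trailing comma
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder("a", "b", "c")); // prints a,b,c
    }
}
```

The mechanical rewrite is safe precisely because none of the touched call sites share the buffer across threads.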
Is the HCatRecord iterator ephemeral ? (A usage scenario on Timestamp)
Hi,

I wanted to check a behavior reproducible with timestamp. It can be summarized as: "When reading from a stored HCatRecord iterator, the column value of data type *timestamp* of a previous row gets reset to 1970-01-01 00:00:00.0 (or locale-adjusted epoch time 0) when the column value in the current row is *null*. Columns of other data types in the previous row are not affected by the presence of *null* in the current row's column value."

Please see the mail for details and steps to reproduce.

Regards,
Ujjwal

-- Forwarded message --
From: Ujjwal ujjwal.wadha...@gmail.com
Date: Fri, May 29, 2015 at 2:20 PM
Subject: Re: only timestamp column value of previous row gets reset
To: u...@hive.apache.org

Hi all,

The issue can be reproduced in a simple Java program (code attached for reference/use) where I do not use the iterator right away after reading, but store it in a vector for later use. As per my understanding, the iterator should not change once given to the consumer. However, the timestamp datatype object gets reset under one condition explained earlier. I have attached the code for reference.

Create a table
---
create table if not exists sample (dtcol date, tscol timestamp, stcol string) row format delimited fields terminated by ',' stored as textfile;
truncate table sample;

Input data (input)
---
9779-11-21,2014-04-01 11:30:55,abc
9779-11-21,2014-04-04 11:30:55,def
,null,

Load the data
---
hadoop fs -put input /apps/hive/warehouse/sample

Check
---
hive> select * from sample;
OK
9779-11-21	2014-04-01 11:30:55	abc
9779-11-21	2014-04-04 11:30:55	def
NULL	NULL
Time taken: 0.029 seconds, Fetched: 3 row(s)

Execute
---
export CLASSPATH=`hadoop classpath`:`hcat -classpath`
java -classpath SampleHCatReader.jar:$CLASSPATH org.my.internal.SampleHCatReader

Output having timestamp reset!
---
HCat record right after reading is 9779-11-21 2014-04-01 11:30:55.0 abc
HCat record right after reading is 9779-11-21 2014-04-04 11:30:55.0 def
HCat record right after reading is null null
HCat record later is 9779-11-21 2014-04-01 11:30:55.0 abc
HCat record later is 9779-11-21 1970-01-01 00:00:00.0 def
HCat record later is null null

As we see above, the output for timestamp gets reset.

Regards,
Ujjwal W
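One plausible mechanism for this symptom — an assumption on my part, not confirmed from the HCatalog code — is the common Hadoop pattern of reusing a single mutable object across rows: if the reader mutates one shared Timestamp per row and the consumer stores references instead of copies, every stored row ends up showing the last value written. A self-contained sketch of that pattern, with hypothetical names:

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical reader state: one mutable Timestamp reused across rows,
    // the way Hadoop readers often reuse Writable objects.
    static final Timestamp shared = new Timestamp(0);

    // Reads each value through the shared object. With copy=false the
    // consumer stores the shared reference (the buggy pattern); with
    // copy=true it stores a defensive copy (the safe pattern).
    public static List<Timestamp> readAndStore(long[] millis, boolean copy) {
        List<Timestamp> saved = new ArrayList<>();
        for (long m : millis) {
            shared.setTime(m);                        // reader mutates in place
            saved.add(copy ? new Timestamp(m) : shared);
        }
        return saved;
    }

    public static void main(String[] args) {
        long[] rows = {1_396_351_855_000L, 1_396_611_055_000L, 0L};
        // Without copying, every saved entry reflects the last row (epoch 0),
        // matching the "reset to 1970-01-01 00:00:00.0" symptom above.
        System.out.println(readAndStore(rows, false));
        System.out.println(readAndStore(rows, true));
    }
}
```

If this is what is happening, deep-copying the timestamp field before storing the record would be the workaround on the consumer side.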
[jira] [Created] (HIVE-10879) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10879:
---

Summary: The bucket number is not respected in insert overwrite.
Key: HIVE-10879
URL: https://issues.apache.org/jira/browse/HIVE-10879
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Priority: Blocker

When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression.

Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput(
  data string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput1(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput2(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Then I inserted the following data into the buckettestinput table:
firstinsert1
firstinsert2
firstinsert3
firstinsert4
firstinsert5
firstinsert6
firstinsert7
firstinsert8
secondinsert1
secondinsert2
secondinsert3
secondinsert4
secondinsert5
secondinsert6
secondinsert7
secondinsert8

set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);

Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141)
{noformat}

The debug information related to insert overwrite:
{noformat}
0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1
0: jdbc:hive2://localhost:1 select * from buckettestinput where data like 'first%';
INFO  : Number of reduce tasks determined at compile time: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapred.reduce.tasks=<number>
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100%
INFO  : Ended Job = job_local107155352_0001
INFO  : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HIVE-10880) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10880:
---

Summary: The bucket number is not respected in insert overwrite.
Key: HIVE-10880
URL: https://issues.apache.org/jira/browse/HIVE-10880
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Priority: Blocker

When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression.

Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput(
  data string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput1(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS buckettestoutput2(
  data string
) CLUSTERED BY(data) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Then I inserted the following data into the buckettestinput table:
firstinsert1
firstinsert2
firstinsert3
firstinsert4
firstinsert5
firstinsert6
firstinsert7
firstinsert8
secondinsert1
secondinsert2
secondinsert3
secondinsert4
secondinsert5
secondinsert6
secondinsert7
secondinsert8

set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);

Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141)
{noformat}

The debug information related to insert overwrite:
{noformat}
0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1
0: jdbc:hive2://localhost:1 select * from buckettestinput where data like 'first%';
INFO  : Number of reduce tasks determined at compile time: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapred.reduce.tasks=<number>
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100%
INFO  : Ended Job = job_local107155352_0001
INFO  : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HIVE-10878) Add tests to cover avg() function for 'x preceding and y preceding' windowing spec.
Aihua Xu created HIVE-10878:
---

Summary: Add tests to cover avg() function for 'x preceding and y preceding' windowing spec.
Key: HIVE-10878
URL: https://issues.apache.org/jira/browse/HIVE-10878
Project: Hive
Issue Type: Sub-task
Components: PTF-Windowing
Affects Versions: 1.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Trivial

The avg() function's support for the 'x preceding and y preceding' windowing spec was fixed along with the one for sum(). Add tests for it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
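For reference, the frame semantics these tests target: for row i, 'x preceding and y preceding' averages rows i-x through i-y of the partition, and a row whose frame lies entirely before the partition start gets an empty (NULL) result. A small Java sketch of that arithmetic — NaN stands in for SQL NULL here, and this is an illustration, not Hive's implementation:

```java
public class WindowAvgSketch {
    // avg(value) OVER (ROWS BETWEEN x PRECEDING AND y PRECEDING) at row i
    // averages rows i-x .. i-y (x >= y >= 1), clipped to the partition
    // start. An empty frame yields Double.NaN in this sketch.
    public static double frameAvg(double[] vals, int i, int x, int y) {
        int lo = Math.max(0, i - x);
        int hi = i - y;
        if (hi < lo) {
            return Double.NaN; // frame entirely before the partition start
        }
        double sum = 0;
        for (int j = lo; j <= hi; j++) {
            sum += vals[j];
        }
        return sum / (hi - lo + 1);
    }

    public static void main(String[] args) {
        double[] v = {10, 20, 30, 40};
        // "2 preceding and 1 preceding" at row 3 covers rows 1..2 -> 25.0
        System.out.println(frameAvg(v, 3, 2, 1));
    }
}
```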
[jira] [Created] (HIVE-10883) Hive should provide daemon scripts to start metastore and hiveserver2
Arpit Gupta created HIVE-10883:
---

Summary: Hive should provide daemon scripts to start metastore and hiveserver2
Key: HIVE-10883
URL: https://issues.apache.org/jira/browse/HIVE-10883
Project: Hive
Issue Type: Improvement
Components: CLI
Affects Versions: 1.2.0
Reporter: Arpit Gupta
Priority: Critical

Hive currently provides no daemon scripts to launch its services. A user has to start a service, put it in the background, and manage PIDs manually. This is a poor user experience. We should have daemon scripts for Hive so these services are easier to manage. I am sure the Ambari project will appreciate this as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Review Request 34897: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34897/
---

Review request for hive and Ashutosh Chauhan.

Repository: hive-git

Description
---
In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and key is an empty string too, it assumes that aliases contains key. This triggers incorrect PPD. To reproduce, apply HIVE-10455 and run cbo_subq_notin.q.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java 4f19caf

Diff: https://reviews.apache.org/r/34897/diff/

Testing
---

Thanks,
pengcheng xiong
[jira] [Created] (HIVE-10882) CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception
Pengcheng Xiong created HIVE-10882:
---

Summary: CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception
Key: HIVE-10882
URL: https://issues.apache.org/jira/browse/HIVE-10882
Project: Hive
Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

The CBO return path creates a join operator with empty filters. However, vectorization checks the filters of the big table in the join, which causes an NPE. To reproduce, run vector_outer_join2.q with the return path turned on.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
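Without access to the patch, the shape of the problem is the familiar unchecked-access pattern: vectorization reads the join's per-table filter expressions, and a plan built by the return path can leave that structure null or empty. A hypothetical guard — the names are invented for illustration and are not the actual Hive fix:

```java
public class FilterMapGuard {
    // Illustrative only: a vectorization check that indexes into the join's
    // filterMap for the big table. If the planner (here, the CBO return
    // path) leaves the map or its entry null, the naive access
    // `filterMap[bigTablePos].length` throws a NullPointerException.
    public static int filterCount(int[][] filterMap, int bigTablePos) {
        if (filterMap == null || filterMap[bigTablePos] == null) {
            return 0; // treat a missing entry as "no filters"
        }
        return filterMap[bigTablePos].length;
    }

    public static void main(String[] args) {
        System.out.println(filterCount(null, 0));                 // guarded
        System.out.println(filterCount(new int[][]{{1, 2}}, 0));  // normal path
    }
}
```

Whether the real fix guards the consumer like this or makes the return path populate the filters, the failure mode is the same unguarded dereference.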
Review Request 34898: Create ExplainTask in ATS hook through ExplainWork
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34898/
---

Review request for hive and Gunther Hagleitner.

Repository: hive-git

Description
---
Right now ExplainTask is created directly. That's fragile and can lead to issues like HIVE-10829.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/hooks/ATSHook.java 53d169d

Diff: https://reviews.apache.org/r/34898/diff/

Testing
---

Thanks,
pengcheng xiong
Re: Review Request 34522: HIVE-10748 Replace StringBuffer with StringBuilder where possible
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34522/
---

(Updated June 1, 2015, 7:20 p.m.)

Review request for hive, Ashutosh Chauhan and Sergio Pena.

Changes
---
rebased to the latest

Bugs: HIVE-10748
    https://issues.apache.org/jira/browse/HIVE-10748

Repository: hive-git

Description
---
HIVE-10748 Replace StringBuffer with StringBuilder where possible

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/TezJsonParser.java 6d6bbc2ee2bca67645356089046a39a3b6969df0
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveBaseChar.java 012c28b1a0024b7292a97076f42de1097dae6b2a
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveVarchar.java 309d0427da3f17a85d16da0e0dca46ad29a1c48e
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatException.java 265d08dec6d3e260adfadfe7f629592ebeb5039d
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java 2947c4333b925e0beabd8a85b188419a4d71a2e3
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java eae91cbd79ebb47e59263e8e47b8acdb457d576d
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java 3c2548635b95509da8cbdf474149c01da0662bbb
  hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 329e5da5c4675ad3d5f57fbdbddfc5ea168a6dbe
  jdbc/src/java/org/apache/hive/jdbc/HivePreparedStatement.java 8a0671fc28c4e8326df068f7de5cf278c863e362
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 52147bcbd0bd214b62e52d4ed2a6775e04a94143
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java ada79bd0a235eff06aa48c5550ff622f8e2f774d
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5d588390bfa00a956f4094310819204371f81122
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java a9d2dbf1f7ddccaf71ce06a14e9681ab559186bb
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezJobMonitor.java 4423cd1a9960c68b74788f41e386bea105cee4eb
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 4a16b4c196c7080b1ec64d8ffdc25f359698b4d6
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java c5f03d94672a80849400e51a238bcec1db56659d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java acd9bf5017ca23578616a5bd9b902d2c2abed1ef
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbce4ef1c985b8f2987df413aed0ab087051
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 43882e7cd9dfd0380035faff78120ce977e21226
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java e477f04d83715f5488e72bddd8527728faeb6789
  ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java 7108a47676a6a8e2765f098c1799d08e587db58e
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d609732bf91bbeed68fa604f66893bf7734c7c56
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 9e197331bffb8db4b02aa5d5d842d68d55f7001a
  ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 8dff2fcee46a4d366bef559576348e9ea8ef6336
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java cb0b68075ca4101df0b5ad2699afc45f1d038d4a
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9b1f704c682c82d85675c5de17f3965403
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java b8e18eafb67307c9b974194de28482fa8a7c6f2a
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 847d75199d6d614bd17ea852a4e3e87bf6911be7
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java f26225a72c34252c8fdf615bd34b59532376c5de
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java f3f7d95ef90f3e4f1beacecb4d681030bd69a231
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 19fe952f5e84755d1e7a8b752997c084dab339b9
  service/src/java/org/apache/hive/service/auth/HttpAuthUtils.java 3ef55779a6bde85193ca63ec9683cf9f67d6a39d

Diff: https://reviews.apache.org/r/34522/diff/

Testing
---

Thanks,
Alexander Pivovarov
hive.ppd.remove.duplicatefilters description is incorrect. What is the correct one?
I noticed that conf/hive-default.xml.template has the following description:

<property>
  <name>hive.ppd.remove.duplicatefilters</name>
  <value>true</value>
  <description>Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.</description>
</property>

Most probably the description was taken from hive.optimize.ppd.storage.

So, what is the correct description for hive.ppd.remove.duplicatefilters?
[jira] [Created] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default
Sergey Shelukhin created HIVE-10884:
---

Summary: Enable some beeline tests and turn on HIVE-4239 by default
Key: HIVE-10884
URL: https://issues.apache.org/jira/browse/HIVE-10884
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

See comments in HIVE-4239. Beeline tests with parallelism need to be enabled to turn compilation parallelism on by default.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/
---

(Updated June 1, 2015, 8:25 p.m.)

Review request for hive.

Repository: hive-git

Description
---
see jira

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d22f11
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1
  service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49
  service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f

Diff: https://reviews.apache.org/r/34776/diff/

Testing
---

Thanks,
Sergey Shelukhin
[jira] [Created] (HIVE-10885) with vectorization enabled join operation involving interval_day_time fails
Jagruti Varia created HIVE-10885:
---

Summary: with vectorization enabled join operation involving interval_day_time fails
Key: HIVE-10885
URL: https://issues.apache.org/jira/browse/HIVE-10885
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Matt McCline

When vectorization is on, a join operation involving the interval_day_time type throws the following error:
{noformat}
Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1432858236614_0247_1_01, diagnostics=[Task failed, taskId=task_1432858236614_0247_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
    ... 14 more
Caused by: java.lang.RuntimeException: Cannot allocate vector copy row for interval_day_time
    at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:213)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:581)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
    ... 15 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
    at
Re: hive.ppd.remove.duplicatefilters description is incorrect. What is the correct one?
Good catch, Alexander! hive.ppd.remove.duplicatefilters was added in 0.8.0 by HIVE-1538 https://issues.apache.org/jira/browse/HIVE-1538 (FilterOperator is applied twice with ppd on) without any description. It isn't documented in the wiki yet. -- Lefty On Mon, Jun 1, 2015 at 12:36 PM, Alexander Pivovarov apivova...@gmail.com wrote: I noticed that conf/hive-default.xml.template has the following description:

<property>
  <name>hive.ppd.remove.duplicatefilters</name>
  <value>true</value>
  <description>Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.</description>
</property>

Most probably the description was taken from hive.optimize.ppd.storage. So, what is the correct description for hive.ppd.remove.duplicatefilters?
Creating branch-1
Based on our discussion and vote last week, I'm working on creating branch-1. I plan to make the branch tomorrow. If anyone has a large commit they don't want to have to commit twice and they are close to committing, let me know so I can make sure it gets in before I branch. I'll also be updating https://cwiki.apache.org/confluence/display/Hive/HowToContribute to clarify how to handle feature and bug-fix patches on master and branch-1. Also, we will need to make sure patches can be tested against both master and branch-1. If I understand correctly, the test system today will run a patch against a branch instead of master if the patch is named with the branch name. There are a couple of issues with this. One, people will often want to submit two versions of a patch and have them both tested (one against master and one against branch-1) rather than one or the other. The second is that we will want a way for one patch to be tested against both when appropriate. The first case could be handled by the system picking up both branch-1 and master patches and running them automatically. The second could be handled by hints in the comments telling the system to run both. I'm open to other suggestions as well. Can someone familiar with the testing code point to where I'd look to see what it would take to make this work? Alan.
[jira] [Created] (HIVE-10886) LLAP: Fixes to TaskReporter after recent Tez changes
Siddharth Seth created HIVE-10886: - Summary: LLAP: Fixes to TaskReporter after recent Tez changes Key: HIVE-10886 URL: https://issues.apache.org/jira/browse/HIVE-10886 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10887) TestCliDriver tests ordering issues with Mac and CentOS
Hari Sankar Sivarama Subramaniyan created HIVE-10887: Summary: TestCliDriver tests ordering issues with Mac and CentOS Key: HIVE-10887 URL: https://issues.apache.org/jira/browse/HIVE-10887 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan unionDistinct_2 and update_after_multiple_inserts tests give different results in different environments due to ordering issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not Queryable
Goden Yao created HIVE-10888: Summary: Hive Dynamic Partition + Default Partition makes Null Values Not Queryable Key: HIVE-10888 URL: https://issues.apache.org/jira/browse/HIVE-10888 Project: Hive Issue Type: Bug Components: Hive, Query Processor Reporter: Goden Yao This was reported by Pivotal.io (Noa Horn), and the latest HAWQ version should have this fixed in our queries. === Expected Behavior === When dynamic partitioning is enabled and mode = nonstrict, the null values in the default partition should still be returned when the user asks for them with ...WHERE ... IS NULL. === Problem statement === *Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
# Get default partition name: hive.exec.default.partition.name
# Default Value: __HIVE_DEFAULT_PARTITION__
{code}
Hive creates a default partition if the partition key value doesn't conform to the field type, for example if the partition key is NULL. *Hive Example* Add the following parameters to hive-site.xml:
{code}
<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
{code}
Create data: vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06
Create a hive table and load the data into it. This table is used to load data into the partitioned table.
{code}
hive> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
hive> SELECT * FROM base;
OK
1	1.0	1900-01-01
2	2.2	1994-04-14
3	3.3	2011-03-31
4	4.5	NULL
5	5.0	2013-12-06
{code}
Note that one of the rows has NULL in its date field. Create a hive partitioned table and load data from the base table into it.
The data will be dynamically partitioned.
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1	1.0	1900-01-01
2	2.2	1994-04-14
3	3.3	2011-03-31
5	5.0	2013-12-06
4	4.5	NULL
{code}
Check that the table has different partitions:
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}
*Hive queries with default partition* Queries without a filter, or with a filter on a different field, return the default partition data:
{code}
hive> select * from sales;
OK
1	1.0	1900-01-01
2	2.2	1994-04-14
3	3.3	2011-03-31
5	5.0	2013-12-06
4	4.5	NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}
Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date < '2013-12-06';
OK
1	1.0	1900-01-01
2	2.2	1994-04-14
3	3.3	2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)
hive> select * from sales where date is null;
OK
Time taken: 0.035 seconds
hive> select * from sales where date is not null;
OK
1	1.0	1900-01-01
2	2.2	1994-04-14
3	3.3	2011-03-31
5	5.0	2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)
hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
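The expected behavior in the report above can be restated as a short query sketch (a hypothetical session, assuming the bug were fixed; not actual output):

```sql
-- Per the report, row 4 landed in date=__HIVE_DEFAULT_PARTITION__.
-- Expected: an IS NULL filter on the partition key should reach that partition.
SELECT * FROM sales WHERE date IS NULL;
-- expected: 4  4.5  NULL
-- actual on affected versions: no rows returned
```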
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/#review86129 --- service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java https://reviews.apache.org/r/34776/#comment138049 would it be possible to use a synchronized set instead? That would be less error prone: Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>()); I also see that we should be synchronizing on open/close in SessionManager and creation of new operations. But I think that is something for another jira, since the primary goal of this one is not to fix issues when running multiple operations in a session. - Thejas Nair On June 1, 2015, 8:25 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 1, 2015, 8:25 p.m.) Review request for hive. Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d22f11 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/#review86130 --- service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java https://reviews.apache.org/r/34776/#comment138050 this class is never used outside of HS2 codepath. so this isHiveServer2 check is not needed. - Thejas Nair On June 1, 2015, 8:25 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 1, 2015, 8:25 p.m.) Review request for hive. Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d22f11 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
On June 2, 2015, 12:23 a.m., Thejas Nair wrote: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java, line 103 https://reviews.apache.org/r/34776/diff/4/?file=975858#file975858line103 would it be possible to use a synchronized set instead? That would be less error prone: Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>()); I also see that we should be synchronizing on open/close in SessionManager and creation of new operations. But I think that is something for another jira, since the primary goal of this one is not to fix issues when running multiple operations in a session. The close call takes out all the elements in the set atomically (and clears the set); a normal set doesn't support such an operation. I don't know how necessary it is... Given recent JDBC driver changes for multiple statements in one connection, this may be a good change to make to prevent bugs, now that it's already done - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/#review86129 --- On June 1, 2015, 8:25 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 1, 2015, 8:25 p.m.) Review request for hive.
Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d22f11 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
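The trade-off debated in this thread can be sketched in plain JDK code. This is an illustration only, with hypothetical class and method names (not Hive's actual HiveSessionImpl): a ConcurrentHashMap-backed set is thread-safe per operation, but an atomic "take everything out and clear" still needs a shared lock around both steps, at which point a plain HashSet under that lock suffices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch. Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>())
// gives a thread-safe set, but snapshot-then-clear is still two separate operations
// on it. Routing every access through one lock makes drain() atomic with respect
// to concurrent register/unregister calls.
public class OperationTracker {
    private final Set<String> handles = new HashSet<>();

    public synchronized void register(String handle) {
        handles.add(handle);
    }

    public synchronized void unregister(String handle) {
        handles.remove(handle);
    }

    // Atomic "drain": returns everything registered so far and clears the set.
    public synchronized List<String> drain() {
        List<String> snapshot = new ArrayList<>(handles);
        handles.clear();
        return snapshot;
    }
}
```

Synchronizing open/close against operation creation in SessionManager, as suggested in the review, is the coarser-grained form of the same idea.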
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 2, 2015, 12:53 a.m.) Review request for hive. Repository: hive-git Description --- see jira Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d733d71 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
On June 2, 2015, 12:23 a.m., Thejas Nair wrote: service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java, line 103 https://reviews.apache.org/r/34776/diff/4/?file=975858#file975858line103 would it be possible to use a synchronized set instead? That would be less error prone: Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>()); I also see that we should be synchronizing on open/close in SessionManager and creation of new operations. But I think that is something for another jira, since the primary goal of this one is not to fix issues when running multiple operations in a session. Sergey Shelukhin wrote: The close call takes out all the elements in the set atomically (and clears the set); a normal set doesn't support such an operation. I don't know how necessary it is... Given recent JDBC driver changes for multiple statements in one connection, this may be a good change to make to prevent bugs, now that it's already done I think the right way to do it is to synchronize open/close vs. the creation of new operations in SessionManager. When the session is being closed, it should be within a lock, and the first thing it should do is remove the SessionHandle from the map so that no other operation can access it. It takes courage to delete old code, especially if it is doing some synchronization; I think it is better to do it the better way in a follow-up jira. The JDBC driver changes do not suddenly enable the use of multiple statements in one connection. That has always been there; that patch just adds some locks on it. So I think it is OK to do that as part of a new jira. - Thejas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/#review86129 --- On June 2, 2015, 12:53 a.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 2, 2015, 12:53 a.m.) Review request for hive.
Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d733d71 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
Re: hive.optimize.index.filter + ORC + TIMESTAMP throws NPE or IAE depending on hive version
Hi, That's expected behaviour, since you are comparing a Timestamp to a string. Timestamp = String is being skipped because the SARGs need to be the same type to offer non-equality comparisons accurately. https://issues.apache.org/jira/browse/HIVE-10286 I logged the bug after I hit bugs with PPD for that case when using ORC APIs from outside Hive (i.e. "1" < "9" and "11" < "9"). That was a mistake anyone could've made while hand-creating SARGs, but I wanted to make it better for the next person who might miss it, and bail out without PPD when the arguments don't match PredicateLeaf.Type. You can try the same with something where hive does the right thing with a Filter expression: hive> create temporary table xx(x int) stored as orc; hive> insert into xx values (1),(9),(11); hive> select * from xx where x < '9'; Cheers, Gopal On 6/1/15, 7:21 PM, Alexander Pivovarov apivova...@gmail.com wrote: if hive.optimize.index.filter is enabled then it causes the following stack traces -- create table ts (ts timestamp); insert into table ts values('2015-01-01 00:00:00'); set hive.optimize.index.filter=true; select * from ts where ts = '2015-01-01 00:00:00'; -- -- HIVE-1.3.0 OK 15/06/01 19:07:08 [main]: INFO ql.Driver: OK 15/06/01 19:07:08 [main]: INFO log.PerfLogger: PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=releaseLocks start=1433210828865 end=1433210828865 duration=0 from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=Driver.run start=1433210828758 end=1433210828865 duration=107 from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl 15/06/01 19:07:08 [main]: INFO orc.OrcInputFormat: FooterCacheHitRatio: 0/0 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=OrcGetSplits start=1433210828870 end=1433210828876 duration=6 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl 15/06/01 19:07:08 [main]: INFO orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (LESS_THAN ts 2015-01-01 00:00:00) expr = (not leaf-0) 15/06/01 19:07:08 [main]: INFO orc.OrcRawRecordMerger: min key = null, max key = null 15/06/01 19:07:08 [main]: INFO orc.ReaderImpl: Reading ORC rows from hdfs://localhost/apps/apivovarov/warehouse/ts/00_0 with {include: [true, true], offset: 0, length: 9223372036854775807, sarg: leaf-0 = (LESS_THAN ts 2015-01-01 00:00:00) expr = (not leaf-0), columns: ['null', 'ts']} 15/06/01 19:07:08 [main]: WARN orc.RecordReaderImpl: Exception when evaluating predicate. Skipping ORC PPD. Exception: java.lang.IllegalArgumentException: ORC SARGS could not convert from String to TIMESTAMP at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:711) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:752) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:778) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:987) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1020) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.init(RecordReaderImpl.java:205) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.init(OrcRawRecordMerger.java:183) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.init(OrcRawRecordMerger.java:226) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.init(OrcRawRecordMerger.java:437) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1219) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1117) at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1671) at
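Following Gopal's explanation, a typed literal is one way to keep PPD eligible for the timestamp case: cast the string so the SARG literal matches the column's PredicateLeaf.Type. A sketch of the general idea, not verified against any particular Hive version:

```sql
-- Compare timestamp to timestamp rather than timestamp to string,
-- so the ORC SARG is built with a matching type.
select * from ts where ts = cast('2015-01-01 00:00:00' as timestamp);
```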
[jira] [Created] (HIVE-10889) HIVE-10778 has NPE
Sergey Shelukhin created HIVE-10889: --- Summary: HIVE-10778 has NPE Key: HIVE-10889 URL: https://issues.apache.org/jira/browse/HIVE-10889 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34922: HIVE-10705 Update tests for HIVE-9302 after removing binaries
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34922/ --- (Updated June 2, 2015, 1:15 p.m.) Review request for hive, Hari Sankar Sivarama Subramaniyan, Sushanth Sowmyan, and Vaibhav Gumashta. Bugs: HIVE-10705 https://issues.apache.org/jira/browse/HIVE-10705 Repository: hive-git Description --- Summary: 1. remove binaries and make jar file in the runtime 2. move some common utilities to the HiveTestUtils Diffs (updated) - beeline/pom.xml 352f561 beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java 7a354f3 beeline/src/test/resources/DummyDriver-1.0-SNAPSHOT.jar 3dadc9e beeline/src/test/resources/DummyDriver.txt PRE-CREATION beeline/src/test/resources/postgresql-9.3.jdbc3.jar f537b98 common/src/java/org/apache/hive/common/util/HiveTestUtils.java db34494 ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 45ad22a Diff: https://reviews.apache.org/r/34922/diff/ Testing --- UT passed locally Thanks, cheng xu
hive.optimize.index.filter + ORC + TIMESTAMP throws NPE or IAE depending on hive version
if hive.optimize.index.filter is enabled then it causes the following stack traces -- create table ts (ts timestamp); insert into table ts values('2015-01-01 00:00:00'); set hive.optimize.index.filter=true; select * from ts where ts = '2015-01-01 00:00:00'; -- -- HIVE-1.3.0 OK 15/06/01 19:07:08 [main]: INFO ql.Driver: OK 15/06/01 19:07:08 [main]: INFO log.PerfLogger: PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=releaseLocks start=1433210828865 end=1433210828865 duration=0 from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=Driver.run start=1433210828758 end=1433210828865 duration=107 from=org.apache.hadoop.hive.ql.Driver 15/06/01 19:07:08 [main]: INFO log.PerfLogger: PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl 15/06/01 19:07:08 [main]: INFO orc.OrcInputFormat: FooterCacheHitRatio: 0/0 15/06/01 19:07:08 [main]: INFO log.PerfLogger: /PERFLOG method=OrcGetSplits start=1433210828870 end=1433210828876 duration=6 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl 15/06/01 19:07:08 [main]: INFO orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (LESS_THAN ts 2015-01-01 00:00:00) expr = (not leaf-0) 15/06/01 19:07:08 [main]: INFO orc.OrcRawRecordMerger: min key = null, max key = null 15/06/01 19:07:08 [main]: INFO orc.ReaderImpl: Reading ORC rows from hdfs://localhost/apps/apivovarov/warehouse/ts/00_0 with {include: [true, true], offset: 0, length: 9223372036854775807, sarg: leaf-0 = (LESS_THAN ts 2015-01-01 00:00:00) expr = (not leaf-0), columns: ['null', 'ts']} 15/06/01 19:07:08 [main]: WARN orc.RecordReaderImpl: Exception when evaluating predicate. Skipping ORC PPD.
Exception: java.lang.IllegalArgumentException: ORC SARGS could not convert from String to TIMESTAMP at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:711) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:752) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:778) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:987) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1020) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.init(RecordReaderImpl.java:205) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.init(OrcRawRecordMerger.java:183) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.init(OrcRawRecordMerger.java:226) at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.init(OrcRawRecordMerger.java:437) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1219) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1117) at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140) at 
org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1671) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 15/06/01 19:07:08 [main]: INFO orc.OrcInputFormat: ORC pushdown predicate: leaf-0 =
Re: Review Request 34776: HIVE-4239 : Remove lock on compilation stage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/#review86144 --- ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java https://reviews.apache.org/r/34776/#comment138067 If we follow this approach everywhere, it is going to lead to an explosion of thread locals. Also, the use of thread locals this way can lead to new bugs getting introduced over time. The driver object's lifetime is beyond compilation. The execution is usually done in a different thread. I feel a cleaner/more robust solution would be to have these objects tied to a driver instance rather than a thread. Not sure what the best approach for that would be - maybe have a thread-local driver, and have it contain a context object that holds related fields? Thoughts? Rant: having a Utils class hold state is also counterintuitive; maybe the part that had state should have been part of a different class. service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java https://reviews.apache.org/r/34776/#comment138071 I think this approach will cause the bug in HIVE-6245. If the new Hive object is not created within a doAs block, it would not be created with the correct user. I need to look some more into that. service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/34776/#comment138072 why do you still need this? - Thejas Nair On June 2, 2015, 12:53 a.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34776/ --- (Updated June 2, 2015, 12:53 a.m.) Review request for hive.
Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d733d71 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5dac29f ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 37b6d6f service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 343c68e service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java a29e5d1 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java b4d517f Diff: https://reviews.apache.org/r/34776/diff/ Testing --- Thanks, Sergey Shelukhin
Review Request 34922: HIVE-10705 Update tests for HIVE-9302 after removing binaries
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34922/ --- Review request for hive, Hari Sankar Sivarama Subramaniyan, Sushanth Sowmyan, and Vaibhav Gumashta. Bugs: HIVE-10705 https://issues.apache.org/jira/browse/HIVE-10705 Repository: hive-git Description --- Summary: 1. remove binaries and make jar file in the runtime 2. move some common utilities to the HiveTestUtils Diffs - beeline/pom.xml 352f561 beeline/src/test/org/apache/hive/beeline/DummyDriver.java PRE-CREATION beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java 7a354f3 beeline/src/test/resources/DummyDriver-1.0-SNAPSHOT.jar 3dadc9e beeline/src/test/resources/DummyDriver.txt PRE-CREATION beeline/src/test/resources/postgresql-9.3.jdbc3.jar f537b98 common/src/java/org/apache/hive/common/util/HiveTestUtils.java db34494 ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 45ad22a Diff: https://reviews.apache.org/r/34922/diff/ Testing --- UT passed locally Thanks, cheng xu