[jira] [Created] (HIVE-25833) Inconsistent date type behavior between hive2 and hive3 for ORC files
Nemon Lou created HIVE-25833:

Summary: Inconsistent date type behavior between hive2 and hive3 for ORC files
Key: HIVE-25833
URL: https://issues.apache.org/jira/browse/HIVE-25833
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Nemon Lou

In hive2:
create table hive2_orc(id date);
insert into hive2_orc values('0001-01-01');
select * from hive2_orc; -- returns '0001-01-01'

In hive3, querying the same ORC file returns '0001-12-30'.

The same thing happens between hive3 and the master branch: '0001-01-01' written by hive3 is read back as '0001-01-03' on master.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
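The two-day shifts reported in HIVE-25833 (backward in one direction, forward in the other) match what a calendar mismatch would produce for a date this early. On 0001-01-01, the hybrid Julian/Gregorian calendar of `java.util.GregorianCalendar` and the proleptic Gregorian calendar of `java.time` are two days apart. The report does not name a cause, so treat the Java sketch below as an illustration of that hypothesis only. The class and method names (`CalendarShift`, `hybridEpochDay`, `hybridDate`, `prolepticDate`) are hypothetical and are not Hive APIs. The sketch stores a date as a day count under one calendar and renders it under the other, all in UTC.

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothesis illustration only: not Hive code.
public class CalendarShift {
    private static final long MILLIS_PER_DAY = 86_400_000L;
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Days since 1970-01-01 for a date interpreted in the hybrid calendar
    // (java.util.GregorianCalendar, default cutover 1582-10-15, Julian before it).
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(UTC);
        cal.clear();
        cal.set(year, month - 1, day); // Calendar months are 0-based
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }

    // Renders a day count in the hybrid calendar as yyyy-MM-dd (year of era, no era marker).
    static String hybridDate(long epochDay) {
        GregorianCalendar cal = new GregorianCalendar(UTC);
        cal.setTimeInMillis(epochDay * MILLIS_PER_DAY);
        return String.format("%04d-%02d-%02d",
                cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
    }

    // Renders a day count in the proleptic Gregorian (ISO) calendar of java.time.
    static String prolepticDate(long epochDay) {
        return LocalDate.ofEpochDay(epochDay).toString();
    }

    public static void main(String[] args) {
        // Day count computed with the hybrid calendar, read back as proleptic Gregorian.
        long hybridDay = hybridEpochDay(1, 1, 1);
        System.out.println(prolepticDate(hybridDay));  // 0000-12-30

        // Day count computed as proleptic Gregorian, read back with the hybrid calendar.
        long prolepticDay = LocalDate.of(1, 1, 1).toEpochDay();
        System.out.println(hybridDate(prolepticDay));  // 0001-01-03
    }
}
```

`java.time` prints the year before 0001 (1 BC) as `0000`. A formatter that prints the year of era without an era marker would show that same day as `0001-12-30`, which may be how the value in the report was rendered. For modern dates such as 1970-01-01, both calendars agree and no shift appears.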
[jira] [Created] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile
Nemon Lou created HIVE-25671:

Summary: Hybrid Grace Hash Join NullPointer When query RCFile
Key: HIVE-25671
URL: https://issues.apache.org/jira/browse/HIVE-25671
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Nemon Lou

{noformat}
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace Hash Join: Deserializing spilled hash partition...
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace Hash Join: Number of rows in hashmap: 1
2021-11-04 10:02:47,554 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace Hash Join: Going to process spilled big table rows in partition 5. Number of rows: 1
2021-11-04 10:02:47,561 [ERROR] [TezChild] |exec.MapJoinOperator|: Unexpected exception from MapJoinOperator : null
java.lang.NullPointerException
	at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:114)
	at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
	at org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:95)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromRow(MapJoinBytesTableContainer.java:552)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.setMapJoinKey(MapJoinOperator.java:415)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:466)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.reProcessBigTable(MapJoinOperator.java:755)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Created] (HIVE-24902) Incorrect result due to ReduceExpressionsRule
Nemon Lou created HIVE-24902:

Summary: Incorrect result due to ReduceExpressionsRule
Key: HIVE-24902
URL: https://issues.apache.org/jira/browse/HIVE-24902
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 3.1.2, 4.0.0
Reporter: Nemon Lou

The following SQL returns only one record (20210308), but we expect two (20210308 and 20210309).
{code:sql}
select * from (
  select case when b.a=1 then
           cast (from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 86400,'MMdd') as bigint)
         else 20210309
         end as col
  from (select stack(2,1,2) as (a)) as b
) t
where t.col is not null;
{code}
After debugging, I found that ReduceExpressionsRule rewrites the expression incorrectly.

Original expression:
{code:sql}
IS NOT NULL(CASE(=($0, 1), CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 20210309))
{code}
After reducing expressions:
{code:sql}
CASE(=($0, 1), IS NOT NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT), true)
{code}
The query plan on the main branch:
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: 2 (type: int), 1 (type: int), 2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
            UDTF Operator
              Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
              function name: stack
              Filter Operator
                predicate: COALESCE((col0 = 1),false) (type: boolean)
                Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE (20210309L) END (type: bigint)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)
{code}
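The Java sketch below makes the expected result of HIVE-24902 concrete. It is a hand-written model, not Hive or Calcite code, and the names `CaseNullFilter`, `col`, `expectedFilter`, `reportedFilter` and `run` are hypothetical. It evaluates the two rows produced by `stack(2,1,2)` twice: once under the intended predicate `t.col is not null`, and once under the `COALESCE((col0 = 1),false)` predicate that appears in the plan. It takes the folded THEN value 20210308 from the plan as given.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the query's semantics; not Hive or Calcite code.
public class CaseNullFilter {
    // Rows produced by stack(2,1,2): column a takes the values 1 and 2.
    static final int[] ROWS = {1, 2};

    // Value of the CASE expression for one row. The THEN branch is the constant
    // 20210308 shown in the reported plan after folding; neither branch is null.
    static Long col(int a) {
        return a == 1 ? Long.valueOf(20210308L) : Long.valueOf(20210309L);
    }

    // Intended predicate: t.col is not null.
    static boolean expectedFilter(int a) {
        return col(a) != null;
    }

    // Predicate in the reported plan: COALESCE((col0 = 1), false).
    // Because a is never null here, the COALESCE reduces to the comparison itself.
    static boolean reportedFilter(int a) {
        return a == 1;
    }

    // Applies one of the two predicates and returns the surviving col values.
    static List<Long> run(boolean useReportedFilter) {
        List<Long> out = new ArrayList<>();
        for (int a : ROWS) {
            boolean keep = useReportedFilter ? reportedFilter(a) : expectedFilter(a);
            if (keep) {
                out.add(col(a));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [20210308, 20210309]
        System.out.println(run(true));  // [20210308]
    }
}
```

The intended predicate keeps both rows, while the plan's predicate discards the ELSE row (a = 2). This matches the single record the report describes.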
[jira] [Created] (HIVE-24579) Incorrect Result For Groupby With Limit
Nemon Lou created HIVE-24579:

Summary: Incorrect Result For Groupby With Limit
Key: HIVE-24579
URL: https://issues.apache.org/jira/browse/HIVE-24579
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.2, 2.3.7, 4.0.0
Reporter: Nemon Lou

{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}
An unexpected TopN appears in the map phase (on the map-side Reduce Output Operator), which causes an incorrect result.
{code:sql}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test
            Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
            GatherStats: false
            Select Operator
              expressions: id (type: int)
              outputColumnNames: id
              Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count()
                keys: id (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  null sort order: a
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
                  tag: -1
                  TopN: 10
                  TopN Hash Memory Usage: 0.1
                  value expressions: _col1 (type: bigint)
                  auto parallelism: false
      Path -> Alias:
        file:/user/hive/warehouse/test [test]
      Path -> Partition:
        file:/user/hive/warehouse/test
          Partition
            base file name: test
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              COLUMN_STATS_ACCURATE \{"BASIC_STATS":"true"}
              bucket_count -1
              column.name.delimiter ,
              columns id
              columns.comments
              columns.types int
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location file:/user/hive/warehouse/test
              name default.test
              numFiles 0
              numRows 0
              rawDataSize 0
              serialization.ddl struct test \{ i32 id}
              serialization.format 1
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 0
              transient_lastDdlTime 1609730036
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                COLUMN_STATS_ACCURATE \{"BASIC_STATS":"true"}
                bucket_count -1
                column.name.delimiter ,
                columns id
                columns.comments
                columns.types int
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                location file:/user/hive/warehouse/test
                name default.test
                numFiles 0
                numRows 0
                rawDataSize 0
                serialization.ddl struct test \{ i32 id}
                serialization.format 1
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                totalSize 0
                transient_lastDdlTime 1609730036
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.test
            name: default.test
      Truncated Path -> Alias:
        /test [test]
      Needs Tagging: false
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 168 Data size: 672 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 10
            Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              GlobalTableId: 0
              directory: file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002
              NumFilesPerFileSink: 1
              Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: NONE
              Stats Publishing Key Prefix: file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002/
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  properties:
                    columns _col0,_col1
                    columns.types int:bigint
                    escape.delim \
                    hive.serialization.extend.additional.nesting.levels true
                    serialization.escape.crlf true
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              TotalFiles: 1
              GatherStats: false
              MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink

Time taken: 1.877 seconds, Fetched: 128 row(s)
{code}
[jira] [Created] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite
Nemon Lou created HIVE-24165:

Summary: CBO: Query fails after multiple count distinct rewrite
Key: HIVE-24165
URL: https://issues.apache.org/jira/browse/HIVE-24165
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 4.0.0
Reporter: Nemon Lou

One way to reproduce:
```
drop table test;
CREATE TABLE test(
  `device_id` string,
  `level` string,
  `site_id` string,
  `user_id` string,
  `first_date` string,
  `last_date` string,
  `dt` string);
set hive.execution.engine=tez;
set hive.optimize.distinct.rewrite=true;
set hive.cli.print.header=true;
select dt,
  site_id,
  count(DISTINCT t1.device_id) as device_tol_cnt,
  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else null end) as device_add_cnt
from test t1
where dt='2020-09-15'
group by dt, site_id;
```
Error log:
```
Exception in thread "main" java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group=\{2, 3},agg#0=count($0),agg#1=count($1))
expression is HiveProject#95
	at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
	at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
	at org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
	at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
	at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
	at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
	at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
	at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
	at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
	at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
	at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at
[jira] [Created] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
Nemon Lou created HIVE-16907:

Summary: "INSERT INTO" overwrite old data when destination table encapsulated by backquote
Key: HIVE-16907
URL: https://issues.apache.org/jira/browse/HIVE-16907
Project: Hive
Issue Type: Bug
Components: Parser
Affects Versions: 2.1.1, 1.1.0
Reporter: Nemon Lou

A way to reproduce:
{noformat}
create database tdb;
use tdb;
create table t1(id int);
create table t2(id int);
explain insert into `tdb.t1` select * from t2;
{noformat}
{noformat}
+---+
| Explain |
+---+
| STAGE DEPENDENCIES: |
|   Stage-1 is a root stage |
|   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
|   Stage-3 |
|   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
|   Stage-2 |
|   Stage-4 |
|   Stage-5 depends on stages: Stage-4 |
| |
| STAGE PLANS: |
|   Stage: Stage-1 |
|     Map Reduce |
|       Map Operator Tree: |
|           TableScan |
|             alias: t2 |
|             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
|             Select Operator |
|               expressions: id (type: int) |
|               outputColumnNames: _col0 |
|               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
|               File Output Operator |
|                 compressed: false |
|                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
|                 table: |
|                     input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat |
|                     output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat |
|
[jira] [Created] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently
Nemon Lou created HIVE-16839:

Summary: Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently
Key: HIVE-16839
URL: https://issues.apache.org/jira/browse/HIVE-16839
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Nemon Lou

SQL to reproduce.

Preparation:
{noformat}
hdfs dfs -mkdir -p /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp string,ld string);
{noformat}
Open one Beeline session and run these two statements many times:
{noformat}
2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION (cp=2017060513,ld=2017060610);
3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET LOCATION 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
{noformat}
At the same time, open another Beeline session and run this statement many times:
{noformat}
4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
{noformat}
MetaStore logs:
{noformat}
2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOObjectNotFoundException: No such database row
FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
	at org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
	at org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
	at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
	at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
	at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
	at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
	at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
	at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
No such database row
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
	at org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
	at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
	at org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
[jira] [Created] (HIVE-15638) ArrayIndexOutOfBoundsException when output Columns for UDTF are pruned
Nemon Lou created HIVE-15638:

Summary: ArrayIndexOutOfBoundsException when output Columns for UDTF are pruned
Key: HIVE-15638
URL: https://issues.apache.org/jira/browse/HIVE-15638
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 2.1.0, 1.3.0
Reporter: Nemon Lou

{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 151
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:364)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:200)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:186)
	at org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:525)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:494)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:180)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:174)
]
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 151
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:878)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
	... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 151
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.populateCachedDistributionKeys(ReduceSinkOperator.java:443)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:350)
	... 13 more
{noformat}

The way to reproduce:

DDL:
{noformat}
create table tb_a(data_dt string,key string,src string,data_id string,tag_id string, entity_src string);
create table tb_b(pos_tagging string,src string,data_id string);
create table tb_c(key string,start_time string,data_dt string);
insert into tb_a values('20160901','CPI','04','data_id','tag_id','entity_src');
insert into tb_b values('pos_tagging','04','data_id');
insert into tb_c values('data_id','start_time_','20160901');
create function hwrl as 'HotwordRelationUDTF' using jar 'hdfs:///tmp/nemon/udf/hotword.jar';
{noformat}
UDF file:
{code}
import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import
[jira] [Created] (HIVE-14662) Wrong Class Instance When Using Custom SERDE
Nemon Lou created HIVE-14662:

Summary: Wrong Class Instance When Using Custom SERDE
Key: HIVE-14662
URL: https://issues.apache.org/jira/browse/HIVE-14662
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Nemon Lou
Assignee: Nemon Lou

Using the [SERDE for mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java].

DDL:
{noformat}
create external table mytable (ID STRING..)
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION 'hdfs:///mypath';
{noformat}
Open Beeline and run the following statements, then open another Beeline session and run them again. The second run fails.
{noformat}
add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
select * from mytable limit 1;
{noformat}
Error log:
{noformat}
2016-08-25 09:30:34,475 | WARN | HiveServer2-Handler-Pool: Thread-11972 | Error fetching results: | org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
	at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
	... 24 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
	at com.mongodb.hadoop.hive.BSONSerDe.deserialize(BSONSerDe.java:196)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:488)
	... 28 more
{noformat}
Note: must make sure the table is not
[jira] [Created] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled
Nemon Lou created HIVE-14557:

Summary: Nullpointer When both SkewJoin and Mapjoin Enabled
Key: HIVE-14557
URL: https://issues.apache.org/jira/browse/HIVE-14557
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 2.1.0, 1.1.0
Reporter: Nemon Lou

The following SQL fails with return code 2 on MR:
{noformat}
create table a(id int,id1 int);
create table b(id int,id1 int);
create table c(id int,id1 int);
set hive.optimize.skewjoin=true;
select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
{noformat}
Error log:
{noformat}
2016-08-17 21:13:42,081 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Id =0 Id =21 Id =28 Id =16 <\Children> Id = 28 null<\Parent> <\FS> <\Children> Id = 21 nullId = 33 Id =33 null <\Children> <\Parent> <\HASHTABLEDUMMY><\Parent> <\MAPJOIN> <\Children> Id = 0 null<\Parent> <\TS> <\Children> <\Parent> <\MAP>
2016-08-17 21:13:42,084 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
2016-08-17 21:13:42,084 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
2016-08-17 21:13:42,086 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:0,
2016-08-17 21:13:42,087 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing operators - failing tree
2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
	... 8 more
{noformat}
[jira] [Created] (HIVE-14390) Wrong Table alias when CBO is on
Nemon Lou created HIVE-14390: Summary: Wrong Table alias when CBO is on Key: HIVE-14390 URL: https://issues.apache.org/jira/browse/HIVE-14390 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.1 Reporter: Nemon Lou Priority: Minor There are five web_sales references in TPC-DS query95, with aliases ws1 to ws5, but the query plan only contains ws1 when CBO is on. query95: {noformat} SELECT count(distinct ws1.ws_order_number) as order_count, sum(ws1.ws_ext_ship_cost) as total_shipping_cost, sum(ws1.ws_net_profit) as total_net_profit FROM web_sales ws1 JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk) JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk) JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk) LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number FROM web_sales ws2 JOIN web_sales ws3 ON (ws2.ws_order_number = ws3.ws_order_number) WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk ) ws_wh1 ON (ws1.ws_order_number = ws_wh1.ws_order_number) LEFT SEMI JOIN (SELECT wr_order_number FROM web_returns wr JOIN (SELECT ws4.ws_order_number as ws_order_number FROM web_sales ws4 JOIN web_sales ws5 ON (ws4.ws_order_number = ws5.ws_order_number) WHERE ws4.ws_warehouse_sk <> ws5.ws_warehouse_sk ) ws_wh2 ON (wr.wr_order_number = ws_wh2.ws_order_number)) tmp1 ON (ws1.ws_order_number = tmp1.wr_order_number) WHERE d.d_date between '2002-05-01' and '2002-06-30' and ca.ca_state = 'GA' and s.web_company_name = 'pri'; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14353) Performance degradation after Projection Pruning in CBO
Nemon Lou created HIVE-14353: Summary: Performance degradation after Projection Pruning in CBO Key: HIVE-14353 URL: https://issues.apache.org/jira/browse/HIVE-14353 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Affects Versions: 1.2.1 Reporter: Nemon Lou TPC-DS with scale factor 1024, Hive on Spark. Query times differ considerably with and without projection pruning. To disable projection pruning, disable HiveRelFieldTrimmer in the code and compile a new jar.
||query||CBO, projection pruning disabled||CBO (default)||
|q27|160|251|
|q7|200|312|
|q88|701|1092|
|q68|234|345|
|q39|53|78|
|q73|160|228|
|q31|463|659|
|q79|242|343|
|q46|256|363|
|q60|271|382|
|q66|198|278|
|q34|155|217|
|q19|184|256|
|q26|154|214|
|q56|262|364|
|q75|942|1303|
|q71|288|388|
|q25|329|442|
|q52|142|190|
|q42|142|189|
|q3|139|185|
|q98|153|203|
|q89|187|248|
|q58|264|340|
|q43|127|162|
|q32|174|221|
|q96|156|197|
|q70|320|404|
|q29|499|629|
|q18|266|329|
|q21|76|92|
|q90|139|165|
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
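To summarize the HIVE-14353 timing table in a single number, the sketch below sums both columns and prints their ratio. The values are copied row for row from the report; the report does not state the time unit, so the totals are in whatever unit was measured. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.Locale;

/** Sums the two timing columns reported in HIVE-14353 and derives the overall ratio. */
class ProjectionPruneTimings {
    // Same row order as the table (q27 ... q90); unit not stated in the report.
    static final int[] NO_PRUNE = {
        160, 200, 701, 234, 53, 160, 463, 242, 256, 271,
        198, 155, 184, 154, 262, 942, 288, 329, 142, 142,
        139, 153, 187, 264, 127, 174, 156, 320, 499, 266,
        76, 139
    };
    static final int[] WITH_PRUNE = {
        251, 312, 1092, 345, 78, 228, 659, 343, 363, 382,
        278, 217, 256, 214, 364, 1303, 388, 442, 190, 189,
        185, 203, 248, 340, 162, 221, 197, 404, 629, 329,
        92, 165
    };

    static int total(int[] times) {
        return Arrays.stream(times).sum();
    }

    public static void main(String[] args) {
        int base = total(NO_PRUNE);
        int pruned = total(WITH_PRUNE);
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "no pruning: %d, with pruning: %d, ratio: %.2f%n",
                base, pruned, (double) pruned / base);
    }
}
```

Over the 32 listed queries the totals are 8036 (pruning disabled) and 11069 (default CBO), so the runs with HiveRelFieldTrimmer enabled took roughly 1.38 times as long in aggregate.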
[jira] [Created] (HIVE-14143) RawDataSize of RCFile is zero after analyze
Nemon Lou created HIVE-14143: Summary: RawDataSize of RCFile is zero after analyze Key: HIVE-14143 URL: https://issues.apache.org/jira/browse/HIVE-14143 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 2.1.0, 1.2.1 Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor After running the following ANALYZE command, rawDataSize becomes zero for RCFile tables. {noformat} analyze table RCFILE_TABLE compute statistics ; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13791) Fix failure Unit Test TestHiveSessionImpl.testLeakOperationHandle
Nemon Lou created HIVE-13791: Summary: Fix failure Unit Test TestHiveSessionImpl.testLeakOperationHandle Key: HIVE-13791 URL: https://issues.apache.org/jira/browse/HIVE-13791 Project: Hive Issue Type: Test Components: Test Affects Versions: 2.1.0 Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13602) TPCH q16 return wrong result when CBO is on
Nemon Lou created HIVE-13602: Summary: TPCH q16 return wrong result when CBO is on Key: HIVE-13602 URL: https://issues.apache.org/jira/browse/HIVE-13602 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Affects Versions: 1.2.1 Reporter: Nemon Lou Running TPC-H with scale factor 2, q16 returns 1,160 rows when CBO is on, but 59,616 rows when CBO is off. See the attachment for details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13546) Patch for HIVE-12893 is broken in branch-1
Nemon Lou created HIVE-13546: Summary: Patch for HIVE-12893 is broken in branch-1 Key: HIVE-13546 URL: https://issues.apache.org/jira/browse/HIVE-13546 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.3.0 Reporter: Nemon Lou The following sql fails: {noformat} set hive.map.aggr=true; set mapreduce.reduce.speculative=false; set hive.auto.convert.join=true; set hive.optimize.reducededuplication = false; set hive.optimize.reducededuplication.min.reducer=1; set hive.optimize.mapjoin.mapreduce=true; set hive.stats.autogather=true; set mapred.reduce.parallel.copies=30; set mapred.job.shuffle.input.buffer.percent=0.5; set mapred.job.reduce.input.buffer.percent=0.2; set mapred.map.child.java.opts=-server -Xmx2800m -Djava.net.preferIPv4Stack=true; set mapred.reduce.child.java.opts=-server -Xmx3800m -Djava.net.preferIPv4Stack=true; set mapreduce.map.memory.mb=3072; set mapreduce.reduce.memory.mb=4096; set hive.enforce.bucketing=true; set hive.enforce.sorting=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.max.dynamic.partitions.pernode=10; set hive.exec.max.dynamic.partitions=10; set hive.exec.max.created.files=100; set hive.exec.parallel=true; set hive.exec.reducers.max=2000; set hive.stats.autogather=true; set hive.optimize.sort.dynamic.partition=true; set mapred.job.reduce.input.buffer.percent=0.0; set mapreduce.input.fileinputformat.split.minsizee=24000; set mapreduce.input.fileinputformat.split.minsize.per.node=24000; set mapreduce.input.fileinputformat.split.minsize.per.rack=24000; set hive.optimize.sort.dynamic.partition=true; use tpcds_bin_partitioned_orc_4; insert overwrite table store_sales partition (ss_sold_date_sk) select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk, ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity, ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt, ss.ss_ext_sales_price, 
ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax, ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk from tpcds_text_4.store_sales ss; {noformat} Error log is as follows {noformat} 2016-04-19 15:15:35,252 FATAL [main] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":null},"value":{"_col0":null,"_col1":5588,"_col2":170300,"_col3":null,"_col4":756,"_col5":91384,"_col6":16,"_col7":null,"_col8":855582,"_col9":28,"_col10":null,"_col11":48.83,"_col12":null,"_col13":0.0,"_col14":null,"_col15":899.64,"_col16":null,"_col17":6.14,"_col18":0.0,"_col19":null,"_col20":null,"_col21":null,"_col22":null}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:180) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1732) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:174) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:151) at org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:131) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:1003) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:919) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:713) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) ... 
7 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13141) Hive on Spark over HBase should accept parameters starting with "zookeeper.znode"
Nemon Lou created HIVE-13141: Summary: Hive on Spark over HBase should accept parameters starting with "zookeeper.znode" Key: HIVE-13141 URL: https://issues.apache.org/jira/browse/HIVE-13141 Project: Hive Issue Type: Bug Affects Versions: 2.0.0, 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor HBase-related parameters were added by HIVE-12708. In the same way, parameters starting with "zookeeper.znode" should be added too, since they are also HBase-related parameters (see http://blog.cloudera.com/blog/2013/10/what-are-hbase-znodes/). I have seen Hive on Spark over HBase fail because of a customized zookeeper.znode.parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
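The proposal amounts to widening a key-prefix filter. The sketch below shows one way such a filter could look. It is hypothetical: the class and method names are illustrative, and the "hbase." prefix is an assumption standing in for whatever HIVE-12708 actually matches; only the "zookeeper.znode" prefix comes from this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: choose which Hive configuration entries are passed on to the
 * remote Spark side. "hbase." is an assumed stand-in for the HIVE-12708 rule;
 * "zookeeper.znode" is the prefix proposed in HIVE-13141.
 */
class HBaseConfForwarding {
    static final List<String> FORWARDED_PREFIXES = List.of("hbase.", "zookeeper.znode");

    static boolean shouldForward(String key) {
        for (String prefix : FORWARDED_PREFIXES) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns only the forwarded entries, sorted by key. */
    static Map<String, String> select(Map<String, String> conf) {
        Map<String, String> out = new TreeMap<>();
        conf.forEach((k, v) -> {
            if (shouldForward(k)) {
                out.put(k, v);
            }
        });
        return out;
    }
}
```

With this rule a customized zookeeper.znode.parent (for example "/hbase-secure") reaches the Spark side together with the other HBase settings, while unrelated keys such as hive.execution.engine are left out.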
[jira] [Created] (HIVE-12847) ORC file footer cache should be memory sensitive
Nemon Lou created HIVE-12847: Summary: ORC file footer cache should be memory sensitive Key: HIVE-12847 URL: https://issues.apache.org/jira/browse/HIVE-12847 Project: Hive Issue Type: Improvement Components: File Formats, ORC Affects Versions: 1.2.1 Reporter: Nemon Lou The size-based footer cache cannot control memory usage properly. We have seen HiveServer2 hang because the ORC file footer cache took up too much heap memory: a simple query like "select * from orc_table limit 1" can make HiveServer2 hang. The input table has about 1000 ORC files, each with about 2500 stripes.
{noformat}
 num      #instances         #bytes  class name
----------------------------------------------
   1:      214653601    25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
   3:      122233301     8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
   5:       89439001     6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
   7:        2981300      262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
   9:        2981300      143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
  12:        2983691       71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
  15:          80929        7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
  17:         103282        5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
  20:          51641        3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
  21:          51641        3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
  31:              1         413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
 100:           1122          26928  org.apache.hadoop.hive.ql.io.orc.Metadata
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
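A "memory sensitive" cache can be approximated by bounding the total estimated weight of the entries instead of their count. The sketch below is a minimal weight-bounded LRU built on an access-ordered LinkedHashMap; it is not Hive's implementation, and the weigher (for example a footer's serialized length) is an assumption supplied by the caller. Guava's CacheBuilder offers the same idea through maximumWeight(...) together with weigher(...).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Sketch: an LRU cache bounded by total estimated weight (e.g. bytes), not entry count. */
class WeightBoundedLruCache<K, V> {
    private final long maxWeight;
    private final ToLongFunction<V> weigher;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private long currentWeight;

    WeightBoundedLruCache(long maxWeight, ToLongFunction<V> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        long w = weigher.applyAsLong(value);
        if (w > maxWeight) {
            // Never cache an entry that alone exceeds the budget; drop any stale value.
            V removed = map.remove(key);
            if (removed != null) {
                currentWeight -= weigher.applyAsLong(removed);
            }
            return;
        }
        V old = map.put(key, value);
        if (old != null) {
            currentWeight -= weigher.applyAsLong(old);
        }
        currentWeight += w;
        // Evict least recently used entries until the total is back under budget.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            currentWeight -= weigher.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    synchronized long weight() { return currentWeight; }

    synchronized int size() { return map.size(); }
}
```

Because the budget is expressed in estimated bytes, a table with many stripes per file fills the cache with fewer footers instead of pushing the heap toward the sizes shown in the histogram.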
[jira] [Created] (HIVE-12689) Support multiple spark sessions in one Hive Session
Nemon Lou created HIVE-12689: Summary: Support multiple spark sessions in one Hive Session Key: HIVE-12689 URL: https://issues.apache.org/jira/browse/HIVE-12689 Project: Hive Issue Type: Improvement Components: Spark Reporter: Nemon Lou As discussed in HIVE-12538, when one Hive connection is used concurrently, there should be more than one Spark session for that connection. {quote} A hive session may "own" more than one spark session in case of asynchronous queries. If a spark session is live (used to run a spark job), that spark session will not be used to run the next job. Therefore, whenever a spark configuration change is detected in Hive session, we need to mark all the live Spark sessions as outdated. When we are getting a session from the pool and check if the flag is set, then we destroy it and get a new one. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
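The scheme in the quote can be sketched as a small per-Hive-session pool: busy sessions are never handed out, a configuration change flags every owned session as outdated, and an outdated session is destroyed instead of reused. This is a hypothetical sketch; SparkSessionPool, Session and their methods are illustrative names, not Hive classes, and "closing" only sets a flag here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical per-Hive-session pool of Spark sessions with an "outdated" flag. */
class SparkSessionPool {
    static final class Session {
        final int id;
        volatile boolean outdated;
        boolean closed;
        Session(int id) { this.id = id; }
    }

    private final Deque<Session> idle = new ArrayDeque<>(); // sessions not running a job
    private final List<Session> all = new ArrayList<>();    // every session this pool owns
    private int nextId;

    /** Reuse an idle, up-to-date session; otherwise create a new one. */
    synchronized Session acquire() {
        while (!idle.isEmpty()) {
            Session s = idle.pop();
            if (s.outdated) {
                close(s); // stale Spark config: destroy it and keep looking
            } else {
                return s;
            }
        }
        Session s = new Session(nextId++);
        all.add(s);
        return s;
    }

    /** Hand a session back after its job finishes. */
    synchronized void release(Session s) {
        if (s.outdated) {
            close(s);
        } else {
            idle.push(s);
        }
    }

    /** Call when a Spark-related setting changes in the owning Hive session. */
    synchronized void markAllOutdated() {
        for (Session s : all) {
            s.outdated = true;
        }
    }

    private void close(Session s) {
        s.closed = true; // a real implementation would stop the remote Spark application
        all.remove(s);
    }
}
```

Two concurrent queries on one connection get two different sessions, and after a `set spark.*` both are retired lazily: the busy one when it is released, the idle one when it would next be picked.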
[jira] [Created] (HIVE-12614) RESET command does not close spark session
Nemon Lou created HIVE-12614: Summary: RESET command does not close spark session Key: HIVE-12614 URL: https://issues.apache.org/jira/browse/HIVE-12614 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.3.0, 2.1.0 Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12615) Do not start spark session when only explain
Nemon Lou created HIVE-12615: Summary: Do not start spark session when only explain Key: HIVE-12615 URL: https://issues.apache.org/jira/browse/HIVE-12615 Project: Hive Issue Type: Improvement Affects Versions: 1.3.0, 2.1.0 Reporter: Nemon Lou Running beeline -e "set hive.execution.engine=spark;explain select count(*) from sometable" is very slow because a Spark session is started on YARN, even though the query is only explained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
Nemon Lou created HIVE-12616: Summary: NullPointerException when spark session is reused to run a mapjoin Key: HIVE-12616 URL: https://issues.apache.org/jira/browse/HIVE-12616 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.3.0 Reporter: Nemon Lou Assignee: Xuefu Zhang The way to reproduce: {noformat} set hive.execution.engine=spark; create table if not exists test(id int); create table if not exists test1(id int); insert into test values(1); insert into test1 values(1); select max(a.id) from test a ,test1 b where a.id = b.id; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12538) After set spark related config, SparkSession never get reused
Nemon Lou created HIVE-12538: Summary: After set spark related config, SparkSession never get reused Key: HIVE-12538 URL: https://issues.apache.org/jira/browse/HIVE-12538 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.3.0 Reporter: Nemon Lou Hive on Spark, yarn-cluster mode. After running "set spark.yarn.queue=QueueA;", run the query "select count(*) from test" three times and you will find three different YARN applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12496) Open ServerTransport After MetaStore Initialization
Nemon Lou created HIVE-12496: Summary: Open ServerTransport After MetaStore Initialization Key: HIVE-12496 URL: https://issues.apache.org/jira/browse/HIVE-12496 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 1.2.1 Environment: Standalone MetaStore, cluster mode(multiple instances) Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor During HiveMetaStore startup, the following steps (listed in their current order) should be reordered: 1. Creation of TServerSocket 2. Creation of HMSHandler 3. Creation of TThreadPoolServer. Step 2 involves some initialization work, including: {noformat} createDefaultDB(); createDefaultRoles(); addAdminUsers(); {noformat} TServerSocket should be created after this initialization work to prevent unnecessary waiting on the client side. Also, if errors occur during initialization (multiple metastores creating the default DB at the same time can cause errors), clients should not connect to this metastore, because it will shut down due to the error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
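The proposed order can be reduced to one rule: run every initialization step first, and open the listening transport only if all of them succeed. The sketch below captures that rule generically; OrderedStartup and start are illustrative names, and the transport is represented by a Supplier rather than a real TServerSocket so the ordering can be checked without opening a port.

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Sketch of "initialize, then listen": the init steps stand in for work such as
 * createDefaultDB(), createDefaultRoles() and addAdminUsers(); openTransport stands in
 * for creating the server socket. Names are illustrative, not Hive's.
 */
class OrderedStartup {
    static <T> T start(List<Runnable> initSteps, Supplier<T> openTransport) {
        for (Runnable step : initSteps) {
            // An exception here aborts start-up before any port is opened,
            // so no client can connect to an instance that is about to shut down.
            step.run();
        }
        return openTransport.get();
    }
}
```

The design choice is simply that a failed initialization surfaces as a start-up failure instead of as clients connecting to a metastore that is going away.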
[jira] [Created] (HIVE-12480) Hive Counters "RECORDS_OUT" is wrong when using union all
Nemon Lou created HIVE-12480: Summary: Hive Counters "RECORDS_OUT" is wrong when using union all Key: HIVE-12480 URL: https://issues.apache.org/jira/browse/HIVE-12480 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.2.1 Reporter: Nemon Lou Priority: Minor 1. Prepare: {noformat} set hive.execution.engine=mr; CREATE TABLE IF NOT EXISTS test(id INT); insert into test values (1), (2); {noformat} 2. The query that returns the wrong counter: {noformat} set hive.execution.engine=mr; insert into test select * from test union all select * from test; {noformat} The counter "RECORDS_OUT_1_default.test" is expected to be 4, but is actually 8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12464) Inconsistent behavior between MapReduce and Spark engine on bucketed mapjoin
Nemon Lou created HIVE-12464: Summary: Inconsistent behavior between MapReduce and Spark engine on bucketed mapjoin Key: HIVE-12464 URL: https://issues.apache.org/jira/browse/HIVE-12464 Project: Hive Issue Type: Bug Components: Query Planning, Spark Affects Versions: 1.2.1 Reporter: Nemon Lou Steps to reproduce: 1. Prepare the table and data: {noformat} create table if not exists lxw_test(imei string,sndaid string,data_time string) CLUSTERED BY(imei) SORTED BY(imei) INTO 10 BUCKETS; create table if not exists lxw_test1(imei string,sndaid string,data_time string) CLUSTERED BY(imei) SORTED BY(imei) INTO 5 BUCKETS; set hive.enforce.bucketing = true; set hive.enforce.sorting = true; insert overwrite table lxw_test values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10); insert overwrite table lxw_test1 values (1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10); set hive.enforce.bucketing; insert into table lxw_test1 select * from lxw_test; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; {noformat} 2. The following SQL succeeds: {noformat} set hive.execution.engine=mr; select count(1) from lxw_test1 a join lxw_test b on a.imei = b.imei ; {noformat} 3. This one fails: {noformat} set hive.execution.engine=spark; select count(1) from lxw_test1 a join lxw_test b on a.imei = b.imei ; {noformat} On Spark, the query returns this error: {noformat} Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table lxw_test1 is 5, whereas the number of files is 10 (state=42000,code=10141) {noformat} After setting hive.ignore.mapjoin.hint=false and using a mapjoin hint, the MapReduce engine returns the same error:
{noformat} set hive.execution.engine=mr; set hive.ignore.mapjoin.hint=false; explain select /*+ mapjoin(b) */ count(1) from lxw_test1 a join lxw_test b on a.imei = b.imei ; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12432) Hive on Spark Counter "RECORDS_OUT" always be zero
Nemon Lou created HIVE-12432: Summary: Hive on Spark Counter "RECORDS_OUT" always be zero Key: HIVE-12432 URL: https://issues.apache.org/jira/browse/HIVE-12432 Project: Hive Issue Type: Bug Components: Spark, Statistics Affects Versions: 1.2.1 Reporter: Nemon Lou Assignee: Nemon Lou A simple way to reproduce: {noformat} set hive.execution.engine=spark; CREATE TABLE test(id INT); insert into test values (1), (2); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12382) return actual row count for JDBC executeUpdate
Nemon Lou created HIVE-12382: Summary: return actual row count for JDBC executeUpdate Key: HIVE-12382 URL: https://issues.apache.org/jira/browse/HIVE-12382 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor When running SQL such as 'insert into/overwrite table', users may want to know how many rows were inserted. Returning the actual row count from HiveStatement.executeUpdate is useful in such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12371) Adding a timeout connection parameter for JDBC
Nemon Lou created HIVE-12371: Summary: Adding a timeout connection parameter for JDBC Key: HIVE-12371 URL: https://issues.apache.org/jira/browse/HIVE-12371 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Nemon Lou Assignee: Vaibhav Gumashta There are already some server-side timeout settings (HIVE-4766, HIVE-6679). A timeout connection parameter for JDBC would be useful in several scenarios: 1. Beeline, where the timeout cannot be set manually. 2. Customizing the timeout per connection (across Hive and other RDBMSs), which cannot be done via DriverManager.setLoginTimeout() because that setting applies to all drivers. PostgreSQL, for example, supports: {noformat} jdbc:postgresql://localhost/test?user=fred=secret=true=0 {noformat} and MySQL supports: {noformat} jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6=6 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
Nemon Lou created HIVE-11768: Summary: java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances Key: HIVE-11768 URL: https://issues.apache.org/jira/browse/HIVE-11768 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.2.1 Reporter: Nemon Lou More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our long-running HiveServer2 instances, taking up more than 100 MB of heap. Most of the paths have the suffix ".pipeout". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
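Background for this leak: File.deleteOnExit() adds the path to a static set inside java.io.DeleteOnExitHook that is only processed when the JVM shuts down, and deleting the file later does not remove the entry. In a server that never exits, every call leaks a string. The sketch below shows the general remedy of deleting a per-session temp file explicitly; it is not Hive's patch, and the class name, prefix and suffix are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: a per-session temp file that is cleaned up explicitly, never via deleteOnExit(). */
class SessionTempFile {
    /** Creates, uses and deletes a temp file; returns the (now deleted) path for inspection. */
    static Path useAndDelete() {
        Path p;
        try {
            p = Files.createTempFile("hive-session-", ".pipeout");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Stand-in for the session writing query output to the file.
            Files.write(p, "query output".getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                Files.deleteIfExists(p); // nothing is registered with DeleteOnExitHook
            } catch (IOException ignored) {
                // best effort; a real server would log this
            }
        }
    }
}
```

In a real server the delete would happen when the session closes rather than at the end of one method, but the point is the same: cleanup is tied to the session's lifetime, not the JVM's.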
[jira] [Created] (HIVE-11244) Beeline prompt info improvement for cluster mode
Nemon Lou created HIVE-11244: Summary: Beeline prompt info improvement for cluster mode Key: HIVE-11244 URL: https://issues.apache.org/jira/browse/HIVE-11244 Project: Hive Issue Type: Improvement Components: Beeline Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor Currently, the Beeline prompt in cluster mode looks like this: {noformat} 0: jdbc:hive2://192.168.115.1:24002,192.168.1 {noformat} Showing only the address of the HiveServer2 instance that Beeline is actually connected to would be more helpful for users, like this: {noformat} 0: jdbc:hive2://192.168.115.1:24002 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
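A sketch of the two pieces involved: splitting the host list out of a multi-host jdbc:hive2:// URL, and building the prompt from the single host:port actually in use. BeelinePrompt, hosts and prompt are hypothetical names; in practice the connected address would come from the driver rather than be passed in, and the second host in the usage example is made up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers for a Beeline-style prompt in cluster mode. */
class BeelinePrompt {
    private static final String SCHEME = "jdbc:hive2://";

    /** Returns the comma-separated host:port entries between the scheme and the first '/'. */
    static List<String> hosts(String url) {
        if (!url.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a hive2 URL: " + url);
        }
        String rest = url.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        String authority = slash >= 0 ? rest.substring(0, slash) : rest;
        return Arrays.asList(authority.split(","));
    }

    /** Prompt showing only the instance this connection actually uses. */
    static String prompt(int connectionIndex, String connectedHostPort) {
        return connectionIndex + ": " + SCHEME + connectedHostPort;
    }
}
```

For a URL such as jdbc:hive2://192.168.115.1:24002,192.168.115.2:24002/default, connected to the first instance, prompt(0, ...) yields "0: jdbc:hive2://192.168.115.1:24002", matching the desired output above.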
[jira] [Created] (HIVE-11243) Changing log level in Utilities.getBaseWork
Nemon Lou created HIVE-11243: Summary: Changing log level in Utilities.getBaseWork Key: HIVE-11243 URL: https://issues.apache.org/jira/browse/HIVE-11243 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Priority: Minor We see many log messages like the following when running jobs without a reduce phase; changing this message to DEBUG level should be fine. {noformat} 2015-07-10 15:13:52,910 | INFO | HiveServer2-Background-Pool: Thread-6074 | File not found: File does not exist: /tmp/hive-scratch/admin/3f70dbe7-96c0-41be-baac-72f4a2e45ea0/hive_2015-07-10_15-13-40_991_7379130813954010484-5/-mr-10008/ef20bbe4-9311-4633-9057-e018ce08cc00/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1834) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1805) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:589) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:367) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2084) | org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:456) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10817) Blacklist For Bad MetaStore
Nemon Lou created HIVE-10817: Summary: Blacklist For Bad MetaStore Key: HIVE-10817 URL: https://issues.apache.org/jira/browse/HIVE-10817 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou During a reliability test, when the machine hosting one of the MetaStores was powered down, HiveServer2 stopped submitting jobs to YARN. There were 100 JDBC clients (Beeline) running concurrently, and all 100 of them hung. After checking HiveServer2's thread stacks, I found that most threads were waiting to lock AbstractService while the thread holding the lock was trying to connect to the MetaStore that had been powered down. When that thread finally got a SocketTimeoutException and released the lock, another thread acquired it and got stuck again until its socket timed out. Adding a new blacklist mechanism finally solved this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
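The report does not describe the blacklist itself, so the following is only a sketch of one plausible design: a URI that failed is skipped until a cool-down expires, and if every URI is blacklisted the client falls back to trying them all. MetaStoreBlacklist, its methods, the cool-down policy and the example URIs are all assumptions; time is passed in explicitly to keep the logic testable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical blacklist of metastore URIs with a fixed cool-down. */
class MetaStoreBlacklist {
    private final long coolDownMillis;
    private final Map<String, Long> blockedUntil = new ConcurrentHashMap<>();

    MetaStoreBlacklist(long coolDownMillis) {
        this.coolDownMillis = coolDownMillis;
    }

    /** Record a failure, e.g. after a SocketTimeoutException while connecting. */
    void markBad(String uri, long nowMillis) {
        blockedUntil.put(uri, nowMillis + coolDownMillis);
    }

    boolean isBlacklisted(String uri, long nowMillis) {
        Long until = blockedUntil.get(uri);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            blockedUntil.remove(uri, until); // cool-down over: give the URI another chance
            return false;
        }
        return true;
    }

    /** URIs to try, in configured order, minus blacklisted ones; all of them if none is left. */
    List<String> candidates(List<String> uris, long nowMillis) {
        List<String> ok = new ArrayList<>();
        for (String uri : uris) {
            if (!isBlacklisted(uri, nowMillis)) {
                ok.add(uri);
            }
        }
        return ok.isEmpty() ? new ArrayList<>(uris) : ok;
    }
}
```

The effect on the scenario above is that only the first thread pays the socket timeout for the dead host; later threads skip it for the cool-down period instead of queuing behind the same lock.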
[jira] [Created] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
Nemon Lou created HIVE-10815: Summary: Let HiveMetaStoreClient Choose MetaStore Randomly Key: HIVE-10815 URL: https://issues.apache.org/jira/browse/HIVE-10815 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Currently, HiveMetaStoreClient tries the MetaStore URIs in a fixed order when multiple metastores are configured. Choosing a MetaStore randomly would be better for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
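Random selection can be as simple as shuffling a copy of the configured URI list before the client walks it in order, so each client starts at a different metastore while still failing over to the rest. This is a sketch, not HiveMetaStoreClient's code; the class name and example URIs are illustrative. Shuffling a copy leaves the shared configuration untouched, and accepting a Random makes the behavior reproducible in tests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: per-client random ordering of configured metastore URIs. */
class MetaStoreUriOrder {
    static List<String> randomOrder(List<String> configured, Random random) {
        List<String> copy = new ArrayList<>(configured); // do not mutate the shared config
        Collections.shuffle(copy, random);
        return copy;
    }
}
```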
[jira] [Created] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs
Nemon Lou created HIVE-10781: Summary: HadoopJobExecHelper Leaks RunningJobs Key: HIVE-10781 URL: https://issues.apache.org/jira/browse/HIVE-10781 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 1.2.0, 0.13.1 Reporter: Nemon Lou On one of our busy Hadoop clusters, HiveServer2 holds more than 4000 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while fewer than 3 background handler threads are running at any one time. All these instances are held in one LinkedList, the static runningJobs field of org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
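The general pattern behind this kind of leak is registering an object in a static collection without a guaranteed deregistration path. The sketch below shows the usual remedy, removing the job in a finally block so failed or cancelled jobs are released too. It is illustrative only; RunningJobRegistry and runTracked are hypothetical names, not HadoopJobExecHelper's code.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

/** Sketch: a static registry of running jobs that cannot leak entries. */
class RunningJobRegistry {
    static final List<Object> RUNNING_JOBS = Collections.synchronizedList(new LinkedList<>());

    static void runTracked(Object job, Runnable body) {
        RUNNING_JOBS.add(job);
        try {
            body.run();
        } finally {
            RUNNING_JOBS.remove(job); // runs on success, failure and cancellation alike
        }
    }
}
```

With this shape the registry only ever holds jobs that are still executing, so its size tracks the number of busy handler threads rather than the server's uptime.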
[jira] [Created] (HIVE-10625) Handle Authorization for 'select expr' hive queries in SQL Standard Authorization
Nemon Lou created HIVE-10625: Summary: Handle Authorization for 'select expr' hive queries in SQL Standard Authorization Key: HIVE-10625 URL: https://issues.apache.org/jira/browse/HIVE-10625 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 1.1.0 Reporter: Nemon Lou Hive internally rewrites such a 'select expression' query into 'select expression from _dummy_database._dummy_table', where the dummy database and table are temporary entities for the current query. SQL Standard Authorization needs to handle these special objects. Typing select reverse(123); in Beeline produces this error: {code} Error: Error while compiling statement: FAILED: HiveAuthzPluginException Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=_dummy_database._dummy_table] (state=42000,code=4) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
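One way to "handle these special objects" is to drop the placeholder from the list of objects to authorize before any metastore lookup, since it never exists in the metastore. The sketch below assumes objects are represented as "db.table" strings; DummyTableFilter and its methods are hypothetical, and only the _dummy_database and _dummy_table names come from the error above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter that skips Hive's placeholder table for FROM-less queries. */
class DummyTableFilter {
    static final String DUMMY_DB = "_dummy_database";
    static final String DUMMY_TABLE = "_dummy_table";

    static boolean isDummy(String db, String table) {
        return DUMMY_DB.equals(db) && DUMMY_TABLE.equals(table);
    }

    /** Keeps every "db.table" name except the placeholder, preserving order. */
    static List<String> objectsToAuthorize(List<String> qualifiedNames) {
        List<String> out = new ArrayList<>();
        for (String name : qualifiedNames) {
            int dot = name.indexOf('.');
            String db = dot >= 0 ? name.substring(0, dot) : "";
            String table = dot >= 0 ? name.substring(dot + 1) : name;
            if (!isDummy(db, table)) {
                out.add(name);
            }
        }
        return out;
    }
}
```

A query such as select reverse(123); would then carry no table objects into the privilege check, while real tables referenced by other queries are still authorized as before.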
[jira] [Created] (HIVE-10417) Parallel Order By return wrong results for partitioned tables
Nemon Lou created HIVE-10417: Summary: Parallel Order By return wrong results for partitioned tables Key: HIVE-10417 URL: https://issues.apache.org/jira/browse/HIVE-10417 Project: Hive Issue Type: Bug Affects Versions: 1.0.0, 0.13.1, 0.14.0 Reporter: Nemon Lou The following script reproduces this bug.
{noformat}
set hive.optimize.sampling.orderby=true;
set mapreduce.job.reduces=10;
select * from src order by key desc limit 10;
+----------+------------+
| src.key  | src.value  |
+----------+------------+
| 98       | val_98     |
| 98       | val_98     |
| 97       | val_97     |
| 97       | val_97     |
| 96       | val_96     |
| 95       | val_95     |
| 95       | val_95     |
| 92       | val_92     |
| 90       | val_90     |
| 90       | val_90     |
+----------+------------+
10 rows selected (47.916 seconds)
reset;
create table src_orc_p (key string ,value string ) partitioned by (kp string) stored as orc tblproperties('orc.compress'='SNAPPY');
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
set hive.exec.max.dynamic.partitions=1;
insert into table src_orc_p partition(kp) select *,substring(key,1) from src distribute by substring(key,1);
set mapreduce.job.reduces=10;
set hive.optimize.sampling.orderby=true;
select * from src_orc_p order by key desc limit 10;
+----------------+------------------+-----------------+
| src_orc_p.key  | src_orc_p.value  | src_orc_p.kend  |
+----------------+------------------+-----------------+
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
+----------------+------------------+-----------------+
3 rows selected (39.861 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9839) HiveServer2 leaks OperationHandle on failed async queries
Nemon Lou created HIVE-9839: Summary: HiveServer2 leaks OperationHandle on failed async queries Key: HIVE-9839 URL: https://issues.apache.org/jira/browse/HIVE-9839 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.0.0, 0.13.1, 0.14.0 Reporter: Nemon Lou Use Beeline to connect to HiveServer2 and type the following: {noformat} drop table if exists table_not_exists; select * from table_not_exists; {noformat} An OperationHandle object will stay in HiveServer2's memory forever, even after quitting Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291782#comment-14291782 ] Nemon Lou commented on HIVE-9100: Mariusz Strzelecki is right. After changing the metastore's TokenStore from memory to DB, the error disappears. Thanks, Mariusz Strzelecki. HiveServer2 fail to connect to MetaStore after MetaStore restarting Key: HIVE-9100 URL: https://issues.apache.org/jira/browse/HIVE-9100 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2, Security Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: hiveserver2.log, metastore.log Secure cluster with Kerberos, remote metastore. How to reproduce: 1. Use Beeline to connect to HiveServer2. 2. Restart the MetaStore process. 3. Type a command like 'show tables' in Beeline. The client side reports this error: {quote} Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190) {quote} HiveServer2's log and the metastore's log are uploaded as attachments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251006#comment-14251006 ] Nemon Lou commented on HIVE-7797: Review link: https://reviews.apache.org/r/29136/ upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: HIVE-7797.1.patch Upgrading the Hive schema with the schema tool using the following command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint The log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Summary: upgrade hive schema from 0.9.0 to 0.13.1 failed (was: upgrade sql 014-HIVE-3764.postgres.sql failed) upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou The SQL is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint was: The SQL is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Affects Version/s: 0.14.0 upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint The log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint The log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). was: Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint The log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Attachment: HIVE-7797.1.patch Use a blank space instead of '' so that Postgres won't convert the empty string into null. upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: HIVE-7797.1.patch Upgrading the Hive schema with the following schematool command failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint The log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
Nemon Lou created HIVE-9100: Summary: HiveServer2 fail to connect to MetaStore after MetaStore restarting Key: HIVE-9100 URL: https://issues.apache.org/jira/browse/HIVE-9100 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2, Security Affects Versions: 0.13.1, 0.14.0 Reporter: Nemon Lou Secure cluster with Kerberos and a remote metastore. How to reproduce: 1. Use beeline to connect to HiveServer2. 2. Restart the MetaStore process. 3. Type a command such as 'show tables' in beeline. The client side reports this error: {quote} Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190) {quote} HiveServer2's log and metastore's log are uploaded as attachments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-9100: Attachment: metastore.log hiveserver2.log HiveServer2 fail to connect to MetaStore after MetaStore restarting Key: HIVE-9100 URL: https://issues.apache.org/jira/browse/HIVE-9100 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2, Security Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: hiveserver2.log, metastore.log Secure cluster with Kerberos and a remote metastore. How to reproduce: 1. Use beeline to connect to HiveServer2. 2. Restart the MetaStore process. 3. Type a command such as 'show tables' in beeline. The client side reports this error: {quote} Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190) {quote} HiveServer2's log and metastore's log are uploaded as attachments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9095) permanent functions' ClassLoader should be global instead of per-session
Nemon Lou created HIVE-9095: Summary: permanent functions' ClassLoader should be global instead of per-session Key: HIVE-9095 URL: https://issues.apache.org/jira/browse/HIVE-9095 Project: Hive Issue Type: Improvement Components: HiveServer2, UDF Affects Versions: 0.13.1, 0.14.0 Reporter: Nemon Lou FunctionRegistry.mFunctions is static. This means that, in the HS2 case, all users share the same UDF class object from mFunctions, and therefore the same classloader that loaded this class. First, this makes the per-session classloader useless, because only the first classloader is used to initialize instances of the permanent UDF class. Second, it will cause a class-not-found exception when the classloader created by the first session is closed before all the needed classes are loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
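A minimal Java sketch of the sharing problem described above, assuming a static, JVM-wide map in the spirit of FunctionRegistry.mFunctions. The names StaticRegistryModel, FUNCTIONS and lookup are hypothetical, and for brevity the map records the classloader that registered each function rather than the loaded UDF Class object, which is a simplification of what Hive actually stores.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a static registry shared by every HS2 session.
final class StaticRegistryModel {
    static final Map<String, ClassLoader> FUNCTIONS = new ConcurrentHashMap<>();

    // Every session calls this with its own loader; only the first call wins.
    static ClassLoader lookup(String name, ClassLoader sessionLoader) {
        return FUNCTIONS.computeIfAbsent(name, n -> sessionLoader);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = StaticRegistryModel.class.getClassLoader();
        URLClassLoader session1 = new URLClassLoader(new URL[0], parent);
        URLClassLoader session2 = new URLClassLoader(new URL[0], parent);

        ClassLoader seenBy1 = lookup("my_udf", session1);
        ClassLoader seenBy2 = lookup("my_udf", session2);
        System.out.println(seenBy1 == session1); // true
        System.out.println(seenBy2 == session1); // true: session 2's loader is never used

        // Session 1 ends and closes its loader, but the shared entry still points
        // at it, so UDF classes that should come from its jars can no longer be
        // loaded through it.
        session1.close();
        System.out.println(FUNCTIONS.get("my_udf") == session1); // true
        session2.close();
    }
}
```

The computeIfAbsent call makes the "first session wins" behavior explicit; a loader owned by the server rather than by one session would avoid tying the shared entry to any single session's lifetime.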
[jira] [Updated] (HIVE-9095) permanent functions' ClassLoader should be global instead of per-session
[ https://issues.apache.org/jira/browse/HIVE-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-9095: Description: FunctionRegistry.mFunctions is static. This means that, in the HS2 case, all users share the same UDF class object from mFunctions, and therefore the same classloader that loaded this class. First, this makes the per-session classloader useless, because only the first classloader is used to initialize instances of the permanent UDF class. Second, it will cause a class-not-found exception when the classloader created by the first session has been closed before all the needed classes are loaded. was: FunctionRegistry.mFunctions is static. This means that, in the HS2 case, all users share the same UDF class object from mFunctions, and therefore the same classloader that loaded this class. First, this makes the per-session classloader useless, because only the first classloader is used to initialize instances of the permanent UDF class. Second, it will cause a class-not-found exception when the classloader created by the first session is closed before all the needed classes are loaded. permanent functions' ClassLoader should be global instead of per-session Key: HIVE-9095 URL: https://issues.apache.org/jira/browse/HIVE-9095 Project: Hive Issue Type: Improvement Components: HiveServer2, UDF Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou FunctionRegistry.mFunctions is static. This means that, in the HS2 case, all users share the same UDF class object from mFunctions, and therefore the same classloader that loaded this class. First, this makes the per-session classloader useless, because only the first classloader is used to initialize instances of the permanent UDF class. Second, it will cause a class-not-found exception when the classloader created by the first session has been closed before all the needed classes are loaded.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7021) HiveServer2 memory leak on failed queries
[ https://issues.apache.org/jira/browse/HIVE-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243680#comment-14243680 ] Nemon Lou commented on HIVE-7021: Even without HIVE-4629, HiveServer2 can leak an OperationHandle on failed queries. When a JDBCClient runs a query like select * from table_not_exists, HiveServer2 fails the query during compilation but leaves an OperationHandle in memory (due to async mode) without passing it back to the client side. Shall I file a new JIRA for this, or would you fix it in this patch? HiveServer2 memory leak on failed queries Key: HIVE-7021 URL: https://issues.apache.org/jira/browse/HIVE-7021 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-4629+HIVE-7021.1.patch, HIVE-7021.1.patch The number of the following objects keeps increasing if a query causes an exception: org.apache.hive.service.cli.HandleIdentifier org.apache.hive.service.cli.OperationHandle org.apache.hive.service.cli.log.LinkedStringBuffer org.apache.hive.service.cli.log.OperationLog The leak can be observed using a JDBCClient that runs something like this: connection = DriverManager.getConnection("jdbc:hive2://" + hostname + ":1/default", "", ""); statement = connection.createStatement(); statement.execute("CREATE TEMPORARY FUNCTION dummy_function AS 'dummy.class.name'"); The above SQL fails if HS2 cannot load the dummy.class.name class. Each iteration of such a query increases the instance count of each class listed above by one. This eventually causes an OOM in the HS2 service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
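A minimal Java sketch of the leak pattern in the comment above: a handle is registered before compilation (as in async mode), and when compilation fails the handle is never returned to the client, so nothing ever removes it. HandleLeakModel, submit, compile and handlesAfterFailures are hypothetical names, not HiveServer2's OperationManager API; the cleanUpOnFailure flag toggles the obvious fix of removing the orphaned handle on the failure path.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: registers a handle before compiling, returns it only on success.
final class HandleLeakModel {
    final Map<UUID, String> handles = new ConcurrentHashMap<>();
    private final boolean cleanUpOnFailure;

    HandleLeakModel(boolean cleanUpOnFailure) { this.cleanUpOnFailure = cleanUpOnFailure; }

    UUID submit(String sql) {
        UUID handle = UUID.randomUUID();
        handles.put(handle, sql);            // registered before compilation
        try {
            compile(sql);
            return handle;                   // client receives the handle
        } catch (IllegalArgumentException e) {
            if (cleanUpOnFailure) {
                handles.remove(handle);      // fix: drop the orphaned handle
            }
            throw e;                         // client never sees the handle
        }
    }

    // Stand-in for query compilation: fails when the table does not exist.
    private static void compile(String sql) {
        if (sql.contains("table_not_exists")) {
            throw new IllegalArgumentException("Table not found: table_not_exists");
        }
    }

    // Runs n failing queries and reports how many handles remain registered.
    static int handlesAfterFailures(boolean cleanUp, int n) {
        HandleLeakModel server = new HandleLeakModel(cleanUp);
        for (int i = 0; i < n; i++) {
            try {
                server.submit("select * from table_not_exists");
            } catch (IllegalArgumentException expected) {
                // the client only sees the error
            }
        }
        return server.handles.size();
    }

    public static void main(String[] args) {
        System.out.println(handlesAfterFailures(false, 1000)); // prints 1000: one leaked handle per failure
        System.out.println(handlesAfterFailures(true, 1000));  // prints 0
    }
}
```

Because the only reference to a failed query's handle stays inside the server-side map, the entry count grows with every failed query, which mirrors the "+1 per iteration" growth described in the issue.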
[jira] [Created] (HIVE-8418) Upgrade to Thrift 0.9.1
Nemon Lou created HIVE-8418: Summary: Upgrade to Thrift 0.9.1 Key: HIVE-8418 URL: https://issues.apache.org/jira/browse/HIVE-8418 Project: Hive Issue Type: Task Components: Server Infrastructure Affects Versions: 0.13.1 Reporter: Nemon Lou THRIFT-1869 fixes a crash in HS2 when the Thrift thread pool is exhausted. The patch is included in Thrift 0.9.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4224) Upgrade to Thrift 1.0 when available
[ https://issues.apache.org/jira/browse/HIVE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152849#comment-14152849 ] Nemon Lou commented on HIVE-4224: THRIFT-1869 has been fixed in Thrift 0.9.1, which was released on 21/Aug/13. Is there any plan to upgrade Thrift to 0.9.1? Upgrade to Thrift 1.0 when available Key: HIVE-4224 URL: https://issues.apache.org/jira/browse/HIVE-4224 Project: Hive Issue Type: Sub-task Components: HiveServer2, Metastore, Server Infrastructure Affects Versions: 0.11.0 Reporter: Brock Noland Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7797) upgrade sql 014-HIVE-3764.postgres.sql failed
Nemon Lou created HIVE-7797: Summary: upgrade sql 014-HIVE-3764.postgres.sql failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou The SQL is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); The result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.2#6252)