[jira] [Created] (HIVE-24060) When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution
LuGuangMing created HIVE-24060:
-----------------------------------

             Summary: When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution
                 Key: HIVE-24060
                 URL: https://issues.apache.org/jira/browse/HIVE-24060
             Project: Hive
          Issue Type: Bug
          Components: CBO, Hive
    Affects Versions: 3.1.2, 3.1.0
            Reporter: LuGuangMing

{code:java}
set hive.cbo.enable=false;
create table testtable(idx string, namex string) stored as orc;
insert into testtable values('123', 'aaa'), ('234', 'bbb');
explain select a.idx from (select idx, namex from testtable intersect select idx, namex from testtable) a;
{code}
The execution throws a NullPointerException:
{code:java}
2020-08-24 15:12:24,261 | WARN | HiveServer2-Handler-Pool: Thread-345 | Error executing statement: | org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1155)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:341) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:215) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:316) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:253) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:684) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:670) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:342) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1144) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:1280) ~[hive-service-3.1.0.jar:3.1.0]
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-service-rpc-3.1.0.jar:3.1.0]
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-service-rpc-3.1.0.jar:3.1.0]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.3.jar:0.9.3]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.3.jar:0.9.3]
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:648) ~[hive-standalone-metastore-3.1.0.jar:3.1.0]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[libthrift-0.9.3.jar:0.9.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4367) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4346) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10576) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10515) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11434) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11291) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11318) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11304) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12090) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12180) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11692) ~[hive-exec-3.1.0.jar:3.1.0]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnal
{code}
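Until the non-CBO path is fixed, one possible workaround (an untested sketch, not from the report) is to rewrite the INTERSECT by hand. A standard rewrite of `INTERSECT` with distinct semantics deduplicates each branch and keeps only the rows that appear in both:

```sql
-- Hypothetical manual rewrite of the failing query (sketch only):
-- a row survives the INTERSECT iff it appears in both deduplicated branches.
select a.idx
from (
  select idx, namex
  from (
    select distinct idx, namex from testtable
    union all
    select distinct idx, namex from testtable
  ) u
  group by idx, namex
  having count(*) = 2
) a;
```

This avoids the INTERSECT operator entirely, so it does not depend on the code path that throws the NPE when hive.cbo.enable=false.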
[jira] [Created] (HIVE-24061) Improve llap task scheduling for better cache hit rate
Rajesh Balamohan created HIVE-24061:
---------------------------------------

             Summary: Improve llap task scheduling for better cache hit rate
                 Key: HIVE-24061
                 URL: https://issues.apache.org/jira/browse/HIVE-24061
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan

TaskInfo is initialized with the request time and locality delay. When many vertices are at the same level, taskInfo details are available upfront, so by the time scheduling happens, "requestTime + localityDelay" is no longer greater than the current time. Because of this, the scheduler skips the locality delay and ends up choosing a random node, which misses cache hits and reads data from remote storage. This pattern was observed in query 75 of TPC-DS, for example.

Related lines of interest in the scheduler: https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
{code:java}
boolean shouldDelayForLocality = request.shouldDelayForLocality(schedulerAttemptTime);
..
..
boolean shouldDelayForLocality(long schedulerAttemptTime) {
  return localityDelayTimeout > schedulerAttemptTime;
}
{code}
Ideally, "localityDelayTimeout" should be adjusted based on the task's first scheduling opportunity.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
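The adjustment proposed above can be sketched as follows (hypothetical class and field names; the real logic lives in LlapTaskSchedulerService): anchor localityDelayTimeout at the task's first scheduling opportunity instead of at TaskInfo creation time, so a task that waited in the queue still gets its full locality delay.

```java
// Sketch of the proposed fix (hypothetical class, not the actual Hive code):
// the locality-delay window starts at the first scheduling opportunity,
// not at the time the TaskInfo was created.
class TaskInfoSketch {
    private final long localityDelayMs;
    private long localityDelayTimeout = -1; // unset until the first attempt

    TaskInfoSketch(long localityDelayMs) {
        this.localityDelayMs = localityDelayMs;
    }

    boolean shouldDelayForLocality(long schedulerAttemptTime) {
        if (localityDelayTimeout < 0) {
            // First scheduling opportunity: anchor the delay window here.
            localityDelayTimeout = schedulerAttemptTime + localityDelayMs;
        }
        return localityDelayTimeout > schedulerAttemptTime;
    }
}
```

Under this sketch, a task whose TaskInfo was created long before its first scheduling attempt would still wait out the locality delay, preserving the node preference that drives LLAP cache hits.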
[jira] [Created] (HIVE-24062) Combine all table constraints RDBMS calls in one SQL call
Ashish Sharma created HIVE-24062:
------------------------------------

             Summary: Combine all table constraints RDBMS calls in one SQL call
                 Key: HIVE-24062
                 URL: https://issues.apache.org/jira/browse/HIVE-24062
             Project: Hive
          Issue Type: Improvement
            Reporter: Ashish Sharma
            Assignee: Ashish Sharma

A table can have six different types of constraints: PrimaryKey, ForeignKey, UniqueConstraint, NotNullConstraint, DefaultConstraint, and CheckConstraint. Each constraint type has its own SQL query to fetch its information from the RDBMS, which leads to six separate RDBMS calls. The idea here is to have one combined query that fetches all the constraint information at once and then filters the result set by constraint type.
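The proposed shape could look roughly like this (hypothetical column and table names, not the actual metastore schema): tag each branch of a combined UNION ALL query with a constraint-type discriminator, issue it once, and split the result set client-side.

```java
import java.util.*;

class CombinedConstraintsSketch {
    // One round trip instead of six: each branch of the UNION ALL carries
    // a CONSTRAINT_TYPE literal so rows can be separated after the fetch.
    // (Illustrative SQL shape only; table/column names are hypothetical.)
    static final String COMBINED_QUERY =
        "SELECT 'PRIMARY_KEY' AS CONSTRAINT_TYPE, ... FROM ... " +
        "UNION ALL SELECT 'FOREIGN_KEY', ... FROM ... " +
        "UNION ALL SELECT 'UNIQUE', ... FROM ..."; // and so on for all six

    // Split the single result set by the discriminator column (index 0).
    static Map<String, List<Object[]>> splitByType(List<Object[]> rows) {
        Map<String, List<Object[]>> byType = new HashMap<>();
        for (Object[] row : rows) {
            byType.computeIfAbsent((String) row[0], k -> new ArrayList<>())
                  .add(row);
        }
        return byType;
    }
}
```

The filtering step is a cheap in-memory pass, so the saving is entirely in round trips to the RDBMS.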
[jira] [Created] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
Zhihua Deng created HIVE-24063:
----------------------------------

             Summary: SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
                 Key: HIVE-24063
                 URL: https://issues.apache.org/jira/browse/HIVE-24063
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
            Reporter: Zhihua Deng

When the current SqlOperator is a SqlCastFunction, FunctionRegistry.getFunctionInfo returns null. But when hive.allow.udf.load.on.demand is enabled, HiveServer2 then consults the metastore for the function definition, and an exception stack trace appears in the HiveServer2 log:
{code:java}
INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Function @hive#default.cast does not exist)
    at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
So it may be better to handle an explicit cast before getting the FunctionInfo from the Registry. Even if there is no cast in the query, the method handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST.
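The suggested reordering amounts to checking for CAST before consulting the registry. A simplified sketch (hypothetical types and return values, not the real SqlFunctionConverter signatures):

```java
// Simplified sketch of the proposed reordering (hypothetical types):
// handle CAST before consulting the function registry, so a SqlCastFunction
// never triggers an on-demand metastore lookup for a function named "cast".
enum SketchSqlKind { CAST, OTHER }

class GetHiveUdfSketch {
    static String getHiveUDF(SketchSqlKind kind, String name) {
        if (kind == SketchSqlKind.CAST) {
            // Explicit cast: resolved directly, no registry/metastore trip.
            return "handled-cast";
        }
        return lookupInRegistry(name); // may fall back to the metastore
    }

    static String lookupInRegistry(String name) {
        // Stand-in for FunctionRegistry.getFunctionInfo(...)
        return "udf:" + name;
    }
}
```

For any non-CAST operator the cast check is a single enum comparison, so the reordering should not measurably slow the common path.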
[jira] [Created] (HIVE-24064) Disable Materialized View Replication
Arko Sharma created HIVE-24064:
----------------------------------

             Summary: Disable Materialized View Replication
                 Key: HIVE-24064
                 URL: https://issues.apache.org/jira/browse/HIVE-24064
             Project: Hive
          Issue Type: Bug
            Reporter: Arko Sharma
[jira] [Created] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
László Bodor created HIVE-24065:
-----------------------------------

             Summary: Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
                 Key: HIVE-24065
                 URL: https://issues.apache.org/jira/browse/HIVE-24065
             Project: Hive
          Issue Type: Improvement
            Reporter: László Bodor
[jira] [Created] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
Jainik Vora created HIVE-24066:
----------------------------------

             Summary: Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
                 Key: HIVE-24066
                 URL: https://issues.apache.org/jira/browse/HIVE-24066
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.3.5
            Reporter: Jainik Vora

I created a Hive table containing columns with a struct data type:
{code:java}
CREATE EXTERNAL TABLE abc_dwh.table_on_parquet (
  `context` struct<`app`:struct<`build`:string, `name`:string, `namespace`:string, `version`:string>,
                   `screen`:struct<`height`:bigint, `width`:bigint>,
                   `timezone`:string>,
  `messageid` string,
  `timestamp` string,
  `userid` string)
PARTITIONED BY (year string, month string, day string, hour string)
STORED AS PARQUET
LOCATION 's3://abc/xyz';
{code}
All columns are nullable, so the Parquet files read by the table don't always contain all columns. If any file in a partition doesn't have the "context.app" struct and "context.app.version" is queried, Hive throws the exception below. The same happens for "context.screen".
{code:java}
Caused by: java.io.IOException: java.lang.RuntimeException: Primitive type appshould not doesn't match typeapp[version]
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
    ... 25 more
Caused by: java.lang.RuntimeException: Primitive type appshould not doesn't match typeapp[version]
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedSchema(DataWritableReadSupport.java:249)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:379)
    at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
    ... 26 more
{code}
Querying "context.app" returns NULL:
{code:java}
hive> select context.app from abc_dwh.table_on_parquet where year=2020 and month='07' and day=26 and hour='03' limit 5;
OK
NULL
NULL
NULL
NULL
NULL
{code}
As a workaround, I tried querying "context.app.version" only when "context.app" is not null, but that gave the same error. *To verify the null check in the case statement, I ran the query below; it should produce "0" for every row but instead produced "1".* The distinct value of context.app for the partition is NULL, which rules out differences caused by the select with limit. Running the same query in Spark SQL gives the correct result.
{code:java}
hive> select case when context.app is null then 0 else 1 end status from abc_dwh.table_on_parquet where year=2020 and month='07' and day=26 and hour='03' limit 5;
OK
1
1
1
1
1
{code}
Hive version used: 2.3.5-amzn-0 (on AWS EMR)
[jira] [Created] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
Pravin Sinha created HIVE-24067:
-----------------------------------

             Summary: TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
                 Key: HIVE-24067
                 URL: https://issues.apache.org/jira/browse/HIVE-24067
             Project: Hive
          Issue Type: Task
            Reporter: Pravin Sinha
            Assignee: Pravin Sinha

In TestReplicationScenariosExclusiveReplica, the drop database operation for the primary DB leads to a "Wrong FS" error because the ReplChangeManager is associated with the replica FS.
[jira] [Created] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well
Prasanth Jayachandran created HIVE-24068:
--------------------------------------------

             Summary: ReExecutionOverlayPlugin can handle DAG submission failures as well
                 Key: HIVE-24068
                 URL: https://issues.apache.org/jira/browse/HIVE-24068
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Prasanth Jayachandran
            Assignee: Prasanth Jayachandran

ReExecutionOverlayPlugin currently handles cases where there is a vertex failure. A DAG submission failure can also happen, for example in environments where the AM container died because of DNS issues. DAG submissions are safe to retry because the DAG hasn't started executing yet.
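The extension described here could look roughly like the following (a hypothetical helper, not the actual ReExecutionOverlayPlugin API): because a submission failure occurs before any task has run, re-submitting the DAG a bounded number of times cannot duplicate work.

```java
import java.util.function.Supplier;

// Sketch: retry DAG submission a bounded number of times (maxAttempts >= 1).
// Submission failures happen before the DAG starts executing, so a retry
// is safe and cannot re-run already-completed tasks.
class DagSubmitRetrySketch {
    static <T> T submitWithRetry(Supplier<T> submit, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return submit.get();
            } catch (RuntimeException e) {
                last = e; // e.g. AM container gone, transient DNS failure
            }
        }
        throw last; // all attempts exhausted
    }
}
```

In a real plugin the retry budget would presumably be bounded by the existing re-execution configuration rather than a bare integer.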
[jira] [Created] (HIVE-24069) HiveHistory should log the task that ends abnormally
Zhihua Deng created HIVE-24069:
----------------------------------

             Summary: HiveHistory should log the task that ends abnormally
                 Key: HIVE-24069
                 URL: https://issues.apache.org/jira/browse/HIVE-24069
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
            Reporter: Zhihua Deng

When a task returns with an exitVal not equal to 0, the executor skips marking the task's return code and calling endTask. This can leave the history log incomplete for such tasks.
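A minimal sketch of the proposed behavior (hypothetical names, not the actual Driver/HiveHistory code): record the return code and the end-of-task event on every path, not only when the task succeeds.

```java
import java.util.*;

// Sketch: always record the task's return code and its end-of-task event,
// even when exitVal is nonzero, so the history log stays complete.
class TaskHistorySketch {
    final Map<String, Integer> returnCodes = new HashMap<>();
    final List<String> endedTasks = new ArrayList<>();

    void onTaskFinished(String taskId, int exitVal) {
        returnCodes.put(taskId, exitVal); // previously skipped when exitVal != 0
        endedTasks.add(taskId);           // endTask logged unconditionally
    }
}
```

With this shape, a failed task leaves the same pair of history entries as a successful one, differing only in the recorded return code.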