[jira] [Created] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when the file is large
Ganesha Shreedhara created HIVE-25765: - Summary: skip.header.line.count property skips rows of each block in FetchOperator when the file is large Key: HIVE-25765 URL: https://issues.apache.org/jira/browse/HIVE-25765 Project: Hive Issue Type: Bug Affects Versions: 3.1.2 Reporter: Ganesha Shreedhara Attachments: data.txt.gz

When the _skip.header.line.count_ property is set in table properties, simple select queries that get converted into a FetchTask skip the configured number of lines from each block instead of skipping only the header lines at the start of each file. This happens when the file is large and is read in multiple blocks. The issue doesn't exist when the select query is converted into a map-only job by setting _hive.fetch.task.conversion_ to _none_, because there the header lines are skipped only for the first block, thanks to [this check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330]. We should have a similar check in FetchOperator to avoid this issue.

*Steps to reproduce:*
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string, col2 string, col3 string, col4 string, col5 string, col6 string,
  col7 string, col8 string, col9 string, col10 string, col11 string, col12 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Counting the number of rows gives the correct result, with only one header line skipped
select count(*) from test_table;
3145727

-- Select query skips more rows; the result depends upon the number of blocks
-- configured in the underlying filesystem. 3 rows are skipped when the file is read in 3 blocks.
select * from test_table;
.
.
Fetched 3145724 rows
{code}
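To illustrate the check being proposed: only the split that starts at byte offset 0 of a file contains the header, so only that split should skip lines. A minimal sketch using the standard Hadoop FileSplit API (the helper name is hypothetical; the real fix would live in FetchOperator):
{code:java}
import org.apache.hadoop.mapred.FileSplit;

public class HeaderSkipSketch {
  // Skip skip.header.line.count lines only for the first split of each file;
  // later splits start mid-file and carry no header.
  static boolean shouldSkipHeader(FileSplit split, int headerLineCount) {
    return headerLineCount > 0 && split.getStart() == 0;
  }
}
{code}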
[jira] [Created] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema
Ganesha Shreedhara created HIVE-25494: - Summary: Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema Key: HIVE-25494 URL: https://issues.apache.org/jira/browse/HIVE-25494 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara Attachments: test-struct.parquet

When a struct type column's field is missing in the parquet file schema but present in the table schema, and columns are accessed by names, the requestedSchema sent from Hive to the Parquet storage layer has a type even for the missing field, since we always add the field as a primitive type if it is missing in the file schema ([Ref|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]). On the Parquet side, this missing field gets pruned, and since it belongs to a struct type, Parquet ends up creating a GroupColumnIO without any children. This causes the query to fail with IndexOutOfBoundsException; the stack trace is given below.
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file test-struct.parquet
 at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
{code}

Steps to reproduce:
{code:java}
CREATE TABLE parquet_struct_test(
  `parent` struct COMMENT '',
  `toplevel` string COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

-- Use the attached test-struct.parquet data file to load data to this table
LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet
{code}

The same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive> select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel
{code}

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
--
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING
{code}
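To make the failure mode concrete, here is a sketch (not Hive's actual code) of the kind of requested schema that triggers the empty GroupColumnIO, built with parquet-mr's Types builder; the field names mirror the repro table, and BINARY stands in for whatever the real column types are:
{code:java}
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class RequestedSchemaSketch {
  public static void main(String[] args) {
    // The requested schema asks for parent.extracol, a field the file schema
    // does not contain. Parquet prunes the unknown primitive, leaving
    // 'parent' as a GroupColumnIO with zero children.
    MessageType requested = Types.buildMessage()
        .optionalGroup()
          .optional(PrimitiveTypeName.BINARY).named("extracol")
          .named("parent")
        .optional(PrimitiveTypeName.BINARY).named("toplevel")
        .named("hive_schema");
    System.out.println(requested);
  }
}
{code}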
[jira] [Created] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled
Ganesha Shreedhara created HIVE-24209: - Summary: Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled Key: HIVE-24209 URL: https://issues.apache.org/jira/browse/HIVE-24209 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

As part of the HIVE-15884 improvement, we skip adding the GenericUDFOPNot UDF to the filter expression for a NOT BETWEEN operation when vectorization is enabled. But this is not handled during the conversion of the filter expression to a search argument, due to which an incorrect predicate gets pushed down to the storage layer; that leads to incorrect split generation and an incorrect result.
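For reference, the search argument for a query like `col NOT BETWEEN 0 AND 10` should keep the BETWEEN leaf wrapped in a NOT; a sketch using the SearchArgument builder (the column name and bounds are made up):
{code:java}
import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

public class NotBetweenSargSketch {
  public static void main(String[] args) {
    // Correct pushdown for "col NOT BETWEEN 0 AND 10". Dropping the NOT
    // wrapper (the bug described above) pushes down plain BETWEEN and prunes
    // exactly the wrong splits.
    SearchArgument sarg = SearchArgumentFactory.newBuilder()
        .startNot()
          .between("col", PredicateLeaf.Type.LONG, 0L, 10L)
        .end()
        .build();
    System.out.println(sarg);
  }
}
{code}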
[jira] [Created] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException
Ganesha Shreedhara created HIVE-23756: - Summary: drop table fails with MySQLIntegrityConstraintViolationException Key: HIVE-23756 URL: https://issues.apache.org/jira/browse/HIVE-23756 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

Drop table command fails intermittently with the following exception.
{code:java}
Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
 at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
 at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
 at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
 at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
 at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
 at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
 at org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
 ... 36 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
 at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
 at com.mysql.jdbc.Util.getInstance(Util.java:360)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
{code}
Although HIVE-19994 resolves this issue, the FK constraint name of the COLUMNS_V2 table specified in the package.jdo file is not the same as the FK constraint name used while creating the COLUMNS_V2 table ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
[jira] [Created] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation
Ganesha Shreedhara created HIVE-23473: - Summary: Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation Key: HIVE-23473 URL: https://issues.apache.org/jira/browse/HIVE-23473 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

NullPointerException is thrown in the following flow.
{code:java}
java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
Caused by: java.lang.NullPointerException
 at org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
 at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
 at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
 at org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
 .
 .
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
 at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
{code}
Shouldn't we just throw NoDynamicValuesException when [ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119] is null, instead of returning it, similar to how we handle the cases where [conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110] or [DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125] is null while getting a dynamic value?
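A minimal sketch of the suggested guard (ObjectCache and NoDynamicValuesException are the real Hive classes; the helper method is hypothetical and only illustrates the null check):
{code:java}
import org.apache.hadoop.hive.ql.exec.ObjectCache;
import org.apache.hadoop.hive.ql.plan.DynamicValue.NoDynamicValuesException;

public class DynamicValueGuardSketch {
  // Mirror the existing conf / DynamicValueRegistry handling instead of
  // returning a null cache and letting ORC predicate evaluation hit an NPE.
  static void checkCache(ObjectCache cache) {
    if (cache == null) {
      throw new NoDynamicValuesException("Object cache is null, cannot fetch dynamic value");
    }
  }
}
{code}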
[jira] [Created] (HIVE-22963) HiveParser is misinterpreting quotes in from/to string of translate function
Ganesha Shreedhara created HIVE-22963: - Summary: HiveParser is misinterpreting quotes in from/to string of translate function Key: HIVE-22963 URL: https://issues.apache.org/jira/browse/HIVE-22963 Project: Hive Issue Type: Bug Components: Parser Reporter: Ganesha Shreedhara

Parsing of a query fails when we use single or double quotes in the from/to string of the translate function in Hive 2.3.x/3.1.1. Parsing of the same query succeeds in Hive 2.1.1.

*Steps to reproduce:*
{code:java}
CREATE TABLE test_table (data string);
INSERT INTO test_table VALUES("d\"a\"t\"a");
select translate(data, '"', '') from test_table;
{code}
Parsing fails with the following exception:
{code:java}
NoViableAltException(355@[157:5: ( ( Identifier LPAREN )=> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource )])
 at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
 at org.antlr.runtime.DFA.predict(DFA.java:116)
 at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource0(HiveParser_FromClauseParser.java:2942)
 at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:2880)
 at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1451)
 at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1341)
 at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45811)
 at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:39699)
 at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:39951)
 at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:39597)
 at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:38786)
 at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:38674)
 at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2340)
 at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1369)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:276)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:465)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:992)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:916)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:795)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:40 cannot recognize input near 'tt' ';' '' in from source 0
org.apache.hadoop.hive.ql.parse.ParseException: line 1:40 cannot recognize input near 'tt' ';' '' in from source 0
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:276)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:465)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:992)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:916)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:795)
 at sun.reflect.Nat
{code}
[jira] [Created] (HIVE-22670) ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file
Ganesha Shreedhara created HIVE-22670: - Summary: ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file Key: HIVE-22670 URL: https://issues.apache.org/jira/browse/HIVE-22670 Project: Hive Issue Type: Bug Affects Versions: 2.3.6, 3.1.2 Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

ArrayIndexOutOfBoundsException is thrown while decoding dictionaryIds of a row group in a parquet file with vectorization enabled.

*Exception stack trace:*
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
 at org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 ... 24 more
{code}
This issue seems to be caused by re-using the same dictionary column vector while reading consecutive row groups. It looks like a corner-case bug that occurs for a certain distribution of dictionary/plain encoded data when we read/populate the underlying bit-packed dictionary data into a column-vector based data structure.
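A sketch of the per-row-group state reset implied by that diagnosis (field names are hypothetical, not the actual VectorizedPrimitiveColumnReader members):
{code:java}
import org.apache.parquet.column.Dictionary;

class ColumnReaderStateSketch {
  private Dictionary dictionary;                  // previous row group's dictionary
  private boolean isCurrentPageDictionaryEncoded;

  // Dictionary state is per row group and must be replaced, not carried over;
  // stale dictionary ids from the previous row group can index past the bounds
  // of the new dictionary, producing the ArrayIndexOutOfBoundsException above.
  void startNewRowGroup(Dictionary newDictionary) {
    this.dictionary = newDictionary;              // may be null for plain encoding
    this.isCurrentPageDictionaryEncoded = false;
  }
}
{code}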
[jira] [Created] (HIVE-22233) Wrong result with vectorized execution when column value is casted to TINYINT
Ganesha Shreedhara created HIVE-22233: - Summary: Wrong result with vectorized execution when column value is casted to TINYINT Key: HIVE-22233 URL: https://issues.apache.org/jira/browse/HIVE-22233 Project: Hive Issue Type: Bug Affects Versions: 3.1.1 Reporter: Ganesha Shreedhara

Casting a column value to TINYINT gives an incorrect result when vectorized execution is enabled. This happens only when the subquery has SUM/COUNT aggregation operations in the IF condition.

*Steps to reproduce:*
{code:java}
create table test(id int);
insert into test values (1);
SELECT CAST(col AS TINYINT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
{code}
*Result:*
{code:java}
0{code}
*Expected result:*
{code:java}
1{code}
We get the expected result when hive.vectorized.execution.enabled is disabled. We also get the expected result when we don't CAST or don't have a SUM/COUNT aggregation in the IF condition. The following queries give the correct result when vectorized execution is enabled.
{code:java}
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
SELECT col FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(2 > 1, 1, 0) col FROM test) x;
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(true, 1, 0) col FROM test) x;
{code}
This issue occurs only when we use *CAST(col AS TINYINT)* along with *IF(SUM(1) > 0, 1, 0)* or *IF(COUNT(1) > 0, 1, 0)* in the subquery.
[jira] [Created] (HIVE-21660) Wrong result when union all and lateral view with explode is used
Ganesha Shreedhara created HIVE-21660: - Summary: Wrong result when union all and lateral view with explode is used Key: HIVE-21660 URL: https://issues.apache.org/jira/browse/HIVE-21660 Project: Hive Issue Type: Bug Affects Versions: 3.1.1 Reporter: Ganesha Shreedhara

There is a data loss when the data is inserted to a partitioned table using union all and lateral view with explode.

*Steps to reproduce:*
{code:java}
create table t1 (id int, dt string);
insert into t1 values (2, '2019-04-01');

create table t2 (id int, dates array<string>);
insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') as dates;

create table dst (id int) partitioned by (dt string);

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

insert overwrite table dst partition (dt)
select t.id, t.dt from (
  select id, dt from t1
  union all
  select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts
) t;

select * from dst;
{code}
*Actual Result:*
{code:java}
+----+-------------+
| 2  | 2019-04-01  |
+----+-------------+
{code}
*Expected Result* (run only the select part from the above insert query)*:*
{code:java}
+----+-------------+
| 2  | 2019-04-01  |
| 1  | 2019-01-01  |
| 1  | 2019-01-02  |
| 1  | 2019-01-03  |
+----+-------------+
{code}
The rows retrieved from the second table using union all and lateral view with explode are missing.
[jira] [Created] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift
Ganesha Shreedhara created HIVE-21492: - Summary: VectorizedParquetRecordReader can't read parquet file generated using thrift Key: HIVE-21492 URL: https://issues.apache.org/jira/browse/HIVE-21492 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

Taking an example of a parquet table having an array of integers as below.
{code:java}
CREATE EXTERNAL TABLE {table_name} (`list_of_ints` array<int>)
STORED AS PARQUET
LOCATION '{location}';
{code}
A parquet file generated using Hive will have the schema for the type as below:
{code:java}
group list_of_ints (LIST) {
  repeated group bag {
    optional int32 array;
  }
}
{code}
A parquet file generated using thrift may have the schema for the type as below:
{code:java}
required group list_of_ints (LIST) {
  repeated int32 list_of_ints_tuple
}
{code}
VectorizedParquetRecordReader handles only parquet files generated using Hive. It throws the following exception when a parquet file generated using thrift is read, because of the changes done as part of [HIVE-18553|https://issues.apache.org/jira/browse/HIVE-18553].
{code:java}
Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is not a group
 at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
{code}
I have done a small change to handle the case where the child type of a group type can be a PrimitiveType.
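A sketch of the handling described in the last sentence (illustrative only, not the actual patch): when unwrapping a LIST, the repeated child may itself be a primitive rather than a group.
{code:java}
import org.apache.parquet.schema.GroupType;
import org.apache.parquet.schema.Type;

public class ListElementTypeSketch {
  static Type getElementType(GroupType listType) {
    Type repeated = listType.getType(0);
    if (repeated.isPrimitive()) {
      return repeated;                          // thrift: repeated int32 list_of_ints_tuple
    }
    return repeated.asGroupType().getType(0);   // hive: repeated group bag { optional int32 array; }
  }
}
{code}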
[jira] [Created] (HIVE-21428) field delimiter of serde set for partition is not getting respected when vectorization is enabled
Ganesha Shreedhara created HIVE-21428: - Summary: field delimiter of serde set for partition is not getting respected when vectorization is enabled Key: HIVE-21428 URL: https://issues.apache.org/jira/browse/HIVE-21428 Project: Hive Issue Type: Bug Affects Versions: 3.1.1 Reporter: Ganesha Shreedhara

*Steps to reproduce:*

create external table src (c1 string, c2 string, c3 string) partitioned by (part string) location '/tmp/src';
echo "d1\td2" >> data.txt;
hadoop dfs -put data.txt /tmp/src/part=part1/;
MSCK REPAIR TABLE src;
ALTER TABLE src PARTITION (part='part1') SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string', 'field.delim'='\t');
create table dest (c1 string, c2 string, c3 string, c4 string);
insert overwrite table dest select * from src;
select * from dest;

*Result* (wrong)*:*
d1 d2 NULL NULL part1

set hive.vectorized.execution.enabled=false;
insert overwrite table dest select * from src;
select * from dest;

*Result* (correct)*:*
d1 d2 NULL part1

This is because "d1\td2" is treated as a single column: the field delimiter used by the deserializer is *^A* instead of the *\t* that is set at the partition level. It works fine if I alter the field delimiter of the serde for the entire table. So it looks like the serde properties in TableDesc take precedence over the serde properties in PartitionDesc. This issue is not there in 2.x versions.
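A sketch of the precedence the report expects (hypothetical helper, not Hive code): partition-level serde properties should overlay table-level ones when the deserializer is built.
{code:java}
import java.util.Properties;

public class SerdePropsSketch {
  static Properties effectiveSerdeProps(Properties tableProps, Properties partProps) {
    Properties merged = new Properties();
    merged.putAll(tableProps);   // table defaults, e.g. field.delim = "\u0001" (^A)
    merged.putAll(partProps);    // partition overrides, e.g. field.delim = "\t"
    return merged;
  }
}
{code}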
Re: Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled
/nullgroup.q.out 318488d62f
ql/src/test/results/clientpositive/spark/nullgroup2.q.out 93b4d3b6c0
ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11
ql/src/test/results/clientpositive/spark/union33.q.out 3117c56390
ql/src/test/results/clientpositive/union33.q.out 57c53089b9
ql/src/test/results/clientpositive/vector_groupby4.q.out 15b0427308
ql/src/test/results/clientpositive/vector_groupby6.q.out 31472a1ea9

Diff: https://reviews.apache.org/r/68121/diff/3/
Changes: https://reviews.apache.org/r/68121/diff/2-3/

Testing
---
Qtests added

Thanks, Ganesha Shreedhara
HIVE-20220: Incorrect result when hive.groupby.skewindata is enabled
Hi Team, I found a corner-case bug with the *hive.groupby.skewindata* configuration parameter, as explained in the following jira. https://issues.apache.org/jira/browse/HIVE-20220 I need some help in reviewing the fix. RB request: https://reviews.apache.org/r/68121/ Thanks, Ganesha
Re: Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68121/ --- (Updated July 31, 2018, 2:06 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git

Description
---
hive.groupby.skewindata makes use of the rand UDF to randomly distribute grouped-by keys to the reducers, and hence avoids overloading a single reducer when there is a skew in data. This random distribution of keys is buggy when a reducer fails to fetch the mapper output due to a faulty datanode or any other reason. When the reducer finds that it can't fetch the mapper output, it sends a signal to the Application Master to reattempt the corresponding map task. The reattempted map task will now get a different random value from the rand function, and hence the keys distributed to the reducers will not be the same as in the previous run.

Steps to reproduce:

create table test(id int);
insert into test values (1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);
SET hive.groupby.skewindata=true;
SET mapreduce.job.reduces=2;
//Add a debug port for the reducer
select count(1) from test group by id;
//Remove the mapper's intermediate output file when the map stage is completed and one out of 2 reduce tasks is completed, and then continue the run. This causes the 2nd reducer to send an event to the Application Master to rerun the map task.

The following is the expected result.
1 2 3 4 5 6 8 8 9

But you may get a different result due to a different value returned by the rand function in the second run, causing a different distribution of keys. This needs to be fixed such that the mapper distributes the same keys even if it is reattempted multiple times.

Diffs (updated)
-
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39c77b3fe5
ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 250a085084
ql/src/test/queries/clientpositive/groupby_skew_rand_seed.q PRE-CREATION
ql/src/test/results/clientpositive/groupby_skew_rand_seed.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/68121/diff/2/
Changes: https://reviews.apache.org/r/68121/diff/1-2/

Testing
---
Qtests added

Thanks, Ganesha Shreedhara
Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68121/ --- Review request for hive and Ashutosh Chauhan. Repository: hive-git

Description
---
hive.groupby.skewindata makes use of the rand UDF to randomly distribute grouped-by keys to the reducers, and hence avoids overloading a single reducer when there is a skew in data. This random distribution of keys is buggy when a reducer fails to fetch the mapper output due to a faulty datanode or any other reason. When the reducer finds that it can't fetch the mapper output, it sends a signal to the Application Master to reattempt the corresponding map task. The reattempted map task will now get a different random value from the rand function, and hence the keys distributed to the reducers will not be the same as in the previous run.

Steps to reproduce:

create table test(id int);
insert into test values (1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);
SET hive.groupby.skewindata=true;
SET mapreduce.job.reduces=2;
//Add a debug port for the reducer
select count(1) from test group by id;
//Remove the mapper's intermediate output file when the map stage is completed and one out of 2 reduce tasks is completed, and then continue the run. This causes the 2nd reducer to send an event to the Application Master to rerun the map task.

The following is the expected result.
1 2 3 4 5 6 8 8 9

But you may get a different result due to a different value returned by the rand function in the second run, causing a different distribution of keys. This needs to be fixed such that the mapper distributes the same keys even if it is reattempted multiple times.

Diffs
-
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39c77b3fe5
ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 250a085084
ql/src/test/queries/clientpositive/groupby_skew_rand_seed.q PRE-CREATION
ql/src/test/queries/clientpositive/groupby_skew_rand_seed1.q PRE-CREATION
ql/src/test/results/clientpositive/groupby_skew_rand_seed.q.out PRE-CREATION
ql/src/test/results/clientpositive/groupby_skew_rand_seed1.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/68121/diff/1/

Testing
---
Qtests added

Thanks, Ganesha Shreedhara
[jira] [Created] (HIVE-20220) Incorrect result when hive.groupby.skewindata is enabled
Ganesha Shreedhara created HIVE-20220: - Summary: Incorrect result when hive.groupby.skewindata is enabled Key: HIVE-20220 URL: https://issues.apache.org/jira/browse/HIVE-20220 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara

hive.groupby.skewindata makes use of the rand UDF to randomly distribute grouped-by keys to the reducers, and hence avoids overloading a single reducer when there is a skew in data. This random distribution of keys is buggy when a reducer fails to fetch the mapper output due to a faulty datanode or any other reason. When the reducer finds that it can't fetch the mapper output, it sends a signal to the Application Master to reattempt the corresponding map task. The reattempted map task will now get a different random value from the rand function, and hence the keys distributed to the reducers will not be the same as in the previous run.

*Steps to reproduce:*

create table test(id int);
insert into test values (1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);
SET hive.groupby.skewindata=true;
SET mapreduce.job.reduces=2;
//Add a debug port for the reducer
select count(1) from test group by id;
//Remove the mapper's intermediate output file when the map stage is completed and one out of 2 reduce tasks is completed, and then continue the run. This causes the 2nd reducer to send an event to the Application Master to rerun the map task.

The following is the expected result.
1 2 3 4 5 6 8 8 9

But you may get a different result due to a different value returned by the rand function in the second run, causing a different distribution of keys. This needs to be fixed such that the mapper distributes the same keys even if it is reattempted multiple times.
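A minimal sketch of the idea behind the fix (hypothetical names; judging by the diff file names, the actual patch wires a seed through HiveConf and PlanUtils): seed the random distribution with a value fixed at plan time, so a re-attempted map task replays the same sequence of reducer choices.
{code:java}
import java.util.Random;

public class SkewSeedSketch {
  // With a plan-time seed, every attempt of the same map task produces the
  // same key -> reducer assignment; with an unseeded Random, each attempt
  // distributes the keys differently.
  static int[] reducerChoices(long planTimeSeed, int numRows, int numReducers) {
    Random rand = new Random(planTimeSeed);
    int[] choices = new int[numRows];
    for (int i = 0; i < numRows; i++) {
      choices[i] = (rand.nextInt() & Integer.MAX_VALUE) % numReducers;
    }
    return choices;
  }
}
{code}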
[jira] [Created] (HIVE-19850) Dynamic partition pruning in Tez is leading to 'No work found for tablescan' error
Ganesha Shreedhara created HIVE-19850: - Summary: Dynamic partition pruning in Tez is leading to 'No work found for tablescan' error Key: HIVE-19850 URL: https://issues.apache.org/jira/browse/HIVE-19850 Project: Hive Issue Type: Bug Reporter: Ganesha Shreedhara

When multiple views are used along with union all, it results in the following error when dynamic partition pruning is enabled in Tez.
{code:java}
Exception in thread "main" java.lang.AssertionError: No work found for tablescan TS[8]
 at org.apache.hadoop.hive.ql.parse.GenTezUtils.processAppMasterEvent(GenTezUtils.java:408)
 at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:383)
 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10371)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:347)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1203)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1257)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1130)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:204)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:433)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:894)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:825)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:726)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}
*Steps to reproduce:*

set hive.execution.engine=tez;
set hive.tez.dynamic.partition.pruning=true;

CREATE TABLE t1(key string, value string, c_int int, c_float float, c_boolean boolean) partitioned by (dt string);
CREATE TABLE t2(key string, value string, c_int int, c_float float, c_boolean boolean) partitioned by (dt string);
CREATE TABLE t3(key string, value string, c_int int, c_float float, c_boolean boolean) partitioned by (dt string);

insert into table t1 partition(dt='2018') values ('k1','v1',1,1.0,true);
insert into table t2 partition(dt='2018') values ('k2','v2',2,2.0,true);
insert into table t3 partition(dt='2018') values ('k3','v3',3,3.0,true);

CREATE VIEW `view1` AS select `t1`.`key`,`t1`.`value`,`t1`.`c_int`,`t1`.`c_float`,`t1`.`c_boolean`,`t1`.`dt` from `t1` union all select `t2`.`key`,`t2`.`value`,`t2`.`c_int`,`t2`.`c_float`,`t2`.`c_boolean`,`t2`.`dt` from `t2`;
CREATE VIEW `view2` AS select `t2`.`key`,`t2`.`value`,`t2`.`c_int`,`t2`.`c_float`,`t2`.`c_boolean`,`t2`.`dt` from `t2` union all select `t3`.`key`,`t3`.`value`,`t3`.`c_int`,`t3`.`c_float`,`t3`.`c_boolean`,`t3`.`dt` from `t3`;

create table t4 as select key,value,c_int,c_float,c_boolean,dt from t1 union all select v1.key,v1.value,v1.c_int,v1.c_float,v1.c_boolean,v1.dt from view1 v1 join view2 v2 on v1.dt=v2.dt;

CREATE VIEW `view3` AS select `t4`.`key`,`t4`.`value`,`t4`.`c_int`,`t4`.`c_float`,`t4`.`c_boolean`,`t4`.`dt` from `t4` union all select `t1`.`key`,`t1`.`value`,`t1`.`c_int`,`t1`.`c_float`,`t1`.`c_boolean`,`t1`.`dt` from `t1`;

select count(0) from view2 v2 join view3 v3 on v2.dt=v3.dt; // Throws 'No work found for tablescan' error
[jira] [Created] (HIVE-19101) Apply rule [HiveJoinPushTransitivePredicatesRule] is getting stuck when there is a huge number of predicates
Ganesha Shreedhara created HIVE-19101: - Summary: Apply rule [HiveJoinPushTransitivePredicatesRule] is getting stuck when there is a huge number of predicates Key: HIVE-19101 URL: https://issues.apache.org/jira/browse/HIVE-19101 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.3.2, 2.3.1, 2.3.0, 2.2.0, 2.1.1 Reporter: Ganesha Shreedhara Attachments: queries

A Hive query gets stuck during the optimization phase while applying HiveJoinPushTransitivePredicatesRule when there is a huge number of predicates.

*DEBUG Log:*
{code:java}
2018-04-04T11:22:47,991 [user: ganeshas] -1 DEBUG [6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10963: Apply rule [ReduceExpressionsRule(Join)] to [rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1, $73),joinType=inner,algorithm=none,cost=not available)]
2018-04-04T11:22:48,359 [user: ganeshas] -1 DEBUG [6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10964: Apply rule [HiveJoinAddNotNullRule] to [rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1, $73),joinType=inner,algorithm=none,cost=not available)]
2018-04-04T11:22:48,360 [user: ganeshas] -1 DEBUG [6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10965: Apply rule [HiveJoinPushTransitivePredicatesRule] to [rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1, $73),joinType=inner,algorithm=none,cost=not available)]
{code}
*Thread Status:*
{code:java}
"6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main" prio=5 tid=0x7ff18e006800 nid=0x1c03 runnable [0x78176000]
 java.lang.Thread.State: RUNNABLE
 at java.util.Arrays.copyOfRange(Arrays.java:2694)
 at java.lang.String.<init>(String.java:203)
 at java.lang.StringBuilder.toString(StringBuilder.java:405)
 at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:95)
 at org.apache.calcite.rex.RexCall.toString(RexCall.java:100)
 at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:84)
 at org.apache.calcite.rex.RexCall.toString(RexCall.java:100)
 at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates$JoinConditionBasedPredicateInference.infer(HiveRelMdPredicates.java:516)
 at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates$JoinConditionBasedPredicateInference.inferPredicates(HiveRelMdPredicates.java:426)
 at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates.getPredicates(HiveRelMdPredicates.java:186)
 at GeneratedMetadataHandler_Predicates.getPredicates_$(Unknown Source)
 at GeneratedMetadataHandler_Predicates.getPredicates(Unknown Source)
 at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:721)
 at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveJoinPushTransitivePredicatesRule.onMatch(HiveJoinPushTransitivePredicatesRule.java:83)
 at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:314)
 at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:502)
 at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:381)
 at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:275)
 at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72)
 at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:206)
 at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:193)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1575)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:1448)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1174)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1096)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
 at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:905)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:920)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:330)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11206)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:251)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyze
{code}
[jira] [Created] (HIVE-18859) Incorrect handling of thrift metastore exceptions
Ganesha Shreedhara created HIVE-18859: - Summary: Incorrect handling of thrift metastore exceptions Key: HIVE-18859 URL: https://issues.apache.org/jira/browse/HIVE-18859 Project: Hive Issue Type: Bug Affects Versions: 2.1.1, 1.2.0 Reporter: Ganesha Shreedhara Assignee: Ganesha Shreedhara

Currently any runtime exception thrown in the thrift metastore during the following operations is not sent to the Hive execution engine:
* grant/revoke role
* grant/revoke privileges
* create role

This is because ThriftHiveMetastore handles just MetaException and throws TException during the processing of these requests. So the command simply fails at the thrift metastore end (the exception can be seen in the metastore log), but the Hive execution engine keeps on waiting for the response from the thrift metastore.

Steps to reproduce this problem:
# Launch the thrift metastore
# Launch hive cli by passing --hiveconf hive.metastore.uris=thrift://127.0.0.1:1 (pass the thrift metastore host and port)
# Execute the following commands:
## set role admin
## create role test; (succeeds)
## create role test; (hive version 2.1.1: command is stuck, waiting for the response from the thrift metastore; hive version 1.2.1: command fails with exception as null)
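A sketch of the defensive pattern the description implies (hypothetical wrapper, not the actual metastore handler code): convert unexpected runtime exceptions into the declared MetaException so the client receives an error response instead of waiting forever.
{code:java}
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Role;
import org.apache.thrift.TException;

public abstract class CreateRoleSketch {
  abstract boolean doCreateRole(Role role) throws MetaException;  // hypothetical helper

  public boolean create_role(Role role) throws MetaException, TException {
    try {
      return doCreateRole(role);
    } catch (RuntimeException e) {
      // Without this, the failure never crosses the thrift boundary and the
      // CLI hangs waiting for a reply.
      throw new MetaException("create_role failed: " + e.getMessage());
    }
  }
}
{code}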