[jira] [Created] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-25765:
-

 Summary: skip.header.line.count property skips rows of each block 
in FetchOperator when file size is larger
 Key: HIVE-25765
 URL: https://issues.apache.org/jira/browse/HIVE-25765
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Ganesha Shreedhara
 Attachments: data.txt.gz

When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows from each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because there the header lines are 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
 We should have a similar check in FetchOperator to avoid this issue. 
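
For illustration, a minimal sketch of the kind of guard meant here (the class 
and method names are hypothetical, not Hive's actual API): header lines are 
skipped only when the current split starts at byte offset 0 of the file, so the 
splits covering later blocks of the same file are read in full.
{code:java}
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical sketch: skip the header only for the split that starts at the
// beginning of the file; splits for later blocks skip nothing.
class HeaderSkipSketch {
    static int maybeSkipHeader(BufferedReader reader, long splitStartOffset,
                               int headerLineCount) throws IOException {
        int skipped = 0;
        if (splitStartOffset == 0) {               // only the first block of the file
            while (skipped < headerLineCount && reader.readLine() != null) {
                skipped++;
            }
        }
        return skipped;
    }
}
{code}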

 

 

*Steps to reproduce:* 

 
{code:java}
-- Create a table on top of the data file (uncompressed size: ~239M) attached 
-- to this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting the number of rows gives the correct result, with only one header 
-- line skipped

select count(*) from test_table;
3145727

-- The select query skips more rows; the result depends on the number of blocks 
-- configured in the underlying filesystem. 3 rows are skipped when the file is 
-- read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}
 

 




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-25494:
-

 Summary: Hive query fails with IndexOutOfBoundsException when a 
struct type column's field is missing in parquet file schema but present in 
table schema
 Key: HIVE-25494
 URL: https://issues.apache.org/jira/browse/HIVE-25494
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
 Attachments: test-struct.parquet

When a struct type column's field is missing in the parquet file schema but 
present in the table schema, and columns are accessed by name, the 
requestedSchema sent from Hive to the Parquet storage layer contains a type 
even for the missing field, because we always add the field as a primitive type 
when it is missing from the file schema 
([Ref|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the Parquet side this missing field gets pruned, and since it belongs to a 
struct type, Parquet ends up creating a GroupColumnIO without any children. 
This causes the query to fail with IndexOutOfBoundsException; the stack trace 
is given below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
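
For illustration, a toy sketch of the schema-pruning idea described above (the 
types are hypothetical, not the real Hive/Parquet classes): a requested struct 
whose fields are all missing from the file schema is dropped entirely, instead 
of being sent down as a group that Parquet prunes to zero children.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model: keep only the requested fields that also exist in the file
// schema, and drop a group left with no children, so the reader never builds
// a GroupColumnIO with zero children.
class SchemaPruner {
    static class Field {
        final String name;
        final List<Field> children;   // null for a primitive (leaf) field
        Field(String name, List<Field> children) {
            this.name = name;
            this.children = children;
        }
        boolean isGroup() { return children != null; }
    }

    static Field prune(Field requested, Field inFile) {
        if (inFile == null) {
            return null;              // field missing from the file: drop it
        }
        if (!requested.isGroup()) {
            return requested;         // primitive present in both schemas
        }
        List<Field> kept = new ArrayList<>();
        for (Field child : requested.children) {
            Field match = null;
            if (inFile.children != null) {
                for (Field f : inFile.children) {
                    if (f.name.equals(child.name)) { match = f; break; }
                }
            }
            Field pruned = prune(child, match);
            if (pruned != null) {
                kept.add(pruned);
            }
        }
        return kept.isEmpty() ? null : new Field(requested.name, kept);
    }
}
{code}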
 

Steps to reproduce:

 
{code:java}
-- The struct's field list is assumed here for illustration; it must include a
-- field (extracol) that is present in the table schema but missing from the
-- schema of the attached test-struct.parquet file.
CREATE TABLE parquet_struct_test(
`parent` struct<innercol:string,extracol:string> COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

The same query works fine in the following scenarios:


1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id 
application_1630412697229_0031)
--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  

[jira] [Created] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-24209:
-

 Summary: Search argument conversion is incorrect for NOT BETWEEN 
operation when vectorization is enabled
 Key: HIVE-24209
 URL: https://issues.apache.org/jira/browse/HIVE-24209
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


We skipped adding the GenericUDFOPNot UDF to the filter expression for the NOT 
BETWEEN operation when vectorization is enabled, because of the improvement 
done as part of HIVE-15884. But this is not handled during the conversion of 
the filter expression to a search argument, so an incorrect predicate gets 
pushed down to the storage layer, which leads to incorrect split generation and 
an incorrect result. 
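
For illustration, a toy sketch of the conversion being described (hypothetical 
code, not Hive's actual SearchArgument builder): once NOT BETWEEN has been 
folded into a single between-node carrying an inversion flag, the 
search-argument conversion has to re-apply the negation, otherwise a plain 
BETWEEN predicate is pushed down.
{code:java}
// Hypothetical sketch: honour the inversion flag by wrapping the BETWEEN leaf
// in a NOT, instead of pushing down the un-negated predicate.
class SargConversionSketch {
    static String toSearchArgument(String column, long low, long high, boolean inverted) {
        String leaf = "between(" + column + ", " + low + ", " + high + ")";
        return inverted ? "not(" + leaf + ")" : leaf;
    }

    public static void main(String[] args) {
        // WHERE col NOT BETWEEN 10 AND 20 arrives with inverted = true
        System.out.println(toSearchArgument("col", 10, 20, true));   // not(between(col, 10, 20))
    }
}
{code}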



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-23756:
-

 Summary: drop table fails with 
MySQLIntegrityConstraintViolationException:
 Key: HIVE-23756
 URL: https://issues.apache.org/jira/browse/HIVE-23756
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


Drop table command fails intermittently with the following exception.
{code:java}
Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent row: 
a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
"COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
at 
org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
at 
org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
... 36 more 
Caused by: 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
Cannot delete or update a parent row: a foreign key constraint fails 
("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
REFERENCES "CDS" ("CD_ID"))
at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
Although HIVE-19994 resolves this issue, the FK constraint name of the 
COLUMNS_V2 table specified in the package.jdo file is not the same as the FK 
constraint name used while creating the COLUMNS_V2 table 
([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-23473:
-

 Summary: Handle NPE when ObjectCache is null while getting 
DynamicValue during ORC split generation
 Key: HIVE-23473
 URL: https://issues.apache.org/jira/browse/HIVE-23473
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


NullPointerException is thrown in the following flow.

 

 
{code:java}
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.NullPointerException
Caused by: java.lang.NullPointerException
at 
org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
.
.
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
 at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
{code}
 

Shouldn't we just throw NoDynamicValuesException when 
[ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119]
 is null, instead of returning it, similar to how we handle the cases where 
[conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110]
 or 
[DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125]
 is null while getting the dynamic value?
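
A minimal sketch of the suggested handling (hypothetical names, not the actual 
DynamicValue code): a missing ObjectCache is treated the same way as a missing 
conf or registry, by throwing NoDynamicValuesException instead of letting a 
null propagate into the NullPointerException above.
{code:java}
// Hypothetical sketch: every missing dependency surfaces as the same
// "no dynamic values" error instead of a latent NullPointerException.
class DynamicValueSketch {
    static class NoDynamicValuesException extends RuntimeException {
        NoDynamicValuesException(String msg) { super(msg); }
    }

    static Object getDynamicValue(Object conf, Object objectCache, Object registry, String id) {
        if (conf == null) {
            throw new NoDynamicValuesException("Cannot retrieve dynamic value: no conf available");
        }
        if (objectCache == null) {
            throw new NoDynamicValuesException("Cannot retrieve dynamic value: no ObjectCache available");
        }
        if (registry == null) {
            throw new NoDynamicValuesException("Cannot retrieve dynamic value: no DynamicValueRegistry available");
        }
        return id;   // placeholder for the actual registry lookup
    }
}
{code}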

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22963) HiveParser is misinterpreting quotes in from/to string of translate function

2020-03-02 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-22963:
-

 Summary: HiveParser is misinterpreting quotes in from/to string of 
translate function 
 Key: HIVE-22963
 URL: https://issues.apache.org/jira/browse/HIVE-22963
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Ganesha Shreedhara


Parsing of a query fails when we use single or double quotes in the from/to 
string of the translate function in the 2.3.x/3.1.1 versions of Hive. Parsing 
of the same query succeeds in Hive 2.1.1.

*Steps to reproduce:*

 
{code:java}
CREATE TABLE test_table (data string);
INSERT INTO test_table VALUES("d\"a\"t\"a");
select translate(data, '"', '') from test_table;
{code}
 

 

Parsing fails with the following exception:
{code:java}
 NoViableAltException(355@[157:5: ( ( Identifier LPAREN )=> 
partitionedTableFunction | tableSource | subQuerySource | virtualTableSource 
)])NoViableAltException(355@[157:5: ( ( Identifier LPAREN )=> 
partitionedTableFunction | tableSource | subQuerySource | virtualTableSource 
)]) at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) at 
org.antlr.runtime.DFA.predict(DFA.java:116) at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource0(HiveParser_FromClauseParser.java:2942)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:2880)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1451)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1341)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45811) at 
org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:39699)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:39951)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:39597) 
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:38786)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:38674)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2340) 
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1369) 
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388) at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:276) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:465) at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:992) at 
org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:916) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:795) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:223) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:136)FAILED: ParseException line 
1:40 cannot recognize input near 'tt' ';' '' in from source 
0org.apache.hadoop.hive.ql.parse.ParseException: line 1:40 cannot recognize 
input near 'tt' ';' '' in from source 0 at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388) at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:276) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:465) at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:992) at 
org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:916) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:795) at 
sun.reflect.Nat

[jira] [Created] (HIVE-22670) ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file

2019-12-26 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-22670:
-

 Summary: ArrayIndexOutOfBoundsException when vectorized reader is 
used for reading a parquet file
 Key: HIVE-22670
 URL: https://issues.apache.org/jira/browse/HIVE-22670
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.6, 3.1.2
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


ArrayIndexOutOfBoundsException is thrown while decoding the dictionaryIds of a 
row group in a parquet file when vectorization is enabled. 

*Exception stack trace:*
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 at 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 ... 24 more{code}
 

This issue seems to be caused by re-using the same dictionary column vector 
while reading consecutive row groups. It looks like a corner-case bug that 
occurs for a certain distribution of dictionary/plain encoded data while we 
read/populate the underlying bit-packed dictionary data into a column-vector 
based data structure. 
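
For illustration, a toy sketch of the suspected problem and fix (hypothetical 
classes, not the actual reader): the dictionary-backed state has to be 
repopulated whenever a new row group starts, otherwise ids from the new row 
group are decoded against the previous row group's dictionary.
{code:java}
import java.util.Arrays;

// Toy illustration: if the dictionary cached for the previous row group is
// reused, a dictionary id from the next row group can index past the end of
// the old dictionary and throw ArrayIndexOutOfBoundsException, or silently
// decode the wrong value.
class DictionaryReuseSketch {
    private String[] dictionary = new String[0];

    void startRowGroup(String[] rowGroupDictionary) {
        // repopulate the dictionary for every row group instead of keeping the old one
        this.dictionary = Arrays.copyOf(rowGroupDictionary, rowGroupDictionary.length);
    }

    String decode(int dictionaryId) {
        return dictionary[dictionaryId];
    }
}
{code}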



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22233) Wrong result with vectorized execution when column value is casted to TINYINT

2019-09-23 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-22233:
-

 Summary: Wrong result with vectorized execution when column value 
is casted to TINYINT
 Key: HIVE-22233
 URL: https://issues.apache.org/jira/browse/HIVE-22233
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Ganesha Shreedhara


Casting a column value to TINYINT gives an incorrect result when vectorized 
execution is enabled. This happens only when the sub query has SUM/COUNT 
aggregation operations in the IF condition.  

*Steps to reproduce:*

 
{code:java}
create table test(id int);
insert into test values (1);
SELECT CAST(col AS TINYINT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col 
FROM test) x;
{code}
 

*Result:*
{code:java}
0{code}
*Expected result:*
{code:java}
1{code}
 

We get the expected result when hive.vectorized.execution.enabled is disabled.

We also get the expected result when we don't CAST or when there is no SUM/COUNT 
aggregation in the IF condition.

The following queries give the correct result even when vectorized execution is 
enabled:
{code:java}
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM 
test) x;
SELECT col FROM ( SELECT IF(SUM(1) > 0, 1, 0) col FROM test) x;
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(2 > 1, 1, 0) col FROM test) x;
SELECT CAST(col AS INT) col_cast FROM ( SELECT IF(true, 1, 0) col FROM test) x;
{code}
 

 

This issue occurs only when we use *CAST(col AS TINYINT)* along with *IF(SUM(1) > 
0, 1, 0)* or *IF(COUNT(1) > 0, 1, 0)* in the sub query. 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-21660) Wrong result when union all and lateral view with explode is used

2019-04-29 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-21660:
-

 Summary: Wrong result when union all and lateral view with explode 
is used
 Key: HIVE-21660
 URL: https://issues.apache.org/jira/browse/HIVE-21660
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Ganesha Shreedhara


There is data loss when data is inserted into a partitioned table using 
union all and lateral view with explode. 

 

*Steps to reproduce:*

 
{code:java}
create table t1 (id int, dt string);
insert into t1 values (2, '2019-04-01');
-- the element type of dates is assumed to be string, matching the values inserted below
create table t2 (id int, dates array<string>);
insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') as 
dates;

create table dst (id int) partitioned by (dt string);

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
insert overwrite table dst partition (dt)
select t.id, t.dt from (
select id, dt from t1
union all
select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) t;
select * from dst;
{code}
 

 

*Actual Result:*
{code:java}
+-----+--------------+
| 2   | 2019-04-01   |
+-----+--------------+{code}
 

*Expected Result* (Run only the select part from the above insert query)*:* 
{code:java}
+-------+------------+
| 2     | 2019-04-01 |
| 1     | 2019-01-01 |
| 1     | 2019-01-02 |
| 1     | 2019-01-03 |
+-------+------------+{code}
 

The data retrieved from the second table using union all and lateral view with 
explode is missing. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift

2019-03-22 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-21492:
-

 Summary: VectorizedParquetRecordReader can't read parquet file 
generated using thrift
 Key: HIVE-21492
 URL: https://issues.apache.org/jira/browse/HIVE-21492
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


Take the example of a parquet table having an array of integers, as below. 

 
{code:java}
-- the table name below is a placeholder; the column is an array of integers as described above
CREATE EXTERNAL TABLE parquet_list_table (`list_of_ints` array<int>)
STORED AS PARQUET 
LOCATION '{location}';
{code}
 

A parquet file generated using Hive will have the following schema for this type:

 
{code:java}
group list_of_ints (LIST) {
  repeated group bag {
    optional int32 array;
  }
}{code}
 

 

A parquet file generated using thrift may have the following schema for this type:

 
{code:java}
required group list_of_ints (LIST) { repeated int32 list_of_tuple}{code}
 

 

VectorizedParquetRecordReader handles only parquet files generated using Hive. 
It throws the following exception when a parquet file generated using thrift is 
read, because of the changes done as part of 
[HIVE-18553|https://issues.apache.org/jira/browse/HIVE-18553].
{code:java}
Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
not a group
 at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
 

 I have made a small change to handle the case where the child type of a group 
type can be a PrimitiveType.
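
For illustration, a toy sketch of that change (hypothetical types, not the 
actual VectorizedParquetRecordReader code): when resolving the element type of 
a LIST, a repeated primitive child (the thrift layout) is accepted as well as 
the repeated group wrapper that Hive writes.
{code:java}
// Hypothetical model of a Parquet type node: the reader used to assume the
// repeated child of a LIST is always a group and called asGroupType() on it,
// which fails for a thrift-style "repeated int32" element.
class ListElementSketch {
    static class TypeNode {
        final String name;
        final boolean primitive;
        final TypeNode child;        // null for a primitive node
        TypeNode(String name, boolean primitive, TypeNode child) {
            this.name = name;
            this.primitive = primitive;
            this.child = child;
        }
    }

    static TypeNode elementType(TypeNode listType) {
        TypeNode repeated = listType.child;
        if (repeated.primitive) {
            return repeated;         // thrift layout: the repeated primitive IS the element
        }
        return repeated.child;       // hive layout: a repeated group wraps the element
    }
}
{code}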

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21428) field delimiter of serde set for partition is not getting respected when vectorization is enabled

2019-03-11 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-21428:
-

 Summary: field delimiter of serde set for partition is not getting 
respected when vectorization is enabled
 Key: HIVE-21428
 URL: https://issues.apache.org/jira/browse/HIVE-21428
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Ganesha Shreedhara


 

*Steps to reproduce:*

create external table src (c1 string, c2 string, c3 string) partitioned by 
(part string)
location '/tmp/src';

echo "d1\td2"  >> data.txt;
hadoop dfs -put  data.txt /tmp/src/part=part1/;

MSCK REPAIR TABLE src;

ALTER TABLE src PARTITION (part='part1')
SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string', 
'field.delim'='\t');

create table dest (c1 string, c2 string, c3 string, c4 string);
insert overwrite table dest select * from src;
select * from dest;

 

*Result* (wrong)*:*

d1 d2 NULL NULL part1

set hive.vectorized.execution.enabled=false;
insert overwrite table dest select * from src;
select * from dest;

*Result* (correct)*:*

d1 d2 NULL part1

 

This is because "d1\td2" is getting considered as single column because the 
filed delimiter used by deserialiser is  *^A* instead of *\t* which is set at 
partition level.

It is working fine if I alter the field delimiter of serde for the entire table.

So, looks like serde properties in TableDesc is taking precedence over serde 
properties in PartitionDesc. 

This issue is not there in 2.x versions. 
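
For illustration, a minimal sketch of the expected precedence (a hypothetical 
helper, not Hive's actual code): partition-level serde properties should 
overlay the table-level ones, so the partition's field.delim wins over the 
table default.
{code:java}
import java.util.Properties;

// Hypothetical sketch: build the effective serde properties by starting from
// the table-level properties and overlaying the partition-level ones, so a
// partition-specific field.delim ('\t') takes precedence over the table's ^A.
class SerdePropertiesSketch {
    static Properties effectiveProperties(Properties tableProps, Properties partitionProps) {
        Properties merged = new Properties();
        merged.putAll(tableProps);
        merged.putAll(partitionProps);   // partition-level values override table-level ones
        return merged;
    }
}
{code}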

 

 

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled

2018-08-10 Thread Ganesha Shreedhara
/nullgroup.q.out 318488d62f 
  ql/src/test/results/clientpositive/spark/nullgroup2.q.out 93b4d3b6c0 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11 
  ql/src/test/results/clientpositive/spark/union33.q.out 3117c56390 
  ql/src/test/results/clientpositive/union33.q.out 57c53089b9 
  ql/src/test/results/clientpositive/vector_groupby4.q.out 15b0427308 
  ql/src/test/results/clientpositive/vector_groupby6.q.out 31472a1ea9 


Diff: https://reviews.apache.org/r/68121/diff/3/

Changes: https://reviews.apache.org/r/68121/diff/2-3/


Testing
---

Qtests added


Thanks,

Ganesha Shreedhara



HIVE-20220: Incorrect result when hive.groupby.skewindata is enabled

2018-08-06 Thread Ganesha Shreedhara
Hi Team,

I found a corner-case bug with the *hive.groupby.skewindata* configuration
parameter, as explained in the following jira.

https://issues.apache.org/jira/browse/HIVE-20220

I need some help in reviewing the fix.

RB request: https://reviews.apache.org/r/68121/


Thanks,
Ganesha


Re: Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled

2018-07-31 Thread Ganesha Shreedhara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68121/
---

(Updated July 31, 2018, 2:06 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

hive.groupby.skewindata makes use of the rand UDF to randomly distribute the 
grouped-by keys to the reducers, and hence avoids overloading a single reducer 
when there is skew in the data. 

This random distribution of keys is buggy when a reducer fails to fetch the 
mapper output due to a faulty datanode or any other reason. When the reducer 
finds that it can't fetch the mapper output, it sends a signal to the 
Application Master to reattempt the corresponding map task. The reattempted map 
task now gets a different random value from the rand function, and hence the 
keys distributed to the reducers are not the same as in the previous run. 

 

Steps to reproduce:

create table test(id int);

insert into test values 
(1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);

SET hive.groupby.skewindata=true;

SET mapreduce.reduce.reduces=2;

//Add a debug port for reducer

select count(1) from test group by id;

//Remove mapper's intermediate output file when map stage is completed and one 
out of 2 reduce tasks is completed and then continue the run. This causes 2nd 
reducer to send event to Application Master to rerun the map task. 

The following is the expected result. 

1
2
3
4
5
6
8
8
9 

 

But you may get a different result due to a different value returned by the rand 
function in the second run, causing a different distribution of keys.

This needs to be fixed such that the mapper distributes the same keys even if 
it is reattempted multiple times.
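
For illustration, a sketch of one way to make the redistribution deterministic 
(an assumption, not necessarily the committed fix): seed the random 
distribution from something that is stable across task reattempts, such as the 
split or task identifier.
{code:java}
import java.util.Random;

// Sketch only: a Random seeded with a value that does not change across map
// task reattempts (a hypothetical splitId here) makes a rerun of the same map
// task send every row to the same reducer as the first attempt did.
class SkewDistributionSketch {
    private final Random rand;

    SkewDistributionSketch(long splitId) {
        this.rand = new Random(splitId);     // deterministic for a given split
    }

    int reducerForNextRow(int numReducers) {
        return rand.nextInt(numReducers);
    }
}
{code}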


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39c77b3fe5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 250a085084 
  ql/src/test/queries/clientpositive/groupby_skew_rand_seed.q PRE-CREATION 
  ql/src/test/results/clientpositive/groupby_skew_rand_seed.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/68121/diff/2/

Changes: https://reviews.apache.org/r/68121/diff/1-2/


Testing
---

Qtests added


Thanks,

Ganesha Shreedhara



Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled

2018-07-30 Thread Ganesha Shreedhara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68121/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

hive.groupby.skewindata makes use of the rand UDF to randomly distribute the 
grouped-by keys to the reducers, and hence avoids overloading a single reducer 
when there is skew in the data. 

This random distribution of keys is buggy when a reducer fails to fetch the 
mapper output due to a faulty datanode or any other reason. When the reducer 
finds that it can't fetch the mapper output, it sends a signal to the 
Application Master to reattempt the corresponding map task. The reattempted map 
task now gets a different random value from the rand function, and hence the 
keys distributed to the reducers are not the same as in the previous run. 

 

Steps to reproduce:

create table test(id int);

insert into test values 
(1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);

SET hive.groupby.skewindata=true;

SET mapreduce.reduce.reduces=2;

//Add a debug port for reducer

select count(1) from test group by id;

//Remove mapper's intermediate output file when map stage is completed and one 
out of 2 reduce tasks is completed and then continue the run. This causes 2nd 
reducer to send event to Application Master to rerun the map task. 

The following is the expected result. 

1
2
3
4
5
6
8
8
9 

 

But you may get a different result due to a different value returned by the rand 
function in the second run, causing a different distribution of keys.

This needs to be fixed such that the mapper distributes the same keys even if 
it is reattempted multiple times.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39c77b3fe5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 250a085084 
  ql/src/test/queries/clientpositive/groupby_skew_rand_seed.q PRE-CREATION 
  ql/src/test/queries/clientpositive/groupby_skew_rand_seed1.q PRE-CREATION 
  ql/src/test/results/clientpositive/groupby_skew_rand_seed.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/groupby_skew_rand_seed1.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/68121/diff/1/


Testing
---

Qtests added


Thanks,

Ganesha Shreedhara



[jira] [Created] (HIVE-20220) Incorrect result when hive.groupby.skewindata is enabled

2018-07-21 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-20220:
-

 Summary: Incorrect result when hive.groupby.skewindata is enabled
 Key: HIVE-20220
 URL: https://issues.apache.org/jira/browse/HIVE-20220
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara


hive.groupby.skewindata makes use of the rand UDF to randomly distribute the 
grouped-by keys to the reducers, and hence avoids overloading a single reducer 
when there is skew in the data. 

This random distribution of keys is buggy when a reducer fails to fetch the 
mapper output due to a faulty datanode or any other reason. When the reducer 
finds that it can't fetch the mapper output, it sends a signal to the 
Application Master to reattempt the corresponding map task. The reattempted map 
task now gets a different random value from the rand function, and hence the 
keys distributed to the reducers are not the same as in the previous run. 

 

*Steps to reproduce:*

create table test(id int);

insert into test values 
(1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);

SET hive.groupby.skewindata=true;

SET mapreduce.reduce.reduces=2;

//Add a debug port for reducer

select count(1) from test group by id;

//Remove mapper's intermediate output file when map stage is completed and one 
out of 2 reduce tasks is completed and then continue the run. This causes 2nd 
reducer to send event to Application Master to rerun the map task. 

The following is the expected result. 

1
2
3
4
5
6
8
8
9 

 

But you may get a different result due to a different value returned by the rand 
function in the second run, causing a different distribution of keys.

This needs to be fixed such that the mapper distributes the same keys even if 
it is reattempted multiple times. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19850) Dynamic partition pruning in Tez is leading to 'No work found for tablescan' error

2018-06-10 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-19850:
-

 Summary: Dynamic partition pruning in Tez is leading to 'No work 
found for tablescan' error
 Key: HIVE-19850
 URL: https://issues.apache.org/jira/browse/HIVE-19850
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara


 

When multiple views are used along with union all, the following error results 
when dynamic partition pruning is enabled in Tez. 

 
{code:java}
Exception in thread "main" java.lang.AssertionError: No work found for 
tablescan TS[8]
 at 
org.apache.hadoop.hive.ql.parse.GenTezUtils.processAppMasterEvent(GenTezUtils.java:408)
 at 
org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:383)
 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10371)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:347)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1203)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1257)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1130)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:204)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:433)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:894)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:825)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:726)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136){code}
 

*Steps to reproduce:*

set hive.execution.engine=tez;

set hive.tez.dynamic.partition.pruning=true;

CREATE TABLE t1(key string, value string, c_int int, c_float float, c_boolean 
boolean) partitioned by (dt string);

CREATE TABLE t2(key string, value string, c_int int, c_float float, c_boolean 
boolean) partitioned by (dt string);

CREATE TABLE t3(key string, value string, c_int int, c_float float, c_boolean 
boolean) partitioned by (dt string);

 

insert into table t1 partition(dt='2018') values ('k1','v1',1,1.0,true);

insert into table t2 partition(dt='2018') values ('k2','v2',2,2.0,true);

insert into table t3 partition(dt='2018') values ('k3','v3',3,3.0,true);

 

CREATE VIEW `view1` AS select 
`t1`.`key`,`t1`.`value`,`t1`.`c_int`,`t1`.`c_float`,`t1`.`c_boolean`,`t1`.`dt` 
from `t1` union all select 
`t2`.`key`,`t2`.`value`,`t2`.`c_int`,`t2`.`c_float`,`t2`.`c_boolean`,`t2`.`dt` 
from `t2`;


CREATE VIEW `view2` AS select 
`t2`.`key`,`t2`.`value`,`t2`.`c_int`,`t2`.`c_float`,`t2`.`c_boolean`,`t2`.`dt` 
from `t2` union all select 
`t3`.`key`,`t3`.`value`,`t3`.`c_int`,`t3`.`c_float`,`t3`.`c_boolean`,`t3`.`dt` 
from `t3`;


create table t4 as select key,value,c_int,c_float,c_boolean,dt from t1 union 
all select v1.key,v1.value,v1.c_int,v1.c_float,v1.c_boolean,v1.dt from view1 v1 
join view2 v2 on v1.dt=v2.dt;


CREATE VIEW `view3` AS select 
`t4`.`key`,`t4`.`value`,`t4`.`c_int`,`t4`.`c_float`,`t4`.`c_boolean`,`t4`.`dt` 
from `t4` union all select 
`t1`.`key`,`t1`.`value`,`t1`.`c_int`,`t1`.`c_float`,`t1`.`c_boolean`,`t1`.`dt` 
from `t1`;

 

select count(0) from view2 v2 join view3 v3 on v2.dt=v3.dt; // Throws No work 
found for tablescan error



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19101) Apply rule [HiveJoinPushTransitivePredicatesRule] is getting stuck when there are huge number of predicates

2018-04-04 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-19101:
-

 Summary: Apply rule [HiveJoinPushTransitivePredicatesRule] is 
getting stuck when there are huge number of predicates 
 Key: HIVE-19101
 URL: https://issues.apache.org/jira/browse/HIVE-19101
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.3.2, 2.3.1, 2.3.0, 2.2.0, 2.1.1
Reporter: Ganesha Shreedhara
 Attachments: queries

A Hive query is getting stuck during the optimisation phase while applying 
HiveJoinPushTransitivePredicatesRule when there is a huge number of predicates.

 

*DEBUG Log:*
{code:java}
2018-04-04T11:22:47,991 [user: ganeshas] -1 DEBUG 
[6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10963: 
Apply rule [ReduceExpressionsRule(Join)] to 
[rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1,
 $73),joinType=inner,algorithm=none,cost=not available)]
2018-04-04T11:22:48,359 [user: ganeshas] -1 DEBUG 
[6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10964: 
Apply rule [HiveJoinAddNotNullRule] to 
[rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1,
 $73),joinType=inner,algorithm=none,cost=not available)]
2018-04-04T11:22:48,360 [user: ganeshas] -1 DEBUG 
[6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main] plan.RelOptPlanner: call#10965: 
Apply rule [HiveJoinPushTransitivePredicatesRule] to 
[rel#881:HiveJoin.HIVE.[](left=HepRelVertex#879,right=HepRelVertex#887,condition==($1,
 $73),joinType=inner,algorithm=none,cost=not available)]{code}
 

*Thread Status:*

 
{code:java}
"6f5e8faa-505c-48e3-a2cd-ce7bfced27f0 main" prio=5 tid=0x7ff18e006800 
nid=0x1c03 runnable [0x78176000]
 java.lang.Thread.State: RUNNABLE
 at java.util.Arrays.copyOfRange(Arrays.java:2694)
 at java.lang.String.<init>(String.java:203)
 at java.lang.StringBuilder.toString(StringBuilder.java:405)
 at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:95)
 at org.apache.calcite.rex.RexCall.toString(RexCall.java:100)
 at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:84)
 at org.apache.calcite.rex.RexCall.toString(RexCall.java:100)
 at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates$JoinConditionBasedPredicateInference.infer(HiveRelMdPredicates.java:516)
 at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates$JoinConditionBasedPredicateInference.inferPredicates(HiveRelMdPredicates.java:426)
 at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdPredicates.getPredicates(HiveRelMdPredicates.java:186)
 at GeneratedMetadataHandler_Predicates.getPredicates_$(Unknown Source)
 at GeneratedMetadataHandler_Predicates.getPredicates(Unknown Source)
 at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:721)
 at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveJoinPushTransitivePredicatesRule.onMatch(HiveJoinPushTransitivePredicatesRule.java:83)
 at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:314)
 at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:502)
 at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:381)
 at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:275)
 at 
org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72)
 at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:206)
 at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:193)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1575)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:1448)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1174)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1096)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:905)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:920)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:330)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11206)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:251)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyze

[jira] [Created] (HIVE-18859) Incorrect handling of thrift metastore exceptions

2018-03-05 Thread Ganesha Shreedhara (JIRA)
Ganesha Shreedhara created HIVE-18859:
-

 Summary: Incorrect handling of thrift metastore exceptions
 Key: HIVE-18859
 URL: https://issues.apache.org/jira/browse/HIVE-18859
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.1, 1.2.0
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


Currently, any runtime exception thrown in the thrift metastore during the 
following operations is not getting sent to the hive execution engine.
 * grant/revoke role
 * grant/revoke privileges
 * create role

This is because ThriftHiveMetastore only handles MetaException and throws 
TException during the processing of these requests. So the command just fails 
at the thrift metastore end (the exception can be seen in the metastore log), 
but the hive execution engine keeps waiting for a response from the thrift 
metastore.
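
For illustration, a minimal sketch of the kind of handling being asked for 
(hypothetical code, not the actual ThriftHiveMetastore handler): runtime 
exceptions from these request handlers are caught and rethrown as an exception 
type declared on the thrift interface, so the failure travels back to the 
client instead of leaving it waiting.
{code:java}
// Hypothetical sketch: translate unexpected runtime failures into a declared
// exception so the caller receives an error instead of hanging.
class MetastoreHandlerSketch {
    static class MetaException extends Exception {   // stand-in for the thrift-declared exception
        MetaException(String msg) { super(msg); }
    }

    interface Request<T> { T run(); }

    static <T> T handle(String operation, Request<T> request) throws MetaException {
        try {
            return request.run();
        } catch (RuntimeException e) {
            throw new MetaException(operation + " failed: " + e.getMessage());
        }
    }
}
{code}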

 

Steps to reproduce this problem :

Launch thrift metastore

Launch hive cli by passing --hiveconf 
hive.metastore.uris=thrift://127.0.0.1:1 (pass the thrift metastore host 
and port)

Execute the following commands:
 # set role admin
 # create role test; (succeeds)
 # create role test; ( hive version 2.1.1 : command is stuck, waiting for the 
response from thrift metastore; hive version 1.2.1: command fails with 
exception as null) 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)