[jira] [Updated] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-21492:
--
Status: Patch Available  (was: In Progress)

> VectorizedParquetRecordReader can't read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Take the example of a Parquet table having an array of integers, as below.
> {code:java}
> CREATE EXTERNAL TABLE (`list_of_ints` array<int>)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> A Parquet file generated using Hive will have the following schema for this type:
> {code:java}
> group list_of_ints (LIST) {
>   repeated group bag {
>     optional int32 array;
>   }
> }
> {code}
> A Parquet file generated using Thrift or any custom tool (built on
> org.apache.parquet.io.api.RecordConsumer) may have the following schema for
> this type:
> {code:java}
> required group list_of_ints (LIST) {
>   repeated int32 list_of_ints_tuple;
> }
> {code}
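> For illustration, a writer built directly on RecordConsumer can produce this
> 2-level layout. The sketch below is hypothetical (the field name and values
> are assumptions, not code from the report):
> {code:java}
> // Write one record whose list is encoded as a repeated int32 field,
> // i.e. without the intermediate "bag" group that Hive emits.
> recordConsumer.startMessage();
> recordConsumer.startField("list_of_ints", 0);
> for (int v : new int[] {1, 2, 3}) {
>   recordConsumer.addInteger(v); // one call per element of the repeated field
> }
> recordConsumer.endField("list_of_ints", 0);
> recordConsumer.endMessage();
> {code}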
> VectorizedParquetRecordReader handles only Parquet files generated using Hive.
> Because of the changes done as part of HIVE-18553, it throws the following
> exception when a Parquet file generated using Thrift is read:
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
> I have made a small change to handle the case where the child type of a group
> type can be a PrimitiveType.
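> A minimal sketch of that idea (an assumed shape, not the actual patch; the
> real method is VectorizedParquetRecordReader#getElementType):
> {code:java}
> import org.apache.parquet.schema.GroupType;
> import org.apache.parquet.schema.Type;
>
> // Resolve the element type of a LIST from its repeated child. Accept both the
> // hive-style layout (repeated group wrapping the element) and the
> // thrift/custom-style layout (repeated primitive that is the element itself).
> static Type getElementType(Type repeatedType) {
>   if (repeatedType.isPrimitive()) {
>     // e.g. "repeated int32 list_of_ints_tuple" -> the element itself
>     return repeatedType;
>   }
>   // e.g. "repeated group bag { optional int32 array; }" -> unwrap one level
>   GroupType group = repeatedType.asGroupType();
>   return group.getType(0);
> }
> {code}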



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-21492:
--
Attachment: HIVE-21492.3.patch



[jira] [Updated] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-21492:
--
Status: In Progress  (was: Patch Available)



[jira] [Assigned] (HIVE-22963) HiveParser misinterprets quotes in parameters of built-in functions or UDFs

2020-03-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-22963:
-

Assignee: Ganesha Shreedhara

> HiveParser misinterprets quotes in parameters of built-in functions or UDFs
> 
>
> Key: HIVE-22963
> URL: https://issues.apache.org/jira/browse/HIVE-22963
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 3.1.1, 2.3.6
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>
> Query parsing fails when single or double quotes are used in the from/to
> strings of the translate function in Hive 2.3.x/3.1.1. The same query parses
> successfully in Hive 2.1.1.
> *Steps to reproduce:*
>  
> {code:java}
> CREATE TABLE test_table (data string);
> INSERT INTO test_table VALUES("d\"a\"t\"a");
> select translate(data, '"', '') from test_table;
> {code}
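> (For reference, translate(data, '"', '') deletes every double-quote
> character, so on an unaffected version such as Hive 2.1.1 the query should
> return the string data for the inserted row.)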
>  
>  
> Parsing fails with the following exception:
> {code:java}
> NoViableAltException(355@[157:5: ( ( Identifier LPAREN )=> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource )])
> NoViableAltException(355@[157:5: ( ( Identifier LPAREN )=> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource )])
>  at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
>  at org.antlr.runtime.DFA.predict(DFA.java:116)
>  at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource0(HiveParser_FromClauseParser.java:2942)
>  at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:2880)
>  at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1451)
>  at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1341)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45811)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:39699)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:39951)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:39597)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:38786)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:38674)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2340)
>  at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1369)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
>  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
>  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:276)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:465)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:992)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:916)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:795)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:40 cannot recognize input near 'tt' ';' '' in from source 0
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:40 cannot recognize input near 'tt' ';' '' in from source 0
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
>  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
>  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:507)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1388)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1528)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1308)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298) at 

[jira] [Resolved] (HIVE-22963) HiveParser misinterprets quotes in parameters of built-in functions or UDFs

2020-03-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara resolved HIVE-22963.
---
Resolution: Duplicate

This issue is fixed as part of HIVE-19948. 


[jira] [Issue Comment Deleted] (HIVE-22963) HiveParser misinterprets quotes in parameters of built-in functions or UDFs

2020-03-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-22963:
--
Comment: was deleted

(was: [~pxiong] Can you please help with understanding whether this is an
expected behaviour? Does a double quote require escaping when it is enclosed
between single quotes in the parameter of a function? Also, SelectClauseParser
is able to parse the SelectExpression here; the exception is actually thrown
by FromClauseParser, even though the escaping is required in the
SelectExpression.

I suspect that this behaviour is because of the changes done as part of
https://issues.apache.org/jira/browse/HIVE-12764. Please check and advise. )


[jira] [Updated] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-21492:
--
Status: Patch Available  (was: In Progress)



[jira] [Work started] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21492 started by Ganesha Shreedhara.
-


[jira] [Updated] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-21492:
--
Status: Open  (was: Patch Available)



[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073324#comment-17073324
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Test failures are unrelated, mostly because the metastore server was down.
{code:java}
Could not connect to meta store using any of the URIs provided. Most recent 
failure: org.apache.thrift.transport.TTransportException: 
java.net.ConnectException: Connection refused at 
org.apache.thrift.transport.TSocket.open(TSocket.java:226) at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:686){code}
Can we rerun the tests?



[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool

2020-04-06 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076226#comment-17076226
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Test failures are unrelated; these tests passed in the previous run.

[~Ferd] Are we good to merge this fix to master? 



[jira] [Commented] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108273#comment-17108273
 ] 

Ganesha Shreedhara commented on HIVE-23473:
---

[~jdere], [~ashutoshc] Please review the patch.

 

 

> Handle NPE when ObjectCache is null while getting DynamicValue during ORC 
> split generation
> --
>
> Key: HIVE-23473
> URL: https://issues.apache.org/jira/browse/HIVE-23473
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23473.patch
>
>
> NullPointerException is thrown in the following flow.
>  
> {code:java}
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
> Caused by: java.lang.NullPointerException
>  at org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
>  at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
>  at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
>  at org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
> .
> .
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
>  at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
>  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
>  at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
> {code}
>  
> Shouldn't we just throw NoDynamicValuesException when
> [ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119]
> is null, instead of returning it, similar to how we handle the cases where
> [conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110]
> or [DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125]
> is null while getting the dynamic value?
>  
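> A minimal sketch of the suggested guard (assumed surrounding code; the real
> method is DynamicValue#getValue, and the exact ObjectCacheFactory call shape
> may differ by version):
> {code:java}
> // Mirror the existing conf == null / registry == null handling instead of
> // letting a later dereference fail with the NullPointerException above.
> ObjectCache cache = ObjectCacheFactory.getCache(conf, queryId, true); // assumed signature
> if (cache == null) {
>   throw new NoDynamicValuesException(
>       "Cannot retrieve dynamic value: ObjectCache is not available");
> }
> {code}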





[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-18 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-18 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Status: Patch Available  (was: In Progress)



[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-18 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Attachment: HIVE-23473.1.patch



[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-18 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Attachment: (was: HIVE-23473.patch)



[jira] [Assigned] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-23473:
-




[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Description: 
NullPointerException is thrown in the following flow.

 
{code:java}
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.NullPointerException
Caused by: java.lang.NullPointerException
at 
org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
.
.
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
 at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
{code}
 

Shouldn't we just throw NoDynamicValuesException when 
[ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119] 
is null instead of returning it, similar to how we handle the cases where 
[conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110] or 
[DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125] 
is null while getting a dynamic value?
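
A minimal sketch of the proposed guard, assuming hypothetical surrounding code 
in DynamicValue#getValue() (the factory call is schematic; 
NoDynamicValuesException is the same exception the null-conf and null-registry 
paths already throw):
{code:java}
// Treat a null ObjectCache like a null conf or a null DynamicValueRegistry:
// fail fast instead of handing the null cache to the code below and
// triggering a NullPointerException later during ORC split generation.
ObjectCache cache = ObjectCacheFactory.getCache(conf, queryId); // schematic call
if (cache == null) {
  throw new NoDynamicValuesException(
      "Cannot retrieve dynamic value: ObjectCache is not available");
}
{code}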

 

  was:
NullPointerException is thrown in the following flow.

 

 
{code:java}
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.NullPointerException
Caused by: java.lang.NullPointerException
at 
org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
.
.
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
 at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
 at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
{code}
 

Shouldn't we just throw NoDynamicValuesException when 
[ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119] 
is null instead of returning it similar to how we handled when 
[conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110] or 
[DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125] 
is null while getting dynamic value?

 


> Handle NPE when ObjectCache is null while getting DynamicValue during ORC 
> split generation
> --
>
> Key: HIVE-23473
> URL: https://issues.apache.org/jira/browse/HIVE-23473
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23473.patch
>
>
> 

[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Attachment: HIVE-23473.patch

> Handle NPE when ObjectCache is null while getting DynamicValue during ORC 
> split generation
> --
>
> Key: HIVE-23473
> URL: https://issues.apache.org/jira/browse/HIVE-23473
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23473.patch
>
>
> NullPointerException is thrown in the following flow.
>  
>  
> {code:java}
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NullPointerException
> Caused by: java.lang.NullPointerException
> at 
> org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
> .
> .
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
> {code}
>  
> Shouldn't we just throw NoDynamicValuesException when 
> [ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119] 
> is null instead of returning it, similar to how we handle the cases where 
> [conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110] or 
> [DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125] 
> is null while getting a dynamic value?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23473) Handle NPE when ObjectCache is null while getting DynamicValue during ORC split generation

2020-05-15 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23473:
--
Status: Patch Available  (was: Open)

> Handle NPE when ObjectCache is null while getting DynamicValue during ORC 
> split generation
> --
>
> Key: HIVE-23473
> URL: https://issues.apache.org/jira/browse/HIVE-23473
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23473.patch
>
>
> NullPointerException is thrown in the following flow.
>  
>  
> {code:java}
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NullPointerException
> Caused by: java.lang.NullPointerException
> at 
> org.apache.orc.impl.RecordReaderImpl.compareToRange(RecordReaderImpl.java:312)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:559)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
> at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicate(RecordReaderImpl.java:440)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isStripeSatisfyPredicate(OrcInputFormat.java:2214)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripesInternal(OrcInputFormat.java:2190)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.pickStripes(OrcInputFormat.java:2182)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$3000(OrcInputFormat.java:186)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1477)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1265)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1446)
> .
> .
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1895)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:526)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:649)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:206)
> {code}
>  
> Shouldn't we just throw NoDynamicValuesException when 
> [ObjectCache|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L119] 
> is null instead of returning it, similar to how we handle the cases where 
> [conf|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L110] or 
> [DynamicValueRegistry|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java#L125] 
> is null while getting a dynamic value?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21660) Wrong result when union all and later view with explode is used

2020-03-09 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052726#comment-17052726
 ] 

Ganesha Shreedhara edited comment on HIVE-21660 at 3/9/20, 3:54 PM:


[~jcamachorodriguez] It looks like I do not have permission to create a PR. I 
have created an RB request ([https://reviews.apache.org/r/72203/]). Please 
review. 


was (Author: ganeshas):
[~jcamachorodriguez] It looks like I do not have permission to create PR. I 
have created RB req request ([https://reviews.apache.org/r/72203/]) . Please 
review. 

> Wrong result when union all and later view with explode is used
> ---
>
> Key: HIVE-21660
> URL: https://issues.apache.org/jira/browse/HIVE-21660
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.1.1
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21660.1.patch, HIVE-21660.patch
>
>
> There is data loss when data is inserted into a partitioned table using 
> union all and lateral view with explode. 
>  
> *Steps to reproduce:*
>  
> {code:java}
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2 (id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') 
> as dates;
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
> select id, dt from t1
> union all
> select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) 
> t;
> select * from dst;
> {code}
>  
>  
> *Actual Result:*
> {code:java}
> +----+------------+
> | 2  | 2019-04-01 |
> +----+------------+{code}
>  
> *Expected Result* (Run only the select part from the above insert query)*:* 
> {code:java}
> +-------+------------+
> | 2     | 2019-04-01 |
> | 1     | 2019-01-01 |
> | 1     | 2019-01-02 |
> | 1     | 2019-01-03 |
> +-------+------------+{code}
>  
> The data retrieved from the second table using union all and lateral view 
> with explode is missing. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2020-05-21 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112885#comment-17112885
 ] 

Ganesha Shreedhara commented on HIVE-19994:
---

[~karthik.manamcheri] The FK constraint name is COLUMNS_V2_FK1 in all the 
metastore scripts. Shouldn't we specify the FK constraint name as 
COLUMNS_V2_FK1 instead of COLUMNS_V2_FK in package.jdo?

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CHD 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as part of an ETL process executed 
> every hour. Most of the time it works fine; the issue is not reproducible at 
> will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and a fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-10-05 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Comment: was deleted

(was: [~ashutoshc] Thanks for reviewing. Please help with pushing this fix to 
master. )

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 
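
A minimal sketch of the idea behind the fix (the helper names are assumptions 
for illustration; startNot()/between()/end() are the SearchArgument builder 
calls): when the BETWEEN expression carries the invert flag, the generated 
search argument must be wrapped in NOT rather than silently dropping the 
negation.
{code:java}
// expr is the GenericUDFBetween filter expression; since HIVE-15884 its
// first child is a boolean literal that is true for NOT BETWEEN (no
// wrapping GenericUDFOPNot is added in vectorized plans).
boolean isNot = getBooleanConstant(expr.getChildren().get(0)); // assumed helper
if (isNot) {
  builder.startNot();                 // open the negation
}
builder.between(columnName, type, lowerBound, upperBound);
if (isNot) {
  builder.end();                      // close startNot()
}
{code}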



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-24209:
-


> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: orc_test_ppd

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch, orc_test_ppd
>
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Status: Patch Available  (was: Open)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch
>
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: HIVE-24209.patch

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-24209.patch
>
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: (was: orc_test_ppd)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: (was: HIVE-24209.patch)

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24209) Search argument conversion is incorrect for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204104#comment-17204104
 ] 

Ganesha Shreedhara commented on HIVE-24209:
---

[~pxiong], [~ashutoshc] Please review the PR. 

> Search argument conversion is incorrect for NOT BETWEEN operation when 
> vectorization is enabled
> ---
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-29 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Summary: Incorrect search argument conversion for NOT BETWEEN operation 
when vectorization is enabled  (was: Search argument conversion is incorrect 
for NOT BETWEEN operation when vectorization is enabled)

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-30 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Status: In Progress  (was: Patch Available)

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-30 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Status: Patch Available  (was: In Progress)

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-30 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Attachment: HIVE-24209.patch

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-30 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Component/s: Vectorization

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-09-30 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205215#comment-17205215
 ] 

Ganesha Shreedhara commented on HIVE-24209:
---

[~ashutoshc] Thanks for reviewing. Please help with pushing this fix to master. 

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for NOT 
> BETWEEN operations when vectorization is enabled, as part of the improvement 
> made in HIVE-15884. But this case is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23756:
--
Summary: drop table command fails with 
MySQLIntegrityConstraintViolationException:  (was: drop table fails with 
MySQLIntegrityConstraintViolationException:)

> drop table command fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143719#comment-17143719
 ] 

Ganesha Shreedhara commented on HIVE-23756:
---

[~karthik.manamcheri], [~ngangam] Please review the patch. 

> drop table fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23756:
--
Attachment: HIVE-23756.1.patch

> drop table fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23756:
--
Status: Patch Available  (was: In Progress)

> drop table fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23756 started by Ganesha Shreedhara.
-
> drop table fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23756) drop table fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-23756:
-


> drop table fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the 
> FK constraint name used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-23756:
--
Description: 
Drop table command fails intermittently with the following exception.
{code:java}
Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent row: 
a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
"COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at 
com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at
 com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
Appat 
org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
at 
org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
... 36 more 
Caused by: 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
Cannot delete or update a parent row: a foreign key constraint fails 
("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
REFERENCES "CDS" ("CD_ID"))
at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
Although HIVE-19994 resolves this issue, the FK constraint name of the 
COLUMNS_V2 table specified in the package.jdo file is not the same as the FK 
constraint name used while creating the COLUMNS_V2 table 
([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]). 

  was:
Drop table command fails intermittently with the following exception.
{code:java}
Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent row: 
a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
"COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at 
com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at
 com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
Appat 
org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
at 
org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
at 
org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
... 36 more 
Caused by: 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
Cannot delete or update a parent row: a foreign key constraint fails 
("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
REFERENCES "CDS" ("CD_ID"))
at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
Although HIVE-19994 resolves this issue, the FK constrain name of COLUMNS_V2 
table specified in package.jdo file is not same as the FK constraint name used 
while creating COLUMNS_V2 table 
([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
 


> drop table command fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: 

[jira] [Commented] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:

2020-06-24 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143966#comment-17143966
 ] 

Ganesha Shreedhara commented on HIVE-23756:
---

[~ngangam] I couldn't reproduce this locally; the issue was intermittent. But 
we were getting the same issue even after backporting the fix available in 
HIVE-19994. Also, it was reported in the same ticket that using the FK 
constraint name `COLUMNS_V2_FK` was not working and the user had to use the 
exact FK constraint name, which is `COLUMNS_V2_FK1` (Ref: [this 
comment|https://issues.apache.org/jira/browse/HIVE-19994?focusedCommentId=16895036&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16895036]).
 

> drop table command fails with MySQLIntegrityConstraintViolationException:
> -
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-23756.1.patch
>
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at 
> com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the COLUMNS_V2 
> table specified in the package.jdo file is not the same as the FK constraint name 
> (COLUMNS_V2_FK1) used while creating the COLUMNS_V2 table 
> ([Ref|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-3.2.0.mysql.sql#L60]).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.0

2021-05-24 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350340#comment-17350340
 ] 

Ganesha Shreedhara commented on HIVE-24484:
---

Hi [~belugabehr], could you please give an update on the status here? Please also 
let me know if you have an ETA for completing this task.

> Upgrade Hadoop to 3.3.0
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
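
The padding behavior described above (substituting a primitive placeholder type 
for a field that the file schema does not contain) can be sketched roughly as 
follows. This is an illustrative simplification, not the actual 
DataWritableReadSupport code; the class and method names are hypothetical:
{code:java}
import org.apache.parquet.schema.GroupType;
import org.apache.parquet.schema.PrimitiveType;
import org.apache.parquet.schema.Type;

public final class RequestedSchemaSketch {
  // Returns the file's type for fieldName when present; otherwise falls back
  // to a primitive placeholder, even when the table type is a struct. It is
  // this struct-to-primitive fallback that later leaves a GroupColumnIO with
  // no children on the Parquet side.
  static Type projectField(GroupType fileSchema, String fieldName) {
    if (fileSchema.containsField(fieldName)) {
      return fileSchema.getType(fieldName); // real type from the file
    }
    return new PrimitiveType(Type.Repetition.OPTIONAL,
        PrimitiveType.PrimitiveTypeName.BINARY, fieldName);
  }
}
{code}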
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==>>] 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==>>] 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 3.06 s
----------------------------------------------------------------------------------------------
OK
NULL 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 3.06 s
----------------------------------------------------------------------------------------------
OK
NULL 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Expected Result:  {{{color:#505f79}NULL toplevel{color}}}

 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0

[jira] [Commented] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407866#comment-17407866
 ] 

Ganesha Shreedhara commented on HIVE-25494:
---

I verified that this issue doesn't exist when requestedSchema has only the 
field types that are present in the file schema. But I noticed that 
VectorizedParquetRecordReader gets all the fields and creates a 
VectorizedDummyColumnReader when a field is not present in the file schema. When 
columns are accessed by names, should we include in requestedSchema only the 
fields that are present in the file schema and return null for the rest of the 
selected columns that are missing from the file schema? 
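
The dummy-reader behavior mentioned above can be pictured with a small sketch. 
This is illustrative only, assuming a simplified reader interface; the names 
below (ColumnReaderSketch, NullFillingColumnReader) are hypothetical, not Hive's 
actual classes:
{code:java}
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;

// Hypothetical minimal reader contract for the sketch.
interface ColumnReaderSketch {
  void readBatch(int batchSize, ColumnVector vector);
}

// Stand-in for what a VectorizedDummyColumnReader conceptually does for a
// field that is absent from the file schema: fill the column with nulls
// instead of touching any Parquet data.
class NullFillingColumnReader implements ColumnReaderSketch {
  @Override
  public void readBatch(int batchSize, ColumnVector vector) {
    vector.noNulls = false;
    vector.isRepeating = true; // a single repeated null covers the whole batch
    vector.isNull[0] = true;
  }
}
{code}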

 

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
> Key: HIVE-25494
> URL: https://issues.apache.org/jira/browse/HIVE-25494
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Priority: Major
> Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the parquet file schema but 
> present in the table schema and columns are accessed by names, the 
> requestedSchema being sent from Hive to the Parquet storage layer has a type 
> even for the missing field, since we always add the type as a primitive type 
> if a field is missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the parquet side, this missing field gets pruned and, since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file test-struct.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at 
> org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data file to load data to this table
> LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;
> hive> select parent.extracol, toplevel from parquet_struct_test;
> OK
> Failed with exception 
> java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
> read value 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Affects Version/s: 3.1.2

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
> Key: HIVE-25494
> URL: https://issues.apache.org/jira/browse/HIVE-25494
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Priority: Major
> Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the parquet file schema but 
> present in the table schema and columns are accessed by names, the 
> requestedSchema being sent from Hive to the Parquet storage layer has a type 
> even for the missing field, since we always add the type as a primitive type 
> if a field is missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the parquet side, this missing field gets pruned and, since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file test-struct.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at 
> org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data file to load data to this table
> LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;
> hive> select parent.extracol, toplevel from parquet_struct_test;
> OK
> Failed with exception 
> java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
> {code}
>  
> Same query works fine in the following scenarios:
> 1) Accessing parquet file columns by index instead of names
> {code:java}
> hive> set parquet.column.index.access=true;
> hive>  select parent.extracol, toplevel from parquet_struct_test;
> OK
> NULL toplevel{code}
>  
> 2) When VectorizedParquetRecordReader is used
> {code:java}
> hive> set hive.fetch.task.conversion=none;

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Labels: schema-evolution  (was: )

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
> Key: HIVE-25494
> URL: https://issues.apache.org/jira/browse/HIVE-25494
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Priority: Major
>  Labels: schema-evolution
> Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the parquet file schema but 
> present in the table schema and columns are accessed by names, the 
> requestedSchema being sent from Hive to the Parquet storage layer has a type 
> even for the missing field, since we always add the type as a primitive type 
> if a field is missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the parquet side, this missing field gets pruned and, since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file test-struct.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at 
> org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data file to load data to this table
> LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;
> hive> select parent.extracol, toplevel from parquet_struct_test;
> OK
> Failed with exception 
> java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
> {code}
>  
> Same query works fine in the following scenarios:
> 1) Accessing parquet file columns by index instead of names
> {code:java}
> hive> set parquet.column.index.access=true;
> hive>  select parent.extracol, toplevel from parquet_struct_test;
> OK
> NULL toplevel{code}
>  
> 2) When 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Component/s: Parquet

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
> Key: HIVE-25494
> URL: https://issues.apache.org/jira/browse/HIVE-25494
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Priority: Major
> Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the parquet file schema but 
> present in the table schema and columns are accessed by names, the 
> requestedSchema being sent from Hive to the Parquet storage layer has a type 
> even for the missing field, since we always add the type as a primitive type 
> if a field is missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the parquet side, this missing field gets pruned and, since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file test-struct.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at 
> org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data file to load data to this table
> LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;
> hive> select parent.extracol, toplevel from parquet_struct_test;
> OK
> Failed with exception 
> java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
> {code}
>  
> Same query works fine in the following scenarios:
> 1) Accessing parquet file columns by index instead of names
> {code:java}
> hive> set parquet.column.index.access=true;
> hive>  select parent.extracol, toplevel from parquet_struct_test;
> OK
> NULL toplevel{code}
>  
> 2) When VectorizedParquetRecordReader is used
> {code:java}
> hive> set 

[jira] [Updated] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-01 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25494:
--
Description: 
When a struct type column's field is missing in the parquet file schema but present 
in the table schema and columns are accessed by names, the requestedSchema being 
sent from Hive to the Parquet storage layer has a type even for the missing field, 
since we always add the type as a primitive type if a field is missing in the file 
schema (Ref: 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
 On the parquet side, this missing field gets pruned and, since this field belongs 
to a struct type, it ends up creating a GroupColumnIO without any children. This 
causes the query to fail with IndexOutOfBoundsException; the stack trace is given 
below.

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file test-struct.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
 at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
 at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
 at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
 ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657)
 at java.util.ArrayList.get(ArrayList.java:433)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
 at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
 {code}
 

Steps to reproduce:

 
{code:java}
CREATE TABLE parquet_struct_test(
`parent` struct COMMENT '',
`toplevel` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
 
-- Use the attached test-struct.parquet data file to load data to this table

LOAD DATA LOCAL INPATH 'test-struct.parquet' INTO TABLE parquet_struct_test;

hive> select parent.extracol, toplevel from parquet_struct_test;
OK
Failed with exception 
java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not 
read value at 0 in block -1 in file 
hdfs://${host}/user/hive/warehouse/parquet_struct_test/test-struct.parquet 
{code}
 

Expected Result:  {{{color:#505f79}NULL toplevel{color}}}

 

Same query works fine in the following scenarios:

1) Accessing parquet file columns by index instead of names
{code:java}
hive> set parquet.column.index.access=true;
hive>  select parent.extracol, toplevel from parquet_struct_test;
OK
NULL toplevel{code}
 

2) When VectorizedParquetRecordReader is used
{code:java}
hive> set hive.fetch.task.conversion=none;
hive> select parent.extracol, toplevel from parquet_struct_test;
Query ID = hadoop_20210831154424_19aa6f7f-ab72-4c1e-ae37-4f985e72fce9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1630412697229_0031)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .. container     SUCCEEDED      1          1        0        0       0

[jira] [Commented] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-02 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408779#comment-17408779
 ] 

Ganesha Shreedhara commented on HIVE-25494:
---

I see that [HIVE-15156|https://issues.apache.org/jira/browse/HIVE-15156] was opened 
to support nested column pruning in the vectorized parquet reader.

[~Ferd] I have a question on 
[HIVE-13873|https://issues.apache.org/jira/browse/HIVE-13873]. Along with 
pruning the columns based on the selected fields, can we also prune the columns 
that don't exist in the parquet file schema and read only the columns or nested 
fields that exist in the file? This happens when parquet columns are accessed 
by indexes but not when columns are accessed by names, because of 
[this 
line|https://github.com/apache/hive/blame/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130].
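
A rough sketch of the pruning being asked about, purely illustrative (the method 
name and shape are hypothetical, not an existing Hive or Parquet API):
{code:java}
import java.util.List;
import java.util.stream.Collectors;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

public final class PruneToFileSchemaSketch {
  // Keep a requested column only when the file schema actually contains it,
  // instead of padding missing columns with a placeholder type. Columns
  // dropped here would then be surfaced to the reader as null values.
  static List<Type> pruneToFileSchema(MessageType fileSchema, List<String> requested) {
    return requested.stream()
        .filter(fileSchema::containsField)
        .map(fileSchema::getType)
        .collect(Collectors.toList());
  }
}
{code}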

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
> Key: HIVE-25494
> URL: https://issues.apache.org/jira/browse/HIVE-25494
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Priority: Major
>  Labels: schema-evolution
> Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the parquet file schema but 
> present in the table schema and columns are accessed by names, the 
> requestedSchema being sent from Hive to the Parquet storage layer has a type 
> even for the missing field, since we always add the type as a primitive type 
> if a field is missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the parquet side, this missing field gets pruned and, since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file test-struct.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at 
> org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data 

[jira] [Commented] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-04 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453520#comment-17453520
 ] 

Ganesha Shreedhara commented on HIVE-25765:
---

[~pgaref] Yes, this issue is reproducible in the latest master branch. 

> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: data.txt.gz
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping header lines of each file. This happens when the file 
> size is large and the file is read in blocks. This issue doesn't exist when the 
> select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then skipped 
> only for the first block due to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue (see the 
> sketch after this quoted description). 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'location_of_data_file'
> TBLPROPERTIES ('skip.header.line.count'='1');
> -- Counting number of rows gives correct result with only one header line 
> skipped
> select count(*) from test_table;
> 3145727
> -- Select query skips more rows and the result depends upon the number of 
> blocks configured in underlying filesystem. 3 rows are skipped when the file 
> is read in 3 blocks. 
> select * from test_table;
> .
> .
> Fetched 3145724 rows
>  {code}
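
The guard the description asks for is the same first-block check that 
HiveContextAwareRecordReader already applies: skip header records only when 
the current split starts at byte offset 0 of its file. Below is a minimal 
sketch of that guard against the old mapred API, with an illustrative helper 
name (HeaderSkipUtil and skipHeaderLines are not Hive's API):

{code:java}
import java.io.IOException;

import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader;

public final class HeaderSkipUtil {
  private HeaderSkipUtil() {}

  // Skip headerCount records, but only for the split that starts at byte
  // offset 0 of the file; later blocks of the same file are left untouched.
  public static <K, V> void skipHeaderLines(RecordReader<K, V> reader,
      FileSplit split, int headerCount) throws IOException {
    if (split.getStart() != 0) {
      return; // not the first block of the file, so there is no header here
    }
    K key = reader.createKey();
    V value = reader.createValue();
    for (int i = 0; i < headerCount && reader.next(key, value); i++) {
      // discard header records
    }
  }
}
{code}

Called once per file before the fetch loop, this skips exactly headerCount 
lines per file instead of per block, which matches the count(*) result in the 
repro above.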



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-04 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25765:
--
Affects Version/s: 4.0.0

> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: data.txt.gz
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'location_of_data_file'
> TBLPROPERTIES ('skip.header.line.count'='1');
> -- Counting number of rows gives correct result with only one header line 
> skipped
> select count(*) from test_table;
> 3145727
> -- Select query skips more rows and the result depends upon the number of 
> blocks configured in underlying filesystem. 3 rows are skipped when the file 
> is read in 3 blocks. 
> select * from test_table;
> .
> .
> Fetched 3145724 rows
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-25765:
-

Assignee: Ganesha Shreedhara

> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: data.txt.gz
>
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
>  
> *Steps to reproduce:* 
>  
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'location_of_data_file'
> TBLPROPERTIES ('skip.header.line.count'='1');
> -- Counting number of rows gives correct result with only one header line 
> skipped
> select count(*) from test_table;
> 3145727
> -- Select query skips more rows and the result depends upon the number of 
> blocks configured in underlying filesystem. 3 rows are skipped when the file 
> is read in 3 blocks. 
> select * from test_table;
> .
> .
> Fetched 3145724 rows
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25765:
--
Description: 
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

*Steps to reproduce:* 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}

  was:
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

*Steps to reproduce:* 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}


> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: data.txt.gz
>
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL 

[jira] [Work started] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25765 started by Ganesha Shreedhara.
-
> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: data.txt.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'location_of_data_file'
> TBLPROPERTIES ('skip.header.line.count'='1');
> -- Counting number of rows gives correct result with only one header line 
> skipped
> select count(*) from test_table;
> 3145727
> -- Select query skips more rows and the result depends upon the number of 
> blocks configured in underlying filesystem. 3 rows are skipped when the file 
> is read in 3 blocks. 
> select * from test_table;
> .
> .
> Fetched 3145724 rows
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25765:
--
Description: 
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

*Steps to reproduce:* 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}

  was:
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

 

*Steps to reproduce:* 

 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}
 

 



> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: data.txt.gz
>
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 

[jira] [Updated] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2021-12-02 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-25765:
--
Description: 
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

*Steps to reproduce:* 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}

  was:
When the _skip.header.line.count_ property is set in table properties, simple 
select queries that get converted into a FetchTask skip rows of each block 
instead of skipping only the header lines of each file. This happens when the 
file is large enough to be read in multiple blocks. The issue doesn't exist 
when the select query is converted into a map-only job by setting 
_hive.fetch.task.conversion_ to _none_, because the header lines are then 
skipped only for the first block thanks to [this 
check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
We should have a similar check in FetchOperator to avoid this issue. 

 

*Steps to reproduce:* 
{code:java}
-- Create table on top of the data file (uncompressed size: ~239M) attached in 
this ticket
CREATE EXTERNAL TABLE test_table(
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string,
  col10 string,
  col11 string,
  col12 string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'location_of_data_file'
TBLPROPERTIES ('skip.header.line.count'='1');


-- Counting number of rows gives correct result with only one header line 
skipped

select count(*) from test_table;
3145727

-- Select query skips more rows and the result depends upon the number of 
blocks configured in underlying filesystem. 3 rows are skipped when the file is 
read in 3 blocks. 

select * from test_table;
.
.
Fetched 3145724 rows
 {code}


> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: data.txt.gz
>
>
> When the _skip.header.line.count_ property is set in table properties, simple 
> select queries that get converted into a FetchTask skip rows of each block 
> instead of skipping only the header lines of each file. This happens when the 
> file is large enough to be read in multiple blocks. The issue doesn't exist 
> when the select query is converted into a map-only job by setting 
> _hive.fetch.task.conversion_ to _none_, because the header lines are then 
> skipped only for the first block thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should have a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create table on top of the data file (uncompressed size: ~239M) attached 
> in this ticket
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,

[jira] [Updated] (HIVE-22013) "Show table extended" should not compute FS statistics

2022-04-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-22013:
--
Status: Patch Available  (was: In Progress)

> "Show table extended" should not compute FS statistics
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}
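
For point 2, the "Wrong FS" error is what DistributedFileSystem.checkPath 
throws when a FileSystem handle obtained for one namespace is asked to list a 
path on another. Below is a minimal sketch of the per-path alternative, with 
an illustrative helper name (PerPathListing and totalLength are not Hive's 
code): resolve the FileSystem from each path being listed instead of reusing 
the table location's handle.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class PerPathListing {
  private PerPathListing() {}

  // Sum file lengths under the given paths, resolving a FileSystem per path
  // so table and partition locations may live on different namespaces.
  public static long totalLength(Configuration conf, Path... paths)
      throws IOException {
    long total = 0;
    for (Path p : paths) {
      FileSystem fs = p.getFileSystem(conf); // matches this path's scheme/authority
      for (FileStatus status : fs.listStatus(p)) {
        total += status.getLen();
      }
    }
    return total;
  }
}
{code}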



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-22013) "Show table extended" should not compute FS statistics

2022-04-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-22013:
-

Assignee: Ganesha Shreedhara

> "Show table extended" should not compute FS statistics
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (HIVE-22013) "Show table extended" should not compute FS statistics

2022-04-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22013 started by Ganesha Shreedhara.
-
> "Show table extended" should not compute FS statistics
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-22013) "Show table extended" should not compute FS statistics

2022-04-25 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527889#comment-17527889
 ] 

Ganesha Shreedhara commented on HIVE-22013:
---

[~rajesh.balamohan] Please review the [pull 
request|https://github.com/apache/hive/pull/3231]. 

> "Show table extended" should not compute FS statistics
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-22013) "Show table extended" query fails with Wrong FS error for partition in customized location

2022-04-25 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-22013:
--
Summary: "Show table extended" query fails with Wrong FS error for 
partition in customized location  (was: "Show table extended" should not 
compute FS statistics)

> "Show table extended" query fails with Wrong FS error for partition in 
> customized location
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] (HIVE-22013) "Show table extended" query fails with Wrong FS error for partition in customized location

2022-04-27 Thread Ganesha Shreedhara (Jira)


[ https://issues.apache.org/jira/browse/HIVE-22013 ]


Ganesha Shreedhara deleted comment on HIVE-22013:
---

was (Author: ganeshas):
[~rajesh.balamohan] Please review the [pull 
request|https://github.com/apache/hive/pull/3231]. 

> "Show table extended" query fails with Wrong FS error for partition in 
> customized location
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-22013) "Show table extended" query fails with Wrong FS error for partition in customized location

2022-04-27 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528711#comment-17528711
 ] 

Ganesha Shreedhara commented on HIVE-22013:
---

[~mgergely] Please review the [pull 
request|https://github.com/apache/hive/pull/3231]. 

> "Show table extended" query fails with Wrong FS error for partition in 
> customized location
> --
>
> Key: HIVE-22013
> URL: https://issues.apache.org/jira/browse/HIVE-22013
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Ganesha Shreedhara
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In some of the `show table extended` statements, the following codepath is invoked:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L421]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L449]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L468]
> 1. It is not clear why this invokes stats computation; should it be removed?
> 2. Even if #1 is needed, it would be broken when {{tblPath}} and 
> {{partitionPaths}} are different (i.e. when they are in different filesystems 
> or configured via a router, etc.).
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://xyz/blah/tables/location/, expected: hdfs://zzz..
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:698)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
>   at 
> org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.writeFileSystemStats(TextMetaDataFormatter.java
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-27582) Do not cache HBase table input format in FetchOperator

2023-08-08 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-27582:
--
Description: 
Caching of the HBase table input format in FetchOperator causes a Hive query to 
fail with the following exception. 

 
{code:java}
2023-08-08T09:43:28,800 WARN  [HiveServer2-Handler-Pool: Thread-47([])]: 
thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(809)) - Error 
fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
 rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
        at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:485)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:926)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_382]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_382]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_382]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.3-amzn-2.jar:?]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at com.sun.proxy.$Proxy43.fetchResults(Unknown Source) ~[?:?]
        at 
org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:568) 
~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:800)
 [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1900)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1880)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
 [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_382]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_382]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_382]
Caused by: java.io.IOException: java.lang.RuntimeException: 
java.util.concurrent.RejectedExecutionException: Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
 rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:522) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2737) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 

[jira] [Created] (HIVE-27582) Do not cache HBase table input format in FetchOperator

2023-08-08 Thread Ganesha Shreedhara (Jira)
Ganesha Shreedhara created HIVE-27582:
-

 Summary: Do not cache HBase table input format in FetchOperator
 Key: HIVE-27582
 URL: https://issues.apache.org/jira/browse/HIVE-27582
 Project: Hive
  Issue Type: Bug
Reporter: Ganesha Shreedhara
Assignee: Ganesha Shreedhara


Caching of the HBase table input format in FetchOperator causes a Hive query to 
fail with the following exception. 

{code:java}

2023-08-08T09:43:28,800 WARN  [HiveServer2-Handler-Pool: Thread-47([])]: 
thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(809)) - Error 
fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
 rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
        at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:485)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:926)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_382]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_382]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_382]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.3-amzn-2.jar:?]
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
 ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at com.sun.proxy.$Proxy43.fetchResults(Unknown Source) ~[?:?]
        at 
org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:568) 
~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:800)
 [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1900)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1880)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
 [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
 [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_382]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_382]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_382]
Caused by: java.io.IOException: java.lang.RuntimeException: 
java.util.concurrent.RejectedExecutionException: Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
 rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:522) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
~[hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2737) 

[jira] [Updated] (HIVE-27582) Do not cache HBase table input format in FetchOperator

2023-08-08 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-27582:
--
Status: Patch Available  (was: Open)

> Do not cache HBase table input format in FetchOperator
> --
>
> Key: HIVE-27582
> URL: https://issues.apache.org/jira/browse/HIVE-27582
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>
> Caching of the HBase table input format in FetchOperator causes a Hive query 
> to fail with the following exception. 
> {code:java}
> 2023-08-08T09:43:28,800 WARN  [HiveServer2-Handler-Pool: Thread-47([])]: 
> thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(809)) - Error 
> fetching results:
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
>  rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:485)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:926)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_382]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
>         at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_382]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_382]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>  ~[hadoop-common-3.3.3-amzn-2.jar:?]
>         at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at com.sun.proxy.$Proxy43.fetchResults(Unknown Source) ~[?:?]
>         at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:568) 
> ~[hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:800)
>  [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1900)
>  [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1880)
>  [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  [hive-service-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>  [hive-exec-3.1.3-amzn-3.jar:3.1.3-amzn-3]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_382]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_382]
>         at java.lang.Thread.run(Thread.java:750) [?:1.8.0_382]
> Caused by: java.io.IOException: java.lang.RuntimeException: 
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@2ae0e353
>  rejected from java.util.concurrent.ThreadPoolExecutor@663dd540[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
>         at 
> 
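
(The quoted trace is truncated in the archive.) The RejectedExecutionException 
shows a cached input format reusing an HBase client whose thread pool has 
already terminated. Below is a minimal sketch of the direction the summary 
describes, with an illustrative name (InputFormatCache is not Hive's code): 
bypass the cache for the HBase input format so each fetch constructs a fresh 
instance.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.ReflectionUtils;

public final class InputFormatCache {
  // The HBase input format owns a client connection and its thread pool, so a
  // cached instance can outlive the connection and be rejected on reuse.
  private static final String HBASE_INPUT_FORMAT =
      "org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat";

  private final Map<Class<?>, InputFormat<?, ?>> cache = new HashMap<>();

  @SuppressWarnings("unchecked")
  public InputFormat<?, ?> get(Class<? extends InputFormat> clazz, JobConf conf) {
    if (HBASE_INPUT_FORMAT.equals(clazz.getName())) {
      // Stateful format: construct a fresh instance for every fetch.
      return ReflectionUtils.newInstance(clazz, conf);
    }
    return cache.computeIfAbsent(clazz,
        c -> (InputFormat<?, ?>) ReflectionUtils.newInstance(c, conf));
  }
}
{code}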

[jira] [Assigned] (HIVE-12930) Support SSL Shuffle for LLAP

2024-03-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara reassigned HIVE-12930:
-

Assignee: Ganesha Shreedhara

> Support SSL Shuffle for LLAP
> 
>
> Key: HIVE-12930
> URL: https://issues.apache.org/jira/browse/HIVE-12930
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Ganesha Shreedhara
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-12930) Support SSL Shuffle for LLAP

2024-03-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-12930:
--
Status: Patch Available  (was: In Progress)

> Support SSL Shuffle for LLAP
> 
>
> Key: HIVE-12930
> URL: https://issues.apache.org/jira/browse/HIVE-12930
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-12930) Support SSL Shuffle for LLAP

2024-03-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-12930:
--
Target Version/s: 4.0.0  (was: 3.0.0)

> Support SSL Shuffle for LLAP
> 
>
> Key: HIVE-12930
> URL: https://issues.apache.org/jira/browse/HIVE-12930
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-12930) Support SSL Shuffle for LLAP

2024-03-21 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-12930 started by Ganesha Shreedhara.
-
> Support SSL Shuffle for LLAP
> 
>
> Key: HIVE-12930
> URL: https://issues.apache.org/jira/browse/HIVE-12930
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

