[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-06 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076226#comment-17076226
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Test failures are unrelated. These tests were passed in the previous run.

[~Ferd] Are we good to merge this fix to master? 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, 
> HIVE-21492.4.patch, HIVE-21492.5.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-03 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074423#comment-17074423
 ] 

Hive QA commented on HIVE-21492:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12998682/HIVE-21492.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 18158 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithMiniLlapArrow - did not produce a TEST-*.xml file (likely timed 
out) (batchId=292)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5]
 (batchId=203)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/21418/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21418/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21418/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12998682 - PreCommit-HIVE-Build

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, 
> HIVE-21492.4.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-03 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074393#comment-17074393
 ] 

Hive QA commented on HIVE-21492:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
41s{color} | {color:blue} ql in master has 1528 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-21418/dev-support/hive-personality.sh
 |
| git revision | master / 3ab174d |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21418/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, 
> HIVE-21492.4.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> 

[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-02 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074251#comment-17074251
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

For some reason test didn't pick the patch (cancelling and resubmitting the 
patch didn't work). I have created a new patch to make sure that it gets picked 
by Hive QA job in the next run. 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, 
> HIVE-21492.4.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-02 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074194#comment-17074194
 ] 

Hive QA commented on HIVE-21492:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12998415/HIVE-21492.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/21413/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21413/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21413/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12998415/HIVE-21492.3.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12998415 - PreCommit-HIVE-Build

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-02 Thread Ferdinand Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074169#comment-17074169
 ] 

Ferdinand Xu commented on HIVE-21492:
-

Yes, I manually started another job to double check this.

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073324#comment-17073324
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Test failures are unrelated and mostly because the metstore server was down.
{code:java}
Could not connect to meta store using any of the URIs provided. Most recent 
failure: org.apache.thrift.transport.TTransportException: 
java.net.ConnectException: Connection refused at 
org.apache.thrift.transport.TSocket.open(TSocket.java:226) at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:686){code}
Can we rerun the tests?

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073305#comment-17073305
 ] 

Hive QA commented on HIVE-21492:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12998415/HIVE-21492.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 68 failed/errored test(s), 18162 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMetastoreHousekeepingLeaderEmptyConfig.testHouseKeepingThreadExistence
 (batchId=252)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.alterTableBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.createTableInBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.dropTableBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.getAllTablesInBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.getMaterializedViewsInBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.getTableInBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.getTableObjectsByNameBogusCatalog[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.moveTablesBetweenCatalogsOnAlter[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.tablesInOtherCatalogs[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableAlreadyExists[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableCascade[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableChangeCols[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableChangingDatabase[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableEmptyTableNameInNew[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableExternalTableChangeLocation[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableExternalTable[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorAddPartitionColumns[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorAlterPartitionColumnName[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorInvalidColumnType[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorNullCols[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorNullColumnType[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorNullLocation[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorNullSerdeInfo[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidStorageDescriptorRemovePartitionColumn[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableInvalidTableNameInNew[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNoSuchDatabase[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNoSuchTableInThisDatabase[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNoSuchTable[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullDatabaseInNew[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullDatabase[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullNewTable[Remote]
 (batchId=230)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullStorageDescriptorInNew[Remote]
 (batchId=230)

[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073278#comment-17073278
 ] 

Hive QA commented on HIVE-21492:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
41s{color} | {color:blue} ql in master has 1528 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-21377/dev-support/hive-personality.sh
 |
| git revision | master / 709235c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21377/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> 

[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Ferdinand Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073277#comment-17073277
 ] 

Ferdinand Xu commented on HIVE-21492:
-

LGTM +1 pending on the test

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072517#comment-17072517
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

[~Ferd] Please review the latest patch.

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072471#comment-17072471
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Done.

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.3.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-04-01 Thread Ferdinand Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072466#comment-17072466
 ] 

Ferdinand Xu commented on HIVE-21492:
-

Could you update the indents below?
{code:java}
+ return childType.asGroupType().getFields().get(0)
+  .asPrimitiveType();
{code}
 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072337#comment-17072337
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

[~Ferd] I could actually reproduce this problem using the available dataset 
_data/files/ThriftPrimitiveInList.parquet_ in hive repo. __ I have modified the 
parquet_thrift_array_of_primitives.q file accordingly. 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.2.patch, HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072322#comment-17072322
 ] 

Hive QA commented on HIVE-21492:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963424/HIVE-21492.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18163 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[topnkey_grouping_sets] 
(batchId=1)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/21362/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21362/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21362/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963424 - PreCommit-HIVE-Build

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072308#comment-17072308
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

[~Ferd] This change is in a private method. It looks like there is no UT 
created for VectorizedParquetRecordReader class. 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072302#comment-17072302
 ] 

Hive QA commented on HIVE-21492:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
39s{color} | {color:blue} ql in master has 1529 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-21362/dev-support/hive-personality.sh
 |
| git revision | master / aa142d1 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21362/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> 

[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Ferdinand Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071623#comment-17071623
 ] 

Ferdinand Xu commented on HIVE-21492:
-

[~ganeshas], could you create some UTs for this? 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2020-03-31 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071620#comment-17071620
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

This issue exists in 3.x version because of the changes done as part of 
HIVE-18553. 

[~Ferd], [~vihangk1], [~ashutoshc] Could you please review the patch?

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2019-04-09 Thread Nitin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813673#comment-16813673
 ] 

Nitin commented on HIVE-21492:
--

[~Ferd] Can you please review the patch ?

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

2019-03-28 Thread Ganesha Shreedhara (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803742#comment-16803742
 ] 

Ganesha Shreedhara commented on HIVE-21492:
---

Please review the patch. 

> VectorizedParquetRecordReader can't to read parquet file generated using 
> thrift/custom tool
> ---
>
> Key: HIVE-21492
> URL: https://issues.apache.org/jira/browse/HIVE-21492
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Attachments: HIVE-21492.patch
>
>
> Taking an example of a parquet table having array of integers as below. 
> {code:java}
> CREATE EXTERNAL TABLE ( list_of_ints` array)
> STORED AS PARQUET 
> LOCATION '{location}';
> {code}
> Parquet file generated using hive will have schema for Type as below:
> {code:java}
> group list_of_ints (LIST) { repeated group bag { optional int32 array;\n};\n} 
> {code}
> Parquet file generated using thrift or any custom tool (using 
> org.apache.parquet.io.api.RecordConsumer)
> may have schema for Type as below:
> {code:java}
> required group list_of_ints (LIST) { repeated int32 list_of_tuple} {code}
> VectorizedParquetRecordReader handles only parquet file generated using hive. 
> It throws the following exception when parquet file generated using thrift is 
> read because of the changes done as part of HIVE-18553 .
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is 
> not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365){code}
>  
>  I have done a small change to handle the case where the child type of group 
> type can be PrimitiveType.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)