Sindhu Subhas created HIVE-26271:
------------------------------------

             Summary: ParquetDecodingException: Can not read value at 1 in block 0 when reading Parquet file generated from ADF sink from Hive
                 Key: HIVE-26271
                 URL: https://issues.apache.org/jira/browse/HIVE-26271
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.1
         Environment: ADF pipeline to create parquet table.
HDInsight 4.1
            Reporter: Sindhu Subhas


Steps to replicate:
# Create a Parquet file with a decimal column using an ADF sink from any source.
# Move the file to the Hive external table's ABFS location.
# Create an external table on top of the file.
# Create an ORC table with a string column via CTAS on the Parquet external table.

Error stack:
{code}
Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-3...@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
	... 27 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-3...@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
	... 28 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:587)
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:583)
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:792)
	at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317)
	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
	... 33 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1653293742194_0008_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
{code}

The issue is in the new Parquet reader: its Hive decimal support does not properly implement the Parquet [specification|https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md]. Hive only handles the {{fixed_len_byte_array}} case from that spec; some work needs to be done to add support for the remaining physical types.


--
This message was sent by Atlassian Jira
(v8.20.7#820007)
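For context on the cast failure above: per the Parquet LogicalTypes specification, a {{decimal}} logical type stores the *unscaled* value and may annotate any of four physical types: {{int32}}, {{int64}}, {{binary}}, or {{fixed_len_byte_array}}. A writer (such as the ADF sink here) is free to pick any of them, so a reader that only handles {{fixed_len_byte_array}} can fail on valid files. Below is a minimal, hypothetical sketch in plain JDK Java (not Hive or Parquet internals; class and method names are illustrative) of what each physical encoding of the same decimal value looks like:

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Illustrative sketch only: shows the four physical representations the
// Parquet spec allows for a DECIMAL logical type. All four carry the
// unscaled value; scale/precision live in the column metadata.
public class DecimalEncodings {

    // DECIMAL backed by INT32: unscaled value must fit in 32 bits
    // (spec allows this for precision <= 9).
    static int asInt32(BigDecimal d) {
        return d.unscaledValue().intValueExact();
    }

    // DECIMAL backed by INT64: unscaled value must fit in 64 bits
    // (spec allows this for precision <= 18).
    static long asInt64(BigDecimal d) {
        return d.unscaledValue().longValueExact();
    }

    // DECIMAL backed by BINARY: minimal-length two's-complement,
    // big-endian bytes of the unscaled value.
    static byte[] asBinary(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // DECIMAL backed by FIXED_LEN_BYTE_ARRAY: two's-complement big-endian,
    // sign-extended to a fixed width -- the only case Hive currently reads.
    static byte[] asFixed(BigDecimal d, int width) {
        byte[] minimal = d.unscaledValue().toByteArray();
        byte[] out = new byte[width];
        byte pad = (byte) (d.signum() < 0 ? 0xFF : 0x00); // sign extension
        Arrays.fill(out, pad);
        System.arraycopy(minimal, 0, out, width - minimal.length, minimal.length);
        return out;
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("12.34"); // unscaled value 1234, scale 2
        System.out.println(asInt32(d));                     // 1234
        System.out.println(asInt64(d));                     // 1234
        System.out.println(Arrays.toString(asBinary(d)));   // [4, -46]
        System.out.println(Arrays.toString(asFixed(d, 5))); // [0, 0, 0, 4, -46]
    }
}
```

The {{ClassCastException}} in the stack trace suggests the converter chosen for one of the non-fixed encodings assumes a {{DecimalTypeInfo}} it never actually receives; the fix would be to cover all four physical types in the converter, not just {{fixed_len_byte_array}}.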