[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952474#comment-15952474 ]

Harish edited comment on HIVE-10161 at 4/2/17 12:21 AM:
--------------------------------------------------------

[~sershe] I am having the same issue in Hive 1.2.1. Is this fixed in 1.2.1 or a later version?

Scenario: I have a partitioned Hive table (ORC) created on one cluster. I copied the ORC files from that cluster to Azure Data Lake using the Azure CLI. Once the copy was done, I created an external table using the same DDL as on the source cluster/Hive. After repairing the table, querying a few partitions gives the same error. Can you help me with this?

Hadoop version: 3.0 alpha 2

{code}
Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 7200075
	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
	at java.io.InputStream.read(InputStream.java:101)
	at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
	at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
	at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10661)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10625)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10937)
	at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:113)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:170)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:144)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
{code}

was (Author: harishk15):
[~sershe] I am having the same issue in Hive 1.2.1. Is this fixed in 1.2.1 or a later version?

Scenario: I have a partitioned Hive table (ORC) created on one cluster. I copied the ORC files from that cluster to Azure Data Lake using the Azure CLI. Once the copy was done, I created an external table using the same DDL as on the source cluster/Hive. After repairing the table, querying a few partitions gives the same error. Can you help me with this?

Hadoop version: 3.0 alpha 2

> LLAP: ORC file contains compression buffers larger than bufferSize (OR reader
> has a bug)
>
>                 Key: HIVE-10161
>                 URL: https://issues.apache.org/jira/browse/HIVE-10161
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: llap
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
> The EncodedReaderImpl will die when reading from the cache, when reading data
> written by the regular ORC writer
> {code}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> 	... 22 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> 	at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
> 	at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
> 	at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> 	... 4 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
> {code}
> Turning off hive.llap.io.enabled makes the error go away.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
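For anyone decoding this message: the "needed" value is the chunk length the reader decodes from the 3-byte header that precedes every compressed chunk in an ORC stream, and "size" is the codec buffer allocated from the file's compression block size. The sketch below is my own reconstruction from the ORC format specification, not Hive's actual InStream code, and the header bytes are hypothetical values chosen to decode to the 7200075 in the trace above:

```java
public class ChunkHeaderCheck {

    // ORC prefixes each compressed chunk with a 3-byte little-endian header:
    // bit 0 flags an uncompressed ("original") chunk, and the remaining
    // 23 bits hold the chunk length in bytes (so at most 2^23 - 1 bytes).
    static int chunkLength(byte[] h) {
        int b0 = h[0] & 0xff, b1 = h[1] & 0xff, b2 = h[2] & 0xff;
        return (b0 >>> 1) | (b1 << 7) | (b2 << 15);
    }

    // Mimics the shape of the check that produced the exception above:
    // the decoded chunk length must fit in the reader's codec buffer.
    static void checkFits(byte[] header, int bufferSize) {
        int needed = chunkLength(header);
        if (needed > bufferSize) {
            throw new IllegalArgumentException(
                    "Buffer size too small. size = " + bufferSize + " needed = " + needed);
        }
    }

    public static void main(String[] args) {
        // Hypothetical header bytes; they decode to the value from the trace.
        byte[] header = {(byte) 0x96, (byte) 0xBA, (byte) 0xDB};
        System.out.println(chunkLength(header));   // prints 7200075
        try {
            checkFits(header, 262144);             // 256 KiB codec buffer
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());    // Buffer size too small. ...
        }
    }
}
```

If the file was written with the default 256 KiB compression block size, a decoded length this far above it usually suggests the reader is interpreting the wrong bytes as a header (for example, because the file was damaged or truncated during the copy) rather than a legitimately oversized chunk.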
[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482405#comment-14482405 ]

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:57 AM:
----------------------------------------------------------------

When multiple RGs include the same partial CB (due to the ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete CB and bails; the 2nd one tries to blindly read the length, but the buffer is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed some small issue with early unlocking.

was (Author: sershe):
When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete CB and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed some small issue with early unlocking.

> LLAP: ORC file contains compression buffers larger than bufferSize (OR reader
> has a bug)
>
>                 Key: HIVE-10161
>                 URL: https://issues.apache.org/jira/browse/HIVE-10161
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: llap
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
> The EncodedReaderImpl will die when reading from the cache, when reading data
> written by the regular ORC writer
> {code}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> 	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> 	... 22 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> 	at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
> 	at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
> 	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
> 	at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> 	... 4 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
> {code}
> Turning off hive.llap.io.enabled makes the error go away.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
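The sequence described in the comment above can be pictured with a plain ByteBuffer: ORC's chunk header is 3 bytes, so a reader that peeks at a partial chunk advances the position by 3, and a second reader that starts from the moved position decodes garbage. This is an illustrative toy under that reading of the bug, not the actual EncodedReaderImpl code, and the byte values are made up:

```java
import java.nio.ByteBuffer;

public class PartialChunkOffsetDemo {

    // Decode ORC's 3-byte little-endian chunk header at the current position;
    // advances the buffer position by 3 bytes as a side effect.
    static int readChunkLength(ByteBuffer buf) {
        int b0 = buf.get() & 0xff, b1 = buf.get() & 0xff, b2 = buf.get() & 0xff;
        return (b0 >>> 1) | (b1 << 7) | (b2 << 15);
    }

    public static void main(String[] args) {
        // A cached range holding only the start of a large compressed chunk:
        // the 3-byte header plus a few body bytes; the rest lies past the range.
        ByteBuffer cached = ByteBuffer.wrap(new byte[]{
                (byte) 0x96, (byte) 0xBA, (byte) 0xDB,   // header: chunk length 7200075
                0x11, 0x22, 0x33});                      // truncated chunk body
        int chunkStart = cached.position();

        // First row group: reads the length, sees the chunk is incomplete, bails.
        int len = readChunkLength(cached);
        if (len != 7200075) throw new AssertionError("unexpected length");

        // Buggy behaviour: the second row group reads a "header" from the
        // already-advanced position and decodes a garbage length.
        int garbage = readChunkLength(cached.duplicate());
        if (garbage == len) throw new AssertionError("should differ");

        // Fix: rewind to the start of the chunk before the second read.
        cached.position(chunkStart);
        if (readChunkLength(cached) != 7200075) throw new AssertionError("rewind failed");
    }
}
```

The 3-byte offset in Sergey's description is exactly the size of this header: once the first reader has consumed it and bailed, any later reader must restart from the chunk's recorded start offset rather than the buffer's current position.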