[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2017-04-01 Thread Harish (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952474#comment-15952474 ]

Harish edited comment on HIVE-10161 at 4/2/17 12:21 AM:


[~sershe] I am having the same issue in Hive 1.2.1. Is this issue fixed in 1.2.1 or
a later version?
Scenario:
I have a partitioned Hive table (ORC) created in one cluster. I copied the ORC
files from this cluster to Azure Data Lake using the Azure CLI. Once the copy was
done, I created an external table using the SAME DDL as on the source
cluster/Hive. After repairing the table, when I query a few partitions I get the
same error. Can you help me with this?

Hadoop version: 3.0 alpha 2



Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 7200075
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
at java.io.InputStream.read(InputStream.java:101)
at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10661)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10625)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10937)
at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:113)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:170)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:144)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
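For context on the message above: per the ORC specification, each compressed chunk starts with a 3-byte little-endian header holding chunkLength * 2 + isOriginal, and the error text suggests the reader rejects any chunk whose decoded length exceeds the compression buffer size (262144 here). The Java sketch below approximates that check. The class and method names are hypothetical; this is not Hive's InStream code, only an illustration of the logic behind "size = ... needed = ...".

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the ORC 3-byte chunk header and the size check
// behind "Buffer size too small"; not the actual Hive InStream implementation.
public class OrcChunkHeader {

    // Builds a header for a chunk of the given length: (length * 2 + isOriginal),
    // stored little-endian in 3 bytes.
    static byte[] encode(int chunkLength, boolean isOriginal) {
        int value = (chunkLength << 1) | (isOriginal ? 1 : 0);
        return new byte[] {(byte) value, (byte) (value >>> 8), (byte) (value >>> 16)};
    }

    // Reads the 3 header bytes, recovers the chunk length, and rejects it
    // when it is larger than the configured compression buffer size.
    static int decodeLength(ByteBuffer in, int bufferSize) {
        int b0 = in.get() & 0xff;
        int b1 = in.get() & 0xff;
        int b2 = in.get() & 0xff;
        int chunkLength = (b0 | (b1 << 8) | (b2 << 16)) >>> 1; // drop the isOriginal bit
        if (chunkLength > bufferSize) {
            throw new IllegalArgumentException("Buffer size too small. size = "
                + bufferSize + " needed = " + chunkLength);
        }
        return chunkLength;
    }

    public static void main(String[] args) {
        int bufferSize = 262144;
        // A well-formed header for a 100000-byte chunk decodes fine.
        System.out.println(decodeLength(ByteBuffer.wrap(encode(100000, false)), bufferSize));
        // A header claiming 7200075 bytes (the value in the trace above) is rejected.
        try {
            decodeLength(ByteBuffer.wrap(encode(7200075, false)), bufferSize);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints 100000 and then "Buffer size too small. size = 262144 needed = 7200075". A decoded length far above the buffer size can also mean the reader is decoding bytes that are not a header at all, as in the misaligned read described elsewhere in this thread.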




was (Author: harishk15):
[~sershe] I am having the same issue in Hive 1.2.1. Is this issue fixed in 1.2.1 or
a later version?
Scenario:
I have a partitioned Hive table (ORC) created in one cluster. I copied the ORC
files from this cluster to Azure Data Lake using the Azure CLI. Once the copy was
done, I created an external table using the SAME DDL as on the source
cluster/Hive. After repairing the table, when I query a few partitions I get the
same error. Can you help me with this?

Hadoop version: 3.0 alpha 2



> LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
> has a bug)
> 
>
> Key: HIVE-10161
> URL: https://issues.apache.org/jira/browse/HIVE-10161
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> The EncodedReaderImpl will die when reading from the cache, when reading data 
> written by the regular ORC writer 
> {code}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too sm

[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-04-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482405#comment-14482405 ]

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:57 AM:

When multiple RGs (row groups) include the same partial CB (compression buffer)
(due to the ORC end boundary being an estimate), the first RG reads the length,
determines that this is an incomplete CB, and bails. The 2nd one then tries to
blindly read the length, but the buffer is now offset by 3 bytes from the
original read. Boom! Fixed that, and also fixed a small issue with early
unlocking.
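The failure mode described above can be reproduced in a few lines. This is a hypothetical Java sketch, not EncodedReaderImpl: two readers share one ByteBuffer positioned at a partial CB. The first consumes the 3-byte header with relative get() calls, sees that the CB is incomplete, and bails without restoring the position, so the second reader decodes payload bytes as if they were a header. Restoring the saved position before bailing is one possible fix; the actual patch may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical reproduction of the "offset by 3 bytes" bug; names are illustrative, not Hive code.
public class SharedCbOffsetDemo {

    // Relative read of the 3-byte little-endian CB header; advances the position by 3.
    static int readLength(ByteBuffer buf) {
        int b0 = buf.get() & 0xff;
        int b1 = buf.get() & 0xff;
        int b2 = buf.get() & 0xff;
        return (b0 | (b1 << 8) | (b2 << 16)) >>> 1;
    }

    // Returns the CB length if the whole CB is in the buffer, or -1 if it is partial.
    // With restorePosition == false, bailing leaves the position 3 bytes past the header.
    static int tryReadCb(ByteBuffer shared, boolean restorePosition) {
        int start = shared.position();
        int length = readLength(shared);
        if (shared.remaining() < length) {
            if (restorePosition) {
                shared.position(start); // undo the header read before bailing
            }
            return -1;
        }
        return length;
    }

    // A header announcing a 1000-byte CB followed by only 10 payload bytes,
    // as if an estimated end boundary had cut the CB short.
    static ByteBuffer partialCb() {
        ByteBuffer buf = ByteBuffer.allocate(3 + 10);
        int value = 1000 << 1;
        buf.put((byte) value).put((byte) (value >>> 8)).put((byte) (value >>> 16));
        while (buf.hasRemaining()) {
            buf.put((byte) 0x7F);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buggy = partialCb();
        tryReadCb(buggy, false);               // first RG bails; position is now 3
        System.out.println(readLength(buggy)); // second RG decodes payload bytes: 4177855
        ByteBuffer fixed = partialCb();
        tryReadCb(fixed, true);                // first RG bails; position restored to 0
        System.out.println(readLength(fixed)); // second RG decodes the real header: 1000
    }
}
```

With a 262144-byte buffer size, a garbage length like 4177855 would trip the same "Buffer size too small" check that appears in the issue description.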


was (Author: sershe):
When multiple RGs (row groups) include the same partial CB (compression buffer)
(due to the ORC end boundary being an estimate), the first RG reads the length,
determines that this is an incomplete CB, and bails. The 2nd one then tries to
blindly read the length, but the compressed block is now offset by 3 bytes from
the original read. Boom! Fixed that, and also fixed a small issue with early
unlocking.

> LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
> has a bug)
> 
>
> Key: HIVE-10161
> URL: https://issues.apache.org/jira/browse/HIVE-10161
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> The EncodedReaderImpl will die when reading from the cache, when reading data 
> written by the regular ORC writer 
> {code}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246
> at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
> at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
> at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
> at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
> {code}
> Turning off hive.llap.io.enabled makes the error go away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-04-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482405#comment-14482405 ]

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:56 AM:

When multiple RGs (row groups) include the same partial CB (compression buffer)
(due to the ORC end boundary being an estimate), the first RG reads the length,
determines that this is an incomplete CB, and bails. The 2nd one then tries to
blindly read the length, but the compressed block is now offset by 3 bytes from
the original read. Boom! Fixed that, and also fixed a small issue with early
unlocking.
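A related way to avoid the misaligned read is to detect a partial CB without moving the shared buffer's position at all. The Java sketch below is hypothetical (the names are illustrative, and it assumes the same 3-byte little-endian header layout): it uses absolute ByteBuffer.get(int) reads and compares the CB's end against the estimated range end, so every RG that inspects the same CB starts from the same offset.

```java
import java.nio.ByteBuffer;

// Hypothetical partial-CB check using absolute reads; not the actual LLAP code.
public class PartialCbCheck {
    static final int HEADER_SIZE = 3;

    // Reads the CB length at an absolute index; the buffer's position is untouched.
    static int peekLength(ByteBuffer buf, int cbStart) {
        int b0 = buf.get(cbStart) & 0xff;
        int b1 = buf.get(cbStart + 1) & 0xff;
        int b2 = buf.get(cbStart + 2) & 0xff;
        return (b0 | (b1 << 8) | (b2 << 16)) >>> 1;
    }

    // True if the header plus payload of the CB at cbStart ends at or before rangeEnd,
    // where rangeEnd stands in for the (estimated) end of the bytes read for this RG.
    static boolean isComplete(ByteBuffer buf, int cbStart, int rangeEnd) {
        if (cbStart + HEADER_SIZE > rangeEnd) {
            return false; // not even the header is available
        }
        return cbStart + HEADER_SIZE + peekLength(buf, cbStart) <= rangeEnd;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        int value = 20 << 1; // header for a 20-byte CB at offset 0
        buf.put(0, (byte) value).put(1, (byte) (value >>> 8)).put(2, (byte) (value >>> 16));
        System.out.println(isComplete(buf, 0, 23)); // true: 3 + 20 bytes fit
        System.out.println(isComplete(buf, 0, 16)); // false: the estimated end cuts the CB
        System.out.println(buf.position());         // 0: callers all see the same start
    }
}
```

Because nothing is consumed during the check, a reader that bails on an incomplete CB cannot leave the buffer in a state that misleads the next reader.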


was (Author: sershe):
When multiple RGs (row groups) include the same partial CB (compression buffer)
(due to the ORC end boundary being an estimate), the first RG reads the length,
determines that this is an incomplete RG, and bails. The 2nd one then tries to
blindly read the length, but the compressed block is now offset by 3 bytes from
the original read. Boom! Fixed that, and also fixed a small issue with early
unlocking.
