Hive 0.13 count(*) query issue for S3 data storage

Ravuri, Venkata Puneet Sun, 24 Aug 2014 18:18:07 -0700

Hello,

I am running a Hadoop 2.5 and Hive 0.13 setup.
I have an external partitioned Hive table whose files are stored in S3 in 
RCFile format.
When I run a 'select *', the rows are returned correctly, but aggregation 
queries such as count(*) fail with the following exception:


Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
        at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
        at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
        at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
        at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
        at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
        at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
        at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
        at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
        ... 15 more

The same issue occurred with Hive 0.12, but disabling column pruning by 
setting the property 'hive.optimize.cp' to false resolved it.
In Hive 0.13 this property was removed 
(HIVE-4113 <https://issues.apache.org/jira/browse/HIVE-4113>).
Is there any configuration that needs to be changed to read RCFiles from 
S3 through Hive?
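
For reference, a minimal sketch of the Hive 0.12 workaround described above, 
applied per session before the aggregation. The table name is a hypothetical 
placeholder, and the SET statement presumably has no effect on Hive 0.13, 
where the property was removed:

```sql
-- Hive 0.12 only: disable column pruning for this session.
SET hive.optimize.cp=false;

-- my_rcfile_table is a placeholder for the external, partitioned,
-- RCFile-backed table stored in S3.
SELECT COUNT(*) FROM my_rcfile_table;
```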


Thanks and Regards,
Puneet
