Hello,
I am using a Hadoop 2.5 and Hive 0.13 setup.
I have an external partitioned Hive table whose files are stored in S3 in RCFile
format.
When I run a 'select *', the rows are returned correctly, but aggregation queries
fail with the following exception:
Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
	at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
	at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
	at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
	at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
	at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
	at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
	... 15 more
The same issue occurred with Hive 0.12, where disabling column pruning by
setting the property 'hive.optimize.cp' to false resolved it.
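For reference, the Hive 0.12 workaround looked roughly like the following, assuming the property is set per session in the Hive CLI before the aggregation query (rather than globally in hive-site.xml):

```sql
-- Hive 0.12 only: disable column pruning for the current session.
-- This property no longer exists in Hive 0.13 (removed by HIVE-4113).
SET hive.optimize.cp=false;
```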
In Hive 0.13, however, this property has been removed
(HIVE-4113, https://issues.apache.org/jira/browse/HIVE-4113).
Is there any configuration that needs to be changed for accessing RCFiles from
S3 through Hive?
Thanks and Regards,
Puneet