Hi all,

I am testing RCFile on S3. I can run queries that don't specify columns, such as "select * from table", but I cannot run queries that select specific columns, such as "select id from table".
The job progresses to near the end of a map task, but the task cannot finish. The log shows the following:

2011-03-22 17:12:04,325 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50000365'
2011-03-22 17:12:04,362 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50458664'
2011-03-22 17:12:04,444 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50603753'
2011-03-22 17:12:04,509 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50651845'
2011-03-22 17:12:04,536 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50735249'
2011-03-22 17:12:04,570 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '50956751'
2011-03-22 17:12:04,600 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'user/hive/warehouse/rcfile_logs/dt=20110312/controller=recipe/000001_0' for reading at position '51025754'
2011-03-22 17:12:04,633 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 9 finished. closing...
...
2011-03-22 17:12:05,167 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/user%2Fhive%2Fwarehouse%2Frcfile_logs%2Fdt%3D20110312%2Fcontroller%3Drecipe%2F000001_0' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><ActualObjectSize>51025754</ActualObjectSize><RequestId>***</RequestId><HostId>***</HostId><RangeRequested>bytes=51025754-</RangeRequested></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:229)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:220)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:133)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.fs.s3native.$Proxy1.retrieve(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:150)
    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:76)
    at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:56)
    at java.io.DataInputStream.skipBytes(DataInputStream.java:203)
    at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:443)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1304)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1425)
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:88)
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:39)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/user%2Fhive%2Fwarehouse%2Frcfile_logs%2Fdt%3D20110312%2Fcontroller%3Drecipe%2F000001_0' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><ActualObjectSize>51025754</ActualObjectSize><RequestId>4E5BD7E6D94DBA1B</RequestId><HostId>l+oM6yDUt+MbQgDB4pzcGckUQ1E7pbaUGy26yuTqNE4Gn+FdiJIA6u4VvsQl2+aR</HostId><RangeRequested>bytes=51025754-</RangeRequested></Error>
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:424)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:686)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1558)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1501)
    at org.jets3t.service.S3Service.getObject(S3Service.java:1876)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:129)
    ... 28 more
2011-03-22 17:12:05,170 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

The client seems to be requesting an invalid range, as the error response shows:

S3 GET failed for '/user%2Fhive%2Fwarehouse%2Frcfile_logs%2Fdt%3D20110312%2Fcontroller%3Drecipe%2F000001_0'
XML Error Message:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>InvalidRange</Code>
  <Message>The requested range is not satisfiable</Message>
  <ActualObjectSize>51025754</ActualObjectSize>
  <RequestId>***</RequestId>
  <HostId>***</HostId>
  <RangeRequested>bytes=51025754-</RangeRequested>
</Error>

The requested range starts at byte 51025754, which is exactly the ActualObjectSize, so the reader appears to be seeking to the very end of the file. The failing seek is reached through DataInputStream.skipBytes in RCFile$ValueBuffer.readFields, which may be why only queries that select specific columns are affected.

This error does not occur on HDFS, so I suspect this is a bug. Has anyone been able to run queries using RCFile on S3?

Thanks,
--
Shusuke Mikami
shun0...@gmail.com
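
To illustrate the failure mode, here is a minimal Python sketch, not the Hadoop or jets3t code. It assumes the store rejects an open-ended range whose start offset is at or beyond the object size, as the InvalidRange response above suggests. FakeObjectStore, GuardedReader, and get_from are hypothetical names made up for this example. The sketch shows one possible guard: treat a seek to exactly the end of the object as a valid EOF position and skip the GET.

```python
class InvalidRange(Exception):
    """Stands in for the S3 InvalidRange error seen in the log."""


class FakeObjectStore:
    """Hypothetical in-memory stand-in for one S3 object (not a real client)."""

    def __init__(self, data: bytes):
        self.data = data

    def get_from(self, start: int) -> bytes:
        # Mimics an open-ended range GET, i.e. "bytes=<start>-".
        # Assumption: a start offset at or past the object size is rejected.
        if start >= len(self.data):
            raise InvalidRange(f"bytes={start}- (object size {len(self.data)})")
        return self.data[start:]


class GuardedReader:
    """Reader whose seek() avoids issuing a range request at EOF."""

    def __init__(self, store: FakeObjectStore):
        self.store = store
        self.size = len(store.data)
        self.pos = 0
        self.buf = b""

    def seek(self, pos: int) -> None:
        if pos < 0 or pos > self.size:
            raise ValueError(f"cannot seek to {pos} in object of size {self.size}")
        self.pos = pos
        if pos == self.size:
            # Seeking to the end is legal; there is simply nothing to fetch.
            self.buf = b""
        else:
            self.buf = self.store.get_from(pos)

    def read(self, n: int) -> bytes:
        # Returns up to n bytes from the current position; b"" at EOF.
        chunk = self.buf[:n]
        self.buf = self.buf[n:]
        self.pos += len(chunk)
        return chunk
```

With a 6-byte object, calling get_from(6) on the store raises InvalidRange, mirroring the "bytes=51025754-" request against a 51025754-byte object, while GuardedReader.seek(6) succeeds and a following read returns no bytes. Whether the real fix belongs in the S3 input stream or in the RCFile reader is a separate question that this sketch does not answer.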