Hi all,
I am trying to copy a file out of Riak using the s3 protocol to HDFS. I
have the following file:
I created the following file: /etc/hadoop/conf/jets3t.properties
s3service.s3-endpoint=myhost
s3service.s3-endpoint-http-port=8080
s3service.disable-dns-buckets=true
s3service.s3-endpoint-virtual-path=/
s3service.max-thread-count=10
threaded-service.max-thread-count=10
s3service.https-only=false
httpclient.proxy-autodetect=false
httpclient.proxy-host=myhost
httpclient.proxy-port=8080
httpclient.retry-max=11
hadoop distcp s3://<access key>:<secret key>@test/test
hdfs://localhost/tmp/test
I get this stack trace:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
Request Error. -- ResponseCode: 404, ResponseStatus: Object Not Found
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
at
org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.jets3t.service.S3ServiceException: Request Error. --
ResponseCode: 404, ResponseStatus: Object Not Found
at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
... 20 more
Caused by: org.jets3t.service.impl.rest.HttpException
at
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:519)
at
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
at
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:981)
at
org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2150)
at
org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2087)
at org.jets3t.service.StorageService.getObject(StorageService.java:1140)
at org.jets3t.service.S3Service.getObject(S3Service.java:2583)
at org.jets3t.service.S3Service.getObject(S3Service.java:84)
at org.jets3t.service.StorageService.getObject(StorageService.java:525)
at org.jets3t.service.S3Service.getObject(S3Service.java:1377)
However, with a local .s3cfg file that points to a Riak cluster, I can do
this:
[hdfs@dsg01 ~]$ s3cmd ls s3://test
DIR s3://test/home/
DIR s3://test/setup/
DIR s3://test/test/
DIR s3://test/tmp/
So, s3://test/test does exist and is in Riak, not AWS.
Now, if I comment out s3service.s3-endpoint-virtual-path and run:
hadoop distcp s3://<access key>:<secret key>@test/test
hdfs://localhost/tmp/test
I see:
java.io.IOException: /test doesn't exist
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
at
org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Using @test/test/ produces the same exception as above.
Using: hadoop distcp s3://<access key>:<secret key>@test
hdfs://localhost/tmp/test
java.io.IOException: /user/hdfs doesn't exist
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
at
org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
I am the user 'hdfs'.
If I comment out these properties
#s3service.s3-endpoint=myhost
#s3service.s3-endpoint-http-port=8080
#s3service.disable-dns-buckets=true
#s3service.s3-endpoint-virtual-path=/
and run: hadoop distcp s3://<access key>:<secret key>@test/test
hdfs://localhost/tmp/test
I get fresh, new exception:
16/04/22 21:53:34 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error
Message: <?xml version="1.0"
encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access
Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
at
org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.jets3t.service.S3ServiceException: S3 Error Message. --
ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml
version="1.0"
encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access
Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
... 20 more
It's odd to see "/%2Ftest" which is a URL encoding for '/'. Why is that
there?
Note: 'myhost' is just a placeholder for the actual hostname which does
resolve.
What am I missing?
--
View this message in context:
http://riak-users.197444.n3.nabble.com/Unable-to-use-hadoop-distcp-with-Riak-tp4034185.html
Sent from the Riak Users mailing list archive at Nabble.com.
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com