Hi Kota/John/Andrew,

Thanks for your suggestions.
Here is what I've tried, so far without success.

- *jets3t.properties file*

    s3service.s3-endpoint=<riak-host>
    s3service.s3-endpoint-http-port=8080
    s3service.disable-dns-buckets=true
    s3service.s3-endpoint-virtual-path=/
    httpclient.proxy-autodetect=false
    httpclient.proxy-host=<riak-host>
    httpclient.proxy-port=8080

I've tried the proxy settings and the s3-endpoint settings together and each separately. I've also tried putting the file in /opt/mapr/conf, /opt/mapr/hadoop/hadoop-0.20.2/, and /opt/mapr/hadoop/hadoop-0.20.2/conf.

After adding the settings, when I run

    hadoop distcp s3n://u:p@bucket/file /mymapr/

it still connects to S3: I get an access-denied message from AWS saying it doesn't recognize the key and passphrase. I've also tried it from Pig:

    T = LOAD 's3n://u:p@bucket/file' USING PigStorage() AS (line:chararray);

- */etc/hosts file*

I know the S3 client internally converts this to a https://<bucket>.s3.amazonaws.com/ request, so I added that hostname to my hosts file and put my Riak CS behind an haproxy forwarding 443 to Riak's 8080. When I run the hadoop distcp command as above, I get this error:

    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
    14/08/01 20:59:30 INFO metrics.MetricsUtil: getSupportedProducts {}
    java.lang.RuntimeException: RPC /supportedProducts error Connection refused
        at amazon.emr.metrics.InstanceControllerRpcClient$RpcClient.call(Unknown Source)
        at amazon.emr.metrics.InstanceControllerRpcClient.getSupportedProducts(Unknown Source)
        at amazon.emr.metrics.MetricsUtil.emrClusterMapR(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:166)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.fs.s3native.$Proxy0.retrieveMetadata(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:748)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:826)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:648)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:668)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:913)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:947)
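For what it's worth, a 443-to-8080 forward like this needs haproxy to terminate TLS itself, since Riak CS's 8080 speaks plain HTTP; a raw tcp-mode pass-through of port 443 would fail at the TLS handshake rather than refuse the connection. A minimal sketch of such a frontend (hostnames and the certificate path are placeholders, not my exact config, and the cert would have to be one the client trusts for *.s3.amazonaws.com):

    frontend s3_ssl
        # terminate TLS here (needs haproxy 1.5+ built with SSL support)
        bind *:443 ssl crt /etc/haproxy/s3.pem
        mode http
        default_backend riak_cs

    backend riak_cs
        mode http
        # Riak CS listening on plain HTTP
        server riak1 <riak-host>:8080 check

Given the "Connection refused" above, it may also be worth confirming from the MapR node which address <bucket>.s3.amazonaws.com actually resolves to, and that something is listening on 443 there.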
- *hadoop conf*

When I add these settings to Hadoop's core-site.xml (after reverting the hosts-file change):

    <property>
      <name>fs.s3n.ssl.enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>fs.s3n.endpoint</name>
      <value>riak-cluster</value>
    </property>

I get the same error as with the hosts file, so it looks like the setting does make Hadoop point at the Riak cluster; I'm just still hitting the RPC connection issue.

- *s3cmd*

s3cmd and Python boto both work fine, with .s3cfg and .botoconfig respectively pointing to Riak, so I know the connection from MapR to Riak works; it just doesn't work from Hadoop.
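For comparison, the proxy-style .s3cfg needed to point s3cmd at Riak CS is only a few lines; the sketch below follows the pattern from John's blog link, with the keys and host as placeholders rather than my literal config:

    [default]
    # Riak CS credentials (placeholders)
    access_key = <cs-access-key>
    secret_key = <cs-secret-key>
    # route requests through the Riak CS host instead of s3.amazonaws.com
    proxy_host = <riak-host>
    proxy_port = 8080
    use_https = False

If jets3t honored the equivalent httpclient.proxy-* settings it should be taking exactly the same network path, which is why I suspect my jets3t.properties simply isn't being picked up by MapR's Hadoop.

Any help is appreciated. Thanks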
On Thu, Jul 31, 2014 at 5:10 PM, Kota Uenishi <k...@basho.com> wrote:
> I played with Hadoop MapReduce on Riak CS, and it actually worked with
> the latest 1.5 beta package. Hadoop relies on jets3t for S3
> connectivity, so if MapR uses vanilla jets3t it will work. I believe so
> because MapR works on EMR (which usually extracts data from S3).
>
> Technically, you can add several options to jets3t.properties to point
> S3 endpoints at other S3-compatible cloud storage, mainly
> "s3service.s3-endpoint" and "s3service.s3-endpoint-http(s)-port". I put
> the properties file into the hadoop conf directory and it worked. Maybe
> there is similar config loading in MapR, too. [1] In this case, you
> should configure your CS to use your domain via cs_root_host in
> app.config. [2]
>
> If your Riak CS is not configured with your own domain, you can also
> configure MapReduce to use a proxy setting like this:
>
> httpclient.proxy-host=localhost
> httpclient.proxy-port=8080
>
> I usually use this configuration when I play locally. Put them into
> jets3t.properties.
>
> Note that 1.4.x CS won't work properly if the output file is on CS
> again - it doesn't have the copy API used in the final file copy after
> reduce. We have a 1.5 pre-release package internally and are testing
> it. Sooner or later it will be released.
>
> [1] https://jets3t.s3.amazonaws.com/toolkit/configuration.html
> [2] http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak-CS/
>
> On Fri, Aug 1, 2014 at 4:08 AM, John Daily <jda...@basho.com> wrote:
> > This blog post on configuring S3 clients to work with CS may be useful:
> > http://basho.com/riak-cs-proxy-vs-direct-configuration/
> >
> > Sent from my iPhone
> >
> > On Jul 31, 2014, at 2:53 PM, Andrew Stone <ast...@basho.com> wrote:
> >
> > Hi Charles,
> >
> > AFAIK we haven't ever tested Riak CS with the MapR connector. However,
> > if MapR works with S3, you should just have to change the IP to point
> > to a load balancer in front of your local Riak CS cluster. I'm unaware
> > of how to change that setting in MapR, though. It seems like a
> > question for them and not Basho.
> >
> > -Andrew
> >
> > On Wed, Jul 30, 2014 at 5:16 PM, Charles Shah <find.chuck...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I would like to use MapR with Riak CS for hadoop map reduce jobs. My
> >> code is currently referring to objects using s3n:// urls.
> >> I'd like to be able to have the hadoop code on MapR point to the Riak
> >> CS cluster using the s3 url.
> >> Is there a proxy or hostname setting in hadoop to be able to route
> >> the s3 url to the riak cs cluster?
> >>
> >> Thanks
>
> --
> Kota UENISHI / @kuenishi
> Basho Japan KK
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com