Hi.

I'm trying to load data from a Hadoop cluster into Riak CS using distcp.
Distcp supports the S3 API, but I'm running into issues.
Has anyone tested or had success with this process?  Any help is appreciated!
Details below.

Thanks,
Dan


Here's my setup:
Hadoop cluster with a small text file in hdfs.
Jets3t.properties file configured to use a proxy host.
Proxy host running Varnish, basically serving as a load balancer at this
point.  All caching is currently disabled.
Riak-CS/Riak running on a 6 server cluster.
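
For readers unfamiliar with the proxy part of this setup, a jets3t.properties
for it might look roughly like the sketch below. This is not the actual file
used here: the host name and port are placeholders, and the property names
are the ones JetS3t documents as far as I know, so they should be checked
against the JetS3t version bundled with your Hadoop release.

```properties
# Hypothetical values throughout; not the configuration from this cluster.
# Route S3 requests through the proxy host instead of going to Amazon S3.
httpclient.proxy-autodetect=false
httpclient.proxy-host=proxy.example.com
httpclient.proxy-port=8080
# Use plain HTTP (assumes the proxy does not terminate TLS).
s3service.https-only=false
# Use path-style bucket addressing rather than bucket-name DNS lookups.
s3service.disable-dns-buckets=true
```

The file then has to be visible on the classpath of the map tasks, which is
presumably what passing a jar via -libjars is meant to achieve.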

Here's the scenario:
I'm running this command...


    hadoop distcp -libjars ./jets3t-config.jar \
        hdfs://hadoop.node.address/user/dan/test.txt \
        s3n://riak-user-key:riak-secret@testing/

I see many requests and responses in the varnishlog so I know communication is 
succeeding.  The distcp process throws an exception and I see empty files and 
directories left on my Riak system.

The exception looks like this:

13/07/08 15:52:42 INFO tools.DistCp: sourcePathsCount=1
13/07/08 15:52:42 INFO tools.DistCp: filesToCopyCount=1
13/07/08 15:52:42 INFO tools.DistCp: bytesToCopyCount=93.0
13/07/08 15:52:42 INFO mapred.JobClient: Running job: job_201307031542_0023
13/07/08 15:52:43 INFO mapred.JobClient:  map 0% reduce 0%
13/07/08 15:52:50 INFO mapred.JobClient: Job complete: job_201307031542_0023
13/07/08 15:52:50 INFO mapred.JobClient: Counters: 6
13/07/08 15:52:50 INFO mapred.JobClient:   Job Counters
13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6980
13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/08 15:52:50 INFO mapred.JobClient:     Launched map tasks=1
13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/07/08 15:52:50 INFO mapred.JobClient:     Failed map tasks=1
13/07/08 15:52:50 INFO mapred.JobClient: Job Failed: NA
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1246)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)


And Riak is left like this:

    s3cmd ls s3://testing
                       DIR   s3://testing/_distcp_logs_7bhq3z/
                       DIR   s3://testing/_distcp_logs_j9re3f/
2013-07-09 03:54         0   s3://testing/_distcp_logs_7bhq3z_$folder$
2013-07-09 00:03         0   s3://testing/_distcp_logs_j9re3f_$folder$
2013-07-08 19:52         0   s3://testing/test.txt


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
