Hi Dave,

Thanks for your reply. Our Hadoop instance is inside our corporate LAN. Could you please provide some details on how I could use s3distcp from Amazon to transfer data from our on-premises Hadoop cluster to Amazon S3? Wouldn't some kind of VPN be needed between the Amazon EMR instance and our on-premises Hadoop instance? Or did you mean running the jar from Amazon on our local server?

Thanks

On Thu, Mar 28, 2013 at 3:56 AM, David Parks <[email protected]> wrote:

> Have you tried using s3distcp from Amazon? I used it many times to
> transfer 1.5 TB between S3 and Hadoop instances. The process took 45 min,
> well over the 10 min timeout period you're running into a problem on.
>
> Dave
>
> From: Himanish Kushary [mailto:[email protected]]
> Sent: Thursday, March 28, 2013 10:54 AM
> To: [email protected]
> Subject: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
>
> Hello,
>
> I am trying to transfer around 70 GB of files from HDFS to Amazon S3 using
> the distcp utility. There are around 2200 files distributed over 15
> directories. The max individual file size is approx 50 MB.
>
> The distcp mapreduce job keeps on failing with this error
>
> "Task attempt_201303211242_0260_m_000005_0 failed to report status for
> 600 seconds. Killing!"
>
> and in the task attempt logs I can see a lot of INFO messages like
>
> "INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception
> (java.io.IOException) caught when processing request: Resetting to invalid
> mark"
>
> As a workaround, I am thinking of either transferring individual folders
> instead of the entire 70 GB at once, or increasing the
> "mapred.task.timeout" parameter to something like 6-7 hours (as the avg
> rate of transfer to S3 seems to be 5 MB/s). Is there any better
> option to increase the throughput for transferring bulk data from HDFS to
> S3? Looking forward to suggestions.
>
> --
> Thanks & Regards
> Himanish

--
Thanks & Regards
Himanish
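
For reference, the arithmetic behind the timeout figure in the original message can be sketched roughly as below. It only uses the numbers quoted in the thread (70 GB total, 5 MB/s, 50 MB largest file, 600-second timeout). The assumptions that 1 GB = 1024 MB and that the 5 MB/s rate applies to the whole job as one stream are mine, and the helper names are hypothetical.

```python
# Back-of-the-envelope estimate from the figures quoted in the thread.
# Assumptions (not stated in the thread): 1 GB = 1024 MB, and the observed
# 5 MB/s is the aggregate rate for the whole copy.

TOTAL_GB = 70        # total data to copy from HDFS to S3
RATE_MB_PER_S = 5    # average transfer rate reported in the thread
LARGEST_FILE_MB = 50 # approximate size of the largest single file


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb at a constant rate, in seconds."""
    return size_mb / rate_mb_per_s


def hours_to_timeout_ms(hours: float) -> int:
    """mapred.task.timeout is given in milliseconds (600000 ms = 600 s)."""
    return int(hours * 3600 * 1000)


total_s = transfer_seconds(TOTAL_GB * 1024, RATE_MB_PER_S)
largest_file_s = transfer_seconds(LARGEST_FILE_MB, RATE_MB_PER_S)

print(f"whole copy at this rate: {total_s / 3600:.1f} hours")
print(f"largest single file:     {largest_file_s:.0f} seconds")
print(f"7-hour timeout value:    {hours_to_timeout_ms(7)} ms")
```

Under these assumptions the largest file alone would upload well inside the 600-second window, so the timeout may have more to do with uploads that stall or are retried (the "Resetting to invalid mark" messages) than with raw transfer time.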
