Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am just wondering whether I can move data from FTP to HDFS via Hadoop distcp. Can someone give me an example? In my case, I always get the "cannot access ftp" error. I am quite sure the link, login, and password are correct; in fact, I copied and pasted the FTP address into Firefox, and it works there. However, it does not work with:

    bin/hadoop fs -ls ftp://<username>:<passwd>@<host>/<path>/

Any workaround here? Thank you.

Hao

--
Hao Ren
ClaraVista
www.claravista.fr
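For concreteness, the sequence being attempted is roughly the following; a minimal sketch, assuming fs.ftp.host and fs.ftp.host.port are set in core-site.xml, where user, secret, ftp.example.com, and namenode are placeholder values:

    # sanity check: can Hadoop list the FTP directory at all?
    bin/hadoop fs -ls ftp://user:secret@ftp.example.com/some/path/

    # only if the listing works, run the parallel copy
    bin/hadoop distcp ftp://user:secret@ftp.example.com/some/path/ \
        hdfs://namenode/some/path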
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

Actually, I tested with my own FTP host at first, but it didn't work. Then I changed it to 0.0.0.0, and I still get the "cannot access ftp" message.

Thank you.

Hao

On 16/07/2013 17:03, Ram wrote:
> Hi,
> Please replace 0.0.0.0 with your FTP host's IP address and try it.
> From,
> Ramesh.

--
Hao Ren
ClaraVista
www.claravista.fr
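Ram's suggestion, written out as a core-site.xml fragment (203.0.113.10 is a placeholder for the FTP server's real address):

    <property>
      <name>fs.ftp.host</name>
      <!-- placeholder: your FTP server's actual IP, not 0.0.0.0 -->
      <value>203.0.113.10</value>
    </property>
    <property>
      <name>fs.ftp.host.port</name>
      <value>21</value>
    </property>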
Re: copy files from ftp to hdfs in parallel, distcp failed
Thank you, Ram.

I have configured core-site.xml as follows:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/vol/persistent-hdfs</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>65536</value>
    </property>
    <property>
      <name>fs.ftp.host</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>fs.ftp.host.port</name>
      <value>21</value>
    </property>

Then I tried hadoop fs -ls file:/// and it works. But hadoop fs -ls ftp://<username>:<passwd>@<ip>/<dir>/ doesn't work as usual:

    ls: Cannot access ftp://<username>:<passwd>@<ip>/<dir>/: No such file or directory.

When omitting the directory, as in hadoop fs -ls ftp://<username>:<passwd>@<ip>/, there are no error messages, but it lists nothing.

I have also checked the rights on my /home/<user> directory:

    drwxr-xr-x 11 <user> <group> 4096 Jul 11 16:30

and all the files under /home/<user> have rights 755. I can easily paste the link ftp://<username>:<passwd>@<ip>/<dir>/ into Firefox, and it lists all the files as expected.

Any workaround here? Thank you.

On 12/07/2013 14:01, Ram wrote:
> Please configure the following in core-site.xml and try.
> Use hadoop fs -ls file:/// to display local file system files.
> Use hadoop fs -ls ftp://<host> to display FTP files.
> If it is listing files, go for distcp.
> Reference: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
>   fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
>   fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port

--
Hao Ren
ClaraVista
www.claravista.fr
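If embedding the user and password in the URI is part of the problem, the FTP filesystem can also pick up per-host credentials from the configuration. This is an assumption based on the keys org.apache.hadoop.fs.ftp.FTPFileSystem reads; ftp.example.com, myuser, and mypassword are placeholders:

    <!-- assumed keys: fs.ftp.user.<host> and fs.ftp.password.<host> -->
    <property>
      <name>fs.ftp.user.ftp.example.com</name>
      <value>myuser</value>
    </property>
    <property>
      <name>fs.ftp.password.ftp.example.com</name>
      <value>mypassword</value>
    </property>

With these set, hadoop fs -ls ftp://ftp.example.com/some/path/ should work without credentials in the URI.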
Re: copy files from ftp to hdfs in parallel, distcp failed
On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:
> multiple copy jobs to hdfs

Thank you for your reply and the link. I read the link before, but I didn't find any examples of copying files from FTP to HDFS. There are about 20-40 files in my directory. I just want to move or copy that directory to HDFS on Amazon EC2. Actually, I am new to Hadoop. I would like to know how to do multiple copy jobs to HDFS without distcp.

Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr
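One way to run "multiple copy jobs" without distcp is a plain shell loop that streams each FTP file into HDFS in the background. A minimal, untested sketch, assuming curl is available on the node; user, secret, ftp.example.com, and both paths are placeholders:

    # List the remote directory (curl -l prints file names only),
    # then stream each file into HDFS; "hadoop fs -put -" reads stdin.
    SRC="ftp://user:secret@ftp.example.com/some/path"
    for f in $(curl -s -l "$SRC/"); do
        curl -s "$SRC/$f" | bin/hadoop fs -put - "/some/path/$f" &
    done
    wait   # block until all background transfers finish

With only 20-40 files, launching them all at once is probably tolerable; if the FTP server caps concurrent connections, throttle the loop.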
copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am running HDFS on Amazon EC2. Say I have an FTP server that stores some data. I just want to copy this data directly to HDFS in a parallel way (which may be more efficient). I think hadoop distcp is what I need. But

    $ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ hdfs://namenode/some/path

doesn't work:

    13/07/05 16:13:46 INFO tools.DistCp: srcPaths=[ftp://username:passwd@hostname/some/path/]
    13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
    Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source ftp://username:passwd@hostname/some/path/ does not exist.
            at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
            at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
            at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
            at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP URL into Chrome; the file really exists, and I can even download it. Then I tried to list the files under the path with:

    $ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

    ls: Cannot access ftp://username:passwd@hostname/some/path/: No such file or directory.

That seems to be the same problem. Any workaround here? Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr
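One thing worth ruling out first: a browser on a desktop reaching the FTP server does not prove that the EC2 node can. Testing the same URL with curl from the machine where the hadoop command runs would separate a network problem from a Hadoop one (same placeholder credentials as above):

    # Run on the EC2 node itself. If this listing fails, the issue is
    # reachability (security groups, passive-mode FTP ports), not Hadoop.
    curl -v "ftp://username:passwd@hostname/some/path/"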