Re: copy files from ftp to hdfs in parallel, distcp failed
Hi, can you help me solve this problem please, if you have solved it? Best regards, Shlash
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am just wondering whether I can move data from FTP to HDFS via hadoop distcp. Can someone give me an example? In my case, I always encounter the "cannot access ftp" error. I am quite sure that the link, login and password are correct; in fact, I just copied and pasted the FTP address into Firefox, and it does work there. However, it doesn't work with:

  bin/hadoop fs -ls ftp://my ftp location

Any workaround here? Thank you.

Hao

On 16/07/2013 17:47, Hao Ren wrote:
> Hi, actually I tested with my own FTP host at first, but it didn't work. Then I changed it to 0.0.0.0, and I always get the "cannot access ftp" message.

On 16/07/2013 17:03, Ram wrote:
> Hi, please replace 0.0.0.0 with your FTP host IP address and try it.

--
Hao Ren
ClaraVista
www.claravista.fr
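For reference, a distcp invocation of the shape being asked about might look like the sketch below; the FTP host, credentials, and paths are placeholders (the HDFS URI reuses the namenode address from the core-site.xml later in this thread), and it can only work once "hadoop fs -ls" succeeds on the same ftp:// URI:

  hadoop distcp "ftp://user:password@ftp.example.com/home/user/data/" \
      "hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010/data/"

Quoting the URIs keeps the shell from interpreting any special characters in the password.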
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

Please replace 0.0.0.0 with your FTP host IP address and try it.

From,
Ramesh.

On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren <h@claravista.fr> wrote:
> Thank you, Ram. I have configured core-site.xml as described, and "hadoop fs -ls file:///" works, but listing the FTP location still fails with "No such file or directory".
>
> --
> Hao Ren
> ClaraVista
> www.claravista.fr
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

Actually, I tested with my own FTP host at first, but it didn't work. Then I changed it to 0.0.0.0, and I still get the "cannot access ftp" message.

Thank you.

Hao.

On 16/07/2013 17:03, Ram wrote:
> Hi, please replace 0.0.0.0 with your FTP host IP address and try it.

--
Hao Ren
ClaraVista
www.claravista.fr
Re: copy files from ftp to hdfs in parallel, distcp failed
Thank you, Ram

I have configured core-site.xml as follows:

  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <!-- Put site-specific property overrides in this file. -->
  <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/vol/persistent-hdfs</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>65536</value>
    </property>
    <property>
      <name>fs.ftp.host</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>fs.ftp.host.port</name>
      <value>21</value>
    </property>
  </configuration>

Then I tried "hadoop fs -ls file:///", and it works. But

  hadoop fs -ls ftp://login:password@ftp server ip/directory/

doesn't work, as usual:

  ls: Cannot access ftp://user:password@ftp server ip/directory/: No such file or directory.

When omitting the directory, as in

  hadoop fs -ls ftp://login:password@ftp server ip/

there are no error messages, but it lists nothing.

I have also checked the permissions on my /home/user directory:

  drwxr-xr-x 11 user user 4096 Jul 11 16:30 user

and all the files under /home/user have mode 755. I can easily paste the link ftp://user:password@ftp server ip/directory/ into Firefox, and it lists all the files as expected.

Any workaround here? Thank you.

On 12/07/2013 14:01, Ram wrote:
> Please configure the following in core-site.xml and try:
> use "hadoop fs -ls file:///" to display local file system files;
> use "hadoop fs -ls ftp://your ftp location" to display FTP files.
> If it is listing files, go for distcp.

--
Hao Ren
ClaraVista
www.claravista.fr
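One thing worth checking here, offered as a guess: Hadoop's FTPFileSystem resolves the path of an ftp:// URI against the server's root rather than the login user's home directory, so a listing that works in a browser (which starts in the home directory) may need the absolute server-side path. A minimal test sketch, with host, user, password, and path as placeholders:

  # Try the absolute server-side path instead of a home-relative one
  hadoop fs -ls "ftp://user:password@ftp.example.com/home/user/directory/"

Quoting the URI also protects special characters in the password from the shell; characters such as "/" or "@" inside the password itself would additionally need URI-encoding.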
Re: copy files from ftp to hdfs in parallel, distcp failed
On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:
> multiple copy jobs to hdfs

Thank you for your reply and the link. I had read the link before, but I didn't find any examples of copying files from FTP to HDFS. There are about 20-40 files in my directory. I just want to move or copy that directory to HDFS on Amazon EC2.

Actually, I am new to Hadoop. I would like to know how to do multiple copy jobs to HDFS without distcp.

Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr
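One way to run "multiple copy jobs without distcp" is to stream each FTP file into HDFS and run several transfers at once. A rough sketch, assuming wget is available on an EC2 node, with placeholder host, credentials, and paths (list.txt holds one file name per line):

  # Fetch each file over FTP and pipe it straight into HDFS, 4 transfers at a time
  cat list.txt | xargs -P4 -I{} sh -c \
    'wget -qO- "ftp://user:password@ftp.example.com/directory/{}" | hadoop fs -put - "/some/path/{}"'

"hadoop fs -put -" reads from standard input, so nothing is staged on local disk.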
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

Please configure the following in core-site.xml and try:

Use "hadoop fs -ls file:///" to display local file system files.
Use "hadoop fs -ls ftp://your ftp location" to display FTP files.
If it is listing files, go for distcp.

Reference, from http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml :

  fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
  fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port

Also try setting these properties. For reference, from the Hadoop Definitive Guide's table of Hadoop filesystems:

  Filesystem  URI scheme  Java implementation (under org.apache.hadoop)  Description
  FTP         ftp         fs.ftp.FTPFileSystem                           A filesystem backed by an FTP server.

From,
Ramesh.

On Fri, Jul 12, 2013 at 1:04 PM, Hao Ren <h@claravista.fr> wrote:
> Thank you for your reply and the link. I had read the link before, but I didn't find any examples of copying files from FTP to HDFS. There are about 20-40 files in my directory. I would like to know how to do multiple copy jobs to HDFS without distcp.
>
> --
> Hao Ren
> ClaraVista
> www.claravista.fr
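These properties do not have to live in core-site.xml: FsShell and distcp both run through ToolRunner, so generic -D options should be accepted on the command line. A minimal sketch, with a placeholder FTP host and path:

  hadoop fs -D fs.ftp.host=ftp.example.com -D fs.ftp.host.port=21 \
      -ls "ftp://user:password@ftp.example.com/some/path/"

When the URI itself carries the host and credentials, as here, the -D values are mostly a fallback; the authority in the URI normally takes precedence.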
copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am running HDFS on Amazon EC2. Say I have an FTP server which stores some data. I just want to copy these data directly to HDFS in a parallel way (which may be more efficient). I think hadoop distcp is what I need. But

  $ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ hdfs://namenode/some/path

doesn't work:

  13/07/05 16:13:46 INFO tools.DistCp: srcPaths=[ftp://username:passwd@hostname/some/path/]
  13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
  Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source ftp://username:passwd@hostname/some/path/ does not exist.
          at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
          at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
          at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP path into Chrome: the file really exists, and I can even download it. I then tried to list the files under the path with:

  $ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

  ls: Cannot access ftp://username:passwd@hostname/some/path/: No such file or directory.

That seems to be the same problem. Any workaround here? Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr
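The stack trace points at checkSrcPath, i.e. the FTPFileSystem could not even stat the source path; the copy itself never starts, which is consistent with the failing "-ls". One possible culprit, offered as a guess rather than a diagnosis: the FTP data connection that a directory listing needs is easily blocked by EC2 security groups or NAT, even when the control connection works. A quick way to test outside Hadoop, with placeholder credentials:

  # curl uses passive mode for FTP by default; a listing that works here but
  # not in Hadoop suggests a data-connection or path-resolution issue
  curl "ftp://username:passwd@hostname/some/path/"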
Re: copy files from ftp to hdfs in parallel, distcp failed
On 11 July 2013 06:27, Hao Ren <h@claravista.fr> wrote:
> Hi, I am running HDFS on Amazon EC2. Say I have an FTP server which stores some data. I just want to copy these data directly to HDFS in a parallel way (which may be more efficient). I think hadoop distcp is what I need.

http://hadoop.apache.org/docs/stable/distcp.html

"DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting."

I doubt this is going to help. Are there a lot of files? If yes, how about multiple copy jobs to HDFS?

-balaji