Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-23 Thread Hao Ren

Hi,

I am just wondering whether I can move data from FTP to HDFS via Hadoop distcp.


Can someone give me an example?

In my case, I always encounter the "can not access ftp" error.

I am quite sure that the link, login, and password are correct; actually, I have just copied and pasted the FTP address into Firefox, and it does work. However, it doesn't work with:

bin/hadoop fs -ls ftp://<username>:<password>@<host>/<path>
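
For reference, a distcp invocation of the kind being asked about might look like the following; the host, credentials, and paths are illustrative placeholders, not values from this thread:

bin/hadoop distcp ftp://user:password@ftp.example.com/incoming/ hdfs://namenode:9010/incoming/

distcp can only read the FTP source once a plain listing of the same ftp:// URI succeeds, so that is the first thing to verify.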

Any workaround here?

Thank you.

Hao

On 16/07/2013 17:47, Hao Ren wrote:

[...]


--
Hao Ren
ClaraVista
www.claravista.fr



--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Hao Ren

Hi,

Actually, I tested with my own FTP host at first; however, it didn't work.

Then I changed it to 0.0.0.0.

But I always get the "can not access ftp" message.

Thank you.

Hao.

On 16/07/2013 17:03, Ram wrote:

Hi,
Please replace 0.0.0.0 with your FTP host IP address and try it.

From,
Ramesh.
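
For instance, a listing against an explicit host rather than 0.0.0.0 would look something like this (203.0.113.10 and the credentials are documentation placeholders, not values from this thread):

hadoop fs -ls ftp://user:password@203.0.113.10:21/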




On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren <h@claravista.fr> wrote:


[...]

--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-15 Thread Hao Ren

Thank you, Ram

I have configured core-site.xml as follows:









<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/vol/persistent-hdfs</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>

  <property>
    <name>fs.ftp.host</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>fs.ftp.host.port</name>
    <value>21</value>
  </property>

</configuration>




Then I tried hadoop fs -ls file:///, and it works.
But hadoop fs -ls ftp://<username>:<password>@<ip>/<home_dir>/ doesn't work, as usual:

ls: Cannot access ftp://<username>:<password>@<ip>/<home_dir>/: No such file or directory.

When leaving out <home_dir>, as in:

hadoop fs -ls ftp://<username>:<password>@<ip>/

there are no error messages, but it lists nothing.


I have also checked the permissions on my /home/<user> directory:

drwxr-xr-x 11 <user> <group> 4096 Jul 11 16:30 <home_dir>

and all the files under /home/<user> have permissions 755.

I can easily paste the link ftp://<username>:<password>@<ip>/<home_dir>/ into Firefox, and it lists all the files as expected.


Any workaround here?

Thank you.

On 12/07/2013 14:01, Ram wrote:

Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://<host>/<path>  -- to display FTP files; if it lists the files, go for distcp.

Reference, from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml:

fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port
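
As a concrete illustration of that sequence, with placeholder credentials, address, and paths (none of them from this thread):

hadoop fs -ls file:///tmp
hadoop fs -ls ftp://user:password@203.0.113.10/data/
hadoop distcp ftp://user:password@203.0.113.10/data/ hdfs://namenode:9010/data/

The distcp step is only worth running once the ftp:// listing succeeds.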




--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Hao Ren

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

multiple copy jobs to hdfs


Thank you for your reply and the link.

I had read the link before, but I didn't find any examples of copying files from FTP to HDFS.


There are about 20-40 files in my directory. I just want to move or copy that directory to HDFS on Amazon EC2.


Actually, I am new to Hadoop. I would like to know how to run multiple copy jobs to HDFS without distcp.
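
One rough sketch of such a workaround, assuming the files are first mirrored from the FTP server to local disk (the host, credentials, and paths are illustrative assumptions, not anything prescribed in this thread):

# mirror the FTP directory to local disk first
wget -m --user=user --password=password ftp://ftp.example.com/data/

# then push the files into HDFS, four puts at a time
cd ftp.example.com/data && ls | xargs -P 4 -I{} hadoop fs -put {} /data/{}

This trades distcp's map tasks for plain client-side parallelism, so it only suits a small batch like the 20-40 files described above.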


Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr


copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread Hao Ren

Hi,

I am running HDFS on Amazon EC2.

Say I have an FTP server which stores some data.

I just want to copy this data directly to HDFS in a parallel way (which may be more efficient).


I think hadoop distcp is what I need.

But

$ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ 
hdfs://namenode/some/path


doesn't work.

13/07/05 16:13:46 INFO tools.DistCp: 
srcPaths=[ftp://username:passwd@hostname/some/path/]

13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input 
source ftp://username:passwd@hostname/some/path/ does not exist.

at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP URL into Chrome; the file really exists, and I can even download it.


And then I tried to list the files under the path with:

$ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

ls: Cannot access ftp://username:passwd@hostname/some/path/: No 
such file or directory.


That seems to be the same problem.
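
For comparison, a fully-specified source URI of the kind distcp expects would look something like the following; the host, port, credentials, and paths are placeholders rather than corrections of the command above:

$ bin/hadoop distcp ftp://username:passwd@hostname:21/some/path/ hdfs://namenode:9010/some/path

If hadoop fs -ls on the same ftp:// URI fails, distcp will fail too, so the listing is the first thing to get working.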

Any workaround here?

Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr