Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Andrey Pankov

Hi,

Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script?
It already contains such a part:


echo Copying private key to slaves
for slave in `cat slaves`; do
  scp $SSH_OPTS $PRIVATE_KEY_PATH "root@$slave:/root/.ssh/id_rsa"
  ssh $SSH_OPTS "root@$slave" "chmod 600 /root/.ssh/id_rsa"
  sleep 1
done

Anyway, have you tried the hadoop-ec2 script? It works well for the
task you described.
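
For example, a minimal sketch of launching a cluster with it (the exact
subcommand names vary between releases, so treat the line below as an
assumption and check the usage the script prints when run without
arguments):

# edit bin/hadoop-ec2-env.sh first: AWS credentials, key pair, AMI,
# and cluster size
cd hadoop-0.16.0/src/contrib/ec2
bin/hadoop-ec2 launch-cluster   # subcommand name is an assumption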



Prasan Ary wrote:

Hi All,
  I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus instances). It seems that I have to copy the EC2 private key
to all the machines in the cluster so that they can have SSH
connections.
  For now it seems I have to run a script to copy the key file to each
of the EC2 instances. I wanted to know if there is a better way to
accomplish this.
   
  Thanks,

  PA

   


---
Andrey Pankov


Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Tom White
Yes, this isn't ideal for larger clusters. There's a JIRA issue to
address this: https://issues.apache.org/jira/browse/HADOOP-2410.

Tom

On 20/03/2008, Prasan Ary [EMAIL PROTECTED] wrote:
 Hi All,
   I have been trying to configure Hadoop on EC2 for a large cluster
 (100-plus instances). It seems that I have to copy the EC2 private key
 to all the machines in the cluster so that they can have SSH
 connections.
   For now it seems I have to run a script to copy the key file to each
 of the EC2 instances. I wanted to know if there is a better way to
 accomplish this.

   Thanks,

   PA





-- 
Blog: http://www.lexemetech.com/


Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Andreas Kostyrka
Actually, I personally use the following two-part copy technique to
copy files to a cluster of boxes:

tar cf - myfile | dsh -f host-list-file -i -c -M tar xCfv /tmp -

The first tar packages myfile into a tar stream on stdout.

dsh runs a tar on each host that unpacks the stream (in the above case,
every box listed in host-list-file ends up with /tmp/myfile after the
command).

Tar options that are relevant include C (chdir) and v (verbose, can be
given twice) so you see what got copied.

dsh options that are relevant:
-i copy stdin to all ssh processes, requires -c
-c do the ssh calls concurrently.
-M prefix the out from the ssh with the hostname.

While this is not rsync, it has the benefit of running on all hosts
concurrently, and it is quite flexible.
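
Applied to the task in this thread, the same trick can push the private
key out in one shot. A sketch, assuming dsh is installed, a slaves file
listing the hosts, and the key in the current directory:

# stream the key into /root/.ssh on every host in slaves, concurrently
tar cf - id_rsa | dsh -f slaves -i -c -M tar xCfv /root/.ssh -
# ssh refuses group/world-readable keys, so tighten permissions too
dsh -f slaves -c -M chmod 600 /root/.ssh/id_rsa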

Andreas

On Thursday, 2008-03-20, at 19:57 +0200, Andrey Pankov wrote:
 Hi,
 
 Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script?
 It already contains such a part:
 
 echo Copying private key to slaves
 for slave in `cat slaves`; do
    scp $SSH_OPTS $PRIVATE_KEY_PATH "root@$slave:/root/.ssh/id_rsa"
    ssh $SSH_OPTS "root@$slave" "chmod 600 /root/.ssh/id_rsa"
sleep 1
 done
 
 Anyway, have you tried the hadoop-ec2 script? It works well for the
 task you described.
 
 
 Prasan Ary wrote:
  Hi All,
    I have been trying to configure Hadoop on EC2 for a large cluster
  (100-plus instances). It seems that I have to copy the EC2 private
  key to all the machines in the cluster so that they can have SSH
  connections.
    For now it seems I have to run a script to copy the key file to
  each of the EC2 instances. I wanted to know if there is a better way
  to accomplish this.
 
Thanks,
PA
  
 
 
 ---
 Andrey Pankov




Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Chris K Wensel

you can't do this with the contrib/ec2 scripts/ami.

but passing the master's private DNS name to the slaves on boot as
'user-data' works fine. when a slave starts, it contacts the master and
joins the cluster. there is no need for a slave to rsync from the
master, which removes the dependency on the slaves having the private
key. and by not using the start|stop-all scripts, you don't need to
maintain the slaves file, and can thus lazily boot your cluster.


to do this, you will need to create your own AMI that works this way.  
not hard, just time consuming.
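
the moving parts are small. a sketch of the idea (the EC2 API tools are
assumed; $AMI_ID, $MASTER_HOST and the key pair name are placeholders):

# at launch: hand the master's private DNS name to each slave as user-data
ec2-run-instances $AMI_ID -n 20 -k gsg-keypair -d "$MASTER_HOST"

# in the slave AMI's boot script: read the user-data back from the
# instance metadata service and point the datanode/tasktracker at it
MASTER_HOST=`wget -q -O - http://169.254.169.254/latest/user-data`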


On Mar 20, 2008, at 11:56 AM, Prasan Ary wrote:

Chris,
 What do you mean when you say boot the slaves with the master private
name?



 ===

Chris K Wensel [EMAIL PROTECTED] wrote:
 I found it much better to start the master first, then boot the slaves
with the master private name.

i do not use the start|stop-all scripts, so i do not need to maintain
the slaves file. thus i don't need to push private keys around to
support those scripts.

this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going on.


also, setting up FoxyProxy in firefox lets you browse your whole
cluster if you set up an ssh tunnel (SOCKS).
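
e.g. a sketch (the port number is arbitrary):

# open a SOCKS proxy on local port 6666 through the master, then
# point FoxyProxy at localhost:6666
ssh -i $PRIVATE_KEY_PATH -D 6666 root@$MASTER_HOST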

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:

Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus instances). It seems that I have to copy the EC2 private key
to all the machines in the cluster so that they can have SSH
connections.
For now it seems I have to run a script to copy the key file to
each of the EC2 instances. I wanted to know if there is a better way
to accomplish this.

Thanks,
PA




Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/








Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/