Hi all, I'm new to Hadoop and am posting my first message on this list. I have downloaded and installed the hadoop_1.1.1-1_x86_64.deb distro and have a couple of issues which are blocking me from progressing.
I'm working through the book 'Hadoop: The Definitive Guide' and am trying to set up a test VM in pseudodistributed mode using the RPM. The examples in the book allude to (although I don't think they explicitly state) having a single user for everything, and creating a passwordless private/public key pair so that this user can ssh to localhost to control things. I'm guessing this is because the book uses the .zip distribution, which doesn't create any users and therefore assumes everything runs as an existing, locally logged-on user. I notice, however, that the RPM creates 2 users: mapred and hdfs. As a result I'm a bit unclear about the following:

1: Does it matter which user I log in as to perform various actions? E.g. if I want to run start-dfs.sh, should I be logged in as 'hdfs'? I did try running start-dfs.sh as root, thinking it might drop down to hdfs using a RUN_AS user (like most init.d scripts do), but it didn't work like that. Is there any documentation covering which users should be used to do what when running the RPM distribution?

2: Whilst the RPM creates the hdfs user and specifies /var/lib/hadoop/hdfs as its home directory, it doesn't actually create this directory. This results in an error when logging in as that user. Is this normal?

3: How should I set up ssh keys between the 2 users? Should each user's public key be in the authorized_keys file of the other user (i.e. is communication between the 2 processes bi-directional), or would something simpler suffice?

I hope these questions are clear enough to advise on. Please don't hesitate to ask for more info if there's something I've left out.

Cheers,

Edd

--
Web: http://www.eddgrant.com
Email: [email protected]
Mobile: +44 (0) 7861 394 543
