Hey Keith, I am pretty new to Whirr myself, but I think what you are seeing is a configuration issue.
What you are seeing is called local (or standalone) mode in Hadoop: http://hadoop.apache.org/docs/r0.20.2/quickstart.html#Local

In local mode there are no HDFS daemons at all; "hadoop fs" commands operate directly on the local file system. That is why "hadoop fs -ls /" mirrors "ls /" and why directories created outside HDFS are visible "inside" it. I suspect it also explains your proxy failure: with no namenode running, nothing is listening on port 8020 for the SSH tunnel to connect to. The mkdir failure is probably a side effect of the same thing, since you are really doing a mkdir on the local root file system, which your login user is not allowed to write to.

You will probably want to configure your cluster as a pseudo-distributed cluster (if you are using one node) or a regular distributed cluster (if you are using multiple nodes) for a closer-to-real-world scenario.
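You can confirm which mode the client is in by checking fs.default.name in the config Whirr generated. The path below assumes the ~/.whirr layout from your mail, and hadoop-site.xml is the file your deprecation warning mentions, so this is just a sketch of what to look for:

  $ grep -A 1 'fs.default.name' ~/.whirr/hadoop-from-laptop/hadoop-site.xml
    <name>fs.default.name</name>
    <value>hdfs://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8020</value>

If the value is file:/// instead of an hdfs:// URL (or the property is missing entirely, in which case file:/// is the default), then every "hadoop fs" command is just walking the local disk.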
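For what it's worth, the Whirr recipes control this through the instance templates in hadoop.properties. Something along these lines worked for me; the cluster name, instance counts, and key paths are placeholders from my own setup, so adjust to taste:

  whirr.cluster-name=myhadoopcluster
  whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
  whirr.provider=aws-ec2
  whirr.identity=${env:AWS_ACCESS_KEY_ID}
  whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
  whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
  whirr.public-key-file=${whirr.private-key-file}.pub

With the hadoop-namenode and hadoop-datanode roles in place, Whirr should start real HDFS daemons and write an hdfs:// URL into the generated client config instead of leaving you on the local file system.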
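On the S3 question, I have not tried it through Whirr myself, but two things stand out. First, fs.default.name and the S3 keys belong in core-site.xml, and if I am reading the Whirr docs right, properties bound for core-site.xml take the hadoop-common. prefix rather than hadoop-hdfs., so your lines would become something like:

  hadoop-common.fs.default.name=s3n://somebucket
  hadoop-common.fs.s3n.awsAccessKeyId=${env:AWS_ACCESS_KEY_ID}
  hadoop-common.fs.s3n.awsSecretAccessKey=${env:AWS_SECRET_ACCESS_KEY}

Second, s3n:// (the "native" S3 filesystem, which stores ordinary S3 objects) is usually what people want rather than the block-based s3:// filesystem. Either way, note that this replaces HDFS as the default filesystem rather than putting HDFS itself "on" S3.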
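As for "su hadoop": system accounts like hadoop typically have no password set, so su will never accept anything you type. Assuming your login user has sudo rights (which Whirr-built instances usually grant, though I can't swear to it), you can switch users or run one-off commands like this:

  $ sudo su - hadoop
  $ sudo -u hadoop hadoop fs -ls /

That said, once the cluster is genuinely running in distributed mode, you should not need to become the hadoop user just to submit a job.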
Mark

On Mon, Feb 11, 2013 at 3:54 PM, Keith Wiley <[email protected]> wrote:
> I'm very confused by what I see when I use whirr to deploy a cluster. For
> example, the HDFS directory clearly mirrors the non-HDFS file system from
> the top dir, which is highly unconventional for hadoop, meaning that "$ ls
> /" shows the same thing as "$ hadoop fs -ls /":
>
> $ hadoop fs -ls /
> Found 25 items
> drwxr-xr-x - root root 4096 2010-02-24 01:35 /bin
> drwxr-xr-x - root root 4096 2010-02-24 01:40 /boot
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /data
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /data0
> drwxr-xr-x - root root 12900 2013-02-11 23:14 /dev
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /etc
> drwxr-xr-x - root root 4096 2013-02-11 23:15 /home
> -rw-r--r-- 1 root root 6763173 2010-02-24 01:40 /initrd.img
> -rw-r--r-- 1 root root 3689712 2010-02-24 01:36 /initrd.img.old
> drwxr-xr-x - root root 12288 2010-02-24 01:40 /lib
> drwx------ - root root 16384 2010-02-24 01:28 /lost+found
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /media
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /mnt
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /opt
> dr-xr-xr-x - root root 0 2013-02-11 23:14 /proc
> drwx------ - root root 4096 2013-02-11 23:14 /root
> drwxr-xr-x - root root 4096 2010-02-24 01:40 /sbin
> drwxr-xr-x - root root 4096 2009-12-05 21:55 /selinux
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /srv
> drwxr-xr-x - root root 0 2013-02-11 23:14 /sys
> drwxrwxrwt - root root 4096 2013-02-11 23:20 /tmp
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /usr
> drwxr-xr-x - root root 4096 2010-02-24 01:36 /var
> -rw-r--r-- 1 root root 3089086 2010-02-06 20:26 /vmlinuz
> -rw-r--r-- 1 root root 4252096 2010-02-20 10:31 /vmlinuz.old
> $
>
> Likewise, if I create a directory outside HDFS, I then see it from within
> HDFS, so they really are looking at the same file system. That's not how
> HDFS is usually configured.
>
> In addition, I can't actually operate within HDFS at all; I get an error
> as shown here:
> $ hadoop fs -mkdir /testdir
> mkdir: `/testdir': Input/output error
>
> Even if I can straighten out these seemingly first-step issues, I also
> don't understand how to tell whirr to put HDFS on S3. I tried putting the
> following in hadoop.properties but I don't think it has any effect:
>
> hadoop-hdfs.fs.default.name=s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY_esc}@somebucket
> OR...
> hadoop-hdfs.fs.default.name=s3://somebucket
> hadoop-hdfs.fs.s3.awsAccessKeyId=${AWS_ACCESS_KEY_ID}
> hadoop-hdfs.fs.s3.awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY_esc}
>
> I'm also not sure how to "su hadoop"; it asks for a password but I don't
> know what that would be. When I ssh in, of course, it uses the account name
> from my computer (since that's the ssh command that whirr directly provides
> as it wraps up cluster deployment), but presumably to actually run a
> MapReduce job from the namenode I need to switch to the hadoop user, right
> (hmmm, is this why I couldn't create a directory within hadoop, as shown
> above)?
>
> Incidentally, I also can't operate from my own machine because I can't get
> the proxy to connect either. It may have something to do with our
> corporate firewall, I'm not sure. For example, I get this:
>
> $ export HADOOP_CONF_DIR=~/.whirr/hadoop-from-laptop/
> $ hadoop fs -ls /
> 2013-02-11 15:34:07,767 WARN conf.Configuration
> (Configuration.java:<clinit>(477)) - DEPRECATED: hadoop-site.xml found in
> the classpath. Usage of hadoop-site.xml is deprecated. Instead use
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2013-02-11 15:34:08.337 java[8291:1203] Unable to load realm info from
> SCDynamicStore
> 2013-02-11 15:34:08.408 java[8291:1203] Unable to load realm info from
> SCDynamicStore
> ls: Failed on local exception: java.net.SocketException: Malformed reply
> from SOCKS server; Host Details : local host is: "MyMachine.local/[ip-1]";
> destination host is: "ec2-[ip-2].compute-1.amazonaws.com":8020;
> ~/ $
>
> ...while the proxy shell produces this error:
>
> $ .whirr/hadoop-from-laptop/hadoop-proxy.sh
> Running proxy to Hadoop cluster at
> ec2-54-234-185-62.compute-1.amazonaws.com. Use Ctrl-c to quit.
> Warning: Permanently added '54.234.185.62' (RSA) to the list of known
> hosts.
> channel 2: open failed: connect failed: Connection refused
>
> Sooooooooo, I really don't understand what I'm seeing here: the HDFS
> directories don't look like a normal Hadoop cluster, they mirror the actual
> file system; I can't create directories within HDFS; I can't tell whirr to
> put HDFS on S3; and I can't use the proxy to interact with HDFS from my
> local machine. In fact, the ONLY thing I've managed to do so far is create
> the cluster in the first place.
>
> This isn't working out very well so far. Where do I go from here?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     [email protected]     keithwiley.com
> music.keithwiley.com
>
> "I used to be with it, but then they changed what it was. Now, what I'm
> with isn't it, and what's it seems weird and scary to me."
>    -- Abe (Grandpa) Simpson
> ________________________________________________________________________________
