Check your fs.default.name setting in core-site.xml; it is probably set
to file:///. The value of this property is prepended to any path that
has no URI scheme when the path is resolved by "hadoop fs -ls ..." and
by job inputs/outputs.
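On a real cluster the property points at the namenode instead; a
minimal sketch, assuming a made-up hostname and the usual port:

    <!-- core-site.xml: namenode.example.com is a placeholder -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:8020/</value>
    </property>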
This "feature" enables local mode (single process) and pseudo or full
cluster modes to work using the same absolute paths without protocol://
in test scripts. I think I used it once in 6 years, but someone out
there might rely on it.
The above means that when using a real cluster,
hadoop fs -ls file:///
will indeed show the local root filesystem.
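To make the resolution concrete (the namenode hostname below is made up):

    # With fs.default.name=file:///, this lists the local directory /user/foo:
    hadoop fs -ls /user/foo
    # With fs.default.name=hdfs://namenode.example.com:8020/, the very same
    # command lists hdfs://namenode.example.com:8020/user/foo instead.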
Paul
On 2013-02-11 20:40, Mark Grover wrote:
Hey Keith,
I am pretty new to Whirr myself, but I think what you are seeing is a
configuration issue.
What you are seeing is called local (or standalone) mode in Hadoop:
http://hadoop.apache.org/docs/r0.20.2/quickstart.html#Local
You will probably want to configure your cluster as a
pseudo-distributed cluster (if you are using one node) or a regular
distributed cluster (if you are using multiple nodes) for a scenario
closer to the real world.
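Going by that quickstart, the gist of pseudo-distributed mode is
pointing fs.default.name at a local HDFS daemon instead of file:///
and dropping replication to 1; a rough sketch:

    <!-- conf/core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <!-- conf/hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>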
Mark
On Mon, Feb 11, 2013 at 3:54 PM, Keith Wiley <[email protected]> wrote:
I'm very confused by what I see when I use Whirr to deploy a
cluster. For example, the HDFS directory tree clearly mirrors the
non-HDFS file system from the top directory, which is highly
unconventional for Hadoop, meaning that "$ ls /" shows the same
thing as "$ hadoop fs -ls /":
$ hadoop fs -ls /
Found 25 items
drwxr-xr-x - root root 4096 2010-02-24 01:35 /bin
drwxr-xr-x - root root 4096 2010-02-24 01:40 /boot
drwxr-xr-x - root root 4096 2013-02-11 23:19 /data
drwxr-xr-x - root root 4096 2013-02-11 23:19 /data0
drwxr-xr-x - root root 12900 2013-02-11 23:14 /dev
drwxr-xr-x - root root 4096 2013-02-11 23:19 /etc
drwxr-xr-x - root root 4096 2013-02-11 23:15 /home
-rw-r--r-- 1 root root 6763173 2010-02-24 01:40 /initrd.img
-rw-r--r-- 1 root root 3689712 2010-02-24 01:36 /initrd.img.old
drwxr-xr-x - root root 12288 2010-02-24 01:40 /lib
drwx------ - root root 16384 2010-02-24 01:28 /lost+found
drwxr-xr-x - root root 4096 2010-02-24 01:31 /media
drwxr-xr-x - root root 4096 2013-02-11 23:19 /mnt
drwxr-xr-x - root root 4096 2010-02-24 01:31 /opt
dr-xr-xr-x - root root 0 2013-02-11 23:14 /proc
drwx------ - root root 4096 2013-02-11 23:14 /root
drwxr-xr-x - root root 4096 2010-02-24 01:40 /sbin
drwxr-xr-x - root root 4096 2009-12-05 21:55 /selinux
drwxr-xr-x - root root 4096 2010-02-24 01:31 /srv
drwxr-xr-x - root root 0 2013-02-11 23:14 /sys
drwxrwxrwt - root root 4096 2013-02-11 23:20 /tmp
drwxr-xr-x - root root 4096 2010-02-24 01:31 /usr
drwxr-xr-x - root root 4096 2010-02-24 01:36 /var
-rw-r--r-- 1 root root 3089086 2010-02-06 20:26 /vmlinuz
-rw-r--r-- 1 root root 4252096 2010-02-20 10:31 /vmlinuz.old
$
Likewise, if I create a directory outside HDFS, I then see it from
within HDFS, so they really are looking at the same file system.
That's not how HDFS is usually configured.
In addition, I can't actually operate within HDFS at all; I get an
error as shown here:
$ hadoop fs -mkdir /testdir
mkdir: `/testdir': Input/output error
Even if I can straighten out these seemingly first-step issues, I
also don't understand how to tell Whirr to put HDFS on S3. I
tried putting the following in hadoop.properties, but I don't think
it has any effect:
hadoop-hdfs.fs.default.name=s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY_esc}@somebucket

OR...

hadoop-hdfs.fs.default.name=s3://somebucket
hadoop-hdfs.fs.s3.awsAccessKeyId=${AWS_ACCESS_KEY_ID}
hadoop-hdfs.fs.s3.awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY_esc}
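For reference, my (possibly wrong) understanding is that on a
hand-configured cluster the equivalent settings would live in
core-site.xml, something like the following, with the bucket name
and key values as placeholders:

    <property>
      <name>fs.default.name</name>
      <value>s3://somebucket</value>
    </property>
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_ACCESS_KEY</value>
    </property>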
I'm also not sure how to "su hadoop"; it asks for a password, but I
don't know what that would be. When I SSH in, it of course uses
the account name from my computer (since that's the ssh command
that Whirr directly provides as it wraps up cluster deployment),
but presumably to actually run a MapReduce job from the namenode I
need to switch to the hadoop user, right (hmmm, is this why I
couldn't create a directory within HDFS, as shown above)?
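If the hadoop account is simply passwordless, maybe something like
this would work instead of su; I haven't verified:

    # run a single command as the hadoop user via sudo
    sudo -u hadoop hadoop fs -mkdir /testdir
    # ...or take a full shell as the hadoop user
    sudo su - hadoop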
Incidentally, I also can't operate from my own machine because I
can't get the proxy to connect either. It may have something to
do with our corporate firewall; I'm not sure. For example, I get
this:
$ export HADOOP_CONF_DIR=~/.whirr/hadoop-from-laptop/
$ hadoop fs -ls /
2013-02-11 15:34:07,767 WARN conf.Configuration
(Configuration.java:<clinit>(477)) - DEPRECATED: hadoop-site.xml
found in the classpath. Usage of hadoop-site.xml is deprecated.
Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
override properties of core-default.xml, mapred-default.xml and
hdfs-default.xml respectively
2013-02-11 15:34:08.337 java[8291:1203] Unable to load realm info
from SCDynamicStore
2013-02-11 15:34:08.408 java[8291:1203] Unable to load realm info
from SCDynamicStore
ls: Failed on local exception: java.net.SocketException: Malformed
reply from SOCKS server; Host Details : local host is:
"MyMachine.local/[ip-1]"; destination host is:
"ec2-[ip-2].compute-1.amazonaws.com
<http://compute-1.amazonaws.com>":8020;
~/ $
...while the proxy shell produces this error:
$ .whirr/hadoop-from-laptop/hadoop-proxy.sh
Running proxy to Hadoop cluster at
ec2-54-234-185-62.compute-1.amazonaws.com. Use Ctrl-c to quit.
Warning: Permanently added '54.234.185.62' (RSA) to the list of
known hosts.
channel 2: open failed: connect failed: Connection refused
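As far as I can tell, hadoop-proxy.sh just opens an SSH SOCKS tunnel,
roughly like the sketch below (the user, key path, and port are my
guesses), and the generated Hadoop config points the client at it, so
I assume the "connection refused" is happening on the cluster side of
the tunnel:

    # rough shape of what I believe the proxy script does
    ssh -i ~/.ssh/whirr_key -N -D 6666 myuser@ec2-54-234-185-62.compute-1.amazonaws.com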
Sooooooooo, I really don't understand what I'm seeing here: the
HDFS directories don't look like a normal Hadoop cluster's, they
mirror the actual file system, I can't create directories within
HDFS, I can't tell Whirr to put HDFS on S3, and I can't use the
proxy to interact with HDFS from my local machine. In fact, the
ONLY thing I've managed to do so far is create the cluster in the
first place.
This isn't working out very well so far. Where do I go from here?
Thanks.
________________________________________________________________________________
Keith Wiley    [email protected]
keithwiley.com    music.keithwiley.com
"I used to be with it, but then they changed what it was. Now,
what I'm with
isn't it, and what's it seems weird and scary to me."
-- Abe (Grandpa) Simpson
________________________________________________________________________________