Hey Keith, I am pretty new to Whirr myself, but I think what you are seeing is a configuration issue.
What you are seeing is called local (or standalone) mode in Hadoop: http://hadoop.apache.org/docs/r0.20.2/quickstart.html#Local

In local mode there are no HDFS daemons at all; "hadoop fs" commands operate directly on the local file system. That is why "hadoop fs -ls /" mirrors "ls /" and why directories created outside HDFS are visible "inside" it. I suspect it also explains your proxy failure: with no namenode running, nothing is listening on port 8020 for the SSH tunnel to connect to. The mkdir failure is probably a side effect of the same thing, since you are really doing a mkdir on the local root file system, which your login user is not allowed to write to.

You will probably want to configure your cluster as a pseudo-distributed cluster (if you are using one node) or a regular distributed cluster (if you are using multiple nodes) for a closer-to-real-world scenario.
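You can confirm which mode the client is in by checking fs.default.name in the config Whirr generated. The path below assumes the ~/.whirr layout from your mail, and hadoop-site.xml is the file your deprecation warning mentions, so this is just a sketch of what to look for:

  $ grep -A 1 'fs.default.name' ~/.whirr/hadoop-from-laptop/hadoop-site.xml
    <name>fs.default.name</name>
    <value>hdfs://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8020</value>

If the value is file:/// instead of an hdfs:// URL (or the property is missing entirely, in which case file:/// is the default), then every "hadoop fs" command is just walking the local disk.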
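For what it's worth, the Whirr recipes control this through the instance templates in hadoop.properties. Something along these lines worked for me; the cluster name, instance counts, and key paths are placeholders from my own setup, so adjust to taste:

  whirr.cluster-name=myhadoopcluster
  whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
  whirr.provider=aws-ec2
  whirr.identity=${env:AWS_ACCESS_KEY_ID}
  whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
  whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
  whirr.public-key-file=${whirr.private-key-file}.pub

With the hadoop-namenode and hadoop-datanode roles in place, Whirr should start real HDFS daemons and write an hdfs:// URL into the generated client config instead of leaving you on the local file system.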
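On the S3 question, I have not tried it through Whirr myself, but two things stand out. First, fs.default.name and the S3 keys belong in core-site.xml, and if I am reading the Whirr docs right, properties bound for core-site.xml take the hadoop-common. prefix rather than hadoop-hdfs., so your lines would become something like:

  hadoop-common.fs.default.name=s3n://somebucket
  hadoop-common.fs.s3n.awsAccessKeyId=${env:AWS_ACCESS_KEY_ID}
  hadoop-common.fs.s3n.awsSecretAccessKey=${env:AWS_SECRET_ACCESS_KEY}

Second, s3n:// (the "native" S3 filesystem, which stores ordinary S3 objects) is usually what people want rather than the block-based s3:// filesystem. Either way, note that this replaces HDFS as the default filesystem rather than putting HDFS itself "on" S3.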
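As for "su hadoop": system accounts like hadoop typically have no password set, so su will never accept anything you type. Assuming your login user has sudo rights (which Whirr-built instances usually grant, though I can't swear to it), you can switch users or run one-off commands like this:

  $ sudo su - hadoop
  $ sudo -u hadoop hadoop fs -ls /

That said, once the cluster is genuinely running in distributed mode, you should not need to become the hadoop user just to submit a job.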
Mark

On Mon, Feb 11, 2013 at 3:54 PM, Keith Wiley <[email protected]> wrote:
> I'm very confused by what I see when I use whirr to deploy a cluster. For
> example, the HDFS directory clearly mirrors the non-HDFS file system from
> the top dir, which is highly unconventional for hadoop, meaning that "$ ls
> /" shows the same thing as "$ hadoop fs -ls /":
>
> $ hadoop fs -ls /
> Found 25 items
> drwxr-xr-x - root root 4096 2010-02-24 01:35 /bin
> drwxr-xr-x - root root 4096 2010-02-24 01:40 /boot
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /data
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /data0
> drwxr-xr-x - root root 12900 2013-02-11 23:14 /dev
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /etc
> drwxr-xr-x - root root 4096 2013-02-11 23:15 /home
> -rw-r--r-- 1 root root 6763173 2010-02-24 01:40 /initrd.img
> -rw-r--r-- 1 root root 3689712 2010-02-24 01:36 /initrd.img.old
> drwxr-xr-x - root root 12288 2010-02-24 01:40 /lib
> drwx------ - root root 16384 2010-02-24 01:28 /lost+found
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /media
> drwxr-xr-x - root root 4096 2013-02-11 23:19 /mnt
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /opt
> dr-xr-xr-x - root root 0 2013-02-11 23:14 /proc
> drwx------ - root root 4096 2013-02-11 23:14 /root
> drwxr-xr-x - root root 4096 2010-02-24 01:40 /sbin
> drwxr-xr-x - root root 4096 2009-12-05 21:55 /selinux
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /srv
> drwxr-xr-x - root root 0 2013-02-11 23:14 /sys
> drwxrwxrwt - root root 4096 2013-02-11 23:20 /tmp
> drwxr-xr-x - root root 4096 2010-02-24 01:31 /usr
> drwxr-xr-x - root root 4096 2010-02-24 01:36 /var
> -rw-r--r-- 1 root root 3089086 2010-02-06 20:26 /vmlinuz
> -rw-r--r-- 1 root root 4252096 2010-02-20 10:31 /vmlinuz.old
> $
>
> Likewise, if I create a directory outside HDFS, I then see it from within
> HDFS, so they really are looking at the same file system. That's not how
> HDFS is usually configured.
>
> In addition, I can't actually operate within HDFS at all; I get an error
> as shown here:
> $ hadoop fs -mkdir /testdir
> mkdir: `/testdir': Input/output error
>
> Even if I can straighten out these seemingly first-step issues, I also
> don't understand how to tell whirr to put HDFS on S3. I tried putting the
> following in hadoop.properties but I don't think it has any effect:
>
> hadoop-hdfs.fs.default.name=s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY_esc}@somebucket
> OR...
> hadoop-hdfs.fs.default.name=s3://somebucket
> hadoop-hdfs.fs.s3.awsAccessKeyId=${AWS_ACCESS_KEY_ID}
> hadoop-hdfs.fs.s3.awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY_esc}
>
> I'm also not sure how to "su hadoop"; it asks for a password but I don't
> know what that would be. When I ssh in, of course, it uses the account name
> from my computer (since that's the ssh command that whirr directly provides
> as it wraps up cluster deployment), but presumably to actually run a
> MapReduce job from the namenode I need to switch to the hadoop user, right
> (hmmm, is this why I couldn't create a directory within hadoop, as shown
> above)?
>
> Incidentally, I also can't operate from my own machine because I can't get
> the proxy to connect either. It may have something to do with our
> corporate firewall, I'm not sure. For example, I get this:
>
> $ export HADOOP_CONF_DIR=~/.whirr/hadoop-from-laptop/
> $ hadoop fs -ls /
> 2013-02-11 15:34:07,767 WARN conf.Configuration
> (Configuration.java:<clinit>(477)) - DEPRECATED: hadoop-site.xml found in
> the classpath. Usage of hadoop-site.xml is deprecated. Instead use
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2013-02-11 15:34:08.337 java[8291:1203] Unable to load realm info from
> SCDynamicStore
> 2013-02-11 15:34:08.408 java[8291:1203] Unable to load realm info from
> SCDynamicStore
> ls: Failed on local exception: java.net.SocketException: Malformed reply
> from SOCKS server; Host Details : local host is: "MyMachine.local/[ip-1]";
> destination host is: "ec2-[ip-2].compute-1.amazonaws.com":8020;
> ~/ $
>
> ...while the proxy shell produces this error:
>
> $ .whirr/hadoop-from-laptop/hadoop-proxy.sh
> Running proxy to Hadoop cluster at
> ec2-54-234-185-62.compute-1.amazonaws.com. Use Ctrl-c to quit.
> Warning: Permanently added '54.234.185.62' (RSA) to the list of known
> hosts.
> channel 2: open failed: connect failed: Connection refused
>
> Sooooooooo, I really don't understand what I'm seeing here: the HDFS
> directories don't look like a normal Hadoop cluster, they mirror the actual
> file system; I can't create directories within HDFS; I can't tell whirr to
> put HDFS on S3; and I can't use the proxy to interact with HDFS from my
> local machine. In fact, the ONLY thing I've managed to do so far is create
> the cluster in the first place.
>
> This isn't working out very well so far. Where do I go from here?
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     [email protected]     keithwiley.com
> music.keithwiley.com
>
> "I used to be with it, but then they changed what it was. Now, what I'm
> with isn't it, and what's it seems weird and scary to me."
>    -- Abe (Grandpa) Simpson
> ________________________________________________________________________________
