Check your fs.default.name setting in core-site.xml; it is probably set to file:///. The value of this property is prepended to any path that has no URI scheme when the path is resolved by "hadoop fs -ls ..." and by job inputs/outputs.

This "feature" enables local mode (single process) and pseudo or full cluster modes to work using the same absolute paths without protocol:// in test scripts. I think I used it once in 6 years, but someone out there might rely on it.

The above means that when using a real cluster,

    hadoop fs -ls file:///

will indeed show the local root filesystem.
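
For comparison, on a properly configured cluster core-site.xml points fs.default.name at the namenode, so scheme-less paths resolve to HDFS rather than the local disk. A minimal sketch (the hostname and port below are placeholders, not values from your cluster):

    <!-- core-site.xml on a pseudo- or fully distributed cluster -->
    <configuration>
      <property>
        <!-- any path without a scheme, e.g. "hadoop fs -ls /", resolves against this URI -->
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>

With that in place, "hadoop fs -ls /" lists the HDFS root, while "hadoop fs -ls file:///" still lists the local root filesystem.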


Paul

On 2013-02-11 20:40, Mark Grover wrote:
Hey Keith,
I am pretty new to Whirr myself, but I think what you are seeing is a configuration issue.

What you are seeing is called local (or standalone) mode in Hadoop:
http://hadoop.apache.org/docs/r0.20.2/quickstart.html#Local

You will probably want to configure your cluster as a pseudo-distributed cluster (if you are using one node) or a regular distributed cluster (if you are using multiple nodes) for a closer-to-real-world scenario.
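
Roughly speaking (this is plain Hadoop configuration rather than anything Whirr-specific, and localhost, port 8020, and a replication factor of 1 are just the conventional single-node values), pseudo-distributed mode boils down to something like:

    <!-- core-site.xml: point the default filesystem at a local HDFS namenode -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single datanode can only hold one replica of each block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

After formatting the namenode and starting the daemons, "hadoop fs -ls /" should show an (initially empty) HDFS root rather than the machine's local root.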

Mark

On Mon, Feb 11, 2013 at 3:54 PM, Keith Wiley <[email protected]> wrote:

    I'm very confused by what I see when I use Whirr to deploy a
    cluster.  For example, the HDFS directory tree clearly mirrors the
    non-HDFS (local) file system from the top-level directory, which is
    highly unconventional for Hadoop, meaning that "$ ls /" shows the
    same thing as "$ hadoop fs -ls /":

    $ hadoop fs -ls /
    Found 25 items
    drwxr-xr-x   - root root       4096 2010-02-24 01:35 /bin
    drwxr-xr-x   - root root       4096 2010-02-24 01:40 /boot
    drwxr-xr-x   - root root       4096 2013-02-11 23:19 /data
    drwxr-xr-x   - root root       4096 2013-02-11 23:19 /data0
    drwxr-xr-x   - root root      12900 2013-02-11 23:14 /dev
    drwxr-xr-x   - root root       4096 2013-02-11 23:19 /etc
    drwxr-xr-x   - root root       4096 2013-02-11 23:15 /home
    -rw-r--r--   1 root root    6763173 2010-02-24 01:40 /initrd.img
    -rw-r--r--   1 root root    3689712 2010-02-24 01:36 /initrd.img.old
    drwxr-xr-x   - root root      12288 2010-02-24 01:40 /lib
    drwx------   - root root      16384 2010-02-24 01:28 /lost+found
    drwxr-xr-x   - root root       4096 2010-02-24 01:31 /media
    drwxr-xr-x   - root root       4096 2013-02-11 23:19 /mnt
    drwxr-xr-x   - root root       4096 2010-02-24 01:31 /opt
    dr-xr-xr-x   - root root          0 2013-02-11 23:14 /proc
    drwx------   - root root       4096 2013-02-11 23:14 /root
    drwxr-xr-x   - root root       4096 2010-02-24 01:40 /sbin
    drwxr-xr-x   - root root       4096 2009-12-05 21:55 /selinux
    drwxr-xr-x   - root root       4096 2010-02-24 01:31 /srv
    drwxr-xr-x   - root root          0 2013-02-11 23:14 /sys
    drwxrwxrwt   - root root       4096 2013-02-11 23:20 /tmp
    drwxr-xr-x   - root root       4096 2010-02-24 01:31 /usr
    drwxr-xr-x   - root root       4096 2010-02-24 01:36 /var
    -rw-r--r--   1 root root    3089086 2010-02-06 20:26 /vmlinuz
    -rw-r--r--   1 root root    4252096 2010-02-20 10:31 /vmlinuz.old
    $

    Likewise, if I create a directory outside HDFS, I then see it from
    within HDFS, so they really are looking at the same file system.
     That's not how HDFS is usually configured.

    In addition, I can't actually operate within HDFS at all; I get an
    error as shown here:
    $ hadoop fs -mkdir /testdir
    mkdir: `/testdir': Input/output error

    Even if I can straighten out these seemingly first-step issues, I
    also don't understand how to tell Whirr to put HDFS on S3.  I
    tried putting the following in hadoop.properties, but it doesn't
    seem to have any effect:

    hadoop-hdfs.fs.default.name=s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY_esc}@somebucket
    OR...
    hadoop-hdfs.fs.default.name=s3://somebucket
    hadoop-hdfs.fs.s3.awsAccessKeyId=${AWS_ACCESS_KEY_ID}
    hadoop-hdfs.fs.s3.awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY_esc}
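
    For what it's worth, my understanding is that the Hadoop-side
    properties for the s3:// block store are the standard ones below,
    so presumably whatever Whirr writes out would need to end up in the
    cluster's core-site.xml looking roughly like this (how Whirr maps
    its property prefixes onto the *-site.xml files is exactly the part
    I'm unsure about, and the key values are placeholders):

        <!-- what I believe the cluster's core-site.xml would need to contain -->
        <configuration>
          <property>
            <name>fs.default.name</name>
            <value>s3://somebucket</value>
          </property>
          <property>
            <name>fs.s3.awsAccessKeyId</name>
            <value>MY_ACCESS_KEY</value>
          </property>
          <property>
            <name>fs.s3.awsSecretAccessKey</name>
            <value>MY_SECRET_KEY</value>
          </property>
        </configuration>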

    I'm also not sure how to "su hadoop"; it asks for a password, but I
    don't know what that would be.  When I ssh in, of course, it uses
    the account name from my computer (since that's the ssh command
    that Whirr directly provides as it wraps up cluster deployment),
    but presumably I need to switch to the hadoop user to actually run
    a MapReduce job from the namenode, right?  (Hmmm, is this why I
    couldn't create a directory within HDFS, as shown above?)

    Incidentally, I also can't operate from my own machine, because I
    can't get the proxy to connect either.  It may have something to
    do with our corporate firewall; I'm not sure.  For example, I get
    this:

    $ export HADOOP_CONF_DIR=~/.whirr/hadoop-from-laptop/
    $ hadoop fs -ls /
    2013-02-11 15:34:07,767 WARN  conf.Configuration
    (Configuration.java:<clinit>(477)) - DEPRECATED: hadoop-site.xml
    found in the classpath. Usage of hadoop-site.xml is deprecated.
    Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
    override properties of core-default.xml, mapred-default.xml and
    hdfs-default.xml respectively
    2013-02-11 15:34:08.337 java[8291:1203] Unable to load realm info
    from SCDynamicStore
    2013-02-11 15:34:08.408 java[8291:1203] Unable to load realm info
    from SCDynamicStore
    ls: Failed on local exception: java.net.SocketException: Malformed
    reply from SOCKS server; Host Details : local host is:
    "MyMachine.local/[ip-1]"; destination host is:
    "ec2-[ip-2].compute-1.amazonaws.com
    <http://compute-1.amazonaws.com>":8020;
    ~/ $
    ...while the proxy shell produces this error:
    $ .whirr/hadoop-from-laptop/hadoop-proxy.sh
    Running proxy to Hadoop cluster at
    ec2-54-234-185-62.compute-1.amazonaws.com. Use Ctrl-c to quit.
    Warning: Permanently added '54.234.185.62' (RSA) to the list of
    known hosts.
    channel 2: open failed: connect failed: Connection refused

    Sooooooooo, I really don't understand what I'm seeing here: the
    HDFS directories don't look like those of a normal Hadoop cluster
    (they mirror the actual local file system), I can't create
    directories within HDFS, I can't tell Whirr to put HDFS on S3, and
    I can't use the proxy to interact with HDFS from my local machine.
    In fact, the ONLY thing I've managed to do so far is create the
    cluster in the first place.

    This isn't working out very well so far.  Where do I go from here?

    Thanks.


    
________________________________________________________________________________
    Keith Wiley          [email protected]
    keithwiley.com       music.keithwiley.com

    "I used to be with it, but then they changed what it was.  Now, what I'm with
    isn't it, and what's it seems weird and scary to me."
                                               --  Abe (Grandpa) Simpson
________________________________________________________________________________


