Re: HDFS out of space

2009-06-22 Thread Allen Wittenauer



On 6/22/09 10:12 AM, Kris Jirapinyo kjirapi...@biz360.com wrote:

 Hi all,
 How does one handle a mount running out of space for HDFS?  We have two
 disks mounted on /mnt and /mnt2 respectively on one of the machines that are
 used for HDFS, and /mnt is at 99% while /mnt2 is at 30%.  Is there a way to
 tell the machine to balance itself out?  I know for the cluster, you can
 balance it using start-balancer.sh but I don't think that it will tell the
 individual machine to balance itself out.  Our hack right now would be
 just to delete the data on /mnt, since we have replication of 3x, we should
 be OK.  But I'd prefer not to do that.  Any thoughts?

Decommission the entire node, wait for data to be replicated,
re-commission, then do HDFS rebalance.  It blows, no doubt about it, but the
admin tools in the space are... lacking.
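
For anyone following along, a rough sketch of that dance on a cluster of this
vintage (the exclude-file path and host name are examples; use whatever your
dfs.hosts.exclude property points at):

    echo "dn42.example.com" >> /path/to/conf/dfs.exclude   # node to drain
    bin/hadoop dfsadmin -refreshNodes      # wait until the node reports "Decommissioned"
    # clean up or fix the node, remove it from the exclude file, then:
    bin/hadoop dfsadmin -refreshNodes
    bin/hadoop-daemon.sh start datanode    # run on the node itself
    bin/start-balancer.sh                  # spread the re-replicated blocks around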




Re: Making sure the tmp directory is cleaned?

2009-06-22 Thread Allen Wittenauer



On 6/22/09 12:15 PM, Qin Gao q...@cs.cmu.edu wrote:
 Do you know if the tmp directory on every map/reduce task will be deleted
 automatically after the map task finishes, or do I have to delete them?
 
 I mean the tmp directory that is automatically created in the current directory.

Past experience says that users will find writable space on nodes and fill
it, regardless of what Hadoop may do to try and keep it clean.  It is a good
idea to just wipe those spaces clean during hadoop upgrades and other
planned downtimes.
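
A hedged example of what that wipe might look like during a planned outage;
the mount points below are made up, so substitute your own mapred.local.dir /
hadoop.tmp.dir settings, and only do this while the TaskTracker is stopped:

    for d in /mnt/hadoop/mapred/local /mnt2/hadoop/mapred/local; do
        rm -rf "$d"/*    # leftover task working dirs, distributed cache files, etc.
    done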



Re: Multicluster Communication

2009-06-19 Thread Allen Wittenauer
On 6/19/09 3:49 AM, Harish Mallipeddi harish.mallipe...@gmail.com wrote:
 Why do you want to do this in the first place? It seems like you want
 cluster1 to be a plain HDFS cluster and cluster2 to be a mapred cluster.
 Doing something like that will be disastrous - Hadoop is all about sending
 computation closer to your data. If you don't want that, you need not even
 use hadoop.

Given some of the limitations with HDFS (quota operability, security), I
can easily see why it would be desirable to have static data coming from one
grid while doing computation/intermediate outputs/real output to another.
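
Since HDFS paths can be written as full hdfs:// URIs, one hedged way to do
that split (jar path, host names and ports below are placeholders) is to
submit the job to cluster2's JobTracker but point the input at cluster1's
namenode:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -input  hdfs://nn1.example.com:8020/static/reference-data \
        -output hdfs://nn2.example.com:8020/user/me/job-output \
        -mapper /bin/cat -reducer /bin/cat

The data-locality hit on the input side is the price you pay, which is
exactly the trade-off being argued about here.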

Using performance as your sole metric of viability is a bigger disaster
waiting to happen.  Sure, we crashed the file system, but look how fast it
went down in flames!



Re: Restricting quota for users in HDFS

2009-06-16 Thread Allen Wittenauer



On 6/15/09 11:16 PM, Palleti, Pallavi pallavi.pall...@corp.aol.com
wrote:
 We have chown command in hadoop dfs to make a particular directory own
 by a person. Do we have something similar to create user with some space
 limit/restrict the disk usage by a particular user?

Quotas are implemented on a per-directory basis, not per-user.   There
is no support for "this user can have X space, regardless of where he/she
writes"; only "this directory has a limit of X space, regardless of who
writes there."
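
What does exist in this era (name quotas IIRC since 0.18, space quotas since
0.19) is the per-directory pair of commands below; the directory name is just
an example, and note the space quota is charged against replicated bytes:

    hadoop dfsadmin -setQuota 100000 /user/alice              # cap the number of names
    hadoop dfsadmin -setSpaceQuota 1099511627776 /user/alice  # ~1 TB raw (roughly 341 GB at 3x)
    hadoop fs -count -q /user/alice                           # show the quotas and what's left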



Re: data in yahoo / facebook hdfs

2009-06-15 Thread Allen Wittenauer



On 6/13/09 9:00 AM, PORTO aLET portoa...@gmail.com wrote:
 I am just wondering what do facebook/yahoo do with the data in hdfs after
 they finish processing the log files or whatever that are in hdfs?
 Are they simply deleted? or get backed up in tape ?
 whats the typical process?

The grid ops team here at Yahoo! has a strict retention policy that
dictates the data is deleted after X time period.  We perform no backups of
the data on the grid.  It is also worth mentioning that the data is loaded
from the primary source, so in the case of data corruption (hai hadoop-0.18)
or accidental deletion (where are my snapshots dev people?), we reload the
data from that primary source. (dependent, of course, on whether they still
have it or not)

 Also what is the process of adding a new node to the hadoop cluster? simply
 connect a new computer to the network (and setup the hadoop conf)?

http://wiki.apache.org/hadoop/FAQ#17



Re: ssh issues

2009-05-26 Thread Allen Wittenauer



On 5/26/09 3:40 AM, Steve Loughran ste...@apache.org wrote:
 HDFS is as secure as NFS: you are trusted to be who you say you are.
 Which means that you have to run it on a secured subnet -- access
 restricted to trusted hosts and/or one or two front end servers -- or accept
 that your dataset is readable and writeable by anyone on the network.
 
 There is user identification going in; it is currently at the level
 where it will stop someone accidentally deleting the entire filesystem
 if they lack the rights. Which has been known to happen.

Actually, I'd argue that HDFS is worse than even rudimentary NFS
implementations.  Off the top of my head:

a) There is no equivalent of NFS's root squash/force anonymous.  Any host can
assume privilege.
 
b) There is no 'read only from these hosts'.  If you can read blocks
over Hadoop RPC, you can write as well (minus safe mode).



Re: Optimal Filesystem (and Settings) for HDFS

2009-05-18 Thread Allen Wittenauer



On 5/18/09 11:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Do not forget 'tune2fs -m 2'. By default this value gets set at 5%.
 With 1 TB disks we got 33 GB more usable space. Talk about instant
 savings!

Yup. Although, I think we're using -m 1.
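
For the archives, a hedged example (the device name is made up; the reserved
block percentage can be changed on a mounted filesystem):

    tune2fs -m 1 /dev/sdb1                           # drop root-reserved space to 1%
    tune2fs -l /dev/sdb1 | grep -i 'reserved block'  # sanity-check the new value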


 On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard a...@cloudera.com wrote:
 I believe Yahoo! uses ext3,

Yup.  They won't buy me enough battery backed RAM to use a memory file
system. ;)



Re: Beware sun's jvm version 1.6.0_05-b13 on linux

2009-05-15 Thread Allen Wittenauer



On 5/15/09 11:38 AM, Owen O'Malley o...@yahoo-inc.com wrote:

 We have observed that the default jvm on RedHat 5

I'm sure some people are scratching their heads at this.

The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly
incapable of running Hadoop.  We [and, really, this is my doing... ^.^ ]
replace it with the JVM from the JPackage folks.  So while this isn't the
default JVM that comes from RHEL, the warning should still be heeded. 



Re: Advice on restarting HDFS in a cron

2009-04-24 Thread Allen Wittenauer



On 4/24/09 9:31 AM, Marc Limotte mlimo...@feeva.com wrote:
 I've heard that HDFS starts to slow down after it's been running for a long
 time.  And I believe I've experienced this.

We did an upgrade (== complete restart) of a 2000 node instance in ~20
minutes on Wednesday. I wouldn't really consider that 'slow', but YMMV.

I suspect people aren't running the secondary name node and therefore have
a massively large edits file.  The name node appears slow on restart because
it has to apply the edits to the fsimage rather than having the secondary
keep it up to date.
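
A quick hedged check (the path is an example; adjust for your dfs.name.dir):

    ls -lh /path/to/dfs/name/current/edits   # a multi-GB edits file means no checkpoints are happening

IIRC, fs.checkpoint.period (seconds) and fs.checkpoint.size (bytes) in
hadoop-site.xml control how often the secondary rolls the edits into a new
fsimage.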




Re: tuning performance

2009-03-13 Thread Allen Wittenauer



On 3/13/09 11:25 AM, Vadim Zaliva kroko...@gmail.com wrote:

    When you stripe you automatically make every disk in the system have the
 same speed as the slowest disk.  In our experiences, systems are more likely
 to have a 'slow' disk than a dead one and detecting that is really
 really hard.  In a distributed system, that multiplier effect can have
 significant consequences on the whole grids performance.
 
 All disks are the same, so there is no speed difference.

There will be when they start to fail. :)




Re: Backing up HDFS?

2009-02-09 Thread Allen Wittenauer
On 2/9/09 4:41 PM, Amandeep Khurana ama...@gmail.com wrote:
 Why would you want to have another backup beyond HDFS? HDFS itself
 replicates your data, so the reliability of the system shouldn't be a
 concern (if at all it is)...

I'm reminded of a previous job where a site administrator refused to make
tape backups (despite our continual harassment and pointing out that he was
in violation of the contract) because he said RAID was good enough.

Then the RAID controller failed. When we couldn't recover data from the
other mirror, he was fired.  Not sure how they ever recovered, especially
considering what the data they lost was.  Hopefully they had a paper trail.

To answer Nathan's question:

 On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote:
 
 How do people back up their data that they keep on HDFS? We have many TB of
 data which we need to get backed up but are unclear on how to do this
 efficiently/reliably.

The content of our HDFSes is loaded from elsewhere and is not considered
'the source of authority'.  It is the responsibility of the original sources
to maintain backups and we then follow their policies for data retention.
For user generated content, we provide *limited* (read: quota'ed) NFS space
that is backed up regularly.

Another strategy we take is multiple grids in multiple locations that get
the data loaded simultaneously.

The key here is to prioritize your data.  Impossible-to-replicate data gets
backed up using whatever means necessary; hard-to-regenerate data is the next
priority. Easy-to-regenerate, OK-to-nuke data doesn't get backed up.



Re: Cannot run program chmod: error=12, Not enough space

2009-01-30 Thread Allen Wittenauer
On 1/28/09 7:42 PM, Andy Liu andyliu1...@gmail.com wrote:
 I'm running Hadoop 0.19.0 on Solaris (SunOS 5.10 on x86) and many jobs are
 failing with this exception:
 
 Error initializing attempt_200901281655_0004_m_25_0:
 java.io.IOException: Cannot run program chmod: error=12, Not enough space
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
...
 at java.lang.UNIXProcess.forkAndExec(Native Method)
 at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
 at java.lang.ProcessImpl.start(ProcessImpl.java:65)
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
 ... 20 more
 
 However, all the disks have plenty of disk space left (over 800 gigs).  Can
 somebody point me in the right direction?

"Not enough space" is usually SysV kernel-speak for "not enough virtual
memory to swap."  See how much memory you have free.
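
On Solaris, a hedged set of things to check (fork of a large JVM has to
reserve swap roughly equal to the parent's address space, which is usually
what trips this):

    swap -s      # allocated / reserved / available swap summary
    swap -l      # configured swap devices
    vmstat 5 3   # watch the "swap" and "free" columns (in KB)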




Re: issues with hadoop in AIX

2008-12-27 Thread Allen Wittenauer
On 12/27/08 12:18 AM, Arun Venugopal arunvenugopa...@gmail.com wrote:
 Yes, I was able to run this on AIX as well with a minor change to the
 DF.java code. But this was more of a proof of concept than on a
 production system.

There are lots of places where Hadoop (esp. in contrib) interprets the
output of Unix command line utilities. Changes like this are likely going to
be required for AIX and other Unix systems that aren't being used by a
committer. :(




Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread Allen Wittenauer
On 11/25/08 3:58 PM, Sagar Naik [EMAIL PROTECTED] wrote:

 I am trying to migrate from 32 bit jvm and 64 bit for namenode only.
 *setup*
 NN - 64 bit
 Secondary namenode (instance 1) - 64 bit
 Secondary namenode (instance 2)  - 32 bit
 datanode- 32 bit
 
  From the mailing list I deduced that NN-64 bit and Datanode -32 bit
 combo works

Yup.  That's how we run it.

 But, I am not sure if S-NN-(instance 1--- 64 bit ) and S-NN (instance 2
 -- 32 bit) will work with this setup.

Considering that the primary and secondary process essentially the same
data, they should have the same memory requirements.  In other words, if you
need 64-bit for the name node, your secondary is going to require it too.

I'm also not sure if you can have two secondaries.  I'll let someone else
chime in on that. :)



Re: ls command output format

2008-11-21 Thread Allen Wittenauer



On 11/21/08 6:03 AM, Alexander Aristov [EMAIL PROTECTED]
wrote:

 Trying hadoop-0.18.2 I got next output
 
 [root]# hadoop fs -ls /
 Found 2 items
 drwxr-xr-x   - root supergroup  0 2008-11-21 08:08 /mnt
 drwxr-xr-x   - root supergroup  0 2008-11-21 08:19 /repos


... which reminds me.  I really wish ls didn't default to -l.



Re: Best way to handle namespace host failures

2008-11-12 Thread Allen Wittenauer



On 11/10/08 10:42 PM, Dhruba Borthakur [EMAIL PROTECTED] wrote:
 2. Create a virtual IP, say name.xx.com that points to the real
 machine name of the machine on which the namenode runs.

Everyone doing this should be aware of the discussion happening in

https://issues.apache.org/jira/browse/HADOOP-3988

though.



Re: NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-10 Thread Allen Wittenauer
On 11/10/08 1:30 AM, Aaron Kimball [EMAIL PROTECTED] wrote:
 It sounds like you think the 64- and 32-bit environments are effectively
 interchangable. May I ask why are you using both? The 64bit environment
 gives you access to more memory; do you see faster performance for the TT's
 in 32-bit mode? Do you get bit by library compatibility bugs that others
 should watch out for in running a dual-mode Hadoop environment?

Some random thoughts on our mixed environment:

A) The vast majority of user provided (legacy) code is 32-bit.  Since you
can't mix 64 and 32 bit objects at link or runtime, it just makes sense for
us to run TTs, etc, by default as 32-bit to give us the most bang for our
buck.

B) In the case of the data node, the memory usage is small enough that the
64-bit JVM isn't needed.

C) Since we currently run HOD, it should be possible for users to switch
their bit-ness and I think we have a handful of users that do.  We'll
probably lose this capability when we go back to a static job tracker. :(

D) For streaming jobs, the bit-ness of the JVM is irrelevant.  32-bit is
better due to the smaller footprint since streaming jobs eat memory like it
was candy. :)

E) We load the 64-bit and 32-bit versions of libraries on our nodes, thus
allowing us to move our bit-ness whenever we like.  This makes for a fat
image (so no RAM disk for the OS for us!), but given the streaming VM
issues, it works out mostly in our favor anyway.

In general, 64-bit code runs slower than 32-bit code.  So unless one
needs to access more memory or has external dependencies (JNIs, whatever),
32-bit for your Java environment is the way to go. The name node and maybe a
static job tracker are the potential problem children here and places where
I suspect most people will be using the 64-bit JVM.



Re: hadoop start problem

2008-11-10 Thread Allen Wittenauer
On 11/10/08 6:18 AM, Brian MacKay [EMAIL PROTECTED] wrote:
 I had a similar problem when I upgraded...  not sure of details why, but
 I had permissions problems trying to develop and run on windows out of
 cygwin.

At Apachecon, we think we identified a case where someone forgot to copy
the newer hadoop-defaults.xml into their old configuration directories that
they were using post-upgrade.  Hadoop acts really strangely under those
conditions.



Re: Can you specify the user on the map-reduce cluster in Hadoop streaming

2008-11-10 Thread Allen Wittenauer



On 11/10/08 12:21 PM, Rick Hangartner [EMAIL PROTECTED] wrote:
  But is there a proper way to allow developers to specify a remote_username
 they legitimately have access to on the cluster if it is not the same
 as the local_username of the account on their own machine they are
 using to submit a streaming job without setting HDFS permissions to
 777? 

There are ways that the Hadoop security as currently implemented can
be bypassed. If you really want to know how, that's probably better not
asked on a public list. ;)

But I'm curious as to your actual use case.

From what I can gather from your description, there are two possible
solutions, depending upon what you're looking to accomplish:

A) Turn off permissions

B) Create a group and make the output directory group writable

We use B a lot.  We don't use A at all.
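
A hedged sketch of B (the group name and path are examples, and the group has
to be one the client-side `groups` command actually reports):

    hadoop fs -mkdir /projects/shared
    hadoop fs -chgrp devgroup /projects/shared
    hadoop fs -chmod 775 /projects/shared    # owner and group write, everyone else read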



Re: NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-06 Thread Allen Wittenauer



On 11/6/08 10:17 PM, C G [EMAIL PROTECTED] wrote:

 I've got a grid which has been up and running for some time. It's been using a
 32 bit JVM.  I am hitting the wall on memory within NameNode and need to
 specify max heap size  4G.  Is it possible to switch seemlessly from 32bit
 JVM to 64bit?  I've tried this on a small test grid and had no issues, but I
 want to make sure it's OK to proceed.

It should be.  We run name node with 64-bit JVM and everything else with
32-bit.

 Speaking of NameNode, what does it keep in memory?  Our memory usage ramped up
 rather suddenly recently.

Some one/thing created a ton of files, likely small.  Check your file
system contents, use the fs count command, etc, to look for anomalies.
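
A couple of hedged examples of that kind of hunting (paths are examples, and
the -count command may depend on your exact release):

    hadoop fs -count /user/*               # dirs, files, bytes per top-level user dir
    hadoop fs -lsr /user/suspect | wc -l   # crude object count for one tree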

 Also, does SecondaryNameNode require the same
 amount of memory as NameNode?

We're actually starting to see that it requires a tad bit more.



Re: Hadoop hardware specs

2008-11-04 Thread Allen Wittenauer



On 11/4/08 2:16 AM, Arijit Mukherjee [EMAIL PROTECTED]
wrote:

 * 1-5 TB external storage
 
 I'm curious to find out what sort of specs do people use normally. Is
 the external storage essential or will the individual disks on each node
 be sufficient? Why would you need an external storage in a hadoop
 cluster? 

The big reason for the external storage is twofold:

A) Provide shared home directory (especially for the HDFS user so that it is
easy to use the start scripts that call ssh)

B) An off-machine copy of the fsimage and edits file as used by the name
node.  This way if the name node goes belly up, you'll have an always
up-to-date backup to recover.

 How can I find out what other projects on hadoop are using?

Slide 12 of the Apachecon presentation I did earlier this year talks
about what Yahoo!'s typical node looks like.  For a small 5 node cluster,
your hardware specs seem fine to me.

An 8GB namenode for 4 data nodes (or maybe even running nn on the same
machine as a data node if memory size of jobs is kept in check) should be
a-ok, even if you double the storage.  You're likely going to run out of
disk space before the name node starts swapping.



Re: Keep free space with du.reserved and du.pct

2008-10-21 Thread Allen Wittenauer



On 10/21/08 3:33 AM, Jean-Adrien [EMAIL PROTECTED] wrote:
 I expected to keep 3.75 Gb free.
 But free space goes under 1 Gb, as if I kept the default settings

I noticed that you're running on /.  In general, this is a bad idea, as
space can disappear in various ways and you'll never know.  For example,
/var/log can grow tremendously without warning or there might be a
deleted-but-still-open file handle on /tmp.

What does a du on the dfs directories tell you?  How much space is *actually*
being used by Hadoop?

You might also look around for dead task leftovers.
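
Roughly what I mean (the paths are examples; point them at your dfs.data.dir
and mapred.local.dir):

    du -sh /path/to/dfs/data        # what the data node itself is holding
    du -sh /path/to/mapred/local    # leftover task/job debris
    df -h /                         # what the OS thinks is left on the shared root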

 I read a bit in jira (HADOOP-2991) and I saw that the implementation of
 these directives was subject to discussions. But it is not marked as
 affecting 0.18.1. What is the situation now ?

I'm fairly certain it is unchanged.  None of the developers seem
particularly interested in a static allocation method, deeming it too hard
to maintain when you have large or heterogeneous clusters.

HADOOP-2816 going into 0.19 is somewhat relevant, though, because the
name node UI is completely wrong when it comes to the actual capacity.



Re: Pushing jar files on slave machines

2008-10-13 Thread Allen Wittenauer
On 10/13/08 11:06 AM, Tarandeep Singh [EMAIL PROTECTED] wrote:
 I want to push third party jar files that are required to execute my job, on
 slave machines. What is the best way to do this?

Use a DistributedCache as part of your job submission.
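
A hedged example of the two usual ways to do that; the jar names and paths
are placeholders.  From the command line, if your job's driver uses
ToolRunner/GenericOptionsParser:

    hadoop jar myjob.jar com.example.MyJob -libjars /local/path/thirdparty.jar in/ out/

or, in the driver itself, copy the jar into HDFS once and call something
along the lines of DistributedCache.addFileToClassPath(new
Path("/libs/thirdparty.jar"), conf) before submitting.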



Re: How to make LZO work?

2008-10-10 Thread Allen Wittenauer
On 10/9/08 6:46 PM, Songting Chen [EMAIL PROTECTED] wrote:
 Does that mean I have to rebuild the native library?
 
 Also, the LZO installation puts liblzo2.a and liblzo2.la under /usr/local/lib.
 There is no liblzo2.so there. Do I need to rename them to liblzo2.so somehow?


You need to compile and install lzo2 as a shared library.  IIRC, this is
not the default.
   

Also, the shared version (.so) will need to be part of your link path
(LD_LIBRARY_PATH env var, /etc/ld.so.conf on Linux, runtime option (usually
-R) to ld, ...) when you fire up the JVM so that Java can locate it when it
needs it.
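
A hedged sketch of the whole sequence (the prefix and paths are examples):

    # rebuild lzo2 with shared objects enabled
    ./configure --prefix=/usr/local --enable-shared && make && make install
    # make the .so findable, either system-wide on Linux...
    echo /usr/local/lib >> /etc/ld.so.conf && ldconfig
    # ...or just for the Hadoop daemons, in conf/hadoop-env.sh:
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH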




Re: Hadoop and security.

2008-10-06 Thread Allen Wittenauer



On 10/6/08 6:39 AM, Steve Loughran [EMAIL PROTECTED] wrote:

 Edward Capriolo wrote:
 You bring up some valid points. This would be a great topic for a
 white paper. 
 
 -a wiki page would be a start too

I was thinking about doing "Deploying Hadoop Securely" for an ApacheCon EU
talk, as by that time some of the basic Kerberos stuff should be in
place... This whole conversation is starting to reinforce the idea.




Re: is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-06 Thread Allen Wittenauer



On 10/2/08 11:33 PM, Frank Singleton [EMAIL PROTECTED] wrote:

 Just to clarify, this is for when the chown will modify all files owner
 attributes
 
 eg: toggle all from frank:frank to hadoop:hadoop (see below)

When we converted from 0.15 to 0.16, we chown'ed all of our files.  The
local dev team wrote the code in
https://issues.apache.org/jira/browse/HADOOP-3052 , but it wasn't committed
as a standard feature, as they viewed this as a one-off. :(

Needless to say, running a large chown as a MR job should be
significantly faster.



Re: Hadoop Cluster Size Scalability Numbers?

2008-09-21 Thread Allen Wittenauer



On 9/21/08 9:40 AM, Guilherme Menezes [EMAIL PROTECTED]
wrote:
 We currently have 4 nodes (16GB of
 ram, 6 * 750 GB disks, Quad-Core AMD Opteron processor). Our initial plans
 are to perform a Web crawl for academic purposes (something between 500
 million and 1 billion pages), and we need to expand the number of nodes for
 that. Is it better to have a larger number of nodes simpler than the ones we
 currently have (less memory, less processing?) in terms of Hadoop
 performance?

Your current boxes seem overpowered for crawling. If it were me, I'd
probably:

    a) turn the current four machines into dedicated namenode, job
tracker, secondary name node, and oh-no-a-machine-just-died! backup node (set up
an NFS server on it and run it as your secondary direct copy of the fsimage
and edits file if you don't have one).   With 16GB name nodes, you should be
able to store a lot of data.

b) when you buy new nodes, I'd cut down on memory and cpu and just
turn them into your work horses

That said, I know little-to-nothing about crawling.  So, IMHO on the
above.



Re: How to manage a large cluster?

2008-09-11 Thread Allen Wittenauer
On 9/11/08 2:39 AM, Alex Loddengaard [EMAIL PROTECTED] wrote:
 I've never dealt with a large cluster, though I'd imagine it is managed the
 same way as small clusters:

Maybe. :)

 -Use hostnames or ips, whichever is more convenient for you

Use hostnames.  Seriously.  Who are you people using raw IPs for things?
:)  Besides, you're going to need it for the eventual support of Kerberos.

 -All the slaves need to go into the slave file

We only put this file on the namenode and 2ndary namenode to prevent
accidents.

 -You can update software by using bin/hadoop-daemons.sh.  Something like:
 #bin/hadoop-daemons.sh rsync (mastersrcpath) (localdestpath)

We don't use that because it doesn't take into consideration down nodes
(and you *will* have down nodes!) or deal with nodes that are outside the
grid (such as our gateways/bastion hosts, data loading machines, etc).

Instead, use a real system configuration management package such as
bcfg2, smartfrog, puppet, cfengine, etc.  [Steve, you owe me for the plug.
:) ]

 I created a wiki page that currently contains one tip for managing large
 clusters.  Could others add to this wiki page?
 
 http://wiki.apache.org/hadoop/LargeClusterTips

Quite a bit of what we do is covered in the latter half of
http://tinyurl.com/5foamm .  This is a presentation I did at ApacheCon EU
this past April that included some of the behind-the-scenes of the large
clusters at Y!.  At some point I'll probably do an updated version that
includes more adminy things (such as why we push four different types of
Hadoop configurations per grid) while others talk about core Hadoop stuff.



Re: critical name node problem

2008-09-05 Thread Allen Wittenauer



On 9/5/08 5:53 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote:

 Another idea would be a tool or namenode startup mode that would make it
 ignore EOFExceptions to recover as much of the edits as possible.

We clearly need to change the "how to configure" docs to make sure
people put at least two directories on two different storage systems in
dfs.name.dir.  This problem seems to happen quite often, and having two+
dirs helps protect against it.
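
A hedged example of what that looks like in hadoop-site.xml (the second path
being something like an NFS mount; both paths are made up):

    <property>
      <name>dfs.name.dir</name>
      <value>/export/hadoop/namedir,/remote/nfs/hadoop/namedir</value>
    </property>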

We recently had one of the disks on one of our copies go bad.  The
system kept going just fine until we had a chance to reconfig the name node.

That said, I've just filed HADOOP-4080 to help alert admins in these
situations.



Re: Slaves Hot-Swaping

2008-09-02 Thread Allen Wittenauer



On 9/2/08 8:33 AM, Camilo Gonzalez [EMAIL PROTECTED] wrote:

 I was wondering if there is a way to Hot-Swap Slave machines, for example,
 in case an Slave machine fails while the Cluster is running and I want to
 mount a new Slave machine to replace the old one, is there a way to tell the
 Master that a new Slave machine is Online without having to stop and start
 again the Cluster? I would appreciate the name of this, I don't think it is
 named Hot-Swaping, I don't know even if this exists. Lol


:)

Using hadoop dfsadmin -refreshNodes, you can have the name node reload
the include and exclude files.
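
A hedged example of adding a replacement node that way (the include-file path
is whatever dfs.hosts points at; the host name is made up):

    echo "newslave.example.com" >> /path/to/conf/dfs.include
    bin/hadoop dfsadmin -refreshNodes
    # then, on the new machine itself:
    bin/hadoop-daemon.sh start datanode
    bin/hadoop-daemon.sh start tasktracker

(-refreshNodes only covers the HDFS side; the tasktracker just registers with
the job tracker when it starts.)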



Re: Load balancing in HDFS

2008-09-01 Thread Allen Wittenauer



On 8/27/08 7:51 AM, Mork0075 [EMAIL PROTECTED] wrote:

 This sounds really interesting. And while increasing the replicas for
 certain files, the available throughput for these files increases too?

Yes, as there are more places to pull the file from.  This needs to get
weighed against the amount of work the name node will use to re-replicate
the file in case of failure and the total amount of disk space used... So
the extra bandwidth isn't free.

 
 Allen Wittenauer schrieb:
 
 
 On 8/27/08 12:54 AM, Mork0075 [EMAIL PROTECTED] wrote:
 I'm planning to use HDFS as a DFS in a web application environment.
 There are two requirements: fault tolerance, which is ensured by the
 replicas, and load balancing.
 
 There is a SPOF in the form of the name node.  So depending upon your
 needs, that may or may not be acceptable risk.
 
 On 8/27/08 1:23 AM, Mork0075 [EMAIL PROTECTED] wrote:
 Some documents stored in the HDFS could be very popular and
 therefore accessed more often than others. Then HDFS needs to balance the
 load - distribute the requests to different nodes. Is it possible?
 
 Not automatically.  However, it is possible to manually/programmatically
 increase the replication on files.
 
 This is one of the possible uses for the new audit logging in 0.18... By
 watching the log, it should be possible to determine which files need a
 higher replication factor.
 
 
 



Re: Load balancing in HDFS

2008-08-27 Thread Allen Wittenauer



On 8/27/08 12:54 AM, Mork0075 [EMAIL PROTECTED] wrote:
 I'm planning to use HDFS as a DFS in a web application environment.
 There are two requirements: fault tolerance, which is ensured by the
 replicas, and load balancing.

There is a SPOF in the form of the name node.  So depending upon your
needs, that may or may not be acceptable risk.

On 8/27/08 1:23 AM, Mork0075 [EMAIL PROTECTED] wrote:
 Some documents stored in the HDFS could be very popular and
 therefore accessed more often than others. Then HDFS needs to balance the
 load - distribute the requests to different nodes. Is it possible?

Not automatically.  However, it is possible to manually/programmatically
increase the replication on files.

This is one of the possible uses for the new audit logging in 0.18... By
watching the log, it should be possible to determine which files need a
higher replication factor.
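
Concretely, a hedged example (the path is made up):

    hadoop fs -setrep -w 10 /data/hot/part-00000   # raise replication to 10 and wait for it
    hadoop fs -setrep 3 /data/hot/part-00000       # drop it back once the rush is over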



Re: Integrate HADOOP and Map/Reduce paradigm into HPC environment

2008-08-18 Thread Allen Wittenauer
On 8/17/08 10:56 AM, Filippo Spiga [EMAIL PROTECTED] wrote:
 I read the tutorial about HOD (Hadoop on demand) but HOD use torque only for
 initial node allocation. I would use TORQUE also for computation, allowing
 users to load data into HDFS, submit a TORQUE job that executes a Map/Reduce
 task and afterwards retrieve results. It's important for me that Map/Reduce tasks
 run only on the subset of nodes selected by TORQUE.
 
 Can someone help me?

This is essentially how we use HOD.  We have an HDFS that runs on all
nodes and then use Torque to allocate mini-MapReduce clusters on top of
that.

To limit the number of nodes that get used for MapReduce, IIRC you need to
create a queue in Torque to be used for MR.  Then limit the number of nodes
that queue will be allowed to allocate at one time in Maui using the
CLASSCFG stuff.




Re: NameNode hardware specs

2008-08-12 Thread Allen Wittenauer



On 8/12/08 12:07 PM, lohit [EMAIL PROTECTED] wrote:
  - why RAID5?
 - If running RAID 5, why is this necessary?
 Not absolute necessary.

I'd be afraid of the write penalty of RAID5 vs, say, RAID10 or even just
plain RAID1.

For the record, I don't think we have any production systems except
maybe one that uses any sort of RAID methods on the name node.

I'm sure Steve will pop up at some point and explain his reasoning on
this one. ;)



Re: performance not great, or did I miss something?

2008-08-08 Thread Allen Wittenauer
On 8/8/08 1:25 PM, James Graham (Greywolf) [EMAIL PROTECTED] wrote:
 226GB of available disk space on each one;
 4 processors (2 x dualcore)
 8GB of RAM each.

Some simple stuff:

(Assuming SATA):
Are you using AHCI?
Do you have the write cache enabled?

Is the topologyProgram providing proper results?
Is DNS performing as expected? Is it fast?
How many tasks per node?
How much heap does your name node have?  Is it going into garbage collection
or swapping?
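
For the first two questions, a hedged set of Linux-side checks (device names
are examples):

    dmesg | grep -i ahci       # is the controller actually in AHCI mode?
    hdparm -W /dev/sda         # write cache on?  (hdparm -W1 /dev/sda to enable)
    hdparm -tT /dev/sda        # quick raw read throughput sanity check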



Re: Configuration: I need help.

2008-08-06 Thread Allen Wittenauer
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 You can put the same hadoop-site.xml on all machines.  Yes, you do want a
 secondary NN - a single NN is a SPOF.  Browser the archives a few days back to
 find an email from Paul about DRBD (disk replication) to avoid this SPOF.

Keep in mind that even with a secondary name node, you still have a
SPOF.  If the NameNode process dies, so does your HDFS. 



Re: having different HADOOP_HOME for master and slaves?

2008-08-04 Thread Allen Wittenauer



On 8/4/08 11:10 AM, Meng Mao [EMAIL PROTECTED] wrote:
 I suppose I could, for each datanode, symlink things to point to the actual
 Hadoop installation. But really, I would like the setup that is hinted as
 possible by statement 1). Is there a way I could do it, or should that bit
 of documentation read, All machines in the cluster _must_ have the same
 HADOOP_HOME?

If you run the -all scripts, they assume the location is the same.
AFAIK, there is nothing preventing you from building your own -all scripts
that point to a different location to start/stop the data nodes.




Re: Hadoop 4 disks per server

2008-07-29 Thread Allen Wittenauer
On 7/29/08 6:37 PM, Rafael Turk [EMAIL PROTECTED] wrote:
  I'm setting up a cluster with 4 disks per server. Is there any way to make
 Hadoop aware of this setup and take benefits from that?

This is how we run our nodes.  You just need to list the four file
systems in the configuration files and the datanode and map/red processes
will know what to do.
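
A hedged hadoop-site.xml fragment for a four-disk node (the mount points are
examples):

    <property>
      <name>dfs.data.dir</name>
      <value>/mnt1/hdfs/data,/mnt2/hdfs/data,/mnt3/hdfs/data,/mnt4/hdfs/data</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/mnt1/mapred/local,/mnt2/mapred/local,/mnt3/mapred/local,/mnt4/mapred/local</value>
    </property>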



Re: Restricting Job Submission Access

2008-07-17 Thread Allen Wittenauer



On 7/17/08 3:33 PM, Theocharis Ian Athanasakis [EMAIL PROTECTED] wrote:

 What's the recommended way to restrict access to job submissions and
 HDFS access, besides a firewall?

We basically put bastion hosts (we call them gateways) next to hadoop
that users use to submit jobs, access the HDFS, etc.  By limiting who can
get onto the gateways, we limit access.  We also use HOD, so we have all of
Torque's access and resource control capabilities as well.

Not a replacement for real security, obviously.

Oh, I think there might be some diagrams, pictures and other info  about
this in my preso on the hadoop wiki.




Re: client connect as different username?

2008-06-11 Thread Allen Wittenauer



On 6/11/08 5:17 PM, Chris Collins [EMAIL PROTECTED] wrote:

 The finer point to this is that in development you may be logged in as
 user x and have a shared hdfs instance that a number of people are
 using.  In that mode it's not practical to sudo, as you have all your
 development tools set up for user x.  hdfs is set up with a single user;
 what is the procedure to add users to that hdfs instance?  It has to
 support it, surely?  It's really not obvious; looking in the hdfs docs
 that come with the distro, nothing springs out.  The hadoop command
 line tool doesn't have anything that vaguely looks like a way to create
 a user.

User information is sent from the client.  The code literally does a
'whoami' and 'groups' and sends that information to the server.

Shared data should be handled just like you would in UNIX:

- create a directory
- set permissions to be insecure
- go crazy

  



Re: compressed/encrypted file

2008-06-05 Thread Allen Wittenauer
On 6/5/08 11:38 AM, Ted Dunning [EMAIL PROTECTED] wrote:
 We use encryption on log files using standard AES.  I wrote an input format
 to deal with it.
 
 Key distribution should be done better than we do it.  My preference would
 be to insert an auth key into the job conf which is then used by the input
 to open a well known keyring via an API that prevents auths from surviving
 for long term.

   This sounds like it opens the door for key stealing in a
multi-user/static job tracker system, since the job conf is readable by all
jobs running on the same machine.



Re: compressed/encrypted file

2008-06-05 Thread Allen Wittenauer
On 6/5/08 11:57 AM, Ted Dunning [EMAIL PROTECTED] wrote:
 Can you suggest an alternative way to communicate a secret to hadoop tasks
 short of embedding it into source code?

This is one of the reasons why we use HOD: the job isolation it provides
helps prevent data leaks from one job to the next.



Re: hadoop on EC2

2008-05-28 Thread Allen Wittenauer



On 5/28/08 1:22 PM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
 I just wondered what other people use to access the hadoop webservers,
 when running on EC2?

While we don't run on EC2 :), we do protect the hadoop web processes by
putting a proxy in front of it.  A user connects to the proxy,
authenticates, and then gets the output from the hadoop process.  All of the
redirection magic happens via a localhost connection, so no data is leaked
unprotected.



Re: How do people keep their client configurations in sync with the remote cluster(s)

2008-05-15 Thread Allen Wittenauer



On 5/15/08 8:56 AM, Steve Loughran [EMAIL PROTECTED] wrote:

 Allen Wittenauer wrote:
 On 5/15/08 5:05 AM, Steve Loughran [EMAIL PROTECTED] wrote:
 I have a question for users: how do they ensure their client apps have
 configuration XML file that are kept up to date?
 
 We control the client as well as the servers, so it all gets pushed at
 once. :)
 
 yes, but you use NFS, so you have your own problems, like the log
 message "NFS server not responding, still trying" appearing across
 everyone's machines simultaneously, which is to be feared almost as much
 as when ClearCase announces that its filesystem is offline.

We don't use NFS for this.




Re: using sge, or drmaa for HOD

2008-05-02 Thread Allen Wittenauer
On 5/2/08 7:22 AM, Andre Gauthier [EMAIL PROTECTED] wrote:
 Also I was thinking of
 modifying HOD to run on grid engine.  I haven't really begun to pour
 over all the code for HOD but, my question is this, can I just write a
 python module similar to that of torque.py under hod/schedulers/ for sge
   or would this require significant modification in HOD and possibly
 hadoop?

Given that both torque and SGE are based off of (IEEE standard) PBS, it
might even run unmodified. 



Re: Hadoop Cluster Administration Tools?

2008-05-01 Thread Allen Wittenauer
On 5/1/08 5:00 PM, Bradford Stephens [EMAIL PROTECTED] wrote:
 *Very* cool information. As someone who's leading the transition to
 open-source and cluster-orientation  at a company of about 50 people,
 finding good tools for the IT staff to use is essential. Thanks so much for
 the continued feedback.

Hmm.  I should upload my slides.




Re: Help with configuration

2008-04-22 Thread Allen Wittenauer
On 4/22/08 7:12 AM, [EMAIL PROTECTED] [EMAIL PROTECTED]
wrote:
 I am getting this annoying error message every time I start
 bin/start-all.sh with one single node
 command-line: line 0: Bad configuration option: ConnectTimeout
 
 Do you know what could be the issue?? I can not find it in the FAQs, Thank
 you for your help.


You likely have an ancient version of ssh installed that doesn't support
ConnectTimeout. Best bet is to hack the start-all.sh or to manually start
each data node.
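
If the option is coming from HADOOP_SSH_OPTS in conf/hadoop-env.sh (a guess,
but that's the usual place), the workaround is roughly:

    # before: export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
    export HADOOP_SSH_OPTS="-o SendEnv=HADOOP_CONF_DIR"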

What OS? I'm guessing Solaris 9, from the bits and pieces I know about
JPMC's infrastructure. ;)




Re: Newbie asking: ordinary filesystem above Hadoop

2008-04-22 Thread Allen Wittenauer



On 4/22/08 12:23 PM, Mika Joukainen [EMAIL PROTECTED] wrote:

 All right, I have to rephrase: I'd like to have a storage system for files which
 are inserted by the users. Users are going to use normal human-operable sw
 entities ;) The system is going to have: fault tolerance, parallelism etc. ==
 HDFS, isn't it?

No, it isn't. You're looking for Lustre and similar file systems.




Re: New bee quick questions :-)

2008-04-21 Thread Allen Wittenauer



On 4/21/08 3:36 AM, vikas [EMAIL PROTECTED] wrote:

Most of your questions have been answered by Luca, from what I can see,
so let me tackle the rest a bit...

 4) Let us suppose I want to shut down one datanode for maintenance purposes.
 Is there any way to inform Hadoop that this particular datanode is
 going down -- please make sure the data on it is replicated elsewhere?

You want to do datanode decommissioning.  See
http://wiki.apache.org/hadoop/FAQ#17 for details.

 5) I was going through some videos on MAP-Reduce and few Yahoo tech talks.
 in that they were specifying a Hadoop cluster has multiple cores -- what
 does this mean ?

I haven't watched the tech talks in ages, but we generally refer to
cores in a variety of ways.  There is the single physical box version--an
individual processor has more than one execution unit, thereby giving it a
degree of parallelism.  Then there is the complete grid count--an individual
grid can have lots and lots of processors with lots and lots of individual
cores on those processors which works out to be a pretty good rough
estimation of how many individual Hadoop tasks can be run simultaneously.

   5.1) can I have multiple instance of namenodes running in a cluster apart
 from secondary nodes ?

No.  The name node is a single point of failure in the system.
 
 6) If I go on create huge files will they be balanced among all the
 datanodes ? or do I need to change the creation of file location in the
 application.

In addition to what Luca said, be aware that if you load a file on a
machine with a data node process, the data for that file will *always* get
loaded to that machine.  This can cause your data nodes to get extremely
unbalanced.   You are much better off doing data loads *off grid*/from
another machine.  Since you only need the hadoop configuration and binaries
available (in other words, no hadoop processes need be running), this
usually isn't too painful to do.

In 0.16.x, there is a rebalancer to help fix this situation, but I have
no practical experience with it yet to say whether or not it works.



Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Allen Wittenauer
On 4/21/08 2:18 PM, Ted Dunning [EMAIL PROTECTED] wrote:
 I agree with the fair and balanced part.  I always try to keep my clusters
 fair and balanced!
 
 Joydeep should mention his background.  In any case, I agree that high-end
 filers may provide good enough NFS service, but I would also contend that
 HDFS has been better for me than NFS from generic servers.

We take a mixed approach to the NFS problem.

For grids that have some sort of service level agreement associated with
it, we do not allow NFS connections.  The jobs must be reasonably self
contained.

For other grids (research, development, etc), we do allow NFS
connections and hope that people don't do stupid things.

It is probably worth pointing out that it is much easier for a user to
do stupid things with, say, 500 nodes than 5. So we take a much more
conservative view for grids we care about.

As Joydeep said, the implementation of the stack does make a huge
difference.  NetApp and Sun are leaps and bounds better than most.  In the
case of Linux, it has made great strides forward, but I'd be leery of using it
for the sorts of workloads we have.



Re: Add your project or company to the powered by page?

2008-02-21 Thread Allen Wittenauer
On 2/21/08 11:34 AM, Jeff Hammerbacher [EMAIL PROTECTED]
wrote:
 yeah, i've heard those facebook groups can be a great way to get the word
 out...
 
 anyways, just got approval yesterday for a 320 node cluster.  each node has
 8 cores and 4 TB of raw storage so this guy is gonna be pretty powerful.
 can we claim largest cluster outside of yahoo?

I guess it depends upon how you define outside.

   *Technically*, M45 is outside of a Yahoo! building, given that it is in
one of those shipping-container-data-center-thingies ...



Re: Starting up a larger cluster

2008-02-08 Thread Allen Wittenauer
On 2/7/08 11:01 PM, Tim Wintle [EMAIL PROTECTED] wrote:

  it's
 useful to be able to connect from nodes that aren't in the slaves file
 so that you can put in input data direct from another machine that's not
 part of the cluster,

I'd actually recommend this as a best practice.  We've been bitten over...
and over... and over... by users loading data into HDFS from a data node,
only to discover that the block distribution is pretty horrid, which in
turn means that MR performance is equally horrid. [Remember: all writes will
go to the local node if it is a data node!]

We're now down to the point that we've got one (relatively smaller) grid
that is used for data loading/creation/extraction which then distcp's its
contents to another grid.
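
The copy itself is just a plain distcp between the two namenodes (host names,
port and paths below are examples):

    hadoop distcp hdfs://loading-nn.example.com:8020/incoming/2008-02-07 \
                  hdfs://prod-nn.example.com:8020/data/2008-02-07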

Less than ideal, but definitely helps the performance of the entire
'real' grid.




Re: Starting up a larger cluster

2008-02-08 Thread Allen Wittenauer

On 2/8/08 9:32 AM, Jeff Eastman [EMAIL PROTECTED] wrote:
 I noticed that phenomena right off the bat. Is that a designed feature
 or just an unhappy consequence of how blocks are allocated?

My understanding is that this is by design--when you are running a MR
job, you want the output, temp files, etc, to be local.

 Ted
 compensates for this by aggressively rebalancing his cluster often by
 adjusting the replication up and down, but I wonder if an improvement in
 the allocation strategy would improve this.

IIRC, we're getting a block re-balancer in 0.16 so this particular
annoyance should mostly go away soon.

 I've also used Ted's trick, with less than marvelous results. I'd hate
 to pull my biggest machine (where I store all the backup files) out of
 the cluster just to get more even block distribution but I may have to.

Been there, done that.  (At one time, we were decomm'ing entire racks to
force redistribution.  I seem to recall that we hit a bug, so we then slowed
down to doing 10 at a time.)