Re: Hadoop Cluster Administration Tools?

2008-05-02 Thread Steve Loughran

Allen Wittenauer wrote:

On 5/1/08 5:00 PM, Bradford Stephens [EMAIL PROTECTED] wrote:

*Very* cool information. As someone who's leading the transition to
open-source and cluster-orientation  at a company of about 50 people,
finding good tools for the IT staff to use is essential. Thanks so much for
the continued feedback.


Hmm.  I should upload my slides.




That would be excellent! I was trying not to scare people with things 
like PXE preboot or the challenge of bringing up a farm of 500+ servers 
when the building has just suffered a power outage. I will let your 
slides do that.


The key things people have to remember are:
- you can't do stuff by hand once you have more than one box; you need to 
have some story for scaling things up. It could be hand-creating a machine 
image that is cloned, or it could be using CM tools. If you find yourself 
trying to ssh into boxes to configure them by hand, you are in trouble.


- once you have enough racks in your cluster, you can abandon any notion 
of 100% availability. You have to be prepared to deal with failures as an 
everyday event. The worst failures are not the machines that drop off the 
net; it's the ones that start misbehaving with memory corruption, or a 
network card that starts flooding the network.




Re: Hadoop Cluster Administration Tools?

2008-05-02 Thread Khalil Honsali
Useful information indeed, though a bit complicated for my level, I must say.
I think it would be more than useful to post these online, say in Hadoop's
wiki or as an article on cluster resource sites.
How about it? I can volunteer for this if you wish: a central information
place on the Hadoop wiki for pre-install cluster administration?
- OS image install
- ssh setup
- dsh, ant tools setup
- rpm automation
- this.next( ? )



Re: JobConf: How to pass List/Map

2008-05-02 Thread Enis Soztutar

It is exactly what DefaultStringifier does, ugly but useful *smile*.
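A minimal sketch of that DefaultStringifier usage against the 0.17 API; the 
key name and wrapper class below are only illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.DefaultStringifier;
  import org.apache.hadoop.io.Text;

  public class StringifierSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      // Store any Writable (here a Text) in the conf under a chosen key.
      DefaultStringifier.store(conf, new Text("payload for the tasks"), "my.job.payload");

      // Later, e.g. in Mapper.configure(JobConf), read it back by key and class.
      Text restored = DefaultStringifier.load(conf, "my.job.payload", Text.class);
      System.out.println(restored);
    }
  }

Under the covers this is the same serialize-then-base64 trick described below.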

Jason Venner wrote:
We have been serializing to a ByteArrayOutputStream, then base64-encoding 
the underlying byte array and passing that string in the conf.

It is ugly, but it works well until 0.17 is available.
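A rough sketch of that pre-0.17 workaround, assuming the value is a Writable 
and using commons-codec for the base64 step; the helper class and key names 
are made up:

  import java.io.ByteArrayInputStream;
  import java.io.ByteArrayOutputStream;
  import java.io.DataInputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import org.apache.commons.codec.binary.Base64;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.Writable;

  public class ConfSerde {
    // Serialize a Writable, base64-encode the bytes, and stash the string in the conf.
    public static void store(Configuration conf, Writable w, String key) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      w.write(new DataOutputStream(bytes));
      conf.set(key, new String(Base64.encodeBase64(bytes.toByteArray()), "UTF-8"));
    }

    // Decode the string back into bytes and populate an empty Writable instance.
    public static void load(Configuration conf, Writable w, String key) throws IOException {
      byte[] bytes = Base64.decodeBase64(conf.get(key).getBytes("UTF-8"));
      w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
    }
  }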

Enis Soztutar wrote:
Yes, Stringifier was committed in 0.17. What you can do in 0.16 is to 
simulate DefaultStringifier. The key feature of the Stringifier is 
that it can convert/restore any object to and from a string, using 
base64 encoding of the binary form of the object. If your objects can 
be easily converted to and from strings, then you can directly store 
them in the conf. The other obvious alternative would be to switch to 
0.17, once it is out.


Tarandeep Singh wrote:
On Wed, Apr 30, 2008 at 5:11 AM, Enis Soztutar 
[EMAIL PROTECTED] wrote:
 

Hi,

 There are many ways in which you can pass objects using the configuration.
Possibly the easiest way would be to use the Stringifier interface.

 You can, for example:

 DefaultStringifier.store(conf, variable, mykey);

 variable = DefaultStringifier.load(conf, mykey, variableClass);



thanks... but I am using Hadoop 0.16, and Stringifier is a fix for the 
0.17 version -

https://issues.apache.org/jira/browse/HADOOP-3048

Any thoughts on how to do this in the 0.16 version?

thanks,
Taran

 
you should take into account that the variable you pass to the 
configuration should be serializable by the framework. That means it must 
implement the Writable or Serializable interface. In your particular case, 
you might want to look at the ArrayWritable and MapWritable classes.

That said, you should not pass large objects via the configuration, since 
it can seriously affect job overhead. If the data you want to pass is 
large, then you should use other alternatives (such as DistributedCache, 
HDFS, etc).
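As a small illustration of the MapWritable suggestion (names here are only 
illustrative; since MapWritable is itself a Writable, getting it into the 
conf is then a matter of either of the sketches earlier in this thread):

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.MapWritable;
  import org.apache.hadoop.io.Text;

  public class MapWritableSketch {
    public static void main(String[] args) {
      // Build a small lookup table to hand to the tasks.
      MapWritable lookup = new MapWritable();
      lookup.put(new Text("en"), new IntWritable(1));
      lookup.put(new Text("fr"), new IntWritable(2));

      // Read a value back out, as a mapper or reducer would after deserializing.
      IntWritable id = (IntWritable) lookup.get(new Text("fr"));
      System.out.println(id);  // prints 2
    }
  }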



 Tarandeep Singh wrote:

Hi,

How can I set a list or map in a JobConf that I can access in the 
Mapper/Reducer class?
The get/setObject methods in Configuration have been deprecated, and the 
documentation says "A side map of Configuration to Object should be used 
instead."
I could not follow this :(

Can someone please explain to me how to do this?

Thanks,
Taran



Re: using sge, or drmaa for HOD

2008-05-02 Thread Allen Wittenauer
On 5/2/08 7:22 AM, Andre Gauthier [EMAIL PROTECTED] wrote:
 Also I was thinking of modifying HOD to run on Grid Engine. I haven't 
 really begun to pore over all the code for HOD, but my question is this: 
 can I just write a Python module similar to torque.py under 
 hod/schedulers/ for SGE, or would this require significant modification 
 in HOD and possibly Hadoop?

Given that both Torque and SGE are based on (IEEE standard) PBS, it
might even run unmodified.



Re: using sge, or drmaa for HOD

2008-05-02 Thread Andre Gauthier
Yeah, they are very similar; the commands are the same, but some options 
are different.





Re: Hadoop Cluster Administration Tools?

2008-05-02 Thread Ted Dunning

Great idea.

Go for it.  Pick a place and start writing.  It is a wiki, so if you start
it, others will comment on it.





Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
In the system I am working on, we have 6 million blocks in total, the 
namenode heap size is about 600 MB, and it takes about 5 minutes for the 
namenode to leave safemode.

I am trying to estimate what the heap size would be if we had 100-150 
million blocks, and how long it would take the namenode to leave safemode.
Extrapolating from the numbers I have, I am calculating very scary numbers 
for both (terabytes for the heap size, and half an hour or so of startup 
time). I am hoping that my extrapolation is not accurate.

From your clusters, could you provide some numbers for the number of files 
and blocks in the system vs. the master heap size and master startup time?

I really appreciate your help.

Thanks.
Cagdas


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Doug Cutting

Cagdas Gerede wrote:

In the system I am working, we have 6 million blocks total and the namenode
heap size is about 600 MB and it takes about 5 minutes for namenode to leave
the safemode.


How big are your files?  Are they several blocks on average?  Hadoop 
is not designed for small files, but rather for larger files.  An 
Archive system is currently being designed to help with this.


https://issues.apache.org/jira/browse/HADOOP-3307


I try to estimate what would be the heap size if we have 100 - 150 million
blocks, and what would be the amount of time for namenode to leave the
safemode.


At ~100 MB per block, 100M blocks would store 10 PB.  At ~1 TB/node, this 
means a ~10,000-node system, larger than Hadoop currently supports well 
(for this and other reasons).


If your files are generally large, you can increase your block size to 
250MB to decrease the number of blocks in the system.


Doug
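For reference, a sketch of the two usual ways to get a larger block size, 
assuming the pre-0.2x property name dfs.block.size; the 256 MB figure, 
path, and replication factor below are only examples:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
      // 1) Client-wide default: set dfs.block.size (in bytes) before creating files.
      Configuration conf = new Configuration();
      conf.setLong("dfs.block.size", 256L * 1024 * 1024);

      // 2) Per-file override: pass an explicit block size to FileSystem.create().
      FileSystem fs = FileSystem.get(conf);
      FSDataOutputStream out = fs.create(new Path("/tmp/example.dat"),
          true,                                       // overwrite
          conf.getInt("io.file.buffer.size", 4096),   // buffer size
          (short) 3,                                  // replication
          256L * 1024 * 1024);                        // block size in bytes
      out.close();
    }
  }

The block size is only an upper bound per block, so a larger value does not 
make small files take up any more space.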



Re: HDFS: Good practices for Number of Blocks per Datanode

2008-05-02 Thread Doug Cutting

Cagdas Gerede wrote:

For a system with 60 million blocks, we can have 3 datanodes with 20 million
blocks each, or we can have 60 datanodes with 1 million blocks each. In
either case, would there be performance implications or would they behave
the same way?


If you're using MapReduce, then you want your computations to run on 
nodes where the data is local.  The most cost-effective way to buy CPUs 
is generally in 2-8 core boxes that hold 2-4 hard drives, and this also 
generally gives good I/O performance.  In theory, boxes with 64 CPUs and 
64 drives each will perform similarly to 16 times as many boxes, each 
with 4 CPUs and 4 drives, but the former is more expensive, and when a 
box fails you take a bigger hit.  Also, with more boxes, you generally 
get more network interfaces and hence more aggregate bandwidth, assuming 
you have a good switch.


Doug


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
Thanks, Doug, for your answers. Our interest is more in the distributed 
file system part than in MapReduce.
I must confess that our block size is not as large as what a lot of people 
configure. I would appreciate your and others' input.

Do you think these numbers are suitable?

We will have 5 million files, each having 20 blocks of 2 MB. With the 
minimum replication of 3, we would have 300 million blocks.
300 million blocks would store 600 TB. At ~10 TB/node, this means a 
60-node system.

Do you think these numbers are suitable for Hadoop DFS?

Cagdas







-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Ted Dunning

Why did you pick such a small block size?

Why not go with the default of 64MB?

That would give you only 10 million blocks for your 600TB.

I don't see any advantage to the tiny block size.




Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Doug Cutting

Cagdas Gerede wrote:

We will have 5 million files each having 20 blocks of 2MB. With the minimum
replication of 3, we would have 300 million blocks.
300 million blocks would store 600TB. At ~10TB/node, this means a 60 node
system.

Do you think these numbers are suitable for Hadoop DFS.


Why are you using such small blocks?  A larger block size will decrease 
the strain on Hadoop, but perhaps you have reasons?


Doug


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
Fault tolerance. As files are uploaded to our server, we can continuously 
write the data in small chunks, and if our server fails, we can tolerate 
the failure by switching the user to another server, where the user can 
continue to write. Otherwise we have to wait on the server until we have 
the whole file before writing it to Hadoop (if the server fails, we lose 
all the data), or we need the user to cache all the data he is generating, 
which is not feasible for our requirements.

I would appreciate your comments on this.

Cagdas




-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Ted Dunning

But you could do all this with larger blocks as well.  Having a large block
size only says that a block CAN be that long, not that it MUST be that long.

Also, you said that the average size was ~ 40MB (20 x 2MB blocks).  If that
is so, then you should be able to radically decrease the number of blocks
with a larger block size.

