Re: Hadoop Cluster Administration Tools?
Allen Wittenauer wrote:
> On 5/1/08 5:00 PM, Bradford Stephens [EMAIL PROTECTED] wrote:
>> *Very* cool information. As someone who's leading the transition to open-source and cluster-orientation at a company of about 50 people, finding good tools for the IT staff to use is essential. Thanks so much for the continued feedback.
> Hmm. I should upload my slides.

That would be excellent! I was trying not to scare people with things like PXE preboot, or the challenge of bringing up a farm of 500+ servers when the building has just suffered a power outage. I will let your slides do that. The key things people have to remember are:

- You can't do stuff by hand once you have more than one box; you need to have some story for scaling things up. It could be hand-creating some machine image that is cloned, or it could be using CM tools. If you find yourself trying to ssh in to boxes to configure them by hand, you are in trouble.

- Once you have enough racks in your cluster, you can abandon any notion of 100% availability. You have to be prepared to deal with failures as an everyday event. The worst failures are not the machines that drop off the net; it's the ones that start misbehaving with memory corruption, or a network card that starts flooding the network.
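The "don't ssh in by hand" point is usually addressed with a parallel-shell tool such as dsh or pdsh, or a full CM system. A minimal hypothetical sketch of the fan-out idea (hostnames and the dry-run flag are illustrative, not from any real tool):

```python
import subprocess

# Minimal fan-out runner in the spirit of dsh/pdsh: run one command on every
# host in a list, in parallel. A real cluster would use pdsh or a CM tool;
# this only sketches why scripting beats typing into 500 ssh sessions.
def run_everywhere(hosts, command, dry_run=False):
    cmds = [["ssh", host, command] for host in hosts]
    if dry_run:
        return cmds  # just show what would be executed
    procs = [subprocess.Popen(c) for c in cmds]  # fan out in parallel
    return [p.wait() for p in procs]             # collect exit codes

hosts = ["node001", "node002", "node003"]  # illustrative hostnames
for c in run_everywhere(hosts, "uptime", dry_run=True):
    print(" ".join(c))
```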
Re: Hadoop Cluster Administration Tools?
Useful information indeed, though a bit complicated for my level, I must say. I think it is more than useful to post these online, say in Hadoop's wiki or as an article on cluster resource sites. How about it? I can volunteer for this if you wish - a central information place on the Hadoop wiki for pre-install cluster admin?

- OS image install
- ssh setup
- dsh ant tools setup
- rpm automation
- this.next( ? )

2008/5/2 Steve Loughran [EMAIL PROTECTED]:
> The key things people have to remember are: you can't do stuff by hand once you have more than one box, and once you have enough racks in your cluster, you can abandon any notion of 100% availability.
Re: JobConf: How to pass List/Map
It is exactly what DefaultStringifier does, ugly but useful *smile*.

Jason Venner wrote:
> We have been serializing to a ByteArrayOutputStream, then base64-encoding the underlying byte array and passing that string in the conf. It is ugly, but it works well until 0.17.

Enis Soztutar wrote:
> Yes, Stringifier was committed in 0.17. What you can do in 0.16 is to simulate DefaultStringifier. The key feature of the Stringifier is that it can convert/restore any object to a string, using base64 encoding on the binary form of the object. If your objects can be easily converted to and from strings, then you can directly store them in conf. The other obvious alternative would be to switch to 0.17, once it is out.

Tarandeep Singh wrote:
> On Wed, Apr 30, 2008 at 5:11 AM, Enis Soztutar [EMAIL PROTECTED] wrote:
>> Hi, there are many ways in which you can pass objects using configuration. Possibly the easiest way would be to use the Stringifier interface. You can, for example:
>> DefaultStringifier.store(conf, variable, mykey);
>> variable = DefaultStringifier.load(conf, mykey, variableClass);
>> You should take into account that the variable you pass to configuration should be serializable by the framework. That means it must implement the Writable or Serializable interfaces. In your particular case, you might want to look at the ArrayWritable and MapWritable classes. That said, you should not pass large objects via configuration, since it can seriously affect job overhead. If the data you want to pass is large, then you should use other alternatives (such as DistributedCache, HDFS, etc).
> thanks... but I am using Hadoop-0.16 and Stringifier is a fix for the 0.17 version - https://issues.apache.org/jira/browse/HADOOP-3048
> Any thoughts on how to do this in the 0.16 version? thanks, Taran

Tarandeep Singh wrote (original question):
> Hi, how can I set a list or map in JobConf that I can access in my Mapper/Reducer class? The get/setObject methods in Configuration have been deprecated, and the documentation says a side map of Configuration to Object should be used instead. I could not follow this :( Can someone please explain to me how to do this? Thanks, Taran
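The base64 workaround Jason describes can be sketched in plain Java. This is a self-contained approximation, not Hadoop's DefaultStringifier: it uses java.io serialization and java.util.Base64 instead of Hadoop's Writable machinery, and a plain Map stands in for Configuration (the real code would call conf.set/conf.get). The class and key names are illustrative.

```java
import java.io.*;
import java.util.*;

public class ConfStringify {
    // Serialize any Serializable object and base64-encode it into a string
    // stored under the given key (a plain Map stands in for Hadoop's conf).
    static String store(Map<String, String> conf, Serializable value, String key)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        String encoded = Base64.getEncoder().encodeToString(bytes.toByteArray());
        conf.put(key, encoded);
        return encoded;
    }

    // Reverse the process: base64-decode and deserialize.
    @SuppressWarnings("unchecked")
    static <T> T load(Map<String, String> conf, String key)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(conf.get(key));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        ArrayList<String> fields = new ArrayList<>(Arrays.asList("id", "name", "score"));
        store(conf, fields, "my.job.fields");    // driver side
        ArrayList<String> restored = load(conf, "my.job.fields"); // mapper side
        System.out.println(restored);            // → [id, name, score]
    }
}
```

As the thread notes, this only makes sense for small objects; anything large belongs in DistributedCache or HDFS, not in the job configuration.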
Re: using sge, or drmaa for HOD
On 5/2/08 7:22 AM, Andre Gauthier [EMAIL PROTECTED] wrote:
> Also, I was thinking of modifying HOD to run on Grid Engine. I haven't really begun to pore over all the code for HOD, but my question is this: can I just write a Python module similar to torque.py under hod/schedulers/ for SGE, or would this require significant modification in HOD and possibly Hadoop?

Given that both Torque and SGE are based on (IEEE standard) PBS, it might even run unmodified.
Re: using sge, or drmaa for HOD
Yeah, they are very similar; the commands are the same, but some of the options are different.

Allen Wittenauer wrote:
> Given that both Torque and SGE are based on (IEEE standard) PBS, it might even run unmodified.
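The "same commands, different options" point is the crux of a port like this. A hypothetical sketch (not HOD's actual torque.py API) of how an SGE scheduler module could reuse the Torque structure and vary only the option spellings, e.g. merging stdout/stderr is `-j oe` under Torque's qsub but `-j y` under SGE's:

```python
# Illustrative table of qsub option spellings; names are assumptions,
# not taken from HOD's code.
SUBMIT_FLAGS = {
    # generic option -> (torque flag, sge flag)
    "queue":   ("-q", "-q"),        # identical in both
    "join_io": ("-j oe", "-j y"),   # merge stdout/stderr: spelled differently
}

def build_submit_command(scheduler, script, queue):
    """Build a qsub command line for the given scheduler."""
    idx = 0 if scheduler == "torque" else 1
    cmd = ["qsub", SUBMIT_FLAGS["queue"][idx], queue]
    cmd += SUBMIT_FLAGS["join_io"][idx].split()
    cmd.append(script)
    return cmd

print(build_submit_command("torque", "job.sh", "batch"))
print(build_submit_command("sge", "job.sh", "batch"))
```

A real port would also have to map the qstat/qdel output formats, which differ more than the submit flags do.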
Re: Hadoop Cluster Administration Tools?
Great idea. Go for it. Pick a place and start writing. It is a wiki, so if you start it, others will comment on it.

On 5/2/08 5:28 AM, Khalil Honsali [EMAIL PROTECTED] wrote:
> I can volunteer for this if you wish - a central information place on the Hadoop wiki for pre-install cluster admin? OS image install, ssh setup, dsh ant tools setup, rpm automation, this.next( ? )
Master Heap Size and Master Startup Time vs. Number of Blocks
In the system I am working on, we have 6 million blocks total, the namenode heap size is about 600 MB, and it takes about 5 minutes for the namenode to leave safemode. I am trying to estimate what the heap size would be if we had 100-150 million blocks, and how long the namenode would take to leave safemode. From extrapolation based on the numbers I have, I am calculating very scary numbers for both (terabytes for heap size, and half an hour or so of startup time). I am hoping that my extrapolation is not accurate. From your clusters, could you provide some numbers for the number of files and blocks in the system vs. the master heap size and master startup time? I really appreciate your help. Thanks. Cagdas -- Best Regards, Cagdas Evren Gerede Home Page: http://cagdasgerede.info
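As a sanity check on the extrapolation: a purely linear model (an assumption - real namenode memory also depends on file and replica counts) derived from the 600 MB / 6 million figure gives roughly 100 bytes of heap per block, which extrapolates to tens of gigabytes rather than terabytes:

```python
# Linear extrapolation of namenode heap from the figures in this message.
# Assumes heap grows linearly with block count, which is only a rough model.
current_blocks = 6_000_000
current_heap_mb = 600

bytes_per_block = current_heap_mb * 1024 * 1024 / current_blocks
print(f"heap per block: {bytes_per_block:.0f} bytes")  # → heap per block: 105 bytes

for target_blocks in (100_000_000, 150_000_000):
    est_gb = target_blocks * bytes_per_block / 1024**3
    print(f"{target_blocks:,} blocks -> ~{est_gb:.0f} GB heap")
```

Under this model, 100-150 million blocks comes out at roughly 10-15 GB of heap, so a terabyte-scale estimate would imply a strongly super-linear growth assumption.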
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Cagdas Gerede wrote:
> In the system I am working on, we have 6 million blocks total, the namenode heap size is about 600 MB, and it takes about 5 minutes for the namenode to leave safemode.

How big are your files? Are they several blocks each, on average? Hadoop is not designed for small files, but rather for larger files. An archive system is currently being designed to help with this: https://issues.apache.org/jira/browse/HADOOP-3307

> I am trying to estimate what the heap size would be if we had 100-150 million blocks, and how long the namenode would take to leave safemode.

At ~100MB per block, 100M blocks would store 10PB. At ~1TB/node, this means a ~10,000-node system, larger than Hadoop currently supports well (for this and other reasons). If your files are generally large, you can increase your block size to 250MB to decrease the number of blocks in the system. Doug
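Doug's capacity figures can be spelled out (using decimal units, as the message does):

```python
# Capacity arithmetic behind the "10PB / 10,000 nodes" figures above.
block_size = 100 * 10**6   # ~100 MB per block
blocks = 100 * 10**6       # 100 million blocks
total_bytes = block_size * blocks
print(total_bytes / 10**15, "PB")        # → 10.0 PB

per_node = 10**12          # ~1 TB of storage per node
print(total_bytes // per_node, "nodes")  # → 10000 nodes
```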
Re: HDFS: Good practices for Number of Blocks per Datanode
Cagdas Gerede wrote:
> For a system with 60 million blocks, we can have 3 datanodes with 20 million blocks each, or we can have 60 datanodes with 1 million blocks each. In either case, would there be performance implications, or would they behave the same way?

If you're using MapReduce, then you want your computations to run on nodes where the data is local. The most cost-effective way to buy CPUs is generally in 2-8 core boxes that hold 2-4 hard drives, and this also generally gives good I/O performance. In theory, boxes with 64 CPUs and 64 drives each will perform similarly to 16 times as many boxes, each with 4 CPUs and 4 drives, but the former is both more expensive and, when a box fails, you take a bigger hit. Also, with more boxes you generally get more network interfaces and hence more aggregate bandwidth, assuming you have a good switch. Doug
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Thanks Doug for your answers. Our interest is more in the distributed-file-system part than in MapReduce. I must confess that our block size is not as large as most people configure. I would appreciate your and others' input. Do you think these numbers are suitable?

We will have 5 million files, each having 20 blocks of 2MB. With the minimum replication of 3, we would have 300 million blocks. 300 million blocks would store 600TB. At ~10TB/node, this means a 60-node system. Do you think these numbers are suitable for Hadoop DFS? Cagdas
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Why did you pick such a small block size? Why not go with the default of 64MB? That would give you only about 10 million blocks for your 600TB. I don't see any advantage to the tiny block size.

On 5/2/08 1:06 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:
> We will have 5 million files, each having 20 blocks of 2MB. With the minimum replication of 3, we would have 300 million blocks. 300 million blocks would store 600TB. At ~10TB/node, this means a 60-node system.
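The block-count comparison can be worked through per file (assumptions: decimal units, every file exactly 40MB, and "blocks" counted with the 3x replication, to match the thread's 300-million figure; per-file rounding gives ~15M replicas at 64MB, the same order of magnitude as Ted's 600TB/64MB estimate):

```python
import math

# Block-replica counts for the workload in this thread at two block sizes.
files = 5_000_000
file_bytes = 40 * 10**6   # each file: 20 blocks of 2 MB = 40 MB
replication = 3

def total_replicas(block_mb):
    blocks_per_file = math.ceil(file_bytes / (block_mb * 10**6))
    return files * blocks_per_file * replication

for block_mb in (2, 64):
    print(f"{block_mb} MB blocks -> {total_replicas(block_mb):,} block replicas")
    # → 2 MB blocks -> 300,000,000 block replicas
    # → 64 MB blocks -> 15,000,000 block replicas
```

A 20x drop in block count translates directly into a 20x smaller namenode block map for the same stored data.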
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Cagdas Gerede wrote:
> We will have 5 million files, each having 20 blocks of 2MB. With the minimum replication of 3, we would have 300 million blocks. 300 million blocks would store 600TB. At ~10TB/node, this means a 60-node system. Do you think these numbers are suitable for Hadoop DFS?

Why are you using such small blocks? A larger block size will decrease the strain on Hadoop, but perhaps you have reasons? Doug
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Fault tolerance. As files are uploaded to our server, we can continuously write the data in small chunks, and if our server fails, we can tolerate the failure by switching the user to another server, where the user can continue to write. Otherwise, we have to wait on the server until we have the whole file before writing it to Hadoop (and if the server fails, we lose all the data), or we need the user to cache all the data he is generating, which is not feasible for our requirements. I would appreciate your comments on this. Cagdas

On Fri, May 2, 2008 at 1:09 PM, Ted Dunning [EMAIL PROTECTED] wrote:
> Why did you pick such a small block size? Why not go with the default of 64MB? I don't see any advantage to the tiny block size.
Re: Master Heap Size and Master Startup Time vs. Number of Blocks
But you could do all of this with larger blocks as well. Having a large block size only says that a block CAN be that long, not that it MUST be that long. Also, you said that the average file size is ~40MB (20 x 2MB blocks). If that is so, then you should be able to radically reduce the number of blocks with a larger block size.

On 5/2/08 1:17 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:
> Fault tolerance. As files are uploaded to our server, we can continuously write the data in small chunks, and if our server fails, we can tolerate the failure by switching the user to another server, where the user can continue to write.