Re: Recommendation for Beowulf/Apache Setup
Thanks for the tip. I was looking at all the options, and FreeBSD/Xen looks like the best bet as far as resource throttling goes: install ROCKS on the nodes, install Xen on ROCKS, install FreeBSD as a domU and give that domU a lot of priority. I'll give it a shot and publish my findings in the future.

But of course, to keep it relevant, OpenBSD will run on the router and will use hoststated: http://home.nuug.no/~peter/riga2008/relayd.html
I guess it's been renamed; I haven't paid attention. The Book of PF uses hoststated, so I guess it's already kind of obsolete.

Thanks,
Vivek

On Thu, May 7, 2009 at 10:17 PM, James Peltier james_a_pelt...@yahoo.ca wrote:
> --- On Thu, 5/7/09, Vivek Ayer vivek.a...@gmail.com wrote:
>
>> From: Vivek Ayer vivek.a...@gmail.com
>> Subject: Recommendation for Beowulf/Apache Setup
>> To: misc misc@openbsd.org
>> Received: Thursday, May 7, 2009, 12:36 PM
>>
>> Hey guys,
>>
>> This is a very general question, but I'm not exactly sure how to proceed. I'll be getting a lot of hardware soon to be clustered and I was wondering what your take on the setup was. My setup was going to be:
>>
>> 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the 12 nodes will run apache running LAMP-like services. The router will round-robin using hoststated for load-balancing.
>
> hoststated? What is that? I think you mean relayd! ;)
>
>> However, they will serve an additional task: The master mysql server will be head node for MPI jobs delivered to the 12 nodes. Basically, this setup will double up as a beowulf and web server. Is this efficient? I imagine the MPI jobs won't be running all the time and while they're up, might as well do something.
>
> I think you are going to be heading for a world of hurt here. I am the HPC director at a university supporting 3 faculties. Once people begin to use the resource they *will* crash nodes. Having any critical services running on HPC compute nodes is *not advisable*.
>
>> Firstly, would you recommend BSD or Linux for this. The router is a given to have OpenBSD of course, but what about the others?
>
> OS doesn't matter! It's all about the tools. We use GNU/Linux (CentOS 5) for our HPC cluster because there are more tools available natively for it. This is an unfortunate fact. More and more applications out there are becoming GNU/Linux specific and just don't work properly or at all on other OSs. Evaluate your tools and make a decision. AFAIK, Open-MPI, MPICH and MPICH2 compile and run fine on the BSDs. Other tools and libs, well, YMMV.
>
>> I figured it makes sense to parallelize as much as possible so that the HTTP/MPI load can be shared among as many computers as possible. Let me know your thoughts.
>
> Unless you have hard memory and CPU provisioning limiting what the cluster nodes can do, à la Xen/VMware, forget about it. Trust me. I've rebooted enough deadlocked/crashed nodes due to user error to know better. If you have to... well... NO CARRIER...
Re: Recommendation for Beowulf/Apache Setup
Vivek Ayer vivek.a...@gmail.com writes:

> But of course, to keep it relevant, OpenBSD will run on the router and will use hoststated http://home.nuug.no/~peter/riga2008/relayd.html. I guess it's been renamed. I haven't paid attention. The book of PF uses hoststated, so I guess it's already kind of obsolete.

Yes, The Book of PF states somewhere in the early parts that it's up to date as of OpenBSD 4.2. The hoststated-to-relayd name change happened in -current about two weeks after the book went to print.

- P
--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
Remember to set the evil bit on all malicious network traffic
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
Recommendation for Beowulf/Apache Setup
Hey guys,

This is a very general question, but I'm not exactly sure how to proceed. I'll be getting a lot of hardware soon to be clustered and I was wondering what your take on the setup was. My setup was going to be:

1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the 12 nodes will run apache running LAMP-like services. The router will round-robin using hoststated for load-balancing.

However, they will serve an additional task: The master mysql server will be head node for MPI jobs delivered to the 12 nodes. Basically, this setup will double up as a beowulf and web server. Is this efficient? I imagine the MPI jobs won't be running all the time and while they're up, might as well do something.

Firstly, would you recommend BSD or Linux for this? The router is a given to have OpenBSD of course, but what about the others?

I figured it makes sense to parallelize as much as possible so that the HTTP/MPI load can be shared among as many computers as possible. Let me know your thoughts.

Thanks,
Vivek
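For readers unfamiliar with the term, the round-robin balancing mentioned above can be sketched in a few lines of Python. This only illustrates the rotation idea; it is not how hoststated/relayd is implemented, and it leaves out the host health checks a real load balancer would perform. The node addresses and the names `make_round_robin` and `WEB_NODES` are made up for the example, since the thread gives no addressing.

```python
from itertools import cycle


def make_round_robin(backends):
    """Return a function that hands out backends in strict rotation."""
    if not backends:
        raise ValueError("need at least one backend")
    rotation = cycle(backends)
    return lambda: next(rotation)


# Hypothetical addresses for the 12 Apache nodes (not from the thread).
WEB_NODES = [f"10.0.0.{n}" for n in range(11, 23)]

pick = make_round_robin(WEB_NODES)

# One full pass visits every node exactly once, in order;
# the following call starts over at the first node.
first_pass = [pick() for _ in WEB_NODES]
```

Each new connection would go to whatever `pick()` returns, so every node sees roughly one twelfth of the requests regardless of how busy it already is. That last point matters for the rest of the thread: plain rotation does not notice when a node is tied up with an MPI job.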
Re: Recommendation for Beowulf/Apache Setup
Hi Vivek-

On Thu, May 07, 2009 at 09:36:17AM -0700, Vivek Ayer wrote:
> 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the 12 nodes will run apache running LAMP-like services. The router will round-robin using hoststated for load-balancing.

There are some FreeBSD clusters out there (NCSA has one, IIRC), but they're certainly not as common as Linux. If your users can run on FreeBSD, you might as well use it. If their code is all Linuxy (and lots of cluster and -- even more so -- grid code makes silly assumptions like that), you should give them a platform that they can easily use.

> However, they will serve an additional task: The master mysql server will be head node for MPI jobs delivered to the 12 nodes. Basically, this setup will double up as a beowulf and web server. Is this efficient? I imagine the MPI jobs won't be running all the time and while they're up, might as well do something.

This might work. But you're setting yourself up for contention and degraded service to at least one set of users. Do the people who care about performance of your LAMP stack mind waiting a bit while MPI jobs chew memory and network bandwidth? Do your MPI users mind if their jobs take longer to complete while your LAMP stuff is getting pounded?

With regard to MPI, what sort of interconnects will your execute nodes have? MPI wants lots of bandwidth between nodes and regular gigabit might not cut it (depending on your users' applications).

--
o--{ Will Maier }--o
| web:...http://www.lfod.us/ | email.willma...@ml1.net |
*-[ BSD: Live Free or Die ]*
Re: Recommendation for Beowulf/Apache Setup
I was going to start small given the budget I have. Eventually, I'd like to dedicate a gigabit switch for HTTP traffic and Infiniband for compute traffic. At first, I don't expect too much MPI work to be done, but I've heard that FreeBSD performs better under duress than Linux as the number of HTTP threads increases. Knowing that beowulf stuff is done better on Linux, another option would be to run FreeBSD inside of Xen for HTTP, while Linux does the computing.

How good is FreeBSD for clustering? I'm not really familiar with FreeBSD for that use, and there isn't a lot of documentation on building beowulfs with FreeBSD.

The final option would be to divide and conquer: 6 for HTTP, 6 for computing, but my reasoning is why not scale for HTTP as much as possible. In this setup, HTTP would be the primary deal, which was why I went to FreeBSD first. Does OpenMPI or MPICH2 run well under FreeBSD? I got a build working on OpenBSD/sparc64, but haven't really done much with it yet.

Thanks for the help,
Vivek

On Thu, May 7, 2009 at 9:55 AM, Will Maier willma...@ml1.net wrote:
> Hi Vivek-
>
> On Thu, May 07, 2009 at 09:36:17AM -0700, Vivek Ayer wrote:
>> 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the 12 nodes will run apache running LAMP-like services. The router will round-robin using hoststated for load-balancing.
>
> There are some FreeBSD clusters out there (NCSA has one, IIRC), but they're certainly not as common as Linux. If your users can run on FreeBSD, you might as well use it. If their code is all Linuxy (and lots of cluster and -- even more so -- grid code makes silly assumptions like that), you should give them a platform that they can easily use.
>
>> However, they will serve an additional task: The master mysql server will be head node for MPI jobs delivered to the 12 nodes. Basically, this setup will double up as a beowulf and web server. Is this efficient? I imagine the MPI jobs won't be running all the time and while they're up, might as well do something.
>
> This might work. But you're setting yourself up for contention and degraded service to at least one set of users. Do the people who care about performance of your LAMP stack mind waiting a bit while MPI jobs chew memory and network bandwidth? Do your MPI users mind if their jobs take longer to complete while your LAMP stuff is getting pounded?
>
> With regard to MPI, what sort of interconnects will your execute nodes have? MPI wants lots of bandwidth between nodes and regular gigabit might not cut it (depending on your users' applications).
>
> --
> o--{ Will Maier }--o
> | web:...http://www.lfod.us/ | email.willma...@ml1.net |
> *-[ BSD: Live Free or Die ]*
[OT] Re: Recommendation for Beowulf/Apache Setup
Hi Vivek-

This has gone decidedly off topic...

On Thu, May 07, 2009 at 12:05:35PM -0700, Vivek Ayer wrote:
> I was going to start small given the budget I have. Eventually, I'd like to dedicate a gigabit switch for HTTP traffic and Infiniband for compute traffic. At first, I don't expect too much MPI work to be done, but I've heard FreeBSD performing better under duress than linux as the number of HTTP threads increases.
[...]
> The final option would be to divide and conquer: 6 for HTTP, 6 for computing, but my reasoning is why not scale for HTTP as much as possible.

This is really the only reasonable approach. No one would run a production web service on top of a parallel computing cluster unless they had to. Remember that your execute nodes will run random jobs from random users -- do you want that on a box that hosts a critical database or webserver? The scenario is worse if you participate in a grid.

As always, use the best tool for the job. As you've noticed, OpenBSD will do well managing your network. Frankly, in most cases it also makes for an excellent database or webserver. As for the execute nodes, run Linux on them unless you have some reason (user requirements, demonstrated performance gains, etc.) to do otherwise.

--
o--{ Will Maier }--o
| web:...http://www.lfod.us/ | email.willma...@ml1.net |
*-[ BSD: Live Free or Die ]*
Re: [OT] Re: Recommendation for Beowulf/Apache Setup
OpenBSD does a good job with web serving. I have two Sun Blades that run OpenBSD/sparc64. But do you really think it matches up with FreeBSD? I know my router will be OpenBSD (that's a given), but I'm not sure how well OpenBSD performs under many threads. I guess it comes down to how much RAM you have in the end, right?

Vivek

On Thu, May 7, 2009 at 12:28 PM, Will Maier willma...@ml1.net wrote:
> Hi Vivek-
>
> This has gone decidedly off topic...
>
> On Thu, May 07, 2009 at 12:05:35PM -0700, Vivek Ayer wrote:
>> I was going to start small given the budget I have. Eventually, I'd like to dedicate a gigabit switch for HTTP traffic and Infiniband for compute traffic. At first, I don't expect too much MPI work to be done, but I've heard FreeBSD performing better under duress than linux as the number of HTTP threads increases.
> [...]
>> The final option would be to divide and conquer: 6 for HTTP, 6 for computing, but my reasoning is why not scale for HTTP as much as possible.
>
> This is really the only reasonable approach. No one would run a production web service on top of a parallel computing cluster unless they had to. Remember that your execute nodes will run random jobs from random users -- do you want that on a box that hosts a critical database or webserver? The scenario is worse if you participate in a grid.
>
> As always, use the best tool for the job. As you've noticed, OpenBSD will do well managing your network. Frankly, in most cases it also makes for an excellent database or webserver. As for the execute nodes, run Linux on them unless you have some reason (user requirements, demonstrated performance gains, etc.) to do otherwise.
>
> --
> o--{ Will Maier }--o
> | web:...http://www.lfod.us/ | email.willma...@ml1.net |
> *-[ BSD: Live Free or Die ]*
Re: Recommendation for Beowulf/Apache Setup
--- On Thu, 5/7/09, Vivek Ayer vivek.a...@gmail.com wrote:

> From: Vivek Ayer vivek.a...@gmail.com
> Subject: Recommendation for Beowulf/Apache Setup
> To: misc misc@openbsd.org
> Received: Thursday, May 7, 2009, 12:36 PM
>
> Hey guys,
>
> This is a very general question, but I'm not exactly sure how to proceed. I'll be getting a lot of hardware soon to be clustered and I was wondering what your take on the setup was. My setup was going to be:
>
> 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the 12 nodes will run apache running LAMP-like services. The router will round-robin using hoststated for load-balancing.

hoststated? What is that? I think you mean relayd! ;)

> However, they will serve an additional task: The master mysql server will be head node for MPI jobs delivered to the 12 nodes. Basically, this setup will double up as a beowulf and web server. Is this efficient? I imagine the MPI jobs won't be running all the time and while they're up, might as well do something.

I think you are going to be heading for a world of hurt here. I am the HPC director at a university supporting 3 faculties. Once people begin to use the resource they *will* crash nodes. Having any critical services running on HPC compute nodes is *not advisable*.

> Firstly, would you recommend BSD or Linux for this. The router is a given to have OpenBSD of course, but what about the others?

OS doesn't matter! It's all about the tools. We use GNU/Linux (CentOS 5) for our HPC cluster because there are more tools available natively for it. This is an unfortunate fact. More and more applications out there are becoming GNU/Linux specific and just don't work properly or at all on other OSs. Evaluate your tools and make a decision. AFAIK, Open-MPI, MPICH and MPICH2 compile and run fine on the BSDs. Other tools and libs, well, YMMV.

> I figured it makes sense to parallelize as much as possible so that the HTTP/MPI load can be shared among as many computers as possible. Let me know your thoughts.

Unless you have hard memory and CPU provisioning limiting what the cluster nodes can do, à la Xen/VMware, forget about it. Trust me. I've rebooted enough deadlocked/crashed nodes due to user error to know better. If you have to... well... NO CARRIER...
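To make "hard memory and CPU provisioning" a little more concrete, here is a minimal Python sketch that caps a single job with per-process rlimits. Note that this is a different and much weaker technique than the Xen/VMware partitioning suggested above: it limits one process tree, it cannot protect a node from a kernel-level deadlock, and it assumes a POSIX system where `RLIMIT_AS` is enforced (Linux, for example). The function name `run_capped` and the limit values are assumptions for illustration only.

```python
import resource
import subprocess
import sys


def run_capped(cmd, mem_bytes, cpu_seconds):
    """Run cmd with hard caps on address space and CPU time (POSIX only)."""

    def apply_limits():
        # Runs in the child just before exec; the limits are inherited
        # by anything the job itself spawns.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(cmd, preexec_fn=apply_limits).returncode


if __name__ == "__main__":
    # A job that tries to grab 4 GiB while capped at 1 GiB fails
    # with a non-zero exit status instead of eating the node's memory.
    greedy = [sys.executable, "-c", "x = bytearray(4 * 1024**3)"]
    print("greedy job exit status:", run_capped(greedy, 1024**3, 10))
```

A batch scheduler or a hypervisor would normally enforce limits like these for every user job; the sketch only shows the kind of ceiling being asked for before web and compute workloads share a box.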