Re: Recommendation for Beowulf/Apache Setup

2009-05-08 Thread Vivek Ayer
Thanks for the tip. I was looking at all the options, and
FreeBSD/Xen looks like the best bet as far as resource throttling
goes.

The plan: install ROCKS on the nodes, install Xen on ROCKS, install
FreeBSD as a domU, and give that domU a lot of priority. I'll give it
a shot and publish my findings in the future.
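
A rough sketch of what "a lot of priority" could mean under Xen's credit
scheduler, where each domain's share of contended CPU time is proportional
to its weight (the default weight is 256). The domain names and weight
values below are made up for illustration and are not from any tested
setup.

```python
def cpu_shares(weights):
    """Return each domain's fraction of contended CPU time.

    Under a proportional-share scheduler, a domain's share is its weight
    divided by the sum of all weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical domains: the ROCKS/Linux dom0 and a FreeBSD web domU.
weights = {"dom0-rocks": 256, "domU-freebsd": 768}
shares = cpu_shares(weights)

for name, share in shares.items():
    print(f"{name}: {share:.0%} of CPU under contention")
```

Weights only matter when the CPUs are actually contended, and they say
nothing about memory, which would have to be sized per domain separately.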

But of course, to keep it relevant, OpenBSD will run on the router and
will use hoststated (http://home.nuug.no/~peter/riga2008/relayd.html). I
guess it's been renamed to relayd. I haven't paid attention. The Book of
PF uses hoststated, so I guess it's already kind of obsolete.

Thanks,
Vivek








Re: Recommendation for Beowulf/Apache Setup

2009-05-08 Thread Peter N. M. Hansteen
Vivek Ayer vivek.a...@gmail.com writes:

 But of course, to keep it relevant, OpenBSD will run on the router and
 will use hoststated http://home.nuug.no/~peter/riga2008/relayd.html. I
 guess it's been renamed. I haven't paid attention. The book of PF uses
 hoststated, so I guess it's already kind of obsolete.

Yes, The Book of PF states somewhere in the early parts that it's up
to date as of OpenBSD 4.2.  The hoststated-to-relayd name change happened
in -current about two weeks after the book went to print.

- P
-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
Remember to set the evil bit on all malicious network traffic
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.



Recommendation for Beowulf/Apache Setup

2009-05-07 Thread Vivek Ayer
Hey guys,

This is a very general question, but I'm not exactly sure how to
proceed. I'll be getting a lot of hardware soon to be clustered and I
was wondering what your take on the setup was.

My setup was going to be:

One OpenBSD 4.5 router routing to a subnet of 13 nodes running
FreeBSD 7.2. Of the 13 nodes, one is a master MySQL server and the
other 12 will run Apache with LAMP-like services. The router will
round-robin using hoststated for load balancing.
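
To make the round-robin idea concrete, here is a minimal sketch of
rotating requests across the 12 web nodes while skipping any node that
has been marked down. It illustrates the scheduling idea only and is not
how hoststated/relayd is implemented; the class name and the 10.0.0.x
addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out backend hosts in rotation, skipping hosts marked down."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        if not self.hosts:
            raise ValueError("need at least one host")
        self.up = set(self.hosts)
        self._ring = cycle(self.hosts)

    def mark_down(self, host):
        self.up.discard(host)

    def mark_up(self, host):
        if host in self.hosts:
            self.up.add(host)

    def next_host(self):
        # Look at each host at most once per call so we never spin forever.
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if host in self.up:
                return host
        raise RuntimeError("no web nodes available")


# Hypothetical addresses for the 12 web nodes.
pool = RoundRobinPool([f"10.0.0.{i}" for i in range(1, 13)])
pool.mark_down("10.0.0.2")  # pretend a health check failed
picks = [pool.next_host() for _ in range(4)]
print(picks)
```

A real load balancer would drive `mark_down` and `mark_up` from periodic
health checks rather than manual calls.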

However, they will serve an additional task: The master mysql server
will be head node for MPI jobs delivered to the 12 nodes. Basically,
this setup will double up as a beowulf and web server. Is this
efficient? I imagine the MPI jobs won't be running all the time and
while they're up, might as well do something.

Firstly, would you recommend BSD or Linux for this? The router is a
given to have OpenBSD of course, but what about the others?

I figured it makes sense to parallelize as much as possible so that
the HTTP/MPI load can be shared among as many computers as possible.
Let me know your thoughts.

Thanks,
Vivek



Re: Recommendation for Beowulf/Apache Setup

2009-05-07 Thread Will Maier
Hi Vivek-

On Thu, May 07, 2009 at 09:36:17AM -0700, Vivek Ayer wrote:
 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running
 FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the
 12 nodes will run apache running LAMP-like services. The router will
 round-robin using hoststated for load-balancing.

There are some FreeBSD clusters out there (NCSA has one, IIRC), but
they're certainly not as common as Linux. If your users can run on
FreeBSD, you might as well use it. If their code is all Linuxy (and lots
of cluster and -- even more so -- grid code make silly assumptions like
that), you should give them a platform that they can easily use.

 However, they will serve an additional task: The master mysql server
 will be head node for MPI jobs delivered to the 12 nodes. Basically,
 this setup will double up as a beowulf and web server. Is this
 efficient? I imagine the MPI jobs won't be running all the time and
 while they're up, might as well do something.

This might work. But you're setting yourself up for contention and
degraded service to at least one set of users. Do the people who care
about performance of your LAMP stack mind waiting a bit while MPI jobs
chew memory and network bandwidth? Do your MPI users mind if their jobs
take longer to complete while your LAMP stuff is getting pounded?

With regard to MPI, what sort of interconnects will your execute nodes
have? MPI wants lots of bandwidth between nodes and regular gigabit
might not cut it (depending on your users' applications).
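
A back-of-the-envelope sketch of why the interconnect matters: the time
to move one message is roughly the link latency plus the message size
divided by the bandwidth. The bandwidth and latency figures below are
assumed round numbers for illustration, not measurements of any
particular hardware.

```python
def transfer_time(message_bytes, bandwidth_bits_per_s, latency_s):
    """Estimate seconds to deliver one message: latency + size / bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


# (bandwidth in bit/s, one-way latency in seconds); both values are assumptions.
links = {
    "gigabit ethernet (assumed)": (1e9, 50e-6),
    "faster interconnect (assumed)": (10e9, 5e-6),
}

for size in (1_000, 1_000_000):
    for name, (bandwidth, latency) in links.items():
        t = transfer_time(size, bandwidth, latency)
        print(f"{size:>9} B over {name}: {t * 1e6:.1f} us")
```

Small messages are dominated by latency and large ones by bandwidth, so
which link is good enough depends on the communication pattern of the
users' applications.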

-- 

o--{ Will Maier }--o
| web:...http://www.lfod.us/ | email.willma...@ml1.net |
*-[ BSD: Live Free or Die ]*



Re: Recommendation for Beowulf/Apache Setup

2009-05-07 Thread Vivek Ayer
I was going to start small given the budget I have. Eventually, I'd
like to dedicate a gigabit switch for HTTP traffic and Infiniband for
compute traffic. At first, I don't expect too much MPI work to be
done, but I've heard that FreeBSD performs better under duress than
Linux as the number of HTTP threads increases.

Knowing that Beowulf stuff is done better on Linux, another option
would be to run FreeBSD inside of Xen for HTTP, while Linux does the
computing.

How good is FreeBSD for clustering? I'm not really familiar with
FreeBSD for that use, and there isn't a lot of documentation on
building Beowulf clusters with FreeBSD.

The final option would be to divide and conquer: 6 for HTTP, 6 for
computing, but my reasoning is: why not scale for HTTP as much as
possible?

In this setup, HTTP would be the primary deal, which is why I went to
FreeBSD first. Do Open MPI or MPICH2 run well under FreeBSD? I got a
build working on OpenBSD/sparc64, but haven't really done much with it
yet.

Thanks for the help,
Vivek




[OT] Re: Recommendation for Beowulf/Apache Setup

2009-05-07 Thread Will Maier
Hi Vivek-

This has gone decidedly off topic...

On Thu, May 07, 2009 at 12:05:35PM -0700, Vivek Ayer wrote:
 I was going to start small given the budget I have. Eventually, I'd
 like to dedicate a gigabit switch for HTTP traffic and Infiniband for
 compute traffic. At first, I don't expect too much MPI work to be
 done, but I've heard that FreeBSD performs better under duress than
 Linux as the number of HTTP threads increases.
[...]
 The final option would be to divide and conquer: 6 for HTTP, 6 for
 computing, but my reasoning is why not scale for HTTP as much as
 possible.

This is really the only reasonable approach. No one would run a
production web service on top of a parallel computing cluster unless
they had to. Remember that your execute nodes will run random jobs from
random users -- do you want that on a box that hosts a critical database
or webserver? The scenario is worse if you participate on a grid.

As always, use the best tool for the job. As you've noticed, OpenBSD
will do well managing your network. Frankly, in most cases it also
makes for an excellent database or webserver. As for the execute nodes,
run Linux on them unless you have some reason (user requirements,
demonstrated performance gains, etc) to do otherwise.

-- 

o--{ Will Maier }--o
| web:...http://www.lfod.us/ | email.willma...@ml1.net |
*-[ BSD: Live Free or Die ]*



Re: [OT] Re: Recommendation for Beowulf/Apache Setup

2009-05-07 Thread Vivek Ayer
OpenBSD does a good job with web serving. I have two Sun Blades that
run OpenBSD/sparc64. But do you really think it matches up with
FreeBSD? I know my router will be OpenBSD (that's a given), but I'm
not sure how well OpenBSD performs under many threads. I guess it comes
down to how much RAM you have in the end, right?

Vivek




Re: Recommendation for Beowulf/Apache Setup

2009-05-07 Thread James Peltier
--- On Thu, 5/7/09, Vivek Ayer vivek.a...@gmail.com wrote:

 From: Vivek Ayer vivek.a...@gmail.com
 Subject: Recommendation for Beowulf/Apache Setup
 To: misc misc@openbsd.org
 Received: Thursday, May 7, 2009, 12:36 PM
 Hey guys,

 This is a very general question, but I'm not exactly sure how to
 proceed. I'll be getting a lot of hardware soon to be clustered and I
 was wondering what was your take on the setup.

 My setup was going to be:

 1 OpenBSD Router running 4.5 routing to a subnet of 13 nodes running
 FreeBSD 7.2. Of the 13 nodes, 1 node is a master mysql server and the
 12 nodes will run apache running LAMP-like services. The router will
 round-robin using hoststated for load-balancing.

hoststated? What is that?  I think you mean relayd! ;)
 
 However, they will serve an additional task: The master mysql server
 will be head node for MPI jobs delivered to the 12 nodes. Basically,
 this setup will double up as a beowulf and web server. Is this
 efficient? I imagine the MPI jobs won't be running all the time and
 while they're up, might as well do something.

I think you are going to be heading for a world of hurt here.  I am the HPC
director at a university supporting 3 faculties.  Once people begin to use the
resource they *will* crash nodes.  Having any critical services running on HPC
compute nodes is *not advisable*.

 Firstly, would you recommend BSD or Linux for this. The router is a
 given to have OpenBSD of course, but what about the others?

OS doesn't matter!  It's all about the tools.  We use GNU/Linux (CentOS 5) for 
our HPC cluster because there are more tools available natively for it.  This 
is an unfortunate fact.  More and more applications out there are becoming 
GNU/Linux specific and just don't work properly or at all on other OSs.  
Evaluate your tools and make a decision.  AFAIK, Open-MPI, MPICH and MPICH2 
compile and run fine on the BSDs.  Other tools and libs, well, YMMV.

 I figured it makes sense to parallelize as much as possible so that
 the HTTP/MPI load can be shared among as many computers as possible.
 Let me know your thoughts.

Unless you have hard memory and CPU provisioning limiting what the cluster
nodes can do, a la Xen/VMware, forget about it.  Trust me.  I've rebooted
enough deadlocked/crashed nodes due to user error to know better. If you have
to... well... NO CARRIER...
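
As a much weaker stand-in for the hypervisor-level provisioning described
above, the sketch below caps a single job's address space and CPU time
with POSIX resource limits before it starts. This is a different
technique from Xen/VMware partitioning: it limits one process, not a
whole guest. It assumes a Unix where `RLIMIT_AS` is available (Linux, for
example), and the helper name, limit values, and "runaway job" are made
up for illustration.

```python
import resource
import subprocess
import sys


def run_limited(argv, max_bytes, max_cpu_seconds):
    """Run argv with hard caps on address space and CPU time; return exit status."""

    def apply_limits():
        # Runs in the child just before exec, so only the job is limited.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_seconds, max_cpu_seconds))

    result = subprocess.run(argv, preexec_fn=apply_limits, stderr=subprocess.DEVNULL)
    return result.returncode


# Hypothetical runaway job: tries to allocate about 4 GiB in one go.
hog = [sys.executable, "-c", "x = bytearray(4 * 1024**3)"]
rc = run_limited(hog, max_bytes=1024**3, max_cpu_seconds=10)
print("runaway job exit status:", rc)
```

With the 1 GiB cap the allocation is expected to fail inside the job, so
the job exits with an error instead of exhausting the node's memory.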