> I have been asked to recommend a small compute cluster for a small
> software firm (they support the R statistics language) and am hoping
> that some of the experts on these lists can help me with ideas and let
> me know of pitfalls to avoid. The basic requirements are:
>
> 1. 5 nodes. Would like to have the option to expand the number of
> nodes if not too costly.

It'll really depend on what they consider... costly. 5 nodes is not a
terribly large number of systems, and any maintenance will likely impact
the cluster heavily. Do they have uptime SLAs?

> 2. each node needs 500GB local disk space, 4 cores, 2.5GHz or better,
> 8GB ram.

That's fairly easy to find these days. They should identify whether the
500GB is for scratch space only, or for the OS & application installs as
well. This leads into the shared storage question... and the uptime SLAs.

With 5 servers, any problem will likely cause an outage. How much do they
want to pay to prevent outages? Spare power supplies, etc. can get
costly, and might not be worth it when you have so few machines,
especially if the job runs are short.

(This brings to mind the nice bits of statistics: if a machine has a
hardware fault once every 5000 days on average, you've got a very small
chance of it hitting with 1 machine. With 5, you're still pretty safe, w/
one fault every 1000 days. With 500, it's one every 10 days. So how much
you pay for redundancy in a small environment is worth thinking about;
see the quick sketch below.)
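If you want to play with those numbers, here's a quick back-of-the-
envelope sketch in Python. The 5000-day per-node figure is the one from
above, and treating faults as independent Poisson events is my own
simplifying assumption, not a measured model:

    import math

    # Assumed mean time between faults for a single node, in days
    # (illustrative figure from the discussion above, not measured).
    per_node_mtbf_days = 5000.0

    for nodes in (1, 5, 500):
        # With independent faults, the expected time between faults
        # across the whole cluster scales as MTBF / N.
        cluster_mtbf = per_node_mtbf_days / nodes
        # Chance of at least one fault in a 30-day month, treating
        # faults as a Poisson process.
        p_month = 1 - math.exp(-30 / cluster_mtbf)
        print(f"{nodes:4d} nodes: one fault every {cluster_mtbf:6.0f} "
              f"days (~{p_month:.1%} chance in any month)")

Run that and 500 nodes gives roughly a 95% chance of a fault in any given
month, while 5 nodes sits around 3%.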
> 3. need to have cluster be dual boot between windows HPC and linux
> RHEL5.

How much automation do they expect? If they're not sure which OS they'll
be running, I'm guessing they haven't settled on things, and are likely
to be doing some things by hand, like switching OSes. I also suspect they
won't be switching between OSes all that often.

Multi-OS will make shared storage interesting. A SAN can be very
expensive; NAS can be NFS or CIFS, but slower. iSCSI might be a good
option in a small environment, w/ a dedicated NIC, since you'll be able
to get a decently sized switch & use VLANs if you want. (Keep track of
the backplane speed, though. Make sure it can deal w/ the traffic you
expect to throw at it.)

> 4. a controlling node that can be switched between linux and windows to
> control the cluster. Perhaps run a hypervisor on this and have both
> controllers running at the same time?

Not a bad option.

As for power: you should be able to go under 2 amps a node. How much this
matters depends on the number of nodes, and on whether you're hosting it
in the office DC or a colo. From the size of this one, it's probably
going to be local; rough numbers below.
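To put rough numbers on that, here's a sketch. The 2 A per node is the
figure above; the 120 V office circuit is my assumption, so measure your
actual gear before provisioning:

    # Rough power sizing. Both inputs are illustrative assumptions.
    volts = 120.0        # assumed US office circuit
    amps_per_node = 2.0  # the "under 2 amps a node" figure above
    nodes = 5

    total_amps = amps_per_node * nodes
    total_watts = total_amps * volts
    print(f"{nodes} nodes: ~{total_amps:.0f} A, ~{total_watts:.0f} W")
    # -> 5 nodes: ~10 A, ~1200 W

So five nodes fit on a single 15 A / 120 V circuit, though the usual
practice is to derate to 80% (12 A continuous), which leaves little
headroom for the head node and switch.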
We try to squeeze out every drop of efficiency, and have enjoyed using
SGI / Rackable for that. As someone else mentioned, Silicon Mechanics is
good people, and we've worked w/ them happily over the years. They've
also offered a LOPSA discount for members over the years.

It doesn't sound like it will be a huge issue in this case, but I believe
that for larger clusters it's good practice to have a separate mgmt. IP /
NIC. You really have to know your workload, though, and it doesn't sound
like they're as worried about getting every dribble of performance out of
the systems as a much larger cluster would be.

In a short sentence: keep it sane. There are going to be missteps; don't
worry about them. Don't go crazy spending money on things you may not
ever need, but get enough to keep things flexible and able to adapt.

Matthew