Hi Gerald,

First of all, thanks for your time. Sorry, but I cannot give much information about the setup because of confidentiality issues. Regarding Red Hat support, the company has it and we have already opened a ticket with them. They are actually looking into the issue as of press time.
I get your point and I will look into your suggestions. As I mentioned in my previous email, I am just looking for opinions. :)

Thanks again.

On 10/14/07, Gerald Timothy Quimpo <[EMAIL PROTECTED]> wrote:
>
> On Sun, 2007-10-14 at 07:24 +0800, Michael Calizo wrote:
> > Its a HIGH Performance Computing (HPC). To elaborate the setup, we
> > have a thinanywhere setup for user to use and to submit jobs to those
> > computing nodes. The management node will then push the those jobs to
> > available computing nodes.
>
> I don't know much about that end of the linux performance spectrum,
> but I suspect you'll just have to provide more information. there
> is NO information in your posts that can help anyone help you.
>
> e.g., are you running something like openMosix or openSSI? or is it
> similar to beowulf? just what kind of HPC are you running? what
> distro is running? what kernel? how are tasks allocated? how does
> each node know to run something? does it poll a central location and
> pull jobs its supposed to run and run them? or does a central server
> directly send commands to it to run (e.g. similar to
> ssh [run-some-program-on-your-cpu]). are these jobs IO-bound or
> cpu bound? it's probably a mix, what is the mix? is disk shared among
> multiple servers? if yes, how is it shared?
>
> > Cron job is not an option because we can not just
> > kill/restart/powercycle those jobs/server on the compute node without
> > informing the job owner.
>
> do the servers ever die by themselves? e.g., OOM. i would think that
> it's always possible to kill a node at least via OOM. or do you have
> strong ulimit settings so that it's never possible to kill a node via
> OOM or some similar denial of service?
>
> > What I want is an opinion if it is safe to say that upgrade is needed
> > for those low-end computer node. This is actually a matter of how to
> > defend my case to the boss :)
>
> if your statistics are pretty good (e.g., we get N node failures a
> month, and of those N, 99% are on low-end nodes, or we've had M node
> failures in 3 years and of those 99% were on low-end nodes) then you
> can certainly safely show those stats to your boss. even 80% is
> probably high enough that he'll agree to upgrade all the low end
> boxes. if you're lower than around 80% (or, pick a number, any
> number higher than 50%) then you'll need to actually understand
> why those low end nodes are failing rather than making blanket
> statements that your statistics don't support.
>
> is there some big-vendor you can push that question to? if you're
> describing your setup accurately (and not just giving us big numbers
> so our eyes will grow big too), then i'm sure you've got some sort
> of expensive support agreement. Maybe, if your vendor is, e.g.,
> Redhat, you can get them to have Alan Cox look at your setup. Maybe
> you can pay people on this list (Ed Tongson? Fooler? maybe Ian Sison
> or Orly Andico [but maybe not, if they're very busy, unless they'd
> look at it for fun :-]) to look at the issue. Posting vague
> descriptions of the problem though is certainly not going to get you
> useful replies. You need to be specific. If there's too much that's
> confidential, then you're just going to have to pay someone good
> who will sign an NDA.
>
> tiger
>
>
> _________________________________________________
> Philippine Linux Users' Group (PLUG) Mailing List
> [email protected] (#PLUG @ irc.free.net.ph)
> Read the Guidelines: http://linux.org.ph/lists
> Searchable Archives: http://archives.free.net.ph
>

--
Mike Calizo
Registered Linux User # 365113
_________________________________________________
Even the longest journey has to start with a small first-step

