Hi Gerald, first of all, thanks for your time.

Sorry, but I cannot give much information about the setup because of
confidentiality issues. Regarding Red Hat support, the company has it and we
have already opened a ticket with them. They are looking into the issue as
of this writing.

I get your point and I will look into your suggestions. As I mentioned in my
previous email, I am just looking for opinions. :)

Thanks again.

On 10/14/07, Gerald Timothy Quimpo <[EMAIL PROTECTED]> wrote:
>
> On Sun, 2007-10-14 at 07:24 +0800, Michael Calizo wrote:
> > It's a High Performance Computing (HPC) setup.  To elaborate, we
> > have a thinanywhere setup for users to submit jobs to the
> > computing nodes. The management node then pushes those jobs to
> > available computing nodes.
>
> I don't know much about that end of the linux performance spectrum,
> but I suspect you'll just have to provide more information.  there
> is NO information in your posts that can help anyone help you.
>
> e.g., are you running something like openMosix or openSSI?  or is it
> similar to beowulf?  just what kind of HPC are you running?  what
> distro is running?  what kernel?  how are tasks allocated?  how does
> each node know to run something?  does it poll a central location and
> pull jobs it's supposed to run and run them?  or does a central server
> directly send commands to it to run (e.g. similar to
> ssh [run-some-program-on-your-cpu]).  are these jobs IO-bound or
> cpu bound?  it's probably a mix, what is the mix?  is disk shared among
> multiple servers?  if yes, how is it shared?
>
> > A cron job is not an option because we cannot just
> > kill/restart/powercycle those jobs/servers on the compute nodes without
> > informing the job owner.
>
> do the servers ever die by themselves?  e.g., OOM.  i would think that
> it's always possible to kill a node at least via OOM.  or do you have
> strong ulimit settings so that it's never possible to kill a node via
> OOM or some similar denial of service?
>
> > What I want is an opinion on whether it is safe to say that an upgrade
> > is needed for those low-end compute nodes. This is really a matter of
> > how to defend my case to the boss :)
>
> if your statistics are pretty good (e.g., we get N node failures a
> month, and of those N, 99% are on low-end nodes, or we've had M node
> failures in 3 years and of those 99% were on low-end nodes) then you
> can certainly safely show those stats to your boss.  even 80% is
> probably high enough that he'll agree to upgrade all the low end
> boxes.  if you're lower than around 80% (or, pick a number, any
> number higher than 50%) then you'll need to actually understand
> why those low end nodes are failing rather than making blanket
> statements that your statistics don't support.
>
> is there some big-vendor you can push that question to?  if you're
> describing your setup accurately (and not just giving us big numbers
> so our eyes will grow big too), then i'm sure you've got some sort
> of expensive support agreement.  Maybe, if your vendor is, e.g.,
> Redhat, you can get them to have Alan Cox look at your setup.  Maybe
> you can pay people on this list (Ed Tongson? Fooler? maybe Ian Sison
> or Orly Andico [but maybe not, if they're very busy, unless they'd
> look at it for fun :-]) to look at the issue.  Posting vague
> descriptions of the problem though is certainly not going to get you
> useful replies.  You need to be specific.  If there's too much that's
> confidential, then you're just going to have to pay someone good
> who will sign an NDA.
>
> tiger
>
>
> _________________________________________________
> Philippine Linux Users' Group (PLUG) Mailing List
> [email protected] (#PLUG @ irc.free.net.ph)
> Read the Guidelines: http://linux.org.ph/lists
> Searchable Archives: http://archives.free.net.ph
>



-- 
Mike Calizo
Registered Linux User # 365113

_________________________________________________
Even the longest journey has to start with a small first-step
