Bernard,
        It sounds like a similar thing.  When you restart the server all is
okay.  The problem seems to be with how non-blocking I/O is handled by
pbs_server.  The code is:

   while ( (wrc = write(fd, buf, count)) < 0){

       if (errno == EAGAIN) continue;

   };

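and when there is a real error (any errno other than EAGAIN), write()
keeps returning -1, so it can never break out of the loop!  That would
explain pbs_server spinning at 100% CPU.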
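
For illustration only, here is a minimal sketch of how such a loop could
bail out on a real error.  This is not the OSC patch and not the actual
pbs_server code; the helper name write_all and the use of poll() to wait
for the descriptor are my own assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from pbs_server): write all of buf to fd.
 * Returns 0 on success, -1 on a real error with errno left set. */
static int write_all(int fd, const char *buf, size_t count)
{
    assert(buf != NULL || count == 0);

    while (count > 0) {
        ssize_t wrc = write(fd, buf, count);

        if (wrc >= 0) {
            /* Partial writes are normal on non-blocking descriptors. */
            buf += wrc;
            count -= (size_t)wrc;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */

        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Sleep until fd is writable instead of spinning. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                  /* real error (e.g. EBADF): give up */
    }
    return 0;
}
```

The caller would then have to check the return value and drop the
connection to the dead node; that part depends on the rest of the
server and is not shown.  Note that a write to a closed socket or pipe
can also raise SIGPIPE unless the process ignores it.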

        There do seem to be a few patches around to fix it (e.g. on the OSC
site), but I'm wary of making this change, as the OSC patch implies
there are a number of other things that should be done as well.

Frank

On Thu, 2003-12-11 at 11:49, Bernard Li wrote:
> Hi Frank:
> 
> I have had issues when I bring UP nodes and the server just goes crazy :-)
> 
> I suppose when you restart the server it is okay?
> 
> I think the problem is when a node goes down and PBS tries to schedule a 
> job to the node, it just gets stuck...
> 
> I have tested SGE and am quite happy with it, and in the near future 
> I will be creating an OSCAR package for it.  We should also provide 
> support for multiple batch systems so that users can choose which to use.
> 
> Cheers,
> 
> Bernard
> 
> Frank Crawford wrote:
> 
> > Folks,
> >     Has anyone else seen problems with pbs_server using 100% of a CPU when
> > one of the compute nodes goes down (or is taken offline)?  I think this
> > is a problem with the cplantFR patch (as suggested by a couple of mail
> > items on the OpenPBS mailing list), but want to check with others first
> > before going down the wrong track.
> > 
> >     BTW, I've seen this in the OpenPBS RPM found in OSCAR 2.3, but given
> > the slow rate of change for OpenPBS, I don't expect it to be different
> > in others either.
> > 
> > Thanks
> > Frank
> > 
-- 
ac3
Suite G16, Bay 7, Locomotive Workshop   Phone:  02 9209 4600
Australian Technology Park              Fax:    02 9209 4611
Eveleigh   NSW   1430


