The nodes are "awake". I have jobs running on them, and I can log onto the nodes without seeing any error messages. Torque sees the nodes as alive and submits jobs without any problems.
David

> It almost sounds like they are going into a sleep cycle. I get the
> impression that ganglia runs off of a heartbeat on the client nodes
> and so if they went into a power save state they would probably show
> as down. Outside query would probably wake them up though...
>
> Might check and see if those BIOS are set the same as the others (or
> if there is a related bug). Could explain why just some of them are
> doing it too (especially if the nodes were purchased in phases).
>
> Just a thought, probably a silly one.
>
>
> On Wed, 19 Jan 2005 07:53:23 -0800, Bernard Li <[EMAIL PROTECTED]> wrote:
>> Hey David:
>>
>> Are the downed nodes always the same or are they sort of random?
>>
>> Can you check /var/log/messages on the nodes and see if there are any
>> clues to why Ganglia is reporting them as down?
>>
>> Cheers,
>>
>> Bernard
>>
>> > -----Original Message-----
>> > From: [EMAIL PROTECTED]
>> > [mailto:[EMAIL PROTECTED] On Behalf Of
>> > Dr. David F. Robinson
>> > Sent: Saturday, January 15, 2005 7:04
>> > To: [email protected];
>> > [email protected]
>> > Subject: [Oscar-users] ganglia
>> >
>> >
>> > Ganglia is reporting nodes 121-140 of my 140 node system as
>> > down. If I do a
>> >
>> > cexec '/etc/init.d/gmond restart' all of the nodes show up as
>> > available.
>> > However, after an hour or two these nodes go back to a 'down' state.
>> >
>> > They do not show up under a 'pbsnodes -l' command and they
>> > are working fine.
>> > I can submit and run jobs on these nodes.
>> >
>> > Any suggestions?
>> >
>> > Thanks in advance,
>> >
>> > David

========================================
David F. Robinson, Ph.D.
Corvid Technologies
149 Plantation Ridge Rd.
Suite 170
Mooresville, NC 28117
704-799-6944 (Voice)
704-799-7974 (Fax)
704-252-1310 (Cell)

_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
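
One way to follow up on the question of why Ganglia reports these nodes as down, while Torque and a login both say they are fine, is to ask gmond directly how long ago it last heard from each host. The sketch below is only an illustration and rests on assumptions not stated in this thread: that gmond is answering TCP connections on its default port 8649 with an XML dump, that each HOST element carries TN (seconds since last report) and TMAX attributes, and that "down" roughly corresponds to TN exceeding four times TMAX. The function names and the sample data are made up for the example.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends to a TCP client before closing.

    The host and port are assumptions (8649 is gmond's usual default).
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_hosts(xml_data, factor=4):
    """Return sorted names of hosts whose TN exceeds factor * TMAX.

    The factor of 4 is an assumed approximation of how the web
    frontend decides that a host is down.
    """
    root = ET.fromstring(xml_data)
    stale = []
    for host in root.iter("HOST"):
        tn = float(host.get("TN", "0"))
        tmax = float(host.get("TMAX", "0"))
        if tmax > 0 and tn > factor * tmax:
            stale.append(host.get("NAME"))
    return sorted(stale)


# Hypothetical sample data, not output captured from the cluster above.
SAMPLE = """<GANGLIA_XML SOURCE="gmond">
<CLUSTER NAME="example">
<HOST NAME="node120" TN="12" TMAX="20"/>
<HOST NAME="node121" TN="3600" TMAX="20"/>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    # Against a live daemon: stale_hosts(fetch_gmond_xml("headnode"))
    print(stale_hosts(SAMPLE))
```

If the same hosts turn up as stale when the dump is fetched from the head node but look current when fetched from one of the affected nodes, that would point at the metric traffic between nodes rather than at the nodes themselves.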
