From: [EMAIL PROTECTED] on behalf of Dr. David F. Robinson
Sent: Wed 19/01/2005 9:47 PM
To: Michael Edwards
Cc: Bernard Li; [EMAIL PROTECTED]; oscar-users@lists.sourceforge.net; OSCAR-DEVEL
Subject: Re: [Oscar-devel] RE: [Oscar-users] ganglia

The nodes are "awake". I have jobs running on them. And, I can log onto
the nodes and there aren't any error messages. Torque sees the nodes as
alive and submits jobs without any problems.

David

> It almost sounds like they are going into a sleep cycle. I get the
> impression that ganglia runs off of a heartbeat on the client nodes
> and so if they went into a power save state they would probably show
> as down. Outside query would probably wake them up though...
>
> Might check and see if those BIOS are set the same as the others (or
> if there is a related bug). Could explain why just some of them are
> doing it too (especially if the nodes were purchased in phases).
>
> Just a thought, probably a silly one.
>
>
> On Wed, 19 Jan 2005 07:53:23 -0800, Bernard Li <[EMAIL PROTECTED]> wrote:
>> Hey David:
>>
>> Are the downed nodes always the same or are they sort of random?
>>
>> Can you check /var/log/messages on the nodes and see if there are any
>> clues to why Ganglia is reporting them as down?
>>
>> Cheers,
>>
>> Bernard
>>
>> > -----Original Message-----
>> > From: [EMAIL PROTECTED]
>> > [mailto:[EMAIL PROTECTED]] On Behalf Of
>> > Dr. David F. Robinson
>> > Sent: Saturday, January 15, 2005 7:04
>> > To: oscar-users@lists.sourceforge.net;
>> > oscar-devel@lists.sourceforge.net
>> > Subject: [Oscar-users] ganglia
>> >
>> >
>> > Ganglia is reporting nodes 121-140 of my 140 node system as
>> > down. If I do a
>> >
>> > cexec '/etc/init.d/gmond restart' all of the nodes show up as
>> > available.
>> > However, after an hour or two these nodes go back to a 'down' state.
>> >
>> > They do not show up under a 'pbsnodes -l' command and they
>> > are working fine.
>> > I can submit and run jobs on these nodes.
>> >
>> > Any suggestions?
>> >
>> > Thanks in advance,
>> >
>> > David
========================================
David F. Robinson, Ph.D.
Corvid Technologies
149 Plantation Ridge Rd.
Suite 170
Mooresville, NC 28117
704-799-6944 (Voice)
704-799-7974 (Fax)
704-252-1310 (Cell)
-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users