Title: Re: [Oscar-devel] RE: [Oscar-users] ganglia
Hi David:
 
Ganglia depends on multicast, so it is possible that your network switch(es) limit how much multicast traffic they can handle. Perhaps you can check that?
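
One rough way to check this from the head node is to join the multicast group yourself and see which nodes' packets actually arrive. The sketch below assumes gmond is using its stock multicast channel (239.2.11.71, UDP port 8649); if your gmond.conf sets a different mcast_join or port, change the constants. The helper names and the node addresses in the usage note are made up for illustration.

```python
import socket
import struct
import time
from collections import Counter

# Assumed defaults for gmond's multicast channel; adjust to match gmond.conf.
GANGLIA_GROUP = "239.2.11.71"
GANGLIA_PORT = 8649


def make_membership_request(group, iface="0.0.0.0"):
    """Pack the ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def count_senders(group=GANGLIA_GROUP, port=GANGLIA_PORT, seconds=60.0):
    """Listen on the multicast group and count packets per source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    seen = Counter()
    deadline = time.monotonic() + seconds
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _payload, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            seen[addr] += 1
    finally:
        sock.close()
    return seen


def silent_nodes(expected, seen):
    """Return the expected addresses from which no packet was received."""
    return sorted(set(expected) - set(seen))


if __name__ == "__main__":
    heard = count_senders(seconds=60.0)
    for addr, packets in sorted(heard.items()):
        print(addr, packets)
```

If the nodes that Ganglia marks as down are also the ones missing from this list (for example, silent_nodes(["10.0.0.121", "10.0.0.122"], heard) with your own addresses), that would point at multicast delivery rather than at gmond itself.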
 
Cheers,
 
Bernard


From: [EMAIL PROTECTED] on behalf of Dr. David F. Robinson
Sent: Wed 19/01/2005 9:47 PM
To: Michael Edwards
Cc: Bernard Li; [EMAIL PROTECTED]; oscar-users@lists.sourceforge.net; OSCAR-DEVEL
Subject: Re: [Oscar-devel] RE: [Oscar-users] ganglia

The nodes are "awake".  I have jobs running on them.  And, I can log onto
the nodes and there aren't any error messages.  Torque sees the nodes as
alive and submits jobs without any problems.
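
To see what Ganglia itself believes about these nodes, one option is to read the XML that gmond serves over TCP and list the hosts whose last report is old. This is only a sketch under assumptions: it assumes gmond accepts TCP connections on port 8649 on the host you query, and it treats a host as stale when TN (seconds since the last report) exceeds 4 * TMAX, which is my understanding of how the web frontend decides a host is down. The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the full XML dump that gmond writes to a TCP client."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_hosts(xml_bytes, factor=4):
    """Return (name, TN) pairs for hosts with TN > factor * TMAX.

    The factor of 4 is an assumption about the frontend's rule,
    not something confirmed in this thread.
    """
    root = ET.fromstring(xml_bytes)
    stale = []
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "0"))
        if tmax and tn > factor * tmax:
            stale.append((host.get("NAME"), tn))
    return sorted(stale)


if __name__ == "__main__":
    for name, tn in stale_hosts(fetch_gmond_xml()):
        print(f"{name}: last heard {tn} s ago")
```

Running this against the head node and against one of the "down" nodes, then comparing the two lists, would show whether each gmond still hears its peers or whether only the collector has lost them.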

David



> It almost sounds like they are going into a sleep cycle.  I get the
> impression that ganglia runs off of a heartbeat on the client nodes
> and so if they went into a power save state they would probably show
> as down.  Outside query would probably wake them up though...
>
> Might check whether the BIOS power-management settings on those nodes
> match the others (or if there is a related bug).  That could explain
> why only some of them are doing it (especially if the nodes were
> purchased in phases).
>
> Just a thought, probably a silly one.
>
>
> On Wed, 19 Jan 2005 07:53:23 -0800, Bernard Li <[EMAIL PROTECTED]> wrote:
>> Hey David:
>>
>> Are the downed nodes always the same or are they sort of random?
>>
>> Can you check /var/log/messages on the nodes and see if there are any
>> clues to why Ganglia is reporting them as down?
>>
>> Cheers,
>>
>> Bernard
>>
>> > -----Original Message-----
>> > From: [EMAIL PROTECTED]
>> > [mailto:[EMAIL PROTECTED]] On Behalf Of
>> > Dr. David F. Robinson
>> > Sent: Saturday, January 15, 2005 7:04
>> > To: oscar-users@lists.sourceforge.net;
>> > oscar-devel@lists.sourceforge.net
>> > Subject: [Oscar-users] ganglia
>> >
>> >
>> > Ganglia is reporting nodes 121-140 of my 140 node system as down.
>> > If I do a
>> >
>> > cexec '/etc/init.d/gmond restart'
>> >
>> > all of the nodes show up as available.  However, after an hour or two
>> > these nodes go back to a 'down' state.
>> >
>> > They do not show up under a 'pbsnodes -l' command and they are
>> > working fine.  I can submit and run jobs on these nodes.
>> >
>> > Any suggestions?
>> >
>> > Thanks in advance,
>> >
>> > David
>> >
>> >
>> >
>> >
>> >
>> >
>> > -------------------------------------------------------
>> > The SF.Net email is sponsored by: Beat the post-holiday blues
>> > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>> > It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
>> > _______________________________________________
>> > Oscar-users mailing list
>> > Oscar-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/oscar-users
>> >
>>
>>
>


========================================
David F. Robinson, Ph.D.
Corvid Technologies
149 Plantation Ridge Rd.
Suite 170
Mooresville, NC 28117
704-799-6944 (Voice)
704-799-7974 (Fax)
704-252-1310 (Cell)


