Hi,

OK, we have made significant progress, I think. If I am on the manager and 
telnet to localhost 8649, I get XML data, and if I am on my node I get 
local XML data. I have turned pfilter off on both and still have no luck 
when I try from one to the other.
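
The telnet check can also be scripted so that it is easy to repeat from
each machine against every other one. This is only a sketch, assuming
Python 3 is available on the nodes. The function names are made up for
illustration, and reading a NAME attribute from HOST elements is an
assumption about gmond's XML that goes beyond the DTD fragment quoted
further down.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read everything sent on the XML port until the peer closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes):
    """Return the NAME attribute (assumed) of every HOST element in the dump."""
    if not xml_bytes.strip():
        # The connection opened and closed without sending any data.
        return []
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME") for host in root.iter("HOST")]
```

Calling list_hosts(fetch_gmond_xml("bambino1")) from the manager, and the
reverse from the node, would show which hosts each gmond reports. An empty
list means the connection was accepted and then closed with no XML.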

I restart gmond on both and then refresh my ganglia page. Am I leaving 
something out?

Would an upgrade of Ganglia be the best way to go? It might fix whatever is 
wrong even if I never find out what it was. I would like to figure this 
out, though.

Could you give me a layman's explanation of how the multicast channel 
relates to the IPs I chose for my cluster (192.168.1.[1-21])? Could this 
be a problem? I just left it on the default.
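
To frame the multicast question, here is a small sketch. It assumes
Python 3 and a /24 netmask for the cluster (the netmask is not stated
anywhere in this thread), and the helper names are hypothetical. It checks
that 239.2.11.71 is a multicast group address lying outside the
192.168.1.x subnet, and it builds the group membership request, where the
local interface address, not the channel, is what ties multicast traffic
to a particular NIC.

```python
import ipaddress
import socket

MCAST_CHANNEL = "239.2.11.71"  # mcast_channel from the gmond.conf quoted below
MCAST_PORT = 8649              # mcast_port from the same file


def channel_facts(channel, subnet):
    """Report whether the channel is multicast and whether it lies in the subnet."""
    addr = ipaddress.ip_address(channel)
    net = ipaddress.ip_network(subnet)
    return {"is_multicast": addr.is_multicast,
            "inside_cluster_subnet": addr in net}


def membership_request(group, iface_ip):
    """Pack an ip_mreq: 4-byte group address, then 4-byte local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def wait_for_packet(iface_ip, group=MCAST_CHANNEL, port=MCAST_PORT, timeout=30.0):
    """Join the group on one interface; return the first sender's IP, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, iface_ip))
        sock.settimeout(timeout)
        _data, sender = sock.recvfrom(65535)
        return sender[0]
    except socket.timeout:
        return None
    finally:
        sock.close()
```

Running wait_for_packet("192.168.1.1") on the manager, with the address of
the cluster-facing NIC, would show whether packets from another node
arrive on that interface at all. Whether the port can be shared with a
running gmond is not something this sketch guarantees, so gmond may need
to be stopped on that machine first.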

Mark




On Thu, 5 Sep 2002, Joe Griffin wrote:

> Mark,
> 
> > thanks for the help. Unfortunately my telnet issues have not improved.
> > I created a gmond.conf file in /etc which has all the suggested  inputs. 
> > And restarted the service. No luck.
> 
> Don't thank me until it works :-)
> 
> I did not see an answer to this: can you telnet to port 8649 on yourself?
> 
> Which version of Ganglia are you running?  I
> have:
> 
> virtue:82) rpm -q ganglia-monitor-core
> ganglia-monitor-core-2.4.1-1
> 
> 
> I believe older versions required the mcast_if to be
> set in /etc/init.d/gmond:
> 
>     daemon $GMOND  --mcast_if=eth0
> 
> 
> 
> > I put the same file on bambino1 one and tried after restarting the 
> > service and no luck. I get the same negative response when I try to 
> > telnet to 8649 on any of the machines from any other one.
> > 
> > I am getting what I consider anomalous behaviour when I start and stop the 
> > service - see below:
> > 
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond:                                    [  OK  ]
> > root@qgp3:/etc>service gmond stop
> > Shutting down GANGLIA gmond: /etc/init.d/gmond: kill: (7394) - No such process
> > /etc/init.d/gmond: kill: (7393) - No such process
> > /etc/init.d/gmond: kill: (7392) - No such process
> > /etc/init.d/gmond: kill: (7391) - No such process
> > /etc/init.d/gmond: kill: (7390) - No such process
> > /etc/init.d/gmond: kill: (7389) - No such process
> > /etc/init.d/gmond: kill: (7388) - No such process
> > /etc/init.d/gmond: kill: (7387) - No such process
> > /etc/init.d/gmond: kill: (7386) - No such process
> >                                                            [  OK  ]
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond:                                    [  OK  ]
> > root@qgp3:/etc>service gmond stop
> > Shutting down GANGLIA gmond:                               [  OK  ]
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond:                                    [  OK  ]
> > root@qgp3:/etc>
> 
> 
> Huh?
> 
> Are you saying that sometimes you start/stop and get
> an error, and sometimes you start/stop and do not
> get the error?
> 
> You might try turning up the "debug."  Perhaps that
> will give you more information:
> 
> # Run gmond in "debug" mode.  Gmond will not background.  Debug messages
> # are sent to stdout.  Value from 0-100.  The higher the number the more
> # detailed debugging information will be sent.
> # default: 0
> # debug_level 10
> 
> Another possibility (but I think it's a long
> shot) is to set a static route for the ganglia multicast channel:
> 
> route add -host 239.2.11.71 dev eth0
> 
> Joe
> 
> 
> > 
> > Another thing that worries me is that the gmond -help mentions nothing 
> > about config files?
> > 
> > Any other things I might try?
> > 
> > I have the following in my gmond.conf file:
> > 
> >  setuid nobody
> >  all_trusted on
> >  mcast_channel 239.2.11.71
> >  mcast_port 8649
> >  mcast_ttl 1
> >  mcast_threads 2
> >  xml_port 8649
> >  num_nodes 22
> >  xml_threads 2
> > 
> > Thanks, 
> > 
> > Mark
> > 
> > 
> > On Thu, 5 Sep 2002, Joe Griffin wrote:
> > 
> > 
> >>Hi Mark,
> >>
> >>I have four comments:
> >>
> >>1. Is gmond running on bambino1 as well as on the headnode?
> >>    The "telnet bambino1 8649" should produce output like:
> >>
> >>    virtue:81) telnet msc1 8649
> >>Trying 192.168.3.21...
> >>Connected to msc1.
> >>Escape character is '^]'.
> >><?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
> >><!DOCTYPE GANGLIA_XML [
> >>    <!ELEMENT GANGLIA_XML (CLUSTER)+>
> >>    <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
> >>                          SOURCE  CDATA #REQUIRED>
> >>    <!ELEMENT CLUSTER (HOST)+>
> >>    <!ATTLIST CLUSTER NAME  CDATA #REQUIRED
> >>                      LOCALTIME CDATA #REQUIRED>
> >>
> >>     ... lines deleted ...
> >>
> >>2. If you are logged on the headnode, can you "telnet $HEADNODE 8649"?
> >>
> >>3. If you are on bambino1, can you "telnet bambino1 8649"?
> >>
> >>4. You mentioned "gmond -ieth1".  I assume eth1 is the NIC
> >>    connecting to your cluster.  If so, have you put the
> >>    following in /etc/gmond.conf:
> >>
> >>     mcast_if  eth1
> >>
> >>     Then restart the daemons:
> >>
> >>    /etc/init.d/gmond stop
> >>    /etc/init.d/gmond start
> >>
> >>
> >>You should be able to telnet to your headnode (my #2) with
> >>the 8649 and see all the attached nodes.  If you cannot
> >>it is either because the compute nodes are NOT running
> >>gmond (my #1) or the gmond on the headnode can't see the
> >>gmond on the compute nodes (my #4).  Trying to do the
> >>telnet from bambino1 will let you know if gmond is
> >>running correctly on it.
> >>
> >>
> >>Regards,
> >>Joe Griffin
> >>MSC.Software
> >>
> >>
> >>
> >>Mark Horner wrote:
> >>
> >>>Hi,
> >>>
> >>>Ganglia only shows my head node.
> >>>
> >>>I am using oscar 1.4b4 on RH 7.3. I have checked that gmond is running on my nodes 
> >>>and on the manager - I have tried gmond -ieth1 to no avail.
> >>>
> >>>In a previous posting someone suggested telnetting to port 8649 and that a 
> >>>stream of xml data should be visible - this isn't the case :
> >>>
> >>>
> >>>
> >>>>telnet bambino1 8649
> >>>
> >>>Trying 192.168.1.2...
> >>>Connected to bambino1.phy.uct.ac.za (192.168.1.2).
> >>>Escape character is '^]'.
> >>>Connection closed by foreign host.
> >>>
> >>>Any suggestions - could it be a firewall issue?
> >>>
> >>
> >>
> >>
> > 
> 
> 
> 

-- 
Mark Horner

Physics Department
University of Cape Town
Rondebosch
7700
South Africa

Phone: +27 21 650 3366 (office)
Phone: +27 83 564 6272 (cellular)
Fax:   +27 21 650 3342



_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
