Mark,

What output does 'gstat' return?
--tjn
 _________________________________________________________________________
  Thomas Naughton                                    [EMAIL PROTECTED]
  Research Associate                                 (865) 576-4184

On Thu, 5 Sep 2002, Mark Horner wrote:

> Hi,
>
> OK, we have made significant progress I think. If I am on the manager and
> telnet to localhost 8649 I get xml data. And if I am on my node I get
> local xml data. I have turned pfilter off on both and still no luck when I
> try from one to the other.
>
> I restart gmond on both and then refresh my ganglia page. Am I leaving
> something out?
>
> Would an upgrade of ganglia be the best way to go - might fix whatever is
> wrong even if I never find out what it was. I would like to figure this
> out though.
>
> Could you give me a layman's explanation of how the multicast channel
> relates to the IPs I chose for my cluster (192.168.1.[1-21])? Could this
> be a problem - I just left it on default?
>
> Mark
>
>
> On Thu, 5 Sep 2002, Joe Griffin wrote:
>
> > Mark,
> >
> > > thanks for the help. Unfortunately my telnet issues have not improved.
> > > I created a gmond.conf file in /etc which has all the suggested inputs.
> > > And restarted the service. No luck.
> >
> > Don't thank me until it works :-)
> >
> > I did not see. Can you telnet w/ the 8649 to yourself?
> >
> > Which version of Ganglia are you running? I
> > have:
> >
> > virtue:82) rpm -q ganglia-monitor-core
> > ganglia-monitor-core-2.4.1-1
> >
> >
> > I believe older versions required the mcast_if to be
> > set in /etc/init.d/gmond:
> >
> > daemon $GMOND --mcast_if=eth0
> >
> >
> > > I put the same file on bambino1 and tried after restarting the
> > > service and no luck. I get the same negative response when I try to
> > > telnet to 8649 on any of the machines from any other one.
> > >
> > > I am getting what I consider anomalous behaviour when I start and stop the
> > > service - see below:
> > >
> > > root@qgp3:/etc>service gmond start
> > > Starting GANGLIA gmond: [ OK ]
> > > root@qgp3:/etc>service gmond stop
> > > Shutting down GANGLIA gmond: /etc/init.d/gmond: kill: (7394) - No such process
> > > /etc/init.d/gmond: kill: (7393) - No such process
> > > /etc/init.d/gmond: kill: (7392) - No such process
> > > /etc/init.d/gmond: kill: (7391) - No such process
> > > /etc/init.d/gmond: kill: (7390) - No such process
> > > /etc/init.d/gmond: kill: (7389) - No such process
> > > /etc/init.d/gmond: kill: (7388) - No such process
> > > /etc/init.d/gmond: kill: (7387) - No such process
> > > /etc/init.d/gmond: kill: (7386) - No such process
> > > [ OK ]
> > > root@qgp3:/etc>service gmond start
> > > Starting GANGLIA gmond: [ OK ]
> > > root@qgp3:/etc>service gmond stop
> > > Shutting down GANGLIA gmond: [ OK ]
> > > root@qgp3:/etc>service gmond start
> > > Starting GANGLIA gmond: [ OK ]
> > > root@qgp3:/etc>
> > >
> > > Huh?
> >
> > Are you saying that sometimes you start/stop and get
> > an error, and sometimes you start/stop and do not
> > get the error?
> >
> > You might try turning up the "debug." Perhaps that
> > will give you more information:
> >
> > # Run gmond in "debug" mode. Gmond will not background. Debug messages
> > # are sent to stdout. Value from 0-100. The higher the number the more
> > # detailed debugging information will be sent.
> > # default: 0
> > # debug_level 10
> >
> > Another possibility (but I think it's a long
> > shot) is to set a static route for the ganglia multicast channel:
> >
> > route add -host 239.2.11.71 dev eth0
> >
> > Joe
> >
> >
> > # Run gmond in "debug" mode. Gmond will not background. Debug messages
> > # are sent to stdout. Value from 0-100. The higher the number the more
> > # detailed debugging information will be sent.
> > # default: 0
> > # debug_level 10
> >
> >
> > > Another thing that worries me is that the gmond -help mentions nothing
> > > about config files?
> > >
> > > Any other things I might try?
> > >
> > > I have the following in my gmond.conf file:
> > >
> > > setuid nobody
> > > all_trusted on
> > > mcast_channel 239.2.11.71
> > > mcast_port 8649
> > > mcast_ttl 1
> > > mcast_threads 2
> > > xml_port 8649
> > > num_nodes 22
> > > xml_threads 2
> > >
> > > Thanks,
> > >
> > > Mark
> > >
> > >
> > > On Thu, 5 Sep 2002, Joe Griffin wrote:
> > >
> > >
> > >>Hi Mark,
> > >>
> > >>I have four comments:
> > >>
> > >>1. Is gmond running on bambino1 as well as on the headnode?
> > >>   The "telnet bambino1 8649" should produce output like:
> > >>
> > >>   virtue:81) telnet msc1 8649
> > >>Trying 192.168.3.21...
> > >>Connected to msc1.
> > >>Escape character is '^]'.
> > >><?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
> > >><!DOCTYPE GANGLIA_XML [
> > >>   <!ELEMENT GANGLIA_XML (CLUSTER)+>
> > >>      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
> > >>                            SOURCE CDATA #REQUIRED>
> > >>   <!ELEMENT CLUSTER (HOST)+>
> > >>      <!ATTLIST CLUSTER NAME CDATA #REQUIRED
> > >>                        LOCALTIME CDATA #REQUIRED>
> > >>
> > >>   ... lines deleted ...
> > >>
> > >>2. If you are logged on the headnode, can you "telnet $HEADNODE 8649"?
> > >>
> > >>3. If you are on bambino1, can you "telnet bambino1 8649"?
> > >>
> > >>4. You mentioned "gmond -ieth1". I assume eth1 is the NIC
> > >>   connecting to your cluster. If so, have you put the
> > >>   following in /etc/gmond.conf:
> > >>
> > >>   mcast_if eth1
> > >>
> > >>   Then restart the daemons:
> > >>
> > >>   /etc/init.d/gmond stop
> > >>   /etc/init.d/gmond start
> > >>
> > >>
> > >>You should be able to telnet to your headnode (my #2) with
> > >>the 8649 and see all the attached nodes. If you cannot
> > >>it is either because the compute nodes are NOT running
> > >>gmond (my #1) or the gmond on the headnode can't see the
> > >>gmond on the compute nodes (my #4). Trying to do the
> > >>telnet from bambino1 will let you know if gmond is
> > >>running correctly on it.
> > >>
> > >>
> > >>Regards,
> > >>Joe Griffin
> > >>MSC.Software
> > >>
> > >>
> > >>Mark Horner wrote:
> > >>
> > >>>Hi,
> > >>>
> > >>>Ganglia only shows my head node.
> > >>>
> > >>>I am using oscar 1.4b4 on RH 7.3. I have checked that gmond is running on my nodes
> > >>>and on the manager - I have tried gmond -ieth1 to no avail.
> > >>>
> > >>>In a previous posting someone suggested telnetting to port 8649 and that a
> > >>>stream of xml data should be visible - this isn't the case:
> > >>>
> > >>>>telnet bambino1 8649
> > >>>
> > >>>Trying 192.168.1.2...
> > >>>Connected to bambino1.phy.uct.ac.za (192.168.1.2).
> > >>>Escape character is '^]'.
> > >>>Connection closed by foreign host.
> > >>>
> > >>>Any suggestions - could it be a firewall issue?
> > >>>
> > >>
> > >
>
> --
> Mark Horner
>
> Physics Department
> University of Cape Town
> Rondebosch
> 7700
> South Africa
>
> Phone: +27 21 650 3366 (office)
> Phone: +27 83 564 6272 (cellular)
> Fax: +27 21 650 3342
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: OSDN - Tired of that same old
> cell phone? Get a new here for FREE!
> https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
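
For anyone repeating the telnet test from this thread across several nodes, here is a minimal Python sketch of the same check. It connects to gmond's XML port (8649, as in the gmond.conf quoted above), reads until the daemon closes the connection, and lists the hosts it reports. The function names fetch_gmond_xml and host_names are made up for illustration, and the sketch assumes each HOST element carries a NAME attribute, which the DTD excerpt in the thread does not show (those lines were deleted).

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read everything the daemon sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes):
    """Return the NAME attribute of every HOST element in the XML dump.

    Assumption: HOST elements have a NAME attribute. An empty dump
    (connection closed with no data) yields an empty list.
    """
    if not xml_bytes.strip():
        return []
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME") for host in root.iter("HOST")]


# Example use (hostnames taken from the thread; adjust for your cluster):
#   print(host_names(fetch_gmond_xml("localhost")))
#   print(host_names(fetch_gmond_xml("bambino1")))
```

Running the two example calls from the head node reproduces the comparison discussed above. An empty list corresponds to the "Connection closed by foreign host" case where no XML is returned, and a list containing only the local machine suggests the daemons are not hearing each other on the multicast channel.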
