Hi, OK, we have made significant progress, I think. If I am on the manager and telnet to localhost 8649, I get XML data. And if I am on my node, I get local XML data. I have turned pfilter off on both and still have no luck when I try from one to the other.
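The telnet check described above can also be scripted, which makes it easier to repeat from each machine after every change. The following is only a minimal sketch: the function names `fetch_gmond_xml` and `describe` are made up for illustration, and the only thing taken from the thread is port 8649. It connects, reads until the other side closes, and reports whether anything came back.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to host:port, read until the peer closes, and return the text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


def describe(host, port=8649):
    """Summarise the three outcomes seen in the thread for one host."""
    try:
        xml = fetch_gmond_xml(host, port)
    except OSError as exc:  # refused, timed out, or name lookup failed
        return f"{host}: connection failed ({exc})"
    if not xml.strip():
        return f"{host}: connected, but closed without sending data"
    return f"{host}: received {len(xml)} characters of XML"
```

Calling `print(describe("localhost"))` and `print(describe("bambino1"))` on the manager, and the mirror-image pair on the node, would show at a glance which direction fails. The "closed without sending data" case corresponds to the "Connection closed by foreign host" seen with telnet.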
I restart gmond on both and then refresh my Ganglia page. Am I leaving something out? Would an upgrade of Ganglia be the best way to go? It might fix whatever is wrong, even if I never find out what it was. I would like to figure this out, though.

Could you give me a layman's explanation of how the multicast channel relates to the IPs I chose for my cluster (192.168.1.[1-21])? Could this be a problem? I just left it on default.

Mark

On Thu, 5 Sep 2002, Joe Griffin wrote:

> Mark,
>
> > thanks for the help. Unfortunately my telnet issues have not improved.
> > I created a gmond.conf file in /etc which has all the suggested inputs.
> > And restarted the service. No luck.
>
> Don't thank me until it works :-)
>
> I did not see. Can you telnet w/ the 8649 to yourself?
>
> Which version of Ganglia are you running? I
> have:
>
> virtue:82) rpm -q ganglia-monitor-core
> ganglia-monitor-core-2.4.1-1
>
>
> I believe older versions required the mcast_if to be
> set in /etc/init.d/gmond:
>
> daemon $GMOND --mcast_if=eth0
>
> > I put the same file on bambino1 and tried after restarting the
> > service and no luck. I get the same negative response when I try to
> > telnet to 8649 on any of the machines from any other one.
> >
> > I am getting what I consider anomalous behaviour when I start and stop the
> > service - see below:
> >
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond: [ OK ]
> > root@qgp3:/etc>service gmond stop
> > Shutting down GANGLIA gmond: /etc/init.d/gmond: kill: (7394) - No such process
> > /etc/init.d/gmond: kill: (7393) - No such process
> > /etc/init.d/gmond: kill: (7392) - No such process
> > /etc/init.d/gmond: kill: (7391) - No such process
> > /etc/init.d/gmond: kill: (7390) - No such process
> > /etc/init.d/gmond: kill: (7389) - No such process
> > /etc/init.d/gmond: kill: (7388) - No such process
> > /etc/init.d/gmond: kill: (7387) - No such process
> > /etc/init.d/gmond: kill: (7386) - No such process
> > [ OK ]
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond: [ OK ]
> > root@qgp3:/etc>service gmond stop
> > Shutting down GANGLIA gmond: [ OK ]
> > root@qgp3:/etc>service gmond start
> > Starting GANGLIA gmond: [ OK ]
> > root@qgp3:/etc>
>
> Huh?
>
> Are you saying that sometimes you start/stop and get
> an error, and sometimes you start/stop and do not
> get the error?
>
> You might try turning up the "debug." Perhaps that
> will give you more information:
>
> # Run gmond in "debug" mode. Gmond will not background. Debug messages
> # are sent to stdout. Value from 0-100. The higher the number the more
> # detailed debugging information will be sent.
> # default: 0
> # debug_level 10
>
> Another possibility (but I think it's a long
> shot) is to set a static route for the ganglia multicast channel:
>
> route add -host 239.2.11.71 dev eth0
>
> Joe
>
> > Another thing that worries me is that the gmond -help mentions nothing
> > about config files?
> >
> > Any other things I might try?
> >
> > I have the following in my gmond.conf file:
> >
> > setuid nobody
> > all_trusted on
> > mcast_channel 239.2.11.71
> > mcast_port 8649
> > mcast_ttl 1
> > mcast_threads 2
> > xml_port 8649
> > num_nodes 22
> > xml_threads 2
> >
> > Thanks,
> >
> > Mark
> >
> > On Thu, 5 Sep 2002, Joe Griffin wrote:
> >
> >>Hi Mark,
> >>
> >>I have four comments:
> >>
> >>1. Is gmond running on bambino1 as well as on the headnode?
> >>   The "telnet bambino1 8649" should produce output like:
> >>
> >>   virtue:81) telnet msc1 8649
> >>Trying 192.168.3.21...
> >>Connected to msc1.
> >>Escape character is '^]'.
> >><?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
> >><!DOCTYPE GANGLIA_XML [
> >>  <!ELEMENT GANGLIA_XML (CLUSTER)+>
> >>    <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
> >>                          SOURCE CDATA #REQUIRED>
> >>  <!ELEMENT CLUSTER (HOST)+>
> >>    <!ATTLIST CLUSTER NAME CDATA #REQUIRED
> >>                      LOCALTIME CDATA #REQUIRED>
> >>
> >>   ... lines deleted ...
> >>
> >>2. If you are logged on the headnode, can you "telnet $HEADNODE 8649"?
> >>
> >>3. If you are on bambino1, can you "telnet bambino1 8649"?
> >>
> >>4. You mentioned "gmond -ieth1". I assume eth1 is the NIC
> >>   connecting to your cluster. If so, have you put the
> >>   following in /etc/gmond.conf:
> >>
> >>   mcast_if eth1
> >>
> >>   Then restart the daemons:
> >>
> >>   /etc/init.d/gmond stop
> >>   /etc/init.d/gmond start
> >>
> >>
> >>You should be able to telnet to your headnode (my #2) with
> >>the 8649 and see all the attached nodes. If you cannot
> >>it is either because the compute nodes are NOT running
> >>gmond (my #1) or the gmond on the headnode can't see the
> >>gmond on the compute nodes (my #4). Trying to do the
> >>telnet from bambino1 will let you know if gmond is
> >>running correctly on it.
> >>
> >>
> >>Regards,
> >>Joe Griffin
> >>MSC.Software
> >>
> >>
> >>
> >>Mark Horner wrote:
> >>
> >>>Hi,
> >>>
> >>>Ganglia only shows my head node.
> >>>
> >>>I am using oscar 1.4b4 on RH 7.3.
> >>>I have checked that gmond is running on my nodes
> >>>and on the manager - I have tried gmond -ieth1 to no avail.
> >>>
> >>>In a previous posting someone suggested telnetting to port 8649 and that a
> >>>stream of xml data should be visible - this isn't the case:
> >>>
> >>>>telnet bambino1 8649
> >>>
> >>>Trying 192.168.1.2...
> >>>Connected to bambino1.phy.uct.ac.za (192.168.1.2).
> >>>Escape character is '^]'.
> >>>Connection closed by foreign host.
> >>>
> >>>Any suggestions - could it be a firewall issue?
> >>>
> >>
> >
>

--
Mark Horner
Physics Department
University of Cape Town
Rondebosch 7700
South Africa

Phone: +27 21 650 3366 (office)
Phone: +27 83 564 6272 (cellular)
Fax: +27 21 650 3342

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
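On the question of how the multicast channel relates to the cluster IPs: 239.2.11.71 is a multicast group address, not a host address, so it does not need to match the 192.168.1.x range. What matters is which network interface each machine uses to send to and join that group, which is what the mcast_if setting and the static route suggested in the thread both try to control. The sketch below is an illustration only, not part of Ganglia. It joins the group on a chosen local interface and waits for one packet. The group and port come from the gmond.conf quoted above; the function names, and 192.168.1.1 as the head node's cluster-side address, are assumptions.

```python
import ipaddress
import socket

GROUP = "239.2.11.71"  # mcast_channel from the gmond.conf quoted in the thread
PORT = 8649            # mcast_port from the same file


def is_multicast(addr):
    """True for group addresses (224.0.0.0/4), False for ordinary host IPs."""
    return ipaddress.ip_address(addr).is_multicast


def build_mreq(group, iface_ip):
    """Pack the membership request: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def listen_once(iface_ip, group=GROUP, port=PORT, timeout=30.0):
    """Join `group` on the interface owning `iface_ip`; return (sender, size) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))  # may fail if another process holds the port exclusively
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, iface_ip))
        sock.settimeout(timeout)
        data, (sender, _) = sock.recvfrom(65535)
        return sender, len(data)
    except socket.timeout:
        return None  # nothing arrived on this interface within the timeout
    finally:
        sock.close()
```

Running `listen_once("192.168.1.1")` on the head node (assuming that is its cluster-side address) while gmond runs on a compute node should return that node's IP if multicast is reaching the right interface. A result of `None` would point toward the interface or routing issue rather than toward the choice of cluster IPs.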
