<x-flowed>
Mark,
thanks for the help. Unfortunately my telnet issues have not improved.
I created a gmond.conf file in /etc which has all the suggested inputs.
And restarted the service. No luck.
Don't thank me until it works :-)
I did not see an answer to this: can you telnet to port 8649 on the
machine you are logged in to?
Which version of Ganglia are you running? I have:
virtue:82) rpm -q ganglia-monitor-core
ganglia-monitor-core-2.4.1-1
I believe older versions required the mcast_if to be
set in /etc/init.d/gmond:
daemon $GMOND --mcast_if=eth0
I put the same file on bambino1 and tried after restarting the
service and no luck. I get the same negative response when I try to
telnet to 8649 on any of the machines from any other one.
I am getting what I consider anomalous behaviour when I start and stop the
service - see below:
root@qgp3:/etc>service gmond start
Starting GANGLIA gmond: [ OK ]
root@qgp3:/etc>service gmond stop
Shutting down GANGLIA gmond: /etc/init.d/gmond: kill: (7394) - No such
process
/etc/init.d/gmond: kill: (7393) - No such process
/etc/init.d/gmond: kill: (7392) - No such process
/etc/init.d/gmond: kill: (7391) - No such process
/etc/init.d/gmond: kill: (7390) - No such process
/etc/init.d/gmond: kill: (7389) - No such process
/etc/init.d/gmond: kill: (7388) - No such process
/etc/init.d/gmond: kill: (7387) - No such process
/etc/init.d/gmond: kill: (7386) - No such process
[ OK ]
root@qgp3:/etc>service gmond start
Starting GANGLIA gmond: [ OK ]
root@qgp3:/etc>service gmond stop
Shutting down GANGLIA gmond: [ OK ]
root@qgp3:/etc>service gmond start
Starting GANGLIA gmond: [ OK ]
root@qgp3:/etc>
Huh?
Are you saying that sometimes you start/stop and get
an error, and sometimes you start/stop and do not
get the error?
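The "No such process" lines in the transcript suggest that the gmond
processes started by the first "service gmond start" were already gone
by the time "stop" tried to kill them. One way to check this is to note
the PIDs right after a start and test them again a few seconds later.
The helper below is a minimal sketch; the name alive_pids is made up for
illustration and is not part of Ganglia or the init script.

```shell
# alive_pids PID...: print, one per line, those PIDs that still exist.
# "kill -0" delivers no signal; it only tests whether the PID can be
# signalled. Run it as root, otherwise a process owned by another user
# (gmond runs as "nobody" here) would wrongly be reported as gone.
alive_pids() {
    local pid
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
        fi
    done
}
```

A possible use, assuming pidof is available on the node: run
"service gmond start", then "pids=$(pidof gmond); sleep 5; alive_pids $pids".
If nothing is printed, gmond exited shortly after starting.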
You might try turning up the "debug." Perhaps that
will give you more information:
# Run gmond in "debug" mode. Gmond will not background. Debug messages
# are sent to stdout. Value from 0-100. The higher the number the more
# detailed debugging information will be sent.
# default: 0
# debug_level 10
Another possibility (but I think it's a long
shot) is to set a static route for the ganglia multicast channel:
route add -host 239.2.11.71 dev eth0
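To see whether such a host route is already present before adding one,
the routing table can be searched for the multicast address. This is
only a sketch: host_route_iface is a hypothetical helper, it assumes the
usual "route -n" column layout (Destination first, Iface last), and it
only finds an exact host entry, not a wider 224.0.0.0 network route that
might also cover the channel.

```shell
# host_route_iface ADDR: read "route -n" style text on stdin and print
# the interface (last column) of every entry whose Destination is ADDR.
host_route_iface() {
    awk -v addr="$1" '$1 == addr { print $NF }'
}
```

For example, "route -n | host_route_iface 239.2.11.71" prints the
interface name if the host route exists and nothing otherwise.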
Joe
Another thing that worries me is that the gmond -help mentions nothing
about config files?
Any other things I might try?
I have the following in my gmond.conf file:
setuid nobody
all_trusted on
mcast_channel 239.2.11.71
mcast_port 8649
mcast_ttl 1
mcast_threads 2
xml_port 8649
num_nodes 22
xml_threads 2
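Note that the listing above contains no mcast_if line. To confirm what a
given node's file actually sets, a small lookup like the following could
be run on each machine. It is a sketch that assumes the simple
"key value" layout shown above; conf_value is a made-up name, not a
Ganglia tool.

```shell
# conf_value FILE KEY: print the value of the last uncommented
# "KEY value" line in FILE; print nothing if the key is absent.
# Commented lines do not match because their first field starts with "#".
conf_value() {
    awk -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}
```

For example, "conf_value /etc/gmond.conf mcast_if" prints the interface
name if one is configured and nothing if the key is missing or commented
out.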
Thanks,
Mark
On Thu, 5 Sep 2002, Joe Griffin wrote:
Hi Mark,
I have four comments:
1. Is gmond running on bambino1 as well as on the headnode?
The "telnet bambino1 8649" should produce output like:
virtue:81) telnet msc1 8649
Trying 192.168.3.21...
Connected to msc1.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (CLUSTER)+>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
SOURCE CDATA #REQUIRED>
<!ELEMENT CLUSTER (HOST)+>
<!ATTLIST CLUSTER NAME CDATA #REQUIRED
LOCALTIME CDATA #REQUIRED>
... lines deleted ...
2. If you are logged on the headnode, can you "telnet $HEADNODE 8649"?
3. If you are on bambino1, can you "telnet bambino1 8649"?
4. You mentioned "gmond -ieth1". I assume eth1 is the NIC
connecting to your cluster. If so, have you put the
following in /etc/gmond.conf:
mcast_if eth1
Then restart the daemons:
/etc/init.d/gmond stop
/etc/init.d/gmond start
You should then be able to telnet to your headnode (my #2) on
port 8649 and see all the attached nodes. If you cannot,
it is either because the compute nodes are NOT running
gmond (my #1) or the gmond on the headnode can't see the
gmond on the compute nodes (my #4). Trying to do the
telnet from bambino1 will let you know if gmond is
running correctly on it.
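The telnet checks in #1 to #3 can also be scripted so they are easy to
repeat on every node. The sketch below distinguishes three outcomes: XML
came back, the connection opened but nothing was sent (the symptom in
the original report), or no connection could be made. Both function
names are hypothetical, and probe_gmond relies on bash's /dev/tcp
redirection, which is assumed to be compiled into the local bash. It has
no timeout, so an unreachable host may make it hang for a while.

```shell
# classify_reply: read on stdin whatever a host sent on the gmond port
# and print one word:
#   xml    the data mentions GANGLIA_XML
#   empty  nothing was sent before the connection closed
#   other  anything else
classify_reply() {
    local data
    data=$(cat)
    case "$data" in
        *GANGLIA_XML*) echo xml ;;
        "")            echo empty ;;
        *)             echo other ;;
    esac
}

# probe_gmond HOST [PORT]: connect (bash only), read until the remote
# side closes, and classify the reply. Prints "noconnect" if the
# connection could not be opened. PORT defaults to 8649.
probe_gmond() {
    local host="$1" port="${2:-8649}" reply
    if reply=$(cat 2>/dev/null < "/dev/tcp/$host/$port"); then
        printf '%s' "$reply" | classify_reply
    else
        echo noconnect
    fi
}
```

Running "probe_gmond bambino1" from the headnode and again on bambino1
itself covers #1 and #3; "probe_gmond $HEADNODE" on the headnode covers
#2.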
Regards,
Joe Griffin
MSC.Software
Mark Horner wrote:
Hi,
Ganglia only shows my head node.
I am using oscar 1.4b4 on RH 7.3. I have checked that gmond is running on my nodes
and on the manager - I have tried gmond -ieth1 to no avail.
In a previous posting someone suggested telnetting to port 8649 and that a
stream of XML data should be visible. This isn't the case:
telnet bambino1 8649
Trying 192.168.1.2...
Connected to bambino1.phy.uct.ac.za (192.168.1.2).
Escape character is '^]'.
Connection closed by foreign host.
Any suggestions? Could it be a firewall issue?
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
</x-flowed>