i hope this will help to track down the problems people are having getting 
ganglia up and running.

there is documentation for ganglia at
http://ganglia.sourceforge.net/ganglia_docs/
but i know that it is lacking and will be updated in time.

installation (step-by-step)
1. install gmond on all hosts that you want to monitor.  let's call the 
hosts: host A, host B and host C (to be creative).  
  a. make sure that gmond is running and working on all three machines. 
     (ps -ef | grep gmond) on each host will let you know it's running.
  b. make sure that each gmond is multicasting to its peers, i.e. that
     host A is sending to B and C, B->AC, C->AB.  on EACH host run
     (telnet localhost 8649 | grep "HOST NAME").  if you run this on host 
     A for example, you should see Host A, Host B and Host C listed.
     
     Host A> telnet localhost 8649 | grep "HOST NAME"
     <HOST NAME="Host B" IP="..." ...>
     <HOST NAME="Host A" IP="..." ...>
     <HOST NAME="Host C" IP="..." ...>

     if you don't see all the hosts in the list on EACH machine then you 
     have a multicast problem.  btw, multicast is similar to broadcast
     but more efficient and routable.  multicast problems are almost 
     always a problem with the switch your hosts are plugged into.

     once you are sure that all the gmonds on all hosts are talking to
     each other then you can install and configure gmetad. 
 
http://ganglia.sourceforge.net/ganglia_docs/configuration.html#GMOND-CONFIGURATION
     explains ways to customize gmond.
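
     if you'd rather script the check from step 1b than eyeball telnet
     output, here is a rough python sketch.  it assumes gmond dumps its
     XML and then closes the connection (which is what the telnet check
     relies on).  the function names (fetch_xml, host_names,
     missing_hosts) are made up for this example and are not part of
     ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything the daemon sends on connect, until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes):
    """Return the sorted NAME attribute of every <HOST> element in the dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(h.get("NAME", "") for h in root.iter("HOST"))


def missing_hosts(xml_bytes, expected):
    """Return the expected host names that do not appear in the dump."""
    return sorted(set(expected) - set(host_names(xml_bytes)))
```

     on each host you would call something like
     missing_hosts(fetch_xml(), ["Host A", "Host B", "Host C"]) with the
     names your hosts really report.  an empty list means that host sees
     all of its peers; anything else points at the multicast problem
     described above.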

2. now we have gmond running on Host A, B, and C.  they are outputting XML but 
we need a way to track historical trends.  that is where gmetad comes in.  
gmetad periodically pulls data from 1 or more gmonds, saves the XML and 
writes the values to round-robin databases.  you CAN run gmetad on a machine
which is already running gmond OR put gmetad on another host (say Host D).

  a. installing gmetad on Host A, Host B or Host C:
    
     i. create a /etc/gmetad.conf file that looks like this
         ---- start file -----
         data_source "My Cluster" 127.0.0.1:8649
         ---- end file ---
     ii. start gmetad.
     iii. look in /var/lib/ganglia/rrds to see that the databases
          are being created
     iv. telnet to port 8650 to see that gmetad is outputting XML.
         (telnet localhost 8650).
     if you don't see XML, doublecheck that gmond is really running.
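
     step iii can also be checked with a few lines of python instead of
     poking around by hand.  this is only a sketch: it walks the rrds
     directory and lists whatever .rrd files it finds, without assuming
     anything about how gmetad lays out its subdirectories.  find_rrds
     is a name made up for this example.

```python
import os


def find_rrds(root="/var/lib/ganglia/rrds"):
    """Return the sorted paths of every .rrd file found anywhere under root."""
    found = []
    # os.walk yields nothing if root does not exist, so the result is just empty
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rrd"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

     if find_rrds() still comes back empty a while after gmetad has
     started, gmetad is probably not getting data from the gmond named
     in its data_source line.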

  b. installing gmetad on a dedicated host (Host D):

     i. Pick which hosts running gmond will share their data.  Let's
        say Host A and Host B will share their data but C won't.
     ii. on Host A and Host B add this line to /etc/gmond.conf
         --- start line ---
         trusted_hosts <host D ip address>
         --- end line ---
         and then restart gmond (/etc/rc.d/init.d/gmond restart)

         if you don't tell each gmond to trust the gmetad on Host D
         they will not output their XML to Host D.  As a check, log in
         to host D and run "telnet <Host A> 8649" and check if you get
         XML output.  If you don't, then Host A is not trusting Host D.

     iii. create a /etc/gmetad.conf file that looks like this
        --- start file ----
        data_source "My Cluster" <Host A ip>:8649 <Host B ip>:8649
        --- end file ----

        for example
        data_source "My Cluster" 10.0.0.100:8649 10.0.0.200:8649
    
     iv. start gmetad. (then go to steps iii. and iv. of section a above)
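
     if you have more than a couple of gmond hosts to list, the
     data_source line from step iii can be generated rather than typed.
     a small python sketch, using only the format shown in the example
     above (quoted cluster name followed by ip:port pairs).
     data_source_line is a made-up helper name, not a ganglia tool.

```python
def data_source_line(cluster, hosts, port=8649):
    """Build a gmetad.conf data_source line for the given gmond hosts."""
    if not hosts:
        raise ValueError("need at least one gmond host")
    # every host gets the same port here; 8649 is the gmond port used above
    targets = " ".join(f"{host}:{port}" for host in hosts)
    return f'data_source "{cluster}" {targets}'
```

     data_source_line("My Cluster", ["10.0.0.100", "10.0.0.200"]) gives
     back the same line as the example in step iii.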

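you can run gmond and gmetad in "debug mode" by adding this line to the
config file of the daemon you want to debug (/etc/gmond.conf or
/etc/gmetad.conf) and restarting it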
--- start line ---
debug_level 1
--- end line ---

if you can't track down the problem, this output would be very helpful.  
hope this quick overview helps.. let me know how else i could help.

matt


