Hi,

It looks like I've managed to solve the problem (I hope). The extra debugging
didn't show anything out of the ordinary as far as I can tell (apart from the
fact that the headnode is not included in any of the communications).

The problem was with the headnode firewall. I had opened the ports to the client
IP but forgot about the headnode IP. Now the Ganglia test is OK.
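
For anyone hitting the same symptom, here is a minimal sketch of how one might
check which hosts gmond actually reports, independently of the OSCAR test. It
assumes gmond answers TCP connections with an XML dump on port 8649 (the usual
default, but check gmond.conf); the function names and the sample XML are
made up for illustration and are not part of OSCAR or Ganglia.

```python
import socket
import xml.etree.ElementTree as ET

GMOND_PORT = 8649  # assumption: gmond's default TCP port; verify in gmond.conf


def fetch_gmond_xml(host, port=GMOND_PORT, timeout=5.0):
    """Read the XML dump gmond sends to a TCP client, as raw bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_in_xml(xml_data):
    """Return the NAME attribute of every HOST element (str or bytes input)."""
    root = ET.fromstring(xml_data)
    return {host.get("NAME") for host in root.iter("HOST")}


def missing_hosts(expected, xml_data):
    """Return the expected hosts that do not appear in the XML, sorted."""
    return sorted(set(expected) - hosts_in_xml(xml_data))


if __name__ == "__main__":
    # Hand-written sample for illustration only, not real gmond output.
    sample = """<GANGLIA_XML>
      <CLUSTER NAME="MolEvol">
        <HOST NAME="molevol1.ub.edu"/>
      </CLUSTER>
    </GANGLIA_XML>"""
    print(missing_hosts(["molevol.ub.edu", "molevol1.ub.edu"], sample))
```

Against a live node one would pass fetch_gmond_xml("localhost") instead of the
sample. A headnode that shows up as missing, as it did here, would point
towards its own traffic being blocked or not sent.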

One final question: initially every test passed except Ganglia (the APItests
passed too). I then closed the OSCAR menu and had to run './install_cluster eth0'
again to re-test the cluster. Now all the tests pass, but some of the
installation tests fail. Does this mean that the cluster has some kind of
problem? I ask because at the end it still says the OSCAR cluster is ready.


Thanks for everything,
FG


Performing root tests...
TORQUE node check                                              [PASSED]
TORQUE service check:pbs_server                                [PASSED]
Maui service check:maui                                        [PASSED]
/home mounts                                                   [PASSED]

Preparing user tests...
Performing user tests...
SSH ping test                                                  [PASSED]
SSH server->node                                               [PASSED]
SSH node->server                                               [PASSED]
Ganglia setup test                                             [PASSED]
Ganglia node count test                                        [PASSED]
TORQUE default queue definition                                [PASSED]
TORQUE Shell Test                                              [PASSED]
PVM (via TORQUE)                                               [PASSED]
Open MPI (via TORQUE)                                          [PASSED]
MPICH (via TORQUE)                                             [PASSED]
LAM/MPI (via TORQUE)                                           [PASSED]

Run APItests...

Running Installation tests for pvm
[PASS]       2007-06-29 18:38:27   pvmd-path-ls.apt
[FAIL]       2007-06-29 18:38:27   envvar-pvm_arch.apt
[FAIL]       2007-06-29 18:38:27   envvar-pvm_root.apt
[FAIL]       2007-06-29 18:38:27   envvar.apb
[FAIL]       2007-06-29 18:38:27   pvmd-path-which.apt
[PASS]       2007-06-29 18:38:27   modulecmd-path-ls.apt
[FAIL]       2007-06-29 18:38:27   pvm-module-list.apt
[FAILDEP]    2007-06-29 18:38:27   pvm-module-show.apb failed dependency(s).'
                                       PREREQ: 'pvm-module-list.apt'
                                       Expected: PASS, Actual: FAIL
[FAIL]       2007-06-29 18:38:27   pvm-module.apb
[FAIL]       2007-06-29 18:38:27   install_tests.apb

All tests passed, your OSCAR cluster is now ready to compute!

Please consider registering your OSCAR cluster at:
http://oscar.openclustergroup.org/register

...Hit <ENTER> to close this window...
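
Since the closing banner and the APItest lines disagree, a small sketch like
the following could be used to tally the APItest results from a saved copy of
the output. The line format is inferred only from the paste above; the function
names are hypothetical and this is not an OSCAR or APItest tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [FAIL]       2007-06-29 18:38:27   envvar-pvm_arch.apt
RESULT_LINE = re.compile(
    r"^\[(?P<status>[A-Z]+)\]\s+"
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<test>\S+)"
)


def summarize(report):
    """Group test names by status, in the order they appear in the report."""
    by_status = defaultdict(list)
    for line in report.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:  # banner and PREREQ/Expected lines are skipped
            by_status[match.group("status")].append(match.group("test"))
    return dict(by_status)


def has_problems(summary):
    """True if any test finished with a status other than PASS."""
    return any(status != "PASS" and tests for status, tests in summary.items())


if __name__ == "__main__":
    # A few lines copied from the output above.
    report = """\
[PASS]       2007-06-29 18:38:27   pvmd-path-ls.apt
[FAIL]       2007-06-29 18:38:27   envvar-pvm_arch.apt
[FAIL]       2007-06-29 18:38:27   pvmd-path-which.apt
[PASS]       2007-06-29 18:38:27   modulecmd-path-ls.apt
All tests passed, your OSCAR cluster is now ready to compute!
"""
    summary = summarize(report)
    for status, tests in summary.items():
        print(status, len(tests), ", ".join(tests))
    print("problems:", has_problems(summary))
```

Going only by the test names, the failing ones concern environment variables,
'which' lookups and the module list, while the two '-path-ls' checks pass. That
may mean the files are installed but the shell environment was not set up in
the session that re-ran the tests; this is a guess, not a confirmed diagnosis.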



Michael Edwards wrote:
> To get the extra debugging info you would have to shell into the
> computer and restart ganglia from the shell with "/etc/init.d/gmond
> restart".  To restart the master process on the head node, from a
> terminal with root privileges type "/etc/init.d/gmetad restart".
> 
> I believe you will get debug information in the terminal which started
> the process then.
> 
> On 6/29/07, Filipe Garrett <[EMAIL PROTECTED]> wrote:
>> I've been looking into the problem and it looks like the problem is with the
>> headnode. I've looked into Ganglia and there's just the client node.
>> Apparently Ganglia doesn't recognize the headnode as a node (strange, usually
>> the problem is with recognizing the clients). During installation I remember
>> setting one option for the headnode to be an execution node also. Could it
>> have something to do with this? Where can I check it?
>>
>> I've set the debugging levels in both files to 10 but there is no new output
>> in ganglia.err. Where is the extra debugging info?
>>
>> Thanks in advance,
>> FG
>>
>> Milo wrote:
>>> Since the head node seems to be receiving data fine from the compute node, I
>>> don't think the switch not passing multicast packets is the problem. Try
>>> turning up the debugging level for gmetad on the head node and see if the
>>> output gives any clues. If nothing else, it will tell you whether gmetad is
>>> receiving data packets, and from which nodes. To turn on debugging, edit your
>>> /etc/gmetad.conf file; near the top you should see a variable 'debug_level',
>>> which is set to 0 by default. Setting it above 0 will keep gmetad in the
>>> foreground when it is started and make it print debug messages to standard
>>> output. You can do the same for the gmond service by editing the
>>> /etc/gmond.conf file and turning on debugging in the globals{} section at
>>> the top.
>>> Hopefully that will at least give you an idea of where to start looking for
>>> the problem, let us know.
>>>
>>> -Milo
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of Michael Edwards
>>> Sent: Thursday, June 28, 2007 12:50 PM
>>> To: oscar-users@lists.sourceforge.net; Bernard Li
>>> Subject: Re: [Oscar-users] install on FC5: error in Ganglia
>>>
>>> We have been getting a fair number of people having problems with
>>> ganglia tests failing lately.  I have not been able to reproduce the
>>> problem myself, so I am not sure where the issue is.
>>>
>>> Sometimes it breaks because of problems with the /etc/hosts file.
>>>
>>> Also, what kind of switch are you using?  Ganglia likes to (needs to?
>>> not sure) use multicast, which is not supported on all switches.
>>>
>>> On 6/28/07, Filipe Garrett <[EMAIL PROTECTED]> wrote:
>>>> Hi all,
>>>>
>>>> I've managed to get past the other problem. After installing FC5 I updated
>>>> everything (with 'yum update'), and it was under that configuration that
>>>> the installation script was unable to determine the machine's architecture.
>>>> I've done a fresh install of FC5 (1 headnode + 1 client) and everything went
>>>> smoothly. I got to the test phase and every test passed BUT Ganglia! In
>>>> the ganglia.err file (pasted below) the client (molevol1.ub.edu) looks
>>>> OK, so the problem seems to be with the headnode (molevol.ub.edu),
>>>> which does not show up in CLUSTER HOSTS. Since it is a small cluster I've
>>>> also set the headnode as an execution host (could this be the problem?).
>>>> I've also checked for gmond and it is running on all machines.
>>>>
>>>> Thanks in advance,
>>>> FG
>>>>
>>>> Client nodes: molevol1.ub.edu
>>>> Match pattern: molevol.ub.edu|molevol1.ub.edu
>>>> Number of hosts matched: 1
>>>> Gstat output:
>>>> CLUSTER INFORMATION
>>>>         Name: MolEvol
>>>>        Hosts: 1
>>>> Gexec Hosts: 0
>>>>   Dead Hosts: 0
>>>>    Localtime: Thu Jun 28 17:26:44 2007
>>>>
>>>> CLUSTER HOSTS
>>>> Hostname                     LOAD                       CPU              Gexec
>>>>  CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]
>>>>
>>>> molevol1.ub.edu
>>>>     2 (    1/   66) [  0.00,  0.00,  0.00] [   2.1,   0.0,   0.8,  97.1,   0.1] OFF
>>>>
>>>> The number of nodes expected is different from the number of nodes detected.
>>>> Check to see if gmond is running on all your nodes and make sure that you
>>>> are not having any network issues.
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.net email is sponsored by DB2 Express
>>>> Download DB2 Express C - the FREE version of DB2 express and take
>>>> control of your XML. No limits. Just data. Click to get it now.
>>>> http://sourceforge.net/powerbar/db2/
>>>> _______________________________________________
>>>> Oscar-users mailing list
>>>> Oscar-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>>>>