So, I changed two things, and I am not sure which one resolved the issue.
I logged in as root and restarted the sgeexecd client and found it was not
starting again.  I was unsure why, since it had started before.  I tried
stopping the client and noted a new error: /opt/sge/wax-centaur-22 was not
found.
I found that /opt/sge was missing, so I created sge in the /opt directory and
assigned the gridadm user and group as its owner.
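
The directory fix described above can be sketched as a small shell helper.
This is only a sketch under assumptions: ensure_owned_dir is a hypothetical
helper name, and it assumes you are root on the execution host and that
/opt/sge is the directory the exec daemon expects to find there.

```shell
# Hypothetical helper: create a directory if it is missing, hand it to the
# given owner and group, then show the resulting permissions.
ensure_owned_dir() {
    dir=$1
    owner=$2
    group=$3
    mkdir -p "$dir" || return 1
    chown "$owner:$group" "$dir" || return 1
    ls -ld "$dir"
}

# Example, run as root (path and account names taken from this thread):
#   ensure_owned_dir /opt/sge gridadm gridadm
```

Checking the ls -ld line afterwards confirms the owner before restarting
the daemon.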

After doing this I tried to restart sgeexecd as root and it started without
issue, this time listing gridadm as the owner:
[wasim05(rmaes)]-> test 1524> ps -ef |grep sge
gridadm   3903     1  0  2011 ?        00:01:58 /corp/grid/bin/lx24-amd64/sge_execd

And now it reports in too:
[wasim05(rmaes)]-> sge 173> qload
HOSTNAME        ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------
global          -           -     -     -       -       -       -
wabuild01       lx24-amd64  8     36%   31.4G   5.8G    8.0G    140.0K
wabuild02       lx24-amd64  8     12%   31.4G   3.4G    4.0G    125.2M
wabuild03       lx24-amd64  12    0%    31.4G   1.2G    8.0G    0.0
wagrid03        lx24-amd64  8     0%    7.4G    348.4M  15.6G   135.3M
wasim01         lx24-amd64  2     0%    3.9G    -       1.9G    -
wasim02         lx24-amd64  2     0%    3.9G    -       1.9G    -
wasim03         lx24-amd64  2     1%    3.9G    182.4M  1.9G    891.2M
wasim04         lx24-amd64  2     1%    3.9G    2.6G    1.9G    124.0K
wasim05         lx24-amd64  2     1%    3.9G    544.9M  1.9G    124.0K
wasim06         lx24-amd64  2     0%    3.9G    1.8G    1.9G    0.0
wasim07         lx24-amd64  2     0%    3.9G    394.4M  1.9G    0.0
wasim08         lx24-amd64  2     2%    3.9G    1.6G    1.9G    120.0K
wax-centaur-22  lx24-x86    8     0%    11.8G   634.3M  8.0G    0.0  <--- Here it is
waxgridqm       lx24-amd64  2     0%    7.8G    -       4.0G    -
waxvnx01        lx24-amd64  2     1%    7.6G    5.3G    8.0G    1.4G
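
A quick way to spot a host that has not reported in is to look for a "-" in
the ARCH column of a listing like this one. The sketch below assumes the
listing arrives on stdin in this eight-column layout with one host per line;
unreported_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: read a host listing on stdin and print the names of
# hosts whose ARCH column is "-", meaning no load report has arrived yet.
# The "global" pseudo-host shows "-" in the listings here, so it is skipped.
unreported_hosts() {
    awk '$1 != "global" && $2 == "-" { print $1 }'
}

# Example:
#   qload | unreported_hosts
```

Empty output means every listed host has reported an architecture.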

-----Original Message-----
From: Maes, Richard 
Sent: Friday, January 27, 2012 2:42 PM
To: 'Rayson Ho'
Cc: [email protected]
Subject: RE: [gridengine users] Verifying execution host connectivity

Rayson,  
Here is a clue
01/27/2012 14:37:15|listen|waxgridqm|C|denied: request for user "rmaes" does not match credentials for connection <wax-centaur-22.ciena.com,execd,1>

So I had started the SGE client using my own account rather than through
sudo.  I have asked the admins to give me sudo permissions for that box.
I'll try again shortly, starting the service as root, and see if that
changes the behavior.
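
The "denied" line points at the account sge_execd runs under. A minimal
sketch for checking that: the hypothetical helper execd_owner reads
"ps -ef" output on stdin and prints the user of every line whose command is
sge_execd. It assumes the usual Linux "ps -ef" layout, with the user in
column 1 and the command in column 8, as in the listings in this thread.

```shell
# Hypothetical helper: print the owning user of each sge_execd process
# found in "ps -ef" output supplied on stdin.
execd_owner() {
    awk '$8 ~ /(^|\/)sge_execd$/ { print $1 }'
}

# Example:
#   ps -ef | execd_owner
```

If this prints a personal account instead of the grid admin account, the
daemon was most likely started without root.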
 

-----Original Message-----
From: Rayson Ho [mailto:[email protected]] 
Sent: Friday, January 27, 2012 12:45 PM
To: Maes, Richard
Cc: [email protected]
Subject: Re: [gridengine users] Verifying execution host connectivity

I think the best way is to check the logs - again, is there anything
in "messages" and /tmp/execd_messages.* ??
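
When scanning those files it can help to filter for the serious entries
only. The sketch below assumes the pipe-separated layout of the log line
quoted elsewhere in this thread, with the level in the fourth field, and
assumes that levels C and E mark critical and error entries; sge_problems
is a hypothetical helper name.

```shell
# Hypothetical helper: print only the lines whose level field (field 4,
# pipe-separated) is C or E from the given messages files.
sge_problems() {
    awk -F'|' '$4 == "C" || $4 == "E"' "$@"
}

# Example:
#   sge_problems /tmp/execd_messages.*
```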

Rayson



On Fri, Jan 27, 2012 at 2:38 PM, Maes, Richard <[email protected]> wrote:
> Hi Reuti,
> Yes, we installed both 64bit and 32bit a long time ago, but never used the
> 32 bit binaries until now.  There are several directories with both 64
> and 32bit content.
>
>
> [waxvnx01.ciena.com(rmaes)]-> bin 107> pwd
> /corp/grid/bin
> [waxvnx01.ciena.com(rmaes)]-> bin 108> ls -lart
> total 16
> drwxr-xr-x  4 root    root    4096 Nov  9  2009 .
> drwxr-xr-x  2 root    root    4096 Jul 21  2011 lx24-amd64
> drwxr-xr-x  2 root    root    4096 Jul 21  2011 lx24-x86
> drwxr-xr-x 23 gridadm gridadm 4096 Jan 26 09:49 ..
> [waxvnx01.ciena.com(rmaes)]-> bin 109>
>
> [waxvnx01.ciena.com(rmaes)]-> utilbin 114> pwd
> /corp/grid/utilbin
> [waxvnx01.ciena.com(rmaes)]-> utilbin 115> ls
> lx24-amd64  lx24-x86
> [waxvnx01.ciena.com(rmaes)]-> utilbin 116>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Friday, January 27, 2012 11:17 AM
> To: Maes, Richard
> Cc: [email protected]
> Subject: Re: [gridengine users] Verifying execution host connectivity
>
> Hi,
>
> Am 27.01.2012 um 19:56 schrieb Maes, Richard:
>
>> I have a 32bit execution host that I just added to our 64 bit grid.
>> It's our first time interfacing a 32bit machine to the grid.  I have
>> started the SGE client on the new execution host.
>>
>> I can see the 32 bit client running on the box
>> [wax-centaur-22.ciena.com(rmaes)]-> ~ 101> ps -ef |grep sge
>> rmaes    26617     1  0 10:34 ?        00:00:00 /corp/grid/bin/lx24-x86/sge_execd
>>
>> I have looked around for information regarding the use of 32bit machines
>> and I haven't found anything that says I can't do it.
>
> Correct, SGE and its precursor Codine were designed to support
> heterogeneous clusters, and are not even limited to Linux.
>
>
>> Is there a logging feature that would indicate what, if any, contact
>> exists between the qmaster and the wax-centaur-22 execution host?
>
> You untarred the 32-bit binaries inside the shared /corp/grid, i.e. in
> bin, utilbin and lib you now have two directories each, lx24-amd64 and
> lx24-x86?
>
> -- Reuti
>
>
>> So far I have tried restarting the client and the qmaster and the
>> connection hasn't come up.
>>
>>
>> I have created the execution host in QMON, but data isn't updating.
>> HOSTNAME        ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> ----------------------------------------------------------------------
>> global          -           -     -     -       -       -       -
>> wabuild01       lx24-amd64  8     0%    31.4G   1.9G    8.0G    140.0K
>> wabuild02       lx24-amd64  8     12%   31.4G   3.4G    4.0G    125.2M
>> wabuild03       lx24-amd64  12    0%    31.4G   1.5G    8.0G    0.0
>> wagrid03        lx24-amd64  8     1%    7.4G    960.5M  15.6G   148.9M
>> wasim01         lx24-amd64  2     0%    3.9G    -       1.9G    -
>> wasim02         lx24-amd64  2     0%    3.9G    -       1.9G    -
>> wasim03         lx24-amd64  2     0%    3.9G    161.1M  1.9G    891.3M
>> wasim04         lx24-amd64  2     2%    3.9G    2.6G    1.9G    124.0K
>> wasim05         lx24-amd64  2     1%    3.9G    542.3M  1.9G    124.0K
>> wasim06         lx24-amd64  2     0%    3.9G    1.8G    1.9G    0.0
>> wasim07         lx24-amd64  2     1%    3.9G    393.8M  1.9G    0.0
>> wasim08         lx24-amd64  2     0%    3.9G    1.6G    1.9G    120.0K
>> wax-centaur-22  -           -     0%    -       -       -       -
>> waxgridqm       lx24-amd64  2     0%    7.8G    -       4.0G    -
>> waxvnx01        lx24-amd64  2     5%    7.6G    5.6G    8.0G    1.4G
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
