So, I changed two things, and I'm not sure which one resolved the issue.
I logged in as root, restarted the sgeexecd client, and found it was not
starting again. I was unsure why, since it had started before. I tried
stopping the client and noticed a new error: /opt/sge/wax-centaur-22 was not
found.
I found that /opt/sge was missing, so I created the sge directory under /opt
and made the gridadm user and group its owner.
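For anyone hitting the same "not found" error, the fix described above can be sketched as a small shell function. This is a minimal sketch, not an SGE tool: it assumes the execd spool base is /opt/sge (inferred from the /opt/sge/wax-centaur-22 error) and that gridadm:gridadm is the admin user and group, and the name ensure_spool_base is hypothetical. In this thread execd started once /opt/sge existed, presumably because it could then create its per-host subdirectory itself. It must run as root to chown the directory to gridadm.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): make sure the execd spool base
# directory exists and belongs to the SGE admin user and group.
# Assumed values from this thread: dir=/opt/sge, owner=gridadm:gridadm.
ensure_spool_base() {
    dir=$1      # spool base directory, e.g. /opt/sge
    owner=$2    # user:group, e.g. gridadm:gridadm
    # Create the directory (and any missing parents) only if it is absent
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir" || return 1
    fi
    # Hand it to the admin account so execd can (presumably) create
    # its per-host subdirectory such as /opt/sge/wax-centaur-22
    chown "$owner" "$dir" || return 1
}

# Example (as root): ensure_spool_base /opt/sge gridadm:gridadm
```

After that, restart sgeexecd as root, as described here.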
After doing this I restarted sgeexecd as root and it started without issue,
this time listing gridadm as the owner:
[wasim05(rmaes)]-> test 1524> ps -ef |grep sge
gridadm 3903 1 0 2011 ? 00:01:58 /corp/grid/bin/lx24-amd64/sge_execd
And now it reports in, too:
[wasim05(rmaes)]-> sge 173> qload
HOSTNAME         ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global           -              -     -       -       -       -       -
wabuild01        lx24-amd64     8   36%   31.4G    5.8G    8.0G  140.0K
wabuild02        lx24-amd64     8   12%   31.4G    3.4G    4.0G  125.2M
wabuild03        lx24-amd64    12    0%   31.4G    1.2G    8.0G     0.0
wagrid03         lx24-amd64     8    0%    7.4G  348.4M   15.6G  135.3M
wasim01          lx24-amd64     2    0%    3.9G       -    1.9G       -
wasim02          lx24-amd64     2    0%    3.9G       -    1.9G       -
wasim03          lx24-amd64     2    1%    3.9G  182.4M    1.9G  891.2M
wasim04          lx24-amd64     2    1%    3.9G    2.6G    1.9G  124.0K
wasim05          lx24-amd64     2    1%    3.9G  544.9M    1.9G  124.0K
wasim06          lx24-amd64     2    0%    3.9G    1.8G    1.9G     0.0
wasim07          lx24-amd64     2    0%    3.9G  394.4M    1.9G     0.0
wasim08          lx24-amd64     2    2%    3.9G    1.6G    1.9G  120.0K
wax-centaur-22   lx24-x86       8    0%   11.8G  634.3M    8.0G     0.0   <--- Here it is
waxgridqm        lx24-amd64     2    0%    7.8G       -    4.0G       -
waxvnx01         lx24-amd64     2    1%    7.6G    5.3G    8.0G    1.4G
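Whether a host "reports in" comes down to what its row looks like. The sketch below assumes qload prints qhost-style columns (HOSTNAME ARCH NCPU ...) as in the listing above; the helper name host_status is hypothetical. Going by the two listings in this thread, a host whose execd is not reporting shows "-" in the ARCH column, while a reporting host shows its architecture.

```shell
#!/bin/sh
# Hypothetical helper: read qhost-style output on stdin and say whether the
# given host has real values or only "-" placeholders in its ARCH column.
host_status() {
    host=$1
    awk -v h="$host" '
        $1 == h {
            found = 1
            print ($2 == "-" ? "not reporting" : "reporting (" $2 ")")
        }
        END { if (!found) print "not listed" }
    '
}

# Example: qload | host_status wax-centaur-22
```

Matching on the first column keeps the header and the "global" row out of the result.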
-----Original Message-----
From: Maes, Richard
Sent: Friday, January 27, 2012 2:42 PM
To: 'Rayson Ho'
Cc: [email protected]
Subject: RE: [gridengine users] Verifying execution host connectivity
Rayson,
Here is a clue
01/27/2012 14:37:15|listen|waxgridqm|C|denied: request for user "rmaes" does not match credentials for connection <wax-centaur-22.ciena.com,execd,1>
So, I had started the SGE client using my own account rather than with sudo. I
have asked the admins to give me sudo permissions for that box. I'll try again
shortly, starting the service as root, and see if that changes the behavior.
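The denied message names the user the execd connection came from, so it is worth checking which account owns the running sge_execd. The sketch below reads the same `ps -ef` output used elsewhere in this thread; execd_owner is a hypothetical name. In this thread, the execd started from a personal account ran as rmaes, and the one started as root showed up under gridadm.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the user
# column of every sge_execd process. The "grep sge" line itself is ignored
# because its last field is not sge_execd or a path ending in /sge_execd.
execd_owner() {
    awk '$NF == "sge_execd" || $NF ~ /\/sge_execd$/ { print $1 }'
}

# Example: ps -ef | execd_owner
```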
-----Original Message-----
From: Rayson Ho [mailto:[email protected]]
Sent: Friday, January 27, 2012 12:45 PM
To: Maes, Richard
Cc: [email protected]
Subject: Re: [gridengine users] Verifying execution host connectivity
I think the best way is to check the logs. Again, is there anything
in "messages" or /tmp/execd_messages.*?
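The log check can be sketched as a small function. It assumes the execd "messages" file sits in the host's spool directory (in this thread, probably /opt/sge/wax-centaur-22) and that fallback logs match /tmp/execd_messages.* as named above; show_execd_logs and its parameters are hypothetical, and the /tmp location is a parameter only so the function can be tried elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of every execd log that exists.
#   $1 = execd spool directory for this host (assumed location of "messages")
#   $2 = directory holding execd_messages.* fallback logs (default /tmp)
#   $3 = number of lines to show per file (default 20)
show_execd_logs() {
    spool_dir=$1
    tmp_dir=${2:-/tmp}
    n=${3:-20}
    for f in "$spool_dir/messages" "$tmp_dir"/execd_messages.*; do
        # Skip missing files, including the literal pattern left by an unmatched glob
        [ -f "$f" ] || continue
        echo "==> $f <=="
        tail -n "$n" "$f"
    done
}

# Example: show_execd_logs /opt/sge/wax-centaur-22
```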
Rayson
On Fri, Jan 27, 2012 at 2:38 PM, Maes, Richard <[email protected]> wrote:
> Hi Reuti,
> Yes, we installed both the 64-bit and 32-bit binaries a long time ago, but
> never used the 32-bit binaries until now. Several directories have both
> 64-bit and 32-bit content.
>
>
> [waxvnx01.ciena.com(rmaes)]-> bin 107> pwd
> /corp/grid/bin
> [waxvnx01.ciena.com(rmaes)]-> bin 108> ls -lart
> total 16
> drwxr-xr-x 4 root root 4096 Nov 9 2009 .
> drwxr-xr-x 2 root root 4096 Jul 21 2011 lx24-amd64
> drwxr-xr-x 2 root root 4096 Jul 21 2011 lx24-x86
> drwxr-xr-x 23 gridadm gridadm 4096 Jan 26 09:49 ..
> [waxvnx01.ciena.com(rmaes)]-> bin 109>
>
> [waxvnx01.ciena.com(rmaes)]-> utilbin 114> pwd
> /corp/grid/utilbin
> [waxvnx01.ciena.com(rmaes)]-> utilbin 115> ls
> lx24-amd64 lx24-x86
> [waxvnx01.ciena.com(rmaes)]-> utilbin 116>
>
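Reuti's question below (whether bin, utilbin and lib each now contain both lx24-amd64 and lx24-x86) can be answered in one pass; the listings above only cover bin and utilbin. A minimal sketch: check_arch_dirs is a hypothetical name, and /corp/grid is assumed to be the shared SGE root, as in this thread.

```shell
#!/bin/sh
# Hypothetical helper: check that bin, utilbin and lib under the shared
# SGE root each contain a directory for the given architecture.
# Returns non-zero if any of them is missing.
check_arch_dirs() {
    root=$1     # shared SGE root, e.g. /corp/grid
    arch=$2     # architecture string, e.g. lx24-x86
    missing=0
    for sub in bin utilbin lib; do
        if [ -d "$root/$sub/$arch" ]; then
            echo "ok      $root/$sub/$arch"
        else
            echo "missing $root/$sub/$arch"
            missing=1
        fi
    done
    return $missing
}

# Example: check_arch_dirs /corp/grid lx24-x86
```

Returning a status as well as printing lets the same check run interactively or from a script.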
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Friday, January 27, 2012 11:17 AM
> To: Maes, Richard
> Cc: [email protected]
> Subject: Re: [gridengine users] Verifying execution host connectivity
>
> Hi,
>
> On 27.01.2012 at 19:56, Maes, Richard wrote:
>
>> I have a 32-bit execution host that I just added to our 64-bit grid.
>> It's our first time interfacing a 32-bit machine to the grid. I have
>> started the SGE client on the new execution host.
>>
>> I can see the 32-bit client running on the box:
>> [wax-centaur-22.ciena.com(rmaes)]-> ~ 101> ps -ef |grep sge
>> rmaes 26617 1 0 10:34 ? 00:00:00 /corp/grid/bin/lx24-x86/sge_execd
>>
>> I have looked around for information regarding the use of 32-bit machines
>> and I haven't found anything that says I can't do it.
>
> Correct. SGE and its precursor Codine were both designed for
> heterogeneous clusters, not even limited to Linux.
>
>
>> Is there a logging feature that would indicate what contact, if any,
>> exists between the qmaster and the wax-centaur-22 execution host?
>
> Did you untar the 32-bit binaries into the shared /corp/grid, i.e. do you
> now have two directories, lx24-amd64 and lx24-x86, in each of bin, utilbin
> and lib?
>
> -- Reuti
>
>
>> So far I have tried restarting the client and the qmaster and the
>> connection hasn't come up.
>>
>>
>> I have created the execution host in QMON, but data isn't updating.
>> HOSTNAME         ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global           -              -     -       -       -       -       -
>> wabuild01        lx24-amd64     8    0%   31.4G    1.9G    8.0G  140.0K
>> wabuild02        lx24-amd64     8   12%   31.4G    3.4G    4.0G  125.2M
>> wabuild03        lx24-amd64    12    0%   31.4G    1.5G    8.0G     0.0
>> wagrid03         lx24-amd64     8    1%    7.4G  960.5M   15.6G  148.9M
>> wasim01          lx24-amd64     2    0%    3.9G       -    1.9G       -
>> wasim02          lx24-amd64     2    0%    3.9G       -    1.9G       -
>> wasim03          lx24-amd64     2    0%    3.9G  161.1M    1.9G  891.3M
>> wasim04          lx24-amd64     2    2%    3.9G    2.6G    1.9G  124.0K
>> wasim05          lx24-amd64     2    1%    3.9G  542.3M    1.9G  124.0K
>> wasim06          lx24-amd64     2    0%    3.9G    1.8G    1.9G     0.0
>> wasim07          lx24-amd64     2    1%    3.9G  393.8M    1.9G     0.0
>> wasim08          lx24-amd64     2    0%    3.9G    1.6G    1.9G  120.0K
>> wax-centaur-22   -              -    0%       -       -       -       -
>> waxgridqm        lx24-amd64     2    0%    7.8G       -    4.0G       -
>> waxvnx01         lx24-amd64     2    5%    7.6G    5.6G    8.0G    1.4G
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users