Hi Reuti

Thanks for your reply.
When I look for SGE with ps, it is still running:
[jrath@ calcuserver03 tmp]$ ps aux | grep sge
rsmadmin  4203  0.0  0.0 161944  1976 ?        Sl   13:04   0:01 /data_storage/HPC/ge2011.11/bin/linux-x64/sge_execd

In /tmp I find only two execd messages files, both from my first attempt, when
I tried to uninstall SGE and reinstall it:
[jrath@ calcuserver03 tmp]$ cat execd_messages.4055
11/13/2012 12:39:10|  main| calcuserver03|W|daemonize error: child exited before sending daemonize state
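
A minimal sketch for scanning such a messages file for anything other than informational entries. It assumes the pipe-separated layout seen in the log lines of this thread (timestamp|thread|host|level|text), with the level in the fourth field and `I` marking informational lines. The function name `non_info_lines` is hypothetical, not part of Grid Engine.

```shell
# Print every log line whose level field (4th, pipe-separated) is not "I".
# Lines with fewer than five fields are skipped as not being log entries.
non_info_lines() {
    awk -F'|' 'NF >= 5 && $4 != "I"' "$@"
}
```

Possible usage: `non_info_lines /tmp/execd_messages.*`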

Regards
  Jochen

-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Tuesday, 13 November 2012 14:10
To: RATH Jochen (AREVA Wind GmbH)
Cc: [email protected]
Subject: Re: [gridengine users] New Execution Host: load_avg = -NA-

Hi,

On 13.11.2012 at 13:26, RATH Jochen (AREVA) wrote:

> I have added a new execution host to my existing OGE pool. Unfortunately, I
> can't start jobs on it, because its load average is not reported to the
> qmaster host:
> [root@ master ge2011.11]# qstat -F la
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> [email protected] BIP   0/0/32         -NA-     -NA-          a
> ---------------------------------------------------------------------------------
> [email protected] BIP   0/2/12         10.15    linux-x64
>        hl:load_avg=10.150000
> ---------------------------------------------------------------------------------
> [email protected] BIP   0/0/12         0.00     linux-x64
>        hl:load_avg=0.000000
> 
> My grid consists of one master and now three execution nodes. Everything is
> installed on an NFS directory, /data_storage, which is hosted on the master.
> The messages file of calcuserver03 contains:
> [root@ master calcuserver03]# cat messages
> 11/13/2012 13:04:21|  main| calcuserver03|W|local configuration localhost.localdomain not defined - using global configuration
> 11/13/2012 13:04:21|  main| calcuserver03|I|starting up OGS/GE 2011.11 (linux-x64)

This message is harmless. It looks like the exechost can contact the qmaster 
(to request the configuration), which is fine. But is the execd still running? 
Maybe it crashed during startup: is there any file "execd..." in /tmp? I 
suppose the `qhost` output shows similar information.
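
The two checks above can be sketched as a small shell function. This is only an illustration under assumptions: the daemon's process name is taken to be `sge_execd` (as in the binary path in this thread), leftover files are taken to match `execd_messages.*`, and the function name `check_execd` is hypothetical.

```shell
# Report whether an sge_execd process exists, then list any
# execd_messages.* files in the given directory (default: /tmp).
check_execd() {
    dir="${1:-/tmp}"
    if pgrep -x sge_execd >/dev/null 2>&1; then
        echo "sge_execd: running"
    else
        echo "sge_execd: not running"
    fi
    for f in "$dir"/execd_messages.*; do
        # The glob stays literal when nothing matches, so test for existence.
        [ -e "$f" ] && echo "messages file: $f"
    done
    return 0
}
```

Possible usage on the execution host: `check_execd /tmp`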


> The master and calcuserver01 run RHEL 5.8; calcuserver02 and calcuserver03 
> run RHEL 6.3. iptables is stopped on every server, and all of them are 
> listed in /etc/hosts.allow.

This is only necessary for applications using the tcp-wrapper and if 
certain/all services are denied in /etc/hosts.deny by default.

-- Reuti

> Why can't the qmaster get the load_avg of the new server? What further 
> information do you need?
> 
> Regards
>      Jochen
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
