It was a network configuration error. My /etc/hosts file had the following entries:

127.0.0.1       localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost calcuserver03
::1             localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost calcuserver03
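For anyone hitting the same symptom: the problem in these entries is that the host's own name, calcuserver03, is mapped to the loopback addresses 127.0.0.1 and ::1. Below is a minimal Python sketch for spotting this. It is not part of Grid Engine, and the function name and the list of "allowed" localhost aliases are illustrative assumptions. It scans hosts-file text and flags any name other than the standard localhost aliases that appears on a loopback line.

```python
import ipaddress

# Assumption: only these names belong on a loopback line (the RHEL
# defaults seen in this thread); adjust the set for your site.
LOCALHOST_ALIASES = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
}


def loopback_misconfigurations(hosts_text):
    """Return (address, name) pairs where a name other than the standard
    localhost aliases is mapped to a loopback address (127.0.0.0/8 or ::1)."""
    problems = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; ignore the line
        if is_loopback:
            problems.extend((address, name) for name in names
                            if name not in LOCALHOST_ALIASES)
    return problems


# The entries from this thread, before and after the fix.
BROKEN = """\
127.0.0.1 localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost calcuserver03
::1 localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost calcuserver03
"""

FIXED = """\
127.0.0.1 localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost
::1 localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost
11.53.103.149 calcuserver03
"""

for address, name in loopback_misconfigurations(BROKEN):
    print(f"{name} is mapped to loopback address {address}")
```

Against the broken entries, the sketch reports calcuserver03 on both loopback lines. Against the corrected entries below, it reports nothing. To check a live machine, pass it the contents of /etc/hosts.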
With these entries, I think, the execution host tried to send its information as localhost.localdomain and got an error. After I changed the /etc/hosts file to

127.0.0.1       localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost
::1             localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost
11.53.103.149   calcuserver03

the qmaster gets the needed information from the calculation server:

[root@master ge2011.11]# qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
calcuserver03           linux-x64      32  0.00  126.0G    1.6G   50.0G   46.9M

Maybe this will help others with similar problems.

Regards
Jochen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of RATH Jochen (AREVA Wind GmbH)
Sent: Tuesday, 13 November 2012 15:32
To: Reuti
Cc: [email protected]
Subject: Re: [gridengine users] New Execution Host: load_avg = -NA-

Hello

No, no older daemon. The error occurred because I tried to stop the sge_execd daemon with the wrong command.

Regards
Jochen

-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Tuesday, 13 November 2012 15:31
To: RATH Jochen (AREVA Wind GmbH)
Cc: [email protected]
Subject: Re: RE: [gridengine users] New Execution Host: load_avg = -NA-

On 13.11.2012 at 14:25, RATH Jochen (AREVA) wrote:

> Thanks for your reply.
> When I look for SGE with ps, it is still running:
> [jrath@calcuserver03 tmp]$ ps aux | grep sge
> rsmadmin  4203  0.0  0.0 161944  1976 ?  Sl  13:04  0:01 /data_storage/HPC/ge2011.11/bin/linux-x64/sge_execd
>
> In /tmp I find only two execd message files, which are from my first try, when I
> tried to uninstall SGE and reinstall it:
> [jrath@calcuserver03 tmp]$ cat execd_messages.4055
> 11/13/2012 12:39:10| main|calcuserver03|W|daemonize error: child exited before sending daemonize state

Is there an older daemon still running?
-- Reuti

> Regards
> Jochen
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Tuesday, 13 November 2012 14:10
> To: RATH Jochen (AREVA Wind GmbH)
> Cc: [email protected]
> Subject: Re: [gridengine users] New Execution Host: load_avg = -NA-
>
> Hi,
>
> On 13.11.2012 at 13:26, RATH Jochen (AREVA) wrote:
>
>> I have installed a new execution host in my existing OGE pool. Unfortunately
>> I can't start jobs, because the load average isn't reported to the
>> qmaster host:
>> [root@master ge2011.11]# qstat -F la
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> [email protected]                 BIP   0/0/32         -NA-     -NA-          a
>> ---------------------------------------------------------------------------------
>> [email protected]                 BIP   0/2/12         10.15    linux-x64
>>         hl:load_avg=10.150000
>> ---------------------------------------------------------------------------------
>> [email protected]                 BIP   0/0/12         0.00     linux-x64
>>         hl:load_avg=0.000000
>>
>> My grid consists of one master and now three execution nodes. Everything is
>> installed in an NFS directory, /data_storage, which is stored on the master.
>> The messages file of calcuserver03 contains:
>> [root@master calcuserver03]# cat messages
>> 11/13/2012 13:04:21| main|calcuserver03|W|local configuration localhost.localdomain not defined - using global configuration
>> 11/13/2012 13:04:21| main|calcuserver03|I|starting up OGS/GE 2011.11 (linux-x64)
>
> This message is harmless. It looks like the exechost can contact the qmaster
> (to request the configuration), fine. But is the execd still running? Maybe
> it crashed during startup - is there any file "execd..." in /tmp? I suppose the
> `qhost` output shows similar information.
>
>
>> The master and calcuserver01 run RHEL 5.8, and calcuserver02 and
>> calcuserver03 run RHEL 6.3. iptables is stopped on every server, and
>> all servers are listed in /etc/hosts.allow.
>
> This is only necessary for applications using the TCP wrapper, and only if
> certain/all services are denied in /etc/hosts.deny by default.
>
> -- Reuti
>
>> Why can't the qmaster get the load_avg information of the new server?
>> What further information do you need?
>>
>> Regards
>> Jochen
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
