Hi,

On 11.08.2014 at 16:16, Kerim Gueney wrote:

> Hello guys,
> 
> Absolute gridengine newbie here. I'm trying to fix a problem that we have 
> with our gridengine, while our main admin is on his vacation. We are using 
> Debian stable running gridengine-master (6.2u5-7.1).

The Debian-supplied packages are known to cause some issues. The usual 
suggestion is to compile the SoGE (Son of Grid Engine) source on your own or 
to use one of the SoGE packages.


> The problem is that gridengine-master won't start, at all. If I call it 
> manually it returns nothing. Setting the SGE_ND environment variable results 
> in the following output (blank error):
> 
> # /etc/init.d/gridengine-master start
> error:
> 
> #
> 
> 
> qstat shows
> 
> # qstat
> error: unable to read qmaster name: qmaster hostname in 
> "/var/lib/gridengine/default/common/act_qmaster" has zero length

During startup the "act_qmaster" file is first read and then possibly adjusted 
and filled with the name of the machine the startup was done on, mainly to 
support the case of a failover setup with two or more masters.


> adding the hosts name to the file manually doesn't help. It results in
> 
> # qstat
> error: commlib error: got select error (Connection refused)
> error: unable to send message to qmaster using port 6444 on host 
> "queuemaster": got send error

Is `queuemaster` also the name of the machine as returned by `hostname` and 
as listed in /etc/hosts?

Is "ignore_fqdn             true" set in /usr/sge/default/common/bootstrap 
(or wherever the file is located on your system)? Don't change it if it's not 
the case! I assume it is set, as "queuemaster" above has no domain attached.
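
To look up the setting without editing anything, a small helper like the 
following might do. The name `get_ignore_fqdn` is made up for illustration, 
and the default path is just the one mentioned above; on the Debian 
installation the file is probably in the same "common" directory as 
act_qmaster, but that is an assumption.

```shell
#!/bin/sh
# Sketch: print the value of ignore_fqdn from a bootstrap file (read-only).
# get_ignore_fqdn is a hypothetical helper, not part of gridengine.
# Usage: get_ignore_fqdn [path-to-bootstrap]
get_ignore_fqdn() {
    file=${1:-/usr/sge/default/common/bootstrap}
    # Entries look like "ignore_fqdn             true"; comment lines
    # start with "#" and therefore never match the first-field test.
    awk '$1 == "ignore_fqdn" { print $2; exit }' "$file"
}
```

For example, `get_ignore_fqdn /var/lib/gridengine/default/common/bootstrap` 
would print the value, or nothing if the entry is missing.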


> attached is an strace of output of
> 
> # strace /etc/init.d/gridengine-master start

I don't know what all these calls to Debian procedures are doing.

Hence let's try a different investigation:

1) Is the name resolution working properly? I.e., does `hostname` give the 
correct information, and is it the short name or the FQDN by default?

2) Is there any file in /tmp named after "qmasterd", with the PID attached, 
that contains any hint?
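
Both checks could be scripted roughly as below. This is only a sketch: the 
helper names `name_form` and `list_qmaster_tmp` are made up, and the glob 
`*qmaster*` is a guess at how such files are named, so widen it if nothing 
shows up.

```shell
#!/bin/sh
# Sketch for the two checks above; both helpers are hypothetical.

# 1) Classify a host name as short or fully qualified (contains a dot).
#    Usage: name_form "$(hostname)"
name_form() {
    case $1 in
        *.*) echo "fqdn" ;;
        *)   echo "short" ;;
    esac
}

# 2) List regular files in a directory whose name contains "qmaster".
#    Usage: list_qmaster_tmp [dir]    (dir defaults to /tmp)
list_qmaster_tmp() {
    dir=${1:-/tmp}
    for f in "$dir"/*qmaster*; do
        # If nothing matches, the pattern stays literal and is skipped here
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

Running `name_form "$(hostname)"` and `getent hosts "$(hostname)"` shows which 
form the machine reports and whether it resolves; `list_qmaster_tmp` prints 
any candidate files in /tmp worth reading.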

-- Reuti


> I'm grateful for every kind of help I can get. Thank you in advance.
> 
> Best regards,
> Kerim Gueney
> <strace_gridengine.txt>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

