Hi Jan,

Do you have a snapshot of your configuration area? Do you, by chance, have a test cell/cluster set up in the same install area? If you do, fire up a master for the test cluster and add an exec host if one is not already there, then see if that works. If it does, you can at least start looking in detail at the configuration and spooling area of your production cell.

You may have a corrupted configuration and/or spooling area for your production cell/cluster, which can leave the scheduler with crippled knowledge of current and/or past jobs... and possibly of the current config.
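If no snapshot exists yet, it is worth taking one before changing anything in the spool area. Below is a minimal sketch in Python that archives a directory into a timestamped tarball. The function name `snapshot_dir` and the destination directory in the usage comment are hypothetical. The spool path is taken from the location of the messages file in the original post, and your configuration area may live somewhere else. It is probably safest to run this while the qmaster is stopped, so the files do not change during the copy.

```python
import tarfile
import time
from pathlib import Path


def snapshot_dir(src, dest_dir):
    """Archive the directory `src` into a timestamped .tar.gz under `dest_dir`.

    Returns the path of the archive that was written.
    """
    src = Path(src)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store entries relative to the directory name, not the absolute path,
        # so the archive can be unpacked anywhere for inspection.
        tar.add(src, arcname=src.name)
    return archive


# Example call (both paths are assumptions; adjust them to your install):
# snapshot_dir("/var/spool/gridengine/qmaster", "/root/sge-snapshots")
```

Unpacking the archive into a scratch directory lets you inspect or compare the spooled files without touching the production cell.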
Hope this helps...

Ed

-----Original Message-----
From: jan roels [mailto:[email protected]]
Sent: Thursday, June 20, 2013 08:47 AM
To: [email protected]
Subject: [gridengine users] qmaster crashes

Hi,

A while ago my server crashed due to a power outage, and now the qmaster won't keep running. This is the output from the /var/spool/gridengine/qmaster/messages file:

06/20/2013 14:41:19| main|server1|I|starting up GE 6.2u5 (lx26-amd64)
06/20/2013 14:41:24|worker|server1|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
06/20/2013 14:41:24|worker|server1|E|writing job finish information: can't locate queue "(null)@(null)"
06/20/2013 14:41:24|worker|server1|W|job 585.27 failed on host <unknown host> before writing exit_status because: shepherd exited w$
06/20/2013 14:41:24|worker|server1|C|!!!!!!!!!! got NULL element for QU_rerun !!!!!!!!!!

Restarting the service won't help. Does somebody know a good way to fix this?

Kind regards,
Jan
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
