Oops, meant to send to the list...

---------- Forwarded message ----------
From: William Hay <[email protected]>
Date: 20 June 2013 14:37
Subject: Re: [gridengine users] qmaster crashes
To: jan roels <[email protected]>
On 20 June 2013 13:47, jan roels <[email protected]> wrote:
> Hi,
>
> A while ago my server crashed due to a power outage, and now the qmaster
> won't keep running. This is the output from the
> /var/spool/gridengine/qmaster/messages file:
>
> 06/20/2013 14:41:19| main|server1|I|starting up GE 6.2u5 (lx26-amd64)
> 06/20/2013 14:41:24|worker|server1|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
> 06/20/2013 14:41:24|worker|server1|E|writing job finish information: can't locate queue "(null)@(null)"
> 06/20/2013 14:41:24|worker|server1|W|job 585.27 failed on host <unknown host> before writing exit_status because: shepherd exited w$
> 06/20/2013 14:41:24|worker|server1|C|!!!!!!!!!! got NULL element for QU_rerun !!!!!!!!!!
>
> Restarting the service won't help. Does somebody know a good way to fix
> this?
>

I've never seen this exact message, AFAICR, but if it complains about the
same job every time you start the qmaster, I would stop the qmaster and then
remove the job from the job spool by hand (at least with classic spooling,
that's under $SGE_ROOT/$SGE_CELL/spool/jobs).

William

> Kind regards,
>
> Jan
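Before deleting anything from the spool, it may help to confirm that the qmaster really does trip over the same job on every start. Below is a minimal Python sketch of that check, assuming the record layout seen in the messages excerpt (`date time|thread|host|level|message`) and the "job N failed on host" wording of the warning. It reports job IDs that appear in such a warning after every "starting up" record. The function `jobs_failing_every_startup` and its `last` parameter are hypothetical and not part of Grid Engine. The sketch only identifies candidates; it does not touch the spool.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical helper, not part of Grid Engine. Matches warnings such as
#   job 585.27 failed on host <unknown host> before writing exit_status ...
# Group 1 is the job ID; the optional ".27" suffix is the array task ID.
FAILED_JOB = re.compile(r"\bjob (\d+)(?:\.\d+)? failed on host\b")


def jobs_failing_every_startup(lines: Iterable[str], last: Optional[int] = None) -> Set[int]:
    """Return the job IDs reported as failed after every qmaster startup.

    Each record is assumed to look like
        MM/DD/YYYY HH:MM:SS|thread|host|level|message
    as in the messages excerpt. If `last` is a positive int, only the
    `last` most recent startups are considered.
    """
    sessions = []  # one set of failed job IDs per "starting up" record
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a log record in the assumed format
        message = fields[4]
        if message.startswith("starting up"):
            sessions.append(set())
        elif sessions:
            match = FAILED_JOB.search(message)
            if match:
                sessions[-1].add(int(match.group(1)))
    if last is not None:
        sessions = sessions[-last:]
    if not sessions:
        return set()
    # A job only qualifies if it failed after every startup considered.
    return set.intersection(*sessions)
```

For example, passing an open handle on /var/spool/gridengine/qmaster/messages with `last=3` restricts the check to the three most recent startups, so entries from before the power outage do not hide the culprit. Removing the job itself stays a manual step, done with the qmaster stopped.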
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
