oops meant to send to the list...

---------- Forwarded message ----------
From: William Hay <[email protected]>
Date: 20 June 2013 14:37
Subject: Re: [gridengine users] qmaster crashes
To: jan roels <[email protected]>





On 20 June 2013 13:47, jan roels <[email protected]> wrote:

>  Hi,
>
>  A while ago my server crashed due to a power outage, and now the qmaster
> won't keep running. This is the output from the
> /var/spool/gridengine/qmaster/messages file:
>
> 06/20/2013 14:41:19|  main|server1|I|starting up GE 6.2u5 (lx26-amd64)
> 06/20/2013 14:41:24|worker|server1|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
> 06/20/2013 14:41:24|worker|server1|E|writing job finish information: can't locate queue "(null)@(null)"
> 06/20/2013 14:41:24|worker|server1|W|job 585.27 failed on host <unknown host> before writing exit_status because: shepherd exited w$
> 06/20/2013 14:41:24|worker|server1|C|!!!!!!!!!! got NULL element for QU_rerun !!!!!!!!!!
>
>  Restarting the service doesn't help. Does somebody know a good way to fix
> this?
>
I have never seen this exact message, as far as I can recall, but if it
complains about the same job every time you start the qmaster, I would stop
the qmaster and then remove that job from the job spool by hand (at least
with classic spooling, that's under $SGE_ROOT/$SGE_CELL/spool/jobs).
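
To check whether it really is the same job each time, something like the
following could tally the job.task IDs mentioned in warning, error and
critical lines of the messages file. This is only a rough sketch: the
function name jobs_in_messages is made up for illustration, and the pattern
assumes the messages file keeps the "|W|job 585.27 ..." layout shown in the
quoted output, with one entry per line.

```python
import re
from collections import Counter

# Assumed line layout, taken from the quoted messages output:
#   date time|thread|host|LEVEL|job <id>.<task> ...
# Only W (warning), E (error) and C (critical) entries are considered.
JOB_RE = re.compile(r"\|[WEC]\|job (\d+)\.(\d+) ")


def jobs_in_messages(lines):
    """Count how often each (job, task) pair appears in W/E/C entries."""
    counts = Counter()
    for line in lines:
        match = JOB_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

Passing it the open messages file (for example
open("/var/spool/gridengine/qmaster/messages")) after a few restart attempts
should show whether one job dominates. If it does, that is the job to look
for in the spool once the qmaster is stopped.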

William


>  Kind regards,
>
>  Jan
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
