A few jobs were still stuck last night, but there is nothing related to them in
the 'messages' files, either on the exec host or on the master. For jobs that
finish normally, the master logs two informational messages: 'removing trigger'
and 'job xxx finished on host'.

After issuing "qdel -f" on one of the stuck jobs, messages like this start to
appear in the master's 'messages' file:
04/10/2018 09:00:20|worker|master|E|execd@exec-host reports running job 
(2444835.1/master) in queue "queue@exec-host" that was not supposed to be there 
- killing

This message repeats every 40 seconds (which is the load_report_time). On the
exec host, the job files are still present. They are not removed until the
'sge_execd' process is restarted.
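
In case it helps anyone checking for the same symptom, here is a rough Python
sketch that scans master 'messages' lines for the error quoted above and groups
the timestamps per job and host. It assumes the error sits on a single line in
the real file (it is only wrapped in this mail) and that the date is
MM/DD/YYYY. The function name and the second sample line are made up for
illustration; only the first sample line comes from my log.

```python
import re
from datetime import datetime

# Pattern for the master-side error quoted above, assumed to be one line
# in the real 'messages' file: date time|thread|host|level|text
PATTERN = re.compile(
    r'^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|[^|]*\|[^|]*\|E\|'
    r'execd@(?P<host>\S+) reports running job \((?P<job>\d+\.\d+)/[^)]*\) '
    r'in queue "(?P<queue>[^"]+)" that was not supposed to be there'
)


def find_orphan_reports(lines):
    """Group timestamps of 'not supposed to be there' errors by (job, host)."""
    reports = {}
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # unrelated log line
        # Assumption: the log date is month/day/year.
        ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S")
        reports.setdefault((m.group("job"), m.group("host")), []).append(ts)
    return reports


sample = [
    # Taken from the log excerpt above, joined back onto one line.
    '04/10/2018 09:00:20|worker|master|E|execd@exec-host reports running job '
    '(2444835.1/master) in queue "queue@exec-host" that was not supposed to '
    'be there - killing',
    # Hypothetical repeat one load_report_time (40 s) later.
    '04/10/2018 09:01:00|worker|master|E|execd@exec-host reports running job '
    '(2444835.1/master) in queue "queue@exec-host" that was not supposed to '
    'be there - killing',
]

reports = find_orphan_reports(sample)
for (job, host), stamps in reports.items():
    # Seconds between consecutive reports for the same job.
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    print(job, host, len(stamps), gaps)
```

To run it against a real file, pass an open file object instead of the sample
list. A job that keeps showing up at a steady interval is presumably one whose
files are still sitting on the exec host.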



> Sent: Thursday, April 05, 2018 at 1:58 PM
> From: "William Hay" <w....@ucl.ac.uk>
> To: "Paul Paul" <pot94...@clerk.com>
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Job finishes correctly but master is not 
> notified
> On Thu, Apr 05, 2018 at 03:38:18PM +0200, Paul Paul wrote:
> > William,
> > 
> > Thanks for your reply.
> > 
> > In the 'messages' file of the exec host, there is nothing (the last message 
> > was 2 weeks ago).
> Might be worth increasing the loglevel to get more info about what is going 
> on there. 
> William