Hello,

A few jobs got stuck overnight, but there is nothing related to them in the 'messages' files, either on the exec host or on the master. For jobs that finish normally, the master logs two informational messages: 'removing trigger' and 'job xxx finished on host'.

After issuing a "qdel -f" on one of the stuck jobs, this kind of message starts to appear in the master 'messages' file:

    04/10/2018 09:00:20|worker|master|E|execd@exec-host reports running job (2444835.1/master) in queue "queue@exec-host" that was not supposed to be there - killing

This message repeats every 40 seconds (which is the load_report_time).

On the exec host, the job files are still there. They are not removed unless the 'sge_execd' process is restarted.

Regards,
Paul

> Sent: Thursday, April 05, 2018 at 1:58 PM
> From: "William Hay" <[email protected]>
> To: "Paul Paul" <[email protected]>
> Cc: [email protected]
> Subject: Re: [gridengine users] Job finishes correctly but master is not notified
>
> On Thu, Apr 05, 2018 at 03:38:18PM +0200, Paul Paul wrote:
> > William,
> >
> > Thanks for your reply.
> >
> > In the 'messages' file of the exec host, there is nothing (the last message
> > was 2 weeks ago).
>
> Might be worth increasing the loglevel to get more info about what is going
> on there.
>
> William
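
To check whether the "not supposed to be there" errors really follow the 40-second load_report_time, the master 'messages' file could be scanned with something like the sketch below. It is only an illustration: the field layout (timestamp|thread|host|level|text) and the month/day/year date order are assumptions inferred from the single line quoted above, and the function names are made up for this example.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one quoted log line:
#   MM/DD/YYYY HH:MM:SS|thread|host|level|text
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|[^|]*\|[^|]*\|E\|'
    r'.*reports running job \((?P<job>[^/)]+)/[^)]*\)'
    r'.*not supposed to be there'
)


def stale_job_reports(lines):
    """Return {job_id: [datetime, ...]} for 'not supposed to be there' errors."""
    reports = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            # Assumption: the date is written month/day/year.
            stamp = datetime.strptime(match.group('ts'), '%m/%d/%Y %H:%M:%S')
            reports.setdefault(match.group('job'), []).append(stamp)
    return reports


def intervals(timestamps):
    """Seconds between consecutive reports for one job."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(timestamps, timestamps[1:])]
```

Calling stale_job_reports() on the open 'messages' file and then intervals() on each job's timestamps should show gaps close to the load_report_time if the errors are driven by the execd load reports.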
