I'm getting messages like the following popping up in the Grid Engine messages file (SoGE 8.1.8):

02/16/2016 08:20:07|worker|nfs-2|E|invalid task number 0 for job 496841 in "ORT_ptickets" order
02/16/2016 08:20:07|worker|nfs-2|W|Skipping remaining 352 orders
02/16/2016 08:20:07|worker|nfs-2|E|reinitialization of "scheduler"
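To track which jobs are affected, a minimal Python sketch could pull the job IDs out of these errors. It assumes that every line of the messages file uses the pipe-separated layout seen in the excerpt above (timestamp|thread|host|level|text). The function names are hypothetical and not part of Grid Engine.

```python
import re

# Assumed layout of one messages line, inferred from the excerpt above:
#   MM/DD/YYYY HH:MM:SS|thread|host|level|text
# where level is a single letter such as E (error) or W (warning).
INVALID_TASK = re.compile(r'invalid task number (\d+) for job (\d+) in "(\w+)" order')


def parse_line(line):
    """Split one messages line into its five fields, or return None."""
    # maxsplit=4 keeps any '|' inside the message text intact
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, thread, host, level, text = parts
    return {"timestamp": timestamp, "thread": thread, "host": host,
            "level": level, "text": text}


def jobs_with_invalid_task(lines):
    """Return job IDs (first-seen order) named in 'invalid task number' errors."""
    seen = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["level"] != "E":
            continue
        m = INVALID_TASK.search(rec["text"])
        if m and m.group(2) not in seen:
            seen.append(m.group(2))
    return seen


if __name__ == "__main__":
    sample = [
        '02/16/2016 08:20:07|worker|nfs-2|E|invalid task number 0 for job 496841 in "ORT_ptickets" order',
        '02/16/2016 08:20:07|worker|nfs-2|W|Skipping remaining 352 orders',
        '02/16/2016 08:20:07|worker|nfs-2|E|reinitialization of "scheduler"',
    ]
    print(jobs_with_invalid_task(sample))  # prints ['496841']
```

Against the real file, the same function can be fed the open file object directly, for example jobs_with_invalid_task(open(path)), where path is wherever the messages file lives on your install.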
They persist across scheduling cycles but can generally be cleared by putting a qhold on the job in question and then releasing it with qrls. They seem[1] to block scheduling of subsequent jobs, and quite often, once I've cleared one job, another job submitted by the same user at about the same time starts showing up in the logs with the same messages.

Has anyone seen something similar, or does anyone know what causes it?

[1] 'Seem' as in: this is happening on our production cluster, and my priority has been to clear the blockage rather than investigate in detail.
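The qhold/qrls workaround can be scripted. The following is a minimal sketch, assuming qhold and qrls are on PATH and the caller is allowed to hold the job in question. The helper names hold_release_commands and clear_job are hypothetical. The sketch only automates the workaround; it does nothing about whatever triggers the invalid task number error.

```python
import shlex
import subprocess


def hold_release_commands(job_id):
    """Build the qhold-then-qrls command pair for one job ID."""
    job = str(job_id)
    if not job.isdecimal():
        raise ValueError(f"expected a numeric job ID, got {job_id!r}")
    return [["qhold", job], ["qrls", job]]


def clear_job(job_id, dry_run=True):
    """Apply qhold followed by qrls to one job.

    With dry_run=True (the default) the commands are only printed, so the
    sketch can be tried without touching the cluster. With dry_run=False each
    command runs in turn; check=True makes a failing qhold raise
    CalledProcessError before qrls is attempted.
    """
    for cmd in hold_release_commands(job_id):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    clear_job(496841)
    # prints:
    # qhold 496841
    # qrls 496841
```

Defaulting to a dry run means the commands can be checked by eye before anything touches a production queue. Passing dry_run=False actually issues them.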
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users