Hello all, this is my first time posting to this mailing list.

About 1% or less of our qrsh grid jobs are failing in an unusual way.

We are running Open Grid Scheduler 2011.11 on CentOS 6.5.

The failing qrsh jobs return a non-zero exit status (1) to the submit host 
and display this message:

Your "qrsh" request could not be scheduled, try again later.

Note that we do include the "-now n" option on the command line.

Also, the qacct output shows the job as having completed successfully:

qsub_time    Thu Nov 13 14:17:47 2014
start_time   Thu Nov 13 14:21:13 2014
end_time     Thu Nov 13 14:25:15 2014
granted_pe   NONE
slots        1
failed       0
exit_status  0
ru_wallclock 242
ru_utime     226.439
ru_stime     5.383

Reviewing the working directory also suggests that the job completed 
properly.
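
To catch these cases as they happen, the exit status seen on the submit host could be compared with what accounting records for the same job. The following is only a rough sketch: the helper names qacct_status and compare_status and the job name dbg_job are made up for illustration, and it assumes the accounting record is already available when qacct is queried (it may lag slightly behind job completion).

```shell
#!/bin/sh
# Hypothetical helper: read `qacct -j` output on stdin and print the
# "failed" and "exit_status" fields on a single line.
# If several records are present, the last one wins.
qacct_status() {
    awk '$1 == "failed"      { f = $2 }
         $1 == "exit_status" { e = $2 }
         END { printf "failed=%s exit_status=%s\n", f, e }'
}

# Hypothetical helper: compare the exit status seen by the submit host
# with the status recorded in accounting.
#   $1 = exit status returned by qrsh
#   $2 = line produced by qacct_status
compare_status() {
    case "$2" in
        "failed=0 exit_status=$1") echo "consistent" ;;
        *) echo "mismatch: qrsh=$1 accounting: $2" ;;
    esac
}

# Example use on a submit host (the job name and command are assumptions):
#   qrsh -now n -N dbg_job ./my_command; rc=$?
#   compare_status "$rc" "$(qacct -j dbg_job | qacct_status)"
```

Giving each job a unique name with -N keeps the qacct lookup unambiguous. A "mismatch" line with qrsh=1 and exit_status=0 would correspond to the failure described here.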

I'm not sure how to take the next step in debugging this problem.  Any advice?

Brian Small
Northwest Logic
1100 NW Compton Drive, Ste. 100
Beaverton, OR  97006
Desk - 503-533-5800 x-320
Cell - 503-577-6869
Fax: 503-533-5900
E-mail - [email protected]
Web - www.nwlogic.com

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users