Hi,

On 14.11.2014 at 00:34, Brian Small wrote:

> Hello all, this is my first time posting to this mailing list.
>  
> About 1% or less of our qrsh grid jobs are failing in an unusual way.
>  
> We are running Open Grid Scheduler 2011.11 on CentOS 6.5.
>  
> The small percentage of failing qrsh jobs get a non-zero exit status back to 
> the submit host (exit status 1), and display this message:

What do you start with `qrsh`: a binary or a script?

This sounds as if the job you start (probably a script) is itself trying to 
start another `qrsh`. If it is a script, changing its first line to 
"#!/bin/sh -x" will make the shell list each command as it is executed.
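
To illustrate the tracing idea, here is a minimal sketch. The file names 
(traced_job.sh, job.out, job.trace) and the two `echo` steps are made up for 
the example and stand in for whatever your real job script does. It runs 
locally, without `qrsh`, just to show what the trace looks like:

```shell
# Write a hypothetical job script whose first line enables tracing.
cat > traced_job.sh <<'EOF'
#!/bin/sh -x
# With -x, each command is printed to stderr (prefixed with "+") before it runs.
echo "job step 1"
# A nested `qrsh` call at this point would show up in the trace the same way.
echo "job step 2"
EOF
chmod +x traced_job.sh

# Run it, keeping the trace (stderr) separate from the normal output (stdout).
./traced_job.sh > job.out 2> job.trace

# Inspect the trace: one "+ ..." line per executed command.
cat job.trace
```

If a second `qrsh` is being launched from inside the script, it should appear 
as one of the "+" lines in the trace.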

-- Reuti

NB: A side effect of "-now n" is that the job goes to a queue whose "qtype" 
is set to "BATCH", whereas "-now y" routes it to a queue whose "qtype" is 
"INTERACTIVE" (the same applies when this option is used with `qsub`).


> Your "qrsh" request could not be scheduled, try again later.
>  
> Note, we do include the “-now n” option on the command line.
>  
> Also the qacct log shows the job as having completed successfully:
>  
> qsub_time    Thu Nov 13 14:17:47 2014
> start_time   Thu Nov 13 14:21:13 2014
> end_time     Thu Nov 13 14:25:15 2014
> granted_pe   NONE
> slots        1
> failed       0
> exit_status  0
> ru_wallclock 242
> ru_utime     226.439
> ru_stime     5.383
>  
> And reviewing the working directory, it does look like the job completed 
> properly.
>  
> I’m not sure how to take the next step in debugging this problem.  Any advice?
>  
> Brian Small
> Northwest Logic
> 1100 NW Compton Drive, Ste. 100
> Beaverton, OR  97006
> Desk - 503-533-5800 x-320
> Cell - 503-577-6869
> Fax: 503-533-5900
> E-mail - [email protected]
> Web - www.nwlogic.com
>  
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

