On Fri, Jun 15, 2012 at 11:11 AM, Rayson Ho <[email protected]> wrote:

> Can you set "execd_params" to KEEP_ACTIVE for this host?? (See the
> manpage at this URL:
> http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html )
>
> Request the job to run in this queue/host again, and see why the
> shepherd can't open the job_pid.
>
> (And remember to unset the execd_params or else you will fill up your
> local spool dir eventually with job information.)
>
>
I can't do this on my production grid, and I don't currently know how to
reproduce the problem. I will set up a test environment and try to
reproduce the issue there with KEEP_ACTIVE turned on.

Is it possible to set KEEP_ACTIVE per host? I only see this setting in the
output of qconf -sconf.
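
For the test setup, the "permissions down the entire tree" check described
in the quoted message below could be scripted so it is easy to rerun as soon
as a job fails. The following is only a minimal sketch: the function name
walk_permissions is made up for illustration, and it reports plain mode bits
and owners only (no ACLs, no SELinux context, nothing about which user
sge_shepherd runs as when it opens job_pid).

```python
import os
import pwd
import stat


def walk_permissions(path):
    """Return (path, mode string, owner) for each component from / down to path."""
    path = os.path.abspath(path)

    # Collect the path and all of its ancestors, deepest first.
    components = []
    current = path
    while True:
        components.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    rows = []
    for component in reversed(components):  # report from the root downwards
        info = os.stat(component)
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)  # uid without a passwd entry
        rows.append((component, stat.filemode(info.st_mode), owner))
    return rows


if __name__ == "__main__":
    import sys

    # Usage (the argument is a placeholder, not a real path from this thread):
    #   python walk_permissions.py <path to job_pid in the execd spool dir>
    for component, mode, owner in walk_permissions(sys.argv[1]):
        print(f"{mode} {owner:<10} {component}")
```

A directory in the output that lacks the execute (search) bit for the
relevant user would be one possible cause of "Permission denied" even when
job_pid itself is world-readable.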


> Rayson
>
>
>
> On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman
> <[email protected]> wrote:
> > On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho <[email protected]> wrote:
> >>
> >> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman
> >> <[email protected]> wrote:
> >> > From the qmaster messages file:
> >> > 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host
> >> > cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37
> >> > [20339:8436]: can't open file job_pid: Permission denied
> >> >
> >> > I checked the job_pid file of a currently running job on the system
> >> > that had the above errors. Permissions down the entire tree seem fine,
> >> > and here is the job_pid file:
> >> >
> >> > -rw-r--r-- 1 grid  grid       6 Jun 14 17:40 job_pid
> >>
> >> Is your execd spool dir on NFS or local??
> >>
> > Local.
> >
> >>
> >> Also, does it happen to all nodes or just a node or queue?
> >>
> >
> > Happened on 2 different nodes.   Not all jobs caused this.
> >
> >>
> >> Rayson
> >>
> >>
> >>
> >> >
> >> > Any clues? Is the path perhaps hard coded into sge_shepherd for this
> >> > file?
> >> >
> >> > Thanks.
> >> > --
> >> > -MichaelC
> >> >
> >> > _______________________________________________
> >> > users mailing list
> >> > [email protected]
> >> > https://gridengine.org/mailman/listinfo/users
> >> >
> >
> >
> >
> >
> > --
> > -MichaelC
>



-- 
-MichaelC
