On Fri, Jun 15, 2012 at 10:22 AM, Reuti <[email protected]> wrote:

> On 15.06.2012 at 18:01, Michael Coffman wrote:
>
> > I am trying to update my sge_execd and sge_shepherd binaries. Based on
> > recent emails, I figured I could drop the GE2011.11 bits into place and
> > they would work fine. However, I am having issues:
> >
> > My grid environment is:
> >
> > Current Version - SGE - 6.2u5
> > SGE_CELL=ftcrnd
> > SGE_ROOT=/opt/grid-6.2u5
> > SGE_CLUSTER_NAME=ftcrnd
> >
> > Binary path is /opt/grid/bin/lx24-amd64.
> >
> > I had to make a link in /opt/grid/bin for linux-x64 to get things to
> > work.
>
> Yep, this "lx24-amd64" is compiled into the binaries.
>
>
> > I used the following commands. The update did go in live, the running
> > processes seemed happy, and everything appeared to be working fine:
> >
> > gbits=/opt/sa/tmp/gbits
> > service sgeexecd softstop
> > cd /opt/grid/bin
> > ln -s lx24-amd64 linux-x64
> > cd lx24-amd64
> > mv sge_shepherd  sge_shepherd.old
> > mv sge_execd  sge_execd.old
> > cp $gbits/sge_shepherd .
> > cp $gbits/sge_execd .
> > service sgeexecd start
> >
> > Since yesterday, though, I have had a couple of jobs fail and put their
> > queue into an error state.
> >
> > Mail from the failing job:
> > Shepherd error:
> > 06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission denied
> >
> > From the qmaster messages file:
> > 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission denied
> >
> > I checked the job_pid file of a currently running job on the system that
> > had the above errors. Permissions down the entire tree seem fine, and here
> > is the job_pid file:
> >
> > -rw-r--r-- 1 grid  grid       6 Jun 14 17:40 job_pid
>
> This file usually goes into the job's spool directory, where the sgeadmin
> user must have write access.
>
> Under which account is the actual sgeexecd running? I get:
>
> $ ps -e -o user,ruser,command | grep sge
>
> sgeadmin root     /usr/sge/bin/lx24-amd64/sge_execd
>
I get:

grid     root     /opt/grid-6.2u5/bin/lx24-amd64/sge_execd

Also:
$ ls -lR /opt/grid/ftcrnd/spool/cs249/
/opt/grid/ftcrnd/spool/cs249/:
total 44
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 active_jobs/
-rw-r--r-- 1 grid grid     5 Dec 27 00:32 execd.pid
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 job_scripts/
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 jobs/
-rw-r--r-- 1 grid grid 25981 Jun 13 13:44 messages
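
To double-check the "permissions down the entire tree" claim, here is a rough sketch that prints owner and mode for every directory from a given path up to /, and tests whether the current user can actually create a file in a directory. The helper names show_tree_perms and can_write_dir are made up for illustration and are not part of SGE. The paths and the user name grid in the usage comments are simply the ones from the output above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not SGE tools.

# Print owner/mode of a directory and each of its parents up to /,
# so a missing write or execute bit anywhere in the tree is visible.
show_tree_perms() {
    dir=$1
    while :; do
        ls -ld "$dir"
        # Stop at the root, or at "." if a relative path was given.
        if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
}

# Return 0 if the current user can create (and remove) a file in "$1".
can_write_dir() {
    probe="$1/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Example usage (assumed paths from this thread; run as the admin user):
#   show_tree_perms /opt/grid/ftcrnd/spool/cs249/active_jobs
#   can_write_dir /opt/grid/ftcrnd/spool/cs249/active_jobs && echo writable
```

Running this as the account that owns sge_execd (grid in my case, for example through sudo -u grid) rather than as root should give a more honest answer, since root can usually write regardless of the mode bits.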



> -- Reuti
>
>
> > Any clues? Is the path for this file perhaps hard-coded into
> > sge_shepherd?
> >
> > Thanks.
> > --
> > -MichaelC
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>


-- 
-MichaelC
