On Fri, Jun 15, 2012 at 10:22 AM, Reuti <[email protected]> wrote:
> On 15.06.2012 at 18:01, Michael Coffman wrote:
>
> > I am trying to update my sge_execd and sge_shepherd binaries. Based on
> > recent emails, I figured I could drop the GE2011.11 bits into place and
> > they would work fine. I am however having issues:
> >
> > My grid environment is:
> >
> > Current Version - SGE - 6.2u5
> > SGE_CELL=ftcrnd
> > SGE_ROOT=/opt/grid-6.2u5
> > SGE_CLUSTER_NAME=ftcrnd
> >
> > Binary path is /opt/grid/bin/lx24-amd64.
> >
> > I had to make a link in /opt/grid/bin for linux-x64 to get things to
> > work.
>
> Yep, this "lx24-amd64" is compiled into the binaries.
>
> > I used the following commands and it did indeed update live and the
> > running processes seemed happy and all seemed to be working fine:
> >
> > gbits=/opt/sa/tmp/gbits
> > service sgeexecd softstop
> > cd /opt/grid/bin
> > ln -s lx24-amd64 linux-x64
> > cd lx24-amd64
> > mv sge_shepherd sge_shepherd.old
> > mv sge_execd sge_execd.old
> > cp $gbits/sge_shepherd .
> > cp $gbits/sge_execd .
> > service sgeexecd start
> >
> > Since yesterday though I have had a couple of jobs fail and put their
> > queue into an error state.
> >
> > Mail from the failing job:
> > Shepherd error:
> > 06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission denied
> >
> > From the qmaster messages file:
> > 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission denied
> >
> > I checked a job_pid file of a currently running job on the system that
> > had the above errors. Permissions down the entire tree seem fine, and
> > here is the job_pid file:
> >
> > -rw-r--r-- 1 grid grid 6 Jun 14 17:40 job_pid
>
> This usually goes to the spool directory of the jobs, where the sgeadmin
> must have write access.
>
> Under which account is the actual sgeexecd running? I get:
>
> $ ps -e -o user,ruser,command | grep sge
> sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd

I get:

grid root /opt/grid-6.2u5/bin/lx24-amd64/sge_execd

Also:

$ ls -lR /opt/grid/ftcrnd/spool/cs249/
/opt/grid/ftcrnd/spool/cs249/:
total 44
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 active_jobs/
-rw-r--r-- 1 grid grid     5 Dec 27 00:32 execd.pid
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 job_scripts/
drwxr-xr-x 2 grid grid  4096 Jun 14 20:57 jobs/
-rw-r--r-- 1 grid grid 25981 Jun 13 13:44 messages

> -- Reuti
>
> > Any clues? Is the path perhaps hard coded into sge_shepherd for this
> > file?
> >
> > Thanks.
> >
> > --
> > -MichaelC

--
-MichaelC
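
The write-access question raised above (can the account that sge_execd runs as create a file such as job_pid in the spool tree?) can be tested directly rather than read off `ls` output. The following is only a rough sketch: `check_job_dir` is a hypothetical helper name, not part of Grid Engine, and it assumes it is run as the account in question (grid on this cluster) against a directory of your choosing.

```shell
#!/bin/sh
# Sketch only: check_job_dir is a hypothetical helper, not a Grid Engine tool.
# It reports whether the *current* user can create a file in the given
# directory, so run it as the account sge_execd uses.
check_job_dir() {
    dir=$1

    # A missing directory is reported separately from a permission problem.
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi

    # Try to create a throwaway probe file. The subshell keeps a failed
    # redirection from aborting the calling shell.
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi

    echo "not writable: $dir"
    return 1
}
```

For example, after sourcing the function as the grid user, `check_job_dir /opt/grid/ftcrnd/spool/cs249/active_jobs` would exercise the directory shown in the listing above. The host that actually failed was cs428, so its spool directory is the one worth probing.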
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
