On 15.06.2012 at 18:01, Michael Coffman wrote:

> I am trying to update my sge_execd and sge_shepherd binaries. Based on
> recent emails, I figured I could drop the GE2011.11 bits into place and they
> would work fine. However, I am having issues:
> 
> My grid environment is:
> 
> Current Version - SGE - 6.2u5
> SGE_CELL=ftcrnd
> SGE_ROOT=/opt/grid-6.2u5
> SGE_CLUSTER_NAME=ftcrnd
> 
> Binary path is /opt/grid/bin/lx24-amd64.
> 
> I had to make a link in /opt/grid/bin for linux-x64 to get things to work.

Yep, this "lx24-amd64" is compiled into the binaries.


> I used the following commands. The update did indeed go in live, the running
> processes seemed happy, and everything appeared to work fine:
> 
> gbits=/opt/sa/tmp/gbits
> service sgeexecd softstop
> cd /opt/grid/bin
> ln -s lx24-amd64 linux-x64
> cd lx24-amd64
> mv sge_shepherd  sge_shepherd.old
> mv sge_execd  sge_execd.old
> cp $gbits/sge_shepherd .
> cp $gbits/sge_execd .
> service sgeexecd start
> 
> Since yesterday, though, a couple of jobs have failed and put their queue
> into an error state.
> 
> Mail from the failing job:
> Shepherd error:
> 06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission denied
> 
> From the qmaster messages file:
> 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host 
> cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37 
> [20339:8436]: can't open file job_pid: Permission denied
> 
> I checked a job_pid file for a currently running job on the system that had
> the above errors. Permissions down the entire tree seem fine, and here is the
> job_pid file:
>
> -rw-r--r-- 1 grid  grid       6 Jun 14 17:40 job_pid

This file usually goes into the job's spool directory, to which the admin
user (sgeadmin in my setup) must have write access.

Under which account is the actual sge_execd running? I get:

$ ps -e -o user,ruser,command | grep sge

sgeadmin root     /usr/sge/bin/lx24-amd64/sge_execd
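
To compare the owner of the spool directory with that account, something like
the following could be used. It is only a sketch: owned_by is a made-up helper
name, it relies on GNU stat as found on Linux hosts, and the path in the
example comment assumes the default local spool layout. The real location is
the execd_spool_dir value shown by `qconf -sconf`.

```shell
# Hypothetical helper: report whether a directory is owned by the given user.
# Relies on GNU stat (-c %U prints the owner's user name).
owned_by() {
    dir=$1
    user=$2
    # Return 2 if the directory cannot be inspected at all.
    owner=$(stat -c %U "$dir" 2>/dev/null) || return 2
    if [ "$owner" = "$user" ]; then
        echo "ok: $dir is owned by $user"
    else
        echo "mismatch: $dir is owned by $owner, not $user"
        return 1
    fi
}

# Example call (path and user are assumptions, adjust to the local setup):
# owned_by "$SGE_ROOT/$SGE_CELL/spool/$(hostname -s)" sgeadmin
```

A mismatch here, or further down in the active_jobs subdirectory, would be the
first thing to look at for the "Permission denied" on job_pid.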

-- Reuti


> Any clues? Is the path perhaps hard coded into sge_shepherd for this file?
> 
> Thanks.
> -- 
> -MichaelC
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

