Thanks. Excellent idea. I will give it a try when I get in to work on Monday.
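
A minimal sketch of the test job from step 1 of Rayson's suggestion (quoted below), not a tested production script. The names EXPECTED_DIR, NOTIFY_ADDR, SLEEP_SECONDS, same_dir and report are made up for illustration, and the path and address are placeholders. It also assumes a working `mail` command on the execution hosts. The check only runs when JOB_ID is set, which Grid Engine does inside a job, so the helper functions can be tried outside the grid without sleeping.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd

# ASSUMPTION: placeholder. Hard-code the directory you will submit from.
EXPECTED_DIR="/path/to/current/dir"
# ASSUMPTION: placeholder address for the notification.
NOTIFY_ADDR="you@example.com"
# ASSUMPTION: how long to hold the slot so the spool directory can be inspected.
SLEEP_SECONDS=86400

# Return 0 if the two directory strings are identical, 1 otherwise.
same_dir() {
    [ "$1" = "$2" ]
}

# Print a one-line report: job id, host, expected and actual directory.
report() {
    printf 'job=%s host=%s expected=%s actual=%s\n' \
        "${JOB_ID:-unknown}" "${HOSTNAME:-unknown}" "$1" "$2"
}

# Only run the check inside a Grid Engine job (JOB_ID is set there).
if [ -n "${JOB_ID:-}" ]; then
    actual="$(pwd)"
    if same_dir "$EXPECTED_DIR" "$actual"; then
        # Correct directory: exit right away and free the slot.
        exit 0
    fi
    # Wrong directory: send a notification, then keep the job alive.
    report "$EXPECTED_DIR" "$actual" \
        | mail -s "SGE job $JOB_ID ran in wrong directory" "$NOTIFY_ADDR"
    sleep "$SLEEP_SECONDS"
fi
```

Submitted from the hard-coded directory (for example `qsub check_cwd.sh`, where the file name is again hypothetical), a job that lands in the wrong place stays alive so that its active_jobs directory can still be collected. The comparison is a plain string match, so if the directory is reached through a symlink or an automounter, `pwd` may print a different spelling of the same location.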
Lane

On Fri, Jan 27, 2012 at 3:42 PM, Rayson Ho <[email protected]> wrote:
> On Fri, Jan 27, 2012 at 2:49 PM, Lane Schwartz <[email protected]> wrote:
>> I have encountered a problem where sometimes (but not always) my jobs
>> ignore the -cwd or -wd flags and run in my home directory instead of
>> the specified working directory. I can run the same job multiple times
>> launching from the same directory, and sometimes the job correctly
>> runs from the current directory, and sometimes it runs from my home
>> directory.
>
> I ran over 100 test jobs and all of them ran in the directory specified
> by -cwd or -wd. How easy is it to reproduce the issue? Is the home
> directory on NFS or some kind of network or cluster storage?
>
> If Grid Engine cannot change the directory to the one specified by
> -cwd/-wd, then it will simply turn the job into the "Eqw" state.
>
> Since this is a random issue, we will need:
>
> 1) Run a few jobs that do the following:
>
> - check if the current working directory is the correct one by calling
> `pwd` & comparing it against the hard-coded value of the supposedly
> correct directory (so obviously you will need to decide the location
> before you submit the jobs, since the correct value is hard-coded into
> the job script).
>
> - if the value is not correct, then email you to notify you of the
> issue, and then sleep (I mean... the job, not you!)
>
> - and if the value is correct, then just exit with 0 and don't sleep
> (no point in wasting the job slots).
>
>
> 2) Assuming you have jobs that do not run in the "correct" directory, run:
>
> - qstat -j <job id>
>
> the "sge_o_workdir" field should show you which directory SGE thinks
> the job is supposed to run in.
>
> - go into the
> $SGE_ROOT/$SGE_CELL/spool/<execution host>/active_jobs/<job id.1>
> directory
>
> Tar up the content of the directory and send it to me, together with
> the qstat -j output.
>
> Rayson
>
>
>>
>> Interestingly, though, it always outputs the stderr and stdout log
>> files into the correct folder (specified by -o and -e) which is in the
>> current directory.
>>
>> To help debug the problem, I made a sample script that simply calls
>> pwd. The script is below:
>>
>> #!/bin/bash
>>
>> # Tell SGE to use bash instead of the SGE default shell
>> #$ -S /bin/bash
>>
>> # Tell SGE to keep all current environment variables
>> #$ -V
>>
>> # Tell SGE to run job from current working directory
>> #$ -cwd
>>
>> # Tell SGE which queue to use
>> #$ -q all.q
>>
>> # Tell SGE the name of this job
>> #$ -N fr-en.mert
>>
>> # Tell SGE where to log this job
>> #$ -o log/mert/fr-en/
>> #$ -e log/mert/fr-en/
>>
>> # Tell SGE how much memory this job needs
>> #$ -l mem_free=8G
>>
>> echo "pwd=`pwd`"
>> echo "PWD=$PWD"
>>
>>
>>
>> I'm running OGS/GE2011.11. I also have another setup with SGE/ge6.2u6
>> - on that older setup this problem does not occur.
>>
>> Has anyone ever seen this type of problem? If I change the script to
>> use -wd /path/to/current/dir instead of -cwd I get exactly the same
>> inconsistent behavior. Likewise, it doesn't appear to matter whether I
>> pass the flag at the command line or within the script, as above.
>>
>> Are there any grid engine or scheduler log files that I could examine
>> which might be helpful in tracking down this behavior?
>>
>> Thanks,
>> Lane Schwartz
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
