Thanks. Excellent idea. I will give it a try when I get in to work on Monday.

Lane

On Fri, Jan 27, 2012 at 3:42 PM, Rayson Ho <[email protected]> wrote:
> On Fri, Jan 27, 2012 at 2:49 PM, Lane Schwartz <[email protected]> wrote:
>> I have encountered a problem where sometimes (but not always) my jobs
>> ignore the -cwd or -wd flags and run in my home directory instead of
>> the specified working directory. I can run the same job multiple times
>> launching from the same directory, and sometimes the job correctly
>> runs from the current directory, and sometimes it runs from my home
>> directory.
>
> I ran over 100 test jobs and all of them ran in the directory specified
> by -cwd or -wd. How easy is it to reproduce the issue? Is the home
> directory on NFS or some other kind of network or cluster storage?
>
> If Grid Engine cannot change to the directory specified by
> -cwd/-wd, it will simply put the job into the "Eqw" state.
>
> Since this is an intermittent issue, we will need to:
>
> 1) Run a few jobs that do the following:
>
> - check whether the current working directory is the correct one by
> comparing the output of `pwd` against the hard-coded value of the
> supposedly correct directory (so obviously you will need to decide the
> location before you submit the jobs, since the correct value is hard
> coded into the job script).
>
> - if the value is not correct, then email yourself to report the
> issue, and then sleep (I mean... the job, not you!)
>
> - and if the value is correct, then just exit with 0 and don't sleep
> (no point in wasting the job slots).
>
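The check job from step 1 could be sketched roughly as below. This is only an illustration: `EXPECTED_DIR`, `NOTIFY_ADDR`, `SLEEP_SECONDS` and the function names are hypothetical placeholders, and it assumes a working `mail` command on the execution hosts. The script only acts when `JOB_ID` is set, which Grid Engine does in the job environment.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd

# Hypothetical values: hard-code these before submitting the jobs.
EXPECTED_DIR="/path/to/current/dir"
NOTIFY_ADDR="you@example.com"
SLEEP_SECONDS=86400

# Succeeds (returns 0) only when the two directory strings are identical.
same_dir() {
    [ "$1" = "$2" ]
}

# Compare the actual working directory with the expected one.
# Match: return 0 immediately, freeing the job slot.
# Mismatch: send a notification mail, then sleep so the job stays
# around for inspection, and finally return 1.
check_workdir() {
    local expected="$1" actual
    actual="$(pwd)"
    if same_dir "$actual" "$expected"; then
        return 0
    fi
    echo "Job ${JOB_ID:-unknown} on $(hostname): pwd=$actual, expected=$expected" |
        mail -s "SGE job ran in the wrong directory" "$NOTIFY_ADDR"
    sleep "$SLEEP_SECONDS"
    return 1
}

# Only run the check when executing as a Grid Engine job.
if [ -n "${JOB_ID:-}" ]; then
    check_workdir "$EXPECTED_DIR"
    exit $?
fi
```

The comparison is a plain string match, so if the directory is reached through a symlink the hard-coded value has to be written exactly as `pwd` reports it.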
>
> 2) Assuming you have jobs that do not run in the "correct" directory, run:
>
> - qstat -j <job id>
>
> The "sge_o_workdir" field should show which directory SGE thinks
> the job is supposed to run in.
>
> - go into the
> $SGE_ROOT/$SGE_CELL/spool/<execution host>/active_jobs/<job id.1>
> directory
>
> Tar up the contents of the directory and send it to me, together with
> the qstat -j output.
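Collecting the material from step 2 could look something like the sketch below. The function name, the output file names and the optional output directory are hypothetical, and it assumes the spool directory lives under `$SGE_ROOT/$SGE_CELL/spool/<execution host>` as described above (a site with a differently configured spool location would need another path).

```shell
# Hypothetical helper: save `qstat -j` output and tar up the job's
# active_jobs spool directory.
#   $1 = job id, $2 = execution host, $3 = output directory (default: .)
collect_job_diagnostics() {
    local job_id="$1" exec_host="$2" out_dir="${3:-.}"
    local spool_dir="$SGE_ROOT/$SGE_CELL/spool/$exec_host/active_jobs"

    # Includes sge_o_workdir, the directory SGE thinks the job should use.
    qstat -j "$job_id" > "$out_dir/qstat-j-$job_id.txt" || return 1

    # Archive the <job id>.1 directory relative to active_jobs.
    tar -czf "$out_dir/active_jobs-$job_id.1.tar.gz" -C "$spool_dir" "$job_id.1"
}
```

For example, `collect_job_diagnostics 12345 node01` (both values made up) would leave `qstat-j-12345.txt` and `active_jobs-12345.1.tar.gz` in the current directory while the misplaced job is still sleeping.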
>
> Rayson
>
>
>
>
>>
>> Interestingly, though, it always outputs the stderr and stdout log
>> files into the correct folder (specified by -o and -e) which is in the
>> current directory.
>>
>> To help debug the problem, I made a sample script that simply calls
>> pwd. The script is below:
>>
>> #!/bin/bash
>>
>> # Tell SGE to use bash instead of the SGE default shell
>> #$ -S /bin/bash
>>
>> # Tell SGE to keep all current environment variables
>> #$ -V
>>
>> # Tell SGE to run job from current working directory
>> #$ -cwd
>>
>> # Tell SGE which queue to use
>> #$ -q all.q
>>
>> # Tell SGE the name of this job
>> #$ -N fr-en.mert
>>
>> # Tell SGE where to log this job
>> #$ -o log/mert/fr-en/
>> #$ -e log/mert/fr-en/
>>
>> # Tell SGE how much memory this job needs
>> #$ -l mem_free=8G
>>
>> echo "pwd=`pwd`"
>> echo "PWD=$PWD"
>>
>>
>>
>> I'm running OGS/GE2011.11. I also have another setup with SGE/ge6.2u6
>> - on that older setup this problem does not occur.
>>
>> Has anyone ever seen this type of problem? If I change the script to
>> use -wd /path/to/current/dir instead of -cwd I get exactly the same
>> inconsistent behavior. Likewise, it doesn't appear to matter whether I
>> pass the flag at the command line or within the script, as above.
>>
>> Are there any grid engine or scheduler log files that I could examine
>> which might be helpful in tracking down this behavior?
>>
>> Thanks,
>> Lane Schwartz
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users