On 20.01.2014 at 12:14, Ian Johnson wrote:

> Reuti,
>
> PS Are the job exit status codes defined in errno.h or are they SGE defined?

They are in the user manual, pages 141 ff. 26 means error while opening the
output/error file.

-- Reuti

> Ian Johnson
> Software Engineer
>
>
> Capita Translation and Interpreting
> Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK): +44
> 845 367 7000 | Tel (US): +1 (800) 579-5010
> | [email protected] | Skype ID: ian.johnson_als
> www.capitatranslationinterpreting.com
>
>
> On 14 January 2014 18:34, Reuti <[email protected]> wrote:
> On 14.01.2014 at 18:27, Ian Johnson wrote:
>
> > Reuti,
> >
> > There's no file staging installed. The job script is being copied to the
> > execution host.
>
> Correct (for the job script itself).
>
>
> > The output file *is* being opened in ~smartmate but it is of zero length.
>
> I would assume that it is not created at all in this location, only on the
> nodes. Or do you mean the home directory on the nodes?
>
> NB: In Torque there is file staging for the .o/.e files, but not in SGE.
>
> -- Reuti
>
>
> > Thanks,
> >
> > Ian
> >
> > On Tue, 14 Jan 2014 17:18:06 -0000, Reuti <[email protected]>
> > wrote:
> >
> >> On 14.01.2014 at 18:04, Ian Johnson wrote:
> >>
> >>> Reuti,
> >>>
> >>> There is no output from the script at all in the
> >>> ~smartmate/job.sh.o[0-9]+ files. The home directory of the smartmate user
> >>> is local disk. However, grid engine is installed on an NFS share.
> >>
> >> Do you have any file staging installed? Otherwise the output will not be
> >> sent to the real home directory of the user. Also the input files could be
> >> missing on the execution host.
> >>
> >> -- Reuti
> >>
> >>
> >>
> >>> Is there other information you require? Is there any way to get the
> >>> function call that is failing in shepherd, e.g. more verbose tracing?
> >>>
> >>> Thanks,
> >>>
> >>> Ian
> >>>
> >>> On Tue, 14 Jan 2014 15:19:34 -0000, Reuti <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 14.01.2014 at 15:19, Ian Johnson wrote:
> >>>>
> >>>>> I have a simple job, which echoes `date` to stdout, that I'm using to
> >>>>> test an Open Grid Engine installation. When I run qsub as root, the job
> >>>>> runs successfully. However, as a non-superuser, in this case the
> >>>>> smartmate user, the output from qacct -j says that the job has exited
> >>>>> with exit status 11. The shepherd trace confirms this (see below).
> >>>>
> >>>> Do you have any output? 11 means "Resource temporarily unavailable",
> >>>> which could mean it can't write to the (mounted?) home directory of the
> >>>> user. How is the mount configured?
> >>>>
> >>>> AFAICS the user is known, as otherwise you would face a different error.
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> Would anyone have an idea as to what is going on? Thank you.
> >>>>>
> >>>>> <shepherd_trace>
> >>>>> 01/14/2014 14:08:56 [0:2723]: shepherd called with uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: starting up 2011.11
> >>>>> 01/14/2014 14:08:56 [0:2723]: setpgid(2723, 2723) returned 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: do_core_binding: "binding" parameter not
> >>>>> found in config file
> >>>>> 01/14/2014 14:08:56 [0:2723]: no prolog script to start
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: forked "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: child: starting son(job,
> >>>>> /opt/capitati/ge2011.11/smartmate/spool/exec-1/job_scripts/32, 0);
> >>>>> 01/14/2014 14:08:56 [0:2724]: pid=2724 pgrp=2724 sid=2724 old pgrp=2723
> >>>>> getlogin()=root
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: job-pid: 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: reading passwd information for user
> >>>>> 'smartmate'
> >>>>> 01/14/2014 14:08:56 [0:2724]: setosjobid: uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting limits
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CPU setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_FSIZE setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_DATA setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_STACK setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CORE setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_VMEM/RLIMIT_AS setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_RSS setting: (soft
> >>>>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> resulting: (soft 18446744073709551615(INFINITY), hard
> >>>>> 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting environment
> >>>>> 01/14/2014 14:08:56 [0:2724]: Initializing error file
> >>>>> 01/14/2014 14:08:56 [0:2724]: switching to intermediate/target user
> >>>>> 01/14/2014 14:08:56 [0:2723]: wait3 returned 2724 (status: 2816;
> >>>>> WIFSIGNALED:
> >>>>> 0, WIFEXITED: 1, WEXITSTATUS: 11)
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with exit status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: reaped "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited not due to signal
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: now sending signal KILL to pid -2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: writing usage file to "usage"
> >>>>> 01/14/2014 14:08:56 [0:2723]: no tasker to notify
> >>>>> 01/14/2014 14:08:56 [0:2723]: no epilog script to start
> >>>>> </shepherd_trace>
> >>>>>
> >>>>> <job_script>
> >>>>> #!/bin/bash
> >>>>> #
> >>>>> #$ -j y
> >>>>> #
> >>>>> #$ -S /bin/bash
> >>>>>
> >>>>> echo "Hello World"
> >>>>> echo `date`
> >>>>> </job_script>
> >>>>>
> >>>>> Ian Johnson
> >>>>> Software Engineer
> >>>>>
> >>>>>
> >>>>> Capita Translation and Interpreting
> >>>>> Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK):
> >>>>> +44 845 367 7000 | Tel (US): +1 (800) 579-5010
> >>>>> | [email protected] | Skype ID: ian.johnson_als
> >>>>> www.capitatranslationinterpreting.com
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> [email protected]
> >>>>> https://gridengine.org/mailman/listinfo/users
> >>>>
> >>>
> >>>
> >>> --
> >>> Kind regards,
> >>>
> >>> Ian Johnson
> >>> Software Engineer
> >>>
> >>> Capita Translation and Interpreting
> >>> Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK):
> >>> +44 845 367 7000 | Tel (US): +1 (800) 579-5010
> >>> | [email protected] | Skype ID: ian.johnson_als
> >>> www.capitatranslationinterpreting.com
> >>
> >
> >
> > --
> > Kind regards,
> >
> > Ian Johnson
> > Software Engineer
> >
> > Capita Translation and Interpreting
> > Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK): +44
> > 845 367 7000 | Tel (US): +1 (800) 579-5010
> > | [email protected] | Skype ID: ian.johnson_als
> > www.capitatranslationinterpreting.com
> >
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
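
A note on reading the trace: the shepherd's wait3 line prints a raw wait
status (2816) next to the fields derived from it. The following is only a
minimal Python sketch, assuming a POSIX system where the os.W* helpers are
available, of how that number splits into WIFEXITED, WIFSIGNALED and
WEXITSTATUS. The function name decode_wait_status is made up for illustration
and is not part of Grid Engine. The sketch only decodes the number; it says
nothing about what exit status 11 means to SGE, which is covered by the user
manual as noted in the thread.

```python
import os


def decode_wait_status(status: int) -> dict:
    """Split a raw wait status into the fields the shepherd trace prints.

    Hypothetical helper for illustration only; it wraps the standard
    POSIX macros exposed by Python's os module.
    """
    exited = os.WIFEXITED(status)
    signaled = os.WIFSIGNALED(status)
    return {
        "WIFEXITED": exited,
        "WIFSIGNALED": signaled,
        # The exit code is only meaningful if the process exited normally.
        "WEXITSTATUS": os.WEXITSTATUS(status) if exited else None,
        # The signal number is only meaningful if a signal ended the process.
        "WTERMSIG": os.WTERMSIG(status) if signaled else None,
    }


# 2816 is the raw status from the "wait3 returned 2724" trace line.
fields = decode_wait_status(2816)
print(fields)
```

On Linux the exit code sits in the second byte of the status, so 2816
(11 * 256) decodes to WEXITSTATUS 11 with WIFEXITED true and WIFSIGNALED
false, which matches the trace.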
