Reuti,

PS: Are the job exit status codes defined in errno.h, or are they SGE-defined?

Ian Johnson
Software Engineer

Capita Translation and Interpreting
Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK): +44 845 367 7000 | Tel (US): +1 (800) 579-5010
| [email protected] | Skype ID: ian.johnson_als
www.capitatranslationinterpreting.com

On 14 January 2014 18:34, Reuti <[email protected]> wrote:

> On 14.01.2014 at 18:27, Ian Johnson wrote:
>
> > Reuti,
> >
> > There's no file staging installed. The job script is being copied to the execution host.
>
> Correct (for the job script itself).
>
> > The output file *is* being opened in ~smartmate but it is of zero length.
>
> I would assume that it is not created at all in this location, only on the nodes. Or do you mean the home directory on the nodes?
>
> NB: In Torque there is file staging for the .o/.e files, but not in SGE.
>
> -- Reuti
>
> > Thanks,
> >
> > Ian
> >
> > On Tue, 14 Jan 2014 17:18:06 -0000, Reuti <[email protected]> wrote:
> >
> >> On 14.01.2014 at 18:04, Ian Johnson wrote:
> >>
> >>> Reuti,
> >>>
> >>> There is no output from the script at all in the ~smartmate/job.sh.o[0-9]+ files. The home directory of the smartmate user is local disk. However, grid engine is installed on an NFS share.
> >>
> >> Do you have any file staging installed? Otherwise the output will not be sent to the real home directory of the user. Also, the input files could be missing on the execution host.
> >>
> >> -- Reuti
> >>
> >>> Is there other information you require? Is there any way to get the function call that is failing in shepherd, e.g. more verbose tracing?
> >>>
> >>> Thanks,
> >>>
> >>> Ian
> >>>
> >>> On Tue, 14 Jan 2014 15:19:34 -0000, Reuti <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 14.01.2014 at 15:19, Ian Johnson wrote:
> >>>>
> >>>>> I have a simple job, which echoes `date` to stdout, that I'm using to test an Open Grid Engine installation. Running qsub as root, the job is run successfully. However, using a non-superuser, in this case the smartmate user, the output from qacct -j says that the job has exited with exit status 11. The shepherd trace confirms this (see below).
> >>>>
> >>>> Do you have any output? 11 means "Resource temporarily unavailable", which could mean it can't write to the (mounted?) home directory of the user. How is the mount configured?
> >>>>
> >>>> AFAICS the user is known, as otherwise you would face a different error.
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>> Would anyone have an idea as to what is going on? Thank you.
> >>>>>
> >>>>> <shepherd_trace>
> >>>>> 01/14/2014 14:08:56 [0:2723]: shepherd called with uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: starting up 2011.11
> >>>>> 01/14/2014 14:08:56 [0:2723]: setpgid(2723, 2723) returned 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: do_core_binding: "binding" parameter not found in config file
> >>>>> 01/14/2014 14:08:56 [0:2723]: no prolog script to start
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: forked "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: child: starting son(job, /opt/capitati/ge2011.11/smartmate/spool/exec-1/job_scripts/32, 0);
> >>>>> 01/14/2014 14:08:56 [0:2724]: pid=2724 pgrp=2724 sid=2724 old pgrp=2723 getlogin()=root
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: job-pid: 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: reading passwd information for user 'smartmate'
> >>>>> 01/14/2014 14:08:56 [0:2724]: setosjobid: uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting limits
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CPU setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_FSIZE setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_DATA setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_STACK setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CORE setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_RSS setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting environment
> >>>>> 01/14/2014 14:08:56 [0:2724]: Initializing error file
> >>>>> 01/14/2014 14:08:56 [0:2724]: switching to intermediate/target user
> >>>>> 01/14/2014 14:08:56 [0:2723]: wait3 returned 2724 (status: 2816; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 11)
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with exit status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: reaped "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited not due to signal
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: now sending signal KILL to pid -2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: writing usage file to "usage"
> >>>>> 01/14/2014 14:08:56 [0:2723]: no tasker to notify
> >>>>> 01/14/2014 14:08:56 [0:2723]: no epilog script to start
> >>>>> </shepherd_trace>
> >>>>>
> >>>>> <job_script>
> >>>>> #!/bin/bash
> >>>>> #
> >>>>> #$ -j y
> >>>>> #
> >>>>> #$ -S /bin/bash
> >>>>>
> >>>>> echo "Hello World"
> >>>>> echo `date`
> >>>>> </job_script>

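
For reference, the wait3 line in the quoted trace (status: 2816, WEXITSTATUS: 11) can be checked by hand. The sketch below assumes the traditional wait-status layout used on Linux, with the terminating signal in the low 7 bits and the exit code in bits 8 to 15. decode_wait_status is a hypothetical helper name, not something shipped with Grid Engine. It only confirms that 2816 is a normal exit with code 11; it says nothing about whether that 11 is an errno value or an SGE-defined code.

```shell
#!/bin/bash
# Decode a raw wait status such as the one printed in the shepherd trace.
# Assumed layout (traditional Unix encoding, as used on Linux):
#   bits 0-6  : terminating signal number (0 if the process exited normally)
#   bits 8-15 : exit code passed to exit()
# Stopped/continued statuses and the core-dump flag are not handled here.
# decode_wait_status is a hypothetical helper, not part of Grid Engine.
decode_wait_status() {
    local status=$1
    local signal=$(( status & 0x7f ))
    local exit_code=$(( (status >> 8) & 0xff ))
    if [ "$signal" -eq 0 ]; then
        echo "exited with status $exit_code"
    else
        echo "killed by signal $signal"
    fi
}

# Value taken from the trace: "wait3 returned 2724 (status: 2816; ...)"
decode_wait_status 2816
```

Run against the value from the trace, this prints "exited with status 11", consistent with the WIFEXITED: 1 and WIFSIGNALED: 0 fields that the shepherd reports.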
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
