On 20.01.2014 at 12:14, Ian Johnson wrote:

> Reuti,
> 
> PS: Are the job exit status codes defined in errno.h, or are they SGE-defined?

They are SGE-defined; see the user manual, pages 141 ff. Exit status 26 means
an error occurred while opening the output/error file.
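To check which code a finished job actually reported, you can filter the `failed` and `exit_status` lines out of the `qacct -j` output. This is a minimal sketch. It assumes the usual `qacct -j` layout, where the field name comes first and the value follows. The helper name `qacct_status` is hypothetical.

```bash
#!/bin/bash
# qacct_status (hypothetical helper): keep only the "failed" and
# "exit_status" lines of `qacct -j` accounting output read from stdin.
qacct_status() {
    awk '$1 == "failed" || $1 == "exit_status"'
}

# Example use on a finished job (32 is the job ID that appears in the
# shepherd trace further down this thread):
#   qacct -j 32 | qacct_status
```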

-- Reuti

> Ian Johnson
> Software Engineer
> 
> 
> Capita Translation and Interpreting
> Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK): +44 
> 845 367 7000 | Tel (US): +1 (800) 579-5010
> | [email protected] | Skype ID: ian.johnson_als
> www.capitatranslationinterpreting.com
> 
> 
> On 14 January 2014 18:34, Reuti <[email protected]> wrote:
> On 14.01.2014 at 18:27, Ian Johnson wrote:
> 
> > Reuti,
> >
> > There's no file staging installed. The job script is being copied to the 
> > execution host.
> 
> Correct (for the job script itself).
> 
> 
> > The output file *is* being opened in ~smartmate but it is of zero length.
> 
> I would assume that they are not created at all in this location, only on
> the nodes. Or do you mean the home directory on the nodes?
> 
> NB: Torque has file staging for the .o/.e files, but SGE does not.
> 
> -- Reuti
> 
> 
> > Thanks,
> >
> > Ian
> >
> > On Tue, 14 Jan 2014 17:18:06 -0000, Reuti <[email protected]> wrote:
> >
> >> On 14.01.2014 at 18:04, Ian Johnson wrote:
> >>
> >>> Reuti,
> >>>
> >>> There is no output from the script at all in the
> >>> ~smartmate/job.sh.o[0-9]+ files. The home directory of the smartmate user
> >>> is on a local disk. However, Grid Engine is installed on an NFS share.
> >>
> >> Do you have any file staging installed? Otherwise the output will not be
> >> sent to the real home directory of the user. Also, the input files could be
> >> missing on the execution host.
> >>
> >> -- Reuti
> >>
> >>
> >>
> >>> Is there other information you require? Is there any way to get the 
> >>> function call that is failing in shepherd, e.g. more verbose tracing?
> >>>
> >>> Thanks,
> >>>
> >>> Ian
> >>>
> >>> On Tue, 14 Jan 2014 15:19:34 -0000, Reuti <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 14.01.2014 at 15:19, Ian Johnson wrote:
> >>>>
> >>>>> I have a simple job, which echoes `date` to stdout, that I'm using to
> >>>>> test an Open Grid Engine installation. When I run qsub as root, the job
> >>>>> runs successfully. However, when I submit it as another, non-superuser
> >>>>> account, in this case the smartmate user, qacct -j reports that the job
> >>>>> exited with exit status 11. The shepherd trace confirms this (see below).
> >>>>
> >>>> Do you have any output? 11 means "Resource temporarily unavailable",
> >>>> which could mean it can't write to the (mounted?) home directory of the
> >>>> user. How is the mount configured?
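One way to test this guess is to try a real write in the user's home directory on the execution host, as that user. This is a minimal sketch. `check_writable` is a hypothetical helper. Running it through something like `su - smartmate` on the node is an assumption about how you get a shell as that user there.

```bash
#!/bin/bash
# check_writable (hypothetical helper): report whether a directory
# (default: $HOME) exists and whether the current user can actually
# create a file in it. A real write is attempted instead of relying on
# `test -w`, because -w can report success on read-only or
# root-squashed NFS mounts where writes still fail.
check_writable() {
    local dir=${1:-$HOME}
    if [[ ! -d $dir ]]; then
        echo "missing: $dir"
        return 1
    fi
    local probe
    if ! probe=$(mktemp "$dir/.sge_write_test.XXXXXX" 2>/dev/null); then
        echo "not writable: $dir"
        return 1
    fi
    rm -f -- "$probe"   # clean up the probe file
    echo "writable: $dir"
}

# Example (run on the execution host as the job owner):
#   check_writable ~smartmate
```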
> >>>>
> >>>> AFAICS the user is known, as otherwise you would face a different error.
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> Would anyone have an idea as to what is going on? Thank you.
> >>>>>
> >>>>> <shepherd_trace>
> >>>>> 01/14/2014 14:08:56 [0:2723]: shepherd called with uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: starting up 2011.11
> >>>>> 01/14/2014 14:08:56 [0:2723]: setpgid(2723, 2723) returned 0
> >>>>> 01/14/2014 14:08:56 [0:2723]: do_core_binding: "binding" parameter not found in config file
> >>>>> 01/14/2014 14:08:56 [0:2723]: no prolog script to start
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: forked "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: child: starting son(job, /opt/capitati/ge2011.11/smartmate/spool/exec-1/job_scripts/32, 0);
> >>>>> 01/14/2014 14:08:56 [0:2724]: pid=2724 pgrp=2724 sid=2724 old pgrp=2723 getlogin()=root
> >>>>> 01/14/2014 14:08:56 [0:2723]: parent: job-pid: 2724
> >>>>> 01/14/2014 14:08:56 [0:2724]: reading passwd information for user 'smartmate'
> >>>>> 01/14/2014 14:08:56 [0:2724]: setosjobid: uid = 0, euid = 0
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting limits
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CPU setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_FSIZE setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_DATA setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_STACK setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_CORE setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: RLIMIT_RSS setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
> >>>>> 01/14/2014 14:08:56 [0:2724]: setting environment
> >>>>> 01/14/2014 14:08:56 [0:2724]: Initializing error file
> >>>>> 01/14/2014 14:08:56 [0:2724]: switching to intermediate/target user
> >>>>> 01/14/2014 14:08:56 [0:2723]: wait3 returned 2724 (status: 2816; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 11)
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with exit status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: reaped "job" with pid 2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited not due to signal
> >>>>> 01/14/2014 14:08:56 [0:2723]: job exited with status 11
> >>>>> 01/14/2014 14:08:56 [0:2723]: now sending signal KILL to pid -2724
> >>>>> 01/14/2014 14:08:56 [0:2723]: writing usage file to "usage"
> >>>>> 01/14/2014 14:08:56 [0:2723]: no tasker to notify
> >>>>> 01/14/2014 14:08:56 [0:2723]: no epilog script to start
> >>>>> </shepherd_trace>
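The `status: 2816` in the `wait3` line is the raw wait status word. With the Linux/glibc bit layout, the exit code sits in bits 8 to 15 and any terminating signal sits in the low 7 bits. Since 2816 = 11 * 256, the word decodes to "exited normally with status 11", which matches the WIFEXITED/WEXITSTATUS values the trace prints. Below is a small sketch of that decoding. The name `decode_wait_status` is hypothetical. POSIX only specifies the `WIFEXITED`/`WEXITSTATUS` macros, not this exact layout.

```bash
#!/bin/bash
# decode_wait_status (hypothetical helper): interpret a raw wait()/wait3()
# status word using the Linux/glibc bit layout:
#   low 7 bits -> terminating signal (0 = exited normally, 0x7f = stopped)
#   bits 8..15 -> exit code (or stop signal when stopped)
# The core-dump flag (0x80) is ignored.
decode_wait_status() {
    local status=$1
    local sig=$(( status & 0x7f ))
    if (( sig == 0 )); then
        echo "exited with status $(( (status >> 8) & 0xff ))"
    elif (( sig == 0x7f )); then
        echo "stopped by signal $(( (status >> 8) & 0xff ))"
    else
        echo "killed by signal $sig"
    fi
}

decode_wait_status 2816   # value from the shepherd trace above
```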
> >>>>>
> >>>>> <job_script>
> >>>>> #!/bin/bash
> >>>>> #
> >>>>> #$ -j y
> >>>>> #
> >>>>> #$ -S /bin/bash
> >>>>>
> >>>>> echo "Hello World"
> >>>>> echo `date`
> >>>>> </job_script>
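If the home directory on the node turns out to be the problem, you can narrow it down by sending the joined output to a node-local path instead. This sketch assumes /tmp is local and writable by the job owner on the execution host. `$JOB_NAME` and `$JOB_ID` in the `-o` directive are the pseudo environment variables that qsub documents for output paths. Grid Engine expands them itself when the job starts, not the shell.

```bash
#!/bin/bash
#
#$ -j y
#
#$ -S /bin/bash
#
# Write the joined stdout/stderr to a node-local file instead of the
# default ~/<job_name>.o<job_id>. /tmp is an assumption; any directory
# the job owner can write to on the execution host would do.
#$ -o /tmp/$JOB_NAME.o$JOB_ID

echo "Hello World"
date
```

Suppose a job submitted this way succeeds and its output appears under /tmp on the node. That would point at the user's home directory rather than at the job itself.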
> >>>>>
> >>>>> Ian Johnson
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> [email protected]
> >>>>> https://gridengine.org/mailman/listinfo/users
> >>>>
> >>>
> >>>
> >>> --
> >>> Kind regards,
> >>>
> >>> Ian Johnson
> >>
> >
> >
> > --
> > Kind regards,
> >
> > Ian Johnson
> 
> 

