On 12.05.2014 at 20:01, Karun K wrote:

> It's writing output to the job submission directory (default behavior),
> which works for us.
> Regarding using -V for shell scripts, I need to consult my engineers
> about it. Other than exporting the current environment variables, is
> there any other difference?

Did we discuss -V below? -V will also work for binaries AFAICS.

-- Reuti
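As a hedged illustration of the two forms (the variable name MYVAR and the
script name are made up): -V exports the submitter's entire environment to
the job, while -v exports only the variables named:

    # export the whole current environment to the job
    qsub -V job.sh

    # export only selected variables, optionally with explicit values
    qsub -v PATH,MYVAR=somevalue job.sh

Per the remark above, the same switches apply when submitting a binary
with -b y.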
> Thanks!
>
> On Sat, May 10, 2014 at 5:05 AM, Reuti <[email protected]> wrote:
> On 10.05.2014 at 01:18, Karun K wrote:
>
> > Here is the job script,
> >
> > qsub -N myusername$N -l h_vmem=5.0G ../job1.sh my1$N
> >
> > from .sge_request:
> >
> > # default SGE options
> > -j y -cwd -b y
> > # -j y -cwd
> > # -cwd
>
> I don't see a -o option here - how does ../job1.sh decide where the
> output should go? And why are you submitting a script as a binary?
>
> -- Reuti
>
> > On Fri, May 9, 2014 at 3:35 PM, Reuti <[email protected]> wrote:
> > On 10.05.2014 at 00:18, Karun K wrote:
> >
> > > Reuti,
> > >
> > > Some of them are array jobs; looks like we have been using $task_id
> > > for array jobs.
> > > The issue we are seeing is with non-array jobs.
> > >
> > > Here is a snippet from one of the corrupted job output log files; the
> > > numbers in between the text lines are actually output from a
> > > different job.
> >
> > How exactly and where are you specifying this output path: command
> > line or inside the job script?
> >
> > What does the job script look like?
> >
> > -- Reuti
> >
> > > Processing Haplotype 7204 of 15166 ...
> > > Outputting Individual 450996750985279->450996750985279 ...
> > > Processing Haplotype 7205 of 15166 ...
> > > Processing Haplotype 7206 of 15166 ...
> > > Outputting Individual 632999004155376->632999004155376 ...
> > > Processing Haplotype 7207 of 15955 0.532 0.994 0.538 0.998
> > > 0.999 0.988 0.561 0.560 0.995 0.607 0.978 0.949 0.577
> > > 0.998 0.926 0.998
> > > 0.927 0.938 0.532 0.997 0.999 0.994 0.965 0.533
> > > 0.994 0.938 0.738 0.945 0.995 0.534 0.529 0.998 0.999
> > > 0.968 0.534 0.994
> > > 0.531 0.997 0.539 0.529 0.945 0.529 0.999 0.996
> > > 0.926 0.535 0.546 0.946 0.999 0.999 0.945 0.996 0.998
> > > 0.979 0.978 0.532
> > > 0.925 0.987 0.994 0.945 0.984 0.998 0.969 0.999
> > > 0.983 0.543 0.718 0.918 0.555 0.501 0.998 0.541 0.998
> > > 0.999 0.997 0.553
> > > 0.946 0.987 0.995 0.999 0.979 0.999 0.999 0.881
> > > 0.543 0.541 0.538 0.900 0.979 0.999 0.998 0.999 0.999
> > > 0.999 0.999 0.999
> > > 0.990 0.989 0.986 0.931 0.997 0.997 0.999 0.999
> > > 0.530 0.997 0.925 0.994 0.986 0.795 0.999 0.999 0.978
> > > 0.993 0.721 0.978
> > > 0.538 0.998 0.999 0.984 0.999 0.997 0.997 0.979
> > > 0.553 0.795 0.999 0.979 0.998 0.995 0.999 0.988 0.946
> > > 0.543 0.558 0.995
> > > 0.983 0.992 0.926 0.567 0.979 0.923 0.919 0.949
> > > 0.652 0.940 0.995 0.999 0.999 0.647 0.996 0.678 0.933
> > > 0.870 0.997 0.690
> > > 0.995 0.992 0.981 0.932 0.995 0.993 0.999 0.998 0.861
> > > 0.861 0.979 0.995 0.999 0.999 0.584 0.861 0.978 0.870
> > > 0.872 0.932
> > > 0.999 0.790 0.995 0.999 0.932 0.999 0.863 0. of 15166
> > > ...
> > > Processing Haplotype 8564 of 15166 ...
> > > Outputting Individual 770954964699120->770954964699120 ...
> > >
> > > On Fri, May 9, 2014 at 2:46 PM, Reuti <[email protected]> wrote:
> > > On 09.05.2014 at 23:29, Karun K wrote:
> > >
> > > > Thanks Reuti.
> > > >
> > > > But how come other log files are fine and we only see this
> > > > behavior on a few output logs randomly?
> > >
> > > And all are array jobs?
> > >
> > > In case just one runs after the other, they will overwrite the old
> > > logfile.
> > >
> > > -- Reuti
> > >
> > > > Shouldn't it be consistent with all other output logs too?
> > > >
> > > > On Fri, May 9, 2014 at 2:17 PM, Reuti <[email protected]> wrote:
> > > > On 09.05.2014 at 23:04, Karun K wrote:
> > > >
> > > > > Yes, these are array jobs with output path set to -cwd during
> > > > > job submission.
> > > >
> > > > Well, then you also have to use the $TASK_ID in the -o option to
> > > > distinguish between different tasks.
> > > >
> > > > -- Reuti
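Pulling Reuti's two suggestions together - an explicit -o path containing
$TASK_ID, and submitting the script as a script - a hedged sketch of a
resubmission might look like this (the logs/ directory, the task range,
and the file-name pattern are illustrative, not from the thread):

    # -b n overrides the -b y default from .sge_request, so ../job1.sh is
    # treated as a script again; -t makes it an array job. Single quotes
    # keep the shell from touching the SGE pseudo-variables $JOB_NAME,
    # $JOB_ID and $TASK_ID, which SGE expands per task in the -o path,
    # while $N is still expanded by the submitting shell. The logs/
    # directory is assumed to already exist relative to the -cwd.
    qsub -N myusername$N -l h_vmem=5.0G -b n -t 1-10 \
         -o 'logs/$JOB_NAME.$JOB_ID.$TASK_ID.out' ../job1.sh my1$N

Each task then writes its own log file instead of all tasks sharing one.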
> > > > > On Fri, May 9, 2014 at 12:20 PM, Reuti <[email protected]> wrote:
> > > > > On 09.05.2014 at 20:18, Karun K wrote:
> > > > >
> > > > > > Reuti,
> > > > > >
> > > > > > These are the job output logs, not
> > > > > > /var/spool/sge/qmaster/messages. These are in the user job
> > > > > > directories as jobname.o$jobid.
> > > > >
> > > > > How exactly and where are you specifying this output path:
> > > > > command line or inside the job script?
> > > > >
> > > > > Are these array jobs?
> > > > >
> > > > > -- Reuti
> > > > >
> > > > > > On Fri, May 9, 2014 at 11:02 AM, Reuti <[email protected]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 09.05.2014 at 19:43, Karun K wrote:
> > > > > >
> > > > > > > We are using OGS/GE 2011.11p1.
> > > > > > >
> > > > > > > We encountered log file corruption: in a few GE log files
> > > > > > > there is output of some other jobs written to them. The
> > > > > > > filesystem is working fine - no corruption in data files,
> > > > > > > just in some GE log files, randomly.
> > > > > >
> > > > > > Which file do you refer to in detail - the
> > > > > > /var/spool/sge/qmaster/messages and the like? Although it's
> > > > > > best to have them local on each node, even having them in an
> > > > > > NFS location still means that only one process - the
> > > > > > sge_execd/sge_qmaster - will write to it.
> > > > > >
> > > > > > -- Reuti
> > > > > >
> > > > > > > Has anyone else seen this issue?
> > > > > > >
> > > > > > > Thanks!
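For anyone hitting the same symptom, one hedged way to narrow it down is
to check, per job, where stdout was actually directed (the job ID 424242
is made up):

    # for a pending or running job: show the working directory and, if an
    # -o was requested, the stdout_path_list
    qstat -j 424242 | grep -i -e cwd -e stdout

    # for a finished job: the accounting record confirms owner, job name
    # and task id of whatever job wrote at that time
    qacct -j 424242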
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users