Reuti,
Some of them are array jobs, and it looks like we have been using $TASK_ID for
the array jobs.
The issue we are seeing is with non-array jobs.
Here is a snippet from one of the corrupted job output log files; the
numbers in between the text lines are actually output from a different job.
Processing Haplotype 7204 of 15166 ...
Outputting Individual 450996750985279->450996750985279 ...
Processing Haplotype 7205 of 15166 ...
Processing Haplotype 7206 of 15166 ...
Outputting Individual 632999004155376->632999004155376 ...
Processing Haplotype 7207 of 15955 0.532 0.994 0.538 0.998
0.999 0.988 0.561 0.560 0.995 0.607 0.978 0.949 0.577
0.998 0.926 0.998
0.927 0.938 0.532 0.997 0.999 0.994 0.965 0.533
0.994 0.938 0.738 0.945 0.995 0.534 0.529 0.998 0.999
0.968 0.534 0.994
0.531 0.997 0.539 0.529 0.945 0.529 0.999 0.996
0.926 0.535 0.546 0.946 0.999 0.999 0.945 0.996 0.998
0.979 0.978 0.532
0.925 0.987 0.994 0.945 0.984 0.998 0.969 0.999
0.983 0.543 0.718 0.918 0.555 0.501 0.998 0.541 0.998
0.999 0.997 0.553
0.946 0.987 0.995 0.999 0.979 0.999 0.999 0.881
0.543 0.541 0.538 0.900 0.979 0.999 0.998 0.999 0.999
0.999 0.999 0.999
0.990 0.989 0.986 0.931 0.997 0.997 0.999 0.999
0.530 0.997 0.925 0.994 0.986 0.795 0.999 0.999 0.978
0.993 0.721 0.978
0.538 0.998 0.999 0.984 0.999 0.997 0.997 0.979
0.553 0.795 0.999 0.979 0.998 0.995 0.999 0.988 0.946
0.543 0.558 0.995
0.983 0.992 0.926 0.567 0.979 0.923 0.919 0.949
0.652 0.940 0.995 0.999 0.999 0.647 0.996 0.678 0.933
0.870 0.997 0.690
0.995 0.992 0.981 0.932 0.995 0.993 0.999 0.998 0.861
0.861 0.979 0.995 0.999 0.999 0.584 0.861 0.978 0.870
0.872 0.932
0.999 0.790 0.995 0.999 0.932 0.999 0.863 0. of 15166
...
Processing Haplotype 8564 of 15166 ...
Outputting Individual 770954964699120->770954964699120 ...
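
A minimal sketch for locating such foreign lines in a job output log. It
assumes the job only ever prints the two line shapes visible in the excerpt
above ("Processing Haplotype N of M ..." and "Outputting Individual A->B ...");
the function name find_foreign_lines is hypothetical, and a real job may print
other legitimate lines that would need their own patterns.

```python
import re

# Line shapes taken from the log excerpt above (assumption: the job prints
# nothing else). Anchored at both ends so a partly overwritten line is flagged.
EXPECTED = (
    re.compile(r"^Processing Haplotype \d+ of \d+ \.\.\.$"),
    re.compile(r"^Outputting Individual \d+->\d+ \.\.\.$"),
)


def find_foreign_lines(lines):
    """Return (line_number, text) for each non-empty line matching no expected shape."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text and not any(pattern.match(text) for pattern in EXPECTED):
            hits.append((number, text))
    return hits
```

Passing an open file object (for example the result of open() on one
jobname.o$jobid file) as lines lists every line that does not look like this
job's own output, which makes it easier to see how many logs are affected.
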
On Fri, May 9, 2014 at 2:46 PM, Reuti <[email protected]> wrote:
> Am 09.05.2014 um 23:29 schrieb Karun K:
>
> > Thanks Reuti.
> >
> > But how come other log files are fine and we only see this behavior in
> > a few output logs, at random?
>
> And all are array jobs?
>
> In case just one runs after the other, they will overwrite the old logfile.
>
> -- Reuti
>
>
> > Shouldn't it be consistent with all other output logs too ?
> >
> >
> > On Fri, May 9, 2014 at 2:17 PM, Reuti <[email protected]> wrote:
> > Am 09.05.2014 um 23:04 schrieb Karun K:
> >
> > > Yes, these are array jobs with output path set to -cwd during job
> > > submission.
> >
> > Well, then you also have to use the $TASK_ID in the -o option to
> > distinguish between different tasks.
> >
> > -- Reuti
> >
> >
> > > On Fri, May 9, 2014 at 12:20 PM, Reuti <[email protected]> wrote:
> > > Am 09.05.2014 um 20:18 schrieb Karun K:
> > >
> > > > Reuti,
> > > >
> > > > These are the job output logs, not /var/spool/sge/qmaster/messages.
> > > > They are in the user job directories, named jobname.o$jobid
> > >
> > > How exactly and where are you specifying this output path: command
> > > line or inside the job script?
> > >
> > > Are these array jobs?
> > >
> > > -- Reuti
> > >
> > >
> > > > On Fri, May 9, 2014 at 11:02 AM, Reuti <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > Am 09.05.2014 um 19:43 schrieb Karun K:
> > > >
> > > > > We are using OGS/GE 2011.11p1
> > > > >
> > > > > We encountered log file corruption: in a very few GE log files there is
> > > > > output from other jobs written into them. The filesystem is working fine
> > > > > and there is no corruption in the data files, only in some GE log files,
> > > > > at random.
> > > >
> > > > What file do you refer to in detail - the
> > > > /var/spool/sge/qmaster/messages and the like? Although it's best to have
> > > > them local on each node, even having them in an NFS location still means
> > > > that only one process - the sge_execd/sge_qmaster - will write to it.
> > > >
> > > > -- Reuti
> > > >
> > > > >
> > > > > Has anyone else seen this issue?
> > > > >
> > > > > Thanks!
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > [email protected]
> > > > > https://gridengine.org/mailman/listinfo/users
> > > >
> > > >
> > >
> > >
> >
> >
>
>