On 10.05.2014, at 00:18, Karun K wrote:

> Reuti,
>
> Some of them are array jobs; it looks like we have been using $task_id for
> array jobs.
> The issue we are seeing is with non-array jobs.
>
> Here is a snippet from one of the corrupted job output log files; the
> numbers in between the text lines are actually output from a different job.

How exactly and where are you specifying this output path: command line or
inside the job script? What does the job script look like?
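To make the question concrete, here are the two places I mean (the script and
file names below are just placeholders):

    # 1) On the command line at submission time:
    qsub -cwd -o myjob.out myjob.sh

    # 2) Or embedded in the job script itself as directives:
    #!/bin/sh
    #$ -cwd
    #$ -o myjob.out

If both are given, the command line takes precedence over the embedded flags.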
-- Reuti

> Processing Haplotype 7204 of 15166 ...
> Outputting Individual 450996750985279->450996750985279 ...
> Processing Haplotype 7205 of 15166 ...
> Processing Haplotype 7206 of 15166 ...
> Outputting Individual 632999004155376->632999004155376 ...
> Processing Haplotype 7207 of 15955 0.532 0.994 0.538 0.998 0.999
> 0.988 0.561 0.560 0.995 0.607 0.978 0.949 0.577 0.998
> 0.926 0.998
> 0.927 0.938 0.532 0.997 0.999 0.994 0.965 0.533 0.994
> 0.938 0.738 0.945 0.995 0.534 0.529 0.998 0.999 0.968
> 0.534 0.994
> 0.531 0.997 0.539 0.529 0.945 0.529 0.999 0.996 0.926
> 0.535 0.546 0.946 0.999 0.999 0.945 0.996 0.998 0.979
> 0.978 0.532
> 0.925 0.987 0.994 0.945 0.984 0.998 0.969 0.999 0.983
> 0.543 0.718 0.918 0.555 0.501 0.998 0.541 0.998 0.999
> 0.997 0.553
> 0.946 0.987 0.995 0.999 0.979 0.999 0.999 0.881 0.543
> 0.541 0.538 0.900 0.979 0.999 0.998 0.999 0.999 0.999
> 0.999 0.999
> 0.990 0.989 0.986 0.931 0.997 0.997 0.999 0.999 0.530
> 0.997 0.925 0.994 0.986 0.795 0.999 0.999 0.978 0.993
> 0.721 0.978
> 0.538 0.998 0.999 0.984 0.999 0.997 0.997 0.979 0.553
> 0.795 0.999 0.979 0.998 0.995 0.999 0.988 0.946 0.543
> 0.558 0.995
> 0.983 0.992 0.926 0.567 0.979 0.923 0.919 0.949 0.652
> 0.940 0.995 0.999 0.999 0.647 0.996 0.678 0.933 0.870
> 0.997 0.690
> 0.995 0.992 0.981 0.932 0.995 0.993 0.999 0.998 0.861 0.861
> 0.979 0.995 0.999 0.999 0.584 0.861 0.978 0.870 0.872
> 0.932
> 0.999 0.790 0.995 0.999 0.932 0.999 0.863 0. of 15166 ...
> Processing Haplotype 8564 of 15166 ...
> Outputting Individual 770954964699120->770954964699120 ...
>
> On Fri, May 9, 2014 at 2:46 PM, Reuti <[email protected]> wrote:
> > On 09.05.2014, at 23:29, Karun K wrote:
> >
> > > Thanks Reuti.
> > >
> > > But how come other log files are fine and we only see this behavior on a
> > > few output logs randomly?
> >
> > And all are array jobs?
> >
> > In case just one runs after the other, they will overwrite the old logfile.
> >
> > -- Reuti
> >
> > > Shouldn't it be consistent with all other output logs too?
> > >
> > > On Fri, May 9, 2014 at 2:17 PM, Reuti <[email protected]> wrote:
> > > > On 09.05.2014, at 23:04, Karun K wrote:
> > > >
> > > > > Yes, these are array jobs with output path set to -cwd during job
> > > > > submission.
> > > >
> > > > Well, then you also have to use the $TASK_ID in the -o option to
> > > > distinguish between different tasks.
> > > >
> > > > -- Reuti
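For the archives, a minimal sketch of the above - the job script name and
task range here are made up:

    # Give each task of the array its own output file, e.g. array.o4711.1,
    # array.o4711.2, ... ($JOB_ID 4711 is just an example). Quote the
    # argument so the shell does not expand the pseudo variables; Grid
    # Engine substitutes them itself.
    qsub -cwd -t 1-10 -o 'array.o$JOB_ID.$TASK_ID' array_job.sh

Without $TASK_ID in the path, all tasks of one array write to the same file
and their output can interleave.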
> > > > > On Fri, May 9, 2014 at 12:20 PM, Reuti <[email protected]> wrote:
> > > > > > On 09.05.2014, at 20:18, Karun K wrote:
> > > > > >
> > > > > > > Reuti,
> > > > > > >
> > > > > > > These are the job output logs, not /var/spool/sge/qmaster/messages.
> > > > > > > These are in user job directories with jobname.o$jobid
> > > > > >
> > > > > > How exactly and where are you specifying this output path: command
> > > > > > line or inside the job script?
> > > > > >
> > > > > > Are these array jobs?
> > > > > >
> > > > > > -- Reuti
> > > > > >
> > > > > > > On Fri, May 9, 2014 at 11:02 AM, Reuti <[email protected]> wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On 09.05.2014, at 19:43, Karun K wrote:
> > > > > > > >
> > > > > > > > > We are using OGS/GE 2011.11p1
> > > > > > > > >
> > > > > > > > > We encountered log file corruption: in GE log files there is
> > > > > > > > > output from other jobs written into them (in very few log
> > > > > > > > > files). The filesystem is working fine; there is no corruption
> > > > > > > > > in data files, just in some GE log files, randomly.
> > > > > > > >
> > > > > > > > What file do you refer to in detail - the
> > > > > > > > /var/spool/sge/qmaster/messages and the like? Although it's best
> > > > > > > > to have them local on each node, even having them in an NFS
> > > > > > > > location still means that only one process - the
> > > > > > > > sge_execd/sge_qmaster - will write to it.
> > > > > > > >
> > > > > > > > -- Reuti
> > > > > > > >
> > > > > > > > > Has anyone else seen this issue?
> > > > > > > > >
> > > > > > > > > Thanks!
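PS: For completeness, the overwrite case mentioned further up looks like
this - two independent submissions pointed at the same file (names are
hypothetical):

    qsub -cwd -o shared.log job_a.sh
    qsub -cwd -o shared.log job_b.sh
    # If the jobs overlap in time, their stdout can interleave in shared.log;
    # if they run back to back, the later job can clobber the earlier log.

Leaving -o unset avoids this, as the default name jobname.o$jobid is unique
for every job.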
