On 13.10.2011 at 18:10, Laurent Duchesne wrote:

> I'd like to have your input on a problem we are facing right now:
> 
> We have a small script which parses the SGE (6.2u5) accounting file
> and writes information in a SQL database. We just found out about what
> seems to be a problem in the accounting file. From man 5 accounting:
> 
> ru_wallclock
>        Difference between end_time and start_time (see above).
> 
> We use that particular field to gather statistics for our users. What
> we found out was that when the "failed" field is 37, the ru_wallclock
> field is always 0, even if the job did run. We don't know exactly
> under which circumstances this happens yet.
> 
> Here's one such entry from the accounting file:
> 
med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:1023454.939168:617405.204111:0.000000:0:0:0:0:134261699:23127:0:0.000000:0:0:0:0:23568146:18934035:nne-790-ab:defaultdepartment:default:512:0:0.000000:0.000000:0.000000:-l h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0
> 
> And its qacct output:
> 
> ==============================================================
> qname        med
> hostname     r104-n7
> group        nne-790-01
> owner        sboisver12
> project      nne-790-ab
> department   defaultdepartment
> jobname      SRA024407-Ray-1.4.0-k31-group1
> jobnumber    2903640
> taskid       undefined
> account      sge
> priority     0
> qsub_time    Mon May 30 14:49:45 2011
> start_time   Sat Jun  4 09:45:50 2011
> end_time     Tue Jun  7 14:19:15 2011
> granted_pe   default
> slots        512

How is the PE defined? Normally you get one accounting entry per `qrsh` call, 
unless the PE is configured to sum the usage up. Or were all 512 slots 
allocated on one and the same machine?

-- Reuti


> failed       37  : qmaster enforced h_rt limit
> exit_status  0
> ru_wallclock 0
> ru_utime     1023454.939
> ru_stime     617405.204
> ru_maxrss    0
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    134261699
> ru_majflt    23127
> ru_nswap     0
> ru_inblock   0
> ru_oublock   0
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     23568146
> ru_nivcsw    18934035
> cpu          0.000
> mem          0.000
> io           0.000
> iow          0.000
> maxvmem      0.000
> arid         undefined
> 
> Has anyone experienced this before? Is this a known "bug/feature"?
> 
> Thanks,
> 
> --
> Laurent Duchesne
> CLUMEQ, Université Laval
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
> 
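For a script that parses the accounting file, one possible workaround is to recompute the wallclock time from end_time and start_time whenever ru_wallclock is recorded as 0. The following is only a minimal sketch: the function name `effective_wallclock` and the fallback rule are assumptions made for illustration, not documented SGE behavior, and the field positions are taken from the record quoted above. It also assumes that none of the leading fields (such as the job name) contains a colon.

```python
# Zero-based field positions, as seen in the quoted record (see accounting(5)).
START_TIME, END_TIME, RU_WALLCLOCK = 9, 10, 13


def effective_wallclock(record: str) -> int:
    """Return ru_wallclock, or end_time - start_time when it is recorded as 0.

    Hypothetical helper: the fallback is an assumption, not SGE's own logic.
    """
    fields = record.rstrip("\n").split(":")
    start = int(fields[START_TIME])
    end = int(fields[END_TIME])
    wallclock = int(float(fields[RU_WALLCLOCK]))
    # Only fall back when both timestamps are set and consistent.
    if wallclock == 0 and start > 0 and end > start:
        return end - start
    return wallclock


# First 14 fields of the record quoted above; the rest is not needed here.
sample = (
    "med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:"
    "2903640:sge:0:1306781385:1307195150:1307470755:37:0:0"
)
print(effective_wallclock(sample))  # 275605 seconds, i.e. 3 days 04:33:25
```

The result agrees with the start_time and end_time shown in the qacct output. If the PE writes one accounting entry per `qrsh` call, the same job number may appear on several lines, so a script would have to decide how to combine them before using this value.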

