On 13.10.2011 at 18:10, Laurent Duchesne wrote:

> I'd like to have your input on a problem we are facing right now:
>
> We have a small script which parses the SGE (6.2u5) accounting file
> and writes information in a SQL database. We just found out about what
> seems to be a problem in the accounting file. From man 5 accounting:
>
>     ru_wallclock
>         Difference between end_time and start_time (see above).
>
> We use that particular field to gather statistics for our users. What
> we found out was that when the "failed" field is 37, the ru_wallclock
> field is always 0, even if the job did run. We don't know exactly
> under which circumstances this happens yet.
>
> Here's one such entry from the accounting file:
>
> med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:1023454.939168:617405.204111:0.000000:0:0:0:0:134261699:23127:0:0.000000:0:0:0:0:23568146:18934035:nne-790-ab:defaultdepartment:default:512:0:0.000000:0.000000:0.000000:-l h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0
>
> And its qacct output:
>
> ==============================================================
> qname        med
> hostname     r104-n7
> group        nne-790-01
> owner        sboisver12
> project      nne-790-ab
> department   defaultdepartment
> jobname      SRA024407-Ray-1.4.0-k31-group1
> jobnumber    2903640
> taskid       undefined
> account      sge
> priority     0
> qsub_time    Mon May 30 14:49:45 2011
> start_time   Sat Jun  4 09:45:50 2011
> end_time     Tue Jun  7 14:19:15 2011
> granted_pe   default
> slots        512

What is your definition of the PE? Normally you get one accounting entry per `qrsh` call, unless you specify in the PE that they should be summed up. Or are all 512 slots allocated on one and the same machine?

-- Reuti

> failed       37 : qmaster enforced h_rt limit
> exit_status  0
> ru_wallclock 0
> ru_utime     1023454.939
> ru_stime     617405.204
> ru_maxrss    0
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    134261699
> ru_majflt    23127
> ru_nswap     0
> ru_inblock   0
> ru_oublock   0
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     23568146
> ru_nivcsw    18934035
> cpu          0.000
> mem          0.000
> io           0.000
> iow          0.000
> maxvmem      0.000
> arid         undefined
>
> Has anyone experienced this before? Is this a known "bug/feature"?
>
> Thanks,
>
> --
> Laurent Duchesne
> CLUMEQ, Université Laval
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
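
Until the cause is clear, the parsing script could fall back on the definition quoted from man 5 accounting and derive the wallclock time from end_time and start_time whenever ru_wallclock is stored as 0. The following is only a minimal sketch of that idea: the function name `effective_wallclock` is hypothetical, and the 0-based field positions are assumed from the colon-separated record shown above, so they should be checked against the local accounting file.

```python
# Assumed 0-based positions in a colon-separated accounting record,
# read off the example entry above (SGE 6.2u5); verify before use.
START_TIME, END_TIME, RU_WALLCLOCK = 9, 10, 13


def effective_wallclock(record: str) -> int:
    """Return ru_wallclock, or end_time - start_time if ru_wallclock is 0."""
    fields = record.rstrip("\n").split(":")
    start = int(fields[START_TIME])
    end = int(fields[END_TIME])
    wallclock = int(float(fields[RU_WALLCLOCK]))
    # Hypothetical fallback: recompute only when the stored value is 0
    # although the job has a start time and an end time after it.
    if wallclock == 0 and start > 0 and end > start:
        return end - start
    return wallclock


# The accounting entry quoted in the original message.
record = (
    "med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:"
    "2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:"
    "1023454.939168:617405.204111:0.000000:0:0:0:0:134261699:23127:0:"
    "0.000000:0:0:0:0:23568146:18934035:nne-790-ab:defaultdepartment:"
    "default:512:0:0.000000:0.000000:0.000000:"
    "-l h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0"
)

print(effective_wallclock(record))  # prints 275605
```

For the entry above this gives 275605 seconds, which matches the 3 days 4:33:25 between start_time and end_time in the qacct output. Only the first 14 fields are indexed, so any colons in later fields such as the category string do not affect the result.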
