Hi Reuti,

On Thu, Oct 13, 2011 at 12:26 PM, Reuti <[email protected]> wrote:
> On 13.10.2011 at 18:10, Laurent Duchesne wrote:
>
>> I'd like to have your input on a problem we are facing right now:
>>
>> We have a small script which parses the SGE (6.2u5) accounting file
>> and writes information in a SQL database. We just found out about what
>> seems to be a problem in the accounting file. From man 5 accounting:
>>
>>     ru_wallclock
>>         Difference between end_time and start_time (see above).
>>
>> We use that particular field to gather statistics for our users. What
>> we found out was that when the "failed" field is 37, the ru_wallclock
>> field is always 0, even if the job did run. We don't know exactly
>> under which circumstances this happens yet.
>>
>> Here's one such entry from the accounting file:
>>
>> med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:1023454.939168:617405.204111:0.000000:0:0:0:0:134261699:23127:0:0.000000:0:0:0:0:23568146:18934035:nne-790-ab:defaultdepartment:default:512:0:0.000000:0.000000:0.000000:-l
>> h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0
>>
>> And its qacct output:
>>
>> ==============================================================
>> qname        med
>> hostname     r104-n7
>> group        nne-790-01
>> owner        sboisver12
>> project      nne-790-ab
>> department   defaultdepartment
>> jobname      SRA024407-Ray-1.4.0-k31-group1
>> jobnumber    2903640
>> taskid       undefined
>> account      sge
>> priority     0
>> qsub_time    Mon May 30 14:49:45 2011
>> start_time   Sat Jun  4 09:45:50 2011
>> end_time     Tue Jun  7 14:19:15 2011
>> granted_pe   default
>> slots        512
>
> What is your definition of the PE? Normally you have one entry per `qrsh`
> call, or are all 512 slots allocated on one and the same machine, unless you
> specify in the PE to sum it up.
>
> -- Reuti
>
Here's our pe definition:

pe_name            default
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    8
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

We have only 1 entry per job/task because of the accounting_summary setting.

>
>> failed       37  : qmaster enforced h_rt limit
>> exit_status  0
>> ru_wallclock 0
>> ru_utime     1023454.939
>> ru_stime     617405.204
>> ru_maxrss    0
>> ru_ixrss     0
>> ru_ismrss    0
>> ru_idrss     0
>> ru_isrss     0
>> ru_minflt    134261699
>> ru_majflt    23127
>> ru_nswap     0
>> ru_inblock   0
>> ru_oublock   0
>> ru_msgsnd    0
>> ru_msgrcv    0
>> ru_nsignals  0
>> ru_nvcsw     23568146
>> ru_nivcsw    18934035
>> cpu          0.000
>> mem          0.000
>> io           0.000
>> iow          0.000
>> maxvmem      0.000
>> arid         undefined
>>
>> Has anyone experienced this before? Is this a known "bug/feature"?
>>
>> Thanks,
>>
>> --
>> Laurent Duchesne
>> CLUMEQ, Université Laval
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>
>

Thanks,

--
Laurent Duchesne
CLUMEQ, Université Laval
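
P.S. For anyone parsing the accounting file who hits the same thing, one possible stopgap is to fall back to end_time - start_time whenever ru_wallclock is 0, since the man page defines the field as exactly that difference. Below is a minimal sketch in Python, not our actual script. It assumes the colon-separated field order from accounting(5) for 6.2u5; the function name and index constants are made up for illustration, and the sample line is the entry above cut off after ru_utime.

```python
# 0-based positions in a colon-separated accounting line, assumed from
# accounting(5) for SGE 6.2u5; check them against your own man page.
START_TIME = 9
END_TIME = 10
RU_WALLCLOCK = 13


def effective_wallclock(line):
    """Return ru_wallclock, or end_time - start_time if ru_wallclock is 0.

    Hypothetical helper: the fallback is only used when both timestamps
    are set and end_time is later than start_time.
    """
    fields = line.rstrip("\n").split(":")
    wallclock = int(float(fields[RU_WALLCLOCK]))
    start = int(fields[START_TIME])
    end = int(fields[END_TIME])
    if wallclock == 0 and start > 0 and end > start:
        return end - start
    return wallclock


# The entry quoted above, truncated after the ru_utime field.
entry = ("med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:"
         "2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:1023454.939168")

print(effective_wallclock(entry))  # 275605
```

275605 seconds is 3 days 4:33:25, which matches the start_time and end_time that qacct prints for this job. Note that with accounting_summary TRUE this is still the wallclock of the job as a whole, not multiplied by the slot count.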
