Hi Reuti,

On Thu, Oct 13, 2011 at 12:26 PM, Reuti <[email protected]> wrote:
> Am 13.10.2011 um 18:10 schrieb Laurent Duchesne:
>
>> I'd like to have your input on a problem we are facing right now:
>>
>> We have a small script which parses the SGE (6.2u5) accounting file
>> and writes information in a SQL database. We just found out about what
>> seems to be a problem in the accounting file. From man 5 accounting:
>>
>> ru_wallclock
>>        Difference between end_time and start_time (see above).
>>
>> We use that particular field to gather statistics for our users. What
>> we found out was that when the "failed" field is 37, the ru_wallclock
>> field is always 0, even if the job did run. We don't know exactly
>> under which circumstances this happens yet.
>>
>> Here's one such entry from the accounting file:
>>
>> med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:1023454.939168:617405.204111:0.000000:0:0:0:0:134261699:23127:0:0.000000:0:0:0:0:23568146:18934035:nne-790-ab:defaultdepartment:default:512:0:0.000000:0.000000:0.000000:-l h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0
>>
>> And its qacct output:
>>
>> ==============================================================
>> qname        med
>> hostname     r104-n7
>> group        nne-790-01
>> owner        sboisver12
>> project      nne-790-ab
>> department   defaultdepartment
>> jobname      SRA024407-Ray-1.4.0-k31-group1
>> jobnumber    2903640
>> taskid       undefined
>> account      sge
>> priority     0
>> qsub_time    Mon May 30 14:49:45 2011
>> start_time   Sat Jun  4 09:45:50 2011
>> end_time     Tue Jun  7 14:19:15 2011
>> granted_pe   default
>> slots        512
>
> What is your PE definition? Normally you get one accounting entry per `qrsh`
> call, unless the PE is set to sum them up. Or are all 512 slots allocated on
> one and the same machine?
>
> -- Reuti
>

Here's our PE definition:

pe_name            default
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    8
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

We have only 1 entry per job/task because of the accounting_summary setting.

>
>> failed       37  : qmaster enforced h_rt limit
>> exit_status  0
>> ru_wallclock 0
>> ru_utime     1023454.939
>> ru_stime     617405.204
>> ru_maxrss    0
>> ru_ixrss     0
>> ru_ismrss    0
>> ru_idrss     0
>> ru_isrss     0
>> ru_minflt    134261699
>> ru_majflt    23127
>> ru_nswap     0
>> ru_inblock   0
>> ru_oublock   0
>> ru_msgsnd    0
>> ru_msgrcv    0
>> ru_nsignals  0
>> ru_nvcsw     23568146
>> ru_nivcsw    18934035
>> cpu          0.000
>> mem          0.000
>> io           0.000
>> iow          0.000
>> maxvmem      0.000
>> arid         undefined
>>
>> Has anyone experienced this before? Is this a known "bug/feature"?
>>
>> Thanks,
>>
>> --
>> Laurent Duchesne
>> CLUMEQ, Université Laval
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>
>
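
For reference, here is a rough sketch of a possible workaround on the parsing
side. It assumes the colon-separated field order documented in accounting(5)
(start_time, end_time, failed and ru_wallclock at positions 10 to 14) and that
none of the leading fields contain a colon. The names FIELDS, parse_entry and
effective_wallclock are made up for this example and are not taken from our
actual script. It recomputes the wallclock from end_time minus start_time when
ru_wallclock is 0:

```python
# Field labels for the first 14 colon-separated values of an accounting
# entry, in the order given by accounting(5). The labels are illustrative.
FIELDS = ("qname", "hostname", "group", "owner", "job_name", "job_number",
          "account", "priority", "submission_time", "start_time", "end_time",
          "failed", "exit_status", "ru_wallclock")


def parse_entry(line):
    """Return the first 14 fields of one accounting line as a dict of strings."""
    values = line.rstrip("\n").split(":")
    if len(values) < len(FIELDS):
        raise ValueError("accounting entry has too few fields")
    return dict(zip(FIELDS, values))


def effective_wallclock(entry):
    """Return ru_wallclock, or end_time - start_time if ru_wallclock is 0."""
    wallclock = int(float(entry["ru_wallclock"]))
    start = int(entry["start_time"])
    end = int(entry["end_time"])
    # Only fall back when the job really started and ended after it started.
    if wallclock == 0 and start > 0 and end > start:
        return end - start
    return wallclock


# Leading fields of the entry quoted above (failed = 37, ru_wallclock = 0).
line = ("med:r104-n7:nne-790-01:sboisver12:SRA024407-Ray-1.4.0-k31-group1:"
        "2903640:sge:0:1306781385:1307195150:1307470755:37:0:0:"
        "1023454.939168:617405.204111")
entry = parse_entry(line)
print(entry["failed"], effective_wallclock(entry))
```

For the entry above this yields 275605 seconds, which matches the difference
between the start_time and end_time shown by qacct.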

Thanks,

-- 
Laurent Duchesne
CLUMEQ, Université Laval
