Re: [gridengine users] simple report with qstat -F ?

Gerard Henry Mon, 23 Jan 2012 07:31:45 -0800

On 01/20/12 07:55 PM, Reuti wrote:


Well, the CPU time being spent on a process right now is not reported per 
process, at least I've never heard of it. The kernel schedules the processes 
outside of SGE and the nice values are only relative. So it could only report 
in short time intervals like: in the last 2 seconds I spent 50% on this process 
and 10% on each of the following processes. But a second later it might already 
have changed because one of them is waiting for I/O. I would expect that `top` 
is doing it this way, as the values are more constant in case you specify a 
longer refresh interval.

As in the outline for the memory above, it should be possible to compute the 
difference of used up CPU time in the last 1/5/15 minutes for each process and 
compute the amount of CPU spent on the jobs in a particular queue this way.
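
The per-process difference described above could be sketched roughly like this. It is only an illustration and assumes a Linux /proc filesystem; the helper names `read_cpu_seconds` and `cpu_share` are made up for this example and are not part of SGE.

```python
import os


def read_cpu_seconds(pid):
    """Return the user+system CPU seconds used so far by `pid` (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so only split what follows the last ')'.
    fields = stat.rsplit(")", 1)[1].split()
    # utime and stime are fields 14 and 15 in proc(5); after dropping
    # pid and comm they are at index 11 and 12.
    ticks = int(fields[11]) + int(fields[12])
    return ticks / float(os.sysconf("SC_CLK_TCK"))


def cpu_share(before, after, interval):
    """Fraction of one CPU used by each PID between two samples.

    `before` and `after` map PID -> CPU seconds, taken `interval`
    seconds apart. PIDs missing from either sample are skipped.
    """
    return dict((pid, (after[pid] - before[pid]) / float(interval))
                for pid in after if pid in before)
```

One would sample `read_cpu_seconds` for the PIDs of interest, wait for the chosen interval (60, 300 or 900 seconds), sample again, and pass both dictionaries to `cpu_share`. Summing the shares of the PIDs that belong to one queue then gives the per-queue figure.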

==

Another idea: build a load sensor (i.e. one complex needs to be defined per 
queue in question and attached to each exechost) and feed:

top -b -n 2 -d 120 -p 1600,1234 | tail -n 3

to it, where 1600 and 1234 are the PIDs of the jobs whose additional group was 
found before in $SGE_SPOOL_DIR/active_jobs. I suggest -n 2, as the first output 
doesn't show anything meaningful regarding CPU time because the interval is too 
short. This means reading one cycle ahead to avoid a delay in replying to SGE. 
The values can then be sorted by queue and assigned to appropriate complexes 
for each queue on each host.
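
A minimal sketch of such a load sensor, assuming the usual begin / host:complex:value / end report form and a loop that stops on "quit". The complex names `cpu_<queue>`, the function names and the PID-to-queue mapping are hypothetical, and the column positions assume the default procps `top` layout (PID in column 1, %CPU in column 9), which may differ on other systems.

```python
import sys


def parse_top_cpu(lines, pid_to_queue):
    """Sum %CPU per queue from the process lines of one `top -b` cycle.

    `lines` should hold only the last cycle (e.g. what `tail` kept).
    """
    totals = {}
    for line in lines:
        cols = line.split()
        if len(cols) < 9 or not cols[0].isdigit():
            continue  # skip header, summary and blank lines
        pid = int(cols[0])
        if pid in pid_to_queue:
            queue = pid_to_queue[pid]
            # Some locales print a decimal comma.
            value = float(cols[8].replace(",", "."))
            totals[queue] = totals.get(queue, 0.0) + value
    return totals


def format_report(host, totals):
    """Build one load report, one complex per queue (names are assumed)."""
    lines = ["begin"]
    for queue in sorted(totals):
        lines.append("%s:cpu_%s:%.1f" % (host, queue, totals[queue]))
    lines.append("end")
    return "\n".join(lines)


def serve(host, sample, stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request on stdin until 'quit' or end of input.

    `sample` is a callable returning the latest per-queue totals; it
    should return values collected in the previous cycle so that the
    reply is not delayed by the 120 s top interval.
    """
    while True:
        request = stdin.readline()
        if not request or request.strip() == "quit":
            break
        stdout.write(format_report(host, sample()) + "\n")
        stdout.flush()
```

Keeping the parsing and the formatting separate from the request loop makes it possible to run `top` in a background cycle and let `serve` hand out the most recent result immediately.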

Thanks for your reply. I'm not sure I understand, certainly because my
question was so obscure. Here is a solution I tried. Considering that all jobs
have ended, and that I want to report the CPU and memory consumed by jobs
during the last year, I dug into /local/export/sge/default/common/accounting
and extracted the following results with the attached Python script:

user iusti (51) cpu: 634.871534559 days mem: 0.805913076979 Go
user irphe (135) cpu: 414.912775315 days mem: 1.4525422628 Go
user l3m (252) cpu: 567.227461918 days mem: 1.70951371079 Go
user lma (45) cpu: 1139.86254098 days mem: 1.71787710938 Go
user latp (106) cpu: 127.5595829 days mem: 1.41270344795 Go

I'm just summing the cpu and mem values for each user on a queue defined on
one host. The only problem is that the mem record is wrong due to the "4go"
bug in SGE 6.2u5.

Another point that is obscure to me: can I detect whether a job is an OpenMP
job or a serial job by looking in the accounting file?

For instance, I can write the following command:

cat /local/export/sge/default/common/accounting | awk -F':' 'BEGIN{printf("owner group job_number granted_pe cpu mem category\n")}/<hostname>/ && /<queuename>/ {printf("%s %s %s %s %s %s %s\n", $4, $3,$6, $34, $37, $38, $40)}' | less

owner group job_number granted_pe cpu mem category
user1 iusti 13496 impi 2572922.350000 5628845.960567 -q dev -pe impi 10
user1 iusti 13925 impi 15400293.770000 15728826.955261 -q dev -pe impi 25
user2 iusti 13926 NONE 602366.910000 768713.172452 -q dev
user2 iusti 14088 NONE 606897.320000 779858.199665 -q dev
user1 iusti 14169 NONE 0.382940 0.003174 -U arusers -ar 2002

Does "NONE" in granted_pe indicate an OpenMP job?
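
One possible way to probe this from the accounting file, offered only as a heuristic sketch and not as a confirmed answer: a job recorded with granted_pe NONE whose cpu is much larger than its ru_wallclock kept more than one core busy, which would be consistent with a threaded (for example OpenMP) job submitted without -pe. The names `cores_used` and `looks_threaded` and the 1.5 threshold are assumptions for illustration; `record` is a dict keyed by the accounting field names, as built in the script below.

```python
def cores_used(record):
    """Average number of cores kept busy: CPU seconds / wallclock seconds."""
    wallclock = float(record["ru_wallclock"])
    if wallclock <= 0:
        return 0.0  # no usable wallclock time recorded
    return float(record["cpu"]) / wallclock


def looks_threaded(record, threshold=1.5):
    """True for a job without a parallel environment that used more than
    `threshold` cores on average (the threshold is an arbitrary choice)."""
    return record["granted_pe"] == "NONE" and cores_used(record) > threshold
```

A threaded job that spends most of its time waiting for I/O would stay below the threshold, so this can only flag likely candidates, not prove that a job was serial.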


thanks

gerard

#!/usr/bin/env python
# Sum the cpu and mem usage recorded in the SGE accounting file, per group,
# for the jobs that ran on one host.

# Field names of one accounting record, in file order (colon-separated).
keys = ['qname', 'hostname', 'group', 'owner', 'job_name', 'job_number',
        'account', 'priority', 'submission_time',
        'start_time', 'end_time', 'failed', 'exit_status', 'ru_wallclock',
        'ru_utime', 'ru_stime', 'ru_maxrss', 'ru_ixrss',
        'ru_ismrss', 'ru_idrss', 'ru_isrss',
        'ru_minflt', 'ru_majflt', 'ru_nswap', 'ru_inblock',
        'ru_oublock', 'ru_msgsnd', 'ru_msgrcv',
        'ru_nsignals', 'ru_nvcsw', 'ru_nivcsw',
        'project', 'department', 'granted_pe', 'slots', 'task_number',
        'cpu', 'mem', 'io', 'category', 'iow', 'pe_taskid', 'maxvmem',
        'arid', 'ar_submission_time']

hostname = 'holopherne'
# These names are matched against the 'group' field of each record.
users = ['iusti', 'irphe', 'l3m', 'lma', 'latp', 'analysenum', 'edp',
         'webservd']

# Keep only the records of jobs that ran on the chosen host.
liste_values = []
with open("/local/export/sge/default/common/accounting", 'r') as f:
    for line in f:
        if not line.startswith('#'):
            fields = line.strip().split(':')
            d = dict(zip(keys, fields))
            # .get() avoids a KeyError on blank or truncated lines
            if d.get('hostname') == hostname:
                liste_values.append(d)

for user in users:
    cpu = 0.
    mem = 0.
    njob = 0
    for value in liste_values:
        if user == value['group']:
            njob = njob + 1
            cpu = cpu + float(value['cpu'])
            mem = mem + float(value['mem'])
    if cpu > 3600:
        res1 = "cpu: %s days" % (cpu / 3600 / 24)
    else:
        res1 = "cpu: %s s" % (cpu)
    # mem is an integral in GB.s ("Go" = gigaoctets), so mem/cpu is an
    # average in GB; the cpu > 0 test avoids a division by zero.
    if mem > 0. and cpu > 0.:
        res2 = "mem: %s Go" % (mem / cpu)
    else:
        res2 = "mem: %s Go.s" % (mem)
    print("user %s \t(%d jobs) \t %s \t %s" % (user, njob, res1, res2))

