On 01/20/12 07:55 PM, Reuti wrote:
Well, the CPU time being spent on a process right now is not reported per
process, at least I've never heard of it. The kernel schedules the processes
outside of SGE and the nice values are only relative. So it could only report
over short time intervals, like: in the last 2 seconds I spent 50% on this
process and 10% on each of the following processes. But a second later it might
already have changed, as one of them is waiting for I/O. I would expect that
`top` does it this way, as the values are more constant when you specify a
longer refresh interval.
As in the outline for the memory above, it should be possible to compute the
difference in used-up CPU time over the last 1/5/15 minutes for each process and
compute the amount of CPU spent on the jobs in a particular queue this way.
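A minimal sketch of that differencing idea, assuming a Linux exechost where the per-process counters can be read from /proc/<pid>/stat. The helper names and the two sample lines are made up for illustration, and the tick rate is passed in rather than queried:

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' instead of splitting the whole line.
    rest = stat_line.rsplit(')', 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_share(before, after, interval_s, ticks_per_s=100):
    """Fraction of one core used between two samples taken interval_s apart."""
    used_s = (cpu_ticks(after) - cpu_ticks(before)) / float(ticks_per_s)
    return used_s / interval_s


# Two made-up samples of the same PID, taken 60 seconds apart.
first = ("1600 (my job) R 1 1600 1600 0 -1 4194304 100 0 0 0 "
         "250 50 0 0 20 0 1 0 12345 1000000 200")
second = ("1600 (my job) R 1 1600 1600 0 -1 4194304 100 0 0 0 "
          "2650 650 0 0 20 0 1 0 12345 1000000 200")
share = cpu_share(first, second, 60)
print("PID 1600 used %.0f%% of one core" % (100 * share))
```

On a real host the tick rate would come from os.sysconf('SC_CLK_TCK'), and the shares of all PIDs belonging to one queue would be added up.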
==
Another idea: build a load sensor (i.e. one complex needs to be defined per
queue in question and attached to each exechost) and feed:
top -b -n 2 -d 120 -p 1600,1234 | tail -n 3
to it, where 1600 and 1234 are the PIDs of the jobs whose additional group was
found earlier under $SGE_SPOOL_DIR/active_jobs. I suggest -n 2, as the first
output doesn't show anything meaningful regarding CPU time because the interval
is too short. This means reading one cycle ahead to avoid a delay in replying
to SGE.
The values can then be sorted by queue and assigned to appropriate complexes
for each queue on each host.
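A rough sketch of such a load sensor, under several assumptions: `top` prints its default column layout with a header line containing PID and %CPU and uses a dot as decimal separator, the PID-to-queue mapping has already been worked out, and the host name `node01` and the complex name `cpu_dev` are hypothetical. The reply follows the usual load sensor protocol (wait for a line on stdin, stop on "quit", answer with begin / host:complex:value / end):

```python
import sys


def parse_top_cpu(top_output):
    """Map PID -> %CPU using only the last iteration of `top -b` output."""
    cpu, cpu_col = {}, None
    for line in top_output.splitlines():
        fields = line.split()
        if "PID" in fields and "%CPU" in fields:
            # A header line starts a new iteration: drop the earlier values.
            cpu_col, cpu = fields.index("%CPU"), {}
        elif cpu_col is not None and fields and fields[0].isdigit():
            cpu[int(fields[0])] = float(fields[cpu_col])
    return cpu


def format_report(host, pid_queue, cpu):
    """Sum %CPU per queue and format one load report."""
    totals = {}
    for pid, queue in pid_queue.items():
        totals[queue] = totals.get(queue, 0.0) + cpu.get(pid, 0.0)
    lines = ["begin"]
    for queue in sorted(totals):
        lines.append("%s:cpu_%s:%.1f" % (host, queue, totals[queue]))
    lines.append("end")
    return "\n".join(lines)


def run_sensor(host, pid_queue, read_top, stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request line from sge_execd until it sends 'quit'."""
    for request in iter(stdin.readline, ""):
        if request.strip() == "quit":
            break
        report = format_report(host, pid_queue, parse_top_cpu(read_top()))
        stdout.write(report + "\n")
        stdout.flush()


# Made-up output of two iterations; only the second one is meaningful.
SAMPLE = """\
top - 10:00:00 up 1 day,  1:00,  1 user,  load average: 1.00, 1.00, 1.00
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1600 user1     20   0  100000  50000   1000 R   0.0   1.0   1:00.00 solver
 1234 user2     20   0  100000  50000   1000 R   0.0   1.0   2:00.00 mesher

top - 10:02:00 up 1 day,  1:02,  1 user,  load average: 1.00, 1.00, 1.00
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1600 user1     20   0  100000  50000   1000 R  99.5   1.0   3:00.00 solver
 1234 user2     20   0  100000  50000   1000 R  45.0   1.0   2:54.00 mesher
"""
report = format_report("node01", {1600: "dev", 1234: "dev"},
                       parse_top_cpu(SAMPLE))
print(report)
```

In a real sensor, `read_top` would return the output of the `top` call above, ideally collected in the background so that the answer to SGE is not delayed by the 120 second interval.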
Thanks for your reply. I'm not sure I understand, certainly because my
question was so obscure.
Here is a solution I tried. Assuming that all jobs have ended and that I
want to report the CPU and memory consumed by jobs during the last
year, I dig into /local/export/sge/default/common/accounting and extract
the following results with the attached Python script:
user iusti (51) cpu: 634.871534559 days mem: 0.805913076979 Go
user irphe (135) cpu: 414.912775315 days mem: 1.4525422628 Go
user l3m (252) cpu: 567.227461918 days mem: 1.70951371079 Go
user lma (45) cpu: 1139.86254098 days mem: 1.71787710938 Go
user latp (106) cpu: 127.5595829 days mem: 1.41270344795 Go
I'm just summing the cpu and mem values of each user for a queue defined
on one host.
The only problem is that the mem record is wrong because of the "4 GB" bug in
SGE 6.2u5.
Another point that is unclear to me: can I detect whether a job is an OpenMP
or a serial job by looking at the accounting file?
For instance, I can run the following command:
awk -F':' 'BEGIN {printf("owner group job_number granted_pe cpu mem category\n")}
/<hostname>/ && /<queuename>/ {printf("%s %s %s %s %s %s %s\n", $4, $3, $6, $34, $37, $38, $40)}' \
/local/export/sge/default/common/accounting | less
owner group job_number granted_pe cpu mem category
user1 iusti 13496 impi 2572922.350000 5628845.960567 -q dev -pe impi 10
user1 iusti 13925 impi 15400293.770000 15728826.955261 -q dev -pe impi 25
user2 iusti 13926 NONE 602366.910000 768713.172452 -q dev
user2 iusti 14088 NONE 606897.320000 779858.199665 -q dev
user1 iusti 14169 NONE 0.382940 0.003174 -U arusers -ar 2002
Does "NONE" in granted_pe indicate an OpenMP job?
Thanks,
gerard
#!/usr/bin/env python
# Sum the cpu and mem fields of the SGE accounting file per group for one host.

# Field names of one accounting record, in file order.
keys = ['qname', 'hostname', 'group', 'owner', 'job_name', 'job_number',
        'account', 'priority', 'submission_time',
        'start_time', 'end_time', 'failed', 'exit_status',
        'ru_wallclock',
        'ru_utime', 'ru_stime', 'ru_maxrss', 'ru_ixrss',
        'ru_ismrss', 'ru_idrss', 'ru_isrss',
        'ru_minflt', 'ru_majflt', 'ru_nswap', 'ru_inblock',
        'ru_oublock', 'ru_msgsnd', 'ru_msgrcv',
        'ru_nsignals', 'ru_nvcsw', 'ru_nivcsw',
        'project', 'department', 'granted_pe', 'slots',
        'task_number',
        'cpu', 'mem', 'io', 'category', 'iow', 'pe_taskid',
        'maxvmem',
        'arid', 'ar_submission_time']
hostname = 'holopherne'
# These names are matched against the 'group' field of each record.
users = ['iusti', 'irphe', 'l3m', 'lma', 'latp', 'analysenum', 'edp',
         'webservd']

# Keep only the records of jobs that ran on the chosen host.
liste_values = []
with open("/local/export/sge/default/common/accounting", 'r') as f:
    for line in f:
        if not line.startswith('#'):
            d = dict(zip(keys, line.strip().split(':')))
            if d.get('hostname') == hostname:
                liste_values.append(d)

for user in users:
    cpu = 0.
    mem = 0.
    njob = 0
    for value in liste_values:
        if user == value['group']:
            njob = njob + 1
            cpu = cpu + float(value['cpu'])
            mem = mem + float(value['mem'])
    if cpu > 3600:
        res1 = "cpu: %s days" % (cpu / 3600 / 24)
    else:
        res1 = "cpu: %s s" % (cpu)
    # mem is an integral in GB*s; dividing by the CPU seconds gives an
    # average in GB. Guard against a zero cpu sum.
    if mem > 0. and cpu > 0.:
        res2 = "mem: %s Go" % (mem / cpu)
    else:
        res2 = "mem: %s Go.s" % (mem)
    print("user %s \t(%d jobs) \t %s \t %s" % (user, njob, res1, res2))