Re: [gridengine users] simple report with qstat -F ?

Reuti Fri, 20 Jan 2012 10:56:35 -0800

On 20.01.2012 at 14:53, Gerard Henry wrote:

> hello all,
> i need to extract some stats from one particular node, belonging to many 
> queues:
> $ qstat -F
> queue1@holopherne                 BIP   0/0/30         30.03 lx24-amd64
> ...
>        hl:mem_used=21.019G
>        hl:swap_used=33.395M
>        hl:virtual_used=21.052G
>        hl:cpu=93.700000


These are "hl" = "host load" values; they are not counted per queue. While it 
might work to get virtual_used/mem_used per queue, I don't think the operating 
system tells you which process's pages are sitting in swap space right now. The 
page faults can be accessed in /proc, and after the job has finished, though. 
For the memory:

- get the processes in a queue on a machine
- get their additional group id
- scan /proc for the real processes having the additional group id attached
- then the memory in stat, statm and status can be added up
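
The steps above could be sketched roughly like this in Python. It is only an 
illustration, not something SGE ships: the function names are made up, it looks 
only at the "Groups" and "VmRSS" lines of /proc/<pid>/status, and summing VmRSS 
counts pages shared between processes more than once.

```python
import os


def parse_status(text):
    """Return (supplementary group ids, resident set size in kB)
    parsed from the text of one /proc/<pid>/status file."""
    groups, rss_kb = set(), 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Groups":
            groups = {int(g) for g in value.split()}
        elif key == "VmRSS":
            # The value looks like "    2048 kB"; kernel threads have no such line.
            rss_kb = int(value.split()[0])
    return groups, rss_kb


def iter_status_texts(proc_root="/proc"):
    """Yield the status text of every process currently visible in /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                yield fh.read()
        except OSError:
            continue  # the process exited while we were scanning


def rss_kb_for_gid(status_texts, gid):
    """Add up VmRSS (kB) of all processes carrying the additional group id."""
    total = 0
    for text in status_texts:
        groups, rss_kb = parse_status(text)
        if gid in groups:
            total += rss_kb
    return total
```

With the additional group id of a job at hand, calling 
rss_kb_for_gid(iter_status_texts(), gid) gives the total for that job; adding 
up the jobs of one queue instance gives the per-queue value.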

Well, the CPU time spent right now on a process is not reported per process, 
at least I've never heard of it. The kernel schedules the processes outside of 
SGE and the nice values are only relative. So it could only be reported over 
short time intervals, like: in the last 2 seconds 50% was spent on this process 
and 10% on each of the following processes. But a second later this might 
already change, as one is waiting for I/O. I would expect that `top` does it 
this way, as the values are more constant when you specify a longer refresh 
interval.

As in the outline for the memory above, it should be possible to compute the 
difference in consumed CPU time over the last 1/5/15 minutes for each process, 
and this way compute the amount of CPU spent on the jobs in a particular queue.
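
A minimal sketch of that difference computation, assuming the Linux layout of 
/proc/<pid>/stat, where fields 14 and 15 are utime and stime in clock ticks. 
The function names are hypothetical, and the tick rate would normally be taken 
from os.sysconf("SC_CLK_TCK").

```python
def cpu_ticks(stat_text):
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces, so split
    # only what follows the last ')'. That remainder starts at field 3 (state).
    rest = stat_text.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """Share of one CPU used between two samples taken interval_s apart."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)
```

Sampling every process of a job twice, e.g. 60 seconds apart, and adding up 
the differences gives the CPU used by the job in that minute; a value above 
100 just means more than one core was busy.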

==

Another idea: build a load sensor (i.e. one complex needs to be defined per 
queue in question and attached to each exechost) and feed:

top -b -n 2 -d 120 -p 1600,1234 | tail -n 3

to it, where 1600 and 1234 are the PIDs of the jobs whose additional group id 
was found before in $SGE_SPOOL_DIR/active_jobs. I suggest -n 2, as the first 
output doesn't show anything meaningful regarding CPU time because the interval 
is too short. This means reading one cycle ahead to avoid a delay in replying to 
SGE. The values can then be sorted by queue and assigned to the appropriate 
complexes for each queue on each host.
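
The reporting side of such a load sensor could look roughly like the sketch 
below. It relies on the usual load sensor protocol: the execd writes a line to 
the sensor's stdin for each load interval, the sensor answers with a 
begin/end block of host:complex:value lines, and it exits on "quit". The 
complex names (cpu_queue1, ...) and the collect callback are assumptions for 
illustration; collect would return the per-queue values gathered one cycle 
ahead.

```python
import sys


def format_report(host, values):
    """Build one load report block: begin, host:complex:value lines, end."""
    lines = ["begin"]
    for name, value in sorted(values.items()):
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(host, collect, stdin=sys.stdin, stdout=sys.stdout):
    """Answer every request line from the execd until it sends 'quit'."""
    for line in stdin:
        if line.strip() == "quit":
            break
        # collect() should return already-gathered values so the reply
        # is not delayed by a long sampling interval.
        stdout.write(format_report(host, collect()))
        stdout.flush()
```

For example, run_sensor("holopherne", lambda: {"cpu_queue1": 50.0}) would 
report a cpu_queue1 value for that host on each request.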

-- Reuti


> ...
> queue2@holopherne               BIP   0/0/30         30.03    lx24-amd64
> ...
>        hl:mem_used=21.019G
>        hl:swap_used=33.395M
>        hl:virtual_used=21.052G
>        hl:cpu=93.700000
> etc...
> 
> 
> what i need is approximatively the memory and cpu consumed by each queue 
> (queue1, queue2, etc...) and i'm surprised because the values are the same!? 
> I know that the values are huge than that.
> i know that i can extract values from accounting file, but i'm wondering if 
> qstat can do the work?
> 
> thanks in advance for help,
> 
> gerard
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

