On 20.01.2012 at 14:53, Gerard Henry wrote:

> hello all,
> i need to extract some stats from one particular node, belonging to many
> queues:
> $ qstat -F
> queue1@holopherne BIP 0/0/30 30.03 lx24-amd64
> ...
> hl:mem_used=21.019G
> hl:swap_used=33.395M
> hl:virtual_used=21.052G
> hl:cpu=93.700000

These are "hl" = "host load" values; they are not counted per queue. While it might work to have virtual_used/mem_used per queue, I don't think the operating system tells you which process's pages are sitting in swap space right now. The page faults can be read from /proc, and are also available after the job has finished, though.

For the memory:

- get the processes in a queue on a machine
- get their additional group id
- scan /proc for the real processes having the additional group id attached
- then the memory in stat, statm and status can be added up

Well, the CPU time being spent on a process right now is not reported per process; at least I've never heard of it. The kernel schedules the processes outside of SGE, and the nice values are only relative. So it could only be reported over short time intervals, like: in the last 2 seconds I spent 50% on this process and 10% on each of the following processes. But a second later it might already have changed, as one of them is waiting for I/O. I would expect that `top` does it this way, as the values are more constant when you specify a longer refresh interval.

As in the outline for the memory above, it should be possible to compute the difference in used-up CPU time over the last 1/5/15 minutes for each process, and to compute the amount of CPU spent on the jobs in a particular queue this way.

==

Another idea: build a load sensor (i.e. one complex needs to be defined per queue in question and attached to each exechost) and feed:

    top -b -n 2 -d 120 -p 1600,1234 | tail -n 3

to it, where 1600 and 1234 are the PIDs of the jobs whose additional group id was found before in $SGE_SPOOL_DIR/active_jobs. I suggest -n 2, as the first output doesn't show anything meaningful regarding CPU time because the interval is too short. This means reading one cycle ahead to avoid a delay in replying to SGE. The values can then be sorted by queue and assigned to the appropriate complex for each queue on each host.

-- Reuti

> ...
> queue2@holopherne BIP 0/0/30 30.03 lx24-amd64
> ...
> hl:mem_used=21.019G
> hl:swap_used=33.395M
> hl:virtual_used=21.052G
> hl:cpu=93.700000
> etc...
>
> what i need is approximatively the memory and cpu consumed by each queue
> (queue1, queue2, etc...) and i'm surprised because the values are the same!?
> I know that the values are huge than that.
> i know that i can extract values from accounting file, but i'm wondering if
> qstat can do the work?
>
> thanks in advance for help,
>
> gerard
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
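
A minimal sketch of the memory outline from the reply above, in Python. It assumes a Linux /proc in which /proc/<pid>/status carries a "Groups:" line (supplementary group ids) and a "VmRSS:" line in kB. The names `parse_status` and `rss_by_gid` are made up for illustration, and the additional group ids are simply passed in; how they are obtained from SGE is not shown here.

```python
import os


def parse_status(text):
    """Return (set of supplementary gids, VmRSS in kB) from status file text."""
    groups, rss_kb = set(), 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Groups":
            groups = {int(g) for g in value.split()}
        elif key == "VmRSS":
            # Line looks like "VmRSS:     1234 kB"; kernel threads have none.
            rss_kb = int(value.split()[0])
    return groups, rss_kb


def rss_by_gid(gids, proc_root="/proc"):
    """Sum resident memory (kB) of all processes carrying each given gid.

    proc_root is a parameter only so the function can be tried on a
    fake directory tree; on a real host the default applies.
    """
    totals = {gid: 0 for gid in gids}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                groups, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process ended between listdir() and open()
        for gid in groups.intersection(totals):
            totals[gid] += rss_kb
    return totals
```

Adding these per-gid totals up by queue then only needs a gid-to-queue mapping, which would come from the first two steps of the outline.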

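
The CPU-time difference suggested in the reply could be sketched along the same lines: read utime and stime (fields 14 and 15 of /proc/<pid>/stat, counted in clock ticks) at two points in time and divide the difference by the elapsed interval. The helper names `cpu_ticks` and `cpu_percent` are hypothetical, and the sampling loop itself is left out.

```python
import os


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat text."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split after the last ")" and count fields from there:
    # index 0 is field 3 (state), so fields 14 and 15 are indexes 11 and 12.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s=None):
    """CPU usage over the interval, as a percentage of one core."""
    if ticks_per_s is None:
        ticks_per_s = os.sysconf("SC_CLK_TCK")  # usually 100 on Linux
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)
```

Summing `cpu_percent` over all processes that share a job's additional group id, and then over all jobs in a queue, would give a per-queue figure for the chosen interval.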