Hi,

Am 06.07.2011 um 21:24 schrieb Peskin, Eric:

> We have a cluster managed by SGE.  Specifically, this is the version from the 
> rpm sge-V62u4-1 installed on a Linux cluster running Rocks.
> 
> We want to start charging users for use of the cluster.  Our plan is to 
> charge by the core hour.  In other words, using one CPU core for 12 hours 
> should cost the same as using 12 CPU cores for 1 hour.
> 
> I think I should be able to gather this information using qacct.  For 
> example, if I understand correctly,
> 
>       qacct -o -d 30
> 
> should output usage for each user over the last 30 days.

not directly. It looks into the accounting file for jobs which started in the
last 30 days and have already finished. For jobs which are still running it
can't produce any output, and jobs started before "now - 30 days" aren't taken
into account either.

To start, you can look at the individual accounting entry for a job with "-j <job_id>".
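
For example (the job id is just a placeholder):

      qacct -j 12345

will print the complete record of that job, i.e. ru_wallclock, ru_utime,
ru_stime, cpu, mem, io, maxvmem, the granted slots and so on.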


> However, I am still unclear (after having read the man page and some web 
> sites) on the interpretation of the various columns of output from qacct 
> (especially WALLCLOCK, UTIME, STIME, and CPU).

While UTIME and STIME are values computed by the kernel, CPU and MEM (and IO,
too) are computed by SGE's shepherd. As long as no process of a (serial or
OpenMP) job escapes from the process tree, the values computed by the kernel
should be almost the same as the ones computed by SGE. SGE uses an additional
group ID to keep track of a job's processes, while the kernel can only generate
the correct values in case of a normal end of a job. If you used `qdel`, these
values (the ones from the kernel) are often wrong.
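
To give a rough feeling for the columns (the numbers are only illustrative): a
tightly integrated job that got 4 slots and keeps all 4 cores busy for one hour
will show a WALLCLOCK of about 3600 but a CPU of about 14400, while a
single-core job that mostly waits for I/O will show a CPU well below its
WALLCLOCK. UTIME + STIME will only be close to CPU under the conditions above,
i.e. nothing left the process tree and the job ended normally.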


>  Sometimes the value in the WALLCLOCK column is greater than that in the CPU 
> column and sometimes it is the other way around.  Also, sometimes the CPU 
> column is the sum of UTIME + STIME, but sometimes it is quite different.
> 
> I want to make sure we get this right in the face of parallelism.  SGE has 
> multiple ways to run jobs in parallel (e.g., qmake, array jobs, and parallel 
> environments).

If you run a parallel job, it's important to have a proper tight integration of
all slave processes into SGE, i.e. slave tasks are started by `qrsh -inherit
...` and not by `ssh` or `rsh` from one node to another. One way to spot a
wrong setup is to disable `ssh` and `rsh` inside the cluster (I usually limit
them to admin staff) and force users to use `qrsh -l hostname=node22` or alike
even when they just want to check the state of a job on a node (here you could
also limit h_cpu to 60 to avoid abuse of this granted feature).
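
Such a check could look like the following (node name and limit value are just
placeholders; the h_cpu limit could instead be enforced on the interactive
queue itself):

      qrsh -l hostname=node22,h_cpu=60

i.e. the user gets a shell on node22 which is killed after 60 seconds of CPU
time.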

If you have this in place, you will get one entry per job plus one for each
`qrsh -inherit ...`, and the summarized output (like you used above) will add
these up accordingly, as long as you set "accounting_summary false" in the PE
definition. If you set "accounting_summary true" instead, there will be only
one entry per job and the values for CPU, MEM and IO are summed up over all
slave tasks (unfortunately not the kernel-reported values, but there is an RFE
to add them too).
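
Just as an illustration, a tightly integrated PE with summarized accounting
could look like this (the PE name, slot count and allocation rule are only
examples):

      $ qconf -sp mpi
      pe_name            mpi
      slots              256
      user_lists         NONE
      xuser_lists        NONE
      start_proc_args    /bin/true
      stop_proc_args     /bin/true
      allocation_rule    $fill_up
      control_slaves     TRUE
      job_is_first_task  FALSE
      urgency_slots      min
      accounting_summary TRUE

control_slaves TRUE is what allows the `qrsh -inherit ...` calls of a tight
integration in the first place.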

Array jobs are nothing special here. You will get one entry per task (for
serial array jobs), and you can use "-t <taskid>" (together with "-j <job_id>")
to get a single task's entry out of the accounting.
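
For example (job and task ids are placeholders):

      qacct -j 12345 -t 7

will print the accounting record of task 7 of array job 12345 only.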


> What is the most reliable way to track core-hours, such that occupying 100 
> cores for a day costs 100 times as much as occupying just 1 core for a day?

Occupying or using?
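
If you mean occupying: the granted slots times ru_wallclock from the raw
accounting file would be the figure to charge, independent of whether the cores
were actually busy. A rough sketch (field numbers according to accounting(5);
it assumes the default cell directory and "accounting_summary true" for
parallel jobs, so that each job has exactly one line):

      # owner = field 4, end_time = field 11, ru_wallclock = field 14, slots = field 35
      awk -F: '$11 > 0 { ch[$4] += $14 * $35 / 3600 }
               END { for (u in ch) printf "%-12s %10.1f core-hours\n", u, ch[u] }' \
          $SGE_ROOT/default/common/accounting

If you mean using, the CPU column would be the basis instead.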

-- Reuti


> Any advice would be greatly appreciated.
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
