Hi,

On 18.04.2014 at 22:25, Ilya M wrote:

> I have been using h_vmem as a consumable resource to limit the amount of 
> memory users can request and to make sure jobs don't use more than they 
> requested. It all has been working fine until we added nodes with GPU modules.
> 
> The memory model in CUDA applications is such that the virtual address 
> space is expanded into a single unified address space covering regular 
> host memory and GPU memory (and something else, because I have seen 
> reported memory usage twice the size of the virtual memory).
> 
> This results in jobs getting killed for exceeding h_vmem, whereas in 
> fact the actual memory usage was quite low. So this effectively renders 
> h_vmem useless, because it either kills jobs that should not be killed or 
> has to be set very high (2-3 times the size of the virtual memory) on the 
> host and the jobs to allow jobs to run at all.
> 
> I was wondering if there were good solutions or practices on controlling 
> memory usage on GPU-equipped nodes. I am using SGE 6.2u5.

Do you need to control the jobs' behavior by using h_vmem? An alternative could 
be to define virtual_free as a consumable and give it a value of 2 or 3 times 
the physical memory per exechost, so that it helps SGE schedule jobs to the 
best-suited node. Jobs won't get killed when they exceed the requested 
resource, though.
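For the archives, a minimal sketch of that setup. The hostname `node01` and the values `96G`/`4G` are placeholders, not recommendations; adjust to your hardware:

```shell
# 1. Make virtual_free a consumable complex.
#    qconf -mc opens the complex configuration in $EDITOR; change the
#    "consumable" column of the virtual_free line from NO to YES:
#      virtual_free  vf  MEMORY  <=  YES  YES  0  0
qconf -mc

# 2. On each execution host, set the schedulable amount to 2-3x the
#    physical RAM, e.g. in the host editor session add:
#      complex_values  virtual_free=96G
qconf -me node01

# 3. Users request memory at submit time; the scheduler subtracts the
#    request from the host's remaining virtual_free:
qsub -l virtual_free=4G myjob.sh
```

Since virtual_free is only a scheduling bookkeeping value here (no kill limit), CUDA's inflated virtual address space no longer causes spurious job terminations.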

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
