Hi,

On 22.04.2014 at 01:17, Ilya M wrote:
>>> I have been using h_vmem as a consumable resource to limit the amount of
>>> memory users can request and to make sure jobs don't use more than they
>>> requested. It all has been working fine until we added nodes with GPU
>>> modules.
>>>
>>> The memory model in CUDA applications is such that the address space for
>>> virtual memory is expanded to have a single address space for regular
>>> memory and GPU memory (and something else, because I saw reported memory
>>> usage 2-fold of the virtual memory).
>>>
>>> This results in jobs getting killed because of exceeding h_vmem, whereas in
>>> fact the actual memory usage was really low. So this effectively renders
>>> h_vmem useless, because it either kills jobs that should not be killed or
>>> needs to be set very high (2-3 times the size of virtual memory) for the
>>> host and jobs to allow jobs to run.
>>>
>>> I was wondering if there were good solutions or practices on controlling
>>> memory usage on GPU-equipped nodes. I am using SGE 6.2u5.
>>
>> Do you need to control the jobs' behavior by using h_vmem? An alternative
>> could be to define virtual_free as consumable and with a times 2 or 3 value
>> per exechost, so that it helps SGE to schedule jobs to the best suited node.
>> Jobs won't get killed when they pass the requested resource though.
>>
> I had created a custom consumable to track available/used memory on a node
> without the possibility to kill a job. This should help SGE correctly
> dispatch jobs. I think this is the same thing you're suggesting. I was hoping
> there was something else I overlooked.

Using a custom consumable is almost the same. The small difference is that
for virtual_free the measured load value is still taken into account, and the
tighter of the two constraints will be used. In `qhost -F` you can see which
one applies by the prefix of the value: hl: when the measured load value is
the limit, hc: when the consumable bookkeeping is the limit.
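
To make the hl:/hc: distinction easier to spot across many hosts, here is a
minimal sketch of a filter for the `qhost -F virtual_free` output. The
function name vf_source is made up for illustration, and the parsing assumes
the usual layout (host lines unindented, resource lines indented as
hl:virtual_free=VALUE or hc:virtual_free=VALUE). The commented setup commands
use placeholder host names and sizes; adjust them to your site.

```shell
# Setup sketch (placeholder values, not run by this snippet):
#   qconf -mc      # set the "consumable" column of virtual_free to YES
#   qconf -mattr exechost complex_values virtual_free=64G node01
#   qsub -l virtual_free=4G job.sh

# Hypothetical helper: reads `qhost -F virtual_free` output on stdin and
# prints "<host> <load|consumable> <value>" for every virtual_free line.
vf_source() {
  awk '
    # Unindented lines (except the header and the dashed rule) name a host.
    /^[^ \t-]/ && $1 != "HOSTNAME" { host = $1 }
    # Indented resource lines carry the hl:/hc: prefix.
    /^[ \t]+h[lc]:virtual_free=/ {
      split($1, kv, "=")
      kind = (substr(kv[1], 1, 2) == "hc") ? "consumable" : "load"
      print host, kind, kv[2]
    }'
}
```

Usage would be something like `qhost -F virtual_free | vf_source`. A host
reported as "load" is currently limited by the measured value, one reported
as "consumable" by the requests already booked against it.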
-- Reuti
