Hi,

On 18.04.2014 at 22:25, Ilya M wrote:

> I have been using h_vmem as a consumable resource to limit the amount of
> memory users can request and to make sure jobs don't use more than they
> requested. It all has been working fine until we added nodes with GPU
> modules.
>
> The memory model in CUDA applications is such that the address space for
> virtual memory is expanded to provide a single address space for regular
> memory and GPU memory (and something else, because I saw reported memory
> usage twice the size of the virtual memory).
>
> This results in jobs getting killed for exceeding h_vmem, whereas in fact
> the actual memory usage was really low. So this effectively renders h_vmem
> useless, because it either kills jobs that should not be killed or needs
> to be set very high (2-3 times the size of virtual memory) for the host
> and jobs to allow jobs to run.
>
> I was wondering if there are good solutions or practices for controlling
> memory usage on GPU-equipped nodes. I am using SGE 6.2u5.

Do you need to control the jobs' behavior by using h_vmem? An alternative
could be to define virtual_free as a consumable with a value of 2 or 3
times the physical memory per exechost, so that it helps SGE schedule jobs
to the best-suited node. Jobs won't get killed when they exceed the
requested resource, though.

-- Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
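[Editorial note: the virtual_free setup Reuti suggests could be sketched roughly
as below. Host name and memory sizes are made-up examples; column layout follows
the SGE complex(5) format.]

In the complex configuration (qconf -mc), virtual_free is marked consumable:

    #name          shortcut  type    relop  requestable  consumable  default  urgency
    virtual_free   vf        MEMORY  <=     YES          YES         1G       0

Then each exechost gets an oversubscribed value, e.g. 2x its 64G of RAM:

    qconf -me node01
    # in the editor, set:
    complex_values    virtual_free=128G

Jobs request memory at submission time:

    qsub -l virtual_free=4G job.sh

The scheduler decrements node01's virtual_free bookkeeping by 4G when placing
the job, steering work to hosts with capacity, but unlike a hard h_vmem limit
it does not kill the job if it exceeds its request.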
