Hello,

I have been using h_vmem as a consumable resource to limit the amount of memory users can request and to make sure jobs don't use more than they requested. It all has been working fine until we added nodes with GPU modules.

The memory model in CUDA applications is such that the process's virtual address space is expanded into a single unified address space covering both regular host memory and GPU memory (and apparently something else as well, because I have seen reported memory usage that was double the virtual memory size).
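For illustration, here is a minimal sketch (assuming a Linux host with /proc) of how one can compare a process's virtual size against its resident size; for a CUDA job, VmSize is typically far larger than VmRSS because the driver reserves address space for the unified mapping:

```python
# Read VmSize (virtual) and VmRSS (resident) for a process from /proc.
# "self" inspects the current process; pass a job's PID to inspect it instead.
def mem_kb(pid="self"):
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":")
                sizes[key] = int(value.split()[0])  # value is in kB
    return sizes["VmSize"], sizes["VmRSS"]

vsz, rss = mem_kb()
print(f"VmSize={vsz} kB, VmRSS={rss} kB")
```

Running this inside a CUDA process shows the gap directly: the virtual size (what h_vmem polices) can dwarf the resident size (what the job actually consumes).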

This results in jobs getting killed for exceeding h_vmem, whereas in fact the actual memory usage was very low. This effectively renders h_vmem useless, because it either kills jobs that should not be killed, or must be set very high (2-3 times the actual virtual memory size) on the host and jobs to allow jobs to run.

I was wondering if there were good solutions or practices on controlling memory usage on GPU-equipped nodes. I am using SGE 6.2u5.

Thank you,
Ilya.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
