Hi Alex,

Try setting the consumable to YES instead of JOB. With YES, the request is
applied per slot, which requires the exec host to have an h_vmem limit
defined (in your case, 46G). With JOB, the resource is debited from the
queue resource, which in your case, I suspect, has no limit (h_vmem is set
to INFINITY).
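
For reference, a minimal sketch of that change. It assumes the usual qconf
options (-sc to dump the complex list, -Mc to load it back from a file); the
function name and the /tmp paths are only placeholders of mine. The awk filter
rewrites the consumable column (field 6) of the h_vmem line from JOB to YES.
It is shown here on the single line from your qconf -sc output, and the steps
that would touch the cluster are left as comments:

```shell
#!/bin/sh
# h_vmem line as printed by `qconf -sc` (copied from the quoted message).
line='h_vmem              h_vmem     MEMORY      <=    YES         JOB 4G       0'

# Field 6 is the "consumable" column; set it to YES for the h_vmem entry only.
# awk re-joins the fields of a modified line with single spaces.
set_consumable_yes() {
    awk '$1 == "h_vmem" { $6 = "YES" } { print }'
}

new_line=$(printf '%s\n' "$line" | set_consumable_yes)
echo "$new_line"

# On a live cluster the same filter could be run over the whole complex list.
# Left commented out on purpose: run as an SGE manager and review the file
# before loading it.
#   qconf -sc > /tmp/complexes.txt
#   set_consumable_yes < /tmp/complexes.txt > /tmp/complexes.new
#   qconf -Mc /tmp/complexes.new
#
# To check the queue-level limit (queue name "standard" is taken from the
# truncated "standard@s" in the qhost output):
#   qconf -sq standard | grep h_vmem
```

The interactive equivalent is qconf -mc, which opens the same list in an
editor.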

According to Reuti, this may be a bug. The expected behaviour is that with
the consumable set to JOB, the node's tighter h_vmem limit should still take
effect even when the queue's default h_vmem is INFINITY. However, that does
not seem to be happening (Brett Taylor had the same issue last month).
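
The numbers in your qhost output suggest the host's h_vmem is being debited
but not enforced. A quick check using only values from your message (the
variable names are mine):

```shell
#!/bin/sh
host_h_vmem=46     # G, complex_values h_vmem on scg1-4-8
per_job=16         # G, h_vmem requested by each job
running_jobs=11    # jobs listed under scg1-4-8 in the qhost output

# What is left on the host after debiting every running job once.
remaining=$(( host_h_vmem - running_jobs * per_job ))

# How many such jobs fit if the 46G limit were honoured (integer division).
max_jobs=$(( host_h_vmem / per_job ))

echo "remaining h_vmem: ${remaining}G"
echo "jobs that should fit: ${max_jobs}"
```

The first figure matches the hc:h_vmem=-130.000G reported for the host, while
the second shows that only two of these jobs should have been placed there.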

Ian



On Wed, Jan 9, 2013 at 12:29 AM, Alex Chekholko <[email protected]> wrote:

> Hi all,
>
> I have what seems like a straightforward problem.
>
> This is on Open Grid Scheduler 2011.11...
>
> h_vmem is configured as consumable:
> # qconf -sc | grep  h_vmem
> h_vmem              h_vmem     MEMORY      <=    YES         JOB 4G       0
>
> The exec host is configured to have 46G of h_vmem:
> # qconf -se scg1-4-8 | grep h_vmem
> complex_values        h_vmem=46G,slots=12
>
> The user requests 16G of h_vmem in his job:
> # qstat -f -j 480039 |grep h_vmem
> hard resource_list:         h_vmem=16G
>
>
>
> But the scheduler puts a whole bunch of these on the same node!
>
> # qhost -j -F h_vmem
> ...
>
> scg1-4-8                linux-x64      24     -   47.3G       -    9.8G       -
>     Host Resource(s):      hc:h_vmem=-130.000G
>     480039 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480041 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480042 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480043 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480044 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480045 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480046 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480047 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480048 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480049 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>     480050 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>
>
>
> How do I go about troubleshooting this?  It seems to have been working
> fine for a while (months); it only started doing this a few days ago.
>
> I did have the same problem earlier, but as I understand it, there is a
> bug related to either multiple queue instances on a host or multiple
> consumable requests for the job, but in this case it is neither.  The host
> only has this one queue instance, and the job only requests this one
> complex.
>
> Suggestions?
>
> Regards,
> --
> Alex Chekholko [email protected]
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
>



-- 
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
