Hi, I think I am having the same issue. My installation didn't work as expected until I changed my complex h_vmem from consumable=JOB to consumable=YES.
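
In case it helps, this is roughly what the change looks like here (same column layout as the qconf output quoted below; the default and urgency values are just examples, not copied from my cluster):

# qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         YES        4G       0

With consumable=YES the request is debited per slot, so for the 8-slot job below the users now ask for 5G per slot (8 x 5G = 40G in total) instead of 40G for the whole job:

$> qsub -q long.q -l h_vmem=5G,long=true -pe smp 8 -b y /bin/sleep 60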
This bug(?) only affected me when submitting jobs with multiple slots (-pe smp 8) to a queue which has another complex. My submit was something like this:

$> qsub -q long.q -l h_vmem=40G,long=true -pe smp 8 -b y /bin/sleep 60

Now I have switched to consumable=YES and asked my users to reserve h_vmem per slot, not per job, and everything seems to work.

Regards,
Pablo.

2013/1/9 Ian Kaufman <[email protected]>:
> Hi Alex,
>
> Try setting the consumable to YES instead of JOB. Setting it to YES means
> applying it per slot, and requires the exec host to have an h_vmem limit
> defined (in your case, 46G). Setting it to JOB, the resource is debited from
> the queue resource, which in your case, I suspect, has no limit (h_vmem is
> set to INFINITY).
>
> According to Reuti, this may be a bug. The expected behaviour is that, even
> with the queue's default h_vmem set to INFINITY, setting it to JOB should
> still let the node's tighter h_vmem limit take effect. However, it seems that
> is not happening (Brett Taylor had the same issue last month).
>
> Ian
>
>
> On Wed, Jan 9, 2013 at 12:29 AM, Alex Chekholko <[email protected]> wrote:
>>
>> Hi all,
>>
>> I have what seems like a straightforward problem.
>>
>> This is on Open Grid Scheduler 2011.11...
>>
>> h_vmem is configured as consumable:
>>
>> # qconf -sc | grep h_vmem
>> h_vmem              h_vmem     MEMORY      <=    YES         JOB        4G       0
>>
>> The exec host is configured to have 46G of h_vmem:
>>
>> # qconf -se scg1-4-8 | grep h_vmem
>> complex_values        h_vmem=46G,slots=12
>>
>> The user requests 16G of h_vmem in his job:
>>
>> # qstat -f -j 480039 | grep h_vmem
>> hard resource_list:         h_vmem=16G
>>
>> But the scheduler puts a whole bunch of these on the same node!
>>
>> # qhost -j -F h_vmem
>> ...
>> scg1-4-8                linux-x64      24    -   47.3G       -    9.8G       -
>>     Host Resource(s):      hc:h_vmem=-130.000G
>>     480039 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480041 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480042 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480043 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480044 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480045 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480046 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480047 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480048 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480049 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480050 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>
>> How do I go about troubleshooting this? It seems to have been working
>> fine for a while (months); it just started doing this a few days ago.
>>
>> I did have the same problem earlier, but as I understand it, there is a
>> bug related to either multiple queue instances on a host or multiple
>> consumable requests for the job, but in this case it is neither. The host
>> only has this one queue instance, and the job only requests this one
>> complex.
>>
>> Suggestions?
>>
>> Regards,
>> --
>> Alex Chekholko  [email protected]
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering
> ikaufman AT ucsd DOT edu
