Hi, I think I am having the same issue. My installation didn't work as expected until I changed my complex h_vmem from consumable=JOB to consumable=YES.
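
In case it helps, this is roughly what the change looks like here (same column layout as the qconf output quoted below; the default and urgency values are just examples, not copied from my cluster):

# qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         YES        4G       0

With consumable=YES the request is debited per slot, so for the 8-slot job below the users now ask for 5G per slot (8 x 5G = 40G in total) instead of 40G for the whole job:

$> qsub -q long.q -l h_vmem=5G,long=true -pe smp 8 -b y /bin/sleep 60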
This bug(?) only affected me when submitting jobs with multiple slots (-pe smp 8) to a queue which has another complex. My submit was something like this:

$> qsub -q long.q -l h_vmem=40G,long=true -pe smp 8 -b y /bin/sleep 60

Now I have switched to consumable=YES and asked my users to reserve h_vmem per slot, not per job, and everything seems to work.

Regards,
Pablo.

2013/1/9 Ian Kaufman <[email protected]>:
> Hi Alex,
>
> Try setting the consumable to YES instead of JOB. Setting it to YES means
> applying it per slot, and requires the exec host to have an h_vmem limit
> defined (in your case, 46G). Setting it to JOB, the resource is debited from
> the queue resource, which in your case, I suspect, has no limit (h_vmem is
> set to INFINITY).
>
> According to Reuti, this may be a bug. The expected behaviour is that, even
> with the queue's default h_vmem set to INFINITY, setting it to JOB should
> still let the node's tighter h_vmem limit take effect. However, it seems that
> is not happening (Brett Taylor had the same issue last month).
>
> Ian
>
>
> On Wed, Jan 9, 2013 at 12:29 AM, Alex Chekholko <[email protected]> wrote:
>>
>> Hi all,
>>
>> I have what seems like a straightforward problem.
>>
>> This is on Open Grid Scheduler 2011.11...
>>
>> h_vmem is configured as consumable:
>>
>> # qconf -sc | grep h_vmem
>> h_vmem              h_vmem     MEMORY      <=    YES         JOB        4G       0
>>
>> The exec host is configured to have 46G of h_vmem:
>>
>> # qconf -se scg1-4-8 | grep h_vmem
>> complex_values        h_vmem=46G,slots=12
>>
>> The user requests 16G of h_vmem in his job:
>>
>> # qstat -f -j 480039 | grep h_vmem
>> hard resource_list:         h_vmem=16G
>>
>> But the scheduler puts a whole bunch of these on the same node!
>>
>> # qhost -j -F h_vmem
>> ...
>> scg1-4-8                linux-x64      24    -   47.3G       -    9.8G       -
>>     Host Resource(s):      hc:h_vmem=-130.000G
>>     480039 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480041 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480042 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480043 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480044 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480045 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480046 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480047 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480048 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480049 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>     480050 0.50074 a_STL002-O shinlin      r     01/08/2013 20:21:18 standard@s MASTER
>>
>> How do I go about troubleshooting this? It seems to have been working
>> fine for a while (months); it just started doing this a few days ago.
>>
>> I did have the same problem earlier, but as I understand it, there is a
>> bug related to either multiple queue instances on a host or multiple
>> consumable requests for the job, but in this case it is neither. The host
>> only has this one queue instance, and the job only requests this one
>> complex.
>>
>> Suggestions?
>>
>> Regards,
>> --
>> Alex Chekholko  [email protected]
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering
> ikaufman AT ucsd DOT edu
