Hi all,
I have what seems like a straightforward problem.
This is on Open Grid Scheduler 2011.11...
h_vmem is configured as consumable:
# qconf -sc | grep h_vmem
h_vmem h_vmem MEMORY <= YES JOB 4G 0
The exec host is configured to have 46G of h_vmem:
# qconf -se scg1-4-8 | grep h_vmem
complex_values h_vmem=46G,slots=12
The user requests 16G of h_vmem in his job:
# qstat -f -j 480039 |grep h_vmem
hard resource_list: h_vmem=16G
But the scheduler puts a whole bunch of these on the same node!
# qhost -j -F h_vmem
...
scg1-4-8 linux-x64 24 - 47.3G - 9.8G -
Host Resource(s): hc:h_vmem=-130.000G
480039 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480041 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480042 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480043 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480044 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480045 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480046 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480047 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480048 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480049 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
480050 0.50074 a_STL002-O shinlin r 01/08/2013 20:21:18 standard@s MASTER
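A quick sanity check on that negative figure, assuming each of the 11
running jobs listed above is debited its full 16G request against the
host's 46G. This is plain shell arithmetic with illustrative variable
names; nothing here queries Grid Engine:

```shell
#!/bin/sh
# Values copied from the output above; units are gigabytes.
capacity=46   # complex_values h_vmem=46G on scg1-4-8
per_job=16    # hard resource_list: h_vmem=16G
running=11    # jobs 480039 and 480041 through 480050 shown by qhost

# Remaining consumable if every running job is debited in full.
remaining=$((capacity - running * per_job))

# Largest number of such jobs that fits without going negative
# (integer division rounds down).
max_jobs=$((capacity / per_job))

echo "remaining=${remaining}G max_jobs=${max_jobs}"
```

This prints remaining=-130G max_jobs=2, which matches the
hc:h_vmem=-130.000G reported by qhost. So the bookkeeping appears to be
correct, and only two of these jobs should have been placed on the host.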
How do I go about troubleshooting this? It had been working fine for
months and only started doing this a few days ago.
I ran into the same problem earlier; as I understand it, there is a bug
triggered by either multiple queue instances on a host or multiple
consumable requests in a job. Neither applies in this case: the host
has only this one queue instance, and the job requests only this one
complex.
Suggestions?
Regards,
--
Alex Chekholko [email protected]