Having done a bit of investigation, it seems that the problem we're hitting is that our h_vmem limits aren't being respected when jobs are submitted as parallel jobs.
If I put two jobs in:

$ qsub -o test.log -l h_vmem=1000G hostname
Your job 343719 ("hostname") has been submitted

$ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
Your job 343720 ("hostname") has been submitted

The first job won't be scheduled:

scheduling info:    cannot run in queue instance "all.q@compute-1-6.local" because it is not of type batch
                    cannot run in queue instance "all.q@compute-1-1.local" because it is not of type batch
                    cannot run in queue instance "all.q@compute-1-5.local" because it is not of type batch
                    cannot run in queue instance "all.q@compute-1-3.local" because it is not of type batch
                    cannot run in queue instance "all.q@compute-1-2.local" because it is not of type batch
                    (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
                    cannot run in queue instance "all.q@compute-1-4.local" because it is not of type batch
                    cannot run in queue instance "all.q@compute-1-0.local" because it is not of type batch
                    (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
                    cannot run in queue instance "all.q@compute-1-7.local" because it is not of type batch
                    (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
                    (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
                    (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
                    (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
                    (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G

But the second is immediately scheduled and overcommits the node it lands on (and the overcommit is reflected by qstat -F h_vmem). The memory usage is recorded and will prevent other jobs from running on that node, but I need to figure out how to make the scheduler respect the resource limit when the job is first submitted.

Any suggestions would be very welcome.

Thanks

Simon.

-----Original Message-----
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of Simon Andrews
Sent: 08 June 2015 13:53
To: users@gridengine.org
Subject: [gridengine users] Negative complex values

Our cluster seems to have ended up in a strange state, and I don't understand why.

We have set up h_vmem as a consumable resource so that users can't exhaust the memory on any compute node. This has been working OK, and in our tests it all seemed right, but we've now found that we've somehow ended up with nodes with negative amounts of memory remaining.

We only have one queue on the system, all.q.

$ qstat -F h_vmem -q all.q@compute-0-3
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-3.local        BP    0/44/64        13.13    lx26-amd64
    hc:h_vmem=-172.000G

..so the node is somehow at -172G memory.

The setup for the resource is as follows (the columns being name, shortcut, type, relop, requestable, consumable, default and urgency):

$ qconf -sc | grep h_vmem
h_vmem    h_vmem    MEMORY    <=    YES    JOB    0    0

We use a jsv to add a default memory allocation to all jobs, and the jobs listed later all carry an h_vmem request (see below).
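For reference, the jsv does something along these lines. This is a simplified sketch rather than the actual script: the 4G default is just a placeholder value, and it assumes the stock helper functions shipped in $SGE_ROOT/util/resources/jsv/jsv_include.sh.

#!/bin/bash
#
# Simplified jsv sketch: add a default h_vmem request to any job that does
# not ask for one itself.  The 4G value is only a placeholder.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   l_hard=$(jsv_get_param l_hard)
   if [ "$l_hard" = "" ]; then
      # no -l requests at all: create one holding the default h_vmem
      jsv_set_param l_hard "h_vmem=4G"
      jsv_correct "added default h_vmem request"
   elif [ "$(jsv_sub_get_param l_hard h_vmem)" = "" ]; then
      # -l requests present, but none of them is h_vmem: add the default
      jsv_sub_add_param l_hard h_vmem 4G
      jsv_correct "added default h_vmem request"
   else
      # job already requests h_vmem: pass it through unchanged
      jsv_accept "h_vmem already requested"
   fi
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main

(jsv_correct tells the qmaster the job was modified before being accepted, while jsv_accept passes it through untouched.)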
..the initialisation of the complex value for the node looks OK:

$ qconf -se compute-0-3 | grep complex
complex_values        h_vmem=128G

The problem seems to stem from an individual job which has managed to commit 200G on a node with only 128G. These are the jobs which are running on that node:

$ qstat -j 341706 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 342549 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 342569 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343337 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343367 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343400 | grep "hard resource_list"
hard resource_list:         h_vmem=200G

(21474836480 bytes is 20G, so that's 5 x 20G plus the 200G job, i.e. 300G booked against a 128G node, which matches the -172G figure above.)

We still have jobs which are queued because there is insufficient memory, so the limit isn't being completely ignored, but I don't understand how the jobs which are currently running were able to be scheduled.

(-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=-172.000G

Does anyone have any suggestions for how the cluster could have got itself into this situation?

Thanks

Simon.
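For reference, the over-booking can be tallied straight from the scheduler's own records. This is a rough sketch, assuming qhost -j -h lists the jobs on the host with the job id in the first column of the job lines:

#!/bin/bash
#
# Rough sketch: print the hard h_vmem request of every job running on one
# node, so the total can be compared with the host's complex_values h_vmem.

host=compute-0-3

for job in $(qhost -j -h "$host" | awk '$1 ~ /^[0-9]+$/ {print $1}' | sort -u); do
   # print "<job id> h_vmem=<request>" for each running job
   qstat -j "$job" | awk -v j="$job" '/hard resource_list/ {print j, $NF}'
done

Comparing the sum of those requests with the host's complex_values h_vmem from qconf -se shows how far the node is over-booked.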