Having done a bit of investigation, it seems that the problem we're hitting is 
that our h_vmem limits aren't being respected when jobs are submitted as 
parallel jobs.

If I submit two jobs:

$ qsub -o test.log -l h_vmem=1000G hostname
Your job 343719 ("hostname") has been submitted

$ qsub -o test.log -l h_vmem=1000G -pe cores 2 hostname
Your job 343720 ("hostname") has been submitted

The first job won't be scheduled:
scheduling info:            cannot run in queue instance "all.q@compute-1-6.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-1.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-5.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-3.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-2.local" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-2.local" because it offers only hc:h_vmem=4.000G
                            cannot run in queue instance "all.q@compute-1-4.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-0.local" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-4.local" because it offers only hc:h_vmem=16.000G
                            cannot run in queue instance "all.q@compute-1-7.local" because it is not of type batch
                            (-l h_vmem=1000G) cannot run at host "compute-0-3.local" because it offers only hc:h_vmem=25.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-6.local" because it offers only hc:h_vmem=-968.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-5.local" because it offers only hc:h_vmem=32.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-0.local" because it offers only hc:h_vmem=32.000G
                            (-l h_vmem=1000G) cannot run at host "compute-0-1.local" because it offers only hc:h_vmem=12.000G


But the second is scheduled immediately and overcommits the node it lands on 
(the overcommit is reflected in the output of qstat -F h_vmem).

The consumed memory is recorded and will prevent further jobs from running on 
that node, but I need to figure out how to make the scheduler respect the 
resource limit at the point the job is first scheduled.
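
One thing I'm wondering about (just a sketch of an idea, not something I've 
tested yet) is whether the "consumable" setting on the complex matters here: 
ours is set to JOB rather than YES, and my understanding is that a per-slot 
(YES) consumable is multiplied by the number of slots a parallel job requests, 
whereas a JOB consumable is debited once per job. If that's right, switching 
it would look something like:

$ qconf -mc
# in the editor, change the consumable column of the h_vmem line from JOB to YES:
# h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0

That would presumably also mean users (and our JSV) have to treat h_vmem as a 
per-slot request rather than a per-job total, so I'd want to confirm the 
semantics before changing anything.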

Any suggestions would be very welcome.

Thanks.

Simon.

-----Original Message-----
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On 
Behalf Of Simon Andrews
Sent: 08 June 2015 13:53
To: users@gridengine.org
Subject: [gridengine users] Negative complex values

Our cluster seems to have ended up in a strange state, and I don't understand 
why.

We have set up h_vmem to be a consumable resource so that users can't exhaust 
the memory on any compute node.  This has been working OK, and in our tests it 
all seemed to behave correctly, but we've now found that some nodes have 
somehow ended up with negative amounts of memory remaining.

We only have one queue on the system, all.q.

$ qstat -F h_vmem -q all.q@compute-0-3
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-3.local        BP    0/44/64        13.13    lx26-amd64
        hc:h_vmem=-172.000G

...so the node is somehow showing -172G of memory available.

The setup for the resource is as follows:

$ qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         JOB        0        0

We use a JSV to add a default memory allocation to all jobs, and the jobs 
listed below all carry an h_vmem request (see later).
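
(For context, the JSV does something along these lines; this is only a minimal 
sketch, and the 4G default and exact layout are illustrative rather than our 
real script:)

#!/bin/bash
# Client-side JSV sketch: add a default h_vmem if the job didn't request one.
. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   if [ "$(jsv_sub_get_param l_hard h_vmem)" = "" ]; then
      # No h_vmem requested: add a default (value illustrative)
      jsv_sub_add_param l_hard h_vmem 4G
      jsv_correct "Added default h_vmem request"
   else
      jsv_accept "Job already requests h_vmem"
   fi
   return
}

jsv_main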

...and the initialisation of the complex value for the node looks OK:

$ qconf -se compute-0-3 | grep complex
complex_values        h_vmem=128G

The problem seems to stem from an individual job which has managed to claim 
200G on a node with only 128G. These are the jobs which are currently running 
on that node:

$ qstat -j 341706 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 342549 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 342569 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343337 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343367 | grep "hard resource_list"
hard resource_list:         h_vmem=21474836480
$ qstat -j 343400 | grep "hard resource_list"
hard resource_list:         h_vmem=200G

(21474836480 bytes is 20G, so the six running jobs account for 5 x 20G + 200G 
= 300G in total; 128G - 300G = -172G, which matches the value reported above.)

We still have jobs which are queued because there is insufficient memory, so 
the limit isn't being completely ignored, but I don't understand how the jobs 
which are currently running were able to be scheduled.

(-l h_vmem=40G) cannot run at host "compute-0-3.local" because it offers only 
hc:h_vmem=-172.000G

Does anyone have any suggestions for how the cluster could have got itself into 
this situation?

Thanks

Simon.