Hi Reuti,

Thanks for your response. Here's the output of 'qhost -F h_vmem'.
I'm not sure how to interpret the negative values here either.

# qhost -F h_vmem
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
greenie lx26-amd64 64 1.97 1009.9G 19.5G 996.2M 0.0
    Host Resource(s):      hc:h_vmem=720.000G
scg3-0-1 lx26-amd64 32 25.95 63.0G 21.0G 9.8G 4.2G
    Host Resource(s):      hc:h_vmem=4.500G
scg3-0-10 lx26-amd64 32 19.53 63.0G 25.2G 9.8G 74.7M
    Host Resource(s):      hc:h_vmem=12.000G
scg3-0-11 lx26-amd64 32 18.71 63.0G 25.6G 9.8G 389.1M
    Host Resource(s):      hc:h_vmem=20.000G
scg3-0-12 lx26-amd64 32 27.18 63.0G 38.9G 9.8G 33.6M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-13 lx26-amd64 32 22.74 63.0G 24.3G 9.8G 31.5M
    Host Resource(s):      hc:h_vmem=20.000G
scg3-0-14 lx26-amd64 32 31.19 63.0G 41.1G 9.8G 32.1M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-15 lx26-amd64 32 22.97 63.0G 23.0G 9.8G 39.4M
    Host Resource(s):      hc:h_vmem=15.000G
scg3-0-16 lx26-amd64 32 1.00 63.0G 21.7G 9.8G 24.1M
    Host Resource(s):      hc:h_vmem=52.000G
scg3-0-17 lx26-amd64 32 26.99 63.0G 42.5G 9.8G 28.3M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-18 lx26-amd64 32 24.70 63.0G 24.1G 9.8G 30.7M
    Host Resource(s):      hc:h_vmem=20.000G
scg3-0-19 lx26-amd64 32 23.92 63.0G 9.2G 9.8G 27.6M
    Host Resource(s):      hc:h_vmem=20.000G
scg3-0-2 lx26-amd64 32 35.76 63.0G 34.6G 9.8G 31.7M
    Host Resource(s):      hc:h_vmem=-8.000G
scg3-0-20 lx26-amd64 32 22.69 63.0G 23.4G 9.8G 30.0M
    Host Resource(s):      hc:h_vmem=20.000G
scg3-0-21 lx26-amd64 32 0.03 63.0G 841.9M 9.8G 0.0
    Host Resource(s):      hc:h_vmem=60.000G
scg3-0-22 lx26-amd64 32 20.20 63.0G 25.8G 9.8G 35.4M
    Host Resource(s):      hc:h_vmem=16.000G
scg3-0-23 lx26-amd64 32 34.50 63.0G 34.9G 9.8G 32.2M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-24 lx26-amd64 32 35.24 63.0G 35.8G 9.8G 38.1M
    Host Resource(s):      hc:h_vmem=-175.000G
scg3-0-3 lx26-amd64 32 34.24 63.0G 34.2G 9.8G 31.0M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-4 lx26-amd64 32 26.76 63.0G 38.9G 9.8G 28.4M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-5 lx26-amd64 32 34.85 63.0G 26.9G 9.8G 31.5M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-6 lx26-amd64 32 31.63 63.0G 35.8G 9.8G 21.6M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-7 lx26-amd64 32 2.01 63.0G 23.5G 9.8G 48.6M
    Host Resource(s):      hc:h_vmem=-36.000G
scg3-0-8 lx26-amd64 32 34.59 63.0G 38.1G 9.8G 30.6M
    Host Resource(s):      hc:h_vmem=0.000
scg3-0-9 lx26-amd64 32 34.63 63.0G 21.6G 9.8G 36.3M
    Host Resource(s):      hc:h_vmem=0.000
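
To pick the oversubscribed hosts out of output like the above, here is a minimal Python sketch. It assumes the two-line layout shown (a host line followed by an indented `hc:h_vmem=` line); the function names `parse_h_vmem` and `oversubscribed` are made up for illustration, and the sample lines are copied from the listing above.

```python
import re

# A few host/resource line pairs copied from the qhost output above.
SAMPLE = """\
scg3-0-2 lx26-amd64 32 35.76 63.0G 34.6G 9.8G 31.7M
    Host Resource(s):      hc:h_vmem=-8.000G
scg3-0-21 lx26-amd64 32 0.03 63.0G 841.9M 9.8G 0.0
    Host Resource(s):      hc:h_vmem=60.000G
scg3-0-24 lx26-amd64 32 35.24 63.0G 35.8G 9.8G 38.1M
    Host Resource(s):      hc:h_vmem=-175.000G
scg3-0-3 lx26-amd64 32 34.24 63.0G 34.2G 9.8G 31.0M
    Host Resource(s):      hc:h_vmem=0.000
"""

# Upper-case suffixes are treated as binary multipliers (assumption for this sketch).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
HVMEM = re.compile(r"hc:h_vmem=(-?\d+(?:\.\d+)?)([KMGT]?)")


def parse_h_vmem(text):
    """Return {hostname: remaining h_vmem in bytes} (hypothetical helper)."""
    remaining = {}
    host = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = HVMEM.search(line)
        if match and host is not None:
            # Indented resource line: belongs to the most recent host line.
            remaining[host] = float(match.group(1)) * UNITS[match.group(2)]
        elif not line[0].isspace():
            # Unindented line: the first field is the hostname.
            host = line.split()[0]
    return remaining


def oversubscribed(text):
    """Hosts whose remaining consumable h_vmem is below zero."""
    return sorted(h for h, v in parse_h_vmem(text).items() if v < 0)


print(oversubscribed(SAMPLE))
```

In practice the text would come from running `qhost -F h_vmem` rather than from a pasted string.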




On 10/19/2012 12:48 PM, Reuti wrote:
On 19.10.2012 at 20:58, Alex Chekholko wrote:

qhost values seem fine:

...
scg3-0-11               lx26-amd64     32 27.15   63.0G   38.3G    9.8G  393.6M
scg3-0-12               lx26-amd64     32 27.36   63.0G   38.7G    9.8G   33.6M
scg3-0-13               lx26-amd64     32 22.61   63.0G   24.4G    9.8G   31.5M
...

When I submit a job as myself with such a memory request, it doesn't get 
dispatched, just sits in 'qw'.

And:

qhost -F h_vmem

The limit wasn't defined after the job already started?

-- Reuti


Regards,
Alex

On 10/18/12 7:41 PM, Rayson Ho wrote:
Alex,

Can you run qhost and see if the memory value is also negative?
If it is, then this bug was fixed in any release of OGS/GE.

Rayson



On Thu, Oct 18, 2012 at 6:53 PM, Alex Chekholko <[email protected]> wrote:
Hi,

Running Rocks 6, so whatever GE version is included there.

h_vmem is set consumable and per job, 4G default:

-bash-4.1$ qconf -sc |grep h_vmem
h_vmem              h_vmem     MEMORY      <=    YES         JOB 4G       0
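
For readers unsure which column is which, the fields of a `qconf -sc` line are name, shortcut, type, relop, requestable, consumable, default and urgency. A small Python sketch that labels the line above (the `Complex` tuple and `parse_complex` are illustrative names, not part of Grid Engine):

```python
from collections import namedtuple

# Column order as printed by `qconf -sc`; "ctype" stands for the "type" column.
Complex = namedtuple(
    "Complex",
    "name shortcut ctype relop requestable consumable default urgency",
)

# The h_vmem line exactly as shown above.
LINE = "h_vmem              h_vmem     MEMORY      <=    YES         JOB 4G       0"


def parse_complex(line):
    """Split one whitespace-separated complex definition into named fields."""
    return Complex(*line.split())


c = parse_complex(LINE)
print(c.consumable, c.default)
```

So here the complex is requestable (YES), consumable per JOB, with a 4G default.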

each exec host has an h_vmem attribute set:
-bash-4.1$ qconf -se scg3-0-11 |grep h_vmem
complex_values        slots=16,h_vmem=60G

pe "shm" is defined:
-bash-4.1$ qconf -sp shm
pe_name            shm
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

A user is submitting a job with '-pe shm -l h_vmem=120G', and it's getting
dispatched to a host that has h_vmem=60G defined.  How is that possible?

And qstat reports negative h_vmem values, e.g.:
-bash-4.1$ qstat -f -u '*' -F h_vmem
...
[email protected]          BIP   0/16/16        12.12    lx26-amd64
         hc:h_vmem=-80.000G
   88866 0.50500 mCSRR57762 yxl          r     10/18/2012 09:17:21     1
   89094 0.60500 G_ordermar elisaz       r     10/18/2012 15:03:39    15
...
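
As a rough way to read the -80G figure: a consumable is bookkept as host capacity minus the amounts debited for running jobs, with a JOB consumable debited once per job and a YES consumable debited once per slot. The Python sketch below models that arithmetic. It assumes the 60G `complex_values` setting applies to this host; the per-job requests of 20G and 120G are hypothetical (the thread does not show what the two running jobs actually requested), chosen only so the numbers add up.

```python
def debit(request_gb, slots, consumable):
    """Amount charged against the host for one job (illustrative model).

    "JOB": the request is charged once, regardless of slot count.
    "YES": the request is charged once per slot.
    """
    if consumable == "JOB":
        return request_gb
    if consumable == "YES":
        return request_gb * slots
    raise ValueError("unsupported consumable setting: %r" % consumable)


def remaining(capacity_gb, jobs, consumable="JOB"):
    """Capacity left after debiting every (request_gb, slots) pair in jobs."""
    return capacity_gb - sum(debit(r, s, consumable) for r, s in jobs)


# From the thread: host capacity 60G, qstat reports hc:h_vmem=-80G.
capacity, reported = 60, -80
total_debited = capacity - reported  # 140G charged in total

# HYPOTHETICAL requests for the 1-slot and 15-slot jobs on that queue instance.
jobs = [(20, 1), (120, 15)]
print(total_debited, remaining(capacity, jobs, "JOB"))
```

Whatever the real split is, the debits on that host would have to total 140G for a 60G capacity to show -80G.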

Maybe the sgeexecd needs to be cycled for the setting to take effect?  I can
try that next.

Regards,
--
Alex Chekholko [email protected]
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
