Hi.

I pack jobs onto nodes using the following GE scheduler setup:

    # qconf -ssconf | egrep "queue|load"
    queue_sort_method                 seqno
    job_load_adjustments              NONE
    load_adjustment_decay_time        0
    load_formula                      slots

I also set the slots complex value on my nodes:

    # qconf -rattr exechost complex_values "slots=64" compute-2-1
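
A minimal sketch of how the same setting could be applied to several nodes in
one go. The function name set_host_slots, the DRY_RUN switch and the host name
compute-2-2 are illustrative assumptions, not part of the setup described
here; only the qconf -rattr call itself comes from the command above.

```shell
#!/bin/sh
# Sketch: apply the same slots limit to several execution hosts.
# Set DRY_RUN=1 to print the qconf commands instead of running them.
set_host_slots() {
    slots="$1"    # slot count to set, e.g. 64
    shift         # remaining arguments are execution host names
    for host in "$@"; do
        # With DRY_RUN set, "echo" is prepended and nothing is changed.
        ${DRY_RUN:+echo} qconf -rattr exechost complex_values "slots=${slots}" "$host"
    done
}

# Example (compute-2-2 is a placeholder host name):
#   DRY_RUN=1 set_host_slots 64 compute-2-1 compute-2-2
```

Running it with DRY_RUN=1 first makes it easy to check the generated commands
before touching the exec host configuration.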

Serial jobs are all packed nicely onto a node until the node is full, and then
the scheduler moves on to the next node.


The issue I am having is that queue subordination breaks once I have set the
slots complex value on my nodes as shown above.

I have two queues, the owner queue and the free queue:

    # qconf -sq owner | egrep "subordinate|shell"
    shell                 /bin/bash
    shell_start_mode      posix_compliant
    subordinate_list      free=1

    # qconf -sq free | egrep "subordinate|shell"
    shell                 /bin/bash
    shell_start_mode      posix_compliant
    subordinate_list      NONE

When I fill up the free queue with serial jobs and then submit a job to the
owner queue, the owner job does not suspend the free job. The qstat scheduling
info says:

    queue instance "[email protected]" dropped because it is full
    queue instance "[email protected]" dropped because it is full

If I remove the "complex_values=" setting from my nodes, then jobs in the free
queue are correctly suspended and the owner job runs just fine.
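
A minimal sketch for reproducing the behaviour with throwaway jobs. The
function name reproduce_subordination, the DRY_RUN switch, the sleep commands
and the job count are placeholders chosen for illustration, not the jobs
actually run here; the queue names owner and free are the ones shown above.

```shell
#!/bin/sh
# Sketch: reproduce the suspend problem with throwaway sleep jobs.
# Set DRY_RUN=1 to print the commands instead of submitting anything.
reproduce_subordination() {
    n_free="$1"    # number of serial jobs needed to fill the free queue
    i=0
    while [ "$i" -lt "$n_free" ]; do
        # Serial placeholder job submitted to the free queue.
        ${DRY_RUN:+echo} qsub -q free -b y sleep 600
        i=$((i + 1))
    done
    # One job for the owner queue; with subordinate_list free=1 it is
    # expected to suspend a job in the free queue on the same host.
    ${DRY_RUN:+echo} qsub -q owner -b y sleep 60
    # Full listing shows queue instance states and job states.
    ${DRY_RUN:+echo} qstat -f
}

# Example: DRY_RUN=1 reproduce_subordination 64
```

Running this once with the slots complex value set and once without it should
show the difference described above.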

So how can I get both job packing and queue subordination to work together?



*** By the way, here are answers to some questions I expect to be asked:

Why pack jobs?  Because in any HPC environment that runs a mixture of serial
and parallel jobs, you really don't want to spread single-core jobs across
multiple nodes, especially 64-core nodes.  You want to keep nodes whole for
parallel jobs (this is HPC 101).

Suspended jobs will not free up resources:  Yep, but the jobs will *not* be
consuming CPU cycles, which is what I want.

Thanks,
Joseph
