On 03.08.2012, at 18:04, Joseph Farran wrote:

> I pack jobs onto nodes using the following GE setup:
> 
>    # qconf -ssconf | egrep "queue|load"
>    queue_sort_method                 seqno
>    job_load_adjustments              NONE
>    load_adjustment_decay_time        0
>    load_formula                      slots
> 
> I also set my nodes with the slots complex value:
> 
>    # qconf -rattr exechost complex_values "slots=64" compute-2-1

Don't limit it at the exechost level. Just define slots=64 in both queues
instead.
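A minimal sketch of what I mean (host and queue names taken from your mail;
please check qconf(1) for your GE version):

    # remove the host-level limit again, e.g. by editing the exechost
    # and deleting the slots entry from complex_values:
    qconf -me compute-2-1

    # then define 64 slots in each cluster queue instead:
    qconf -mattr queue slots 64 owner
    qconf -mattr queue slots 64 free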


> Serial jobs are all packed nicely onto a node until the node is full, and
> then the scheduler moves on to the next node.
> 
> 
> The issue I am having is that my subordinate queue breaks when I set my
> nodes with the slots complex value above.
> 
> I have two queues:  The owner queue and the free queue:
> 
>    # qconf -sq owner | egrep "subordinate|shell"
>    shell                 /bin/bash
>    shell_start_mode      posix_compliant
>    subordinate_list      free=1

subordinate_list      slots=64(free)
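This is the slot-wise subordination introduced in 6.2u5: instead of
suspending the whole queue instance, single jobs in "free" get suspended one
by one as soon as more than 64 slots are occupied on the host by "owner" and
"free" together. A sketch of setting it (again, check qconf(1) for your
version):

    qconf -mattr queue subordinate_list "slots=64(free)" owner

If I recall the full syntax correctly, slots=64(free:0:sr) would in addition
choose which job gets suspended first (sr = shortest running, lr = longest
running).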


>    # qconf -sq free | egrep "subordinate|shell"
>    shell                 /bin/bash
>    shell_start_mode      posix_compliant
>    subordinate_list      NONE
> 
> When I fill up the free queue with serial jobs and then submit a job to the
> owner queue, the owner job will not suspend the free jobs. The qstat
> scheduling info says:
> 
>    queue instance "[email protected]" dropped because it is full
>    queue instance "[email protected]" dropped because it is full
> 
> If I remove the "complex_values=" from my nodes, then jobs are correctly
> suspended in the free queue and the owner job runs just fine.

Yes. The host-level slots consumable is decremented by jobs from all queues
on the host, so once the free jobs hold all 64 slots the scheduler can't
dispatch the owner job at all, and the subordination never gets a chance to
trigger. And what's the problem with this setup, i.e. without the
complex_values entry?
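You can watch the effect with qstat (a sketch, assuming the host-level
consumable named slots is still set):

    # hc:slots drops to 0 for *both* queue instances on the host once the
    # free queue is filled, so nothing is left for the owner job:
    qstat -F slots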


> So how can I accomplish both items above?
> 
> 
> 
> *** By the way, here are pre-emptive answers to some questions I expect to
> be asked:
> 
> Why pack jobs?: Because in any HPC environment that runs a mixture of serial
> and parallel jobs, you really don't want to spread single-core jobs across
> multiple nodes, especially 64-core nodes. You want to keep nodes whole for
> parallel jobs (this is HPC 101).

Depends on the application. E.g. Molcas writes a lot to the local scratch
disk, so it's better to spread such jobs across the cluster and use the
remaining cores on each exechost for jobs with little or no disk access.

-- Reuti


> 
> Suspended jobs will not free up resources: Yep, but the jobs will *not* be
> consuming CPU cycles, which is what I want.
> 
> Thanks,
> Joseph
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
