On 18 November 2011 14:21, Gerard Henry <[email protected]> wrote:
> Hello all,
>
> I'm having trouble configuring a queue on SGE 6.2u5 (Linux).
>
> I have two amd64 machines, each with this topology: SCCSCC, so there
> are 8 cores in total.
>
> First, I defined a host group:
> # qconf -shgrp @qlong
> group_name @qlong
> hostlist charybde scylla
>
> Then a queue:
> # qconf -sq long1
> qname                 long1
> hostlist              @qlong
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 4
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> But when I try to submit a job, it fails with:
> % qsub -w v ./script1.sh
> Job 14431 cannot run in PE "mpi_labo" because it only offers 0 slots
>
> The beginning of the script is:
> ...
> #$ -q long1
> #$ -pe mpi_labo 6
>
>
> And the PE is defined by:
> qconf -sp mpi_labo
> pe_name            mpi_labo
> slots              8
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $pe_slots

I think the line above is the problem: $pe_slots means that "the full
range of processes as specified with the qsub(1) -pe switch has to be
allocated on a single host".
You only have 4 slots per host (slots 4 in long1), so jobs larger than
that won't run.
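
If you want a job spread over both hosts, one option (a sketch, not
tested against your setup) is to change the PE's allocation_rule with
qconf -mp. $round_robin hands out slots one host at a time, and $fill_up
fills one host before moving to the next. Assuming both hosts are
otherwise idle and long1 keeps 4 slots per host, a 6-slot job should
then land as 3+3 with $round_robin, or 4+2 with $fill_up:

    # qconf -mp mpi_labo
    (in the editor, change only this line)
    allocation_rule    $round_robin
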

> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
>
> If I submit with "-pe mpi_labo 4", it works. What am I missing?
>
> I also tried to increase the value:
> qconf -mq long1
> slots                 8
> but in this case the program runs all 8 threads on the same host,
> which is not what I want.
>
> Thanks in advance for your help,
>
> gerard
>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users