Hi,

On 08.01.2014 at 07:42, Edrisse Chermak wrote:

> Hi Reuti,
> 
> Thanks for your prompt answer. I'm requesting a default-like parallel
> environment when I submit a job:
> -----------------------------------
> pe_name            parallel_single
> slots              128
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> -----------------------------------
> I submit with: qsub -pe parallel_single 64 -q 1day.q job.sh

Usually it's not necessary to request any queue, as SGE will select an 
appropriate one for the resource requests you made. Submitting to a queue is 
more "PBS-style".


> with `qconf -sq 1day.q`:
> -----------------------------------------------
> qname                 1day.q
> hostlist              node1 node2
> ...
> pe_list               parallel_single
> ...
> slots                 64,[node1=64],[node2=64]
> ...
> -----------------------------------------------

Do you have more than one queue and observe that both are being used? In that 
case it's necessary to limit the number of used slots per host across all 
queues, either

a) by setting "complex_values slots=64" in `qconf -me node1`

*or*

b) by defining an RQS (resource quota set) with the rule 
"limit        hosts {*} to slots=$num_proc"

(a fixed "limit        hosts {*} to slots=64" would do as well)
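
As a rough sketch of option b), the rule can be put into a file and loaded 
with `qconf -Arqs`. The rule set name "slots_per_host", its description and 
the file location are made up for illustration; only the "limit" line comes 
from the text above. Nothing here has been tried on your cluster, so please 
treat it as a starting point:

```shell
set -eu

# Hypothetical file location for the rule set definition.
rqs_file="${TMPDIR:-/tmp}/slots_per_host.rqs"

# The quoted heredoc keeps $num_proc literal; SGE expands it per host later.
cat > "$rqs_file" <<'EOF'
{
   name         slots_per_host
   description  "Cap used slots per host across all queues"
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
EOF

# Next steps, to be run by hand as an SGE manager:
#   qconf -Arqs "$rqs_file"     # add the rule set from the file
#   qconf -srqs slots_per_host  # show it again to verify
#
# Option a) instead, once per execution host:
#   qconf -mattr exechost complex_values slots=64 node1
```

The script only writes the file; the qconf calls are left as comments so that 
nothing in the cluster configuration changes before you have checked it.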

-- Reuti


> I feel that load average is the only criterion preventing a new job from
> starting on the node I expect. Perhaps I missed something; please let me
> know if so.
> Best Regards,
> Edrisse
> 
> On 01/07/2014 07:31 PM, Reuti wrote:
>> Hi,
>> 
>> On 07.01.2014 at 15:53, Edrisse Chermak wrote:
>> 
>>> I have two 64 CPU nodes, node1 running a 16 CPUs job, and node2 which is
>>> free:
>>> 
>>> HOSTNAME  ARCH     NCPU  LOAD
>>> node1   linux-x64   64  16.00
>>> node2   linux-x64   64   0.00
>>> 
>>> When I launch a 2nd job asking for 64 CPUs, Grid Engine sometimes sends
>>> the new job to node1.
>> 
>> This sounds like you are submitting a multi-core job without requesting a 
>> parallel environment (PE). It would be good to request a PE for all 
>> parallel jobs, specifying 16 or 64 cores respectively. Then this can't 
>> happen at all, due to the lack of free cores on node1. A plain PE with the 
>> default values you get when defining a new one is sufficient; it's often 
>> named "smp" if all slots should stay on one node only.
>> 
>> `man sge_pe`
>> 
>> and submit:
>> 
>> $ qsub -pe smp 64 job.sh
>> 
>> 
>> (It's necessary to set the proper slot count [i.e. "64"] in the queue 
>> definition.)
>> 
>> -- Reuti
>> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
