Hi,
On 08.01.2014 at 07:42, Edrisse Chermak wrote:
> Hi Reuti,
>
> Thanks for your prompt answer. I'm requesting a default-like parallel
> environment when I submit a job:
> -----------------------------------
> pe_name parallel_single
> slots 128
> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $pe_slots
> control_slaves FALSE
> job_is_first_task FALSE
> urgency_slots min
> accounting_summary FALSE
> -----------------------------------
> I submit with: qsub -pe parallel_single 64 -q 1day.q job.sh
Usually it's not necessary to request any queue, as SGE will select an
appropriate one for the resource requests you made. Submitting to a queue is
more "PBS-style".
> with: qconf -sq 1day.q :
> -----------------------------------------------
> qname 1day.q
> hostlist node1 node2
> ...
> pe_list parallel_single
> ...
> slots 64,[node1=64],[node2=64]
> ...
> -----------------------------------------------
Do you have more than one queue, and do you observe that both are used? In
this case it's necessary to limit the number of used slots per host across all
queues, either
a) by setting "complex_values slots=64" in `qconf -me node1`
*or*
b) by defining an RQS with the restriction "limit hosts {*} to slots=$num_proc"
(on these 64-core nodes, "limit hosts {*} to slots=64" would have the same effect)
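
For b), a complete rule set might look roughly like the sketch below. The name
and description are placeholders of my choosing, not anything from your setup;
only the "limit" line is the actual restriction. It could be added with
`qconf -arqs` and checked afterwards with `qconf -srqs`.

```
{
   name         slots_per_host
   description  "Cap used slots per host across all queues"
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
```
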
-- Reuti
> I feel that load average is the only criterion preventing a new job from
> starting on the node I expect. Perhaps I missed something; please let me
> know if so.
> Best Regards,
> Edrisse
>
> On 01/07/2014 07:31 PM, Reuti wrote:
>> Hi,
>>
>> On 07.01.2014 at 15:53, Edrisse Chermak wrote:
>>
>>> I have two 64-CPU nodes: node1, which is running a 16-CPU job, and node2,
>>> which is free:
>>>
>>> HOSTNAME ARCH NCPU LOAD
>>> node1 linux-x64 64 16.00
>>> node2 linux-x64 64 0.00
>>>
>>> When I launch a second job asking for 64 CPUs, Grid Engine sometimes sends
>>> the new job to node1.
>>
>> This sounds like you are submitting a multi-core job without requesting a
>> parallel environment (PE). It is best to request a PE for every parallel
>> job, specifying the 16 or 64 cores it needs. Then this can't happen at all,
>> because node1 does not have enough free cores. A plain PE with the default
>> values you get when defining a new one is sufficient; it is often named
>> "smp" if all slots should stay on one node.
>>
>> `man sge_pe`
>>
>> and submit:
>>
>> $ qsub -pe smp 64 job.sh
>>
>>
>> (It's necessary to set the proper slot count [i.e. "64"] in the queue
>> definition.)
>>
>> -- Reuti
>>