On 29.08.2013, at 13:43, Guillermo Marco Puche wrote:

> Hello Reuti,
>
> I just thought that maybe the solution for the memory issue, since I cannot
> predict a job's memory consumption, would be setting policies in
> "qconf -srqs". How can I set an RQS policy with a rule like "a job cannot be
> scheduled to a node unless at least 2 GB of virtual memory is free"? I guess
> virtual memory is the correct resource for this.
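For reference, a resource quota rule set as listed by `qconf -srqs` has roughly the following shape (the rule set name and the 2G value are only illustrative, and this assumes virtual_free is configured as a consumable complex; an RQS caps what jobs *request*, it does not measure the actual free memory on a host):

```
{
   name         illustrative_vmem_limit
   description  "Cap the total virtual_free consumable requested per host"
   enabled      TRUE
   limit        hosts {*} to virtual_free=2G
}
```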
This would go into the load_thresholds setting of the queue. Something like:

    load_thresholds    virtual_free=2G

The "alarm state" of a queue instance just means that it became disabled
because a load_threshold was exceeded. Maybe "disable_threshold" would have
been a clearer name. (The system load I would even leave out of the
thresholds. IMO it is useful for large SMP machines where you allow e.g. 72
slots on a 64-core system - as not all parallel jobs scale well, it can be
possible to run more processes than installed cores without any penalty.)

-- Reuti

> Regards,
>
> Guillermo.
>
> On 08/29/2013 01:02 PM, Guillermo Marco Puche wrote:
>> On 08/29/2013 12:40 PM, Reuti wrote:
>>> On 29.08.2013, at 12:18, Guillermo Marco Puche wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm having a lot of trouble trying to work out the ideal SGE queue
>>>> configuration for my cluster.
>>>>
>>>> The cluster layout is the following:
>>>>
>>>> submit host:
>>>> • frontend
>>>>
>>>> execution hosts:
>>>> • compute-0-0 (8 CPUs - 120GB+ RAM)
>>>> • compute-0-1 (8 CPUs - 32GB RAM)
>>>> • compute-0-2 (8 CPUs - 32GB RAM)
>>>> • compute-0-3 (8 CPUs - 32GB RAM)
>>>> • compute-0-4 (8 CPUs - 32GB RAM)
>>>> • compute-0-5 (8 CPUs - 32GB RAM)
>>>>
>>>> My idea was to configure it like this:
>>>>
>>>> medium_priority.q:
>>>> - All pipeline jobs run on this queue by default.
>>>> - All hosts are available for this queue.
>>>>
>>>> high_priority.q:
>>>> - All hosts are available for this queue.
>>>> - It has the authority to suspend jobs in medium_priority.q.
>>>>
>>>> non_suspendable.q:
>>>> - This queue is used for specific jobs which run into trouble if
>>>> they're suspended.
>>>> - Problem: if a lot of non-suspendable jobs run in the same queue,
>>>> they may exceed the memory thresholds.
>>>>
>>> Why are so many jobs running at the same time when they exceed the
>>> available memory - are they requesting the estimated amount of memory?
>> Ok, the problem is that the memory footprint of a process varies
>> depending on the input data.
>> I really cannot predict the amount of memory a process is going to use.
>> Those processes start to "eat" memory slowly: they begin with very low
>> memory usage and take more and more as they run.
>>
>> I think the ideal solution would then be to set the maximum memory a job
>> can use without being suspended.
>>>
>>>> - If they're non-suspendable, I guess both high_priority.q and
>>>> medium_priority.q must be subordinate to this queue.
>>>>
>>>> memory.q:
>>>> - Specific queue with just the compute-0-0 slots, for submitting jobs
>>>> to the big-memory node.
>>>>
>>>> The problem is that I have to set up this schema while enabling
>>>> memory-usage thresholds so the compute nodes don't crash under
>>>> excessive memory load.
>>>>
>>> If they really use 12 GB of swap, they are already oversubscribing the
>>> memory by far. Nowadays having a swap of 2 GB is common. How large is
>>> your swap partition?
>>>
>>> Jobs which are already in the system will still consume the resources
>>> they have allocated. Nothing is freed.
>>>
>> 16 GB of swap per node.
>>> - Putting a queue into alarm state via a load_threshold will prevent new
>>> jobs from going to that node. All running ones will continue to run as
>>> usual.
>>>
>>> - Setting a suspend_threshold will only prevent a suspended job from
>>> consuming more.
>>>
>>> -- Reuti
>>>
>>>> I'm also confused about oversubscription and thresholds. How do I
>>>> combine them?
>>>>
>>>> I've read the basic Oracle SGE manual but I still feel unsure. Are
>>>> there any example configurations to test? Do you think this
>>>> configuration is viable? Any suggestions?
>>>>
>>>> Thank you very much.
>>>>
>>>> Best regards,
>>>> Guillermo.
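A minimal sketch of how the subordination and threshold pieces of the schema above could look in the queue configurations (each queue is edited with `qconf -mq <queue>`; the 2G threshold value is an assumption for illustration):

```
# In non_suspendable.q: suspend jobs in the lower-priority queues on the
# same host while this queue has running jobs
subordinate_list    medium_priority.q high_priority.q

# In medium_priority.q / high_priority.q: stop dispatching new jobs to a
# host once its free virtual memory falls below 2 GB, and suspend running
# jobs if it falls below that as well
load_thresholds     virtual_free=2G
suspend_thresholds  virtual_free=2G
```

As noted in the discussion, a suspend_threshold only stops a suspended job from consuming more; memory the job has already allocated is not freed.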
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
