On 29.08.2013 at 12:18, Guillermo Marco Puche wrote:

> Hello,
> 
> I'm having a lot of trouble working out the ideal SGE queue 
> configuration for my cluster.
> 
> The cluster layout is as follows:
> 
> submit_host:
>       • frontend
> execution hosts:
>       • compute-0-0 (8 cpus - 120GB+ RAM)
>       • compute-0-1 (8 cpus - 32GB RAM)
>       • compute-0-2 (8 cpus - 32GB RAM)
>       • compute-0-3 (8 cpus - 32GB RAM)
>       • compute-0-4 (8 cpus - 32GB RAM)
>       • compute-0-5 (8 cpus - 32GB RAM)
> 
> My idea to configure it was:
> 
> medium_priority.q:
>  - All pipeline jobs will run by default on this queue.
>  - All hosts are available for this queue.
> 
> high_priority.q:
>  - All hosts are available for this queue.
>  - It has the authority to suspend jobs in medium_priority.q
> 
> non_suspendable.q:
>  - This queue is for specific jobs that have trouble if they're 
> suspended.
>  - Problem: if a lot of non_suspendable jobs run in the same queue, they 
> can exceed the memory thresholds.

Why are so many jobs running at the same time that they exceed the available 
memory? Do they request the amount of memory they are estimated to need?
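
As a rough sketch of what such a memory request could look like: this assumes h_vmem is the attribute you want to use for the limit, and the 4G request and the job.sh script name are placeholders, not values taken from your cluster.

```sh
# Open the complex configuration in an editor; in the h_vmem line the
# "consumable" column would be changed from NO to YES.
qconf -mc

# Tell SGE how much memory a host can hand out (32G as in the host list above;
# repeat per execution host).
qconf -mattr exechost complex_values h_vmem=32G compute-0-1

# Request memory at submission time. With a consumable set to YES the
# request is counted per slot.
qsub -l h_vmem=4G job.sh
```

With a setup along these lines the scheduler should stop dispatching jobs to a host once the sum of the requests reaches the value configured for that host. Note that h_vmem also acts as a hard limit on the job, so the estimate should not be too low.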


>  - If they're non-suspendable, I guess both high_priority.q and 
> medium_priority.q must be subordinate to this queue.
> 
> memory.q:
>  - Specific queue with just the compute-0-0 slots, to submit jobs to the 
> memory node.
> 
> The problem is that I have to set up this scheme while enforcing 
> memory-usage thresholds, so the compute nodes don't crash under excessive 
> memory load.

If they really use 12 GB of swap, they are already oversubscribing the memory 
by a wide margin. Nowadays a swap of 2 GB is common. How large is your swap 
partition?

Jobs that are already in the system will keep consuming the resources they 
have already allocated. Nothing is freed.

- Putting a queue into alarm state via a load_threshold only prevents new jobs 
from being dispatched to this node. All running ones will continue to run as 
usual.

- Setting a suspend_threshold only prevents the suspended job from consuming 
more. What it has already allocated stays allocated.
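
A minimal sketch of how the two thresholds could be set on one of the queues described above. The load value swap_used and the 1G/2G figures are only assumptions for illustration and would need tuning for the real nodes:

```sh
# Alarm state: no new jobs are dispatched to a queue instance once the
# host reports at least 1G of swap in use. Running jobs are untouched.
qconf -mattr queue load_thresholds swap_used=1G medium_priority.q

# Suspension: once 2G of swap are in use, jobs in this queue get suspended.
qconf -mattr queue suspend_thresholds swap_used=2G medium_priority.q

# Suspend one job per suspend_interval while the threshold stays exceeded.
qconf -mattr queue nsuspend 1 medium_priority.q
```

Neither setting gives memory back to the node, so thresholds like these work best in addition to proper memory requests, not instead of them.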

-- Reuti


> I'm also confused about oversubscription and thresholds. How do I combine 
> them?
> 
> I've read the basic Oracle SGE manual but I still feel unsure. Are there any 
> example configurations to test? Do you think this configuration is viable? 
> Any suggestions?
> 
> Thank you very much.
> 
> Best regards,
> Guillermo.
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

