On 29.08.2013 at 12:18, Guillermo Marco Puche wrote:

> Hello,
>
> I'm having a lot of trouble working out the ideal SGE queue
> configuration for my cluster.
>
> The cluster layout is as follows:
>
> submit_host:
>   • frontend
> execution hosts:
>   • compute-0-0 (8 CPUs, 120GB+ RAM)
>   • compute-0-1 (8 CPUs, 32GB RAM)
>   • compute-0-2 (8 CPUs, 32GB RAM)
>   • compute-0-3 (8 CPUs, 32GB RAM)
>   • compute-0-4 (8 CPUs, 32GB RAM)
>   • compute-0-5 (8 CPUs, 32GB RAM)
>
> My idea for configuring it was:
>
> medium_priority.q:
> - All pipeline jobs will run on this queue by default.
> - All hosts are available for this queue.
>
> high_priority.q:
> - All hosts are available for this queue.
> - It has the authority to suspend jobs in medium_priority.q.
>
> non_suspendable.q:
> - This queue is used for specific jobs which have trouble if they're
>   suspended.
> - There is a problem if a lot of non-suspendable jobs run in the same
>   queue and exceed the memory thresholds.
Why are so many jobs running at the same time that they exceed the available memory? Are they requesting their estimated amount of memory?

> - If they're non-suspendable, I guess both high_priority.q and
>   medium_priority.q must be subordinates to this queue.
>
> memory.q:
> - Specific queue with just the compute-0-0 slots, for submitting jobs to
>   the memory node.
>
> The problem is that I have to set up this scheme while enabling memory
> thresholds, so that compute nodes don't crash from excessive memory load.

If they really use 12 GB of swap, they are already oversubscribing the memory by far. Nowadays a swap of 2 GB is common. How large is your swap partition?

Jobs which are already in the system will still consume the resources they have allocated. Nothing is freed.

- Putting a queue into alarm state by a load_threshold prevents new jobs from being sent to this node. All running ones will continue to run as usual.
- Setting a suspend_threshold only prevents the suspended job from consuming more.

-- Reuti

> I'm also confused about oversubscribing and thresholds. How do I mix them
> together?
>
> I've read the basic Oracle SGE manual but I still feel weak. Are there any
> example configurations to test? Do you think this configuration is viable?
> Any suggestions?
>
> Thank you very much.
>
> Best regards,
> Guillermo.
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
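As a rough illustration of the question raised in the reply (whether jobs request their estimated memory), here is a minimal Python sketch of what it means to treat memory as a consumable resource: a job is placed on a host only if its requested amount still fits into what earlier requests left over. This is a toy model, not Grid Engine code. The names `Host` and `dispatch` and the 12 GB request size are hypothetical, chosen only for the example; the host names and the 32 GB / 8 CPU figures are taken from the cluster description above. In Grid Engine itself, something similar is usually achieved by making a memory complex such as `h_vmem` consumable and requesting it at submission time, but the exact setup for this cluster is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of one execution host with memory as a consumable resource."""
    name: str
    mem_gb: float                 # memory capacity declared for the host
    slots: int = 8                # one slot per CPU, as in the cluster above
    running: list = field(default_factory=list)  # requests (GB) of running jobs

    def free_mem(self) -> float:
        # Capacity minus what running jobs *requested*, not what they use.
        return self.mem_gb - sum(self.running)

    def can_accept(self, request_gb: float) -> bool:
        # Dispatch only if a slot is free and the request still fits.
        return len(self.running) < self.slots and request_gb <= self.free_mem()

    def submit(self, request_gb: float) -> bool:
        if self.can_accept(request_gb):
            self.running.append(request_gb)
            return True
        return False


def dispatch(hosts, requests_gb):
    """Place each request on the first host that fits; return those left waiting."""
    waiting = []
    for req in requests_gb:
        # any() stops at the first host whose submit() succeeds.
        if not any(h.submit(req) for h in hosts):
            waiting.append(req)
    return waiting


if __name__ == "__main__":
    hosts = [Host("compute-0-1", 32), Host("compute-0-2", 32)]
    # Six hypothetical jobs that each request 12 GB.
    left = dispatch(hosts, [12] * 6)
    for h in hosts:
        print(h.name, "running:", h.running, "free GB:", h.free_mem())
    print("waiting:", left)
```

With these numbers each 32 GB host takes two 12 GB jobs and the remaining two jobs wait in the queue instead of pushing a node into swap, even though free slots remain. Jobs that are already running keep their reservation until they finish, which matches the point above that nothing is freed by a threshold alone.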
