As the OP mentioned, one could use a consumable complex in 6.1. If you add
"complex_values network=16" and "load_thresholds network=15" to the queue, it
will be pushed into the alarm state automatically and you can avoid the load
sensor. With a default consumption of 1, it works out of the box (the value is
only subtracted if the complex is attached to a queue). I.e., the other queue
for normal jobs doesn't have it attached, and you select the special
multi-node queue by the requested PE.

Unfortunately, I think there are two problems with this suggestion:
1. If I set network=16, then only 16 processors out of 48 will be usable
by parallel jobs.
2. The use of a load threshold seems to prevent fill_up from working
correctly, so even if I have network=48 for the queue complex and
network=47 for the load threshold it will not use up all 48 slots before
moving on to the next host. This seems to be due to the alarm state
becoming active on the queues at inconsistent times during a single
scheduling iteration. This would also affect the use of a custom load
sensor, so I'm abandoning that idea.
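To make the arithmetic behind point 1 concrete, here is a minimal Python sketch. It is not Grid Engine code: the function name usable_slots is hypothetical, and it assumes the consumable is debited once per slot with the default consumption of 1, as described in the suggestion above.

```python
def usable_slots(host_slots: int, capacity: int, per_slot: int = 1) -> int:
    """Return how many slots can be granted before the consumable runs out.

    Assumption (not taken from Grid Engine source): every granted slot
    debits `per_slot` units from a queue-level consumable of size `capacity`.
    """
    if per_slot <= 0:
        # Nothing is debited, so only the slot count limits the host.
        return host_slots
    return min(host_slots, capacity // per_slot)


if __name__ == "__main__":
    # 48-slot hosts, as in the message above.
    for capacity in (16, 48):
        print(f"network={capacity}: {usable_slots(48, capacity)} of 48 slots usable")
```

Under that assumption, network=16 caps a 48-slot host at 16 slots, and the consumable has to be raised to 48 before all slots become available. The sketch deliberately says nothing about point 2, since the alarm-state timing is not modeled here.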
If we were to update to 6.2u5, what options would we then have?
--
Gerald Ragghianti
Office of Information Technology - High Performance Computing
Newton HPC Program http://newton.utk.edu/
The University of Tennessee, 2309 Kingston Pike, Knoxville, TN 37919
Phone: 865-974-2448
/-------------------------------------\
| One Contact OIT: 865-974-9900 |
| Many Solutions help.utk.edu |
\-------------------------------------/