Moe,

There are a couple of other problems with the --switch option that have surfaced. I can make the patches, but I'd like your advice first.

1) When the max-time is not specified and the number of switches selected is greater than the requested number, the job will be requeued once. I could document this as the desired behavior. Other options would be to set the wait time to the site max, or to not wait at all, but that basically means --switch without a time is a no-op. I'm leaning towards setting it to the site max.

2) The max-time is a time string. That means its lowest resolution is minutes. Since the default value for the site max (max_switch_wait) is 60 seconds, it basically means the max-time is a no-op. I'm not in favor of raising the max-time default. I'd rather make the value the number of seconds to wait.

From: Moe Jette <je...@schedmd.com>
To: slurm-dev@lists.llnl.gov
Date: 01/18/2012 12:48 PM
Subject: Re: [slurm-dev] bug when --switch option used
Sent by: owner-slurm-...@lists.llnl.gov

Alex,

Thanks for the patch. This will be in version 2.3.3 plus a fix to similar logic in the select/linear plugin used to count the leaf switches used.

Moe

Quoting Alejandro Lucero Palau <alejandro.luc...@bsc.es>:

> Hi,
>
> I've found a bug related to use of --switch option for requesting a
> number of switches when topology plugin is used.
>
> There are two problems:
>
> 1) leaf_switch_counter is incremented in the wrong place leading to
> count tested switches instead of just the selected switch.
> 2) When _select_nodes is called, checking for best_switch value is not
> always done. This problem could lead to spread a job through more than
> requested switches.
>
> Patch attached.
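
To illustrate the resolution mismatch raised in point 2 above, here is a minimal Python sketch. It is not Slurm code: the function names and the simple clamp are assumptions made for illustration, and only the 60-second default for max_switch_wait comes from the message. It assumes the requested wait is capped at the site maximum.

```python
# Hypothetical sketch; these names are illustrative and are not Slurm internals.
SITE_MAX_SWITCH_WAIT_SECS = 60  # default max_switch_wait cited in the message


def effective_wait_from_minutes(requested_minutes,
                                site_max_secs=SITE_MAX_SWITCH_WAIT_SECS):
    """Clamp a request given in whole minutes to the site maximum (seconds)."""
    if requested_minutes < 0:
        raise ValueError("wait time must be non-negative")
    return min(requested_minutes * 60, site_max_secs)


def effective_wait_from_seconds(requested_secs,
                                site_max_secs=SITE_MAX_SWITCH_WAIT_SECS):
    """Same clamp, but the request is already expressed in seconds."""
    if requested_secs < 0:
        raise ValueError("wait time must be non-negative")
    return min(requested_secs, site_max_secs)


if __name__ == "__main__":
    # With minute resolution, every nonzero request collapses to the site max.
    for minutes in (1, 5, 30):
        print("minutes:", minutes, "->", effective_wait_from_minutes(minutes))
    # With second resolution, values below the site max remain distinguishable.
    for secs in (10, 30, 45):
        print("seconds:", secs, "->", effective_wait_from_seconds(secs))
```

Under these assumptions, any nonzero minute-based request ends up at the 60-second cap, while a seconds-based value lets a user ask for a shorter wait.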