Hi,

I've found a bug in the handling of the --switches option, which requests
a maximum number of switches for a job when the topology plugin is used.

There are two problems:

1) leaf_switch_count is incremented in the wrong place, so it counts
switches tested during the best-fit search instead of only the switch
that is actually selected.
2) After _select_nodes is called, the best_switch value is not always
checked. As a result, a job can be spread across more switches than
requested.
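
To illustrate both problems, here is a minimal standalone sketch. It is
not the cons_res code: alloc_leaves(), job_fits(), NUM_LEAVES and the
simplified best-fit condition are hypothetical stand-ins, and only the
names best_fit_nodes, best_fit_location, best_fit_sufficient, cpu_count
and best_switch are borrowed from the patch. It assumes that best_switch
is nonzero when the switch request is satisfied.

```c
#define NUM_LEAVES 4

/*
 * Hypothetical model of the leaf-switch selection loop.
 * Returns the number of leaf switches actually used (patched counting).
 * *buggy receives the value the counter reaches when the increment sits
 * inside the candidate scan (old placement).
 */
static int alloc_leaves(const int *avail, int rem_nodes, int *buggy)
{
    int free_nodes[NUM_LEAVES];
    int used = 0;

    *buggy = 0;
    for (int j = 0; j < NUM_LEAVES; j++)
        free_nodes[j] = avail[j];

    while (rem_nodes > 0) {
        int best_fit_nodes = 0;
        int best_fit_location = 0;
        int best_fit_sufficient = 0;

        for (int j = 0; j < NUM_LEAVES; j++) {
            if (free_nodes[j] == 0)
                continue;
            int sufficient = (free_nodes[j] >= rem_nodes);
            /* Take the first candidate, then prefer a sufficient leaf,
             * the smallest sufficient leaf, or the largest
             * insufficient leaf (simplified condition). */
            if ((best_fit_nodes == 0) ||
                (sufficient && !best_fit_sufficient) ||
                (sufficient && (free_nodes[j] < best_fit_nodes)) ||
                (!sufficient && (free_nodes[j] > best_fit_nodes))) {
                best_fit_nodes = free_nodes[j];
                best_fit_location = j;
                best_fit_sufficient = sufficient;
                (*buggy)++;     /* old placement: once per new best */
            }
        }
        if (best_fit_nodes == 0)
            break;

        used++;                 /* patched placement: once per chosen leaf */
        rem_nodes -= best_fit_nodes;
        free_nodes[best_fit_location] = 0;
    }
    return used;
}

/* Hypothetical form of the patched acceptance test: the job only "fits"
 * if CPUs were found AND the switch request was honoured. */
static int job_fits(int cpu_count, int best_switch)
{
    return (cpu_count != 0) && (best_switch != 0);
}
```

With leaves holding 8, 6, 4 and 2 free nodes and a request for 4 nodes,
the scan replaces its best candidate three times before settling on the
4-node leaf, so the old placement reports more switches than the single
leaf that is really used.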

Patch attached.


--- /tmp/job_test.c.old 2012-01-18 14:09:32.835010027 +0100
+++ /tmp/job_test.c.new 2012-01-18 14:38:52.016010187 +0100
@@ -1628,13 +1628,13 @@
                                        best_fit_nodes = switches_node_cnt[j];
                                        best_fit_location = j;
                                        best_fit_sufficient = sufficient;
-                                       leaf_switch_count++;
                                }
                        }
                }
                if (best_fit_nodes == 0)
                        break;
 
+               leaf_switch_count++;
                /* Use select nodes from this leaf */
                first = bit_ffs(switches_bitmap[best_fit_location]);
                last  = bit_fls(switches_bitmap[best_fit_location]);
@@ -2105,7 +2105,7 @@
        cpu_count = _select_nodes(job_ptr, min_nodes, max_nodes, req_nodes,
                                  bitmap, cr_node_cnt, free_cores,
                                  node_usage, cr_type, test_only);
-       if (cpu_count) {
+       if ((cpu_count) && (job_ptr->best_switch)) {
                /* job fits! We're done. */
                if (select_debug_flags & DEBUG_FLAG_CPU_BIND) {
                        info("cons_res: cr_job_test: test 1 pass - "
@@ -2319,7 +2319,7 @@
        FREE_NULL_BITMAP(orig_map);
        FREE_NULL_BITMAP(avail_cores);
        FREE_NULL_BITMAP(tmpcore);
-       if (!cpu_count) {
+       if ((!cpu_count) || (job_ptr->best_switch == 0)) {
                /* we were sent here to cleanup and exit */
                FREE_NULL_BITMAP(free_cores);
                if (select_debug_flags & DEBUG_FLAG_CPU_BIND) {
