Attached patch fixes a bug in the select/bluegene plugin for BG/L+P in the 2.4 series.

Without this patch, the userid assigned to a block is never updated in MMCS, so the user's job launch fails with errors like:

<Jul 03 11:50:56.264854> BE_MPI (ERROR): Current user is not the owner of the partition,
<Jul 03 11:50:56.264925> BE_MPI (ERROR):   and is not in the partition's user list - Aborting
<Jul 03 11:50:56.406071> FE_MPI (ERROR): Back-end failed while preparing partition with return code 31.
<Jul 03 11:50:56.477110> FE_MPI (ERROR): Failure list:
<Jul 03 11:50:56.477145> FE_MPI (ERROR):   - 1. A user does not have permission to run the job on specified partition (failure #31)

The patch does two things. First, it simplifies the logic a bit: every user on the block that isn't the correct assigned user is removed, including the slurm user account. (This doesn't seem to affect operation on our 1-rack BG/L here, although I can't guarantee the same for BG/P.) Second, it fixes the bug itself: it makes sure the assigned user is added to the block.

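Before the patch, the assigned user was only (re-)added from inside the loop, and only when it was already found on the block. If user_count was 0, the loop body never ran; if it was 1, the single entry was most likely slurm_user, which hit a continue on the only pass through the loop. Either way, the correct user was never added to the block.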
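
To make the patched flow easier to follow, here is a minimal standalone sketch of it. It does not use the bridge API: struct block, block_add_user(), block_has_user() and sync_block_users() are hypothetical stand-ins for the MMCS user list and the real calls in bridge_linker.c, and the names "tim" and "slurm" below are just examples.

```c
#include <assert.h>
#include <string.h>

#define MAX_USERS 8
#define NAME_LEN  32

/* Hypothetical stand-in for the user list MMCS keeps for a block. */
struct block {
	char users[MAX_USERS][NAME_LEN];
	int  user_count;
};

/* Return 1 if name is on the block's user list, 0 otherwise. */
static int block_has_user(const struct block *b, const char *name)
{
	for (int i = 0; i < b->user_count; i++)
		if (!strcmp(b->users[i], name))
			return 1;
	return 0;
}

/* Append name to the user list; return 0 on success, -1 if it won't fit. */
static int block_add_user(struct block *b, const char *name)
{
	if (b->user_count >= MAX_USERS || strlen(name) >= NAME_LEN)
		return -1;
	strcpy(b->users[b->user_count++], name);
	return 0;
}

/*
 * Sketch of the patched flow: drop every user that is not user_name
 * (including the slurm account), then add user_name after the loop if
 * it was not seen.  Returns the number of users removed, or -1 if the
 * final add fails.
 */
static int sync_block_users(struct block *b, const char *user_name)
{
	assert(b != NULL);
	int found = 0, removed = 0, kept = 0;

	for (int i = 0; i < b->user_count; i++) {
		if (user_name && !strcmp(b->users[i], user_name)) {
			found = 1;
			if (kept != i)
				strcpy(b->users[kept], b->users[i]);
			kept++;
			continue;
		}
		removed++;	/* not the assigned user: drop it */
	}
	b->user_count = kept;

	/* The actual fix: this runs even when the loop body never did. */
	if (!found && user_name && block_add_user(b, user_name) != 0)
		return -1;
	return removed;
}
```

With a block whose only user is "slurm" and an assigned user of "tim", sync_block_users() removes one entry and leaves "tim" as the sole user; with an empty block it removes nothing and still adds "tim". Those are the two cases the old code missed.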

- Tim
--- slurm-2.4.1/src/plugins/select/bluegene/bl/bridge_linker.c	2011-12-21 18:52:55.000000000 -0500
+++ slurm-2.4.1-mod/src/plugins/select/bluegene/bl/bridge_linker.c	2012-07-03 16:51:57.013133295 -0400
@@ -1687,7 +1687,7 @@
 #ifdef HAVE_BG_FILES
 	char *user;
 	rm_partition_t *block_ptr = NULL;
-	int rc, i, user_count;
+	int rc, i, user_count, found=0;
 	char *user_name = NULL;
 
 	/* We can't use bridge_get_block_info here because users are
@@ -1753,24 +1753,15 @@
 			continue;
 		}
 
-		if (!strcmp(user, bg_conf->slurm_user_name)) {
-			free(user);
-			continue;
-		}
-
 		if (user_name && !strcmp(user, user_name)) {
-			returnc = REMOVE_USER_FOUND;
-			if ((rc = bridge_block_add_user(bg_record, user))
-			    != SLURM_SUCCESS) {
-				debug("couldn't add user %s to block %s",
-				      user, bg_record->bg_block_id);
-			}
+			found=1;
 			free(user);
 			continue;
 		}
 
 		info("Removing user %s from Block %s",
 		     user, bg_record->bg_block_id);
+		returnc = REMOVE_USER_FOUND;
 		if ((rc = _remove_block_user(bg_record->bg_block_id, user))
 		    != SLURM_SUCCESS) {
 			debug("user %s isn't on block %s",
@@ -1779,6 +1770,16 @@
 		}
 		free(user);
 	}
+
+	// no users currently, or we didn't find ourselves in the lookup
+ 	if (!found && user_name) {
+ 		if ((rc = bridge_block_add_user(bg_record, user_name))
+ 		    != SLURM_SUCCESS) {
+ 			debug("couldn't add user %s to block %s",
+ 			      user_name, bg_record->bg_block_id);
+ 		}
+ 	}
+
 	if ((rc = bridge_free_block(block_ptr)) != SLURM_SUCCESS) {
 		error("bridge_free_block(): %s", bg_err_str(rc));
 	}
