The attached patch fixes a bug in the select/bluegene plugin for BG/L and
BG/P in the 2.4 series.
Without it, the userid assigned to a block is never updated in MMCS, so
the user cannot launch a job and instead gets errors like:
<Jul 03 11:50:56.264854> BE_MPI (ERROR): Current user is not the owner of the
partition,
<Jul 03 11:50:56.264925> BE_MPI (ERROR): and is not in the partition's user
list - Aborting
<Jul 03 11:50:56.406071> FE_MPI (ERROR): Back-end failed while preparing
partition with return code 31.
<Jul 03 11:50:56.477110> FE_MPI (ERROR): Failure list:
<Jul 03 11:50:56.477145> FE_MPI (ERROR): - 1. A user does not have permission
to run the job on specified partition (failure #31)
The patch simplifies the logic a bit: it removes every user other than
the assigned user, including the slurm user account. (This doesn't seem
to affect operation on our 1-rack BG/L here, although I can't guarantee
the same for BG/P.) It also fixes the bug itself by making sure the
assigned user is always added to the block.
Before, the add only happened inside the loop. If user_count was 0 or 1
(the single user most likely being slurm_user, which hit a continue on
the loop's only pass), the add was never reached and the correct user
was never added to the block.
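
For anyone who wants to check the intended behaviour without MMCS at hand, here is a minimal standalone sketch of the new logic. It is only an illustration: the mock_block structure, the mock_* helpers, and sync_block_users() are hypothetical stand-ins, not the bridge API or the actual plugin code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_USERS 8
#define NAME_LEN  32

/* Hypothetical stand-in for the user list MMCS keeps for a block. */
struct mock_block {
	char users[MAX_USERS][NAME_LEN];
	int  user_count;
};

/* Hypothetical helper: append a user to the mock block. */
static void mock_add_user(struct mock_block *b, const char *name)
{
	assert(b->user_count < MAX_USERS);
	snprintf(b->users[b->user_count], NAME_LEN, "%s", name);
	b->user_count++;
}

/* Hypothetical helper: drop the user at idx, shifting the rest down. */
static void mock_remove_user_at(struct mock_block *b, int idx)
{
	for (int i = idx; i < b->user_count - 1; i++)
		memcpy(b->users[i], b->users[i + 1], NAME_LEN);
	b->user_count--;
}

/*
 * Mirrors the patched flow: remove everyone who is not user_name
 * (the slurm account included), remember whether user_name was
 * already present, and add it after the loop if it was not.
 * Returns the number of users removed.
 */
static int sync_block_users(struct mock_block *b, const char *user_name)
{
	int removed = 0, found = 0, i = 0;

	while (i < b->user_count) {
		if (user_name && !strcmp(b->users[i], user_name)) {
			found = 1;
			i++;
			continue;
		}
		mock_remove_user_at(b, i);
		removed++;
	}

	/* Runs even when the loop body never executed (user_count == 0). */
	if (!found && user_name)
		mock_add_user(b, user_name);

	return removed;
}
```

Calling sync_block_users() on an empty block, or on a block that holds only the slurm account, leaves exactly the assigned user on the block. Those are the two cases that failed before the patch.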
- Tim
--- slurm-2.4.1/src/plugins/select/bluegene/bl/bridge_linker.c 2011-12-21 18:52:55.000000000 -0500
+++ slurm-2.4.1-mod/src/plugins/select/bluegene/bl/bridge_linker.c 2012-07-03 16:51:57.013133295 -0400
@@ -1687,7 +1687,7 @@
#ifdef HAVE_BG_FILES
char *user;
rm_partition_t *block_ptr = NULL;
- int rc, i, user_count;
+ int rc, i, user_count, found=0;
char *user_name = NULL;
/* We can't use bridge_get_block_info here because users are
@@ -1753,24 +1753,15 @@
continue;
}
- if (!strcmp(user, bg_conf->slurm_user_name)) {
- free(user);
- continue;
- }
-
if (user_name && !strcmp(user, user_name)) {
- returnc = REMOVE_USER_FOUND;
- if ((rc = bridge_block_add_user(bg_record, user))
- != SLURM_SUCCESS) {
- debug("couldn't add user %s to block %s",
- user, bg_record->bg_block_id);
- }
+ found=1;
free(user);
continue;
}
info("Removing user %s from Block %s",
user, bg_record->bg_block_id);
+ returnc = REMOVE_USER_FOUND;
if ((rc = _remove_block_user(bg_record->bg_block_id, user))
!= SLURM_SUCCESS) {
debug("user %s isn't on block %s",
@@ -1779,6 +1770,16 @@
}
free(user);
}
+
+ // no users currently, or we didn't find ourselves in the lookup
+ if (!found && user_name) {
+ if ((rc = bridge_block_add_user(bg_record, user_name))
+ != SLURM_SUCCESS) {
+ debug("couldn't add user %s to block %s",
+ user_name, bg_record->bg_block_id);
+ }
+ }
+
if ((rc = bridge_free_block(block_ptr)) != SLURM_SUCCESS) {
error("bridge_free_block(): %s", bg_err_str(rc));
}