Tim, this is a good find. Was there a problem keeping the slurm user though? I would feel more comfortable leaving that check in there. Was this not an issue with 2.3?
Danny

On 07/03/12 14:33, Tim Wickberg wrote:
> Attached patch fixes a bug in the select/bluegene plugin for BG/L+P in
> the 2.4 series.
>
> Without this applied, the userid assigned to a block will never be
> updated in MMCS, preventing the user from launching a job with a
> message like:
>
>> <Jul 03 11:50:56.264854> BE_MPI (ERROR): Current user is not the
>> owner of the partition,
>> <Jul 03 11:50:56.264925> BE_MPI (ERROR): and is not in the
>> partition's user list - Aborting
>> <Jul 03 11:50:56.406071> FE_MPI (ERROR): Back-end failed while
>> preparing partition with return code 31.
>> <Jul 03 11:50:56.477110> FE_MPI (ERROR): Failure list:
>> <Jul 03 11:50:56.477145> FE_MPI (ERROR): - 1. A user does not have
>> permission to run the job on specified partition (failure #31)
>
> The patch simplifies the logic a bit: it removes all users that aren't
> the correct assigned user including the slurm user account. (This
> doesn't seem to affect operation on our 1-rack BG/L here at least,
> although I can't guarantee that for BG/P.) And, correcting the bug
> itself: it makes sure to add the assigned user to the block.
>
> Before, if user_count=0 or 1 (which was likely slurm_user, hitting a
> continue out of the one pass of the loop), the loop would be skipped
> over and the correct user would never be added in to the block.
>
> - Tim
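
For anyone following the thread without the patch in front of them, here is a minimal sketch of the logic described above. It is not the actual plugin code and does not use the MMCS bridge API: `block_t`, `block_add_user`, `block_has_user`, and `block_set_user` are hypothetical names operating on an in-memory list. The point it illustrates is that the assigned user is added unconditionally after the cleanup pass, so an empty list, or a list holding only the slurm user, still ends up with the right owner. The optional `keep` argument is an assumption added to model the "leave the slurm user in" variant asked about in the reply; pass `NULL` to remove everyone but the assigned user, as the patch description says it does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_USERS 8
#define NAME_LEN 32

/* Hypothetical stand-in for a block's user list (not the MMCS API). */
typedef struct {
    char users[MAX_USERS][NAME_LEN];
    int user_count;
} block_t;

/* Return true if `name` is currently in the block's user list. */
bool block_has_user(const block_t *b, const char *name)
{
    for (int i = 0; i < b->user_count; i++)
        if (strcmp(b->users[i], name) == 0)
            return true;
    return false;
}

/* Append `name` to the block's user list (truncated to NAME_LEN - 1). */
void block_add_user(block_t *b, const char *name)
{
    assert(b->user_count < MAX_USERS);
    strncpy(b->users[b->user_count], name, NAME_LEN - 1);
    b->users[b->user_count][NAME_LEN - 1] = '\0';
    b->user_count++;
}

/*
 * Make `target` a user of the block.
 * Every other user is removed, except `keep` when it is non-NULL
 * (assumption: models retaining the slurm user account).
 * The final add does not depend on the loop running at all, so
 * user_count == 0 and user_count == 1 are handled like any other case.
 */
void block_set_user(block_t *b, const char *target, const char *keep)
{
    int kept = 0;
    bool found = false;

    for (int i = 0; i < b->user_count; i++) {
        bool is_target = strcmp(b->users[i], target) == 0;
        bool is_kept = keep != NULL && strcmp(b->users[i], keep) == 0;

        if (!is_target && !is_kept)
            continue;               /* drop any other user */
        if (is_target) {
            if (found)
                continue;           /* drop duplicate entries of target */
            found = true;
        }
        if (kept != i)
            memcpy(b->users[kept], b->users[i], NAME_LEN);
        kept++;
    }
    b->user_count = kept;

    if (!found)
        block_add_user(b, target);  /* always reached, even for an empty list */
}
```

Calling `block_set_user(&b, "tim", NULL)` on a block whose only user is `slurm` leaves `tim` as the sole user; passing `"slurm"` as `keep` leaves both.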
