John,

Thanks for your suggestion, but I think I must have miscommunicated. I don't want the reservations to overlap; I want to prevent this situation from happening in the first place, so I don't think adding the OVERLAP flag will help me.

Also, I didn't pick these nodes specifically, but left the choice up to SLURM. I think it's automatically trying to hang onto the nodes from day to day in the PECAN reservation, and by the time it catches up to the other one, which picked its nodes well in advance, it ends up with an overlapping list. Obviously, this behavior isn't optimal.
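Conceptually, a one-off reservation and a DAILY one conflict only if they share at least one node *and* some daily occurrence intersects the one-off time window. The Python sketch below models that check, the kind of test one might run before creating or updating a reservation. The `Reservation` class, the `first_conflict` function and the two-node sets in the example are hypothetical illustrations, not Slurm's internals or API. The sketch also ignores the possibility described above, that Slurm may pick different nodes for later daily occurrences.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    """Hypothetical, simplified model of a reservation (not a Slurm API)."""
    name: str
    start: datetime
    duration: timedelta
    nodes: frozenset
    daily: bool = False

    def windows(self, horizon_end):
        """Yield (start, end) windows that begin before horizon_end.

        A DAILY reservation repeats every 24 hours; a one-off yields at most one."""
        start = self.start
        while start < horizon_end:
            yield start, start + self.duration
            if not self.daily:
                return
            start += timedelta(days=1)


def first_conflict(a, b, horizon_end):
    """Return (clash_start, shared_nodes) for the first clash found, or None.

    Two reservations clash only if they share nodes AND some pair of their
    windows intersects (windows are treated as half-open intervals)."""
    shared = a.nodes & b.nodes
    if not shared:
        return None
    for a_start, a_end in a.windows(horizon_end):
        for b_start, b_end in b.windows(horizon_end):
            if a_start < b_end and b_start < a_end:
                return max(a_start, b_start), sorted(shared)
    return None


# Times from the thread; node sets trimmed to two nodes each for illustration.
pecan_1km = Reservation("PECAN-1km", datetime(2015, 7, 3, 10), timedelta(hours=4),
                        frozenset({"c401-001", "c401-002"}), daily=True)
tacc_ssi = Reservation("TACC-SSI-2015-07-06", datetime(2015, 7, 6, 8),
                       timedelta(hours=10), frozenset({"c401-001", "c401-103"}))
print(first_conflict(pecan_1km, tacc_ssi, horizon_end=datetime(2015, 7, 7)))
```

With these inputs, the first clash is the 2015-07-06 10:00 occurrence of PECAN-1km on the shared node c401-001. With a horizon that ends before July 6, no clash is reported.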
Any thoughts on preventing this?

Thanks,
Bill.
--
Bill Barth, Ph.D., Director, HPC
[email protected] | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

On 7/6/15, 11:19 AM, "John Desantis" <[email protected]> wrote:

>
>Bill,
>
>> Has anyone used DAILY reservations in 14.11.3 on a regular enough basis to
>> see such a problem? I just recreated these PECAN recurring reservations so
>> that there are now no reservations in the system that spanned our SLURM
>> upgrade. I thought that was going to solve my problem, but no luck. Any
>> advice would be much appreciated.
>
>We currently have 1 DAILY reservation in use at our site and I ran
>into the same "slurm_update error: Requested reservation overlaps with
>another reservation" error before while migrating a few GPU nodes out
>of our reservation.
>
>The "OVERLAP" flag is required in both the 'TACC-SSI-2015-07-06' and
>'PECAN-1km' reservations due to the 'c401' nodes. Luckily (as in my
>case), you have start and end times in both of the reservations, and
>it doesn't look like the two 'PECAN' reservations have overlapping nodes
>with each other (then again, I only quickly browsed).
>
>What I would do to quickly resolve this is:
>
>1.) Place all of the nodes in a drain state for the
>'TACC-SSI-2015-07-06' and 'PECAN-1km' reservations.
>2.) Delete and recreate both reservations ensuring that the "OVERLAP"
>flag is present.
>
>The reason I have suggested #1 is because in our case I didn't want
>any long-running jobs to land on the nodes while re-creating the
>reservation (we're using the same set of nodes), further causing grief
>for the reservation users, and I didn't want to place a drain on the
>global partition.
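For reference, John's two steps can be written out as scontrol commands. This Python sketch only builds the command strings for review and runs nothing. The `recreate_plan` helper is hypothetical, and the inputs are copied from the TACC-SSI listing quoted in this thread (with qux added). The final `State=RESUME` command, which returns the drained nodes to service, is an assumption on my part that John's list leaves implicit.

```python
import shlex

# Inputs copied from the thread's TACC-SSI-2015-07-06 listing; adjust before use.
TACC_NODES = "c401-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-803]"


def recreate_plan(name, start, duration, nodes, users, partition, flags):
    """Build (but never run) the scontrol commands for John's two steps.

    Hypothetical helper: returns shell-quoted command strings for review."""
    all_flags = ",".join(sorted(set(flags) | {"OVERLAP"}))
    return [
        # Step 1: drain the reserved nodes so no long-running job starts there.
        shlex.join(["scontrol", "update", f"NodeName={nodes}", "State=DRAIN",
                    f"Reason=recreating {name}"]),
        # Step 2: delete the reservation and recreate it with OVERLAP set.
        shlex.join(["scontrol", "delete", f"ReservationName={name}"]),
        shlex.join(["scontrol", "create", "reservation", f"ReservationName={name}",
                    f"StartTime={start}", f"Duration={duration}", f"Nodes={nodes}",
                    f"Users={','.join(users)}", f"PartitionName={partition}",
                    f"Flags={all_flags}"]),
        # Assumption, not in John's list: return the drained nodes to service.
        shlex.join(["scontrol", "update", f"NodeName={nodes}", "State=RESUME"]),
    ]


for cmd in recreate_plan("TACC-SSI-2015-07-06", "2015-07-06T08:00:00", "10:00:00",
                         TACC_NODES, ["foo", "bar", "baz", "qux"], "normal-mic", []):
    print(cmd)
```

`shlex.join` quotes the `Nodes=`/`NodeName=` arguments because the brackets would otherwise be treated as shell glob characters. For a DAILY reservation, passing `["DAILY"]` as `flags` yields `Flags=DAILY,OVERLAP`.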
>
>John DeSantis
>
>2015-07-02 21:10 GMT-04:00 Bill Barth <[email protected]>:
>>
>> I'd like to update this reservation:
>>
>> ReservationName=TACC-SSI-2015-07-06 StartTime=2015-07-06T08:00:00 EndTime=2015-07-06T18:00:00 Duration=10:00:00
>>    Nodes=c401-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-803] NodeCnt=35 CoreCnt=560 Features=(null) PartitionName=normal-mic Flags=
>>    Users=foo,bar,baz Accounts=(null) Licenses=(null) State=INACTIVE
>>
>> to add the user qux.
>>
>> But
>>
>> # scontrol update ReservationName=TACC-SSI-2015-07-06 Users=+qux
>> Error updating the reservation: Requested reservation overlaps with another reservation
>> slurm_update error: Requested reservation overlaps with another reservation
>>
>> The only potentially conflicting reservations are these DAILY ones I've
>> reported before:
>>
>> ReservationName=PECAN-1km StartTime=2015-07-03T10:00:00 EndTime=2015-07-03T14:00:00 Duration=04:00:00
>>    Nodes=c401-[001-004,101-102,104,201-204,301-304,401-404,501-504,601-604,701-703,801-804,901-904],c402-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-702,704,801-804,901-904],c403-[001-004,101-104,201-204,301-304,401-404,501-504,602-604,702-703,801-804,901-904],c404-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-702] NodeCnt=144 CoreCnt=2304 Features=(null) PartitionName=normal-mic Flags=DAILY
>>    Users=bbarth,sam Accounts=(null) Licenses=(null) State=INACTIVE
>>
>> ReservationName=PECAN-4km StartTime=2015-07-03T07:00:00 EndTime=2015-07-03T19:30:00 Duration=12:30:00
>>    Nodes=c404-[801-804,901-904],c405-[001-004,101,103-104,201-202,204,301-304,401-403,503-504,601-604],c406-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],c407-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],c408-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],c409-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],c410-[001-004,101-104,201-204,301,502,703-704,801-802,804,904],c411-[101,304,401,403-404,502,602,803] NodeCnt=219 CoreCnt=3504 Features=(null) PartitionName=normal-mic Flags=DAILY
>>    Users=bbarth,sam Accounts=(null) Licenses=(null) State=INACTIVE
>>
>> Has anyone used DAILY reservations in 14.11.3 on a regular enough basis to
>> see such a problem? I just recreated these PECAN recurring reservations so
>> that there are now no reservations in the system that spanned our SLURM
>> upgrade. I thought that was going to solve my problem, but no luck. Any
>> advice would be much appreciated.
>>
>> Thanks,
>> Bill.
>>
>> --
>> Bill Barth, Ph.D., Director, HPC
>> [email protected] | Phone: (512) 232-7069
>> Office: ROC 1.435 | Fax: (512) 475-9445
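To see which nodes the reservations above actually share, the `Nodes=` expressions need to be expanded. Slurm can do this itself with `scontrol show hostnames <hostlist>`. The Python sketch below is a hypothetical stand-in that handles only the form used in this thread: a prefix followed by one `[...]` group of zero-padded numbers and ranges. The node lists are taken as quoted in the 2015-07-02 message.

```python
import re

TACC_SSI = "c401-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-803]"
PECAN_1KM = (
    "c401-[001-004,101-102,104,201-204,301-304,401-404,501-504,601-604,701-703,801-804,901-904],"
    "c402-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-702,704,801-804,901-904],"
    "c403-[001-004,101-104,201-204,301-304,401-404,501-504,602-604,702-703,801-804,901-904],"
    "c404-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-702]"
)
PECAN_4KM = (
    "c404-[801-804,901-904],"
    "c405-[001-004,101,103-104,201-202,204,301-304,401-403,503-504,601-604],"
    "c406-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],"
    "c407-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],"
    "c408-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],"
    "c409-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804,901-904],"
    "c410-[001-004,101-104,201-204,301,502,703-704,801-802,804,904],"
    "c411-[101,304,401,403-404,502,602,803]"
)


def split_top_level(expr):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand(expr):
    """Expand e.g. 'c401-[001-003,101]' into a set of host names.

    Hypothetical helper: assumes at most one bracket group per host expression."""
    hosts = set()
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", part)
        if m is None:  # plain host name such as 'c401-001'
            hosts.add(part)
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            width = len(lo)  # keep the zero padding, e.g. '001'
            for n in range(int(lo), int(hi or lo) + 1):
                hosts.add(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


tacc, p1, p4 = expand(TACC_SSI), expand(PECAN_1KM), expand(PECAN_4KM)
print(len(tacc), len(p1), len(p4))
print(len(tacc & p1), sorted(tacc - p1))
print(sorted(tacc & p4), sorted(p1 & p4))
```

Expanded this way, the lists contain 35, 144 and 219 nodes, matching the NodeCnt values above. 33 of the 35 TACC-SSI-2015-07-06 nodes are also in PECAN-1km (only c401-103 and c401-704 are not). PECAN-4km shares no nodes with either reservation, which is consistent with John's reading that the conflict comes from the c401 nodes.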
