The first paragraph of the email to which you responded contains the race condition description. I will reproduce it here with some modifications to make it clearer.
We considered your suggested solution, but, unfortunately, it does not get rid of *the race condition that exists between the time the scheduler is told how many licenses are available and when the job actually checks them out, during which time external user(s) check out sufficient licenses that the job cannot receive the licenses it requested of the scheduler.* When this happens and the job attempts to check out its required licenses from FlexLM, the FlexLM license server denies the job the requested licenses, at which point the job fails.

Is that clearer?

Gary D. Brown

On Wed, Jul 3, 2013 at 5:13 PM, Hongjia Cao <[email protected]> wrote:
>
> I agree that the solution is poor in performance since an option file
> re-read is required each time a job allocates/deallocates licenses. But
> I don't see the race condition. Could you please explain it?
>
>
> On Wed, 2013-07-03 at 10:49 -0700, Gary Brown wrote:
> > We considered your suggested solution, but, unfortunately, it does not
> > get rid of the race condition that exists between the time the
> > scheduler is told how many licenses are available and when the job
> > actually checks them out, during which time an external user can check
> > out a license. When this happens and the job attempts to check out
> > its required licenses, the FlexLM license server denies the job the
> > requested licenses, at which point the job fails.
> >
> > No matter how fast one tries to keep the job scheduler updated, the
> > race condition exists (especially if the job does not need the
> > licenses until three hours after it started!) and when the user's job
> > aborts and the job was in the queue for two weeks, the user is very
> > irate and does not care why the job did not get the licenses.
> >
> > I have found no way around this race condition dilemma, which means
> > Flexera Software would have to modify the FlexLM license manager to
> > adopt a new two-step model, which, of course, would mean all the ISVs
> > must modify their software products to use the two-step model. But
> > the two-step model means users would need fewer licenses, which no ISV
> > is willing to allow; hence, no ISV will adopt the two-step model, and
> > Flexera Software has said it will not implement the two-step model due
> > to "lack of demand".
> >
> > The other problem with your suggested solution is it does not scale.
> >
> > Gary D. Brown
> >
> >
> > On Tue, Jul 2, 2013 at 7:22 PM, 曹宏嘉 <[email protected]> wrote:
> >     I thought about the reservation/commit model you mentioned.
> >     Indeed FlexLM has very simple support for license reservation
> >     by project. My idea is as follows:
> >
> >     1. In the vendor option file, reserve the number of licenses
> >     (features, in terms of FlexLM) configured in SLURM. For
> >     example, create a project name with a random string (so that a
> >     user cannot easily guess it and use it to check out licenses), and
> >     reserve the proper number of licenses. This ensures that SLURM
> >     has the configured licenses.
> >
> >     2. On job resource allocation, create a project name according
> >     to the job id, and reserve the allocated licenses to the
> >     project. The application must set the environment variable
> >     LM_PROJECT to the project name to check out licenses.
> >
> >     3. On job resource deallocation, the licenses reserved to the
> >     project of the job are taken back (reservation in vendor option
> >     file deleted and lmreread executed).
> >
> >     This is not a very good approach because a user may cheat by
> >     guessing LM_PROJECT environment variables. It could work
> >     if the users are all well behaved.
> >
> >
> >     -----Original Message-----
> >     From: "Gary Brown" <[email protected]>
> >     Sent: 2013-07-03 00:06:15 (Wednesday)
> >     To: slurm-dev <[email protected]>
> >     Cc:
> >     Subject: [slurm-dev] Re: slurm integration with FlexLM
> >     license manager
> >
> >
> >         Three years ago I tried to work with Flexera Software
> >         (FlexLM) to resolve race conditions that arose between
> >         a scheduler and FlexLM because the FlexLM license
> >         manager was also serving licenses to external users;
> >         i.e., the scheduler was not the only one trying to
> >         obtain licenses.
> >         I proposed a "reservation/commit" model similar to
> >         that used by the credit card industry to handle
> >         charges where a retail establishment will obtain an
> >         "authorization" for a specific amount, which the
> >         credit card system "reserves" against a customer's
> >         credit limit, and then when the retail establishment
> >         "settles" the charge, the reserved amount is actually
> >         added to the customer's credit card balance and the
> >         "authorization" deleted. This would properly handle
> >         the situation where a scheduler "reserves" licenses
> >         through FlexLM and then a running job actually "checks
> >         out" the reserved licenses.
> >         Despite the company and product names, Flexera was
> >         completely inflexible and would not do anything in
> >         this direction since its customers, the Independent
> >         Software Vendors (ISVs), would actually sell fewer
> >         software licenses under this model, which is what
> >         users actually want, and Flexera's customers would
> >         take a very dim view of Flexera if it implemented this
> >         model. No logic (cloud model also needs this),
> >         cajoling, or begging would get Flexera to budge.
> >         I do not know if Flexera has done anything to resolve
> >         the issue of race conditions between when a scheduler
> >         tries to schedule licenses and when a job actually
> >         checks the licenses out, during which interval an
> >         external user checks out licenses unbeknownst to the
> >         scheduler, but I suspect they have done nothing.
> >         If anyone hears of anything different, I, for one,
> >         would be happy to know.
> >
> >         Gary D. Brown
> >
> >
> >         On Tue, Jul 2, 2013 at 8:38 AM, David Bigagli
> >         <[email protected]> wrote:
> >             Indeed currently there is no integration
> >             between FlexLM and SLURM, but some ideas are
> >             being passed around about what to do about it. I am
> >             one of the original designers and developers
> >             of Platform License Scheduler.
> >
> >
> >             The item 1) you mentioned is certainly the
> >             first step, but consider that even that may not be
> >             easy: just imagine an electronic design
> >             application that is running in the cluster and
> >             jobs checking in and out hundreds of features
> >             per second. It is important to choose which
> >             features have to be managed by the scheduler,
> >             and each has to be a 'well behaved' one, meaning
> >             the behavior of the application from a license
> >             perspective has to be well known. One of the
> >             difficulties is to understand how the
> >             application uses the licenses, as you observed
> >             in item 2).
> >
> >
> >             The only way to get license information out of
> >             FlexLM is indeed lmstat, which could be quite
> >             slow if the license servers are handling many
> >             applications. There is no other supported
> >             interface; a possible alternative could be
> >             parsing the lmgrd log file.
> >
> >
> >             /David
> >
> >
> >             On Tue, Jul 2, 2013 at 2:57 PM, Hongjia Cao
> >             <[email protected]> wrote:
> >
> >                 I don't think there is integration
> >                 with FlexLM in SLURM. There is a
> >                 simple license management in SLURM by
> >                 counting the licenses used.
> >
> >                 I am also considering the interaction between SLURM
> >                 and FlexLM, but I have no good result yet. The
> >                 difficulty is that FlexLM has no open API (except
> >                 for a command line tool lmutil). And the function
> >                 provided by FlexLM is not enough for SLURM to
> >                 totally control the licenses. For now, I think the
> >                 following issues should be addressed:
> >
> >                 1. Keep the license count in SLURM consistent with
> >                 FlexLM. There may be applications run outside of
> >                 SLURM which may check out licenses. And a job may
> >                 request the wrong number of licenses (intentionally
> >                 or unintentionally).
> >
> >                 2. Force a job to release the licenses on job
> >                 termination, even if there are job processes not
> >                 killed. With LS-DYNA I have run into the case that
> >                 after the application completes, the licenses will
> >                 not be released until a long time period (even
> >                 without job processes left). LS-DYNA is not using
> >                 FlexLM for license control and I am not sure
> >                 whether this could happen for FlexLM managed
> >                 applications.
> >
> >                 To handle various applications and the license
> >                 managers, a license plug-in should be introduced.
> >                 But the interface of the plug-in is not clear yet.
> >
> >                 I'd like to know if anyone has experience with
> >                 SLURM integration with FlexLM or other license
> >                 managers. Any requirements or considerations would
> >                 also be welcomed.
> >
> >
> >                 On Mon, 2013-07-01 at 17:17 -0700, Eva Hocks wrote:
> >                 >
> >                 > The documentation announced the integration since
> >                 > 2.4. I am running slurm 2.4.3.
> >                 >
> >                 > Could anyone please point me to where I can find
> >                 > how to configure the FlexLM license manager
> >                 > integration with slurm?
> >                 >
> >                 >
> >                 > Thanks
> >                 > Eva
> >                 >
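
For readers who want the race described at the top of this thread in concrete form, here is a minimal Python sketch of the one-step model next to the proposed "reservation/commit" two-step model. Everything in it is an assumption made for illustration: `LicenseServer`, `reserve`, `commit`, and the holder name `job-1234` are hypothetical, and none of it corresponds to an actual FlexLM or SLURM interface (the thread notes that FlexLM offers no such two-step model).

```python
from dataclasses import dataclass, field


class LicenseDenied(Exception):
    """Raised when the toy server cannot satisfy a request."""


@dataclass
class LicenseServer:
    """Hypothetical in-memory stand-in for a license server (not the FlexLM API)."""
    total: int
    checked_out: int = 0
    reserved: dict = field(default_factory=dict)  # holder name -> reserved count

    def available(self) -> int:
        # Licenses neither checked out nor held by a reservation.
        return self.total - self.checked_out - sum(self.reserved.values())

    def checkout(self, count: int) -> None:
        # One-step model: succeeds only if enough licenses are free right now.
        if count > self.available():
            raise LicenseDenied(f"requested {count}, only {self.available()} free")
        self.checked_out += count

    def reserve(self, holder: str, count: int) -> None:
        # Two-step model, step 1: hold licenses for a named holder.
        if count > self.available():
            raise LicenseDenied(f"cannot reserve {count}, only {self.available()} free")
        self.reserved[holder] = self.reserved.get(holder, 0) + count

    def commit(self, holder: str) -> None:
        # Two-step model, step 2: convert the holder's reservation into a checkout.
        self.checked_out += self.reserved.pop(holder)


def one_step_race() -> bool:
    """Return True if the job obtains its licenses in the one-step model."""
    server = LicenseServer(total=4)
    free_seen_by_scheduler = server.available()  # scheduler is told 4 are free
    job_needs = 4
    assert free_seen_by_scheduler >= job_needs   # so the scheduler starts the job
    server.checkout(2)                           # external user checks out first
    try:
        server.checkout(job_needs)               # job's checkout, possibly hours later
    except LicenseDenied:
        return False                             # the job fails
    return True


def two_step_no_race() -> bool:
    """Return True if the reservation protects the job from the external user."""
    server = LicenseServer(total=4)
    server.reserve("job-1234", 4)                # scheduler reserves when it schedules
    try:
        server.checkout(2)                       # external user asks in the meantime
        external_got_licenses = True
    except LicenseDenied:
        external_got_licenses = False            # denied up front instead of the job
    server.commit("job-1234")                    # job turns reservation into checkout
    return (not external_got_licenses) and server.checked_out == 4
```

In the sketch, `one_step_race()` ends with the job denied, while `two_step_no_race()` ends with the external user denied and the job holding all four licenses. The reservation does not create any licenses; it only moves the denial from the running job to the later requester, which is the behavior the credit card "authorization" analogy describes.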
