I agree that the solution performs poorly, since an option file re-read is required each time a job allocates or deallocates licenses. But I don't see the race condition. Could you please explain it?
On Wed, 2013-07-03 at 10:49 -0700, Gary Brown wrote:
> We considered your suggested solution, but, unfortunately, it does not
> get rid of the race condition that exists between the time the
> scheduler is told how many licenses are available and when the job
> actually checks them out, during which time an external user can check
> out a license. When this happens and the job attempts to check out
> its required licenses, the FlexLM license server denies the job the
> requested licenses, at which point the job fails.
>
> No matter how fast one tries to keep the job scheduler updated, the
> race condition exists (especially if the job does not need the
> licenses until three hours after it started!) and when the user's job
> aborts and the job was in the queue for two weeks, the user is very
> irate and does not care why the job did not get the licenses.
>
> I have found no way around this race condition dilemma, which means
> Flexera Software would have to modify the FlexLM license manager to
> adopt a new two-step model, which, of course, would mean all the ISVs
> must modify their software products to use the two-step model. But
> the two-step model means users would need fewer licenses, which no ISV
> is willing to allow; hence, no ISV will adopt the two-step model, and
> Flexera Software has said it will not implement the two-step model due
> to "lack of demand".
>
> The other problem with your suggested solution is that it does not
> scale.
>
> Gary D. Brown
>
>
> On Tue, Jul 2, 2013 at 7:22 PM, 曹宏嘉 <[email protected]> wrote:
> I thought about the reservation/commit model you mentioned.
> Indeed, FlexLM has very simple support for license reservation
> by project. My idea is as follows:
>
> 1. In the vendor option file, reserve the number of licenses
>    (features, in FlexLM terms) configured in SLURM.
>    For example, create a project name from a random string (so that a
>    user cannot easily guess it and use it to check out licenses), and
>    reserve the proper number of licenses. This ensures that SLURM
>    has the configured licenses.
>
> 2. On job resource allocation, create a project name based on the
>    job ID, and reserve the allocated licenses for that project. The
>    application must set the environment variable LM_PROJECT to the
>    project name to check out licenses.
>
> 3. On job resource deallocation, the licenses reserved for the job's
>    project are taken back (the reservation in the vendor option file
>    is deleted and lmreread is executed).
>
> This is not a very good approach because a user may cheat by
> guessing LM_PROJECT environment variables. It could work
> if the users are all well behaved.
>
>
> -----Original Message-----
> From: "Gary Brown" <[email protected]>
> Sent: 2013-07-03 00:06:15 (Wednesday)
> To: slurm-dev <[email protected]>
> Cc:
> Subject: [slurm-dev] Re: slurm integration with FlexLM license manager
>
> Three years ago I tried to work with Flexera Software (FlexLM) to
> resolve race conditions that arose between a scheduler and FlexLM
> because the FlexLM license manager was also serving licenses to
> external users; i.e., the scheduler was not the only one trying to
> obtain licenses.
>
> I proposed a "reservation/commit" model similar to that used by the
> credit card industry to handle charges, where a retail establishment
> will obtain an "authorization" for a specific amount, which the
> credit card system "reserves" against a customer's credit limit, and
> then when the retail establishment "settles" the charge, the reserved
> amount is actually added to the customer's credit card balance and
> the "authorization" deleted. This would properly handle the
> situation where a scheduler "reserves" licenses through FlexLM and
> then a running job actually "checks out" the reserved licenses.
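
To make the reservation/commit idea above concrete, here is a minimal sketch in Python. It is a toy model only: FlexLM offers no such interface (which is the point of this thread), and the class and method names (LicensePool, reserve, commit, checkout) are invented for illustration. It shows that once the scheduler holds a reservation, an external checkout arriving before the job's own checkout can no longer take the job's licenses.

```python
class LicensePool:
    """Toy license server with a hypothetical reserve/commit interface."""

    def __init__(self, total):
        self.total = total
        self.checked_out = 0   # licenses actually in use
        self.reserved = {}     # reservation id -> number of licenses held
        self._next_id = 1

    def available(self):
        # Reserved licenses count as unavailable to everyone else.
        return self.total - self.checked_out - sum(self.reserved.values())

    def checkout(self, count):
        """One-step checkout, as an external user would do it."""
        if count > self.available():
            return False
        self.checked_out += count
        return True

    def reserve(self, count):
        """Step 1: the scheduler holds licenses; returns an id or None."""
        if count > self.available():
            return None
        rid = self._next_id
        self._next_id += 1
        self.reserved[rid] = count
        return rid

    def commit(self, rid):
        """Step 2: the job turns its reservation into a real checkout."""
        count = self.reserved.pop(rid)
        self.checked_out += count
        return count


pool = LicensePool(total=4)
rid = pool.reserve(3)            # scheduler reserves when the job starts
external_ok = pool.checkout(2)   # external user shows up in between
committed = pool.commit(rid)     # job checks out hours later
print(external_ok, committed, pool.available())  # False 3 1
```

With a one-step model, the external checkout of 2 would have succeeded and the job's later request for 3 would have been denied.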
>
> Despite the company and product names, Flexera was completely
> inflexible and would not do anything in this direction since its
> customers, the Independent Software Vendors (ISVs), would actually
> sell fewer software licenses under this model, which is what users
> actually want, and Flexera's customers would take a very dim view of
> Flexera if it implemented this model. No logic (the cloud model also
> needs this), cajoling, or begging would get Flexera to budge.
>
> I do not know if Flexera has done anything to resolve the issue of
> race conditions between when a scheduler tries to schedule licenses
> and when a job actually checks the licenses out, during which
> interval an external user checks out licenses unbeknownst to the
> scheduler, but I suspect they have done nothing.
>
> If anyone hears of anything different, I, for one, would be happy to
> know.
>
> Gary D. Brown
>
>
> On Tue, Jul 2, 2013 at 8:38 AM, David Bigagli <[email protected]> wrote:
> Indeed, currently there is no integration between FlexLM and SLURM,
> but some ideas are being passed around about what to do about it. I
> am one of the original designers and developers of Platform License
> Scheduler.
>
> The item 1) you mentioned is certainly the first step, but consider
> that even that may not be easy: just imagine an electronic design
> application that is running in the cluster, with jobs checking
> hundreds of features in and out per second. It is important to
> choose which features have to be managed by the scheduler, and each
> has to be a 'well behaved' one, meaning the behavior of the
> application from a license perspective has to be well known. One of
> the difficulties is to understand how the application uses the
> licenses, as you observed in item 2).
>
> The only way to get license information out of FlexLM is indeed
> lmstat, which could be quite slow if the license servers are
> handling many applications. There is no other supported interface;
> a possible alternative could be parsing the lmgrd log file.
>
> /David
>
>
> On Tue, Jul 2, 2013 at 2:57 PM, Hongjia Cao <[email protected]> wrote:
>
> I don't think there is integration with FlexLM in SLURM. There is
> simple license management in SLURM that counts the licenses used.
>
> I am also considering the interaction between SLURM and FlexLM, but
> I have no good result yet. The difficulty is that FlexLM has no open
> API (except for a command line tool, lmutil), and the functionality
> provided by FlexLM is not enough for SLURM to fully control the
> licenses. For now, I think the following issues should be addressed:
>
> 1. Keep the license count in SLURM consistent with FlexLM. There may
>    be applications run outside of SLURM which may check out licenses.
>    And a job may request the wrong number of licenses (intentionally
>    or unintentionally).
>
> 2. Force a job to release the licenses on job termination, even if
>    there are job processes not killed. With LS-DYNA I have run into
>    the case that after the application completes, the licenses are
>    not released for a long time (even without job processes left).
>    LS-DYNA is not using FlexLM for license control and I am not sure
>    whether this could happen for FlexLM managed applications.
>
> To handle various applications and license managers, a license
> plug-in should be introduced. But the interface of the plug-in is
> not clear yet.
>
> I'd like to know if anyone has experience with SLURM integration
> with FlexLM or other license managers. Any requirements or
> considerations would also be welcome.
>
>
> On Mon, 2013-07-01 at 17:17 -0700, Eva Hocks wrote:
> >
> > The documentation announced the integration since 2.4.
> > I am running slurm 2.4.3.
> >
> > Could anyone please point me to where I can find how to configure the
> > FlexLM license manager integration with slurm?
> >
> >
> > Thanks
> > Eva
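
On the point raised in the thread that lmstat is the only supported way to get license information out of FlexLM, a minimal Python sketch of keeping SLURM's count in step with it could look like the following. The shape of the "Users of ..." summary line is an assumption based on typical lmstat output and should be checked against the output of your own license server; the feature names and the helper functions (parse_lmstat, external_use) are hypothetical. Note that polling like this only narrows the window; it does not remove the race condition discussed above.

```python
import re

# Assumed shape of an lmstat feature summary line; verify against real output.
USAGE_RE = re.compile(
    r"Users of (?P<feature>\S+):\s+"
    r"\(Total of (?P<issued>\d+) licenses? issued;\s+"
    r"Total of (?P<in_use>\d+) licenses? in use\)"
)


def parse_lmstat(text):
    """Return {feature: (issued, in_use)} from lmstat-style text."""
    usage = {}
    for match in USAGE_RE.finditer(text):
        usage[match.group("feature")] = (
            int(match.group("issued")),
            int(match.group("in_use")),
        )
    return usage


def external_use(usage, slurm_allocated):
    """Licenses in use that SLURM did not allocate, per feature."""
    return {
        feature: max(in_use - slurm_allocated.get(feature, 0), 0)
        for feature, (issued, in_use) in usage.items()
    }


# Made-up sample text with hypothetical feature names.
SAMPLE = """\
Users of simfeature:  (Total of 10 licenses issued;  Total of 4 licenses in use)
Users of meshfeature:  (Total of 1 license issued;  Total of 0 licenses in use)
"""

usage = parse_lmstat(SAMPLE)
outside = external_use(usage, {"simfeature": 3})
print(usage)    # {'simfeature': (10, 4), 'meshfeature': (1, 0)}
print(outside)  # {'simfeature': 1, 'meshfeature': 0}
```

In practice the text would come from running lmutil lmstat against the license server, and the "outside" counts would be subtracted from what the scheduler believes is free.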
