I agree with Christopher. Flexera Software has no interest in scheduling or optimizing the use of software licenses, since their customers, the ISVs, want to sell as many licenses as possible.

The real problem is third-party license checkout that the job scheduler does not know about, which is why a job will run and then fail. No matter what one does to mitigate the issue, there is always a race condition between when the scheduler checks the license counts and when the job's application actually checks out the licenses from FlexLM. If a user checks out a license between those two events, the job fails because the application cannot obtain the license the job scheduler thought it had.

I tried to work with Flexera 3-4 years ago to resolve this issue through a two-step process very similar to what the bank card industry does: create a lien on account funds, identified by a returned token, then later commit the actual funds identified by that token. Between those two steps no one can lay claim to or reserve funds greater than the account balance. The equivalent for HPC is the job scheduler requesting a lien on a software license; when the application then checks out the license using the lien token, the license manager releases the lien and grants the license.

When I laid out the race condition and its solution, Flexera acknowledged they knew about the situation but would not do anything about it because their ISV customers did not want the problem solved. Others with whom I have spoken tried to devise a "man-in-the-middle" workaround that would intercept license requests and keep a scheduler updated, but Flexera contracts specifically forbid ISVs and customers from trying anything like this.

So be aware that any solution you devise will have the race condition, and you will have to handle it in some less-than-ideal manner.
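To make the lien idea concrete, here is a minimal Python sketch of the two-step protocol described above. It is a toy in-memory model only: the class `LienLicensePool` and its methods `place_lien`, `checkout`, and `release_lien` are hypothetical names invented for illustration, and nothing like them exists in FlexLM/FlexNet, which is the whole problem.

```python
import threading
import uuid


class LicenseUnavailable(Exception):
    """Raised when a request cannot be satisfied."""


class LienLicensePool:
    """Toy license pool with a reserve-then-commit (lien) protocol.

    Hypothetical illustration only; this is not a FlexLM/FlexNet API.
    """

    def __init__(self, total):
        self.total = total
        self.in_use = 0   # licenses actually checked out
        self.liens = {}   # lien token -> number of licenses held
        self._lock = threading.Lock()

    def _free(self):
        # Licenses neither checked out nor held under a lien.
        return self.total - self.in_use - sum(self.liens.values())

    def available(self):
        with self._lock:
            return self._free()

    def place_lien(self, count):
        """Step 1 (scheduler): reserve licenses and get back a token."""
        with self._lock:
            if count > self._free():
                raise LicenseUnavailable("not enough free licenses for lien")
            token = uuid.uuid4().hex
            self.liens[token] = count
            return token

    def checkout(self, count, token=None):
        """Step 2 (application): check out, optionally redeeming a lien."""
        with self._lock:
            if token is None:
                # Ordinary third-party checkout: only the free pool is usable.
                if count > self._free():
                    raise LicenseUnavailable("not enough free licenses")
            else:
                held = self.liens.get(token)
                if held is None:
                    raise LicenseUnavailable("unknown or expired lien token")
                if count > held:
                    raise LicenseUnavailable("lien covers fewer licenses")
                del self.liens[token]  # the lien is released on commit
            self.in_use += count

    def release_lien(self, token):
        """Drop a lien that will not be used (e.g. the job was cancelled)."""
        with self._lock:
            self.liens.pop(token, None)


if __name__ == "__main__":
    pool = LienLicensePool(total=2)
    token = pool.place_lien(2)           # scheduler reserves at dispatch time
    try:
        pool.checkout(1)                 # interactive user tries to grab one
    except LicenseUnavailable as exc:
        print("third-party checkout refused:", exc)
    pool.checkout(2, token=token)        # the job's application commits
    print("licenses in use:", pool.in_use)
```

Because the lien is counted against the free pool from the moment the scheduler places it, a user who shows up between dispatch and application start is refused instead of the job. Without the lien step, the same sequence is exactly the check-then-checkout race.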
Gary D. Brown, HPC Product Manager
Adaptive Computing

On Thu, Jan 15, 2015 at 8:20 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>
> On 16/01/15 13:29, Andrew Elwell wrote:
>
> > We’re interested in the possibility of holding jobs until certain
> > licences are available (hello ansys) rather than them running and
> > failing. Can anyone speculate roughly how much work is involved to
> > finish the current implementation?
>
> Looking at http://slurm.schedmd.com/licenses.html I suspect the easiest
> way will be to have a script that runs from cron and populates slurmdbd
> with sacctmgr for the names of the licenses and number of tokens.
>
> In my experience, however, the biggest problem will be the users being
> able to predict how many of which license they are going to need to use.
> Some software dynamically adapts and two identical jobs may request
> different numbers and types of tokens depending on what is currently in
> use (or the phase of the moon, cosmic rays, etc).
>
> To be honest I suspect this will be the best that you can do; the
> vendors, or whoever the current owner of FlexLM/FlexNet is, don't really
> want to make it easy to tell what's going on.
>
> The vendor wants to sell you more tokens "just in case" and if Flex*
> made it easy for you to queue things up when there are not enough to go
> around it would work against their customers' interest.
>
> Yours,
> Cynical of Melbourne..
> --
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
>