One other case to take into account, which complicates the logic a bit: we have some jobs that need to be stopped and then started again, usually with either code changes or capacity increases. In that case the resources the job already consumes would need to be factored back in to determine whether there is enough room to run the new version. I think for a first pass a simple yes/no on outstanding offers would be good, but for our use case we would need to supply an existing job as an argument so the offers check adds those resources back when deciding whether there is enough room.
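A minimal sketch of that yes/no check, including the "add the existing job's resources back" case. It assumes offers have already been summarized as free resources per agent (for example, aggregated from the scheduler's /offers endpoint) and that each running instance of the old job is known by the agent it occupies. `Resources`, `Job` and `can_schedule` are hypothetical names, not Aurora APIs. Placement is greedy first-fit, a heuristic that can say "no" when a smarter packing would fit, and it ignores constraints, quotas and anything that changes between the check and the deploy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; real offers carry more resource types."""
    cpus: float
    ram_mb: int
    disk_mb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus + other.cpus,
                         self.ram_mb + other.ram_mb,
                         self.disk_mb + other.disk_mb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus,
                         self.ram_mb - other.ram_mb,
                         self.disk_mb - other.disk_mb)

    def fits_in(self, available: "Resources") -> bool:
        return (self.cpus <= available.cpus
                and self.ram_mb <= available.ram_mb
                and self.disk_mb <= available.disk_mb)


@dataclass
class Job:
    """Hypothetical job description: N identical instances."""
    instances: int
    per_instance: Resources
    # Agent ID of each currently running instance (empty for a new job).
    placements: List[str] = field(default_factory=list)


def can_schedule(offers: Dict[str, Resources], job: Job,
                 existing: Optional[Job] = None) -> bool:
    """Yes/no: can every instance of `job` be placed on the offered resources?
    If `existing` is given (a job being stopped and restarted), its instances'
    resources are added back to the agents they currently occupy."""
    free = dict(offers)  # copy so the caller's offers are left untouched
    if existing is not None:
        for agent in existing.placements:
            free[agent] = free.get(agent, Resources(0.0, 0, 0)) + existing.per_instance

    # Greedy first-fit: a task cannot span agents, so each instance must
    # land whole on the first agent that still has enough room.
    for _ in range(job.instances):
        for agent, available in free.items():
            if job.per_instance.fits_in(available):
                free[agent] = available - job.per_instance
                break
        else:
            return False  # no agent can hold this instance
    return True


if __name__ == "__main__":
    offers = {"agent-1": Resources(1.0, 1024, 1024),
              "agent-2": Resources(0.5, 512, 512)}
    running = Job(2, Resources(1.0, 1024, 512), placements=["agent-2", "agent-2"])
    restarted = Job(2, Resources(1.0, 1024, 512))  # same size, new code
    print(can_schedule(offers, restarted))                    # False
    print(can_schedule(offers, restarted, existing=running))  # True
```

Without the existing job factored in, only one instance fits (on agent-1); adding back the two instances already running on agent-2 makes room for the second.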
This can get a bit prone to race conditions if you have multiple people starting and stopping jobs in the cluster. It may also be interesting to have an addition to the deploy task that says something like "if you can deploy this, do it; if not, don't do anything and exit with an error". I'm not sure what guarantees you can make between the check and the actual deploy, given everything else going on in the cluster, but that would definitely be an awesome improvement for that use case.

--
Andrew Jorgensen
@ajorgensen

On Tue, Jan 12, 2016, at 06:14 PM, John Sirois wrote:
> On Tue, Jan 12, 2016 at 3:56 PM, Brian Hatfield <[email protected]>
> wrote:
>
> > Hi,
> >
> > We currently run a (relatively) small Mesos/Aurora cluster, and don't
> > always have significant resource overhead available.
> >
> > Sometimes, we go to schedule a job and we're just short of what we
> > estimated-by-hand we'd need in the cluster for it. Most of the tasks
> > schedule, but a few stay "PENDING" because of the resource constraint.
> > This often confuses users, or in some cases causes the command to block
> > for a while until it eventually times out.
> >
> > We're currently working on automating somewhat-more-precise basic
> > estimation with information sourced from /offers to get a sense of "nope,
> > your task won't schedule" to provide fast feedback that doesn't manipulate
> > the state of the cluster.
> >
> > A friend recommended that I suggest to this mailing list something
> > integrated into Aurora to accomplish this instead, since our basic
> > estimation doesn't include co-scheduling constraints, quotas, etc.
> >
> > So: we believe that this feature doesn't exist in Aurora today, and wanted
> > to suggest it as a future feature for the project.
> >
>
> I think this would be a great feature, from a simple yes/no to more
> sophisticated likelihood estimates, even based on time of day (cron job
> scheduling taken into account):
> 1. A ticket [1] describing the minimum viable feature.
> 2. Work towards implementation [2].
>
> Would you be willing to do any of these? I'd be willing to review designs
> and reviews.
>
> [1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
> [2] http://aurora.apache.org/documentation/latest/contributing/
>
> >
> > Thanks :-)
> > Brian
> >
