Wow! Thanks for the positive feedback and fast responses!
@john/bill - Yes, I'd be happy to do at a minimum [1], and I am willing
to do [2], but I am currently completely unfamiliar with the codebase.
I'll read the contributing docs, pull down the code, and see if I can
work out a way forward, then report back if I think I can do it.

Thanks!
Brian

On Tue, Jan 12, 2016 at 6:22 PM, Andrew Jorgensen <[email protected]> wrote:

> One other case to take into account which complicates the logic a bit is
> we have some jobs that need to be stopped and then started again usually
> with either code changes or capacity increases. In this case we would
> need to have the resources already consumed for the job factored back in
> to determine whether there is enough room to run the job. I think for a
> first pass a simple yes/no on outstanding offers would be good but for
> our use case we would need to supply an existing job as an argument to
> tell the offers check to add those resources back when considering
> whether there is enough room or not.
>
> This can get a bit race conditiony if you have multiple people starting
> and stopping jobs in the cluster. It may also be interesting to have an
> addition to the deploy task that says something like "if you can deploy
> this do it if not then don't do anything and exit with an error" or
> something like that. I'm not sure what guarantees you can make between
> the check and the actual deploy based on other things that are going on
> in the cluster but that would definitely be an awesome improvement for
> that use case.
>
> --
> Andrew Jorgensen
> @ajorgensen
>
> On Tue, Jan 12, 2016, at 06:14 PM, John Sirois wrote:
> > On Tue, Jan 12, 2016 at 3:56 PM, Brian Hatfield <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We currently run a (relatively) small Mesos/Aurora cluster, and don't
> > > always have significant resource overhead available.
> > >
> > > Sometimes, we go to schedule a job and we're just short of what we
> > > estimated-by-hand we'd need in the cluster for it. Most of the tasks
> > > schedule - but a few stay "PENDING" because of the resource constraint.
> > > This often confuses users, or in some cases, causes the command to
> > > block for a while until it eventually times out.
> > >
> > > We're currently working on automating somewhat-more-precise basic
> > > estimation with information sourced from /offers to get a sense of
> > > "nope, your task won't schedule" to provide fast feedback that doesn't
> > > manipulate the state of the cluster.
> > >
> > > A friend recommended that I suggest to this mailing list something
> > > integrated into Aurora to accomplish this instead - since our basic
> > > estimation doesn't include co-scheduling constraints, quotas, etc.
> > >
> > > So: We believe that this feature doesn't exist in Aurora today, and
> > > wanted to suggest it as a future feature for the project.
> > >
> >
> > I think this would be a great feature from simple yes/no to more
> > sophisticated likelihood estimates even based on time of day (cron job
> > scheduling taken into account):
> > 1. A ticket [1] describing the minimum viable feature.
> > 2. Work towards implementation [2].
> >
> > Would you be willing to do any of these? I'd be willing to review designs
> > and reviews.
> >
> > [1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
> > [2] http://aurora.apache.org/documentation/latest/contributing/
> >
> >
> > > Thanks :-)
> > > Brian
> >
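
For reference, here is a rough sketch of the simple yes/no check discussed in this thread, including Andrew's case of adding an existing job's resources back in before checking. Everything in it is an assumption for illustration: `Resources`, `can_schedule`, and the `reclaim` argument are hypothetical names, offers are simplified to a per-host dict of free cpu/ram/disk, and nothing here reflects Aurora's actual code or the real shape of the /offers data. It also ignores constraints, quotas, and the race between the check and the deploy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; all task dimensions must be > 0."""
    cpus: float
    ram_mb: int
    disk_mb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus + other.cpus,
                         self.ram_mb + other.ram_mb,
                         self.disk_mb + other.disk_mb)

    def capacity_for(self, task: "Resources") -> int:
        """Number of identical copies of `task` that fit in these resources."""
        return int(min(self.cpus // task.cpus,
                       self.ram_mb // task.ram_mb,
                       self.disk_mb // task.disk_mb))


def can_schedule(offers: Dict[str, Resources],
                 task: Resources,
                 instances: int,
                 reclaim: Optional[Dict[str, Resources]] = None) -> bool:
    """Yes/no: do the outstanding offers have room for `instances` tasks?

    `offers` maps host -> free resources on that host.
    `reclaim` maps host -> resources held by an existing job that will be
    stopped first, so they are added back before checking.
    """
    free = dict(offers)
    for host, used in (reclaim or {}).items():
        free[host] = free.get(host, Resources(0.0, 0, 0)) + used
    # Tasks are identical, so per-host slot counts can simply be summed.
    slots = sum(r.capacity_for(task) for r in free.values())
    return slots >= instances


offers = {"a": Resources(2.0, 4096, 10000), "b": Resources(1.0, 1024, 10000)}
task = Resources(1.0, 2048, 1000)
fits_fresh = can_schedule(offers, task, 3)
fits_after_restart = can_schedule(offers, task, 3,
                                  reclaim={"b": Resources(1.0, 1024, 0)})
```

In this example host "b" is short on RAM, so three instances only fit once the resources of the job being restarted are counted back in.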
