Re: Pre-checking if job can be scheduled?

Brian Hatfield Tue, 12 Jan 2016 15:54:07 -0800

Wow!

Thanks for the positive feedback and fast responses!


@john/bill - Yes, I'd be happy to do at a minimum [1], and I am willing to
do [2] but am currently completely unfamiliar with the codebase. I'll read
the contributing docs, pull down the code, and see if I can work out a
way forward, then report back on whether I think I can do it.
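
A rough sketch of the simple yes/no pass discussed in this thread, including
Andrew's case of adding an existing job's resources back in, might look like
the Python below. Everything in it is an assumption for illustration: the
`Resources` type, `instances_that_fit`, `can_schedule`, and the per-host
dictionaries are hypothetical names, not part of Aurora, and the free
resources per host are assumed to have been extracted from the outstanding
offers already. It ignores constraints and quotas, and it cannot guarantee
anything about what happens between the check and the actual deploy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector for one host or one task instance."""
    cpus: float
    ram_mb: int
    disk_mb: int

    def plus(self, other: "Resources") -> "Resources":
        return Resources(self.cpus + other.cpus,
                         self.ram_mb + other.ram_mb,
                         self.disk_mb + other.disk_mb)


NOTHING = Resources(0.0, 0, 0)


def instances_that_fit(free: Resources, task: Resources) -> int:
    """How many identical task instances fit into one host's free resources."""
    pairs = [(free.cpus, task.cpus),
             (free.ram_mb, task.ram_mb),
             (free.disk_mb, task.disk_mb)]
    # Dimensions the task does not request are skipped; at least one
    # requirement must be positive. Floor division on float CPUs can
    # undercount for values such as 0.1, which is acceptable for a sketch.
    return int(min(have // need for have, need in pairs if need > 0))


def can_schedule(offers: Dict[str, Resources],
                 task: Resources,
                 instances: int,
                 reclaimed: Optional[Dict[str, Resources]] = None) -> bool:
    """Yes/no: do the offers (plus reclaimed resources) hold `instances` tasks?

    `offers` maps host -> free resources currently on offer (assumed input).
    `reclaimed` maps host -> resources an existing job would give back if it
    were stopped first, i.e. the "add those resources back" case.
    """
    reclaimed = reclaimed or {}
    total = 0
    for host in set(offers) | set(reclaimed):
        free = offers.get(host, NOTHING).plus(reclaimed.get(host, NOTHING))
        total += instances_that_fit(free, task)
    return total >= instances
```

Because all instances of a job are identical, summing the per-host counts
gives the exact number that can be placed under these assumptions; a real
check would also have to consider constraints and quotas.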

Thanks!
Brian

On Tue, Jan 12, 2016 at 6:22 PM, Andrew Jorgensen <
[email protected]> wrote:

> One other case to take into account, which complicates the logic a bit:
> we have some jobs that need to be stopped and then started again, usually
> with either code changes or capacity increases. In this case we would
> need the resources already consumed by the job factored back in
> to determine whether there is enough room to run the job. I think for a
> first pass a simple yes/no on outstanding offers would be good, but for
> our use case we would need to supply an existing job as an argument to
> tell the offers check to add those resources back when considering
> whether there is enough room or not.
>
> This can get a bit racy if you have multiple people starting
> and stopping jobs in the cluster. It may also be interesting to have an
> addition to the deploy task that says something like "if you can deploy
> this, do it; if not, don't do anything and exit with an error" or
> something like that. I'm not sure what guarantees you can make between
> the check and the actual deploy, given other things that are going on
> in the cluster, but that would definitely be an awesome improvement for
> that use case.
>
> --
> Andrew Jorgensen
> @ajorgensen
>
> On Tue, Jan 12, 2016, at 06:14 PM, John Sirois wrote:
> > On Tue, Jan 12, 2016 at 3:56 PM, Brian Hatfield <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We currently run a (relatively) small Mesos/Aurora cluster, and don't
> > > always have significant resource headroom available.
> > >
> > > Sometimes, we go to schedule a job and we're just short of what we
> > > estimated-by-hand we'd need in the cluster for it. Most of the tasks
> > > schedule - but a few stay "PENDING" because of the resource constraint.
> > > This often confuses users, or in some cases, causes the command to
> > > block for a while until it eventually times out.
> > >
> > > We're currently working on automating somewhat-more-precise basic
> > > estimation with information sourced from /offers to get a sense of
> > > "nope, your task won't schedule" to provide fast feedback that doesn't
> > > manipulate the state of the cluster.
> > >
> > > A friend recommended that I suggest to this mailing list something
> > > integrated into Aurora to accomplish this instead - since our basic
> > > estimation doesn't include co-scheduling constraints, quotas, etc.
> > >
> > > So: We believe that this feature doesn't exist in Aurora today, and
> > > wanted to suggest it as a future feature for the project.
> > >
> >
> > I think this would be a great feature from simple yes/no to more
> > sophisticated likelihood estimates even based on time of day (cron job
> > scheduling taken into account):
> > 1. A ticket [1] describing the minimum viable feature.
> > 2. Work towards implementation [2].
> >
> > Would you be willing to do any of these? I'd be willing to review designs
> > and reviews.
> >
> > [1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
> > [2] http://aurora.apache.org/documentation/latest/contributing/
> >
> >
> > > Thanks :-)
> > > Brian
> > >
>
