> On March 30, 2017, 11:56 p.m., David McLaughlin wrote:
> > The motivation for this is a performance optimization (less scheduling
> > loop overhead + cache locality on the target host). So why should that
> > decision be encoded in the service tier? We'd want every single task
> > using this and wouldn't want users even knowing about it. And we still
> > want to have the preferred vs. preemptible distinction.
> > 
> > Currently a task restart is a powerful tool to undo a bad scheduling
> > round or to get off a host for whatever reason - e.g. to get away from
> > a noisy neighbor or a machine that's close to falling over. If I'm
> > reading this patch correctly, tasks lose this ability after this
> > change? Or at least the workflow is now: kill the task, wait for some
> > operator-defined timeout, and then schedule it again with the original
> > config.
> > 
> > What happens when we want to extend the use of dynamic reservations
> > and give users control over when they are collected? What tier would
> > we use then? How would reserved offers be collected? It seems like
> > this implementation is not future-proof at all.
> 
> Dmitriy Shirchenko wrote:
>     David, thanks for your comment. Beyond the performance optimization,
>     I would add the following improvements and features that this patch
>     offers:
>     
>     * Consistent MTTA for updates of any size, irrespective of cluster
>       capacity and demand, assuming the update does not increase the
>       resource vector (sizing down is OK).
>     * Shorter MTTR for tasks using Docker or the unified containerizer:
>       reserved tasks get consistent placement on the same host, so there
>       is less work for the Mesos or Docker fetcher because the host's
>       warm cache can be leveraged and previous image layers already
>       exist there.
>     * After a job is placed on a host, task failures cannot leave the
>       task stuck in the PENDING state, because we guarantee resource
>       availability.
>     * This implementation lays the foundation for supporting persistent
>       volumes in Aurora.
>     
>     The way the tier is added, you absolutely can make a reserved job
>     preemptible. All you would do is specify a new tier definition in
>     tiers.json and set both 'reserved' and 'preemptible' to `True`.
>     
>     About restarts, you bring up a good point. If a task does not have
>     `reserved` set to `True` inside `TierInfo`, then nothing changes and
>     restarts proceed by rescheduling the task onto different hosts.
>     However, if a task wants reserved resources, that implies to us that
>     it wants "stickiness", so the task would be scheduled on the same
>     host. I agree that contradicts the use case of trying to get away
>     from noisy neighbors, and yes, the story is not great for this case.
>     We can brainstorm possible solutions for it. If this is an
>     immediately required feature, we can add an `unreserve` operation on
>     any offer that comes back from a reserved task before rescheduling.
>     How does that sound?
>     
>     Would you elaborate on what you are referring to about control over
>     dynamically reserved resources? Do we currently give users any
>     control beyond host constraints? Currently reserved offers are not
>     collected, but with @serb's nice suggestion they are simply expired
>     if the offer is unused. To collect them, we can bring back the
>     `OfferReconciler` if the complexity warrants it.
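[Editor's note: as an illustration of the tier definition described above, a tiers.json entry combining `reserved` and `preemptible` might look like the sketch below. The `reserved` attribute is what this patch proposes; the surrounding structure follows Aurora's existing tiers.json, and the new tier's name is hypothetical.]

```json
{
  "default": "preemptible",
  "tiers": {
    "preferred": {"revocable": false, "preemptible": false},
    "preemptible": {"revocable": false, "preemptible": true},
    "reserved-preemptible": {"revocable": false, "preemptible": true, "reserved": true}
  }
}
```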
Well, the main point about preemptible is that now users have to opt into the reserved tier, and we need to set up tiers for each combination of parameters (side note: I'm not a huge fan of this static tiers concept; it seems broken to me). Currently there is no easy way to automatically migrate running tasks from one tier to another, nor to force users to update to a certain tier without making a custom client. In fact, because of the DSL, it's not even possible to upgrade all of the job configs to this new tier either.

Why is this a problem? From our use case at Twitter, we want this feature in order to run our clusters close to full capacity. Currently Aurora does not behave well when there is very little headroom. The fact that preemptible or revocable tasks can get launched on an offer vacated by a production task as part of an update is problematic. It can force the prod job to go through the preemption loop to get the same slot back, which adds hours to a production job's deploy time (and the churn created severely impacts those trying to run test jobs). Because this is a performance optimization, we do not want users to opt in to this feature; instead it should be applied to every service without users even knowing.

My point about control over dynamically reserved resources relates to your last point that this lays a foundation for persistent storage. Right now we are using the reserved tier and we automatically reclaim the resources with a timer. In the case of persistent storage, the decision to reclaim a reservation - even if the host is down for hours - should be totally in the hands of the service owner. That is at least what we would need to meet the use cases we have here. So what you'll end up needing is another tier, "reallyReserved" or something else, to be able to disable the automatic unreserving of resources (this is true with the current mechanism in this patch, and with the old reconciliation logic you had).
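[Editor's note: the timer-based reclamation discussed above (the patch adds a `ReservationTimeoutCalculator` for it) can be sketched as a minimal expiry check. The class and method names below are illustrative assumptions, not the patch's actual API.]

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch of timer-based reclamation of a dynamic
// reservation: a reserved offer left idle longer than an
// operator-defined timeout is unreserved. Names and policy here are
// assumptions for illustration, not Aurora's real implementation.
public class ReservationExpiry {
    private final Duration timeout;

    public ReservationExpiry(Duration timeout) {
        this.timeout = timeout;
    }

    // True once the offer reserved at reservedAt has sat unused for at
    // least the configured timeout as of `now`.
    public boolean shouldUnreserve(Instant reservedAt, Instant now) {
        return Duration.between(reservedAt, now).compareTo(timeout) >= 0;
    }

    public static void main(String[] args) {
        ReservationExpiry expiry = new ReservationExpiry(Duration.ofMinutes(10));
        Instant t0 = Instant.parse("2017-03-31T00:00:00Z");
        System.out.println(expiry.shouldUnreserve(t0, t0.plusSeconds(300)));  // false
        System.out.println(expiry.shouldUnreserve(t0, t0.plusSeconds(900)));  // true
    }
}
```

[David's objection is precisely that, for persistent storage, any such fixed timeout is the wrong policy: it would reclaim a reservation the service owner still needs, so reclamation would have to be owner-controlled instead.]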
Having these two reserved tiers would be confusing for users. I'm really not convinced we want to use dynamic reservations for this problem. As I said on the dev list, I think doing this with the same mechanisms as preemption, with some shared state between JobUpdateController and TaskAssigner, is a cleaner solution that leaves the reserved tier open for when we actually need it. At the absolute minimum, if we use DR we need to hide this fact from the user.

- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57487/#review170659
-----------------------------------------------------------


On March 31, 2017, 8:52 p.m., Dmitriy Shirchenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57487/
> -----------------------------------------------------------
> 
> (Updated March 31, 2017, 8:52 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Esteemed reviewers, here is the latest iteration on the implementation of
> dynamic reservations. Changes include merging the patches into a single
> one and an updated design document with a more high-level overview and
> user stories told from an operator's point of view. Unit tests will be
> added as soon as we agree on the approach; I have tested this patch on
> local vagrant and a multi-node dev cluster. The Jenkins build is expected
> to fail as the tests are incomplete.
> 
> For reference, here are the previous two patches whose feedback I
> addressed in this new single patch.
> Previous 2 patches:
> https://reviews.apache.org/r/56690/
> https://reviews.apache.org/r/56691/
> 
> RFC document:
> https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A
> Design Doc [UPDATED]:
> https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java f2296a9d7a88be7e43124370edecfe64415df00f
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeOfferManager.java 6f2ca35c5d83dde29c24865b4826d4932e96da80
>   src/main/java/org/apache/aurora/scheduler/HostOffer.java bc40d0798f40003cab5bf6efe607217e4d5de9f1
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java 676dfd9f9d7ee0633c05424f788fd0ab116976bb
>   src/main/java/org/apache/aurora/scheduler/TierInfo.java c45b949ae7946fc92d7e62f94696ddc4f0790cfa
>   src/main/java/org/apache/aurora/scheduler/TierManager.java c6ad2b1c48673ca2c14ddd308684d81ce536beca
>   src/main/java/org/apache/aurora/scheduler/base/InstanceKeys.java b12ac83168401c15fb1d30179ea8e4816f09cd3d
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49
>   src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java ad6b3efb69d71e8915044abafacec85f8c9efc59
>   src/main/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilter.java f6c759f03c4152ae93317692fc9db202fe251122
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java 36608a9f027c95723c31f9915852112beb367223
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java df51d4cf4893899613683603ab4aa9aefa88faa6
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 0d639f66db456858278b0485c91c40975c3b45ac
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 78255e6dfa31c4920afc0221ee60ec4f8c2a12c4
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java adf7f33e4a72d87c3624f84dfe4998e20dc75fdc
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 317a2d26d8bfa27988c60a7706b9fb3aa9b4e2a2
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 5ed578cc4c11b49f607db5f7e516d9e6022a926c
>   src/main/java/org/apache/aurora/scheduler/resources/AcceptedOffer.java 291d5c95916915afc48a7143759e523fccd52feb
>   src/main/java/org/apache/aurora/scheduler/resources/MesosResourceConverter.java 7040004ae48d3a9d0985cb9b231f914ebf6ff5a4
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceManager.java 9aa263a9cfae03a9a0c5bc7fe3a1405397d3009c
>   src/main/java/org/apache/aurora/scheduler/scheduling/ReservationTimeoutCalculator.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 03a0e8485d1a392f107fda5b4af05b7f8f6067c6
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 203f62bacc47470545d095e4d25f7e0f25990ed9
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java a177b301203143539b052524d14043ec8a85a46d
>   src/main/java/org/apache/aurora/scheduler/stats/AsyncStatsModule.java 40451e91aed45866c2030d901160cc4e084834df
>   src/main/resources/org/apache/aurora/scheduler/tiers.json 34ddb1dc769a73115c209c9b2ee158cd364392d8
>   src/test/java/org/apache/aurora/scheduler/TierManagerTest.java 82e40d509d84c37a19b6a9ef942283d908833840
>   src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java d6904f844df3880fb699948b3a7fd457c9e81ed0
>   src/test/java/org/apache/aurora/scheduler/http/OffersTest.java 30699596a1c95199df7504f62c5c18cab1be1c6c
>   src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 93cc34cf8393f969087cd0fd6f577228c00170e9
>   src/test/java/org/apache/aurora/scheduler/offers/HostOffers.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java d7addc0effb60c196cf339081ad81de541d05385
>   src/test/java/org/apache/aurora/scheduler/resources/AcceptedOfferTest.java dded9c34749cf599d197ed312ffb6bf63b6033f1
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceManagerTest.java b8b8edb1a21ba89b8b60f8f8451c8c776fc23ae8
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java e04f6113c43eca4555ee0719f8208d7c4ebb8d61
>   src/test/java/org/apache/aurora/scheduler/scheduling/ReservationTimeoutCalculatorTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java fa1a81785802b82542030e1aae786fe9570d9827
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 78f440f7546de9ed6842cb51db02b3bddc9a74ff
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java cf2d25ec2e407df7159e0021ddb44adf937e1777
> 
> 
> Diff: https://reviews.apache.org/r/57487/diff/5/
> 
> 
> Testing
> -------
> 
> Tested on local vagrant for the following scenarios:
> Reserving a task
> Making sure the returned offer comes back
> Making sure the offer is unreserved
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>
