Which version of the master are you using and do you have the logs? The fact that no offers were coming back sounds like a bug!
As for using O1 after a disconnection, all offers are invalid once a disconnection occurs. The scheduler driver does not automatically rescind offers upon disconnection, so I'd recommend clearing all cached offers when your scheduler gets disconnected, to avoid the unnecessary TASK_LOST updates.

On Thu, Nov 6, 2014 at 6:25 PM, Sharma Podila <spod...@netflix.com> wrote:

> We had an interesting problem with resource offers today and I would like
> to confirm this problem and request an enhancement. Here's the summary in
> the right sequence of events:
>
> 1. resource offer O1 for slave A arrives
> 2. mesos disconnects
> 3. mesos reregisters
> 4. mesos offer O2 for slave A arrives
>    (our framework keeps offers for sometime if unused, therefore, we now
>    have both O1 and O2, incorrectly)
> 5. launch task T1 using offers O1 and O2
> 6. framework thinks it has no offers with it now for slave A, will wait
>    for new offer after mesos consumes resources for task T1
> 7. mesos sends TASK_LOST for T1 saying it was using an invalid offer
>    (even though only O1 was invalid, O2 is gone missing silently)
> 8. no more offers come for slave A
> 9. basically we have an offer leak problem.
>
> To work around this, I am changing my framework so that when it receives
> mesos reregistration callback (step 3 above), it removes all existing
> offers. This should fix the problem.
>
> However, I am wondering if #7 can be improved in Mesos. When a task is (or
> set of tasks are) launched using multiple offers, if at least one of the
> offers is invalid, then Mesos should treat all offers as given up by the
> framework. This will send TASK_LOST to the framework, but, also make the
> valid offers available again through new offers.
>
> I am thinking this will be critical to do when Mesos starts rescinding
> offers. Because in that case the frameworks cannot rely on the strategy
> like the one I am using with reregistration.
>
> Sharma
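
To illustrate the suggestion to clear cached offers on disconnection, here is a rough sketch in Python. It is only an illustration under assumptions: the callback names (resourceOffers, offerRescinded, disconnected, reregistered) mirror the Mesos Scheduler interface, but the classes are standalone and do not import the Mesos bindings. OfferCache, CachingScheduler, and the (offer_id, slave_id) tuples standing in for real Offer messages are hypothetical.

```python
class OfferCache:
    """Hypothetical in-memory cache of unused offers, keyed by offer ID."""

    def __init__(self):
        self._offers = {}  # offer_id -> slave_id

    def add(self, offer_id, slave_id):
        self._offers[offer_id] = slave_id

    def remove(self, offer_id):
        # Ignore IDs we never cached (or already dropped).
        self._offers.pop(offer_id, None)

    def clear(self):
        self._offers.clear()

    def offers_for(self, slave_id):
        return sorted(o for o, s in self._offers.items() if s == slave_id)


class CachingScheduler:
    """Standalone stand-in for a scheduler; not the real Mesos base class."""

    def __init__(self):
        self.cache = OfferCache()

    def resourceOffers(self, driver, offers):
        # Assumption: each offer is a plain (offer_id, slave_id) tuple.
        for offer_id, slave_id in offers:
            self.cache.add(offer_id, slave_id)

    def offerRescinded(self, driver, offer_id):
        self.cache.remove(offer_id)

    def disconnected(self, driver):
        # Every offer held at this point is invalid, so drop them all.
        self.cache.clear()

    def reregistered(self, driver, master_info):
        # Defensive second clear, matching the workaround in the quoted mail.
        self.cache.clear()
```

Replaying steps 1 through 4 of the quoted sequence against this sketch leaves only O2 cached for slave A, so a later launch would not mix in the stale O1.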