Hello,

I can't get reconcileTasks() to work and am wondering whether I am using it
incorrectly or hitting a bug. Here's what's happening:

1. There's one Mesos (0.18) master, one slave, one framework, all running
on Ubuntu 12.04
2. Mesos master and slave come up fine (using Zookeeper, but that isn't
relevant here, I'd think)
3. My framework registers and gets offers
4. Two tasks are launched, both start running fine on the single available
slave
5. I restart my framework. During restart my framework knows that it had
previously launched two tasks that were last known to be in running state.
Therefore, upon getting the registered() callback, it calls
driver.reconcileTasks() for the two tasks. In actuality, the tasks are
still running fine. I see this in the Mesos master logs:

    I0417 12:26:27.207361 27301 master.cpp:2154] Performing task state reconciliation for framework MyFramework

But, no other logs about reconciliation.

6. My framework gets no callback about status of tasks that it requested
reconciliation on.
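
The bookkeeping behind steps 5 and 6 can be sketched roughly as follows. This
is only an illustration, not the framework's actual code: ReconciliationTracker
and its method names are hypothetical and are not part of the Mesos API. The
sketch assumes the framework calls requested() right after
driver.reconcileTasks() and status_update() from its statusUpdate() callback,
so that tasks which never receive a callback can be identified after a timeout.

```python
import time


class ReconciliationTracker:
    """Hypothetical framework-side helper; not part of the Mesos API.

    Remembers which task IDs were sent for reconciliation and reports
    the ones that have had no status update within a timeout.
    """

    def __init__(self, timeout_secs=30.0, clock=time.monotonic):
        self._timeout_secs = timeout_secs
        self._clock = clock      # injectable so the logic can be tested
        self._pending = {}       # task_id -> time the request was sent

    def requested(self, task_ids):
        """Record that reconciliation was requested for these task IDs."""
        now = self._clock()
        for task_id in task_ids:
            self._pending[task_id] = now

    def status_update(self, task_id):
        """Record that a status update arrived for this task ID."""
        self._pending.pop(task_id, None)

    def overdue(self):
        """Return the sorted task IDs still unanswered after the timeout."""
        now = self._clock()
        return sorted(
            task_id
            for task_id, sent_at in self._pending.items()
            if now - sent_at >= self._timeout_secs
        )
```

With something like this in place, an empty overdue() list would mean every
requested task got a callback, while a non-empty list after the timeout would
correspond to the "no callback" situation described in step 6.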

At this point, I am not sure whether the lack of a status update callback is
due to
  a) the fact that my framework asked for reconciliation on the running state,
which Mesos also knows to be true, and therefore sends no status update, or
  b) reconciliation not working. (Hopefully this; reason (a) would be
problematic.)

So, I then proceed to another test:

7. Kill my framework and the Mesos master
8. Then, kill the slave (as an aside, this seems to have killed the tasks
as well)
9. Restart the Mesos master
10. Restart my framework. Now, again, reconciliation is requested.
11. Still no callback.

At this time, the Mesos master doesn't know about the slave because the slave
hasn't come back since the master restarted.
What is the expected behavior for reconciliation under these circumstances?

12. Restarted slave
13. Killed and restarted my framework.
14. Still no callback for reconciliation.

Given these results, I can't see how reconciliation is working at all. I
did try this with Mesos 0.16 first and then upgraded to 0.18 to see if it
made a difference.

Thank you for any ideas on getting this resolved.

Sharma
