Looks like a good step forward.

What is the reason for the algorithm having to call reconcile tasks
multiple times after waiting some time in step 6? Shouldn't it be just once
per (re)registration?

Are there time-bound guarantees within which a task update will be sent out
after a reconcile request is sent? In the algorithm for task
reconciliation, what would be a good timeout after which we conclude that
we got no task update from the master? Upon such a timeout, I would be
tempted to conclude that the task has disappeared, in which case I would
call driver.killTask() (to be sure it's marked as gone), mark my task as
terminated, then submit a replacement task.
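
To make the timeout idea above concrete, here is a rough sketch of what I
have in mind. It is only an illustration under assumptions: the 30 second
timeout is a placeholder (a good value is exactly what I am asking about),
PendingReconcile and FakeDriver are hypothetical names, and FakeDriver only
mimics the shape of driver.reconcileTasks() and driver.killTask() with plain
string task IDs, whereas the real driver takes protobuf messages.

```python
import time
from dataclasses import dataclass, field


class FakeDriver:
    """Hypothetical stand-in for a scheduler driver; records calls only."""

    def __init__(self):
        self.reconciled = []
        self.killed = []

    def reconcileTasks(self, task_ids):
        self.reconciled.append(list(task_ids))

    def killTask(self, task_id):
        self.killed.append(task_id)


@dataclass
class PendingReconcile:
    """Tracks tasks still waiting for an update after a reconcile request."""

    timeout_secs: float  # assumed value, not a documented guarantee
    sent_at: dict = field(default_factory=dict)  # task_id -> request time

    def request(self, driver, task_ids, now=None):
        """Send a reconcile request and remember when it was sent."""
        now = time.monotonic() if now is None else now
        driver.reconcileTasks(list(task_ids))
        for task_id in task_ids:
            self.sent_at[task_id] = now

    def on_status_update(self, task_id):
        """An update arrived, so the task is no longer pending."""
        self.sent_at.pop(task_id, None)

    def expire(self, driver, now=None):
        """Kill tasks with no update within the timeout; return their IDs."""
        now = time.monotonic() if now is None else now
        lost = [t for t, sent in self.sent_at.items()
                if now - sent >= self.timeout_secs]
        for task_id in lost:
            driver.killTask(task_id)  # make sure it is marked as gone
            del self.sent_at[task_id]
        return lost


if __name__ == "__main__":
    driver = FakeDriver()
    pending = PendingReconcile(timeout_secs=30.0)
    pending.request(driver, ["task-1", "task-2"], now=0.0)
    pending.on_status_update("task-1")
    print(pending.expire(driver, now=30.0))
```

The caller would mark each task ID returned by expire() as terminated and
submit a replacement. Whether that is safe depends on the answer to the
timeout question above.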

Does the "rate limiting" feature (in the works?) affect task reconciliation
due to the volume of task updates sent back?

Thanks.


On Wed, Oct 15, 2014 at 2:05 PM, Benjamin Mahler <[email protected]>
wrote:

> Hi all,
>
> I've sent a review out for a document describing reconciliation, you can
> see the draft here:
> https://gist.github.com/bmahler/18409fc4f052df43f403
>
> Would love to gather high level feedback on it from framework developers.
> Feel free to reply here, or on the review:
> https://reviews.apache.org/r/26669/
>
> Thanks!
> Ben
>
