Hi all, As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a scalability concern with the reconciliation API. Performing an implicit reconciliation results in a status update being sent for each task in the cluster. For large clusters in the tens of thousands of slaves, this can be begin to approach hundreds of thousands of status updates.
With the current design of the driver, status updates must be persisted before the scheduler returns from the 'statusUpdate' callback, as the driver sends an acknowledgement implicitly once the call completes. This design forces the scheduler to synchronously process individual status updates. To remedy the issue, we're looking to introduce the ability to optionally specify whether the implicit acknowledgements are provided (during construction of the scheduler driver). If disabled, then the scheduler must send acknowledgments through a new 'acknowledgeStatusUpdate' call on the driver. Having explicit acknowledgements allows schedulers to process them asynchronously outside of the driver thread, and allows them to process updates in batch (e.g. 1:N storage operation:status updates). As part of the change, the underlying UUID of the status update needs to be exposed to the scheduler, which requires an update to the signature of 'statusUpdate'. What this means is that when schedulers include the new headers/JAR/egg, they need to adjust their code to accept the new uuid argument, regardless of whether implicit acknowledgements are desired (to my knowledge, there is no way to expose the uuid without requiring schedulers to update their code, because of Java's interface semantics). I'd like to get this change landed for 0.22.0 to make reconciliation usable for large clusters. The patches are up on MESOS-2347. I've outlined the compatibility details and upgrade steps in https://reviews.apache.org/r/30978/ Please share any high level feedback or concerns! Ben