The timeout behavior sounds like a dangerous scalability tripwire. Consider
revisiting that approach.
On Sun, Oct 28, 2018 at 10:42 PM Varun Gupta wrote:
Mesos Version: 1.6
The scheduler has 250k events in its queue: the Mesos master sends status
updates to the scheduler, and the scheduler stores them in the queue. The
scheduler processes them in FIFO order, and once an update is processed
(which includes persisting it to the DB) it acks the update.
These updates are processed asynchronously with a thread
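A minimal sketch of the pipeline described above, assuming a single worker thread draining a FIFO queue. `StatusUpdate`, `UpdateProcessor`, `persist` and `send_ack` are hypothetical names for illustration, not Mesos or scheduler-driver APIs. The point is the ordering: an update is acked only after it has been persisted, so every ack lags by however long the backlog ahead of it takes to drain.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StatusUpdate:
    # Hypothetical fields; a real update would carry the full TaskStatus.
    task_id: str
    uuid: str
    state: str


class UpdateProcessor:
    """FIFO queue drained by one background thread: persist, then ack."""

    def __init__(self,
                 persist: Callable[[StatusUpdate], None],
                 send_ack: Callable[[StatusUpdate], None]) -> None:
        self._queue: "queue.Queue[Optional[StatusUpdate]]" = queue.Queue()
        self._persist = persist
        self._send_ack = send_ack
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, update: StatusUpdate) -> None:
        # Called from the scheduler's update callback; returns immediately.
        self._queue.put(update)

    def _run(self) -> None:
        while True:
            update = self._queue.get()
            if update is None:  # shutdown sentinel
                break
            self._persist(update)   # e.g. the DB write (error handling omitted)
            self._send_ack(update)  # ack only after the durable write

    def close(self) -> None:
        # The sentinel goes behind any queued updates, so they drain first.
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    acked = []
    processor = UpdateProcessor(persist=lambda u: None, send_ack=acked.append)
    for i in range(3):
        processor.submit(StatusUpdate(f"task-{i}", f"uuid-{i}", "TASK_RUNNING"))
    processor.close()
    print([u.task_id for u in acked])  # ['task-0', 'task-1', 'task-2']
```

With 250k updates queued, an update submitted now is not acked until everything ahead of it has been written to the DB, which is the window in which the agent keeps retrying.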
Which version of Mesos are you running?
> In the framework, event updates grow up to 250k
What does this mean? The scheduler has 250k events in its queue?
> which leads to a cascading effect of higher latency at the Mesos master (ack
> requests with a 10s timeout)
Can you send us perf stacks of the master du
Hi Benjamin,
In our batch workload use case, task churn is pretty high. We have seen
20-30k tasks launch within a 10-second window and 100k+ tasks running.
In the framework, event updates grow up to 250k, which leads to a cascading
effect of higher latency at the Mesos master (ack requests with a 10
Hi Varun,
What problem are you trying to solve precisely? There seems to be an
implication that the duplicate acknowledgements are expensive. They should
be low cost, so that's rather surprising. Do you have any data related to
this?
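One way to keep retransmissions from inflating the queue, sketched under assumptions: key each update by task ID plus the update's UUID (Mesos status updates do carry a UUID, but `DedupingIntake`, its methods and its string return values are hypothetical). A duplicate of an update that is still queued is dropped, because the worker will ack the original; a duplicate of an update that is already persisted is simply acked again, which should be cheap, as noted above.

```python
import threading
from typing import Callable, Set, Tuple

Key = Tuple[str, str]  # (task_id, update uuid); hypothetical key shape


class DedupingIntake:
    """Filters agent retransmissions before they reach the processing queue."""

    def __init__(self,
                 enqueue: Callable[[Key], None],
                 send_ack: Callable[[Key], None]) -> None:
        self._enqueue = enqueue
        self._send_ack = send_ack
        self._lock = threading.Lock()
        self._pending: Set[Key] = set()  # queued, not yet persisted
        self._done: Set[Key] = set()     # persisted and acked at least once

    def on_update(self, key: Key) -> str:
        with self._lock:
            if key in self._done:
                action = "reack"
            elif key in self._pending:
                action = "dropped"
            else:
                self._pending.add(key)
                action = "enqueued"
        # Call out to the queue / network outside the lock.
        if action == "reack":
            self._send_ack(key)
        elif action == "enqueued":
            self._enqueue(key)
        return action

    def mark_persisted(self, key: Key) -> None:
        # Called by the worker after the DB write and the ack succeed.
        with self._lock:
            self._pending.discard(key)
            self._done.add(key)


if __name__ == "__main__":
    queued, acks = [], []
    intake = DedupingIntake(enqueue=queued.append, send_ack=acks.append)
    key = ("task-1", "uuid-a")
    print(intake.on_update(key))  # enqueued
    print(intake.on_update(key))  # dropped (original still queued)
    intake.mark_persisted(key)
    print(intake.on_update(key))  # reack
```

In this sketch `_done` grows without bound; a real framework would presumably prune entries once a task reaches a terminal state and its final update is acked.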
You can also tune the backoff rate on the agents, if the defaul
> Hi,
>
> The Mesos agent sends status updates with exponential backoff until an ack
> is received.
>
> If processing events in the framework and sending acks to the master is
> slow, back pressure builds up at the framework due to duplicate updates
> for the same status.
>
> Has someone explored the
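To see how a slow ack turns into duplicate traffic, here is a sketch of an exponential backoff retry schedule. The 10 s initial interval and 10 min cap are assumed values for illustration only; check the agent's actual defaults for your Mesos version before relying on them. `retry_times` is a hypothetical helper that lists when an update would be resent if its ack arrives `ack_delay` seconds after the first send.

```python
from typing import List


def retry_times(ack_delay: float,
                initial: float = 10.0,  # assumed first retry interval (s)
                cap: float = 600.0      # assumed maximum interval (s)
                ) -> List[float]:
    """Seconds after the first send at which the update is retransmitted.

    The interval doubles after each retry, up to `cap`. A retry scheduled
    at or after `ack_delay` never happens because the ack has arrived.
    """
    times: List[float] = []
    t, interval = 0.0, initial
    while True:
        t += interval
        if t >= ack_delay:
            break
        times.append(t)
        interval = min(interval * 2, cap)
    return times


if __name__ == "__main__":
    print(retry_times(5.0))     # []
    print(retry_times(75.0))    # [10.0, 30.0, 70.0]
    print(retry_times(1000.0))  # [10.0, 30.0, 70.0, 150.0, 310.0, 630.0]
```

Under these assumptions an ack that arrives 75 s late costs three retransmissions, and that count multiplies across every update stuck in the backlog.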