Hi Benjamin,

In our batch workload use case, task churn is pretty high. We have seen
20-30k tasks launch within a 10-second window, with 100k+ tasks running.

In the framework, pending event updates grow to as many as 250k. This has a
cascading effect: latency at the Mesos Master goes up (ack requests with a
10s timeout), and the framework is blocked from processing new updates
because too many are still waiting to be acknowledged.
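To illustrate how dedup could relieve that backlog, here is a minimal Python sketch. It assumes, as with Mesos status updates, that the agent re-sends an unacknowledged update with the same UUID, so (task_id, uuid) identifies one logical update. `StatusUpdate` and `StatusUpdateDeduper` are hypothetical names, not part of any Mesos client library.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class StatusUpdate:
    # Hypothetical, simplified view of a task status update.
    agent_id: str
    task_id: str
    state: str
    uuid: str


class StatusUpdateDeduper:
    """Drops retransmissions of an update that is already queued, so each
    distinct update is processed and acknowledged once."""

    def __init__(self) -> None:
        # Keys of updates accepted but not yet acknowledged.
        self._in_flight: Set[Tuple[str, str]] = set()

    def offer(self, update: StatusUpdate) -> bool:
        """Return True if the update should be enqueued for processing,
        False if it duplicates one that is still awaiting its ack."""
        key = (update.task_id, update.uuid)
        if key in self._in_flight:
            return False
        self._in_flight.add(key)
        return True

    def acknowledged(self, update: StatusUpdate) -> None:
        """Forget the update once its ack has been sent. A later
        retransmission (e.g. because the ack was lost) is accepted again,
        so it can be re-acknowledged."""
        self._in_flight.discard((update.task_id, update.uuid))


if __name__ == "__main__":
    d = StatusUpdateDeduper()
    u = StatusUpdate("agent-1", "task-1", "TASK_RUNNING", "uuid-a")
    print(d.offer(u))  # True: first copy is enqueued
    print(d.offer(u))  # False: retransmission is dropped
    d.acknowledged(u)
    print(d.offer(u))  # True: accepted again after the ack was sent
```

Keying on the UUID rather than on (task_id, state) avoids merging distinct updates that happen to share a state; only true retransmissions are collapsed.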

Reconciliation runs every 30 minutes, which also adds pressure on the event
stream when too many updates are unacknowledged.

I am thinking of experimenting with raising the default backoff period from
10s to 30s or 60s, and at the same time exploring whether dedup is an option.
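To estimate what a longer backoff might buy, here is a rough Python model that counts how often one unacknowledged update is re-sent within a time window. It assumes the retry interval starts at the configured initial value, doubles after each retry, and is capped at 10 minutes; it ignores any jitter and network delay. The function names are hypothetical.

```python
from typing import List


def retry_times(initial_s: float, window_s: float, cap_s: float = 600.0) -> List[float]:
    """Seconds after the first send at which an unacknowledged update would
    be re-sent within window_s, assuming the interval doubles after each
    retry and never exceeds cap_s (both assumptions are illustrative)."""
    times: List[float] = []
    t = 0.0
    interval = initial_s
    while t + interval <= window_s:
        t += interval
        times.append(t)
        interval = min(interval * 2, cap_s)
    return times


def duplicates(pending: int, initial_s: float, window_s: float, cap_s: float = 600.0) -> int:
    """Total retransmissions if `pending` updates stay unacknowledged for window_s."""
    return pending * len(retry_times(initial_s, window_s, cap_s))


if __name__ == "__main__":
    for initial in (10, 30, 60):
        print(initial, duplicates(100_000, initial, 120))
```

Under these assumptions, 100k updates left unacknowledged for two minutes produce about 300k retransmissions with a 10s initial interval, 200k with 30s, and 100k with 60s.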

Thanks,
Varun

On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler <bmah...@apache.org> wrote:

> Hi Varun,
>
> What problem are you trying to solve precisely? There seems to be an
> implication that the duplicate acknowledgements are expensive. They should
> be low cost, so that's rather surprising. Do you have any data related to
> this?
>
> You can also tune the backoff rate on the agents, if the defaults are too
> noisy in your setup.
>
> Ben
>
> On Sun, Oct 28, 2018 at 4:51 PM Varun Gupta <var...@uber.com> wrote:
>
> >
> > Hi,
> >
> > The Mesos agent sends status updates with exponential backoff until an ack
> > is received.
> >
> > If processing events at the framework and sending acks to the Master runs
> > slowly, back pressure builds up at the framework due to duplicate updates
> > for the same status.
> >
> > Has anyone explored the option of deduplicating identical status update
> > events at the framework, or is that even advisable? The end goal is to
> > dedup all events and send only one ack back to the Master.
> >
> > Thanks,
> > Varun
>
