Re: Dedup mesos agent status updates at framework
The timeout behavior sounds like a dangerous scalability tripwire. Consider revisiting that approach.

On Sun, Oct 28, 2018 at 10:42 PM Varun Gupta wrote:
> Mesos Version: 1.6
>
> "The scheduler has 250k events in its queue": the Mesos Master sends
> status updates to the scheduler, and the scheduler stores them in a
> queue. The scheduler processes them in FIFO order and, once an update is
> processed (which includes persisting it to the DB), acks it. Updates are
> processed asynchronously with a thread pool of size 1000. We are using
> explicit reconciliation.
>
> If an ack to the Mesos Master times out due to high CPU usage, the next
> ack will likely fail too. That slows down processing on the scheduler
> side, while the Mesos Master continues to send status updates
> (duplicates, since the old updates were not acked). Status updates
> therefore build up at the scheduler; we have seen the backlog grow to
> 250k.
>
> The timeout is on the explicit ack request from the scheduler to the
> Mesos Master.
>
> Mesos Master profiling: next time this issue occurs, I will take the
> dump.
>
> Deduplication targets the status updates pending in the scheduler's
> queue: the idea is to collapse duplicates so that the scheduler
> processes each distinct status update only once and acks it to the
> Mesos Master only once, reducing load on both. After the ack (success
> or failure) the scheduler removes the update from the queue; on
> failure, the Mesos Master will send it again.
Re: Dedup mesos agent status updates at framework
Mesos Version: 1.6

"The scheduler has 250k events in its queue": the Mesos Master sends status updates to the scheduler, and the scheduler stores them in a queue. The scheduler processes them in FIFO order and, once an update is processed (which includes persisting it to the DB), acks it. Updates are processed asynchronously with a thread pool of size 1000. We are using explicit reconciliation.

If an ack to the Mesos Master times out due to high CPU usage, the next ack will likely fail too. That slows down processing on the scheduler side, while the Mesos Master continues to send status updates (duplicates, since the old updates were not acked). Status updates therefore build up at the scheduler; we have seen the backlog grow to 250k.

The timeout is on the explicit ack request from the scheduler to the Mesos Master.

Mesos Master profiling: next time this issue occurs, I will take the dump.

Deduplication targets the status updates pending in the scheduler's queue: the idea is to collapse duplicates so that the scheduler processes each distinct status update only once and acks it to the Mesos Master only once, reducing load on both. After the ack (success or failure) the scheduler removes the update from the queue; on failure, the Mesos Master will send it again.

On Sun, Oct 28, 2018 at 10:15 PM Benjamin Mahler wrote:
> Which version of mesos are you running?
>
> > In framework, event updates grow up to 250k
>
> What does this mean? The scheduler has 250k events in its queue?
>
> > which leads to cascading effect on higher latency at Mesos Master (ack
> > requests with 10s timeout)
>
> Can you send us perf stacks of the master during such a time window so
> that we can see if there are any bottlenecks?
> http://mesos.apache.org/documentation/latest/performance-profiling/
>
> Where is this timeout coming from and how is it used?
>
> > simultaneously explore if dedup is an option
>
> I don't know what you're referring to in terms of de-duplication. Can
> you explain how the scheduler's status update processing works? Does it
> use explicit acknowledgements and process batches asynchronously? Aurora
> example: https://reviews.apache.org/r/33689/
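The dedup proposed above (collapse duplicate status updates pending in the scheduler's queue, so each distinct update is processed and acked once) could be sketched roughly as follows. The class and the (task_id, update_uuid) key are illustrative, not Mesos API names; the key assumes the agent retransmits an update with the same uuid until it is acked:

```python
from collections import deque

class DedupQueue:
    """FIFO of status updates that collapses duplicate retransmissions,
    keyed by (task_id, update_uuid)."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()  # keys currently enqueued or being processed

    def offer(self, task_id, update_uuid, update):
        key = (task_id, update_uuid)
        if key in self._pending:
            return False  # duplicate retransmission; drop it
        self._pending.add(key)
        self._queue.append((key, update))
        return True

    def take(self):
        # Pop the next update for processing (persist to DB, then ack).
        return self._queue.popleft()

    def done(self, key):
        # After the ack attempt (success or failure) drop the key, so a
        # later retransmission can be enqueued again if the ack failed.
        self._pending.discard(key)
```

On ack failure the key is released, so the master's retransmission re-enters the queue and gets another processing/ack attempt, matching the retry behavior described above.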
Re: Dedup mesos agent status updates at framework
Which version of mesos are you running?

> In framework, event updates grow up to 250k

What does this mean? The scheduler has 250k events in its queue?

> which leads to cascading effect on higher latency at Mesos Master (ack
> requests with 10s timeout)

Can you send us perf stacks of the master during such a time window so that we can see if there are any bottlenecks?
http://mesos.apache.org/documentation/latest/performance-profiling/

Where is this timeout coming from and how is it used?

> simultaneously explore if dedup is an option

I don't know what you're referring to in terms of de-duplication. Can you explain how the scheduler's status update processing works? Does it use explicit acknowledgements and process batches asynchronously? Aurora example: https://reviews.apache.org/r/33689/

On Sun, Oct 28, 2018 at 8:58 PM Varun Gupta wrote:
> Hi Benjamin,
>
> In our batch workload use case, task churn is pretty high. We have seen
> 20-30k task launches within a 10-second window and 100k+ tasks running.
>
> In the framework, event updates grow to 250k, which has a cascading
> effect: higher latency at the Mesos Master (ack requests with a 10s
> timeout) and the framework blocked from processing new updates because
> too many are left to be acknowledged.
>
> Reconciliation runs every 30 mins, which also adds pressure on the event
> stream if too many updates are unacknowledged.
>
> I am thinking of raising the default backoff period from 10s to 30s or
> 60s, and simultaneously exploring whether dedup is an option.
>
> Thanks,
> Varun
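The asynchronous explicit-acknowledgement processing asked about here can be sketched as below; `persist` and `acknowledge` are stand-ins for the scheduler's DB write and its ack call to the master, not real Mesos API names, and the pool is shrunk from the size-1000 pool mentioned in this thread:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_update(update, persist, acknowledge):
    # Persist first, then explicitly ack: the ack tells the master the
    # update is safely stored, so it (and the agent) can stop retrying.
    persist(update)
    acknowledge(update)

def run_pipeline(updates, persist, acknowledge, workers=4):
    # Hand each update to a worker pool so that slow DB writes do not
    # block consumption of the event stream.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_update, u, persist, acknowledge)
                   for u in updates]
        for f in futures:
            f.result()  # surface any errors from workers
```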
Re: Dedup mesos agent status updates at framework
Hi Benjamin,

In our batch workload use case, task churn is pretty high. We have seen 20-30k task launches within a 10-second window and 100k+ tasks running.

In the framework, event updates grow to 250k, which has a cascading effect: higher latency at the Mesos Master (ack requests with a 10s timeout) and the framework blocked from processing new updates because too many are left to be acknowledged.

Reconciliation runs every 30 mins, which also adds pressure on the event stream if too many updates are unacknowledged.

I am thinking of raising the default backoff period from 10s to 30s or 60s, and simultaneously exploring whether dedup is an option.

Thanks,
Varun

On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler wrote:
> Hi Varun,
>
> What problem are you trying to solve precisely? There seems to be an
> implication that the duplicate acknowledgements are expensive. They
> should be low cost, so that's rather surprising. Do you have any data
> related to this?
>
> You can also tune the backoff rate on the agents, if the defaults are
> too noisy in your setup.
>
> Ben
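The effect of raising the initial backoff can be seen with a quick sketch of a doubling, capped retry schedule; the function name, parameters, and the 600-second cap are assumptions for illustration, not the agent's actual flag names or values:

```python
def backoff_schedule(initial, cap, attempts):
    # Delays between retransmissions of an unacked status update,
    # doubling each time and capped.
    delays, delay = [], initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays
```

Under these assumptions, starting at 10s puts the first five retries at 10, 20, 40, 80, 160 seconds, while starting at 60s spaces them to 60, 120, 240, 480, 600 seconds, so a struggling scheduler sees far fewer duplicates per time window.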
Re: Dedup mesos agent status updates at framework
Hi Varun,

What problem are you trying to solve precisely? There seems to be an implication that the duplicate acknowledgements are expensive. They should be low cost, so that's rather surprising. Do you have any data related to this?

You can also tune the backoff rate on the agents, if the defaults are too noisy in your setup.

Ben

On Sun, Oct 28, 2018 at 4:51 PM Varun Gupta wrote:
> Hi,
>
> The Mesos agent will send status updates with exponential backoff until
> an ack is received.
>
> If processing of events at the framework and sending acks to the Master
> is running slow, back pressure builds at the framework due to duplicate
> updates for the same status.
>
> Has anyone explored the option of deduping the same status update event
> at the framework, and is it even advisable to do so? The end goal is to
> dedup all duplicate events and send only one ack back to the Master.
>
> Thanks,
> Varun
Re: Dedup mesos agent status updates at framework
Hi,

The Mesos agent will send status updates with exponential backoff until an ack is received.

If processing of events at the framework and sending acks to the Master is running slow, back pressure builds at the framework due to duplicate updates for the same status.

Has anyone explored the option of deduping the same status update event at the framework, and is it even advisable to do so? The end goal is to dedup all duplicate events and send only one ack back to the Master.

Thanks,
Varun
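The agent behavior described above (retransmit a status update with exponential backoff until it is acked) can be sketched as follows; `send` and `acked` are stand-ins, the default delays are illustrative, and the real agent uses asynchronous timers rather than sleeping:

```python
import time

def retransmit_until_acked(send, acked, initial=10.0, cap=600.0):
    # Resend the status update with doubling, capped delays until the
    # framework's ack arrives. Returns the number of sends.
    delay, sends = initial, 0
    while not acked():
        send()
        sends += 1
        if acked():
            break
        time.sleep(delay)  # an async timer in the real agent
        delay = min(delay * 2, cap)
    return sends
```

This retransmit loop is the source of the duplicate updates discussed in this thread: every retransmission of a still-unacked update lands in the framework's queue again.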