Slow communications between components

Renan DelValle Sat, 07 Nov 2020 19:17:15 -0800

Hi all,

We've been noticing connections slowing down between our elected master and other components in the cluster, like the agents, frameworks, executors, etc.

From a high-level view, it looks like the master is too busy doing other tasks to reply to messages, and we've seen ACKs from our executor get delayed to the point where the retry mechanism has already sent a new request.

My initial suspicion is that we have some metric collectors that are hitting expensive endpoints (/metrics/snapshot, /master/state) too frequently and causing the master process to get bogged down.
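One rough way to test this hunch is to time requests to those endpoints while the slowdown is happening. Latency that climbs with cluster load would be consistent with the master being too busy, though it would not prove the collectors are the cause. A minimal sketch, assuming the master serves HTTP on the default port 5050; the host name and the helper names below are hypothetical:

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL and return the body (plain urllib, no retries)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def probe_latency(
    url: str,
    samples: int = 5,
    interval: float = 1.0,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[float]:
    """Time `samples` sequential requests, sleeping `interval` seconds
    between them so the probe itself adds little load."""
    latencies: List[float] = []
    for i in range(samples):
        start = time.monotonic()
        fetch(url)
        latencies.append(time.monotonic() - start)
        if i < samples - 1:
            time.sleep(interval)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce raw timings to min / median / max, in seconds."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

For example, `summarize(probe_latency("http://mesos-master.example.com:5050/metrics/snapshot"))` (hypothetical host) gives a quick comparison between a quiet period and a period when the collectors are running.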

I was wondering if anyone had any experience with this and could confirm whether I'm on the right track.

If this hunch is right, it would also be great if anyone could chime in with a rough estimate of the number of tasks and agents at which we should avoid hitting the Web UI directly, since it calls /metrics/snapshot at an interval.


Thanks!

-Renan
