My team recently had an issue that looks like it might be caused by 
backpressure from Flume, and I am having trouble finding information related to 
my investigation.

We have a single source writing to three memory channels (one for each 
DC we send to), with multiple sinks writing to the Kafka cluster in each 
DC. We had a connectivity issue where two of our DCs were no longer able to 
send to the Kafka cluster in the third DC, which caused that memory channel to 
fill up. It appears that this one full channel has either created backpressure 
on the process sending data into Flume or slowed down Flume's ability to 
read data in.
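
For reference, the topology looks roughly like the sketch below. This is an 
illustration rather than our actual config: the agent and component names, the 
source type, the hostnames, the topic, and the capacities are all placeholders 
I made up for this example, and only one sink per channel is shown.

```properties
# Hypothetical names throughout; values are placeholders, not our real settings.
agent.sources = src1
agent.channels = chDc1 chDc2 chDc3
agent.sinks = kafkaDc1 kafkaDc2 kafkaDc3

# One source fanning out to all three channels.
# The source type here (avro) is an assumption made for the sketch.
agent.sources.src1.type = avro
agent.sources.src1.bind = 0.0.0.0
agent.sources.src1.port = 4141
agent.sources.src1.channels = chDc1 chDc2 chDc3
# "replicating" is Flume's default selector: every event goes to every channel.
agent.sources.src1.selector.type = replicating

# One memory channel per DC (chDc2 and chDc3 are configured the same way).
agent.channels.chDc1.type = memory
agent.channels.chDc1.capacity = 10000
agent.channels.chDc1.transactionCapacity = 1000
agent.channels.chDc2.type = memory
agent.channels.chDc3.type = memory

# One Kafka sink per channel shown; additional sinks would repeat this pattern.
agent.sinks.kafkaDc1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaDc1.channel = chDc1
agent.sinks.kafkaDc1.kafka.bootstrap.servers = kafka-dc1.example.com:9092
agent.sinks.kafkaDc1.kafka.topic = example-topic

agent.sinks.kafkaDc2.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaDc2.channel = chDc2
agent.sinks.kafkaDc2.kafka.bootstrap.servers = kafka-dc2.example.com:9092
agent.sinks.kafkaDc2.kafka.topic = example-topic

agent.sinks.kafkaDc3.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaDc3.channel = chDc3
agent.sinks.kafkaDc3.kafka.bootstrap.servers = kafka-dc3.example.com:9092
agent.sinks.kafkaDc3.kafka.topic = example-topic
```

In this sketch, chDc3 stands in for the channel that filled up once its Kafka 
cluster became unreachable.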

When the problem first started, it looked like it might be related to 
boxes with less memory that were using a significant amount of swap, 
but we saw similar issues on other boxes that never used any 
swap. Looking back at some of our Grafana data from the incident, 
EventsReceived appears to have been in the hundreds rather than in the 
20k-40k range.

My team is using Flume 1.8.0.


