Hello,

If the Flume agents are receiving the shutdown request but are not actually shutting down, I would suggest discussing this on the Flume mailing lists at https://flume.apache.org/mailinglists.html
Regards,
Srimanth

________________________________
From: cs user <[email protected]>
Sent: Wednesday, May 18, 2016 12:54 AM
To: [email protected]
Subject: Re: Flume - always unable to stop 2 flume agents

Hi Srimanth,

Thanks for responding. I've checked the logs, and it seems that the shutdown event is received and the source is stopped for some of the channels (we have 3 channels), but the agent just continues to run. For example, I can see entries like:

18 May 2016 08:13:21,142 INFO [agent-shutdown-hook] (com.aweber.flume.source.rabbitmq.RabbitMQSource.stop:117) - Stopping channel1-source
18 May 2016 08:13:21,142 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149) - Component type: SOURCE, name: channel1-source stopped

But it looks like the agent continues to process events. I can see entries like the following repeated over and over; note that this is around 30 minutes after it tried to stop:

18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.run:143) - Attributes for component SOURCE.channel1-source
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - EventReceivedCount = 36417
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - AppendBatchAcceptedCount = 0
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - EventAcceptedCount = 36417
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - AppendReceivedCount = 0
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - StartTime = 1463486595420
18 May 2016 08:47:06,778 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - AppendAcceptedCount = 0
18 May 2016 08:47:06,779 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - OpenConnectionCount = 2
18 May 2016 08:47:06,779 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - AppendBatchReceivedCount = 0
18 May 2016 08:47:06,779 INFO [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163) - StopTime = 1463555601142

Is this normal behavior?

We are using this plugin: https://github.com/aweber/rabbitmq-flume-plugin

I have thought about switching to this plugin to see if the problem goes away: https://github.com/jcustenborder/flume-ng-rabbitmq

Thanks!

On Tue, May 17, 2016 at 5:29 PM, Srimanth Gunturi <[email protected]> wrote:

Hello,

Could you please describe the setup a little bit more? Are the 12 Flume agents on 12 different hosts or on a single host? Also, have you looked at the Flume logs for those 2 agents to determine what is going on during the 45 minutes?
Regards,
Srimanth

________________________________
From: cs user <[email protected]>
Sent: Tuesday, May 17, 2016 4:44 AM
To: [email protected]
Subject: Flume - always unable to stop 2 flume agents

Hello,

We have 12 Flume agents. Whenever we change the config and need to restart the affected nodes, we always end up with 2 agents that refuse to stop; it takes multiple attempts (sometimes as long as 45 minutes) to eventually stop them, and you have to keep trying to restart them.

Has anyone else seen this? Is there a workaround?

Thanks!
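
One way to see what is keeping a stuck agent alive during those 45 minutes is a thread dump: run jstack <pid> (or kill -3 <pid>) against the agent and look for non-daemon threads that survive the agent-shutdown-hook. The same check can be made from inside the JVM with an extra shutdown hook. The following is a minimal standalone Java sketch, not part of Flume; the class and thread names are invented for illustration.

// Hypothetical diagnostic, not part of Flume: registers an extra shutdown
// hook that lists every thread still alive while the JVM is going down.
// Any non-daemon thread reported here can keep the process from exiting.
public class LingeringThreadReporter {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            Thread.getAllStackTraces().keySet().forEach(t ->
                System.err.printf("still alive: %-40s daemon=%-5b state=%s%n",
                        t.getName(), t.isDaemon(), t.getState()));
        }, "lingering-thread-reporter"));

        // Simulate a stuck component: a non-daemon thread that ignores shutdown.
        Thread stuck = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1000); // pretend to process events forever
                } catch (InterruptedException e) {
                    // a well-behaved component would exit here; this one does not
                }
            }
        }, "stubborn-consumer");
        stuck.start();
    }
}

Interrupting this program with Ctrl-C makes the hook print "stubborn-consumer daemon=false", which is the same pattern to look for in a jstack dump of a stuck agent.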

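The "stopped but still processing" symptom in the reply above is what you would expect if a source's stop() flips its lifecycle state without fully tearing down its consumer thread and AMQP connection. The sketch below shows a stop() that does both. It is a hypothetical illustration against the Flume and RabbitMQ Java client APIs; the class and field names are invented, not taken from either plugin's actual code.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.source.AbstractSource;

// Hypothetical sketch, not the aweber plugin's actual code: a source whose
// stop() closes the AMQP channel and connection and waits for its own
// consumer thread, so nothing keeps delivering events after shutdown.
public class TidyRabbitMQSource extends AbstractSource implements EventDrivenSource {
    private Connection connection;    // assumed field: AMQP connection
    private Channel amqpChannel;      // assumed field: AMQP channel
    private Thread consumerThread;    // assumed field: plugin-managed consumer loop

    @Override
    public synchronized void stop() {
        try {
            if (amqpChannel != null) {
                amqpChannel.close();          // stop message deliveries first
            }
            if (connection != null) {
                connection.close();           // then drop the TCP connection
            }
            if (consumerThread != null) {
                consumerThread.interrupt();   // wake the loop if it is blocked
                consumerThread.join(10_000L); // give it up to 10 seconds to exit
            }
        } catch (Exception e) {
            // log and continue in real code; shutdown should not abort here
        }
        super.stop(); // marks the source's lifecycle state as stopped
    }
}

A consumer thread that is created as non-daemon and never joined can keep both the channel draining and the JVM alive, which would match agents that take many restart attempts to die.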