Is it even the right strategy to poll /metrics as a healthcheck? Are there better alternative sources?
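
A minimal sketch of a check that tolerates the short gaps described in the thread below (a re-initialization taking the endpoint down for a few seconds): it retries before reporting a failure, and it records whether each failed attempt was a refused connection or a timeout. This is only an illustration, not a recommended replacement. The function names, the retry count, and the delay are made up for the example, and it assumes the endpoint returns JSON, as Flume's HTTP metrics reporting does.

```python
import json
import socket
import time
import urllib.error
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Fetch the URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def classify(exc):
    """Map an exception to a short reason string for the monitoring log."""
    # URLError wraps the underlying socket error in its .reason attribute.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ConnectionRefusedError):
        return "refused"  # nothing listening: agent down or re-initializing
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "timeout"  # no answer in time: e.g. a long GC pause
    return "error"        # anything else (HTTP error, invalid JSON, ...)


def healthcheck(url, attempts=3, delay=2.0, fetch=fetch_metrics, sleep=time.sleep):
    """Return (healthy, reasons); unhealthy only if every attempt fails.

    attempts and delay are hypothetical defaults, not tuned values.
    """
    reasons = []
    for i in range(attempts):
        try:
            fetch(url)
            return True, reasons
        except (OSError, ValueError) as exc:
            reasons.append(classify(exc))
            if i < attempts - 1:
                sleep(delay)
    return False, reasons
```

Called as `healthcheck("http://localhost:5653/metrics")`, it returns a boolean plus the list of reasons for the attempts that failed, so a run that only succeeds on the third try still leaves a trace of the two earlier failures.
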

On Thursday, July 23, 2015, iain wright <[email protected]> wrote:
> GC is a good idea. Was also thinking maybe there is a config management
> tool in your environment changing the modified time of the flume.properties
> file, causing flume to re-initialize, which takes the metrics down for a
> few seconds depending on startup time. That seems like a stretch though. I
> would definitely throw JMX monitoring on it to monitor the JVM (or use the GC
> logs), and watch the flume logs during the time the problem exists.
>
> Also ssh in and try polling localhost:port/metrics at the time your
> monitoring system is unable to poll it.
>
> Any time I've seen this in our environment it's been OOM or re-initializing.
>
> On Jul 23, 2015 9:09 AM, "Ashish" <[email protected]> wrote:
>
>> I think the Flume Agent is up, since the issue is intermittent.
>> Whenever the issue is happening, check the Flume Agent which you are
>> polling, i.e. that it's up and running and processing messages. If you
>> already have GC logs enabled, check if GC could be causing the freeze.
>> Nothing else is striking as of now, assuming the network is
>> good.
>>
>> On Thu, Jul 23, 2015 at 12:09 AM, George Blazer <[email protected]> wrote:
>> > We poll metrics once a minute. It's pretty intermittent.
>> >
>> > On Wednesday, July 22, 2015, iain wright <[email protected]> wrote:
>> >>
>> >> How often do you poll the metrics?
>> >> Have you checked the flume logs?
>> >> Is flume starting up fine, then at some point not responding on metrics,
>> >> then you do something to bring it back up?
>> >> Or is it intermittently not responsive but fixes itself?
>> >>
>> >> On Jul 22, 2015 5:49 PM, "George Blazer" <[email protected]> wrote:
>> >>>
>> >>> I use the :5653/metrics endpoint as my Flume healthcheck, but very often
>> >>> the healthcheck refuses connection, i.e. the server doesn't run.
>> >>>
>> >>> Is there anything I could look at?
>> >>>
>> >>> I'm using Flume 1.5.
>> >>>
>> >>> Thanks.
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
