It depends on what you want to see as part of the health check. AFAIK, metrics are the only thing available at the Agent level that does not depend on the type of source used. I would see this more as an application-level health check; perhaps look at Sematext SPM for more ideas (http://sematext.com/spm/).
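If you do keep polling /metrics, a rough sketch of a more tolerant check could look like the one below. It retries a few times before declaring the agent down, so a short GC pause or a re-initialization does not trip the alarm, and it then looks only at channel counters, which should not depend on the source type. The function names, the retry counts and the 90% threshold are made up for illustration, and I am assuming the JSON reporter exposes components keyed like "CHANNEL.<name>" with a "ChannelFillPercentage" attribute; check that against what your agent actually returns.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Fetch and decode the JSON document served at the agent's /metrics URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_with_retry(url, attempts=3, delay=2.0, fetch=fetch_metrics, sleep=time.sleep):
    """Try the endpoint up to `attempts` times.

    Returns (metrics, errors); metrics is None if every attempt failed.
    `fetch` and `sleep` are injectable so the logic can be tested offline.
    """
    errors = []
    for attempt in range(attempts):
        try:
            return fetch(url), errors
        except (urllib.error.URLError, OSError, ValueError) as exc:
            # Connection refused, timeout, or a body that is not valid JSON.
            errors.append(f"attempt {attempt + 1}: {exc!r}")
            if attempt + 1 < attempts:
                sleep(delay)
    return None, errors


def full_channels(metrics, threshold=90.0):
    """Names of channel components at or above `threshold` percent full.

    Assumes keys like "CHANNEL.<name>" and a "ChannelFillPercentage"
    attribute (reported as a string); both are assumptions to verify.
    """
    full = []
    for name, attrs in metrics.items():
        if name.startswith("CHANNEL."):
            pct = float(attrs.get("ChannelFillPercentage", 0.0))
            if pct >= threshold:
                full.append(name)
    return full
```

Calling poll_with_retry("http://localhost:5653/metrics") from the monitoring host and alerting only when metrics comes back as None, or when full_channels(metrics) is non-empty, would separate "agent briefly unreachable" from "agent up but backing up". The collected errors list is worth logging either way, since it shows whether the failures were refusals or timeouts.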
On Thu, Jul 23, 2015 at 12:15 PM, George Blazer <[email protected]> wrote:
> Is it even the right strategy to poll /metrics as a healthcheck? Are there
> better alternative sources?
>
> On Thursday, July 23, 2015, iain wright <[email protected]> wrote:
>>
>> GC is a good idea. Was also thinking maybe there is a config management
>> tool in your environment changing the modified time of the flume.properties
>> file, causing flume to re-initialize, which takes the metrics down for a few
>> seconds depending on startup time. That seems like a stretch though. I would
>> definitely throw JMX monitoring on it to monitor JVM (or use the GC logs),
>> and watch flume logs during the time the problem exists.
>>
>> Also ssh and try polling localhost:port/metrics at the time your
>> monitoring system is unable to poll it.
>>
>> Anytime I've seen this in our environment it's been OOM or re-initializing
>>
>>
>> On Jul 23, 2015 9:09 AM, "Ashish" <[email protected]> wrote:
>>>
>>> I think the Flume Agent is up, since the issue is intermittent.
>>> Whenever the issue is happening check the Flume Agent which you are
>>> polling i.e. it's up and running and processing messages. If you
>>> already have GC logs enabled, check if GC could be causing the freeze.
>>> Nothing else strikes me as of now, assuming the network is
>>> good.
>>>
>>> On Thu, Jul 23, 2015 at 12:09 AM, George Blazer <[email protected]>
>>> wrote:
>>> > We poll metrics once a minute. It's pretty intermittent
>>> >
>>> > On Wednesday, July 22, 2015, iain wright <[email protected]> wrote:
>>> >>
>>> >> How often do you poll the metrics?
>>> >> Have you checked flume logs?
>>> >> Is flume starting up fine, then at some point not responding on
>>> >> metrics,
>>> >> then you do something to bring it back up?
>>> >> Or is it intermittently not responsive but fixes itself?
>>> >>
>>> >> On Jul 22, 2015 5:49 PM, "George Blazer" <[email protected]> wrote:
>>> >>>
>>> >>> I use :5653/metrics endpoint as my Flume healthcheck, but very often
>>> >>> the
>>> >>> healthcheck refuses connection, i.e. the server doesn't run.
>>> >>>
>>> >>> Is there anything I could look at?
>>> >>>
>>> >>> I'm using Flume 1.5.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
