Our approach is similar. We poll /vars.json on an interval and send a selection of those metrics to Riemann. We configure alerts there, and also pass these metrics through to InfluxDB for historical reporting (mostly via Grafana dashboards). This has worked well for us.
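For anyone starting out, here is a minimal sketch of the polling side in Python. It is not code from Aurora or from the setup described above. The scheduler URL, the substring filter used to pick out the SLA metrics, and the `forward` callback (where a Riemann, InfluxDB, or ELK client would go) are all assumptions for illustration. Check the exact metric names your scheduler exports in /vars.json before relying on the filter. Polling every 60 seconds lines up with the default 1 min `sla_stat_refresh_interval` mentioned in the question.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Iterable

# Hypothetical scheduler address; adjust host and port for your cluster.
VARS_URL = "http://localhost:8081/vars.json"

# Assumption: the SLA metrics of interest can be picked out by substring.
# Verify the real names in your /vars.json output before relying on this.
SLA_MARKERS = ("mtta", "mtts", "mttr")


def select_metrics(snapshot: Dict[str, object], markers: Iterable[str]) -> Dict[str, float]:
    """Keep numeric entries whose name contains any of the given markers."""
    markers = tuple(markers)
    selected = {}
    for name, value in snapshot.items():
        # bool is a subclass of int; skip it so flags are not reported as metrics.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if any(m in name for m in markers):
            selected[name] = float(value)
    return selected


def fetch_vars(url: str, timeout: float = 5.0) -> Dict[str, object]:
    """Fetch and decode one /vars.json snapshot."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_forever(
    url: str,
    interval_s: float,
    forward: Callable[[float, Dict[str, float]], None],
) -> None:
    """Poll on a fixed interval and hand each timestamped selection to `forward`.

    `forward` is a hypothetical hook: send to Riemann, write to InfluxDB,
    ship to Elasticsearch, or just print while testing.
    """
    while True:
        started = time.time()
        try:
            forward(started, select_metrics(fetch_vars(url), SLA_MARKERS))
        except (OSError, ValueError) as exc:
            # Network errors (URLError is an OSError) and malformed JSON
            # (JSONDecodeError is a ValueError): log and retry on the next tick.
            print(f"poll failed: {exc}")
        # Sleep for the remainder of the interval so polls stay roughly periodic.
        time.sleep(max(0.0, interval_s - (time.time() - started)))


# Example usage (runs until interrupted):
#   poll_forever(VARS_URL, 60.0, lambda ts, m: print(ts, m))
```

Keeping the selection step separate from the transport makes it easy to swap the destination, and the history lives in whatever store `forward` writes to rather than in Aurora, which only reports the most recent window.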
--
Derek Slager
CTO
Amperity

On Tue, Aug 22, 2017 at 3:23 PM, De, Bipra <[email protected]> wrote:

> Hello Friends,
>
> Greetings!!
>
> We are currently using *Aurora 0.17.0* and have a use-case wherein we
> want to continuously monitor the below SLA metrics for our clusters to
> detect any anomalies :
>
>    - Median Time To Assigned (MTTA
>    <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-assigned-(mtta)>)
>    - Median Time To Starting (MTTS
>    <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-starting-(mtts)>)
>    - Median Time To Running (MTTR
>    <http://aurora.apache.org/documentation/latest/features/sla-metrics/#median-time-to-running-(mttr)>)
>
> Currently, the *sla_stat_refresh_interval* for us is set to default *1
> min*.
>
> Now, while using the */vars* api endpoint to fetch the SLA metrics,
> aurora samples the data for metrics calculation of the above metrics only
> for the last one min at every 1 minute interval. It won’t give us the
> historical data for these metrics.
>
> Does aurora expose any api endpoint to provide the historical data for
> these metrics over some configurable period of time? Is there any metric in
> */graphview* endpoint for this?
>
> Also, it will be great if anyone can suggest some ideas for monitoring
> around these metrics. I am , at present, planning to keep polling the
> /vars endpoint regularly for data collection and use ELK stack for graphing
> and alerting.
>
> Thanks for your time in advance !!
>
> Regards,
>
> Bipra.
