Have you tried Prometheus + Grafana? Please take a look at https://prometheus.io/docs/visualization/grafana/ to see if it helps.

On Fri, Jul 8, 2016 at 5:51 AM, David Kesler <[email protected]> wrote:
> We use a combination of New Relic for application-level monitoring and a
> custom Python script that scrapes a bunch of stats from the Docker socket
> file and throws them into Elastic so we can use Kibana to make graphs.
>
> *From:* Gregory Durham [mailto:[email protected]]
> *Sent:* Thursday, July 07, 2016 4:58 PM
> *To:* [email protected]
> *Cc:* [email protected]; Michał Łowicki
> *Subject:* Re: Monitoring at container level
>
> I have been using Datadog to monitor my infrastructure. The integration
> into service discovery has been really helpful for these environments.
>
> On Thu, Jul 7, 2016 at 1:37 PM, Steven Schlansker <[email protected]> wrote:
>
> We use Graphite and ran into similar problems with huge metric namespaces.
> We use the Singularity framework, which provides both the task "request id"
> (name) and "instance number" (0..N) to the task.
>
> So we set our Graphite namespace to be "request-number", e.g. "myservice-3".
> This has the downside of discontinuous data when you deploy a new release,
> but we haven't had too many issues due to that in practice.
>
> On Jul 7, 2016, at 1:26 PM, Krish <[email protected]> wrote:
> >
> > I have had a good experience so far with bosun and scollector with
> > cadvisor. Check it out at bosun.org.
> >
> > On Friday 8 July 2016, Pradeep Chhetri <[email protected]> wrote:
> > Hi Michal,
> >
> > Do have a look at sysdig (http://www.sysdig.org). It is basically an
> > open-source tool which provides container insights. Maybe you will find
> > something helpful over there.
> >
> > To tackle the case of new metrics for new containers, maybe you should
> > tag metrics by service name instead of container id. (Graphite doesn't
> > have a concept of tags, but something like OpenTSDB and InfluxDB do. I
> > don't see a reason to replace Graphite for that. You can use your
> > service name (which the container is representing) instead of the
> > hostname in the metric name.)
> >
> > On Fri, Jul 8, 2016 at 1:18 AM, Michał Łowicki <[email protected]> wrote:
> > Hi,
> >
> > Before introducing Mesos we were using mainly Graphite / Grafana.
> > Ideally we would like to have metrics per container as an easy way to
> > detect whether a problem touches only a single container, a subset of
> > containers, or is global.
> >
> > Unfortunately, using Graphite for that is far from perfect. Having the
> > container identifier as part of the metric name has many negative
> > implications, like tons of new metrics with every release on Marathon
> > (new containers = new identifiers).
> >
> > I have investigated InfluxDB so far, but the project isn't mature enough,
> > as components like
> > https://github.com/influxdata/telegraf/blob/master/plugins/inputs/statsd/README.md#influx-statsd
> > still have major blockers:
> >
> > COMING SOON: there will be a way to specify multiple fields.
> >
> > What do you use to monitor your Mesos clusters and, for example, to
> > detect that some containers are having issues?
> >
> > --
> > BR,
> > Michał Łowicki
> >
> > --
> > Regards,
> > Pradeep Chhetri
> >
> > --
> > Thumb typed mail
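
For reference, the naming scheme suggested in the thread (service name plus instance number instead of a container id) could be sketched roughly like this. This is only an illustration, not anyone's actual setup: the `mesos` prefix, the `cpu.user` metric name and both helper functions are made up for the example. The only thing taken from Graphite itself is the plaintext line format, `path value timestamp`.

```python
import time


def graphite_path(service, instance, metric, prefix="mesos"):
    """Build a metric path keyed by service name and instance number.

    The prefix and layout are hypothetical; adapt them to your own tree.
    """
    # Dots separate Graphite path components, so replace them inside the name.
    safe_service = service.replace(".", "_")
    return f"{prefix}.{safe_service}-{instance}.{metric}"


def graphite_line(path, value, timestamp=None):
    """Format one line of Graphite's plaintext protocol: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


# Instance 3 of "myservice" keeps the same path across releases,
# unlike a path that embeds the container id.
path = graphite_path("myservice", 3, "cpu.user")
line = graphite_line(path, 0.42, timestamp=1467936000)
```

The resulting line would then be written to the Carbon plaintext listener (port 2003 by default). As noted above, the trade-off is that data for a given instance number is discontinuous across deploys.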

