Many thanks to you both for your responses; they've been helpful. @Andrzej - Sorry I wasn't clear about the "A latency of 1mil" example; I wasn't aware the image wouldn't come through. Following your bullet points helped me present a better unit of measurement on the axis.
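The unit conversions implied by those bullet points can be sketched roughly as follows. This is an illustration only: the helper names are hypothetical, the sample values are the ones quoted further down this thread, and it assumes the GC `time` values are the JDK's `GarbageCollectorMXBean.getCollectionTime()` figures, which the JDK documents as milliseconds.

```python
# Unit conventions as described in Andrzej's reply; helper names are hypothetical.
NANOS_PER_SECOND = 1_000_000_000
MILLIS_PER_SECOND = 1_000


def nanos_to_seconds(value_ns):
    """Convert a time-based counter (nanoseconds) to seconds."""
    return value_ns / NANOS_PER_SECOND


def millis_to_seconds(value_ms):
    """Convert a millisecond value (e.g. GC time) to seconds."""
    return value_ms / MILLIS_PER_SECOND


# Sample values copied from the original question in this thread.
sample = {
    "ADMIN./admin/collections.totalTime": 6715327903,  # nanoseconds
    "gc.ParNew.count": 16759,                          # number of collections
    "gc.ParNew.time": 884173,                          # assumed milliseconds
}

# Total elapsed time spent processing requests, in seconds.
total_time_s = nanos_to_seconds(sample["ADMIN./admin/collections.totalTime"])

# Cumulative ParNew collection time, in seconds.
gc_time_s = millis_to_seconds(sample["gc.ParNew.time"])

# Average ParNew pause per collection, in milliseconds.
avg_gc_pause_ms = sample["gc.ParNew.time"] / sample["gc.ParNew.count"]

print(f"totalTime: {total_time_s:.3f} s")
print(f"ParNew time: {gc_time_s:.3f} s")
print(f"ParNew average pause: {avg_gc_pause_ms:.2f} ms")
```

Since both `totalTime` and the GC `time` values are cumulative, a graph would normally plot their rate of change over a time window rather than the raw value.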
Regarding contributing, I would absolutely love to help; I'm just not sure what the correct direction is. I wasn't sure whether the web page source code / contributions live in the apache-lucene repository.

Thanks,

On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki <a...@getopt.org> wrote:

> Hi,
>
> Starting with Solr 7.0 all JMX metrics are actually internally driven by the metrics API - JMX (or Prometheus) is just a way of exposing them.
>
> I agree that we need more documentation on metrics - contributions are welcome :)
>
> Regarding your specific examples (btw. our mailing lists aggressively strip all attachments - your graphs didn’t make it):
>
> * time units in time-based counters are in nanoseconds. This is just a unit of value, not necessarily precision. In this specific example `ADMIN./admin/collections.totalTime` (and similarly named metrics for all other request handlers) represents the total elapsed time spent processing requests.
> * time-based histograms are expressed in milliseconds, where it is indicated by the “_ms” suffix.
> * 1-, 5- and 15-min rates represent an exponentially weighted moving average over that time window, expressed in events/second.
> * handlerStart is initialised with System.currentTimeMillis() when this instance of request handler is first created.
> * details on GC, memory buffer pools, and similar JVM metrics are documented in JDK documentation on Management Beans. For example: https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
> * "A latency of 1mil" - no idea what that is, I don’t think Solr API uses this abbreviation anywhere.
>
> Hope this helps.
>
> —
>
> Andrzej Białecki
>
> > On 7 Oct 2019, at 13:41, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> > Hi Richard,
> > We do not use API to collect metrics but JMX, but I believe that those are the same (did not verify it in code). You can see how we handled those metrics into reports/charts or even use our agent to send data to Prometheus: https://github.com/sematext/sematext-agent-integrations/tree/master/solr
> >
> > You can also see some links to Solr metric related blog posts in this repo. If you find out that managing your own monitoring stack is overwhelming, you can try our Solr integration.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >> On 7 Oct 2019, at 12:40, Richard Goodman <richa...@brandwatch.com> wrote:
> >>
> >> Hi there,
> >>
> >> I'm currently working on using the prometheus exporter to provide some detailed insights for our Solr Cloud clusters.
> >>
> >> Using the provided template killed our prometheus server, as well as the exporter due to the size of our clusters (each cluster is around 96 nodes, ~300 collections with 3way replication and 16 shards), so you can imagine the amount of data that comes through /admin/metrics and not filtering it down first.
> >>
> >> I've begun working on writing my own template to reduce the amount of data being requested and it's working fine, and I'm starting to build some nice graphs in Grafana.
> >>
> >> The only difficulty I'm having with this, is I'm struggling to find decent documentation on the metrics themselves. I was using the resources metrics reporting - metrics-api <https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api> and monitoring solr with prometheus and grafana <https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html> but there is a lack of information on most metrics.
> >>
> >> For example:
> >> "ADMIN./admin/collections.totalTime":6715327903,
> >> I understand this is a counter, however, I'm not sure what unit this would be represented when displaying it, for example:
> >>
> >> A latency of 1mil, not sure if this means milliseconds, million, etc.,
> >> Another example would be the GC metrics:
> >> "gc.ConcurrentMarkSweep.count":7,
> >> "gc.ConcurrentMarkSweep.time":1247,
> >> "gc.ParNew.count":16759,
> >> "gc.ParNew.time":884173,
> >> Which when displayed, doesn't give the clearest insight as to what the unit is:
> >>
> >> If anyone has any advice / guidance, that would be greatly appreciated. If there isn't documentation for the API, then this would also be something I'll look into help contributing with too.
> >>
> >> Thanks,
> >> --
> >> Richard Goodman

--
Richard Goodman | Data Infrastructure engineer
richa...@brandwatch.com
NEW YORK | BOSTON | BRIGHTON | LONDON | BERLIN | STUTTGART | PARIS | SINGAPORE | SYDNEY
<https://www.brandwatch.com/blog/digital-consumer-intelligence/>