[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2017-03-03 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893972#comment-15893972
 ] 

Andrzej Bialecki  commented on SOLR-9898:
-

SOLR-10182 removed the {{DIRECTORY}} metrics in 6.4.2, 6.5 and master. As such 
they are only present in 6.4.1 - I'm not sure how this should be reflected in 
the docs...

> Documentation for metrics collection and /admin/metrics
> ---
>
> Key: SOLR-9898
> URL: https://issues.apache.org/jira/browse/SOLR-9898
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 6.4, master (7.0)
>Reporter: Andrzej Bialecki 
>Assignee: Cassandra Targett
>
> Draft documentation follows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2017-03-02 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892436#comment-15892436
 ] 

Otis Gospodnetic commented on SOLR-9898:


While I love all the new metrics you are adding, I think metrics should be 
treated like code/features in terms of how backwards compatibility/deprecation 
is handled.  Otherwise, on upgrade, people's monitoring breaks and 
monitoring is kind of important... :)

Note: Looks like recent metrics changes broke/changed previously-existing 
MBeans...




[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2017-01-03 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795443#comment-15795443
 ] 

Andrzej Bialecki  commented on SOLR-9898:
-

h1. Directory I/O metrics
Index storage (represented in Lucene/Solr by the {{Directory}} abstraction) is 
monitored for I/O throughput, which is optionally tracked per index file (see 
the {{directoryDetails}} argument in the previous section). As with the 
index-level metrics, these metrics are registered in per-core registries.

The following metrics are collected:
* {{DIRECTORY.total.reads}} - meter for total read bytes from the directory.
* {{DIRECTORY.total.writes}} - meter for total written bytes to the directory.

If {{directoryDetails}} is set to true the following additional metrics are 
collected (note: this can potentially produce a lot of metrics so it should not 
be used in production):
* {{DIRECTORY.total.readSizes}} - histogram of read operation sizes (in byte 
units)
* {{DIRECTORY.total.writeSizes}} - histogram of write operation sizes (in byte 
units)
* {{DIRECTORY.<file type>.reads}} - meter for read bytes per "file type". File 
type is either {{segments}} for {{segments_N}} and {{pending_segments_N}}, or a 
file extension (eg. {{fdt}}, {{doc}}, {{tim}}, etc). The number and type of 
these files vary depending on the type of Lucene {{Codec}} used.
* {{DIRECTORY.<file type>.writes}} - meter for written bytes per "file type".
* {{DIRECTORY.<file type>.readSizes}} - histogram of read operation sizes per 
"file type" (in byte units).
* {{DIRECTORY.<file type>.writeSizes}} - histogram of write operation sizes 
per "file type" (in byte units).
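
To illustrate the "file type" rule described above, here is a minimal Python sketch (not Solr code). It assumes the per-file-type metric names take the form {{DIRECTORY.<type>.<metric>}}; the helper names {{file_type}} and {{metric_names}}, and the fallback for names without an extension, are hypothetical.

```python
import re
from typing import List

# segments_N and pending_segments_N are both bucketed as "segments".
_SEGMENTS_RE = re.compile(r"^(pending_)?segments_\w+$")


def file_type(file_name: str) -> str:
    """Return the "file type" bucket for an index file name."""
    if _SEGMENTS_RE.match(file_name):
        return "segments"
    # Otherwise use the file extension, e.g. "_0.fdt" -> "fdt".
    _, dot, ext = file_name.rpartition(".")
    # Assumption: a name with no extension is used as its own type.
    return ext if dot else file_name


def metric_names(file_name: str) -> List[str]:
    """Build the four per-file-type metric names for one index file."""
    kind = file_type(file_name)
    suffixes = ("reads", "writes", "readSizes", "writeSizes")
    return ["DIRECTORY.%s.%s" % (kind, suffix) for suffix in suffixes]
```

For example, {{file_type("pending_segments_3")}} gives {{segments}} and {{file_type("_0.fdt")}} gives {{fdt}}, so the number of distinct metrics grows with the number of file extensions the codec produces.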





[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2017-01-03 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795262#comment-15795262
 ] 

Andrzej Bialecki  commented on SOLR-9898:
-

h1. Index merge metrics
These metrics are collected in respective registries for each core, under the 
{{INDEX}} category. Basic metrics are always collected - collection of 
additional metrics can be turned on using boolean parameters in the 
{{/config/indexConfig/metrics}} section of {{solrconfig.xml}}:
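{code}
<config>
  ...
  <indexConfig>
    <metrics>
      <majorMergeDocs>524288</majorMergeDocs>
      <bool name="mergeDetails">true</bool>
      <bool name="directoryDetails">true</bool>
    </metrics>
    ...
  </indexConfig>
...
</config>
{code}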

The following metrics are collected:
* {{INDEX.merge.major}} - timer for merge operations that include at least 
{{majorMergeDocs}} documents (the default value for this parameter is 512k 
documents).
* {{INDEX.merge.minor}} - timer for merge operations that include fewer than 
{{majorMergeDocs}} documents.
* {{INDEX.merge.errors}} - counter for merge errors.
* {{INDEX.flush}} - meter for index flush operations.
Additionally, the following gauges are reported, which help to monitor the 
momentary state of index merge operations:
* {{INDEX.merge.major.running}} - number of running major merge operations 
(depending on the implementation of {{MergeScheduler}} that is used there can 
be several concurrently running merge operations).
* {{INDEX.merge.minor.running}} - as above, for minor merge operations.
* {{INDEX.merge.major.running.docs}} - total number of documents in the 
segments being currently merged in major merge operations.
* {{INDEX.merge.minor.running.docs}} - as above, for minor merge operations.
* {{INDEX.merge.major.running.segments}} - number of segments being currently 
merged in major merge operations.
* {{INDEX.merge.minor.running.segments}} - as above, for minor merge operations.
If the boolean flag {{mergeDetails}} is true then the following additional 
metrics are collected:
* {{INDEX.merge.major.docs}} - meter for the number of documents merged in 
major merge operations
* {{INDEX.merge.major.deletedDocs}} - meter for the number of deleted documents 
expunged in major merge operations
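
The major / minor split above comes down to a single threshold comparison. A minimal Python sketch, assuming the count being compared is the total number of documents in the merge; the function name {{merge_timer_name}} is hypothetical and this is not the actual Solr implementation:

```python
# 512k documents, the default value of majorMergeDocs mentioned above.
DEFAULT_MAJOR_MERGE_DOCS = 524288


def merge_timer_name(total_docs, major_merge_docs=DEFAULT_MAJOR_MERGE_DOCS):
    """Pick the timer that a merge of `total_docs` documents would update."""
    # "At least majorMergeDocs" is major; anything smaller is minor.
    if total_docs >= major_merge_docs:
        return "INDEX.merge.major"
    return "INDEX.merge.minor"
```

Lowering {{majorMergeDocs}} moves more merges into the {{INDEX.merge.major}} timer, which may be useful on small indexes where no merge ever reaches the default threshold.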





[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2016-12-29 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785883#comment-15785883
 ] 

Cassandra Targett commented on SOLR-9898:
-

I've started a page in the "drafts" area of the Solr Ref Guide: 
https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting.




[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2016-12-28 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782953#comment-15782953
 ] 

Andrzej Bialecki  commented on SOLR-9898:
-

h2. Reporters
Reporter configurations are specified in the {{solr.xml}} file, in 
{{<metrics><reporter>}} sections, for example:
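{code}
<solr>
 <metrics>
  <reporter name="..." group="..." class="...">
    <str name="...">graphite-server</str>
    ...
    <int name="period">60</int>
  </reporter>
  <reporter name="..." registry="..." class="...">
    <int name="period">300</int>
    <str name="prefix">example</str>
    <str name="logger">updatesLogger</str>
    <str name="filter">QUERYHANDLER./update</str>
  </reporter>
 </metrics>
...
</solr>
{code}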
Reporter plugins use the following attributes:
* *name* - (required) unique name of the reporter plugin
* *class* - (required) fully-qualified implementation class of the plugin, must 
extend {{SolrMetricReporter}}
* *group* - (optional) one or more of the predefined groups (see above)
* *registry* - (optional) one or more of valid fully-qualified registry names

If both {{group}} and {{registry}} attributes are specified only the {{group}} 
attribute is considered. If neither attribute is specified then the plugin will 
be used for all groups and registries. Multiple group or registry names can be 
specified, separated by comma and/or space.
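
The precedence between {{group}} and {{registry}} described above can be sketched in Python as follows. This only illustrates the stated rules; the function names and the returned tuple shape are hypothetical and do not reflect Solr's internal API.

```python
import re


def split_names(value):
    """Split an attribute value on commas and/or whitespace."""
    if not value:
        return []
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def reporter_targets(group=None, registry=None):
    """Decide what a reporter plugin is attached to."""
    groups = split_names(group)
    registries = split_names(registry)
    if groups:
        # When both attributes are given, only `group` is considered.
        return ("group", groups)
    if registries:
        return ("registry", registries)
    # Neither attribute: the plugin applies to all groups and registries.
    return ("all", [])
```

For example, {{reporter_targets(group="node, jvm", registry="solr.core.collection1")}} ignores the registry and returns the two groups.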

Additionally, several implementation-specific initialization arguments can be 
specified in nested elements. There are some arguments that are common to 
SLF4J, Ganglia and Graphite reporters:
* *period* - (optional int) period in seconds between reports. Default value is 
60.
* *prefix* - (optional str) prefix to be added to metric names, may be helpful 
in logical grouping of related Solr instances, eg. machine name or cluster 
name. Default is empty string, ie. just the registry name and metric name will 
be used to form a fully-qualified metric name.
* *filter* - (optional str) if not empty then only metric names that start with 
this value will be reported. Default is no filtering, ie. all metrics from 
selected registry will be reported.
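
A small Python sketch of how {{prefix}} and {{filter}} could combine. Joining the parts with dots is an assumption (the text only says which parts are used), and the function names and the metric names in the usage note are hypothetical:

```python
def qualified_name(registry, metric, prefix=""):
    """Form a fully-qualified metric name from its parts."""
    # With an empty prefix only the registry and metric name are used.
    parts = [prefix, registry, metric] if prefix else [registry, metric]
    # Assumption: parts are joined with dots.
    return ".".join(parts)


def select_metrics(metric_names, filter_prefix=""):
    """Keep only metric names that start with `filter_prefix`."""
    if not filter_prefix:
        # Default is no filtering: report everything in the registry.
        return list(metric_names)
    return [name for name in metric_names if name.startswith(filter_prefix)]
```

With {{filter}} set to {{QUERYHANDLER./update}}, a metric such as {{QUERYHANDLER./update.requests}} would be kept while {{QUERYHANDLER./select.requests}} would be dropped.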

Reporters are instantiated for every group and registry that they were 
configured for, at the time when the respective components are initialized (eg. 
on JVM startup or SolrCore load). When reporters are created their 
configuration is validated (and eg. necessary connections are established). 
Uncaught errors at this initialization stage cause the reporter to be discarded 
from the running configuration. Reporters are closed when the corresponding 
component is being closed (eg. on SolrCore close, or JVM shutdown) but metrics 
that they reported are still maintained in respective registries, as explained 
in the previous section.

The following sections provide information on implementation-specific 
arguments. All implementation classes provided with Solr can be found under 
{{org.apache.solr.metrics.reporters}}.

h3. JMX reporter ({{org.apache.solr.metrics.reporters.SolrJmxReporter}})
* *domain* - (optional str) JMX domain name. If not specified then registry 
name will be used.
* *serviceUrl* - (optional str) service URL for a JMX server. If not specified 
then the default platform MBean server will be used.
* *agentId* - (optional str) agent ID for a JMX server. Note: either 
{{serviceUrl}} or {{agentId}} can be specified but not both - if both are 
specified then the default MBean server will be used.

Object names created by this reporter are hierarchical, dot-separated but also 
properly structured to form corresponding hierarchies in eg. JConsole. This 
hierarchy consists of the following elements in the top-down order:
* registry name (eg. {{solr.core.collection1.shard1.replica1}}). Dot-separated 
registry names are also split into ObjectName hierarchy levels, so that metrics 
for this registry will be shown under 
{{/solr/core/collection1/shard1/replica1}} in JConsole, with each domain part 
being assigned to a {{dom1, dom2, ... domN}} property.
* reporter name (the value of reporter's {{name}} attribute)
* category, scope and name for request handlers
* or additional {{name1, name2, ... nameN}} elements for metrics from other 
components.
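
As an illustration of the registry-name split described above, the following Python sketch maps a dot-separated registry name to {{dom1 ... domN}} properties and to the path shown in JConsole. It covers only the registry-name level of the hierarchy; the function names are hypothetical and the exact ObjectName layout produced by the reporter is not reproduced here.

```python
def domain_properties(registry_name):
    """Assign each dot-separated part of a registry name to dom1..domN."""
    parts = registry_name.split(".")
    return [("dom%d" % index, part) for index, part in enumerate(parts, start=1)]


def jconsole_path(registry_name):
    """Return the tree path under which the registry appears in JConsole."""
    return "/" + "/".join(registry_name.split("."))
```

For {{solr.core.collection1.shard1.replica1}} this gives five properties, {{dom1=solr}} through {{dom5=replica1}}, and the path {{/solr/core/collection1/shard1/replica1}}.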

h3. SLF4J reporter ({{org.apache.solr.metrics.reporters.SolrSlf4jReporter}})
(See also common arguments above)
* *logger* - (optional str) name of the logger to use. Default is empty, in 
which case the group or registry name will be used if specified in the plugin 
configuration.

Users can specify logger name (and the corresponding logger configuration in 
eg. Log4j configuration) to output metrics-related logging to separate file(s), 
which can then be processed by external applications. Each log line produced by 
this reporter consists of configuration-specific fields, and a message that 
follows this format:
{code}
type=COUNTER, name={}, count={}

type=GAUGE, name={}, value={}

type=TIMER, name={}, count={}, min={}, max={}, mean={}, stddev={}, median={}, 
p75={}, p95={}, p98={}, p99={}, p999={}, mean_rate={}, m1={}, m5={}, m15={}, 
rate_unit={}, duration_unit={}

type=METER, name={}, count={}, mean_rate={}, m1={}, m5={}, m15={}, rate_unit={}

type=HISTOGRAM, name={}, count={}, min={}, max={}, 

[jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics

2016-12-28 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782804#comment-15782804
 ] 

Andrzej Bialecki  commented on SOLR-9898:
-

h1. Overview
Solr 6.4 adds a developer API and instrumentation for the collection of 
detailed performance-oriented metrics throughout the life-cycle of the Solr 
service and its various components. Internally it uses the [Dropwizard Metrics 
API|http://metrics.dropwizard.io], which provides the following classes of 
metrics to measure events:
* *counters* - simply count events. They provide a single long value, e.g. the 
number of requests.
* *meters* - additionally compute rates of events. Provide a count (as above) 
and 1-, 5-, and 15-minute exponentially decaying rates, similarly to the Unix 
system load average.
* *histograms* - calculate approximate distribution of events according to 
their values. Provide the following approximate statistics, with a similar 
exponential decay as above: mean (arithmetic average), median, maximum, 
minimum, standard deviation, and 75th, 95th, 98th, 99th and 99.9th 
percentiles. 
* *timers* - measure the number and duration of events. They provide a count 
and histogram of timings.
* *gauges* - offer instantaneous reading of a current value, e.g. current queue 
depth, current number of active connections, free heap size.
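
To make the "exponentially decaying rate" of a meter more concrete, here is a minimal Python sketch of an exponentially weighted moving average, in the style of the Unix load average. The 5-second tick interval and the class name {{DecayingRate}} are assumptions for illustration; this is not presented as the library's exact implementation.

```python
import math

TICK_SECONDS = 5.0  # assumed sampling interval between rate updates


class DecayingRate:
    """Events-per-second rate that decays over a window given in minutes."""

    def __init__(self, window_minutes):
        # Smoothing factor: larger windows react more slowly to change.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / window_minutes)
        self.rate = None
        self.uncounted = 0

    def mark(self, n=1):
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self):
        """Fold the events seen in the last interval into the rate."""
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant  # the first sample seeds the average
        else:
            self.rate += self.alpha * (instant - self.rate)
        return self.rate
```

A meter would keep three such rates, with windows of 1, 5 and 15 minutes, next to a plain event count. After a burst of events the 1-minute rate drops back quickly, while the 15-minute rate stays elevated for longer.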

A group of related metrics with unique names is managed in a *metric registry*. 
Solr maintains several such registries, each corresponding to a high-level 
group such as: {{jvm, jetty, http, node, core}} (see below). Metrics are 
maintained and accumulated through all life-cycles of components, from the 
start of the process until its shutdown - e.g. metrics for a particular 
SolrCore are tracked through possibly several load / unload / rename 
operations, and deleted only when a core is explicitly deleted. However, 
metrics are not persisted across process restarts - restarting Solr will 
discard all collected metrics.

For each group (and/or for each registry) there can be several *reporters* - 
components responsible for communication of metrics from selected registries to 
external systems. Currently implemented reporters support emitting metrics via 
JMX, Ganglia, Graphite and SLF4J. There is also a dedicated {{/admin/metrics}} 
handler that can be queried to report all or a subset of the current metrics 
from multiple registries.
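
A minimal Python sketch of querying the {{/admin/metrics}} handler. The base URL {{http://localhost:8983/solr}} and the request parameters used here ({{wt}}, {{group}}) are assumptions that are not described in this section, and the helper names are hypothetical; check the handler's own documentation for the parameters your version supports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def metrics_url(base="http://localhost:8983/solr", **params):
    """Build a request URL for the /admin/metrics handler."""
    # Assumption: the handler accepts wt=json like other Solr handlers.
    query = urlencode(dict(wt="json", **params))
    return "%s/admin/metrics?%s" % (base, query)


def fetch_metrics(**params):
    """Fetch and decode the metrics report (requires a running Solr node)."""
    with urlopen(metrics_url(**params)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, {{metrics_url(group="jvm")}} produces a URL that would request only the metrics of the {{jvm}} group, assuming the {{group}} parameter is supported.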

h2. Metric groups
These are the major groups of metrics that are collected:

h3. JVM level ({{solr.jvm}} registry):
* direct and mapped buffer pools
* class loading / unloading
* OS memory, CPU time, file descriptors, swap, system load
* GC count and time
* heap, non-heap memory and GC pools
* number of threads, their states and deadlocks

h3. Node / CoreContainer level ({{solr.node}} registry):
* handler requests (count, timing): collections, info, admin, configSets, etc.
* number of cores (loaded, lazy, unloaded)

h3. Core (SolrCore) level ({{solr.core}} registries, one for each core):
* all common RequestHandler-s report: request timers / counters, timeouts, 
errors.
* index-level events (in progress - SOLR-9854): meters for minor / major 
merges, number of merged docs, number of deleted docs, gauges for currently 
running merges and their size.
* directory-level IO: total read / write meters, histograms for read / write 
operations and their size, optionally split per index file (eg. field data, 
term dictionary, docValues, etc) (SOLR-9854 in progress)
* shard replication and transaction log replay on replicas (TBD, SOLR-9856)
* TBD: caches, update handler details, and other relevant SolrInfoMBean-s

h3. HTTP level ({{solr.http}} registry):
* open / available / pending connections for shard handler and update handler

h3. Jetty level ({{solr.jetty}} registry):
* threads and pools,
* connection and request timers,
* meters for responses by HTTP class (1xx, 2xx, etc)

h3. Shard leader (TBD)
* aggregated metrics from each replica (SOLR-9857)

h3. Overseer (TBD)
* aggregated metrics from shard leaders and cluster nodes (SOLR-9858)

