[jira] [Resolved] (SOLR-13234) Prometheus Metric Exporter Not Threadsafe

2019-03-13 Thread Shalin Shekhar Mangar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-13234.
--
Resolution: Fixed

> Prometheus Metric Exporter Not Threadsafe
> -
>
> Key: SOLR-13234
> URL: https://issues.apache.org/jira/browse/SOLR-13234
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.6, 8.0
>Reporter: Danyal Prout
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
>  Labels: metric-collector
> Fix For: master (9.0), 7.7.2, 8.1
>
> Attachments: SOLR-13234-branch_7x.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Solr Prometheus Exporter collects metrics when it receives a HTTP request 
> from Prometheus. Prometheus sends this request, on its [scrape 
> interval|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config].
>  When the time taken to collect the Solr metrics is greater than the scrape 
> interval of the Prometheus server, this results in concurrent metric 
> collection occurring in this 
> [method|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L86].
>  This method doesn’t appear to be thread safe, for instance you could have 
> concurrent modifications of a 
> [map|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L119].
>  After a while the Solr Exporter processes becomes nondeterministic, we've 
> observed NPE and loss of metrics.
> To address this, I'm proposing the following fixes:
> 1. Read/parse the configuration at startup and make it immutable. 
>  2. Collect metrics from Solr on an interval which is controlled by the Solr 
> Exporter and cache the metric samples to return during Prometheus scraping. 
> Metric collection can be expensive, for example executing arbitrary Solr 
> searches, it's not ideal to allow for concurrent metric collection and on an 
> interval which is not defined by the Solr Exporter.
> There are also a few other performance improvements that we've made while 
> fixing this, for example using the ClusterStateProvider instead of sending 
> multiple HTTP requests to each Solr node to lookup all the cores.
> I'm currently finishing up these changes which I'll submit as a PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13234) Prometheus Metric Exporter Not Threadsafe

2019-03-04 Thread Shalin Shekhar Mangar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-13234.
--
Resolution: Fixed

The fix will be released with Solr 8.1.0 and 7.7.2 releases.

Thanks Danyal!

> Prometheus Metric Exporter Not Threadsafe
> -
>
> Key: SOLR-13234
> URL: https://issues.apache.org/jira/browse/SOLR-13234
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.6, 8.0
>Reporter: Danyal Prout
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
>  Labels: metric-collector
> Fix For: 8.x, master (9.0)
>
> Attachments: SOLR-13234-branch_7x.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Solr Prometheus Exporter collects metrics when it receives a HTTP request 
> from Prometheus. Prometheus sends this request, on its [scrape 
> interval|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config].
>  When the time taken to collect the Solr metrics is greater than the scrape 
> interval of the Prometheus server, this results in concurrent metric 
> collection occurring in this 
> [method|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L86].
>  This method doesn’t appear to be thread safe, for instance you could have 
> concurrent modifications of a 
> [map|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L119].
>  After a while the Solr Exporter processes becomes nondeterministic, we've 
> observed NPE and loss of metrics.
> To address this, I'm proposing the following fixes:
> 1. Read/parse the configuration at startup and make it immutable. 
>  2. Collect metrics from Solr on an interval which is controlled by the Solr 
> Exporter and cache the metric samples to return during Prometheus scraping. 
> Metric collection can be expensive, for example executing arbitrary Solr 
> searches, it's not ideal to allow for concurrent metric collection and on an 
> interval which is not defined by the Solr Exporter.
> There are also a few other performance improvements that we've made while 
> fixing this, for example using the ClusterStateProvider instead of sending 
> multiple HTTP requests to each Solr node to lookup all the cores.
> I'm currently finishing up these changes which I'll submit as a PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org