On Tue, 25 Aug 2020 at 13:17, Kristian Rosenvold <
[email protected]> wrote:

> We just had our second server reboot with excessive CPU usage through the
> metrics servlet on java. In both cases our thread dumps have been littered
> with runnable threads along the lines of the sample at the bottom of this
> mail. (These are never visible on normal execution) This really has a clear
> smell of a thread-safety problem. We have been fine studying the overall
> thread safety of the JMX collector code.
>
> We believe that the LinkedHashMap in the cache at
> https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxMBeanPropertyCache.java#L36
>  is
> in violation of JVM thread safety rules. The unsynchronized initialization
> of the LinkedHashMap in this class at line 46 does not guarantee that
> clients running on other threads will see the correct values inside this
> map, and even if they do it is exposed to hashmap "get" thread safety
> issues and potential CPU leakage
>

The LinkedHashMap is never written to after it is inserted into
the ConcurrentHashMap, only read from. So I'd expect that to be safe.

Brian


>
> We have discussed the best way to fix this problem and are somewhat
> undecided as to what is the best approach. We will provide a patch, but we
> would appreciate your opinions on this issue up-front.
>
> Kristian
>
>
> java.lang.Thread.State: RUNNABLE
> at
> io.prometheus.jmx.JmxCollector$Receiver.recordBean(JmxCollector.java:373)
> at io.prometheus.jmx.JmxScraper.processBeanValue(JmxScraper.java:199)
> at io.prometheus.jmx.JmxScraper.scrapeBean(JmxScraper.java:163)
> at io.prometheus.jmx.JmxScraper.doScrape(JmxScraper.java:117)
> at io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:473)
> at
> io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:190)
>
> at
> io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:223)
>
> at
> io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:144)
>
> at
> io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
>
> at
> io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:49)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/526dcf66-8151-4d8d-9487-807cd9ebda08n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/526dcf66-8151-4d8d-9487-807cd9ebda08n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Brian Brazil
www.robustperception.io

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLo5AYErVSwLCqDNQBH7K7%3DNoCU%2BcCpOmJX5jM_%2BqM33bw%40mail.gmail.com.

Reply via email to