On Tue, 25 Aug 2020 at 13:17, Kristian Rosenvold < [email protected]> wrote:
> We just had our second server reboot with excessive CPU usage through the
> metrics servlet on Java. In both cases our thread dumps have been littered
> with runnable threads along the lines of the sample at the bottom of this
> mail. (These are never visible during normal execution.) This really has a
> clear smell of a thread-safety problem. We have been closely studying the
> overall thread safety of the JMX collector code.
>
> We believe that the LinkedHashMap in the cache at
> https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxMBeanPropertyCache.java#L36
> is in violation of JVM thread safety rules. The unsynchronized initialization
> of the LinkedHashMap in this class at line 46 does not guarantee that
> clients running on other threads will see the correct values inside this
> map, and even if they do, it is exposed to HashMap "get" thread safety
> issues and potential CPU leakage.
>

The LinkedHashMap is never written to after it is inserted into the
ConcurrentHashMap, only read from. So I'd expect that to be safe.

Brian

>
> We have discussed the best way to fix this problem and are somewhat
> undecided as to what is the best approach. We will provide a patch, but we
> would appreciate your opinions on this issue up-front.
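
For reference, the publication pattern under discussion can be sketched as
below. This is only an illustration, not the actual JmxMBeanPropertyCache
code: the class name PropertyCacheSketch, the String keys, and the simplified
"k=v,k=v" parsing are all assumptions. The point it shows is that a map which
is fully populated before being put into a ConcurrentHashMap, and never
modified afterwards, is visible in its populated state to any thread that
later gets it from that ConcurrentHashMap, because the put happens-before the
subsequent get of the same key.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and parsing are illustrative assumptions.
class PropertyCacheSketch {
    private static final ConcurrentHashMap<String, Map<String, String>> CACHE =
            new ConcurrentHashMap<>();

    // Parses "k1=v1,k2=v2" into an insertion-ordered map (no quoting rules).
    static Map<String, String> parse(String properties) {
        LinkedHashMap<String, String> parsed = new LinkedHashMap<>();
        for (String pair : properties.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                parsed.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return parsed;
    }

    static Map<String, String> lookup(String beanName) {
        Map<String, String> cached = CACHE.get(beanName);
        if (cached == null) {
            // All writes to the LinkedHashMap finish here, before publication.
            // The unmodifiable wrapper makes the "never written again" rule explicit.
            Map<String, String> fresh = Collections.unmodifiableMap(parse(beanName));
            // putIfAbsent returns the existing value if another thread won the race.
            Map<String, String> raced = CACHE.putIfAbsent(beanName, fresh);
            cached = (raced != null) ? raced : fresh;
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(lookup("type=Memory,name=Heap"));
    }
}
```

If the map were mutated after the put, or handed to other threads by some
route other than the ConcurrentHashMap, this guarantee would no longer apply.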
>
> Kristian
>
>
> java.lang.Thread.State: RUNNABLE
>     at io.prometheus.jmx.JmxCollector$Receiver.recordBean(JmxCollector.java:373)
>     at io.prometheus.jmx.JmxScraper.processBeanValue(JmxScraper.java:199)
>     at io.prometheus.jmx.JmxScraper.scrapeBean(JmxScraper.java:163)
>     at io.prometheus.jmx.JmxScraper.doScrape(JmxScraper.java:117)
>     at io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:473)
>     at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:190)
>     at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:223)
>     at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:144)
>     at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
>     at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:49)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/526dcf66-8151-4d8d-9487-807cd9ebda08n%40googlegroups.com
> .
>

--
Brian Brazil
www.robustperception.io

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CAHJKeLo5AYErVSwLCqDNQBH7K7%3DNoCU%2BcCpOmJX5jM_%2BqM33bw%40mail.gmail.com.

