On Fri, 5 Jun 2020 at 11:13, Cameron Kerr <[email protected]> wrote:
> Hi all, I've spent a few days coming up to speed on understanding
> jmx_exporter, and I think I'm in a pretty good place now, understanding how
> MBeans, JMX, and RMI work.
>
> I've so far deployed jmx_exporter in two ways:
>
>   * as a Java Agent on Tomcat 6 (on RHEL6) for a small application that
>     includes SOLR and PostgreSQL
>   * and as a standalone Java HTTP server on Tomcat 8.5.34 (that comes
>     bundled with a mission-critical application)
>
> I found going with the Java Agent relatively easy, although I think I'll
> contribute a blog post and pull-request to help on the documentation front.
>
> You might reasonably ask why I'm bothering with the HTTP server. Here's my
> business logic that drives this:
>
>   * we have a mission-critical application that we urgently need to improve
>     our visibility on to diagnose some performance limits we believe we're
>     reaching
>   * we're reluctant to introduce (and cause an outage to introduce) a java
>     agent --- as far as I'm aware jmx_exporter lacks the ability that jconsole
>     has to dynamically inject an agent.

It's a very simple Java agent that doesn't do anything fancy. I believe it's
possible if you already know how to do such things, though using it as an
agent is almost always better.

>   * as part of a previous monitoring drive, we've already introduced the
>     appropriate Remote JMX configuration (-D....jmxremote... etc.), which means
>     we can introduce some monitoring into our production environment and easily
>     restart the JMX exporter as needed to iterate through configuration changes.

The JMX exporter will pick up configuration changes without restarting.

> We recognise that running a separate JVM has its disadvantages, namely:
>   * it will incur a JVM memory overhead
>   * it will likely need to be run as the same user with the same
>     version/type of JVM (I'm not sure if this is accurate, but it seems safer).
>   * it creates a potential hole (via RMI) in the security boundary of the
>     application, so we would prefer to house this on the same server (similar
>     to a 'side-car' type of deployment, I suppose)

The really big disadvantage is that it's much slower. You also lose some
process and JVM metrics.

> So most of what I'm about to say is about Remote JMX mode of operation
> (but still potentially relevant in part to Agent mode).
>
> Here's the business value I need to obtain from jmx_exporter:
>
> 1) provide telemetry we're missing to diagnose urgent and important
>    production issues, particularly for database connection pools and thread
>    counts (memory/garbage collection would also be useful in the general case,
>    and application-specific MBeans that would be useful in specific cases,
>    such as applications that use SOLR or particular frameworks that instrument
>    various URL handlers with nice statistics)
> 2) impart minimal changes to application runtime or risk changing
>    behaviour in mission-critical production application
> 3) impart minimal changes in performance; we don't want to induce
>    unreasonable load by introducing monitoring.
>
> As I understand it, the current implementation of jmx_exporter uses an
> MBean-level query for the Attributes available within an MBean, effectively
> providing a 'batch' sort of API which reduces the number of RMI round-trips
> in the expectation that this is faster than what JConsole does by querying
> each individual Attribute (more round-trips, potentially over a remote
> connection). This does make the assumption though that the time spent (and
> value received) from querying all of the attributes is worthwhile.
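
A minimal sketch of the two query styles described above, written against the standard javax.management API rather than taken from jmx_exporter's source. It runs against the local platform MBean server as a stand-in for a remote RMI connection, and the class and method names (AttributeFetchSketch, fetchAll, fetchNamed, fetchOneByOne) are hypothetical. It also shows that getAttributes() accepts an explicit list of names, so a few chosen attributes can be requested in one call.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper class, for illustration only.
class AttributeFetchSketch {

    // Batch style: one getAttributes() call covering every attribute the MBean advertises.
    static Map<String, Object> fetchAll(MBeanServerConnection conn, ObjectName name) throws Exception {
        MBeanAttributeInfo[] infos = conn.getMBeanInfo(name).getAttributes();
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i].getName();
        }
        return toMap(conn.getAttributes(name, names));
    }

    // Still a single call, but only for the attributes named by the caller.
    static Map<String, Object> fetchNamed(MBeanServerConnection conn, ObjectName name, String... wanted)
            throws Exception {
        return toMap(conn.getAttributes(name, wanted));
    }

    // JConsole style: one getAttribute() call per attribute; a failure only loses that attribute.
    static Map<String, Object> fetchOneByOne(MBeanServerConnection conn, ObjectName name, String... wanted) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String attr : wanted) {
            try {
                out.put(attr, conn.getAttribute(name, attr));
            } catch (Exception e) {
                // Skip the attribute that failed and carry on with the rest.
            }
        }
        return out;
    }

    private static Map<String, Object> toMap(AttributeList list) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Attribute a : list.asList()) {
            out.put(a.getName(), a.getValue());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in for a remote connection (assumption made for this sketch).
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        ObjectName threading = new ObjectName("java.lang:type=Threading");
        System.out.println(fetchNamed(conn, threading, "ThreadCount", "PeakThreadCount"));
        System.out.println(fetchOneByOne(conn, threading, "ThreadCount", "NoSuchAttribute"));
    }
}
```

Locally, getAttributes() simply omits attributes it cannot read. Over a remote connection the behaviour reported in this thread is harsher: a single value that cannot be serialised can fail the whole batch call, which is why asking only for the named attributes may be worth trying.
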
> Let's see where this assumption, well-intentioned as it is, leads us in
> practice:
>
> I want to get telemetry around ThreadPool usage within Tomcat, so looking
> at JConsole, I see the following
>
> [image: 2020-06-05 17_38_53-RHEL Server 7 [Running] - Oracle VM VirtualBox.png]
>
> Great, connectionCount, currentThreadCount and currentThreadsBusy look to
> be things I would definitely be interested in; I'm unlikely to use most of
> the rest.
>
> Clicking on the 'http-nio-8082', I see the ObjectName being the following,
> which I put into my whitelistObjectNames
>
>   Catalina:type=ThreadPool,name="http-nio-8082"
>
> So now my configuration looks something like the following:
>
> ---
> hostPort: 127.0.0.1:9090
> username:
> password:
> ssl: false
>
> lowercaseOutputLabelNames: true
> lowercaseOutputName: true
>
> # You really MUST use some whitelisting to select the bits of JMX you
> # actually want.
> # You DO NOT want to query the entire MBean tree, which is what you
> # get by default. This will likely take about 10 seconds depending and may
> # have unintended side-effects, such as introducing lock contention
> # potentially, or causing database queries to be run.
> #
> whitelistObjectNames: [
>   'Catalina:type=ThreadPool,name="http-nio-8082"'
> ]
>
> # It's not enough to simply grab the data; we need to do something with it
> # to generate it into metrics, otherwise that's potentially a lot of effort
> # wasted getting all that raw data (you did use a whitelist, right?)
> #
> rules:
>
> # Ah, due to a bug that was fixed in Tomcat 8.5.35 (our app bundles
> # 8.5.34), this results in a serialization error. Because the
> # socketProperties is not serialisable (it shows as 'Unavailable' in
> # JConsole) it faults the entire request for that object and returns an
> # exception over the wire.
> #
> # https://bz.apache.org/bugzilla/show_bug.cgi?id=62871
> #
>   - pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|connectionCount):'
>     name: tomcat_threadpool_$3
>     labels:
>       port: "$2"
>       protocol: "$1"
>     help: Tomcat threadpool $3
>     type: GAUGE
>
>
> (I've spoiled the story with the comment, but that's okay...)
>
> The problem (as other people have bumped into) is that Tomcat < 8.5.35,
> and other things will exhibit this behaviour also, is that ..... hang on,
> let me back up a bit to add some understanding to how this works:
>
> An MBean is essentially an object (okay, a subclass) that implements an
> Interface. Anything in Java can create MBeans; common examples being things
> like Tomcat, large libraries, and even the Java base environment itself.
> All these MBeans get registered into JMX (Java Management Extensions) which
> provide some structure and discoverability for tools like JConsole (or
> jmx_exporter). MBeans essentially expose various Attributes (methods that
> essentially 'getSomething'), Operations (other methods that might be used
> to change runtime state), and Notifications (which we completely ignore,
> along with Operations, for the purposes of jmx_exporter).
>
> JMX Exporter (in its HTTP server, external process form) connects (call it
> the 'client') to the (Tomcat) JVM ('server') over an RMI connection. This
> is effectively a form of IPC, where the client can invoke methods (RMI =
> Remote Method Invocation) on the server. So when you get the value of an
> Attribute, you are essentially calling some getSomething() method in an
> MBean. What you get from that is up to whatever implemented it (ie. you get
> a Plain-Old-Java-Object, or POJO for short). But to get from the 'server'
> over the RMI connection to the 'client' it needs to be serialised to be
> sent over the wire, deserialised at the other end, and then evaluated.
>
> Take socketProperties for example.
> I don't care about it; I care about
> currentThreadCount etc. But the problem with Tomcat (fixed in Tomcat
> 8.5.35, if you have the luxury of moving to that; our vendor-supplied
> application bundles Tomcat 8.5.34) is that its implementation of the
> 'getter' method for socketProperties returns something that is not
> serialisable (it doesn't implement the expected Serializable interface).
> This becomes a problem at the point where it needs to be serialised, which
> is RMI. This results in an exception.
>
> Because jmx_exporter is using a method that says 'give me all the
> attributes for MBean B', that exception basically junks the whole result,
> and I lose the result of currentThreadCount etc. with it.
>
> JConsole on the other hand uses the slower-but-steadier 'tell me what
> attributes exist in MBean B' followed by a lot of 'give me Attribute A for
> MBean B', so it can handle that exception (showing it as a red
> 'Unavailable').

The JMX exporter used to go attribute by attribute; switching to batch gave
a substantial speedup. That's not a change I'd be looking to undo, as it'd
make the JMX exporter unusable for too many users for the sake of a small
handful of poor JMX implementations.

> Now let's look at another similar case; one where there are no bugs
> present. In this example I want to get information about database
> connection pool utilisation because this is valuable information and a
> common load-related performance issue (this tends to be true of
> connection-pools in general, such as for LDAP, but you get plenty of
> third-party libraries in the JDBC space).
>
> For this you'll need to find some suitable MBeans, assuming they are
> even visible at all; one of my studies had a Tomcat 6 deployment with
> PostgreSQL and it didn't seem to expose any MBeans that I could see, my
> other study had Tomcat 7 and the MBeans lived in a domain specific to the
> application (in this case, an online learning product called Blackboard).
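
One way to hunt for candidate MBeans without clicking through JConsole is to list the ObjectNames that match a pattern, together with the class name each MBean reports. The following is a rough sketch using the standard javax.management API, run here against the local platform MBean server as a stand-in for the application's JVM. MBeanFinderSketch and find are hypothetical names, and the Tomcat-style pattern mentioned in the comment is an assumption that has not been checked against any particular application.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper class, for illustration only.
class MBeanFinderSketch {

    // Returns one "objectName -> className" entry for every MBean matching the pattern.
    static Set<String> find(MBeanServerConnection conn, String pattern) throws Exception {
        Set<String> found = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            String className = conn.getMBeanInfo(name).getClassName();
            found.add(name + " -> " + className);
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Against an application JVM, a wildcard such as "*:type=ConnectionPool,*"
        // might be a starting point (assumption; the real names depend on the application).
        for (String line : find(conn, "java.lang:type=MemoryPool,*")) {
            System.out.println(line);
        }
    }
}
```

The domain part of the pattern can itself be a wildcard, which helps when the application registers its pools under its own domain.
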
>
> [image: 2020-06-05 20_40_34-RHEL Server 7 [Running] - Oracle VM VirtualBox.png]
>
> Note that the ClassName is org.apache.tomcat.jdbc.pool.jmx.ConnectionPool
> .... but it's the application that decides where to put the MBean and what
> to use as the ObjectName, so if the application is managing its own
> connection pools (rather than using a connection-pool provided by the
> middleware), be prepared to hunt around for it. The ClassName does come
> into play though, because that tells us what data is inside the MBean (and
> helps us find some documentation as to what those attributes might actually
> mean).
>
> So let's see what attributes this fairly common class exposes for
> monitoring: There are some obvious things here we would want to measure,
> either as GAUGES or as COUNTERS, but most of it we wouldn't need or want.
> In this screenshot, remember that I'm using Remote JMX, and JConsole is
> also using Remote JMX in this instance. If you hover over the red
> Unavailable for JdbcInterceptorsAsArray, you see the exception that causes
> it to be unavailable, and it's the same exception you see in the
> jmx_exporter (or more accurately, ./jmx_prometheus_httpserver.jar) when you
> have debug logging enabled.
>
> [image: 2020-06-05 21_29_05-RHEL Server 7 [Running] - Oracle VM VirtualBox.png]
>
> java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
>     java.lang.ClassNotFoundException: org.apache.tomcat.jdbc.pool.PoolProperties$InterceptorDefinition (no security manager: RMI class loader disabled)
>
> Let's unpack this a bit to understand what this means: the RMI client
> (JConsole in Remote JMX mode, or jmx_prometheus_httpserver.jar) has
> received a serialised version of a class called
> org.apache.tomcat.jdbc.pool.PoolProperties$InterceptorDefinition, and it
> needs to deserialise it to extract a value from it (eg. a string value or
> floating point value). But to do that it needs to have that class available
> somewhere.
> You can see that this class is specific to Tomcat and JDBC, so
> JConsole (or jmx_prometheus_httpserver.jar) won't be likely to have that
> available.
>
> Presumably you could hunt around (a lot) and stuff a lot of things into
> the classpath of the RMI client, but that's painful, needless work (I tried
> and failed, but I'm not enough of a Tomcat wizard to know how to determine
> what classpath is present (classloaders, yay) for that webapp etc.)
>
> Alternatively, I could apparently make use of 'RMI class loader' which
> sends the classes over the wire too to be loaded on the client side --- and
> also have to navigate a security manager --- that's a learning path I may
> have to attempt next.
>
> Either way, I have no interest in JdbcInterceptorsAsArray anyway;
> all I want is Active, Idle, Size and a few counters that bear
> critical importance for my monitoring. But if I can't get a complete result
> set, I get nothing.
>
>
> Let's recap and see how this affects the value I'm expecting to achieve:
>
> 1) provide telemetry we're missing to diagnose urgent and important
>    production issues, particularly for database connection pools and thread
>    counts (memory/garbage collection would also be useful in the general case,
>    and application-specific MBeans that would be useful in specific cases,
>    such as applications that use SOLR or particular frameworks that instrument
>    various URL handlers with nice statistics)
> 2) impart minimal changes to application runtime or risk changing
>    behaviour in mission-critical production application
> 3) impart minimal changes in performance; we don't want to induce
>    unreasonable load by introducing monitoring.
>
> #1 is mostly unattainable either because something is not serialisable on
> the RMI server side, or is not deserialisable on the RMI client side. All I
> can get are the 'nice-to-haves'.
> #2 would be met by Remote JMX; if I have to use the Agent then my
> lead-time for introducing monitoring increases, decreasing my agility and
> ability to quickly withdraw the functionality in a production environment
> without an application restart.
> #3 with appropriate whitelisting of ObjectNames we can get most of the way
> there and could reasonably scrape the metrics once a minute without fear,
> although some MBeans do become very large, particularly if they contain
> arrays, when you often only need a small handful of attributes. If we can
> scrape a smaller set however, we could achieve a higher fidelity if
> desired, which might paint a truer picture if all you have to work with are
> gauges.
>
>
> I would like to propose that we introduce one of two things:
>
> EITHER add a new attribute whitelistObjectNameAttributes that could be
> used for JConsole-style attribute-at-a-time querying (or similar; can you
> grab a few named attributes in one go?), which would allow for either the
> broad-brush or fine-brush approach to collecting the data;

I'd be open to considering an attribute name blacklist, if there's a sane
way to specify that.

> OR allow for using the slower attribute-at-a-time as either an option or
> as a fallback.

I'd rather not, the performance difference is just too much.

On a higher level, I'd suggest looking at ways to get metrics that don't
involve JMX. Using client_java directly as far as possible will avoid all
the fun and performance issues that JMX brings.

Brian

> Personally I would prefer the first option because I would much rather
> pick and choose, since I need to be familiar with what data is available
> anyway in order to use it effectively.
>
> I'm not a Java programmer (at all, but I am a bit of a polyglot and I've
> been supporting Java workloads for years) but I'd be willing to give a go
> at implementing this and submitting a pull-request if people would be
> interested in receiving one.
>
>
> PS.
> If anyone would like an Ansible playbook for deploying
> jmx_prometheus_httpserver.jar I'm willing to share what I have so far.
>
> PPS. If anyone has experience setting up RMI class loader, I'd love some
> tips.
>
> Thanks for reading this far, and I hope this (long) post helps people to
> understand and use jmx_exporter more effectively. Once I complete some of
> this, you can expect some documentation-related PRs
>
> Cameron
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/04481e73-465e-4815-a6a9-4697c4e930ceo%40googlegroups.com

--
Brian Brazil
www.robustperception.io

