Hi Rafał, hi Giedrius,

Thanks for your interest.
Any provable performance improvement that doesn't come with a huge increase in code complexity will certainly be welcome. However, I have a few comments below, with the possible conclusion that the effort isn't really worth it in this case (or that the effort would be way more involved than you currently anticipate).

On 06.01.25 22:03, Rafał Dowgird wrote:
> The current logic for the duplicates check is in client_golang in the
> Gather() method:
> https://github.com/prometheus/client_golang/blob/aea1a5996a9d8119592baea7310810c65dc598f5/prometheus/registry.go#L424
> Unfortunately this API can only take a whole set of metrics and answer if
> it's consistent. It does so by calculating hashes for the whole set, which
> in case of Pushgateway leads to quadratic complexity. Pushgateway keeps a
> dynamic set of metrics and needs to keep track of its consistency and you
> cannot do it efficiently using the current Gather() API.

I wrote this code (both the PGW side and the client_golang side) a long time ago. My memory might be patchy, but I'll try to recall the rationale from back then.

Whenever the PGW (or in fact any program instrumented with prometheus/client_golang) is scraped, the logic implemented in the Gather() method runs, i.e. at that moment the current state of the metrics to be exposed is checked for self-consistency. As implemented, this is linear in the number of metrics (O(n)), so it is a generally accepted burden and hasn't really been perceived as a problem except in very specialized edge cases (kube-state-metrics is an (in-)famous example). Part of the reason is that a scrape happens relatively rarely (a few times per minute), so the resources needed to serve metrics are usually negligible compared to the resources needed for the actual primary task the instrumented program is doing.

So what happens during pushing?
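As an aside, the linear duplicate check described above can be sketched roughly like this. This is a simplified illustration, not the actual client_golang code: the `Metric` type, `identityHash`, and `checkDuplicates` are invented here; the real implementation hashes metric identities in a more careful (and more involved) way, but the single O(n) pass over the set is the same idea:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Metric is a simplified stand-in for a Prometheus metric identity:
// a name plus a set of label name/value pairs.
type Metric struct {
	Name   string
	Labels map[string]string
}

// identityHash hashes the metric's name and its sorted label pairs,
// so two metrics with the same identity always hash to the same value.
// A separator byte is written between components to avoid ambiguity.
func identityHash(m Metric) uint64 {
	sep := []byte{0xff}
	h := fnv.New64a()
	h.Write([]byte(m.Name))
	h.Write(sep)
	keys := make([]string, 0, len(m.Labels))
	for k := range m.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(sep)
		h.Write([]byte(m.Labels[k]))
		h.Write(sep)
	}
	return h.Sum64()
}

// checkDuplicates walks the metric set once (O(n)) and reports the
// first pair of metrics sharing the same identity hash.
func checkDuplicates(metrics []Metric) error {
	seen := make(map[uint64]Metric, len(metrics))
	for _, m := range metrics {
		hash := identityHash(m)
		if prev, ok := seen[hash]; ok {
			return fmt.Errorf("duplicate metric identity: %v collides with %v", m, prev)
		}
		seen[hash] = m
	}
	return nil
}

func main() {
	metrics := []Metric{
		{Name: "job_duration_seconds", Labels: map[string]string{"job": "a"}},
		{Name: "job_duration_seconds", Labels: map[string]string{"job": "b"}},
		{Name: "job_duration_seconds", Labels: map[string]string{"job": "a"}}, // duplicate of the first
	}
	fmt.Println(checkDuplicates(metrics))
}
```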
What we do in the current code is essentially to simulate what would happen if the PGW got scraped with the newly pushed metrics added to the already existing metrics. This appears quite costly, but the rationale is that the same cost will be paid again when the PGW is scraped for real. While I said above that scrapes are relatively rare (a few times per minute), pushes happen even less often. This in turn means that any effort to make the consistency check less expensive will be small compared to the effort required during scraping.

While you are technically right that the consistency check is O(n*m), with n being the total number of metrics in the PGW and m being the number of pushes, I doubt that this is the relevant quantity to look at. In the same way, you could say that scraping is quadratic, with n being the total number of metrics and m being the number of scrapes. As long as you scrape more often than you push, you would also have to change the whole way scraping works to actually make a dent. (This is what kube-state-metrics did. They removed all layers of abstraction and are now rendering the metrics output directly. In the PGW case, you could probably follow a less radical approach, but you would still break contracts like Gather() being responsible for the final self-consistency check.)

Unless, of course, you are using the PGW for a use case it is not designed for. You quoted me saying "Pushgateway is not meant to be high performance", but that's not what I said. Pushgateway performs just fine for the use case it was designed for. If you really use it in a situation where you push more often than you scrape, I would be concerned about more things than just performance. You would then be funneling a whole lot of metrics through a SPOF. The PGW has no HA story whatsoever, following the idea that it is for metrics that only update a few times a day or so, so that nothing bad will happen if it has some downtime.
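To put rough numbers on the "pushes are rarer than scrapes" argument, here is a tiny back-of-the-envelope calculation. All constants are made-up assumptions for illustration, not measurements of any real setup:

```go
package main

import "fmt"

func main() {
	const (
		n              = 10_000 // total metrics held by the PGW (assumed)
		scrapesPerHour = 240    // one scrape every 15s (assumed)
		pushesPerHour  = 4      // a batch job pushing occasionally (assumed)
	)
	// Each scrape and each push runs the same O(n) consistency check,
	// so the total check work per hour is proportional to these products.
	scrapeWork := scrapesPerHour * n
	pushWork := pushesPerHour * n
	fmt.Printf("check work per hour: scrapes=%d, pushes=%d (%.1f%% of scrape work)\n",
		scrapeWork, pushWork, 100*float64(pushWork)/float64(scrapeWork))
}
```

With these (assumed) numbers, the push-time checks amount to a small fraction of the scrape-time checks, which is why optimizing only the push path makes little dent in the total.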
If a huge number of your frequently updating metrics are lost while the PGW is down, you have a bigger problem than performance.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/prometheus-developers/Z37JriYbNlga1nsN%40mail.rabenste.in.