Hello, I would like to ask your opinion about a problem I recently ran into with how the Python Prometheus client behaves in our system.
*Description of the problem*

The problem is best described by a test (you can copy-paste it into `test_multiprocessing.py` / `TestMultiProcess`):

```python
def test_gauge_all_across_forks(self):
    pid = 0
    values.ValueClass = MultiProcessValue(lambda: pid)
    g1 = Gauge('g1', 'help', registry=None)
    g1.set(1)
    pid = 1
    g2 = Gauge('g2', 'help', registry=None)
    g2.set(2)
    # These two work fine.
    self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
    self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))
    # g1 has never been reported from pid 1, so it should not be present there.
    self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))
```

Or, more verbosely, the steps to reproduce:

- a multiprocessing environment
- report a metric X from the parent process (pid 0)
- fork
- continue reporting metric X from the parent process (pid 0)
- report a metric Y from the child process (pid 1)
- collect metrics via the normal multiprocessing collector

Expected:

1. metric X is reported with label `pid: 0` and a non-zero value
2. metric Y is reported with label `pid: 1` and a non-zero value
3. metric X is NOT reported with label `pid: 1` (it has never been reported from pid 1, so it should not be present)

Actual: 1) and 2) hold, 3) does not.

I wondered whether we could somehow fix the way we report metrics on our side, but I discovered that this is not possible.

*Results of my investigation*

As of current master, the `values` list here stores all metric values, which are synced via a memory-mapped dict:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L40

When the process forks, this list is copied over to the child. When the client then detects the pid change, it "resets" to a brand-new memory-mapped file and iterates over all metric values:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L84

trying to read each of them back from the file:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L73

But for the parent's metrics there is nothing in the brand-new memory-mapped file, so `self._file.read_value(self._key)` initializes them... with 0! (There is a self-contained sketch of this mechanism at the end of this message.)

*Why it is a problem*

It is not blocking us from using the client (yet), but it creates N (number of parent metrics) times M (number of child processes) extra time series that just occupy Prometheus. It will eventually become a problem, since the number of metrics keeps growing and can reach a critical mass.

TL;DR: What do you guys think? I cannot think of a decent solution just yet.

Thank you!
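P.S. To make the mechanism concrete, here is a minimal, self-contained sketch of the failure mode. This is not the library's actual code: `MmapFile`, `Value`, `pid_source` and `check_for_pid_change` are simplified stand-ins I made up for `MmapedDict`, `MmapedValue` and the pid-change check in `values.py`, and a plain dict stands in for the memory-mapped file.

```python
# A toy model of MultiProcessValue, assuming a dict per "process" instead
# of a real mmap'ed file. All names here are made up for illustration.

pid_source = {'value': 0}   # stands in for os.getpid(); flipped to fake a fork
pid = {'value': 0}          # last pid the client saw
files = {}                  # pid -> MmapFile ("one .db file per process")
values = []                 # module-level list of live values, copied on fork


class MmapFile:
    """Stand-in for MmapedDict: read_value seeds missing keys with 0."""
    def __init__(self):
        self.data = {}

    def read_value(self, key):
        return self.data.setdefault(key, 0.0)   # <-- the problematic default

    def write_value(self, key, value):
        self.data[key] = value


def check_for_pid_change():
    actual = pid_source['value']
    if pid['value'] != actual:
        pid['value'] = actual
        files.clear()            # "brand new" file for the child ...
        for v in values:
            v.reset()            # ... and every copied value re-reads its key


class Value:
    def __init__(self, key):
        self._key = key
        check_for_pid_change()
        self.reset()
        values.append(self)

    def reset(self):
        self._file = files.setdefault(pid['value'], MmapFile())
        self._value = self._file.read_value(self._key)

    def set(self, v):
        check_for_pid_change()
        self._value = v
        self._file.write_value(self._key, v)


g1 = Value('g1'); g1.set(1)     # parent (pid 0) reports g1 = 1
pid_source['value'] = 1         # simulate the fork
g2 = Value('g2'); g2.set(2)     # child (pid 1) reports g2 = 2

# The child's file now contains {'g1': 0.0, 'g2': 2}: creating g2 triggered
# the pid-change check, which reset the copied g1 and seeded it with 0.
print(files[1].data)
```

Running it prints `{'g1': 0.0, 'g2': 2}`, which is exactly the phantom `g1{pid="1"} 0` series from the test above.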