Hello,

I would like to ask your opinion about a problem I recently ran into 
with how the Python Prometheus client behaves in our system.

*Description of the problem*

The problem is best described by a test (you can copy-paste it into 
test_multiprocessing.py/TestMultiProcess):

def test_gauge_all_across_forks(self):
    pid = 0
    values.ValueClass = MultiProcessValue(lambda: pid)
    g1 = Gauge('g1', 'help', registry=None)
    g1.set(1)
    pid = 1
    g2 = Gauge('g2', 'help', registry=None)
    g2.set(2)
    # this works fine
    self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
    self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))

    # g1 has never been reported from pid 1, so it should not be present there
    self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))


Or, more verbosely, steps to reproduce:

- multiprocessing environment
- report a metric X from parent process (with pid 0)
- fork
- continue reporting metric X from parent process (with pid 0)
- report a metric Y from child process (with pid 1)
- collect metrics via the normal multiprocess collector

Expectation:
1. metric X is reported with label "pid: 0", non-zero value
2. metric Y is reported with label "pid: 1", non-zero value
3. metric X is NOT reported with label "pid: 1" (it was never reported 
from pid 1, so it should not be present)

Actual:
1) and 2) hold, 3) does not.

I was wondering whether I could somehow fix the way we report metrics on our 
side, but I discovered that this is not possible.

*Results of my investigation*

As of current master, the `values` list here is used to track all metric 
values, which are synced via a memory-mapped dict:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L40

When the process forks, this list is copied into the child. Then, when the 
client detects that the pid has changed, it "resets" to a fresh 
memory-mapped file and iterates over all metric values:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L84
and tries to read them from the file:
https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L73

But the brand-new memory-mapped file contains nothing for the parent's 
metrics. So `self._file.read_value(self._key)` initializes each of them... with 0!
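The mechanism can be sketched in pure Python, with no prometheus_client involved. `FakeMmapedDict`, `Value`, and `file_for` below are hypothetical stand-ins for the real `MmapedDict` / `MultiProcessValue` machinery; only the read-initializes-to-zero behavior mirrors the linked code:

```python
class FakeMmapedDict:
    """Stand-in for one process's memory-mapped file."""
    def __init__(self):
        self._data = {}

    def read_value(self, key):
        # Like the real MmapedDict: a missing key is created with value 0.0.
        self._data.setdefault(key, 0.0)
        return self._data[key]

    def write_value(self, key, value):
        self._data[key] = value


files = {}  # one "mmap file" per pid

def file_for(pid):
    return files.setdefault(pid, FakeMmapedDict())


class Value:
    """Simplified MultiProcessValue: one metric value bound to a pid."""
    def __init__(self, key, pid):
        self._key = key
        self._pid = pid
        self._value = file_for(pid).read_value(key)

    def after_fork(self, new_pid):
        # On pid change the value is re-read from the *new*, empty file,
        # which silently creates the key with value 0 -- the bug above.
        self._pid = new_pid
        self._value = file_for(new_pid).read_value(self._key)


# Parent (pid 0) sets g1=1; the fork copies the Value object into pid 1.
v = Value('g1', pid=0)
file_for(0).write_value('g1', 1.0)
v.after_fork(1)
print(v._value)                    # 0.0
print('g1' in file_for(1)._data)   # True: a spurious g1 series for pid 1
```

The child never touched `g1`, yet pid 1's file now contains a zero-valued `g1` entry that the multiprocess collector will happily export.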

*Why this is a problem*

Although it is not blocking us from using the client (yet), it creates N 
(number of parent metrics) times M (number of child processes) spurious 
time series that just occupy Prometheus. It will eventually become a 
problem, since the number of metrics keeps growing and can reach a 
critical mass.
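To put rough numbers on the N x M growth (the figures here are hypothetical, just for illustration):

```python
# Hypothetical deployment: 200 metrics registered before the fork,
# 16 forked worker processes (e.g. a typical gunicorn setup).
n_parent_metrics = 200
n_child_processes = 16

# Each parent metric gains one spurious zero-valued series per child pid.
spurious_series = n_parent_metrics * n_child_processes
print(spurious_series)  # 3200 extra series doing nothing useful
```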

TL;DR: What do you think? I cannot come up with a decent solution just yet.

Thank you!


