On Thu, 20 Feb 2020 at 13:12, Nikolay Vasiliev <[email protected]> wrote:
> Let me respond inline.
>
> On Thu, 20 Feb 2020 at 12:19, Brian Brazil <[email protected]> wrote:
>
>> On Thu, 20 Feb 2020 at 11:05, Nikolay Vasiliev <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I would like to ask your opinion about a problem I recently ran into
>>> with how the Python Prometheus client behaves in our system.
>>>
>>> *Description of the problem*
>>>
>>> The problem is best described by a test (you can copy-paste it into
>>> test_multiprocessing.py/TestMultiProcess):
>>>
>>> def test_gauge_all_across_forks(self):
>>>     pid = 0
>>>     values.ValueClass = MultiProcessValue(lambda: pid)
>>>     g1 = Gauge('g1', 'help', registry=None)
>>>     g1.set(1)
>>>     pid = 1
>>>     g2 = Gauge('g2', 'help', registry=None)
>>>     g2.set(2)
>>>     # this works fine
>>>     self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
>>>     self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))
>>>
>>>     # this metric has never been reported from pid 1, thus should not be present
>>
>> This is incorrect. It was there before the fork, so it's still there
>> after the fork. It never stopped "reporting" at any point.
>>
> The problem for me is that this metric now appears in the Prometheus
> output twice, with two different sets of labels:
>
> g1{pid="0"} 1.0
> g1{pid="1"} 0.0
>
> I would like a way not to have that g1{pid="1"} series in the Prometheus
> output. Do you think that is possible? The "ghost" metric for pid="1"
> looks wrong.

As I said, you'd need to fork&exec. You can't remove metrics in
multiprocessing mode.

>>>     self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))
>>>
>>> Or, more verbosely, steps to reproduce:
>>>
>>> - multiprocessing environment
>>> - report a metric X from the parent process (with pid 0)
>>> - fork
>>> - continue reporting metric X from the parent process (with pid 0)
>>> - report a metric Y from the child process (with pid 1)
>>> - collect metrics via the normal multiprocessing collector
>>>
>>> Expectation:
>>> 1. metric X is reported with label "pid: 0", non-zero value
>>> 2. metric Y is reported with label "pid: 1", non-zero value
>>> 3. metric X is NOT reported with label "pid: 1" (i.e. it was never
>>>    reported from pid 1, so it should not be present)
>>>
>>> Actual:
>>> 1) and 2) hold, 3) does not.
>>>
>>> I was wondering if I could somehow fix the way we report metrics on
>>> our side, but I discovered that this is not possible.
>>>
>>> *Results of my investigation*
>>>
>>> As of current master, the `values` list here is used to store all
>>> metric values that are synced via a memory-mapped dict:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L40
>>>
>>> When there is a fork, this list gets copied over to the child process.
>>> When the client then detects that the pid has changed, it "resets" the
>>> memory-mapped file and goes over all metric values:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L84
>>> trying to read each of them back from the file:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L73
>>>
>>> But for the parent's metrics there isn't anything in the brand-new
>>> memory-mapped file, so `self._file.read_value(self._key)` initializes
>>> them... with 0!
>>> *Why is it a problem*
>>>
>>> Although it is not blocking us from using the client (yet), it creates
>>> N (number of parent metrics) times M (number of child processes) extra
>>> time series that just occupy Prometheus. It will eventually become a
>>> problem, since the number of metrics is always growing and can reach a
>>> critical mass.
>>
>> Have you considered fork+exec?
>>
> We are using multiprocessing.Process. I am honestly not sure exactly how
> it does the job; I only know that it does a fork.

That'd just be a fork.

Brian

>> Brian
>>
>>> TL;DR What do you guys think? I cannot think of a decent solution just
>>> yet.
>>>
>>> Thank you!
>>
>> --
>> Brian Brazil
>> www.robustperception.io
>
> --
> Best regards,
> Nikolay Vasiliev.

--
Brian Brazil
www.robustperception.io
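
A minimal standalone sketch of the ghost-series behaviour Nikolay
describes, using prometheus_client's documented multiprocess mode rather
than the test-suite internals. The metric names and temp directory are
illustrative, and the env-var spelling has differed across releases, so
both forms are set here; this is a sketch of the reported behaviour, not
part of the library's test suite:

import os
import tempfile

# The multiprocess dir must be set before prometheus_client is imported;
# older releases read the lowercase name, newer ones the uppercase one.
mpdir = tempfile.mkdtemp()
os.environ['prometheus_multiproc_dir'] = mpdir
os.environ['PROMETHEUS_MULTIPROC_DIR'] = mpdir

from prometheus_client import CollectorRegistry, Gauge, generate_latest, multiprocess

g1 = Gauge('g1', 'set only in the parent', multiprocess_mode='all')
g1.set(1)  # written under the parent's pid

pid = os.fork()
if pid == 0:
    # Child: only g2 is touched here, but the inherited live g1 value
    # object is re-registered in the child's mmap file with value 0 when
    # the library notices the pid change.
    g2 = Gauge('g2', 'set only in the child', multiprocess_mode='all')
    g2.set(2)
    os._exit(0)
os.waitpid(pid, 0)

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
# Per the thread, the output is expected to include the ghost series:
#   g1{pid="<parent pid>"} 1.0
#   g1{pid="<child pid>"} 0.0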
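
And a hedged sketch of the fork+exec route Brian points at, using the
standard library's "spawn" start method instead of a plain fork: the
child is a freshly launched interpreter, so it does not inherit the
parent's live value objects, and only the metrics it actually touches
appear under its pid. The directory path is illustrative:

import multiprocessing
import os

def child_work():
    # Runs in a freshly started interpreter: prometheus_client is
    # imported from scratch, so no parent metrics are carried over.
    from prometheus_client import Gauge
    g2 = Gauge('g2', 'set only in the child', multiprocess_mode='all')
    g2.set(2)

if __name__ == '__main__':
    mpdir = '/tmp/prom_mp_demo'  # illustrative path
    os.makedirs(mpdir, exist_ok=True)
    os.environ['prometheus_multiproc_dir'] = mpdir
    os.environ['PROMETHEUS_MULTIPROC_DIR'] = mpdir  # inherited by the child

    # 'spawn' starts the child via a fresh exec of the interpreter,
    # unlike the default 'fork' start method on Unix.
    ctx = multiprocessing.get_context('spawn')
    p = ctx.Process(target=child_work)
    p.start()
    p.join()

Note that "spawn" requires the target to be an importable module-level
function and re-executes module-level code in the child, which is why the
__main__ guard above is needed.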

