Thanks for the response.

Now that I have looked up what exec means, I see what you mean: if the
fork were followed by an exec(), the entire process would be initialized
from scratch, and with it all the internal state of prometheus_client.
Unfortunately, this means that the prometheus library and
multiprocessing.Process are not entirely compatible.
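
For what it's worth, multiprocessing can be asked to start children in a
fresh interpreter instead of a plain fork, which is effectively the
fork+exec behaviour you describe. A minimal sketch (the metric names are
made up, and whether 'spawn' fits a real app depends on its pickling
constraints):

import multiprocessing as mp
from prometheus_client import Gauge

def child():
    # A spawned child starts from a fresh interpreter, so it inherits no
    # parent metrics; only what the child itself registers exists here.
    Gauge('child_only_metric', 'set only in the child').set(2)

if __name__ == '__main__':
    Gauge('parent_only_metric', 'set only in the parent').set(1)
    # 'spawn' launches a fresh Python process, unlike the default 'fork'
    # start method on Linux.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=child)
    p.start()
    p.join()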

> You can't remove metrics in multiprocessing mode.

Just to be sure: does this also mean that there is no chance our use case
will be supported by the library?
I can imagine a fix (explicitly clearing that `values` list after fork(),
at the request of the user of the client library), but I don't know how
well it would fit in with the rest of the library's code.
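
To make that concrete, here is a rough sketch of what I am imagining.
Everything below is hypothetical: clear_inherited_values() is not an
existing prometheus_client API, and a real patch would have to coordinate
with the pid-change handling in values.py:

import os

def clear_inherited_values():
    # Hypothetical opt-in hook (does not exist today): drop the metric
    # values the child inherited from the parent, so the child's fresh
    # mmapped file is never seeded with them at 0. A real implementation
    # would need to clear the `values` list closed over in
    # prometheus_client/values.py, which user code cannot reach today.
    pass

# The user would opt in right after forking, e.g. (Python 3.7+):
os.register_at_fork(after_in_child=clear_inherited_values)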


On Thu, 20 Feb 2020 at 14:33, Brian Brazil <[email protected]>
wrote:

> On Thu, 20 Feb 2020 at 13:12, Nikolay Vasiliev <[email protected]>
> wrote:
>
>> Let me respond inline.
>>
>> On Thu, 20 Feb 2020 at 12:19, Brian Brazil <
>> [email protected]> wrote:
>>
>>> On Thu, 20 Feb 2020 at 11:05, Nikolay Vasiliev <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to ask your opinion about a problem I figured out recently
>>>> with how python prometheus client functions in our system.
>>>>
>>>> *Description of the problem*
>>>>
>>>> The problem is best described by a test (you can copy-paste it into
>>>> test_multiprocessing.py/TestMultiProcess):
>>>>
>>>> def test_gauge_all_across_forks(self):
>>>>     pid = 0
>>>>     values.ValueClass = MultiProcessValue(lambda: pid)
>>>>     g1 = Gauge('g1', 'help', registry=None)
>>>>     g1.set(1)
>>>>     pid = 1
>>>>     g2 = Gauge('g2', 'help', registry=None)
>>>>     g2.set(2)
>>>>     # this works fine
>>>>     self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
>>>>     self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))
>>>>
>>>>     # this metric has never been reported from pid:1, thus should not be present
>>>>
>>>>
>>> This is incorrect. It was there before the fork, so it's still there
>>> after the fork. It never stopped "reporting" at any point.
>>>
>>>
>> The problem for me is that this metric now appears in the prometheus
>> output twice, with two different sets of labels:
>> g1{pid="0"} 1.0
>> g1{pid="1"} 0.0
>>
>> I would like a way not to have that g1{pid="1"} thing in the prometheus
>> output.
>>
>> Do you think it is possible? The "ghost" metric for pid="1" looks wrong.
>>
>
> As I said, you'd need to fork&exec. You can't remove metrics in
> multiprocessing mode.
>
>
>>
>>
>>>
>>>
>>>>
>>>>     self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))
>>>>
>>>>
>>>> Or more verbose, steps to reproduce:
>>>>
>>>> - multiprocessing environment
>>>> - report a metric X from parent process (with pid 0)
>>>> - fork
>>>> - continue reporting metric X from parent process (with pid 0)
>>>> - report a metric Y from child process (with pid 1)
>>>> - collect metrics via the normal multiprocessing collector
>>>>
>>>> Expectation:
>>>> 1. metric X is reported with label "pid: 0" and a non-zero value
>>>> 2. metric Y is reported with label "pid: 1" and a non-zero value
>>>> 3. metric X is NOT reported with label "pid: 1" (i.e. it was never
>>>> reported from pid 1, so it should not be present)
>>>>
>>>> Actual:
>>>> 1) and 2) hold, 3) does not. A standalone sketch of the repro follows.
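>>>>
>>>> A minimal standalone sketch of those steps (the metric names are
>>>> illustrative; multiprocess mode must be configured via the environment
>>>> variable before prometheus_client is imported):
>>>>
>>>> import multiprocessing, os, tempfile
>>>>
>>>> os.environ['prometheus_multiproc_dir'] = tempfile.mkdtemp()
>>>>
>>>> from prometheus_client import CollectorRegistry, Gauge, generate_latest
>>>> from prometheus_client import multiprocess
>>>>
>>>> def child():
>>>>     Gauge('y', 'reported only from the child').set(2)
>>>>
>>>> if __name__ == '__main__':
>>>>     Gauge('x', 'reported only from the parent').set(1)  # before the fork
>>>>     p = multiprocessing.Process(target=child)  # a plain fork on Linux
>>>>     p.start()
>>>>     p.join()
>>>>     registry = CollectorRegistry()
>>>>     multiprocess.MultiProcessCollector(registry)
>>>>     # besides x{pid="<parent>"} and y{pid="<child>"}, the output also
>>>>     # contains the ghost series x{pid="<child>"} 0.0
>>>>     print(generate_latest(registry).decode())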
>>>>
>>>> I was wondering if I could somehow fix the way we report metrics on
>>>> our side, but I discovered that this is not possible.
>>>>
>>>> *Results of my investigation*
>>>>
>>>> As of current master, the `values` list here stores all the metric
>>>> values that are synced via a memory-mapped dict:
>>>>
>>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L40
>>>>
>>>> And when there is a fork, this list gets copied over to the child
>>>> process. Then, when the library detects a pid change, it "resets" the
>>>> memory-mapped file and goes over all metric values:
>>>>
>>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L84
>>>> and tries to read them from the file:
>>>>
>>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L73
>>>>
>>>> But for the parent's metrics there is nothing in the brand-new
>>>> memory-mapped file, so `self._file.read_value(self._key)` initializes
>>>> them... with 0!
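>>>>
>>>> In other words, the sequence in the child looks roughly like this (a
>>>> simplified paraphrase of values.py, not the literal code):
>>>>
>>>> # inside each value operation, after a fork (simplified):
>>>> if self._pid != pid_func():       # pid change detected
>>>>     self._pid = pid_func()
>>>>     for value in values:          # every value inherited from the parent
>>>>         # reopen a fresh mmapped file for the new pid; the key is
>>>>         # absent there, so read_value() writes and returns 0.0,
>>>>         # creating the ghost series for this pid
>>>>         value._value = value._file.read_value(value._key)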
>>>>
>>>> *Why is it a problem*
>>>>
>>>> Although it is not blocking us from using the client (yet), it creates
>>>> N times M extra series (N = number of parent metrics, M = number of
>>>> child processes) that just occupy Prometheus. It will eventually become
>>>> a problem, since the number of metrics is always growing and can reach
>>>> a critical mass.
>>>>
>>>
>>> Have you considered fork+exec?
>>>
>>
>> We are using multiprocessing.Process. I am honestly not sure exactly how
>> it does its job; I only know that it does a fork.
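>>
>> (For reference, the start method in use can be checked at runtime; on
>> Linux the default is 'fork':)
>>
>> import multiprocessing
>> print(multiprocessing.get_start_method())  # 'fork' by default on Linux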
>>
>
> That'd just be a fork.
>
> Brian
>
>
>>
>>
>>> Brian
>>>
>>>
>>>
>>>> TL;DR What do you guys think? I cannot think of a decent solution just
>>>> yet.
>>>>
>>>> Thank you!
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Brian Brazil
>>> www.robustperception.io
>>>
>>
>>
>> --
>> Best regards,
>> Nikolay Vasiliev.
>>
>
>
> --
> Brian Brazil
> www.robustperception.io
>


-- 
Best regards,
Nikolay Vasiliev.
