On Thu, 20 Feb 2020 at 13:12, Nikolay Vasiliev <[email protected]> wrote:
> Let me respond inline.
>
> On Thu, 20 Feb 2020 at 12:19, Brian Brazil <[email protected]> wrote:
>
>> On Thu, 20 Feb 2020 at 11:05, Nikolay Vasiliev <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I would like to ask your opinion about a problem I recently ran into
>>> with how the Python Prometheus client behaves in our system.
>>>
>>> *Description of the problem*
>>>
>>> The problem is best described by a test (you can copy-paste it into
>>> test_multiprocessing.py/TestMultiProcess):
>>>
>>> def test_gauge_all_across_forks(self):
>>>     pid = 0
>>>     values.ValueClass = MultiProcessValue(lambda: pid)
>>>     g1 = Gauge('g1', 'help', registry=None)
>>>     g1.set(1)
>>>     pid = 1
>>>     g2 = Gauge('g2', 'help', registry=None)
>>>     g2.set(2)
>>>     # this works fine
>>>     self.assertEqual(1, self.registry.get_sample_value('g1', {'pid': '0'}))
>>>     self.assertEqual(2, self.registry.get_sample_value('g2', {'pid': '1'}))
>>>
>>>     # this metric has never been reported from pid 1, thus should not be present
>>
>> This is incorrect. It was there before the fork, so it's still there
>> after the fork. It never stopped "reporting" at any point.
>>
> The problem for me is that this metric now appears in the Prometheus
> output twice, with two different sets of labels:
>
> g1{pid="0"} 1.0
> g1{pid="1"} 0.0
>
> I would like a way not to have that g1{pid="1"} series in the Prometheus
> output. Do you think that is possible? The "ghost" metric for pid="1"
> looks wrong.

As I said, you'd need to fork&exec. You can't remove metrics in
multiprocessing mode.

>>>     self.assertIsNone(self.registry.get_sample_value('g1', {'pid': '1'}))
>>>
>>> Or, more verbosely, steps to reproduce:
>>>
>>> - multiprocessing environment
>>> - report a metric X from the parent process (with pid 0)
>>> - fork
>>> - continue reporting metric X from the parent process (with pid 0)
>>> - report a metric Y from the child process (with pid 1)
>>> - collect metrics via the normal multiprocessing collector
>>>
>>> Expectation:
>>> 1. metric X is reported with label "pid: 0", non-zero value
>>> 2. metric Y is reported with label "pid: 1", non-zero value
>>> 3. metric X is NOT reported with label "pid: 1" (i.e. it was never
>>>    reported from pid 1, so it should not be present)
>>>
>>> Actual:
>>> 1) and 2) hold, 3) does not.
>>>
>>> I was wondering if I could somehow fix the way we report metrics on
>>> our side, but I discovered that this is not possible.
>>>
>>> *Results of my investigation*
>>>
>>> As of current master, the `values` list here is used to store all
>>> metric values that are synced via a memory-mapped dict:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L40
>>>
>>> When there is a fork, this list gets copied over to the child process.
>>> When the client then detects that the pid has changed, it "resets" the
>>> memory-mapped file and goes over all metric values:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L84
>>> trying to read each of them back from the file:
>>> https://github.com/prometheus/client_python/blob/ce7063fc2957716aa6fa9dc4f49bd970ad1249ed/prometheus_client/values.py#L73
>>>
>>> But for the parent's metrics there isn't anything in the brand-new
>>> memory-mapped file, so `self._file.read_value(self._key)` initializes
>>> them... with 0!
>>> *Why is it a problem*
>>>
>>> Although it is not blocking us from using the client (yet), it creates
>>> N (number of parent metrics) times M (number of child processes) extra
>>> time series that just occupy Prometheus. It will eventually become a
>>> problem, since the number of metrics is always growing and can reach a
>>> critical mass.
>>
>> Have you considered fork+exec?
>>
> We are using multiprocessing.Process. I am honestly not sure exactly how
> it does the job; I only know that it does a fork.

That'd just be a fork.

Brian

>> Brian
>>
>>> TL;DR What do you guys think? I cannot think of a decent solution just
>>> yet.
>>>
>>> Thank you!
>>
>> --
>> Brian Brazil
>> www.robustperception.io
>
> --
> Best regards,
> Nikolay Vasiliev.

--
Brian Brazil
www.robustperception.io
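
A minimal standalone sketch of the ghost-series behaviour Nikolay
describes, using prometheus_client's documented multiprocess mode rather
than the test-suite internals. The metric names and temp directory are
illustrative, and the env-var spelling has differed across releases, so
both forms are set here; this is a sketch of the reported behaviour, not
part of the library's test suite:

import os
import tempfile

# The multiprocess dir must be set before prometheus_client is imported;
# older releases read the lowercase name, newer ones the uppercase one.
mpdir = tempfile.mkdtemp()
os.environ['prometheus_multiproc_dir'] = mpdir
os.environ['PROMETHEUS_MULTIPROC_DIR'] = mpdir

from prometheus_client import CollectorRegistry, Gauge, generate_latest, multiprocess

g1 = Gauge('g1', 'set only in the parent', multiprocess_mode='all')
g1.set(1)  # written under the parent's pid

pid = os.fork()
if pid == 0:
    # Child: only g2 is touched here, but the inherited live g1 value
    # object is re-registered in the child's mmap file with value 0 when
    # the library notices the pid change.
    g2 = Gauge('g2', 'set only in the child', multiprocess_mode='all')
    g2.set(2)
    os._exit(0)
os.waitpid(pid, 0)

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
# Per the thread, the output is expected to include the ghost series:
#   g1{pid="<parent pid>"} 1.0
#   g1{pid="<child pid>"} 0.0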
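
And a hedged sketch of the fork+exec route Brian points at, using the
standard library's "spawn" start method instead of a plain fork: the
child is a freshly launched interpreter, so it does not inherit the
parent's live value objects, and only the metrics it actually touches
appear under its pid. The directory path is illustrative:

import multiprocessing
import os

def child_work():
    # Runs in a freshly started interpreter: prometheus_client is
    # imported from scratch, so no parent metrics are carried over.
    from prometheus_client import Gauge
    g2 = Gauge('g2', 'set only in the child', multiprocess_mode='all')
    g2.set(2)

if __name__ == '__main__':
    mpdir = '/tmp/prom_mp_demo'  # illustrative path
    os.makedirs(mpdir, exist_ok=True)
    os.environ['prometheus_multiproc_dir'] = mpdir
    os.environ['PROMETHEUS_MULTIPROC_DIR'] = mpdir  # inherited by the child

    # 'spawn' starts the child via a fresh exec of the interpreter,
    # unlike the default 'fork' start method on Unix.
    ctx = multiprocessing.get_context('spawn')
    p = ctx.Process(target=child_work)
    p.start()
    p.join()

Note that "spawn" requires the target to be an importable module-level
function and re-executes module-level code in the child, which is why the
__main__ guard above is needed.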

