Re: [prometheus-developers] python client support for z-pages

2023-09-06 Thread Chris Marchbanks
Hello Brian,

First of all, thank you for the proposal. My initial thought is that, since
this functionality would not be used by Prometheus or a Prometheus-related
component, it is beyond the scope of a Prometheus client library. I do
like the idea of being able to use metrics as a signal for the result of a
/healthz endpoint, though, so if it is challenging to get the current value
of a metric, that is something I would consider improving.
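
For what it is worth, the client already has one way to read a value back,
mainly intended for unit tests (a minimal sketch; the metric name below is
illustrative and this is not an officially supported health-check API):

    from prometheus_client import Counter, REGISTRY

    loop_iterations = Counter('myapp_loop_iterations', 'Iterations of the main loop')
    loop_iterations.inc()

    # get_sample_value returns the current value of a sample, or None if no
    # such sample exists. Counters are exposed with a _total suffix.
    current = REGISTRY.get_sample_value('myapp_loop_iterations_total')
    print(current)  # 1.0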

Thanks again for the proposal and I am curious what others think as well,
Chris

On Mon, Sep 4, 2023 at 9:14 AM 'Brian Horakh' via Prometheus Developers <
prometheus-developers@googlegroups.com> wrote:

> I've opened a feature/issue that I plan to implement for my org and submit
> upstream to the Prometheus client for python.
>
> The link is here:
> https://github.com/prometheus/client_python/issues/953
>
> The proposal is to equip the Prometheus client with the ability to
> concurrently respond with an HTTP 200 /healthz page that can be *easily*
> integrated into control planes (e.g. AWS ECS & K8s both use z-pages).
>
> Control planes use z-pages (pioneered by Google, but widely adopted by
> most load balancers) to determine whether an application is alive/functioning
> properly based on the HTTP response code. If an application fails to
> return an HTTP 200 after a configured number of intervals, the container is
> terminated and a new container is spun up. The Python client for
> Prometheus has its own webserver internally, so I'm proposing to implement
> z-pages capability in the client.
>
> To be clear: I'm NOT proposing adding functionality to Prometheus. As far
> as Prometheus core is concerned, these are ordinary "Counters", nothing special.
>
> GitHub user roidelapluie requested that I submit my proposal to the broader
> Prometheus community for discussion & feedback, which is welcome!
>
> My proposed design:
> The current Prometheus Counter implementation is the superclass, and the
> proposed "HealthzCounter" will fully inherit its capabilities & behaviors.
>
> Application developers implementing east/west telemetry in their
> applications can then place HealthzCounters at one or more critical points
> in the codepath to track whether an application is working (i.e. is the
> main loop running or blocked).
> For applications using Python asyncio, it would be appropriate to
> implement one HealthzCounter per critical event loop.
>
> The behavior: if, for example, an MQ or DB client disconnects and
> doesn't/can't reconnect, the application is either running an error path or
> a success path, and either can be attributed to any number of underlying
> reasons. Either way, the application can be easily terminated in a standard
> way by the container orchestrator.
>
> The intention is to place these counters in the critical codepath. The
> HealthzCounter requires a heartbeat; after an interval without one it trips
> a deadman switch. This would then cause the /healthz URL to return a
> non-HTTP-200 response, informing the cluster orchestration software to
> perform a Roy-from-The-IT-Crowd solution ("hello IT dept., have you tried
> turning it off and on again!?").
>
> The business case:
> My organization is in the process of implementing all our applications
> with east/west metrics and I have gotten tentative approval to develop this
> feature and upstream the work.
>
> While it would be better to fix the bugs in the app, the reset itself is
> often the first "routine" step in troubleshooting. The control plane will
> keep a log of the resets, and because an application's counter is also
> reset to zero when it is restarted, tracking the entire maneuver is quite
> easy and obvious in a tool like Grafana using PromQL.
>
> I will be faster to respond on the GitHub issue, if you don't mind
> responding there with feedback or ideas, but I will try to keep an eye on
> this group as well for the next few days.
>
> Also, since this is my first time posting to the Prometheus devs list, I
> need to say: thank you for what you do & what you have done!!
>
> Cheers,
>
> -Brian Horakh
> Software Engineer
> Habitat.Energy Australia
>
>
>


Re: [prometheus-developers] [VOTE] Promote Windows Exporter as an official exporter

2022-12-05 Thread Chris Marchbanks
YES

On Mon, Dec 5, 2022 at 3:44 AM Julien Pivotto 
wrote:

> Dear Prometheans,
>
> As per our governance [1], "any matter that needs a decision [...] may
> be called to a vote by any member if they deem it necessary."
>
> I am therefore calling a vote to promote Prometheus-community's Windows
> Exporter [2] to Prometheus GitHub org, to make it an official exporter.
>
> Official exporters are exporters under the Prometheus github org, listed
> as official on Prometheus.io and available under the Downloads page.
>
> This would provide recognition and credibility to the exporter and its
> contributors, who have put in a large amount of work over the last few
> years and built a huge community.
>
> It would make it easier for users to find and use the exporter, as it
> would be listed on the Prometheus website and promoted on the other
> official channels - such as our announce mailing list.
>
> Anyone interested is encouraged to participate in this vote and this
> discussion. As per our governance, only votes from the team members will
> be counted.
>
> Vote is open for 1 week - until December 12.
>
> [1] https://prometheus.io/governance/
> [2] https://github.com/prometheus-community/windows_exporter
>
> --
> Julien Pivotto
> @roidelapluie
>



Re: [prometheus-developers] Welcoming Bryan Boreham to the Prometheus team

2022-10-07 Thread Chris Marchbanks
Welcome to the team Bryan!

On Fri, Oct 7, 2022 at 2:44 AM Julien Pivotto 
wrote:

> Dear Prometheans,
>
> The Prometheus team is growing: I am happy to announce that Bryan
> Boreham is joining our team!
>
> Bryan has done a lot of work in the past year to improve the
> performance of the Prometheus server. You might have seen his work at
> play by looking at the memory usage of your Prometheus setup across
> different upgrades, and he is working on further improvements.
>
> Welcome Bryan and thanks for your continuous work!
>
> --
> Julien Pivotto
> @roidelapluie
>



Re: [prometheus-developers] [VOTE] Rename blackbox_exporter to prober

2022-01-25 Thread Chris Marchbanks
NO

I generally agree with Goutham and would be okay with probe_exporter.

On Tue, Jan 25, 2022 at 7:53 AM Goutham Veeramachaneni
 wrote:
>
> NO
>
> I don't think prober is a good name and I think the change will cause a lot 
> of confusion longer-term. I might be okay with probe_exporter but prober is a 
> NO from me. I am sorry that I missed suggesting this and engaging with the 
> previous thread :/
>
> Thanks
> Goutham
>
> On Thu, Jan 20, 2022 at 3:41 PM Julien Pivotto  
> wrote:
>>
>> Dear Prometheans,
>>
>> As per our governance, I'd like to call a vote to rename the Blackbox
>> Exporter to Prober.
>> This vote is based on the following thread:
>> https://groups.google.com/g/prometheus-developers/c/advMjgmJ1E4/m/A0abCsUrBgAJ
>>
>> Any Prometheus team member is eligible to vote, and votes for the
>> community are welcome too, but do not formally count in the result.
>>
>> Here is the content of the vote:
>>
>> > We want to rename Blackbox Exporter to Prober.
>>
>> I explicitly leave the "how" out of this vote. If this vote passes,
>> a specific issue will be created in the blackbox exporter repository
>> explaining how I plan to work and communicate on this change. I will
>> make sure that enough time passes so that as many people as possible can
>> give their input on the "how".
>>
>> The vote is open until February 3rd. If the vote passes before
>> next week's dev summit, the "how" can also be discussed during the dev
>> summit, and I would use that discussion as input for the previously
>> mentioned github issue.
>>
>> --
>> Julien Pivotto
>> @roidelapluie
>>
>



[prometheus-developers] Proposal to End Python 2 Support

2021-11-01 Thread Chris Marchbanks
Hello Prometheus Developers,

I just opened https://github.com/prometheus/client_python/issues/717 with a
proposal to end support for Python 2.7, and thus Python 2. I would
appreciate any feedback you have on the proposal.

Download numbers are now low enough, and enough other projects have
discontinued Python 2 support that I believe it is time for us to follow
suit.

Thanks,

Chris



Re: [prometheus-developers] [python prometheus_client] is Gauge.set guaranteed to always cast as float given value?

2021-04-28 Thread Chris Marchbanks
Hi Jonathan,

As it stands I would say that is an implementation detail that could
change; for example, we might treat integers differently from floats in the
future. That is just the status quo, and if the community would like to see
this as the official behavior, we could add a test case for setting values
to booleans. OpenMetrics even specifies that booleans must be represented
as 0/1, so I think there is a reasonable case to make it official.
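
For illustration, this is the behavior today (an implementation detail as
described above, not a documented guarantee; the metric name is just an
example):

    from prometheus_client import CollectorRegistry, Gauge

    registry = CollectorRegistry()
    g = Gauge('my_feature_enabled', 'Whether the feature is enabled', registry=registry)

    g.set(True)   # set() currently casts with float(), so True is stored as 1.0
    print(registry.get_sample_value('my_feature_enabled'))  # 1.0

    g.set(False)
    print(registry.get_sample_value('my_feature_enabled'))  # 0.0

    # Defensive variant if you prefer not to rely on the implicit cast:
    g.set(float(True))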

Chris

On Wed, Apr 28, 2021 at 4:16 AM Jonathan Martin 
wrote:

> Hi there,
>
> I have a question relative to Gauge and booleans, if you don't mind.
>
> I'm considering using a gauge to reflect the evolution of a boolean value.
>
> Knowing that the set method of a Gauge casts the given value to a float, I
> could directly call my_gauge.set(my_bool_value) and have it set to 0.0 or
> 1.0.
>
> So the question here is: is the cast to float part of the API design, or
> simply an implementation detail which may change?
>
> If we have no guarantee that this cast will remain, then the call to set
> needs to be protected with my_gauge.set(float(my_bool_value)), but it is a
> pity to cast twice...
>
> It's a nice built-in capability, so if it is going to last, I would argue
> that it is worth mentioning explicitly somewhere, either in the docs of the
> *set* function or the documentation of the Gauge (
> https://github.com/prometheus/client_python#gauge).
>
> Thanks in advance!
>



Re: [prometheus-developers] Python Multiprocess

2021-04-07 Thread Chris Marchbanks
Hello,

It appears that there is a subtle bug/misunderstanding in the code that is
linked, though that is possibly due to the multiprocess documentation not
being clear enough. When the code specifies a registry for each metric (as
in the linked example), it causes both a process-local metric and the
multiprocess metrics to be registered, so depending on which process handles
the request you will get different responses for the metric with "Request
latency" in the HELP text. This is what the multiprocess documentation means
by "Registries can not be used as normal, all instantiated metrics are
exported". If you remove the registry=registry lines in the example you will
see just the multiprocess output, as expected.

You could also move the registry and MultiProcessCollector setup into the
request handler to make it clear that the registry used by the
MultiProcessCollector should not have anything else registered to it, as
seen in the example in the multiprocess documentation.
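
Roughly like this (a minimal Flask-style sketch of that documented pattern;
the app wiring is illustrative, and the multiprocess directory environment
variable still needs to be set for the worker processes):

    from flask import Flask, Response
    from prometheus_client import CollectorRegistry, generate_latest, CONTENT_TYPE_LATEST
    from prometheus_client import multiprocess

    app = Flask(__name__)

    @app.route('/metrics')
    def metrics():
        # Build a fresh, empty registry per request and attach only the
        # multiprocess collector, so no process-local metrics are exported.
        registry = CollectorRegistry()
        multiprocess.MultiProcessCollector(registry)
        return Response(generate_latest(registry), mimetype=CONTENT_TYPE_LATEST)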

Let me know if that was unclear or you have more questions,
Chris

On Wed, Apr 7, 2021 at 8:12 AM Esau Rodriguez  wrote:

> Hi all,
> I'm seeing a behaviour with the Python client and multiprocess, using
> gunicorn and flask, and I'm not sure if I'm missing something or whether
> there's a bug there.
>
> When I hit the endpoint producing the Prometheus text to be scraped, I'm
> seeing 2 versions of the same metrics with different help texts. I would
> expect to see only one metric (the multiprocess one).
>
> I thought I had something wrong in my setup, so I tried it with a pretty
> simple project that I found here:
> https://github.com/amitsaha/python-prometheus-demo/tree/master/flask_app_prometheus_multiprocessing
> (not my code).
>
> I hit a random url and then the `/metrics` endpoint
>
> You can see in the raw response down here that we have 2 entries for each
> metric, with different `types` and `help` texts. In this example there
> really weren't many processes, but in the real setup in prod we have several
> processes and we see the Prometheus scraper `picks` a different value
> depending on the order of the response.
>
> Am I missing something or is there a bug there?
>
> The raw response was:
>
> 
> % curl --location --request GET 'http://localhost:5000/metrics'
> # HELP request_latency_seconds Multiprocess metric
> # TYPE request_latency_seconds histogram
> request_latency_seconds_sum{app_name="webapp",endpoint="/metrics"}
> 0.00040912628173828125
> request_latency_seconds_sum{app_name="webapp",endpoint="/"}
> 0.0001652240753173828
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.005"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.01"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.025"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.05"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.075"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.1"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.25"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.5"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="0.75"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="1.0"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="2.5"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="5.0"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="7.5"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="10.0"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/metrics",le="+Inf"}
> 1.0
> request_latency_seconds_count{app_name="webapp",endpoint="/metrics"} 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.005"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.01"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.025"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.05"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.075"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.1"} 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.25"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.5"} 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="0.75"}
> 1.0
> request_latency_seconds_bucket{app_name="webapp",endpoint="/",le="1.0"} 1.0
> 

Re: [prometheus-developers] Remote-write drop samples | design doc

2021-03-01 Thread Chris Marchbanks
Harkishen, thank you very much for the design document!

My initial thoughts are to agree with Stuart (as well as some users in the
linked GitHub issue) that it makes the most sense to start with dropping
data that is older than some configured age, with the default being to never
drop data. For most outage scenarios I think this is the easiest to
understand, and if there is an outage, retrying old data x times still does
not help you much.

There are a couple use cases that an age based solution doesn't solve
ideally:
1. An issue where bad data is causing the upstream system to break, e.g. I
have seen a system return a 5xx due to a null byte in a label value causing
some sort of panic. This blocks Prometheus from being able to process any
samples newer than that bad sample. Yes this is an issue with the remote
storage, but it sucks when it happens and it would be nice to have an easy
workaround while a fix goes into the remote system. In this scenario, only
dropping old data still means you wouldn't be sending anything new for
quite a while, and if the bad data is persistent you would likely just end
up 10 minutes to an hour behind permanently (whatever you set the age to be).
2. Retrying 429 errors, a new feature currently behind a flag, but it could
make sense to only retry 429s a couple of times (if you want to retry them
at all) but then drop the data so that non-rate limited requests can
proceed in the future.

I think, to start with, the above limitations are fine and the age-based
system is probably the way to go. I also wonder if it is worth defining a
more generic "retry_policies" section of the remote write config that could
contain different options for 5xx vs 429.

On Mon, Mar 1, 2021 at 3:32 AM Ben Kochie  wrote:

> If a remote write receiver is unable to ingest, wouldn't this be something
> to fix on the receiver side? The receiver could have a policy where it
> drops data rather than returning an error.
>
> This way Prometheus sends, but doesn't have to need to know or deal with
> ingestion policies. It sends a bit more data over the wire, but that part
> is cheap compared to the ingestion costs.
>

I certainly see the argument that this could all be cast as a receiver-side
issue, but I have also personally experienced outages that were much harder
to recover from due to a thundering herd scenario once the service was
restored. E.g. cortex distributors (where an ingestion policy would be
implemented) effectively locking up or OOMing at a high enough request
rate. Also, an administrator may not be able to update whatever remote
storage solution they use. This becomes even more painful in a resource
constrained environment. The solution right now is to go restart all of
your Prometheus instances to indiscriminately drop data, I would prefer to
be intentional about what data is dropped.

I would certainly be happy to jump on a call sometime with interested
parties if that would be more efficient :)

Chris



Re: [prometheus-developers] Prometheus

2021-01-28 Thread Chris Marchbanks
Thank you Brian for everything you have contributed over the years! Your
dedication to the project was impressive, and I appreciate everything I
have learned from my interactions with you.

I wish you success on your current and future projects,
Chris

On Thu, Jan 28, 2021 at 3:22 AM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> Hi all,
>
> I first got involved in Prometheus back in 2014 when I was looking for a
> monitoring system, and nothing that was out there seemed to quite cut it.
> Since then I've worked across and helped expand the ecosystem. I've
> reviewed
> and merged 2542 PRs, in addition to creating 670 PRs containing 1867
> commits
> myself. I've created exporters and client libraries, wrote extensive docs,
> and made
> a multitude of improvements to Prometheus from performance to features.
>
> All of this takes a non-trivial amount of my time, and there's more to life
> than maintaining open source projects. Accordingly I have decided to step
> back
> and resign from prometheus-team, in order to focus my efforts more on
> other things
> including Robust Perception.
> I will of course still be part of the ecosystem, helping out fixing bugs,
> answering
> questions, and so on. While I look forward to reducing my workload, I know
> Prometheus
> will remain in good hands.
>
> Yours,
> --
> Brian Brazil
> www.robustperception.io
>



Re: [prometheus-developers] Introduce the concept of scrape Priority for Targets

2020-07-30 Thread Chris Marchbanks
I do think we need a better way to detect when we are overloaded,
especially with respect to memory usage, and have defined behavior for how
we handle backpressure in those cases. The current experience of entering
OOM loops is frustrating, and makes it hard to debug as you can't query
anything to see what caused the extra load. HA is also not helpful in this
case as both instances will have similar data and OOM at the same time.

Perhaps after general overloading/backpressure is defined, higher level
ideas such as priority can be introduced, but I also agree that it might be
best to just run multiple instances.

Chris

On Thu, Jul 30, 2020 at 3:19 AM Julien Pivotto 
wrote:

> The problem is not that much priorities etc, it is all the questions and
> confusions around this:
>
> - When do we decide we are overloaded?
> - What do we do for the low priority targets?
>
> and more importantly:
>
> - When do we decide that we can scrape the low targets again?
>
> How to avoid:
>
> High load -> stop low scrapes
> -> Normal load (because we do not scrape low priorities) -> restart low
> scrapes
> -> High load -> stop low scrapes
> -> Normal load (because we do not scrape low priorities) -> restart low
> scrapes
> -> High load -> stop low scrapes
> -> Normal load (because we do not scrape low priorities) -> restart low
> scrapes
>
>
> Overall that does not seem easy questions.
>
> On 30 Jul 10:10, Bartłomiej Płotka wrote:
> > Yes, looks like having many scrapers would solve this, and having Thanos on
> > top for query aggregation can do. However, given the overhead of even
> > operating the TSDB instances like Prometheus (e.g. maintaining persistence
> > volumes), I would still see some longer-term solution of better multitenant
> > support (isolation of tenant scrapes) within the scrape engine. Some
> > alternative is dynamic relabelling configured from outside, as seen here:
> > https://blog.freshtracks.io/bomb-squad-automatic-detection-and-suppression-of-prometheus-cardinality-explosions-62ca8e02fa32
> > I think with good monitoring of Prometheus health we could implement a
> > "sidecar" applying such priorities dynamically as well. That would be good
> > for a start maybe (:
> >
> > In the meantime, the separate scraper looks like the way to go.
> >
> > Kind Regards,
> > Bartek
> >
> > On Thu, 30 Jul 2020 at 10:01, Lili Cosic  wrote:
> >
> > > Thanks, everyone for the replies! The official msg seems to be to use a
> > > Prometheus instance per tenant/priority if you want to have multiple
> > > tenants in your environment.
> > >
> > > Kind regards,
> > > Lili
> > >
> > > On Thursday, 30 July 2020 10:44:59 UTC+2, Ben Kochie wrote:
> > >>
> > >> I'm with Brian and Julian on this.
> > >>
> > >> Multi-tenancy is not really something we want to solve in Prometheus.
> > >> This is a concern for higher level systems like Kubernetes. Prometheus
> > >> is designed to be distributed. If you have targets with different
> > >> needs, they need to have separate Prometheus instances.
> > >>
> > >> This is also why we have things like Thanos and Cortex as aggregation
> > >> layers.
> > >>
> > >> Similar to why we have said we don't plan to implement IO limits, this
> > >> is a scheduling concern, out of scope for Prometheus.
> > >>
> > >> On Thu, Jul 30, 2020, 10:31 Frederic Branczyk 
> wrote:
> > >>
> > >>> That's only effective in limiting the number of targets; the point here
> > >>> is to selectively scrape those with a higher priority based on
> > >>> backpressure of the system as a whole.
> > >>>
> > >>> On Wed, 22 Jul 2020 at 17:00, Julien Pivotto <
> roidel...@prometheus.io>
> > >>> wrote:
> > >>>
> >  On 22 Jul 16:47, Frederic Branczyk wrote:
> >  > In practice even that can still be problematic. You only know that
> >  > Prometheus has a problem when everything fails, the point is to keep
> >  > things alive well enough for more critical components.
> >  >
> >  > On Wed, 22 Jul 2020 at 16:38, Julien Pivotto <roidel...@prometheus.io>
> >  > wrote:
> >  >
> >  > > On 22 Jul 16:36, Frederic Branczyk wrote:
> >  > > > It's unclear how that helps, can you help me understand?
> >  > >
> >  > > - job: highprio
> >  > >   relabel_configs:
> >  > >   - target_label: job
> >  > >     replacement: pods
> >  > >   - source_labels: [__meta_pod_priority]
> >  > >     regex: high
> >  > >     action: keep
> >
> >  highprio job will always be scraped.
> >
> >  > > - job: lowprio
> >  > >   relabel_configs:
> >  > >   - target_label: job
> >  > >     replacement: pods
> >  > >   - source_labels: [__meta_pod_priority]
> >  > >     regex: high
> >  > >     action: drop
> >  > >   target_limit: 1000
> >  > >
> >  > > >
> >  > > > On Wed, 22 Jul 2020 at 16:34, Julien Pivotto <
> >  roidel...@prometheus.io
> >  > > >
> >  > > > wrote:
> >  > > >
> >  

Re: [prometheus-developers] [VOTE] Allow listing non-SNMP exporters for devices that can already be monitored via the SNMP Exporter

2020-05-28 Thread Chris Marchbanks

YES

On Thu, May 28, 2020 at 9:30 pm, Julius Volz  
wrote:

Dear Prometheans,

A GitHub issue proposes to list an exporter for Fortigate devices that 
uses a REST API. Brian's stance so far has been to only add exporters 
for metrics that cannot already be produced by another exporter on the 
public integrations list, with the intention of focusing community 
efforts rather than spreading them over multiple competing exporters 
for the same thing.


Since the Fortigate device can also be monitored via SNMP, Brian's 
argument is that the SNMP Exporter already covers this use case and a 
more specialized REST-API-based exporter for Fortigate should not be 
listed. On the other hand, many people feel that SNMP is painful to 
work with, and monitoring via a REST API is much preferable. (Further, 
and not directly relevant to this vote, the Fortigate REST API also 
reports some information that is not available via SNMP, as noted in 
the issue.)


In general I like the guideline of not adding too many competing 
exporters for the same thing to our public integrations list, but I 
believe that it should only be a guideline. I believe that SNMP is 
enough of an operational pain that we should allow exceptions for 
more specific, non-SNMP-based exporters like the linked one. It seems 
that the discussion has been had and is laid out in the issue.


I therefore call a vote for the following proposal:

Allow adding exporters to the public integrations list although the 
devices or applications that they export data for can already be 
monitored via SNMP (and thus via the SNMP Exporter). This proposal 
does not affect other criteria that we may use in deciding whether to 
list an exporter or not.


The vote closes on 2020-06-04 20:30 UTC.

Cheers,
Julius




Re: [prometheus-developers] [VOTE] Allow Kelvin as temperature unit in some cases

2020-05-28 Thread Chris Marchbanks

YES

On Thu, May 28, 2020 at 8:52 pm, Bjoern Rabenstein  
wrote:

Dear Prometheans,

So far, we have recommended Celsius as the base unit for temperatures,
despite Kelvin being the SI unit. That was well justified by the
overwhelming majority of use cases, where Kelvin would be just
weird. I'd really like to see more scientific usage of Prometheus, so
I was never super happy with that recommendation, but since it was
just a recommendation, I could live with it.

Now Matt Layher came up with another, more technical use case: color
temperature. Here, using Celsius would be even weirder. So there is a
case where you clearly do not want to follow the suggestion of the
linter, which is more in line with typical Prometheus use cases than
my arguably somewhat far-fetched time series for low-temperature
experiments.

Therefore, Matt suggested to make the metrics linter not complain
about kelvin.

I think this is a clearly defined problem with clear arguments and a
clear disagreement between Brian Brazil on the one side and Matt and
myself on the other side. The appropriate amount of effort has been
spent to find a consensus. All arguments can be found in the two PRs
mentioned below.

I hereby call a vote for the following proposal:

Allow Kelvin as a base unit in certain cases and update our
documented recommendation and the linter code accordingly.


(The changes may take the form of the two PRs out there, but the vote
is about the general idea above, not the implementation detail.)


The vote closes on 2020-06-04 20:00 UTC.
--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in 





Re: [prometheus-developers] how to build npm_licenses.tar.bz2 by using go natively

2020-04-30 Thread Chris Marchbanks

Hello,

npm_licenses.tar.bz2 is built from the React frontend JavaScript 
dependencies. You can build it by running "make npm_licenses", but you 
will need to install yarn for the command to succeed.


Cheers,

Chris

On Thu, Apr 30, 2020 at 16:37, Guanpeng Gao  wrote:

Hi there
npm_licenses.tar.bz2 is needed in the Dockerfile.
How can I build this without promu? Can I build it by using Go natively?

thank you




Re: [prometheus-developers] Call for Consensus: node_exporter 1.0.0 release

2020-04-23 Thread Chris Marchbanks

Yes to releasing now.

On Thu, Apr 23, 2020 at 13:40, Richard Hartmann 
 wrote:

Dear all,

This is a call for consensus within Prometheus-team on releasing
node_exporter 1.0.0 as-is.

node_exporter 1.0.0-rc.0 was cut on 2020-02-20[1]. It features
experimental TLS support[2]. We are planning to use this TLS support
as a template for all other exporters within and outside of Prometheus
proper. To make sure we didn't build a footgun and that we're not holding
it wrong, CNCF is sponsoring an external security review by Cure53. We
have not been given a clear timeline, but work should start in week 22
(May 25th) at the latest, with no time to completion stated.

There are two positions:
* Wait for the security review to finish before cutting 1.0.0
* Release ASAP, given that this feature is clearly marked as
experimental and it will not see wider testing until we cut 1.0.0

I am asking Prometheus-team to establish rough consensus with a hum.

Should the maintainers (Ben & Fish) be allowed to release without
waiting for the audit to finish?


Best,
Richard

[1] 


[2] 





Re: [prometheus-developers] Reduce list of GOOS/GOARCH crossbuilt for each PR

2020-02-12 Thread Chris Marchbanks
I also support this; waiting 2-3 hours for the build job to finish is
frustrating. I know that building on 32-bit architectures does not catch
all issues, specifically the 64-bit alignment bug when using the atomic
package. Perhaps add at least one 32-bit build on the pull request though?

Is it worth it to build everything on every commit to master, or should a
build-all job be added to the nightly build? I agree that we should build
everything on a cadence more frequent than a release.

On Wed, Feb 12, 2020 at 1:58 AM 'Matthias Rampke' via Prometheus Developers
 wrote:

> I would build everything on master, that way we catch *before* starting a
> release if there is something wrong.
>
> /MR
>
> On Wed, Feb 12, 2020 at 8:37 AM Sylvain Rabot 
> wrote:
>
>> I did not say it but I was speaking of prometheus/prometheus; I haven't
>> checked other repos for their full cross-building time.
>>
>> I think we can come up with a minimal list of GOOS/GOARCH for PRs but, if
>> you think building the complete list on tags only is not enough, we could
>> do it on tags & master.
>>
>> If we were to choose to build the complete list for tags only I would
>> suggest to build this for PRs:
>>
>> - linux/amd64
>> - linux/386
>> - linux/arm
>> - linux/arm64
>> - darwin/amd64
>> - windows/amd64
>> - freebsd/amd64
>> - openbsd/amd64
>> - netbsd/amd64
>> - dragonfly/amd64
>>
>> If we were to choose to build the complete list for tags & master then I
>> would suggest an even more reduced one:
>>
>> - linux/amd64
>> - darwin/amd64
>> - windows/amd64
>> - freebsd/amd64
>>
>> Regards.
>>
>> On Tue, 11 Feb 2020 at 23:17, Matthias Rampke  wrote:
>>
>>> There are some exceptions like node exporter where it's important that
>>> all variants at least build, but that has a custom setup already.
>>>
>>> What would be a sufficient subset? Do we need to worry about endianness
>>> and 32 bit architectures, or would just building not catch issues specific
>>> to these anyway?
>>>
>>> /MR
>>>
>>> On Tue, 11 Feb 2020, 22:50 Krasimir Georgiev, 
>>> wrote:
>>>
 I think that is a very good idea.

 On Feb 11 2020, at 11:19 pm, Sylvain Rabot 
 wrote:

 Hi,


 I'm wondering if we could reduce the list of GOOS/GOARCH that are 
 crossbuilt for every PR by circle-ci.


 The building of the complete list seems like a waste of time & resources 
 to me.


 Maybe we could select a few and only build the complete list when building 
 tags ?


 Regards.


 --
 Sylvain Rabot 



>>>
>>
>> --
>> Sylvain Rabot 
>>
