Re: [prometheus-users] how to get count of no.of instance

2024-05-28 Thread 'Brian Candler' via Prometheus Users
ut > > On Sunday, May 26, 2024 at 1:24:10 PM UTC+5:30 Brian Candler wrote: > >> The labels for the two sides of the division need to match exactly. >> >> If they match 1:1 except for additional labels, then you can use >> xxx / on (foo,bar) yyy # foo,bar are t

Re: [prometheus-users] Pod with Pending phase is in endpoints scraping targets (Prometheus 2.46.0)

2024-05-27 Thread 'Brian Candler' via Prometheus Users
Have you looked in the changelog for Prometheus? I found: ## 2.51.0 / 2024-03-18 * [BUGFIX] Kubernetes SD: Pod status changes were not discovered by Endpoints service discovery #13337

Re: [prometheus-users] how to get count of no.of instance

2024-05-26 Thread 'Brian Candler' via Prometheus Users
The labels for the two sides of the division need to match exactly. If they match 1:1 except for additional labels, then you can use
    xxx / on (foo,bar) yyy          # foo,bar are the matching labels
or
    xxx / ignoring (baz,qux) zzz    # baz,qux are the labels to ignore
If they match N:1 then you need to
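To make the matching rule concrete, here is a minimal Python sketch of one-to-one vector matching. It is not how Prometheus implements it; the function names `match_key` and `divide` and the sample label sets are invented for illustration, and the sketch only checks uniqueness on the right-hand side.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]


def match_key(labels: Labels,
              on: Optional[Sequence[str]] = None,
              ignoring: Sequence[str] = ()) -> FrozenSet[Tuple[str, str]]:
    """Return the label pairs that take part in matching (hypothetical helper)."""
    if on is not None:
        # on(...): only the listed labels are compared.
        return frozenset((k, v) for k, v in labels.items() if k in on)
    # Otherwise compare everything except the metric name and ignored labels.
    skip = set(ignoring) | {"__name__"}
    return frozenset((k, v) for k, v in labels.items() if k not in skip)


def divide(lhs: List[Sample], rhs: List[Sample],
           on: Optional[Sequence[str]] = None,
           ignoring: Sequence[str] = ()) -> List[Sample]:
    """Simplified 1:1 division; the sample data is assumed to contain no zero divisors."""
    index: Dict[FrozenSet[Tuple[str, str]], float] = {}
    for labels, value in rhs:
        key = match_key(labels, on, ignoring)
        if key in index:
            raise ValueError("right-hand side is not unique for this label set")
        index[key] = value
    result: List[Sample] = []
    for labels, value in lhs:
        key = match_key(labels, on, ignoring)
        if key in index:
            # Series without a partner on the other side are silently dropped.
            result.append((dict(key), value / index[key]))
    return result


lhs = [({"__name__": "xxx", "foo": "a", "bar": "b", "extra": "1"}, 10.0)]
rhs = [({"__name__": "yyy", "foo": "a", "bar": "b"}, 4.0)]

print(divide(lhs, rhs))                        # no match: "extra" differs
print(divide(lhs, rhs, on=("foo", "bar")))     # matches on foo,bar only
print(divide(lhs, rhs, ignoring=("extra",)))   # matches after ignoring "extra"
```

Without a modifier the extra label prevents any match, which is why the plain division returns an empty result.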

[prometheus-users] Re: Regular Expression and Label Action Support to match two or more source labels

2024-05-22 Thread 'Brian Candler' via Prometheus Users
ility or > workaround in the old (< 2.41) Prometheus releases on this topic? > > /Teja > > On Wednesday, May 22, 2024 at 12:01:31 PM UTC+2 Brian Candler wrote: > >> Yes, there are similar relabel actions "keepequal" and "dropequal": >>

[prometheus-users] Re: Regular Expression and Label Action Support to match two or more source labels

2024-05-22 Thread 'Brian Candler' via Prometheus Users
Yes, there are similar relabel actions "keepequal" and "dropequal": https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config These were added in v2.41.0 / 2022-12-20

[prometheus-users] Re: All Samples Lost when prometheus server return 500 to prometheus agent

2024-05-19 Thread 'Brian Candler' via Prometheus Users
: 1m > external_labels: > clusterName: clustertest150 > clusterRegion: region0 > clusterZone: zone1 > prometheus: ccos-monitoring/agent-0 > prometheus_replica: prometheus-agent-0-0 > keep_dropped_targets: 1 > > and the remote write con

[prometheus-users] Re: hundreds of containers, how to alert when a certain container is down?

2024-05-18 Thread 'Brian Candler' via Prometheus Users
Monitoring for a metric vanishing is not a very good way to do alerting. Metrics hang around for the "staleness" interval, which by default is 5 minutes. Ideally, you should monitor all the things you care about explicitly, get a success metric like "up" (1 = working, 0 = not working) and then

[prometheus-users] Re: Alertmanager frequently sending erroneous resolve notifications

2024-05-18 Thread 'Brian Candler' via Prometheus Users
> What can be done? Perhaps the alert condition resolved very briefly. The solution with modern versions of prometheus (v2.42.0 or later) is to do this:
    for: 2d
    keep_firing_for: 10m
The alert won't be resolved unless it has been

[prometheus-users] Re: All Samples Lost when prometheus server return 500 to prometheus agent

2024-05-17 Thread 'Brian Candler' via Prometheus Users
It's difficult to make sense of what you're saying. Without seeing logs from both the agent and the server while this problem was occurring (e.g. `journalctl -eu prometheus`), it's hard to know what was really happening. Also you need to say what exact versions of prometheus and the agent were

[prometheus-users] Re: what insecure_skip_verify will do

2024-05-16 Thread 'Brian Candler' via Prometheus Users
wrote: > So here is the update i did try this insecure skip but i am still getting > below error, > > tls: failed to verify certificate: x509: certificate signed by unknown > authority > > On Thursday, May 16, 2024 at 1:28:43 PM UTC+5:30 Brian Candler wrote: > >

[prometheus-users] Re: what insecure_skip_verify will do

2024-05-16 Thread 'Brian Candler' via Prometheus Users
It depends what you mean by "secure". It's encrypted, because you've told it to use HTTPS (HTTP + TLS). If the remote end doesn't talk TLS, then the two won't be able to establish a connection at all. However it is also insecure, because the client has no way of knowing whether the remote

[prometheus-users] Re: Locatinme in Alertmanager

2024-05-09 Thread 'Brian Candler' via Prometheus Users
Can you describe what the actual problem is? Are you seeing an error message, if so what is it? Why are you defining a time interval of 00:00 to 23:59, which is basically all the time apart from 1 minute between 23:59 and 24:00? You also don't seem to be referencing it from a routing rule. In

[prometheus-users] Re: Does anyone have any examples of what a postgres_exporter.yml file is supposed to look like?

2024-05-08 Thread 'Brian Candler' via Prometheus Users
...then move on to configuring *prometheus* I meant. On Wednesday 8 May 2024 at 07:11:46 UTC+1 Brian Candler wrote: > - job_name: 'postgresql_exporter' > static_configs: > - targets: ['host.docker.internal:5432'] > > One problem I can see is that you're trying to get prome

[prometheus-users] Re: Does anyone have any examples of what a postgres_exporter.yml file is supposed to look like?

2024-05-08 Thread 'Brian Candler' via Prometheus Users
- job_name: 'postgresql_exporter' static_configs: - targets: ['host.docker.internal:5432'] One problem I can see is that you're trying to get prometheus to scrape the postgres SQL port. If you go to the Prometheus web UI and look at the Status > Targets menu option, I think you will see it's

[prometheus-users] Re: Compare metrics with differents labels

2024-04-30 Thread 'Brian Candler' via Prometheus Users
prod", > instance="kafka-exporter.monitor:9308", job="kafka-exporter", > partition="0", topic="TOPIC-NOTIFICATION-PUSH"} > 31267495 > kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", >

[prometheus-users] Re: Compare metrics with differents labels

2024-04-30 Thread 'Brian Candler' via Prometheus Users
> 0 > {consumergroup="$Default", topic="TOPIC-NOTIFICATION-TESTE"} > 1.25 > {consumergroup="$Default", topic="TOPIC-NOTIFICATION-SMS"} > 0 > {consumergroup="$Default", topic="TOPIC-NOTIFICATION-WHATSAPP"} > 0 > {consum

[prometheus-users] Re: Compare metrics with differents labels

2024-04-30 Thread 'Brian Candler' via Prometheus Users
st 5 > minutes is 0 and the production of messages is greater than 1 in the topic, > then the group of consumers is not consuming messages and I wanted to > return which groups and topics these would be > Em sexta-feira, 19 de abril de 2024 às 15:36:44 UTC-3, Brian Candler > escreveu:

[prometheus-users] Re: Compare metrics with differents labels

2024-04-19 Thread 'Brian Candler' via Prometheus Users
ond, while increase(foo[5m]) gives the increase per 5 minutes. Hence:
    rate(kafka_consumergroup_current_offset[5m]) * 60
    increase(kafka_consumergroup_current_offset[5m]) / 5
should both be the same, giving the per-minute increase. On Friday 19 April 2024 at 18:30:21 UTC+1 Brian Candler wrote: >
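The equivalence of the two expressions is plain arithmetic, which can be checked with a small Python sketch. The figure of 600 messages in 5 minutes is an invented example value, and the helper names are hypothetical.

```python
def per_minute_from_rate(rate_per_second: float) -> float:
    """rate(...) is per second, so multiply by 60 to get a per-minute figure."""
    return rate_per_second * 60


def per_minute_from_increase(increase_over_window: float, window_minutes: float) -> float:
    """increase(...[5m]) covers the whole window, so divide by its length in minutes."""
    return increase_over_window / window_minutes


window_seconds = 5 * 60
increase_5m = 600.0                      # assumed: offset grew by 600 over 5 minutes
rate_5m = increase_5m / window_seconds   # per-second rate over the same window

print(per_minute_from_rate(rate_5m))
print(per_minute_from_increase(increase_5m, 5))
```

Both calls give the same per-minute value because the rate is simply the increase divided by the window length in seconds.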

[prometheus-users] Re: Compare metrics with differents labels

2024-04-19 Thread 'Brian Candler' via Prometheus Users
Sorry, first link was wrong. https://groups.google.com/g/prometheus-users/c/IeW_3nyGkR0/m/unto0oGQAQAJ https://groups.google.com/g/prometheus-users/c/83pEAX44L3M/m/E20UmVJyBQAJ On Friday 19 April 2024 at 18:28:29 UTC+1 Brian Candler wrote: > Can you give examples of the metrics in quest

[prometheus-users] Re: Compare metrics with differents labels

2024-04-19 Thread 'Brian Candler' via Prometheus Users
Can you give examples of the metrics in question, and what conditions you're trying to check for? Looking at your specific PromQL query: Firstly, in my experience, it's very unusual in Prometheus queries to use ==bool or >bool, and in this specific case definitely seems to be wrong. Secondly,

Re: [prometheus-users] Re: Need urgent help!!! Want to modify tags "keys" to lowercase scraping from Cloudwatch-Exporter in Prometheus before sending to Mimir #13912

2024-04-18 Thread 'Brian Candler' via Prometheus Users
No. That test case demonstrates that it is the label *values* that are downcased, not the label names, exactly as you said. On Thursday 18 April 2024 at 13:07:51 UTC+1 Vaibhav Ingulkar wrote: > Thanks @Brian Candler > > Actually not possible fixing the data at source due to

Re: [prometheus-users] Re: Need urgent help!!! Want to modify tags "keys" to lowercase scraping from Cloudwatch-Exporter in Prometheus before sending to Mimir #13912

2024-04-18 Thread 'Brian Candler' via Prometheus Users
ercase. >> >> Here my requirement is to convert labels i.e. keys to lowercase for ex. >> *tag_Budget_Code* to *tag_budget_code* or *tag_Name* to *tag_name* >> >> On Thursday, April 18, 2024 at 2:26:10 PM UTC+5:30 Brian Candler wrote: >> >>>

Re: [prometheus-users] Re: Need urgent help!!! Want to modify tags "keys" to lowercase scraping from Cloudwatch-Exporter in Prometheus before sending to Mimir #13912

2024-04-18 Thread 'Brian Candler' via Prometheus Users
On Thursday 18 April 2024 at 09:42:41 UTC+1 Ben Kochie wrote: Prometheus can lower/upper in relabeling. Thanks! That was added in v2.36.0, and I missed it.

[prometheus-users] Re: Need urgent help!!! Want to modify tags "keys" to lowercase scraping from Cloudwatch-Exporter in Prometheus before sending to Mimir #13912

2024-04-18 Thread 'Brian Candler' via Prometheus Users
> Need urgent help!!! See https://www.catb.org/~esr/faqs/smart-questions.html#urgent > we can add *only one pattern (Uppercase or lowercase)* in template code. At worst you can match like this: tag_Name=~"[fF][oO][oO][bB][aA][rR]" I don't know of any way internally to prometheus to lowercase
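Writing the character classes by hand is error-prone, so the matcher pattern could be generated instead. The following Python sketch shows one way to do it; `case_insensitive_pattern` is a hypothetical helper, it handles ASCII letters only, and Python's `re` module stands in here for the regex engine Prometheus uses.

```python
import re


def case_insensitive_pattern(literal: str) -> str:
    """Expand each ASCII letter into a two-character class, e.g. 'f' -> '[fF]'."""
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            # Escape anything else so it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = case_insensitive_pattern("foobar")
print(pattern)
print(bool(re.fullmatch(pattern, "FooBAR")))
print(bool(re.fullmatch(pattern, "foobaz")))
```

The generated string can then be pasted into a matcher such as tag_Name=~"..." in the query or template.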

[prometheus-users] Re: many-to-many not allowed error

2024-04-18 Thread 'Brian Candler' via Prometheus Users
Look at the results of each half of the query separately:
    redis_memory_max_bytes{k8s_cluster_name="$cluster", namespace="$namespace", pod="$pod_name"}
    redis_instance_info{role=~"master|slave"}
You then need to find some set of labels which mean that N entries on the left-hand side

Re: [prometheus-users] Re: Config DNS Prometheus/Blackbox_Exporter

2024-04-18 Thread 'Brian Candler' via Prometheus Users
> target_label: instance >> #QUERY >> - source_labels: [dns] >> #target_label: __param_hostname >> target_label: __param_target >> # Populate __address__ with the address of the blackbox exporter to hit >> - target_label: __address__ >> replaceme

[prometheus-users] Re: Prometheus Azure Service Discovery behind a proxy server

2024-04-15 Thread 'Brian Candler' via Prometheus Users
> Is there a way to enable or add proxy config just for the service discoery and microsoft authentication part ? The configuration of azure sd is here: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#azure_sd_config It has its own local settings for proxy_url,

Re: [prometheus-users] Re: Config DNS Prometheus/Blackbox_Exporter

2024-04-12 Thread 'Brian Candler' via Prometheus Users
t addresses). - Geo-aware DNS generally takes place for the user-visible query names (like "www.google.com") and generally are affected by the *source* address where the query is coming from. On Friday 12 April 2024 at 14:21:57 UTC+1 Conall O'Brien wrote: > On Wed, 10 Apr 2024 at

[prometheus-users] Re: Config DNS Prometheus/Blackbox_Exporter

2024-04-09 Thread 'Brian Candler' via Prometheus Users
9, 2024 a la(s) 12:19:25 PM UTC-4, Vincent Romero > escribió: > >> i will try make build, with this change >> >> >> >> El Saturday, April 6, 2024 a la(s) 2:45:29 PM UTC-3, Brian Candler >> escribió: >> >>> You're correct that currently the q

[prometheus-users] Re: what to do about flapping alerts?

2024-04-08 Thread 'Brian Candler' via Prometheus Users
On Monday 8 April 2024 at 20:57:34 UTC+1 Christoph Anton Mitterer wrote: Assume the following (arguably a bit made up) example: One has a metric that counts the number of failed drives in a RAID. One drive fails so some alert starts firing. Eventually the computing centre replaces the drive and

[prometheus-users] Re: Config DNS Prometheus/Blackbox_Exporter

2024-04-06 Thread 'Brian Candler' via Prometheus Users
You're correct that currently the qname is statically configured in the prober config. A patch was submitted to allow what you want, but hasn't been merged: https://github.com/prometheus/blackbox_exporter/pull/1105 You can build blackbox_exporter yourself with this patch applied though. On

[prometheus-users] Re: what to do about flapping alerts?

2024-04-06 Thread 'Brian Candler' via Prometheus Users
> but AFAIU that would simply affect all alerts, i.e. it wouldn't just keep firing, when the scraping failed, but also when it actually goes back to an ok state, right? It affects all alerts individually, and I believe it's exactly what you want. A brief flip from "failing" to "OK" doesn't

[prometheus-users] Re: Prometheus alert tagging issue - multiple servers

2024-04-03 Thread 'Brian Candler' via Prometheus Users
On Wednesday 3 April 2024 at 16:01:21 UTC+1 mohan garden wrote: Is there a way i can see the entire message which alert manager sends out to the Opsgenie? - somewhere in the alertmanager logs or a text file? You could try setting api_url to point to a webserver that you control.

[prometheus-users] Re: Prometheus alert tagging issue - multiple servers

2024-04-03 Thread 'Brian Candler' via Prometheus Users
>> >> >> but i was expecting an additional host=server2 tag on the ticket. >> in Summary - i see updated description , but unable to see updated tags. >> >> in tags section of the alertmanager - opsgenie integration configuration >> , i had tried iterat

[prometheus-users] Re: Prometheus alert tagging issue - multiple servers

2024-04-02 Thread 'Brian Candler' via Prometheus Users
FYI, those images are unreadable - copy-pasted text would be much better. My guess, though, is that you probably don't want to group alerts before sending them to opsgenie. You haven't shown your full alertmanager config, but if you have a line like group_by: ['alertname'] then try

Re: [prometheus-users] Assistance Needed with Prometheus and Alertmanager Configuration

2024-03-30 Thread 'Brian Candler' via Prometheus Users
ss, leading to confusion and potentially overlooking critical alerts. > > I would greatly appreciate any further insights or recommendations you may > have to address this issue and ensure alignment between Prometheus and > Alertmanager in terms of the number of alerts generated and disp

Re: [prometheus-users] Assistance Needed with Prometheus and Alertmanager Configuration

2024-03-30 Thread 'Brian Candler' via Prometheus Users
On Friday 29 March 2024 at 22:09:18 UTC Chris Siebenmann wrote: I believe that recording rules and alerting rules similarly may have their evaluation time happen at different offsets within their evaluation interval. This is done for the similar reason of spreading out the internal load of

[prometheus-users] Re: Relabeling for proxied hosts

2024-03-28 Thread 'Brian Candler' via Prometheus Users
According to the source in prometheus-common/model/labels.go, these are the only declared magic labels: const ( // AlertNameLabel is the name of the label containing the an alert's name. AlertNameLabel = "alertname" // ExportedLabelPrefix is the prefix to prepend to the

[prometheus-users] Re: [snmp-exporter] when will --config.expand-environment-variables be available?

2024-03-26 Thread 'Brian Candler' via Prometheus Users
It's in git head, so it's available now if you compile snmp_exporter from source. Otherwise you need to wait until the next release. I don't know when that will be. On Tuesday 26 March 2024 at 08:49:45 UTC ohey...@gmail.com wrote: > Readme on Github shows this option, but it's not available. >

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-03-22 Thread 'Brian Candler' via Prometheus Users
Personally I think you're looking at this wrong. You want to "capture" single scrape failures? Sure - it's already being captured. Make yourself a dashboard. But do you really want to be *alerted* on every individual one-time scrape failure? That goes against the whole philosophy of

[prometheus-users] Re: HTTPS proxy_url

2024-03-20 Thread 'Brian Candler' via Prometheus Users
The error "http2: unsupported scheme" might be affected by this setting:
    # Whether to enable HTTP2.
    [ enable_http2: <boolean> | default: true ]
Whether that will fix your problem I don't know. If you've deployed

[prometheus-users] Re: Thanos sidecar installation

2024-03-20 Thread 'Brian Candler' via Prometheus Users
Follow the Thanos documentation linked from https://github.com/thanos-io/thanos?tab=readme-ov-file#getting-started In particular: https://thanos.io/tip/thanos/quick-tutorial.md/ shows running the sidecar. On Wednesday 20 March 2024 at 10:09:45 UTC BHARATH KUMAR wrote: > Hello All, > > I

Re: [prometheus-users] blackbox_exporter 0.24.0 and smokeping_prober 0.7.1 - DNS cache "nscd" not working

2024-03-20 Thread 'Brian Candler' via Prometheus Users
> To be able to use DNS caching (without rebuilding), one would need a local DNS server with enabled cache on the system which is referenced in the resolv.conf. That's what systemd does: its cache binds to 127.0.0.53, and then you point to 127.0.0.53 in /etc/resolv.conf On Wednesday 20 March

[prometheus-users] Re: Get prometheus snapshot for specific timeperiod.

2024-03-17 Thread 'Brian Candler' via Prometheus Users
> To capture data for a specific duration, can you provide the URL query that takes time parameters, such as start and finish times. No I can't, because there is no feature for that - as the API documentation makes clear. > Snapshot creates a snapshot of* all current data *into snapshots/-

[prometheus-users] Re: Get prometheus snapshot for specific timeperiod.

2024-03-17 Thread 'Brian Candler' via Prometheus Users
I'm not sure what you mean by "preserve prometheus snapshot" - AFAIK the snapshot remains forever until you delete it. If you mean you want to delete snapshots when they reach a particular age, then you can do that yourself from a cronjob. e.g. for 90 days retention: find /snapshots -mtime +90

[prometheus-users] Re: Best practive: "job_name in prometheus agent? Same job_name allowed ?

2024-03-15 Thread 'Brian Candler' via Prometheus Users
ever I think I will have a problem because if I use "127.0.0.1:9100" > as target to scrape then all instances are equal. > > Is there any possibility to use a variable in the scrape_config which > reflects any environment variable from linux system or any other mechanism >

[prometheus-users] Re: Best practive: "job_name in prometheus agent? Same job_name allowed ?

2024-03-14 Thread 'Brian Candler' via Prometheus Users
As long as all the time series have distinct label sets (in particular, different "instance" labels), and you're not mixing scraping with remote-writing for the same targets, then I don't see any problem with all the agents using the same "job" label when remote-writing. On Tuesday 12 March

[prometheus-users] Re: disable all alerts for a job

2024-03-12 Thread 'Brian Candler' via Prometheus Users
option 1: filter them out in alertmanager, with an extra routing rule that matches on the 'job' label and delivers to a null receiver. option 1b: create a long-lived silence in alertmanager that matches on the 'job' label option 2: drop them in alert_relabel_configs

[prometheus-users] Re: drop all some metrics based on regex

2024-03-12 Thread 'Brian Candler' via Prometheus Users
Thanks. I always forget that labels starting with __ are automatically dropped after target relabelling, but not metric relabelling. On Monday 11 March 2024 at 20:49:42 UTC Ben Kochie wrote: > The other way you can do this is with the "__tmp_keep" pattern. This is > where you positively tag

[prometheus-users] Re: drop all some metrics based on regex

2024-03-11 Thread 'Brian Candler' via Prometheus Users
You can use temporary variables. Something like this (untested):
    metric_relabel_configs:
      - source_labels: [__name__, name]
        regex: 'node_systemd_unit_state;(ssh|apache).*'
        target_label: __tmp_keep
        replacement: y
      - source_labels: [__name__, __tmp_keep]
        regex:

[prometheus-users] Re: PromQL - Check for specific value in the past

2024-03-06 Thread 'Brian Candler' via Prometheus Users
You can use a subquery which will sample the data, something like this:
    bgp_state_info != 3 and present_over_time((bgp_state_info == 3)[60d:1h])
You can reduce the sampling interval from 1h to reduce the risk of missing times when BGP was up, but then the query becomes increasingly expensive.

[prometheus-users] Re: Powershell POST to pushgateway

2024-03-04 Thread 'Brian Candler' via Prometheus Users
not work for me. > > echo "metricname1 101 > " | Invoke-WebRequest -Uri http://192.168.1.111:9091/metrics/job/jobname1 > -Method POST > *Invoke-WebRequest : text format parsing error in line 1: expected float > as value, got "101\r"* > > On Monday, Marc
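The quoted error (got "101\r") suggests that a carriage return reached the Pushgateway, i.e. the body was sent with Windows CRLF line endings. As an illustration only, this Python sketch builds the same POST with plain LF line endings; `build_push_request` is a hypothetical helper, and the host, job and metric names are the ones quoted in the thread.

```python
import urllib.request
from typing import Dict


def build_push_request(base_url: str, job: str,
                       metrics: Dict[str, float]) -> urllib.request.Request:
    """Build a POST whose body uses LF line endings and ends with a newline."""
    lines = [f"{name} {value}" for name, value in metrics.items()]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/metrics/job/{job}",
        data=body,
        method="POST",
    )


request = build_push_request("http://192.168.1.111:9091", "jobname1", {"metricname1": 101})
print(request.full_url)
print(request.data)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; that step is left out so the sketch runs without a Pushgateway.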

[prometheus-users] Re: Powershell POST to pushgateway

2024-03-04 Thread 'Brian Candler' via Prometheus Users
https://superuser.com/questions/344927/powershell-equivalent-of-curl On Monday 4 March 2024 at 07:10:18 UTC Leen Tux wrote: > Hi > What is the powershell command equivalent to: > *$ echo 'metricname1 101' | curl --data-binary @- >

[prometheus-users] Re: Best practices to using "xxxxx_info" gauge metric

2024-03-04 Thread 'Brian Candler' via Prometheus Users
Yes, it's good practice and you can read about it here: https://www.robustperception.io/how-to-have-labels-for-machine-roles https://www.robustperception.io/exposing-the-software-version-to-prometheus You may also find these relevant: https://www.robustperception.io/left-joins-in-promql

[prometheus-users] Re: Hi All

2024-03-01 Thread 'Brian Candler' via Prometheus Users
Sorry, but you have the wrong group. This is for Prometheus, not Grafana. Questions about Grafana should be addressed to the Grafana community: https://community.grafana.com/ On Friday 1 March 2024 at 21:49:11 UTC+7 h0ksa wrote: > > Hello everyone! I have a Grafana dashboard with two panels and

[prometheus-users] Re: User level disk usage monitoring and notification - with prometheus and alertmanager

2024-02-29 Thread 'Brian Candler' via Prometheus Users
On Thursday 29 February 2024 at 18:46:05 UTC+7 Puneet Singh wrote: The default node exporter has the ability to report the disk usage at user level in my context? - by extending it via any flag ( i came across the text collector and i plan to explore that.) or writing the custom exporter

[prometheus-users] Re: consul discovery

2024-02-29 Thread 'Brian Candler' via Prometheus Users
Consul is a Hashicorp product. How you configure and manage consul is not really a topic for a Prometheus mailing list. See https://www.consul.io/community for a list of Consul community resources. On Thursday 29 February 2024 at 00:33:07 UTC+7 sri L wrote: > Hi all, > > I am trying to

[prometheus-users] Re: User level disk usage monitoring and notification - with prometheus and alertmanager

2024-02-29 Thread 'Brian Candler' via Prometheus Users
> I don't think *condition1* and *condition2* will work as labels and label values returned by condition1 and condition2 are different. condition1 if on (instance,mountpoint) group_left(username) condition2 This assumes that the both expressions have "instance" and "mountpoint" labels; these

[prometheus-users] Re: Integrating Prometheus with Splunk and ServiceNow for automated ticket creation.

2024-02-26 Thread 'Brian Candler' via Prometheus Users
> Invalid authorization Seems you're not authorizing to Splunk properly. Can you point to their documentation which says how you need to authenticate to their API? I note you're using http rather than https, so HTTP basic auth is probably not allowed (it's insecure, it sends the username and

[prometheus-users] Re: Metrics from PUSH Consumer - Relabeled Metrics? Check "Up" state?

2024-02-26 Thread 'Brian Candler' via Prometheus Users
> I am still looking for a solution to identify if a device which uses "PUSH" method is not sending data anmore for e.g. 10 minutes. Push an additional metric which is "last push time", and check when that value is more than 10 minutes earlier than the current time. If you already have a
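The check described above is a simple timestamp comparison. A minimal Python sketch of the logic follows; `push_is_stale` and the 600-second threshold are assumptions for illustration, and in an alerting rule the same comparison would be written against the pushed timestamp metric and the current time.

```python
import time
from typing import Optional


def push_is_stale(last_push_timestamp: float,
                  now: Optional[float] = None,
                  max_age_seconds: float = 600) -> bool:
    """True when the last push is more than max_age_seconds before 'now'."""
    if now is None:
        now = time.time()
    return now - last_push_timestamp > max_age_seconds


# Timestamps are Unix seconds; the values are invented for the example.
print(push_is_stale(1000.0, now=1700.0))   # last push 700 s ago
print(push_is_stale(1000.0, now=1500.0))   # last push 500 s ago
```

Because the device pushes its own "last push time", the value stops advancing when the device goes quiet, and the difference from the current time keeps growing until it crosses the threshold.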

[prometheus-users] Re: PromQL: understanding the and operator

2024-02-23 Thread 'Brian Candler' via Prometheus Users
On Saturday 24 February 2024 at 01:00:57 UTC+7 Alexander Wilke wrote: Another possibility could be QueryA + queryB == 0 #both down No, that doesn't work, for exactly the same reason that "QueryA and QueryB" doesn't work. With a binary expression like "foo + bar", each side is a vector, and

[prometheus-users] Re: PromQL: understanding the and operator

2024-02-23 Thread 'Brian Candler' via Prometheus Users
On Friday 23 February 2024 at 02:28:52 UTC+7 Puneet Singh wrote: Now i tried to find the time duration where both these service were simultaneously down / 0 on both server1 and server2 : (sum without (USER) ( *go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}*) <

[prometheus-users] Re: snmp_exporter - Generator.yml configuration

2024-02-21 Thread 'Brian Candler' via Prometheus Users
You need to collect the metric "ifHCInOctets". The module "if_mib" in the supplied sample generator.yml does this. On Wednesday 21 February 2024 at 00:39:19 UTC+7 Mitchell Laframboise wrote: > Hi there. What do I have to include in my generator.yml configuration to > scrape data for this

[prometheus-users] Re: interference between prometheus and rabbitmq when starting

2024-02-19 Thread 'Brian Candler' via Prometheus Users
If RabbitMQ attempts to start but fails, the reason will be shown in its output (e.g. "journalctl -eu rabbitmq" or whatever the service name is). One guess is that RabbitMQ is trying to bind to the same port as prometheus. Prometheus uses port 9090 by default, and I think RabbitMQ uses port

[prometheus-users] Re: prometheus ui on local computer

2024-02-18 Thread 'Brian Candler' via Prometheus Users
localhost:9090 is what you'd enter if prometheus was running on the same machine as your browser. In this case it's remote, so enter x.x.x.x:9090 where x.x.x.x is the IP address of the server where prometheus is running. On Monday 19 February 2024 at 07:13:02 UTC Leah Stapleton wrote: > Hello, > This is

[prometheus-users] Re: Hi

2024-02-14 Thread 'Brian Candler' via Prometheus Users
ommendations on how to formulate this query? > On Wednesday, February 14, 2024 at 2:28:33 PM UTC+1 Brian Candler wrote: > >> I'm not sure why you're summing over increase, but if you plot that >> PromQL expression in the web UI, does its value drop to zero when the >> problem

[prometheus-users] Re: Hi

2024-02-14 Thread 'Brian Candler' via Prometheus Users
4 at 11:37:32 UTC h0ksa wrote: > sum by (table) (increase(pinot_server_realtimeRowsConsumed_Count[5m])) > > right now iam using this query > > and rows are always rising in the database but i want to know when they > stop and trigger an alert > > On Wednesday, February 1

[prometheus-users] Re: Hi

2024-02-14 Thread 'Brian Candler' via Prometheus Users
What Prometheus metrics are you collecting? For example, do you have a metric for the total number of rows in the database? Or do you have a metric for the last time a row was inserted? Or some other metric which can identify new rows - if so, what? What is the "previously suggested function"?

Re: [prometheus-users] Re: Alert Query

2024-02-14 Thread 'Brian Candler' via Prometheus Users
ciently well. On Wednesday 14 February 2024 at 03:46:33 UTC sri L wrote: > Thanks Brian Candler. > > I am thinking of combining two conditions. > > ((kube_pod_status_ready{condition="true"} == 0 and > max_over_time(kube_pod_status_ready{condition="true"}[10

[prometheus-users] Re: Alert Query

2024-02-13 Thread 'Brian Candler' via Prometheus Users
I guess it goes through non-ready states while it's starting up. A simple approach is to put "for: 3m" on the alert so that it doesn't fire an alert until it has been in the down state for 3 minutes. Another approach would be: kube_pod_status_ready{condition="true"} == 0 and

Re: [prometheus-users] snmp_exporter 0.25.0 + and prometheus 2.49.1 with "%" in label value - format issue

2024-02-12 Thread 'Brian Candler' via Prometheus Users
Are you running either the Prometheus server or the web browser under Windows? STATUS_BREAKPOINT appears here: https://pkg.go.dev/golang.org/x/sys@v0.17.0/windows#pkg-constants On Monday 12 February 2024 at 15:58:44 UTC Ben Kochie wrote: > On Mon, Feb 12, 2024, 16:39 Alexander Wilke wrote: >

[prometheus-users] Re: PromQL filter based on current date

2024-02-12 Thread 'Brian Candler' via Prometheus Users
The only ways I know are to use the Prometheus API and set the evaluation time, or to use the @ timestamp PromQL modifier. But in either case

[prometheus-users] Re: Prometheus Federation: cannot unmarshal number into Go struct field

2024-02-05 Thread 'Brian Candler' via Prometheus Users
Note that > "--web.enable-lifecycle" is included in the docker compose YAML file. Any > thoughts? > On Sunday, February 4, 2024 at 12:13:14 AM UTC+8 Brian Candler wrote: > >> Can you show the content of your /etc/prometheus/federate_sd.json file? >> The error s

Re: [prometheus-users] Prometheus alert evaluation, are they instant queries?

2024-02-03 Thread 'Brian Candler' via Prometheus Users
Even without a subquery, a rule can include a range vector expression and then reduce it to an instant vector, e.g. expr: avg_over_time(snmp_scrape_duration_seconds[5m]) >= 3 On Saturday 3 February 2024 at 16:04:56 UTC Ben Kochie wrote: > All rule evaluations are instant queries. You do all

[prometheus-users] Re: Prometheus Federation: cannot unmarshal number into Go struct field

2024-02-03 Thread 'Brian Candler' via Prometheus Users
Can you show the content of your /etc/prometheus/federate_sd.json file? The error suggests to me that you are putting a number where you need a string, for example {"labels": {"foo": 123}} where it should be {"labels": {"foo": "123"}} On Saturday 3 February 2024 at 16:03:02 UTC Edwin
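A quick way to find such a value is to scan the file for labels that are not strings. This Python sketch assumes the usual file-based service discovery layout (a JSON list of groups with "targets" and "labels"); `non_string_labels` is a hypothetical helper and the sample documents are invented.

```python
import json
from typing import Any, List, Tuple


def non_string_labels(file_sd_json: str) -> List[Tuple[int, str, Any]]:
    """Return (group index, label name, value) for every label value that is not a string."""
    problems = []
    for index, group in enumerate(json.loads(file_sd_json)):
        for name, value in group.get("labels", {}).items():
            if not isinstance(value, str):
                problems.append((index, name, value))
    return problems


bad = '[{"targets": ["host:9090"], "labels": {"foo": 123}}]'
good = '[{"targets": ["host:9090"], "labels": {"foo": "123"}}]'

print(non_string_labels(bad))
print(non_string_labels(good))
```

To check a real file, read its contents and pass the text to the function.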

[prometheus-users] Re: Prometheus target disappears from Grafana metrics when it is down.

2024-01-31 Thread 'Brian Candler' via Prometheus Users
Questions about Grafana would be best asked to the Grafana Community: https://community.grafana.com/ On Wednesday 31 January 2024 at 14:48:07 UTC donna_u...@comcast.net wrote: > I have a dashboard in Grafana with a Prometheus data source. When target > goes down it disappears from the Grafana

[prometheus-users] Re: Actual alert repeat_interval = group_interval + repeat_interval ?

2024-01-30 Thread 'Brian Candler' via Prometheus Users
Not wanting to state the obvious, but have you tried group_interval: 1h repeat_interval: 3h ? On Tuesday 30 January 2024 at 18:35:46 UTC Puneet Singh wrote: > Hi All, > I am facing an issue with the latest version of Alert manager. > I have a group_interval which is a perfect divisor of

[prometheus-users] Re: How to monitor UP and downtime in Prometheus

2024-01-29 Thread 'Brian Candler' via Prometheus Users
This is a duplicate of https://groups.google.com/g/prometheus-users/c/f5aM1n7aPY8 - please don't keep posting the same question. You need a separate piece of software to create your dashboard, the most popular of which is Grafana. For any questions about Grafana, please go to

Re: [prometheus-users] Binary operations between range vectors and scalars

2024-01-29 Thread 'Brian Candler' via Prometheus Users
On Monday 29 January 2024 at 15:24:26 UTC Chris Siebenmann wrote: For instance, if you do delta(metric[1h] > 0), does delta() extrapolate using the timestamps of the first and last time series in the original range vector or the filtered one? I would expect it to use the timestamps of the

[prometheus-users] Re: Prometheus jobs are not showing in prometheus UI

2024-01-29 Thread 'Brian Candler' via Prometheus Users
ing different jobs instead of a single job. > > [image: image.png] > > > Thanks, > Venkatraman N > > On Thursday, January 25, 2024 at 7:13:23 PM UTC+5:30 Brian Candler wrote: > >> Sorry, I don't know what you mean by "not showing all the jobs". You >>

[prometheus-users] Re: Prometheus Authentication

2024-01-29 Thread 'Brian Candler' via Prometheus Users
rite ? > > > Le lundi 29 janvier 2024 à 12:24:32 UTC+1, Brian Candler a écrit : > >> Using --web.config-file you can make Prometheus require HTTP Basic >> Authentication (basic_auth_users) or TLS client certificate >> authentication (client_auth_type, client_ca_file,

[prometheus-users] Re: Prometheus Authentication

2024-01-29 Thread 'Brian Candler' via Prometheus Users
Using --web.config-file you can make Prometheus require HTTP Basic Authentication (basic_auth_users) or TLS client certificate authentication (client_auth_type, client_ca_file, client_allowed_sans). See:

[prometheus-users] Binary operations between range vectors and scalars

2024-01-28 Thread 'Brian Candler' via Prometheus Users
I don't know if this has been proposed before, so I'd like to raise it here before taking it to github or prometheus-developers. There are cases where binary operators could act between range vectors and scalars, but this is not currently allowed today (except by using subqueries, which end up

[prometheus-users] Re: storage.tsdb.max-block-duration to a lower value completely stops compaction

2024-01-26 Thread 'Brian Candler' via Prometheus Users
xactly do? Let's say my data retention > is 30 days, this parameter by default sets to 3 days. Does that mean every > 3 days the data compaction will be triggered for 30days of data? > On Wednesday, January 24, 2024 at 11:15:09 PM UTC-8 Brian Candler wrote: > >> Since regular

[prometheus-users] Re: Secure BlackBox exporter (Basic Authentication)

2024-01-25 Thread 'Brian Candler' via Prometheus Users
Duplicate of https://groups.google.com/g/prometheus-users/c/TMhocibN14M On Thursday 25 January 2024 at 14:29:17 UTC Cres Portillo wrote: > Hello Everyone, > > Can you add basic authentication to secure Blackbox Exporter like you can > with UI Endpoints? > > SECURING PROMETHEUS API AND UI

[prometheus-users] Re: "Secure" the Blackbox exporter using basic authentication

2024-01-25 Thread 'Brian Candler' via Prometheus Users
Please read the documentation here: https://github.com/prometheus/blackbox_exporter?tab=readme-ov-file#tls-and-basic-authentication > To use TLS and/or basic authentication, you need to pass a configuration file using the --web.config.file parameter. The format of the file is described in the

[prometheus-users] Re: Prometheus jobs are not showing in prometheus UI

2024-01-25 Thread 'Brian Candler' via Prometheus Users
Sorry, I don't know what you mean by "not showing all the jobs". You have only shown a small portion of the targets page. Are you saying that job blackbox-fundconnect_retail does not appear there, but blackbox-fundconnect_retail_03 does? Is it possible that blackbox-fundconnect_retail has

[prometheus-users] Re: storage.tsdb.max-block-duration to a lower value completely stops compaction

2024-01-24 Thread 'Brian Candler' via Prometheus Users
Since regular blocks are 2h, setting the maximum size of compacted blocks to 1h sounds unlikely to work. And therefore testing with 1d seems reasonable. Can you provide more details about the scale of your environment, in particular the "head stats" from Status > TSDB Stats in the Prometheus web

[prometheus-users] Re: query to plot graph to monitor endpoints

2024-01-24 Thread 'Brian Candler' via Prometheus Users
I don't think it's a question of "creating a query" unless you've already got this data in Prometheus - and if you have, you need to show what metrics you have before anyone can advise on queries against them. If you're not already collecting the data then that would be the starting point. If

[prometheus-users] Re: Drop Target

2024-01-24 Thread 'Brian Candler' via Prometheus Users
> Now, I want prometheus to read only from job 2,3 and drop 1, do we have a provision to do that in file_sd_config? Yes. Use target relabelling using "drop" or "keep" rules. You will have to match on some label(s) which distinguish job 1 from jobs 2 and 3. Note 1: the text you provided is not

Re: [prometheus-users] snmp exporter & snmpv3

2024-01-20 Thread 'Brian Candler' via Prometheus Users
> If you have a working SNMP.yml then just add this at the top of the file But indented and nested under the "auths" key. > If you use Cisco devices you have to use > AES-128C or AES-256C That is not true. You can use "AES" as normal, or you can use "AES192C" or "AES256C". There is no

Re: [prometheus-users] delta/increase on a counter return wrong value

2024-01-18 Thread 'Brian Candler' via Prometheus Users
If you are not worried too much about what happens if the counter resets during that period, then you can use: (metric - metric offset 15m) >= 0 On Friday 19 January 2024 at 05:26:42 UTC+8 Chris Siebenmann wrote: > > I have a counter and I want to counter the number of occurences on a > >
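
To make the behaviour of that expression concrete, here is a rough Python sketch of what `(metric - metric offset 15m) >= 0` does per series. The function name and the sample values are invented for illustration; series are keyed by a plain string standing in for the label set.

```python
def non_negative_change(now, earlier):
    """Subtract the value from 15m ago from the current value, keeping only results >= 0."""
    result = {}
    for labels, value in now.items():
        if labels not in earlier:
            continue  # no sample 15m ago, so the subtraction yields nothing for this series
        diff = value - earlier[labels]
        if diff >= 0:
            result[labels] = diff  # negative results (e.g. after a counter reset) are filtered out
    return result


# Hypothetical samples: series "a" grew by 20, series "b" reset in between.
now = {"a": 120.0, "b": 5.0}
earlier = {"a": 100.0, "b": 50.0}
print(non_negative_change(now, earlier))
```

The series that reset simply disappears from the result for that evaluation, which is the trade-off mentioned above.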

[prometheus-users] Re: Node_exporter 1.7.0 - http_server_config - Strict-Transport-Security

2024-01-17 Thread 'Brian Candler' via Prometheus Users
The YAML parsing error is simply saying that under "http_server_config", you cannot put "Strict-Transport-Security". The documentation says that the only keys allowed under "http_server_config" are "http2" and "headers". So it needs to be like this: http_server_config: headers:

[prometheus-users] Re: Weird node_exporter network metrics behaviour - NIC problem?

2024-01-16 Thread 'Brian Candler' via Prometheus Users
I would suspect due to how the counters are incremented and the new values published. Suppose in the NIC's API new counter values are published at some odd interval like every 0.9 seconds. Your 15 second scrape will sometimes see the results of 16 increments from the previous counter, and
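
The effect can be shown with a little arithmetic. This is only a sketch of the supposition above (counter values published every 0.9 seconds, scraped every 15 seconds); the function name is made up, and time is counted in tenths of a second to avoid floating-point rounding.

```python
PUBLISH_TICKS = 9    # assumed publish interval: 0.9 s, in tenths of a second
SCRAPE_TICKS = 150   # scrape interval: 15 s, in tenths of a second


def updates_per_scrape(n_scrapes):
    """Count how many counter publications fall inside each successive scrape interval."""
    counts = []
    for k in range(1, n_scrapes + 1):
        end = k * SCRAPE_TICKS
        start = end - SCRAPE_TICKS
        # publications at or before `end`, minus those at or before `start`
        counts.append(end // PUBLISH_TICKS - start // PUBLISH_TICKS)
    return counts


print(updates_per_scrape(6))
```

Because 15 / 0.9 is not a whole number, the count per scrape alternates between two adjacent values, so a perfectly steady traffic rate would still appear to wobble from scrape to scrape.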

Re: [prometheus-users] Maximum targets for exporter

2024-01-13 Thread 'Brian Candler' via Prometheus Users
Just to clarify: I picked "4 cores" out of thin air just as an example to work through, same as I picked 15 second scrape interval and 150ms per scrape. On Saturday 13 January 2024 at 09:34:21 UTC Brian Candler wrote: > One reason is you may already have eight 4-core servers
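
For anyone wanting to plug in their own numbers, a back-of-the-envelope version of that kind of estimate might look like the sketch below. It rests on a strong simplifying assumption that is mine, not something measured: each probe keeps one core fully busy for its whole duration, with no other overhead. The function name is hypothetical.

```python
def probes_per_interval(cores, scrape_interval_ms, probe_ms):
    """Upper bound on probes per scrape interval, assuming one probe occupies one core."""
    per_core = scrape_interval_ms // probe_ms  # probes one core can finish in one interval
    return cores * per_core


# Example figures from this thread: 4 cores, 15 s scrape interval, 150 ms per scrape.
print(probes_per_interval(4, 15_000, 150))
```

Real probes spend much of their time waiting on the network rather than burning CPU, so treat the result as a rough guide only.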

Re: [prometheus-users] Maximum targets for exporter

2024-01-13 Thread 'Brian Candler' via Prometheus Users
One reason is you may already have eight 4-core servers lying around. If it's a VM then of course you can just scale up to the largest instance size available, before you need to go to multiple instances. On Saturday 13 January 2024 at 00:20:10 UTC Alexander Wilke wrote: > Hello, > sorry to

[prometheus-users] Re: Maximum targets for exporter

2024-01-12 Thread 'Brian Candler' via Prometheus Users
The http_sd_config refresh is going to be a very tiny part of the resource utilisation of Prometheus, although 15 seconds is quite aggressive. As for the exporters, it depends very much on the scrape interval and the duration of each probe, the type of probe, and number of cores you have. For

[prometheus-users] Re: snmp_exporter 0.25.0 - IF-MIB and CISCO-IF-EXTENSION-MIB

2024-01-11 Thread 'Brian Candler' via Prometheus Users
On Thursday 11 January 2024 at 11:27:15 UTC Alexander Wilke wrote: thank you for that snippet. I could use it to solve my issue: (sysUpTime - on (instance) group_right () ifLastChange) / 100 However I need to find some time and try to better understand how these operations work. Sure. In
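
As an aid to understanding, the matching in `(sysUpTime - on (instance) group_right () ifLastChange) / 100` can be imitated in a few lines of Python. This is a sketch only: the function name, the `ifName` label and the sample values are invented, and label sets are represented as tuples of (name, value) pairs.

```python
def seconds_since_last_change(sys_uptime, if_last_change):
    """Pair each ifLastChange series with the one sysUpTime series sharing its instance label."""
    result = {}
    for labels, last_change in if_last_change.items():
        instance = dict(labels)["instance"]
        if instance in sys_uptime:
            # The "many" side (right) keeps its own labels; values are hundredths of a second.
            result[labels] = (sys_uptime[instance] - last_change) / 100
    return result


# Hypothetical data: one sysUpTime per device, one ifLastChange per interface.
sys_uptime = {"router1": 500000}
if_last_change = {
    (("instance", "router1"), ("ifName", "Gi0/1")): 200000,
    (("instance", "router1"), ("ifName", "Gi0/2")): 499000,
}
for labels, secs in seconds_since_last_change(sys_uptime, if_last_change).items():
    print(dict(labels), secs)
```

There is one sysUpTime per device but many ifLastChange series, which is why the expression needs `group_right`: the result has one series per interface.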

[prometheus-users] Re: smokeping_prober - $(target:raw) - help with ":raw" and how to use multiple targets

2024-01-11 Thread 'Brian Candler' via Prometheus Users
This is a question about Grafana and/or the smokeping_exporter Grafana dashboard, not Prometheus. ${target:raw} is a Grafana variable expansion, and the :raw suffix is a format specifier: https://grafana.com/docs/grafana/latest/dashboards/variables/variable-syntax/#variable-syntax

[prometheus-users] Re: Prom Mailing list: Why "lint error 1 duplicate rule(s) found."?

2024-01-11 Thread 'Brian Candler' via Prometheus Users
Those expressions are different. I suspect you have two alerts with the same name. Try "grep alert: xxx-winserver-rules.yml" On Thursday 11 January 2024 at 04:36:59 UTC Jason wrote: > Hi > > Happy 2024 to all > > I get this error with "promtool check config" > > > Checking
