Re: [prometheus-users] logs for Prometheus and Alertmanager

2020-02-24 Thread Christian Hoffmann
Hi, On 2/24/20 11:02 PM, rs vas wrote: > How to configure Prometheus and Alertmanager write logs into a file? > > I could not find any thing as argument to pass > https://github.com/prometheus/alertmanager/blob/master/cmd/alertmanager/main.go#L186 Those binaries log to stderr, as do many other mo

Re: [prometheus-users] Alertmanager configuration

2020-02-25 Thread Christian Hoffmann
Hi, On 2/25/20 10:42 AM, Karthike Ezhilarasan wrote: > I've recently configured prometheus and alertmanager with a slack > webhook, everything works fine but the notifications I get in slack uses > the same text for firing and resolved making it look a bit odd. > > [FIRING:1] warning@my_serve

Re: [prometheus-users] Re: Upgrade Prometheus Version

2020-02-26 Thread Christian Hoffmann
Hi, Side note: Looks like you are running Prometheus as root. This should not be necessary and based on common security principles such as Least Privilege, you should probably look into changing this. Just creating a user, chowning your data files and starting Prometheus with the new user shou

Re: [prometheus-users] Re: Inserting a label value into e-mail subject

2020-03-08 Thread Christian Hoffmann
On 3/8/20 6:57 AM, Bibin John wrote: > did you get solution for this? if yes, what was the solution? Brian had replied to the initial mail. Do you have any issues with implementing this? If so, try describing your issue with some more details so that someone can help. Kind regards, Christian --

Re: [prometheus-users] Dynamic Email Subject in Alert Manager

2020-03-08 Thread Christian Hoffmann
Hi, (sorry, just found this mail after I replied to your other mail) On 3/7/20 8:49 PM, Bibin John wrote: > I have few alerts configured in promertheus/alertmanager. I want to > dynamically change email subject based on group name. how is it > possible? lets say below is one of the entry. > so in

Re: [prometheus-users] [ALERTMANAGER][ERROR] err="Post : x509: certificate signed by unknown authority"

2020-03-08 Thread Christian Hoffmann
Hi, On 3/7/20 6:01 PM, BDT wrote: > I have a problem to send alerts to slack via webhook. I have a traefik > proxy and alertmanager which run in docker swarm. > So the communication between prometheus and alert is done by docker > network service (alermanager:9093). > > Traefik generates certfica

Re: [prometheus-users] Slower subquery

2020-03-08 Thread Christian Hoffmann
Hi, On 3/5/20 7:07 PM, Vishnu B wrote: > i am using subqueries to get the 95th percentile data for the network > traffic. But it is slower to excute with 30days of interval. > is it a best idea to use recording rules for this, so that it will be > faster? Recording rules are probably not faster p

Re: [prometheus-users] How can I define a different email subject for first notification

2020-03-11 Thread Christian Hoffmann
Hi, On 3/11/20 2:45 PM, eyal trigerman wrote: > I would like to configure a different template for email subject: > one template for the first notification for an alert > other template for repeating notifications for an alert I'm not sure if Alertmanager exposes this state for usage in the templ

Re: [prometheus-users] HTTP API request for querying multiple metrics

2020-03-11 Thread Christian Hoffmann
Hi, On 3/11/20 11:00 PM, Hakim Kahlouche wrote: > Is it possible to fetch time-series for multiple metric names using one > single HTTP API query request, or is there a restriction of one HTTP > request per metric name? > > Let's say I want to query : > > *query=go_memstats_gc_cpu_fraction _and_

Re: [prometheus-users] Best method to attach custom collector to Node-exporter

2020-03-14 Thread Christian Hoffmann
Hi Ankit, On 3/12/20 7:46 AM, Ankit Rohilla wrote: > Hi, I've created a few of my own collectors for Node-exporter. To > build it I have to run the dockerfile again and build my own custom > node-exporter image. Is there any mechanism by which I don't have to > build the node-exporter but instead

Re: [prometheus-users] Re: How to hide auth password writing in alertmanager for alerting

2020-03-14 Thread Christian Hoffmann
On 3/12/20 11:14 AM, mohd wrote: > The one who has direct access to the filesystem of the docker container. > And also want to know at ubuntu level  for filesystem. I fully agree with what Brian said, just want to add another opinion: I commonly see this as a mis- or overinterpretation of securit

Re: [prometheus-users] Need help with setting up process -exporter ,are there any video avaible explaining it.

2020-03-14 Thread Christian Hoffmann
Hi, On 3/13/20 7:55 AM, Pooja Chauhan wrote: > Need help with setting up process -exporter ,are there any video avaible > explaining it. Are you referring to this project? https://github.com/ncabatoff/process-exporter I don't know of a video tutorial. But maybe someone can help you if you describe

Re: [prometheus-users] Re: Print Breaching Threshold in the Description of Alert

2020-03-14 Thread Christian Hoffmann
On 3/13/20 9:43 AM, Brian Candler wrote: > When you write an alert rule expression like this: > > expr: foo > 200 > > it's just a single promQL query.  Starting with the universe of > timeseries available, prometheus filters them down to just those where > the metric name is "foo" and the value i

Re: [prometheus-users] Enabling CAdvisor's default disabled metrics

2020-03-14 Thread Christian Hoffmann
Hi, On 3/13/20 11:20 AM, Ankita Khot wrote: > How to enable the metrics that are by default disable for the cAdvisor > like container_network_tcp_usage_total through yml file ?  I don't know cadvisor details at all, but it looks like those metrics are disabled by default: https://github.com/goog

Re: [prometheus-users] Getting the dynamic threshold to print in the alert.

2020-03-14 Thread Christian Hoffmann
Hi, On 3/13/20 1:55 PM, Yagyansh S. Kumar wrote: > Hi. I am using Number of Cores of a server as the threshold for CPU > Load. It is working fine but I want to print the number of cores also in > the alert. > > Configured Alert: > >   - alert: HighCpuLoad >     expr: (node_load15 > count without

Re: [prometheus-users] Aggregation Metrics - Found duplicate series for the match group (How delete a label before join metrics ?)

2020-03-14 Thread Christian Hoffmann
Hi, On 3/10/20 11:33 PM, BDT wrote: > Today I have a problem about my rules expression because I try to join > metrics together to get the name of the swam node in it. To simplify: This is about "joining" some real-world metric with a meta metric to get additional labels and the problem being th

Re: [prometheus-users] "All" value problem across dashboards in Grafana while using Prometheus as Datasource.

2020-03-14 Thread Christian Hoffmann
On 3/13/20 9:32 PM, Yagyansh S. Kumar wrote: > Hi. In one of the dashboards(Say D-1), I have created a variable called > cluster and it has "All" option enabled. I have the same variable in > another dashboard(Say D-2) and there are links of D-2 in D-1 at > different places according to the value o

Re: [prometheus-users] How can I define a different email subject for first notification

2020-03-14 Thread Christian Hoffmann
and ask again if you hit a specific showstopper. Maybe other people can also chime in. https://prometheus.io/docs/alerting/notifications/#alert https://golang.org/pkg/text/template/ Kind regards, Christian > On Wednesday, 11 March 2020 at 23:01:54 UTC+2, Christian Hoffmann wrote: > &

Re: [prometheus-users] "All" value problem across dashboards in Grafana while using Prometheus as Datasource.

2020-03-14 Thread Christian Hoffmann
On 3/14/20 1:44 PM, Yagyansh S. Kumar wrote: > Grafana's community is dead. Hrm, sad :( > Might there be some escaping problem in the outgoing URLs? -> Um, how > exactly do I conclude this? When you select "All" in the target dashboard (D-2?), the URL of that page also changes, right? Does this wo

Re: [prometheus-users] Re: Print Breaching Threshold in the Description of Alert

2020-03-14 Thread Christian Hoffmann
ic number, e.g. 200, or the value of a timeseries such as foo_threshold) into the alert description or the summary. Kind regards, Christian > On Sat, Mar 14, 2020 at 4:56 PM Christian Hoffmann > mailto:m...@hoffmann-christian.info>> wrote: > > On 3/13/20 9:43 AM, Brian Cand

Re: [prometheus-users] Re: Print Breaching Threshold in the Description of Alert

2020-03-14 Thread Christian Hoffmann
Hi, On 3/14/20 9:13 PM, Aditya Nageswar wrote: > I want the threshold to be as a variable, I dont want to use it in > static way. > > FYI, Similar to  {{ $value }} I would like to know if there is something > like {{ $threshold }} to print my alert threshold. I don't think there is some special

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Christian Hoffmann
Hi, On 3/14/20 4:32 PM, Yagyansh S. Kumar wrote: > Hi. In my prometheus.yml file all the targets necessarily have 2 labels > viz "cluster" and "node". [...] > > Now, for a particular node value(Let it be A which belongs to X cluster) > I want this threshold to be 2*NumberofCores. How can I do thi

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Christian Hoffmann
On 3/14/20 5:06 PM, Yagyansh S. Kumar wrote: > Can you explain in a little detail please? I'll try to walk through your example in several steps: ## Step 1 Your initial expression was this: (node_load15 > count without (cpu, mode) (node_cpu_seconds_total{mode="system"})) * on(instance) group_left

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Christian Hoffmann
On 3/14/20 10:01 PM, Yagyansh S. Kumar wrote: > Also, since you mentioned hanging network filesystem, is there any > way/logic to find out whether my NFS mount is hanged on a machine or > not? I have busted my ass on getting this result, must have tried more > than 50 things but still have nothing

Re: [prometheus-users] Giving different Dynamic Thresholds for the same alert.

2020-03-14 Thread Christian Hoffmann
Hi, On 3/14/20 10:35 PM, Yagyansh S. Kumar wrote: > Yes, I did experiment with node_filesystem_device_error earlier based on > Ben's suggestion on my earlier thread, but not extensively. Also, I > didn't know it is Statfs success. With what I have read so far on this > matter, statfs is the best w

Re: [prometheus-users] Re: Using Regex in the Annotations of Alert.

2020-03-15 Thread Christian Hoffmann
Hi, On 3/15/20 10:07 AM, Yagyansh S. Kumar wrote: > Thanks for the quick response. I appreciate your advice and I know that > instance label shouldn't contain the port number and that should be the > ideal way forward, but now my setup is huge. I'll have to change things > over all my dashboards a

Re: [prometheus-users] Re: Using Regex in the Annotations of Alert.

2020-03-15 Thread Christian Hoffmann
On 3/15/20 3:40 PM, Yagyansh S. Kumar wrote: > Thanks a lot, Christian. Will try them out and report back. > Also, according to you will the Step 3 add any significant overhead? I > mean will it cause any kind of slowness? I don't think it would cause slowness per-se as the cardinality will be in

Re: [prometheus-users] Is up metric reliable?

2020-03-16 Thread Christian Hoffmann
Hi Steve, On 3/16/20 9:09 PM, Steve wrote: > I have been playing with the up metric to see if it is reliable. > > So far, I conclude it is not. Do you have the same results? I guess I would have the same (technical) results, but my conclusion would be that "up" works as expected and is reliable.

Re: [prometheus-users] Re: what is the safe/best way to restart Prometheus service quickly without any errors

2020-03-16 Thread Christian Hoffmann
On 3/16/20 7:33 PM, rs vas wrote: > Thanks all for all the options where we can easily reload the updated > configurations. > > That does mean, we should not restart the service, if we restart the > issues I have mentioned in the email are expected? You should also ensure that Prometheus shuts do

Re: [prometheus-users] Calculating Availability SLA over multiple VMs

2020-03-16 Thread Christian Hoffmann
Hi, On 3/16/20 9:21 PM, Debashish Ghosh wrote: >   I am currently using spring's actuator/micrometer to spit out metrics > that are scraped by prometheus. > The framework generates a metric called *process_uptime_seconds* which > is the number of seconds my app is running in a VM . I have *2 VMs*

Re: [prometheus-users] scrape metrics for applications sitting behind a load balancer

2020-03-17 Thread Christian Hoffmann
Hi, On 3/17/20 10:02 AM, Eswar Rao Bevara wrote: > I have a spring boot application running on two vms and sitting behind a > load balancer. when I try to scrape using the application url/insights , > each time the call goes to one of the two vms. I want to have the > metrics scrapped from each vm

Re: [prometheus-users] Getting the Age of alerts.

2020-03-17 Thread Christian Hoffmann
Hi, On 3/17/20 10:33 AM, Yagyansh S. Kumar wrote: > Hi. I want to extract the age of the alerts(i.e from when the alert is > Critical or Warning or even Resolved). Is this possible? Where are you looking for that? Within Prometheus? You could try deriving that from the ALERTS meta metric which g

Re: [prometheus-users] Saving alerts in database and making dashboards out of it

2020-03-20 Thread Christian Hoffmann
Hi, On 3/20/20 10:02 PM, mohd wrote: > I am running prometheus setup on docker. > I have set some alerts rules and we are getting alerts - email and slack. > > 1. Want to set some conditions in Alertmanager conf so that unwanted > alerts should get silence - duplicate alerts etc. I could configur

Re: [prometheus-users] Writing an email with Alert manager

2020-03-24 Thread Christian Hoffmann
Hi, On 3/24/20 10:25 AM, Big Noob wrote: > Hello everyone, > I have an SMTP for testing, I see that AlertManager is creating a > connection with it (I have done all the necessary configuration), I want > to know how can I create a custom mailing. > Thanks for your time. Have a look at the alertma

Re: [prometheus-users] AlertManager firing duplicate alerts

2020-03-25 Thread Christian Hoffmann
Hi, you seem to be using external_labels without alert_relabel_configs to drop this label from your alerts again. Therefore, your alerts will have different labels and will not be de-duplicated. See this blog post: https://www.robustperception.io/high-availability-prometheus-alerting-and-notifica

Re: [prometheus-users] Re: Data retention policy

2020-03-26 Thread Christian Hoffmann
Hi, not sure if this is what you are hitting, but the following may help: The retention time defines how long Prometheus is expected to keep history. This does not imply that data older than this interval will vanish immediately. In other words: Once the interval has passed, Prometheus is free

Re: [prometheus-users] Prometheus - how to get systemd service level open/max fds limits?

2020-03-27 Thread Christian Hoffmann
Hi, On 3/27/20 5:30 PM, rs vas wrote: > Thanks Brian for the input! Yes, tried systemd collector but did not see > the limits exposed through that collector. Pretty sure the ncabatoff's process_exporter exposes such metrics: https://github.com/ncabatoff/process-exporter/ However, there is no dire

Re: [prometheus-users] Node Exporter & Systemd

2020-03-28 Thread Christian Hoffmann
Hi Jonathan, On 3/28/20 2:20 AM, Jonathan Sloan wrote: > I seem to be having some troubling getting NE to ignore a few devices > 'sr0/sda[1-3]. I've tried multiple things to try and get this work, > exact for manually telling it to ignore sr0|sda1|sda2|sda3 etc. When > using the below command. >  

Re: [prometheus-users] Re: Documentation missing: Howto set exporters port?

2020-03-30 Thread Christian Hoffmann
Hi, the command line help contains the hint for the relevant flag: $ blackbox_exporter --help |& grep listen --web.listen-address=":9115" The address to listen on for HTTP requests. Kind regards, Christian On 3/30/20 1:31 PM, Boris Kairat wrote: > Sorry for that - I did not remem

Re: [prometheus-users] PROMETHUS FAILURE

2020-03-30 Thread Christian Hoffmann
Hi, first, your mail does not seem to be readable in my mail client (Thunderbird) -- it's black text on black background. You may want to adjust this to reach a larger audience. :) I don't see any obvious errors in your screenshot. I suggest looking at the full log output (journalctl -u prometheu

Re: [prometheus-users] how can I monitor custom systemd services?

2020-03-31 Thread Christian Hoffmann
Hi, On 3/31/20 10:56 PM, Joey Jojo wrote: > I have a node exporter setup and installed in a Linux Ubuntu server and > everything works fine. I've had to setup a few different custom SystemD > services located in /etc/systemd/system/ and I'd like to know how I can > whitelist them into the node_ex

Re: [prometheus-users] Getting status of services in CentOS 6.

2020-03-31 Thread Christian Hoffmann
Hi, On 3/31/20 12:22 PM, Yagyansh S. Kumar wrote: > Hi. We have systemd collector in Prometheus that enables us to get the > metrics - status of services from systemd. This works well in CentOS 7. > But CentOS 6 does not have systemd, hence, the collector keeps throwing > error. Is there any alter

Re: [prometheus-users] probe_ssl_earliest_cert_expiry showing wrong expiry date

2020-03-31 Thread Christian Hoffmann
Hi Amjad, blackbox_exporter's probe_ssl_earliest_cert_expiry outputs exactly what the name says -- the *earliest* cert expiry, i.e. when this certificate will become invalid as seen from a user/browser/client validating this cert. This is not necessarily identical to the value of the end of validi

Re: [prometheus-users] how can I monitor custom systemd services?

2020-04-01 Thread Christian Hoffmann
ose >custom services. >Any idea why it doesn't show up in Prometheus when I query it? > > >On Tuesday, March 31, 2020 at 5:15:46 PM UTC-4, Christian Hoffmann wrote: >> >> Hi, >> >> >> On 3/31/20 10:56 PM, Joey Jojo wrote: >> > I have a no

Re: [prometheus-users] Node-Exporter version compatibility

2020-04-03 Thread Christian Hoffmann
Hi, On 4/3/20 5:49 PM, Кирилл Новиков wrote: > Are new versions of Node-Exporter >  for Prometheus compatible > to older versions? Is someone automatically update this exporter without > any check's?  Technically, node_exporter is still pre-1.0 (1.0 is

Re: [prometheus-users] Re: Data getting corrpted - missing meta.json

2020-04-03 Thread Christian Hoffmann
Hi Robin, On 4/3/20 8:36 PM, Robin Pharaoh wrote: > Was this ever resolved? We are hitting a the exact same issue right now. > > We have a single instance of prometheus > We are using Azure File Share with a volume claim I don't have any Azure knowledge, but I assume this is a SMB-based mount? T

Re: [prometheus-users] AlertManager firing duplicate alerts

2020-04-03 Thread Christian Hoffmann
Hi Sunil, On 4/3/20 6:00 PM, Sagar wrote: > For more information , there are more global labels (such as > environment, cluster , etc) apart from replica . > > Do I need to remove other global variables and maintain only one label > as replica . You need to drop those labels which are used to d

Re: [prometheus-users] AlertManager firing duplicate alerts

2020-04-04 Thread Christian Hoffmann
On 4/4/20 6:42 PM, Sagar wrote: >      relabel_configs: >      - action: drop >        source_labels: [replica] >        regex: (.*) [...] > > For other server, replica is secondary  > I want to drop only label in alert manager , but it drops entire alert > in alert_manager.  Ah, now I see the pr
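
For illustration, a minimal sketch of the distinction this thread turns on: in relabeling, `action: drop` discards the whole alert when the regex matches, whereas `action: labeldrop` removes only the labels whose names match. The sketch assumes the label is called replica, as in the quoted config, and that the block sits under alerting in prometheus.yml. It is not necessarily the exact fix given in the truncated reply.

```yaml
alerting:
  alert_relabel_configs:
    # labeldrop matches the regex against label *names* and removes
    # those labels; the alert itself is kept and sent on.
    - action: labeldrop
      regex: replica
```

With the replica label removed on both servers, the alerts they send carry identical label sets, which is what Alertmanager needs in order to de-duplicate them.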

Re: [prometheus-users] AlertManager Silence Regex

2020-04-04 Thread Christian Hoffmann
Hi, On 4/4/20 6:54 PM, Adso Castro wrote: > I'm trying to create a silence using regex from the AlertManager UI and > I want to silence alarms from deployments (from my Kubernetes cluster) > like below: > > *new-batch-stream-**g1343* > *super-batch-pipeline-g1273* > * > * > The match would be the

Re: [prometheus-users] Re: Making prometheus configurable based on instance type

2020-04-10 Thread Christian Hoffmann
On 4/10/20 9:35 AM, adi garg wrote: > Hey guys, please help. Do you need any other specifications? Please be patient. Lots of Prometheus development happens in people's free time. Support is also provided by volunteers. Besides, Brian Candler even replied to your question before you sent your remi

Re: [prometheus-users] Trying to identify PODs with NO LIMITS

2020-04-11 Thread Christian Hoffmann
Hi Carlos, On 4/12/20 4:06 AM, Carlos Mercado wrote: > What I was trying is first to get all the pods running in the cluster > with query:  > > sum  by (pod) (kube_pod_info) > > Then I am getting the list of pods with limits set: > > sum by (pod)(kube_pod_container_resource_limits{resource="cpu

Re: [prometheus-users] [Promql] How do I know if a value hasn't changed in a while?

2020-04-11 Thread Christian Hoffmann
Hi, On 4/12/20 1:05 AM, Adso Castro wrote: > That's the question, there's a metric: *jobs_sent 1728* > I want to know if that value hasn't changed in 1h for example. > How do I do that? changes(jobs_sent[1h]) == 0 could do the trick. In many cases it might be easier to add an additional metric w
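
As a rough sketch, the `changes(jobs_sent[1h]) == 0` expression from this reply could be wrapped in an alerting rule like the one below. The group name, alert name, `for` duration, severity label and annotation text are hypothetical placeholders, not taken from the thread.

```yaml
groups:
  - name: jobs-example            # hypothetical group name
    rules:
      - alert: JobsSentStalled    # hypothetical alert name
        # changes() counts how often the sample value changed within
        # the range; 0 means jobs_sent stayed constant for the last hour.
        expr: changes(jobs_sent[1h]) == 0
        for: 5m                   # assumed grace period before firing
        labels:
          severity: warning       # assumed label
        annotations:
          summary: "jobs_sent has not changed for 1h"
```

Note that the expression returns nothing at all if the jobs_sent series is missing, so a disappearing metric would need a separate check.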

Re: [prometheus-users] How Alertmanager can receive alerts from source other than prometheus server?

2020-04-11 Thread Christian Hoffmann
Hi, On 4/12/20 8:32 AM, Adarsh Kumar Pandey wrote: > I just wanted to know that is it possible to get alerts generated from > sources other than Prometheus alerting rules ? can you please point me > to the documentation of that. In general, this is possible, yes, see here: https://prometheus.io/d

Re: [prometheus-users] Re: inhibit all severity but how??

2020-04-15 Thread Christian Hoffmann
Hi, you can just create a silence in the Alertmanager UI which matches label alertname and (regexp) value .*. This can also be done programmatically via the HTTP API (or amtool, which makes use of it). In fact, this is how we tied our pre-existing maintenance mode logic to Alertmanager. Kind re

Re: [prometheus-users] probe_ssl_earliest_cert_expiry showing wrong expiry date

2020-04-23 Thread Christian Hoffmann
Hi, On 4/23/20 11:37 AM, James Eduard Andaya wrote: > Does this resolved? same issue i am facing with it. What do you mean by resolved? In this specific case, a certificate within the chain expired earlier than the "last" certificate (i.e. domain certificate). Therefore, blackbox_exporter report

Re: [prometheus-users] suggest document for alert manager

2020-04-25 Thread Christian Hoffmann
Hi, On 4/25/20 3:30 AM, sunil sagar wrote: > I am working on alert-manager , very new to this .  > I want to write query for if data is not received on certain topic/feed > within one hour , it should send mail to a particular group , such kind > of alerting mechanism I have to implement .  This

Re: [prometheus-users] Prometheus returns 502 error

2020-04-27 Thread Christian Hoffmann
Hi, On 4/27/20 3:40 PM, Dmitry wrote: > When I try to use snmp_exporter > with a non-standard > configuration with my standard Prometheus, the config is not loaded and > I get 502 error. > The project developer directed me to you, maybe someone had a s

Re: [prometheus-users] Best way to calculate the increase in the counter value over a time period?

2020-04-30 Thread Christian Hoffmann
Hi, On 5/1/20 3:57 AM, O wrote: > I am using increase() function to calculate the increase in counter > value over a time period. These values are being displayed in a table in > Grafana. But, for a duration of 15 days or so, it errors out because the > number of samples that are being pulled is t

Re: [prometheus-users] Promethous not gathering metrics until added to a existing docker bridge network.

2020-04-30 Thread Christian Hoffmann
On 4/30/20 4:21 AM, H T wrote: > Is it mandatory to add Prometheus docker container to an existing bridge > network for extracting metrics ? > I Have an existing docker bridge network with few containers and i am > trying to extract metrics from them. But prometheus only works when i > add it to th

Re: [prometheus-users] Some questions about `stale` series data

2020-05-18 Thread Christian Hoffmann
Hi, On 5/18/20 12:38 PM, zichen chuh wrote: > https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ : > >> For pending and firing alerts, Prometheus also stores synthetic time > series of the form ALERTS{alertname="", > alertstate="pending|firing", }. The sample > value is se

Re: [prometheus-users] Monitor specific application process in Linux

2020-05-19 Thread Christian Hoffmann
Hi Juan, On 5/19/20 9:11 AM, Juan Rosero wrote: > I've been reading a lot on different sites and this User Group as well, > but have not come up with a clear answer. I need to monitor a specific > application process in Linux and verify if it's running and I've been > reading about *--collector.pro

Re: [prometheus-users] derive alert severity from other labels

2020-05-19 Thread Christian Hoffmann
Hi Roland, On 5/19/20 10:25 AM, Roland Mieslinger wrote: > we are using the same set of alert rules for both, our production and qa > environment, with the severity label set to a value based on what is > appropriate for production. > As a consequence, alert severity is too high for most alerts in

Re: [prometheus-users] server returned HTTP status 500 Internal Server Error

2020-05-19 Thread Christian Hoffmann
On 5/19/20 1:55 PM, Valliappan RM wrote: > Trying to monitor fortigate Firewall > Getting this error -server returned HTTP status 500 Internal Server Error > Trying based on this > https://grafana.com/grafana/dashboards/7567 I guess this error message comes from Prometheus' targets UI? Have you t

Re: [prometheus-users] How to generate a GUID or current time for my annotations in Prometheus alerting rule templates

2020-05-20 Thread Christian Hoffmann
Hi, On 5/20/20 12:26 PM, zichen chuh wrote: > {{ with query "time()" }}{{ . | first | value }}{{ end}} > > This expression is exactly what I need.  > And I wanna learn how to assemble such an expression. After searching > through this document template  , >

Re: [prometheus-users] Why does prometheus automatically set port 80 and 443 in the URL, when you set Scheme?

2020-06-01 Thread Christian Hoffmann
Hi, On 5/29/20 11:11 AM, Donato Bagarozza wrote: > I tried it with relabel_config, but i can`t delete the port. It should be possible to do. What have you tried exactly? You should be able to match on "([^:]+):\d+" and use ${1} to reference the part without the port. Just to confirm: You want to
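
A possible shape for the relabel rule described here, shown only as a sketch: it copies the host part of the address into a label while leaving the port out. Reading from `__address__` and writing to `instance` are assumptions for illustration; the thread does not say which labels were involved.

```yaml
relabel_configs:
  - source_labels: [__address__]   # assumed source label
    # Prometheus anchors relabel regexes, so this must match the whole
    # value: group 1 is everything before the colon, \d+ is the port.
    # Single quotes keep YAML from interpreting the backslash.
    regex: '([^:]+):\d+'
    target_label: instance         # assumed target label
    replacement: '${1}'
```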

Re: [prometheus-users] Exporter to monitor Oracle installed in AIX

2020-06-01 Thread Christian Hoffmann
Hi, On 5/29/20 1:09 PM, Nageswara Rao wrote: > We could not find an exporter to monitor Oracle instance installed in > AIX-7.2. Any help is much appreciated. My experience is that the resources regarding commercial UNIX support such as AIX are rather sparse. However, as you are talking about mon

Re: [prometheus-users] Is there any data loss when we upgrade from 2.1.0 to 2.18.0?

2020-06-01 Thread Christian Hoffmann
Hi, On 5/31/20 9:51 AM, 'Nandeesh Bharamagoudra' via Prometheus Users wrote: > Need a quick info, we are going to upgrade Prometheus in production from > v2.1.0 to v2.18.0. My gut feeling says: It should work. However, I am pretty sure that there is at least one (if not more) change which is for

Re: [prometheus-users] Federate data with prometheus

2020-06-01 Thread Christian Hoffmann
Hi, On 5/28/20 5:45 PM, Jacco Steur wrote: > In my HA prometheus instance I can then scrape the /federate uri's of my > various kubernetes clusters. > This configurations looks a bit like this: (From the manual) > > scrape_configs: >   - job_name: 'federate' >     scrape_interval: 15s >     honor

Re: [prometheus-users] file_sd_configs and renaming mertics

2020-06-04 Thread Christian Hoffmann
Hi, On 6/4/20 8:03 PM, Hossman12 wrote: > I'm using file_sd_configs on my remote prometheus instances nad they are > being federated into a central location.   The federation job currently > filters out all the "go" metrics.  I have an application team that wants > to rename the "go" metrics to sa

Re: [prometheus-users] Re: (Alertmanager) Ignore instance label to prevent same alert multiple times

2020-06-04 Thread Christian Hoffmann
Hi, On 6/4/20 7:48 PM, 'ping...@hioscar.com' via Prometheus Users wrote: > We get the same alert multiple times in the same email, because the > monitor label (prometheus instance) being different for our simple > replicated setup. Would be nice to be able to ignore certain labels so > that alert

Re: [prometheus-users] Alertmanager: how to split alerts in email notifications

2020-06-04 Thread Christian Hoffmann
Hi, On 6/5/20 8:20 AM, anya...@gmail.com wrote: > Thank you, Harald, but it's a customer's requirement. 250 alerts are at > the test process, after testing they should be much less. The main > question now: how to split one email to one alert per email? Shouldn't a simple group_by: ['...'] be suf
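
For reference, a minimal Alertmanager route using the `group_by: ['...']` setting mentioned in this reply might look as follows. The special value `...` groups by all labels, so each distinct alert ends up in its own group and therefore in its own notification. The receiver name and address are hypothetical.

```yaml
route:
  receiver: team-email       # hypothetical receiver name
  # '...' means "group by every label", which effectively disables
  # aggregation: one notification per distinct alert.
  group_by: ['...']

receivers:
  - name: team-email
    email_configs:
      - to: 'team@example.org'   # placeholder address
```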

Re: [prometheus-users] Alertmanager: how to split alerts in email notifications

2020-06-05 Thread Christian Hoffmann
Hi, On 6/5/20 9:23 AM, anya...@gmail.com wrote: > Hi, Christian! > Thank you for your answer, but could you a little bit explain your > point? I don't understand it. I used 'group_by' in my config. If I > understood you correctly, you offered to comment the 'group_by' block, > don't you? You set

Re: [prometheus-users] How to generate a GUID or current time for my annotations in Prometheus alerting rule templates

2020-06-05 Thread Christian Hoffmann
Hi, On 6/5/20 9:48 AM, zichen chuh wrote: > I still wanna generate a GUID. After going through these documents, > couldn't find any. > Wonder u have a clue. Haven't tried it, but you could check if you can access the alert's fingerprint and if this is what you want. This seems to be possible in

Re: [prometheus-users] Prometheus dashboard loads however in target section promtheus isn't UP.

2020-06-05 Thread Christian Hoffmann
Hi, On 6/5/20 11:45 AM, Isabel Noronha wrote: > I have been working on prometheus for the past 3 months. > This is the first time where I have come across such a problem where > prometheus is running inside a container. > However, when I check the dashboard it says"context deadline > exceeded".It

Re: [prometheus-users] metrics by process

2020-06-09 Thread Christian Hoffmann
Hi, On 6/9/20 3:39 PM, Laura Fernández Becerra wrote: > I would like to know if is it possible by using Prometheus to record the > CPU, RAM, net I/O of every process in a system for an interval of time. > Thanks in advance! Something like that might be (almost) possible with a process_exporter su

Re: [prometheus-users] Different rules for different targets

2020-06-09 Thread Christian Hoffmann
Hi, On 6/9/20 4:32 PM, Frederic Arnould wrote: > I would like to do different rules for different targets Are you talking about alert and/or recording rules in rule_files? These rules are always global. However, each rule can be restricted to specific series only. Such restrictions can be made up

Re: [prometheus-users] Different rules for different targets

2020-06-09 Thread Christian Hoffmann
ob_name: production >   static_configs: >   - targets: >      *toto1:port* > global: >   evaluation_interval: 60s >   scrape_interval: 60s > rule_files: > *- rules.d/disk.yml* > scrape_configs: > - job_name: prometheus >   static_configs: >   - targets: >    

Re: [prometheus-users] Graphing target status

2020-06-09 Thread Christian Hoffmann
Hi, shouldn't this just be the data from the simple query for the "up" metric? It would include the job name (e.g. job="node") as well as the availability status (0 vs. 1). You can handle the translation of the value 0/1 to Offline/Online within Grafana, I think. Kind regards, Christian On 6/9/

Re: [prometheus-users] How to specify lookback using Prometheus Operator

2020-06-11 Thread Christian Hoffmann
Hi, On 6/9/20 12:16 AM, Shallow Purple wrote: > We have certain metrics which are aggregated and  for which datapoint > comes in every 10 min interval or 1 hour interval.  > So, I ask for data for right now, there might not be any but few minutes > later, one datapoint will arrive.  Hence, we want

Re: [prometheus-users] generatorURL field in AlertManager

2020-06-12 Thread Christian Hoffmann
Hi, sounds like you are looking for this command line parameter? --web.external-url= The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL has a

Re: [prometheus-users] vector cannot contain metrics with the same labelset

2020-06-12 Thread Christian Hoffmann
Hi, On 6/12/20 1:07 AM, li yun wrote: > I execute the following query > | > label_replace(sum_over_time(augmento{coins="xxx"}[20m]),"instance","","","") > | > > Got the following error: > > vector cannot contain metrics with the same labelset > > What am I doing wrong? thank you very much for
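One plausible reading of the error in the quoted question, sketched under assumptions: if several `augmento{coins="xxx"}` series differ only in their `instance` label, blanking that label with `label_replace` leaves them with identical label sets, which Prometheus rejects. The series below are invented for illustration, and the code only models label sets as Python dicts; it is not how Prometheus implements this.

```python
from collections import Counter

# Hypothetical series that differ only in the `instance` label.
series = [
    {"coins": "xxx", "instance": "host1:8080"},
    {"coins": "xxx", "instance": "host2:8080"},
]


def drop_label(labels, name):
    """Return a copy of a label set without the given label.

    In Prometheus, setting a label to the empty string removes it.
    """
    return {k: v for k, v in labels.items() if k != name}


def duplicate_labelsets(all_series):
    """Return the label sets that occur more than once."""
    counts = Counter(frozenset(s.items()) for s in all_series)
    return [dict(ls) for ls, n in counts.items() if n > 1]


stripped = [drop_label(s, "instance") for s in series]
print(duplicate_labelsets(series))    # nothing collides while `instance` is present
print(duplicate_labelsets(stripped))  # both series now share one label set
```

If that is indeed the cause, an aggregation such as `sum without (instance) (...)` merges the colliding series instead of leaving duplicates behind.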

Re: [prometheus-users] Scripts for Windows Metrics - Required

2020-06-12 Thread Christian Hoffmann
Hi, you are probably looking for the windows_exporter, which allows you to gather the necessary metrics: https://github.com/prometheus-community/windows_exporter Once this is set up and scraped by Prometheus, you can set up alert rules: https://prometheus.io/docs/prometheus/latest/configuration

Re: [prometheus-users] Pushing 3rd party VNF metrics into node exporter

2020-06-16 Thread Christian Hoffmann
Hi, On 6/16/20 7:42 AM, Divya Rajasekaran wrote: >    Is there any way to push 3rd party VNF metric into the node exporter Are you referring to Virtual Network Functions as described here? https://tools.ietf.org/id/draft-rosa-bmwg-vnfbench-01.html I don't think node_exporter has support for this

Re: [prometheus-users] Windows Metrics

2020-06-16 Thread Christian Hoffmann
Hi, this sounds very similar to your previous enquiry: https://groups.google.com/forum/#!msg/prometheus-users/-BZ1fpdr_Mk/Q1n1YVJvAgAJ Are you looking for something like this? https://grafana.com/grafana/dashboards/12422 Kind regards, Christian On 6/15/20 10:44 PM, Freddy Mack wrote: > Can I h

Re: [prometheus-users] Breaking down of one prometheus.yml file?

2020-06-16 Thread Christian Hoffmann
Hi, On 6/16/20 9:35 AM, pratyush ranjan wrote: > I am using Prometheus for our monitoring and I have a lot of configs > (our prometheus.yml main config file is 8000+ lines long). > > I would like to divide this out into logical groupings so that it > becomes much readable. I came to know that Pro

Re: [prometheus-users] Re: Node Exporter

2020-06-17 Thread Christian Hoffmann
Hi, On 6/17/20 4:44 PM, Yasmine Mbarek wrote: > I have a tiny problem with node exporter. If you can help me I will be > very grateful. > I have deployed node exporter across my fleet of machines; on some > machines it works fine and returns all metric values, but on other > machines it returns ever

Re: [prometheus-users] Windows Metrics

2020-06-17 Thread Christian Hoffmann
Hi, On 6/16/20 4:10 PM, Freddy Mack wrote: > Hello Chris, > > I am looking for metrics to show in Grafana like the example below, > which I have executed on Linux. I want the same metrics for Windows. > For example, for memory I have: > 100 * (windows_os_physical_memory_free_bytes{instance=~"$inst

Re: [prometheus-users] Prometheus AlertManager Alert Grouping

2020-06-18 Thread Christian Hoffmann
On 6/18/20 3:00 AM, Zhang Zhao wrote: > Hi, I have a question for alert grouping in AlertManager. I integrated > Prometheus Alerts to ServiceNow via Webhook.  I see the events were > captured on ServiceNow side as below. However, inside each of events > below, there were multiple alerts included. I

Re: [prometheus-users] Error while writing an alert rule in alert.yml file

2020-06-19 Thread Christian Hoffmann
Hi, On 6/19/20 4:20 PM, Isabel Noronha wrote: >  This is just code snippet of my alerts.yml file > - alert: ContainerKilled >     expr: IF absent(((time() - container_last_seen{name=".+"}) < 5)) >     for: 15s >     labels: >       severity: warning >     annotations: >       summary: "Container k

Re: [prometheus-users] Alertmanger "Not Grouped" alerts

2020-06-19 Thread Christian Hoffmann
Hi, On 6/19/20 8:34 AM, Romenyrr wrote: > I've come across this issue where I'm grouping by 'alertname' but > nothing is being grouped except for one odd group. When I click on the > group tab and click on "Enable custom grouping" that seems to sort > everything by 'alertname'.  > > This grouping

Re: [prometheus-users] Prometheus javamelody

2020-06-19 Thread Christian Hoffmann
Hi, On 6/16/20 12:09 PM, Shivam Soni wrote: > I got an issue in Prometheus configure to java melody. > can anyone solve this? > plz check URL: > > https://github.com/prometheus/prometheus/issues/7404 I'm seeing a small but maybe relevant difference between your Prometheus config and your test ca

Re: [prometheus-users] Prometheus 2.18 incompatibility with 2.04

2020-06-20 Thread Christian Hoffmann
On 6/20/20 5:31 PM, Johny wrote: > If it is a non-compliant endpoint, the problem should appear in both > versions, shouldn't it? It is affecting more than one series. The setup is > in a corporate org, so I cannot expose endpoints publicly. Maybe you can build a small reproducer: Grab your metrics via cu

Re: [prometheus-users] Alert handling using alertmanager even handler .

2020-06-29 Thread Christian Hoffmann
Hi, On 6/28/20 3:23 PM, Pooja Chauhan wrote: > I want to handle alerts such as a Jenkins process being down using the alertmanager > event handler, but the documentation does not explain how to configure > it. I really need help with where to download this > : https://github.com/jjneely/am-event-handler and ho

Re: [prometheus-users] Prometheus Query/alerting rules related to NFS Detach mount using node-exporter mountstats and nfs collector does not work

2020-06-29 Thread Christian Hoffmann
Hi, On 6/26/20 12:50 PM, Satyam Vishnoi wrote: > I should get alert when below given /mapr nfs mount-point get detached . > > > I am using following 2 metrics provided by node-exporter collector > mountstats and nfs . > > > Query-1 absent(node_filesystem_size_bytes { > job="exporter_node",fsty

Re: [prometheus-users] Alert handling using alertmanager even handler .

2020-06-29 Thread Christian Hoffmann
Hi, On 6/30/20 7:51 AM, Pooja Chauhan wrote: > Hi Christian, > Can you please give me the link to the official document you are referring to? This is the official documentation outlining the alert rule syntax: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ Kind regards, Christ

Re: [prometheus-users] Job label in file-based SD

2020-06-29 Thread Christian Hoffmann
Hi, On 6/26/20 10:33 AM, Björn Fischer wrote: > I was going through the guide for file-based service discovery [1] and > noticed that they are setting the job label in the targets file. That > doesn't make sense to me. Targets are not strictly job-specific and > Prometheus is setting the job label

Re: [prometheus-users] disk speed

2020-06-29 Thread Christian Hoffmann
Hi, On 6/23/20 4:45 PM, 'Metrics Searcher' via Prometheus Users wrote: > Does anyone know how to collect the disk speed, like I can do it via > hdparm or dd? I don't know of a standard solution for this. Also, your examples are performance metrics which cannot be collected passively and continuou

Re: [prometheus-users] Prometheus timeseries and table panel in grafana

2020-06-29 Thread Christian Hoffmann
Hi, On 6/23/20 5:43 PM, neel patel wrote: > I am using prometheus and grafana combo to monitor PostgreSQL database. > > Now prometheus stores the timeseries as below. > > disk_free_space{file_system="/dev/sda1",file_system_type="xfs",mount_point="/boot",server="127.0.0.1:5432"} > 9.5023104e+07 >

Re: [prometheus-users] Custom Threshold for a particular instance.

2020-06-29 Thread Christian Hoffmann
Hi, On 6/24/20 8:09 PM, yagyans...@gmail.com wrote: > Hi. Currently I am using a custom threshold in case of my Memory alerts. > I have 2 main labels for my every node exporter target - cluster and > component. > My custom threshold till now has been based on the component as I had to > define tha

Re: [prometheus-users] Merging too prometheus datasources on the same grafana dashboard

2020-06-29 Thread Christian Hoffmann
Hi, On 6/29/20 11:43 AM, Daly Graty wrote: > I have two Grafana servers: the first one monitors Kubernetes and is installed on > the master, the second is on a separate VM; both respond to ping. > I need to merge both of them in order to access them with the same URL. > I tried to add the Kubernetes Prometheus ( m
