Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-07-01 Thread Brian Candler
On Tuesday, 30 June 2020 14:40:46 UTC+1, Sébastien Dionne wrote:
>
> but is there a way to configure the scrape interval with an annotation too?
>
> I could have applications that we want to monitor every 15 seconds and 
> others at a 45-second interval or more.
>
>
You can have two different scrape jobs, one with interval 15s and one with 
interval 45s.  Use the relabeling step to drop targets which have the wrong 
annotation for that job.
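
As a rough sketch, assuming a hypothetical prometheus.io/interval annotation 
on the pods (the annotation name and the kubernetes_sd setup below are only 
illustrative, not something from your chart), that could look like:

scrape_configs:
  - job_name: 'pods-15s'
    scrape_interval: 15s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only targets whose annotation asks for the 15s interval.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_interval]
        action: keep
        regex: 15s

  - job_name: 'pods-45s'
    scrape_interval: 45s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only targets whose annotation asks for the 45s interval.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_interval]
        action: keep
        regex: 45s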



Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Sébastien Dionne
Yes, when I have labels on my pods, I receive them. Good. I think I'll be 
able to work with the AlertManager webhook.


Prometheus auto-discovers my pods because they are annotated with:

  prometheus.io/path: /metrics
  prometheus.io/port: 8080
  prometheus.io/scrape: true
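
(For context, that auto-discovery presumably comes from an annotation-based 
relabeling job roughly like the sketch below; the exact job in my Helm chart 
may differ.)

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use prometheus.io/path as the metrics path, if set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use prometheus.io/port as the scrape port, if set.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__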


But is there a way to configure the scrape interval with an annotation too?

I could have applications that we want to monitor every 15 seconds and 
others at a 45-second interval or more.



thanks 






Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Sébastien Dionne
This is the config that I have so far:


serverFiles:
  alerts:
    groups:
      - name: Instances
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 10s
            labels:
              severity: page
            annotations:
              description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
              summary: 'Instance {{ $labels.instance }} down'

alertmanagerFiles:
  alertmanager.yml:
    route:
      receiver: default-receiver
      group_wait: 5s
      group_interval: 10s

    receivers:
      - name: default-receiver
        webhook_configs:
          - url: "https://webhook.site/815a0b0b-f40c-4fc2-984d-e29cb9606840"


Here is an example of one of my pods:

Labels:   app.kubernetes.io/instance=optimizer-6e0f0a089c70
  app.kubernetes.io/name=optimizer-interface
  
  pod-template-hash=784669954d
  releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70
  service.ip=10.1.7.200

Annotations:  cni.projectcalico.org/podIP: 192.168.218.99/32
  cni.projectcalico.org/podIPs: 192.168.218.99/32
  prometheus.io/path: /metrics
  prometheus.io/port: 8080
  prometheus.io/scrape: true

I need Prometheus to check pod "health" every 10-15 seconds and 
send an alert for the pods that go up -> down and down -> up.
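
(As an aside, a sketch of one way to cover the down -> up direction while 
keeping the InstanceDown rule above unchanged: Alertmanager already sends 
"resolved" notifications for webhook receivers, controlled by send_resolved, 
which defaults to true.)

    receivers:
      - name: default-receiver
        webhook_configs:
          - url: "https://webhook.site/815a0b0b-f40c-4fc2-984d-e29cb9606840"
            # Send a notification with status "resolved" when the alert
            # clears, i.e. when the pod comes back up (default for webhooks).
            send_resolved: true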


On the side, I added a Gauge that returns the current timestamp in my 
application, and I poll Prometheus every 15 seconds to get the last 
timestamp of every application. If NOW - timestamp > 15, that means 
Prometheus wasn't able to scrape the pod in the last 15 seconds, so I 
consider that pod down. I use a query like this:

http://localhost:9090/api/v1/query?query={__name__=~".*lastTimestampScrapped"}

But if I could do the same directly with Prometheus + Alertmanager, I 
wouldn't have to query Prometheus manually myself.
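
(A sketch of how that same check could live inside Prometheus itself, 
assuming the gauge is exposed under a name matching .*lastTimestampScrapped 
and holds a Unix timestamp in seconds; the exact metric name is an 
assumption.)

serverFiles:
  alerts:
    groups:
      - name: Freshness
        rules:
          - alert: ScrapeTimestampStale
            # Fires when the self-reported timestamp is more than 15s old,
            # i.e. Prometheus has not stored a fresh value recently.
            expr: 'time() - {__name__=~".*lastTimestampScrapped"} > 15'
            for: 15s
            labels:
              severity: page
            annotations:
              summary: 'Instance {{ $labels.instance }} looks down'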









Re: [prometheus-users] Is it possible to extract labels when generating AlertManager alert ?

2020-06-30 Thread Christian Hoffmann
Hi,

On 6/25/20 8:55 PM, Sébastien Dionne wrote:
> I have a few Java applications that I'll deploy in my cluster. I need to
> know how I can detect if an instance is up or down with Prometheus.
> 
> *Alerting with AlertManager*
> 
> I have an alert that checks for "InstanceDown" and sends an alert to the
> AlertManager webhook. So when one instance is down, I'm receiving alerts
> in my application.
> 
> But how can I extract the labels that are on that instance?
What do you mean by "in that instance"?

If the label is part of your service discovery, then it should be
attached to all series from that target. This would also imply that it
would be part of any alert by default unless you aggregate it away (e.g.
by using sum, avg or something).
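
For instance (a sketch, assuming kubernetes_sd pod discovery and that
releaseUUIDGroup is a pod label as in your example; adjust to your actual
setup), the pod label can be copied onto the scraped target like this:

    relabel_configs:
      # Copy the releaseUUIDGroup pod label onto every series scraped from
      # this target, so it also shows up in alerts such as InstanceDown.
      - source_labels: [__meta_kubernetes_pod_label_releaseUUIDGroup]
        action: replace
        target_label: releaseUUIDGroup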

If the label is only part of some info-style metric, you will have to
mix this metric into your alert.
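
For example, something along these lines (only a sketch: app_info is a
placeholder for whatever info-style metric actually carries
releaseUUIDGroup, and the matching labels may need adjusting):

    - alert: InstanceDown
      expr: |
        (up == 0)
          # Join an info-style metric to copy releaseUUIDGroup onto the alert.
          # max_over_time keeps it usable for a while after the target goes away.
          * on (instance, job) group_left (releaseUUIDGroup)
        max_over_time(app_info[1h])
      for: 1m
      labels:
        severity: page
      annotations:
        description: '{{ $labels.instance }} ({{ $labels.releaseUUIDGroup }}) of job {{ $labels.job }} is down.'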

Can you share one of the relevant alert rules if you need more specific
guidance?

Note: I don't know how many releaseUUIDGroups you have, but having UUIDs
as label values might ring some alarm bells due to the potential for
high cardinality issues. :)

Kind regards,
Christian


> 
> e.g. I have a special label in all my applications that links the pod to
> the information that I have in the database:
> 
> releaseUUIDGroup=bf79b8ab-a7c1-4d27-8f3c-6e0f0a089c70
> 
> 
> Is there a way to add that information in the message that AlertManager
> sends?
> 
> Right now I configure AlertManager to send the alert to:
> https://webhook.site/#!/815a0b0b-f40c-4fc2-984d-e29cb9606840/b0dd701d-e972-48d4-9083-385e6a788d55/1
> 
> For example, I killed the pod prometheus-pushgateway
> 
> and I received this message:
> 
> {
>   "receiver": "default-receiver",
>   "status": "resolved",
>   "alerts": [
> {
>   "status": "resolved",
>   "labels": {
> "alertname": "InstanceDown",
> "instance": "prometheus-pushgateway.default.svc:9091",
> "job": "prometheus-pushgateway",
> "severity": "page"
>   },
>   "annotations": {
> "description": "prometheus-pushgateway.default.svc:9091 of job 
> prometheus-pushgateway has been down for more than 1 minute.",
> "summary": "Instance prometheus-pushgateway.default.svc:9091 down"
>   },
>   "startsAt": "2020-06-19T17:09:53.862877577Z",
>   "endsAt": "2020-06-22T11:23:53.862877577Z",
>   "generatorURL": 
> "http://prometheus-server-57d8dcc67f-qnmkj:9090/graph?g0.expr=up+%3D%3D+0=1;,
>   "fingerprint": "1ed4a1dca68d64fb"
> }
>   ],
>   "groupLabels": {},
>   "commonLabels": {
> "alertname": "InstanceDown",
> "instance": "prometheus-pushgateway.default.svc:9091",
> "job": "prometheus-pushgateway",
> "severity": "page"
>   },
>   "commonAnnotations": {
> "description": "prometheus-pushgateway.default.svc:9091 of job 
> prometheus-pushgateway has been down for more than 1 minute.",
> "summary": "Instance prometheus-pushgateway.default.svc:9091 down"
>   },
>   "externalURL": "http://localhost:9093;,
>   "version": "4",
>   "groupKey": "{}:{}"
> }
> 