Hi,

I am using the amtool client in a Job inside my cluster.

An alert fired and we got a notification in our Slack channel. I used the 
CLI (in code that runs in a Docker image from the Job) to create a 
silence with an `alertname` matcher, and it reported no failure.

Looking at the Alertmanager UI, no silence was created, and I got the 
resolved notification 5 minutes after the firing notification.

After ~10 minutes the alert fired and resolved again (5 minutes 
apart).

I wonder why the silence was not created? (It is not the first time this 
has happened.) 
Maybe it's some kind of race condition? We can't silence alerts which are 
not in the firing state, right? (Although the alert was firing when I 
tried to create the silence.)

The Alert rule:
name: Orchestrator GRPC Failures for ExternalProcessor Service
expr: sum(increase(grpc_server_handled_total{grpc_code!~"OK|Canceled",grpc_service="envoy.service.ext_proc.v3.ExternalProcessor"}[5m])) > 0
for: 5m
labels:
  severity: WARNING
annotations:
  dashboard_url: p-R7Hw1Iz
  runbook_url: extension-orchestrator-dashboard
  summary: Failed gRPC calls detected in the Envoy External Processor within the last 5 minutes. <!subteam^S06E0CPPC5S>

The code for creating the silence:
func postSilence(amCli amclient.Client, matchers []*models.Matcher) error {
	startsAt := strfmt.DateTime(silenceStart)
	endsAt := strfmt.DateTime(silenceStart.Add(silenceDuration))
	createdBy := creatorType
	comment := silenceComment
	silenceParams := silence.NewPostSilencesParams().WithSilence(
		&models.PostableSilence{
			Silence: models.Silence{
				Matchers:  matchers,
				StartsAt:  &startsAt,
				EndsAt:    &endsAt,
				CreatedBy: &createdBy,
				Comment:   &comment,
			},
		},
	)

	err := amCli.PostSilence(silenceParams)
	if err != nil {
		return fmt.Errorf("failed on post silence: %w", err)
	}
	log.Print("Silence posted successfully")

	return nil
}

Thanks in advance,
Saar Zur SAP Labs
