Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread Rodrigo Campos
It really depends on the monitoring solution. Usually these metrics are
exported and you can just write alert conditions on them, in the query
language the tool provides.

In my case, I'm using a hosted solution (SignalFx) that gives you a
DaemonSet which sends those metrics to them. You can then define alert
conditions. We have alerts on restarts increasing significantly, the number
of pods ready, the average CPU used by each app, etc.
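
To make the "restarts increase significantly" idea concrete, here is a minimal
sketch in plain Python. It is not SignalFx's query language or API; the sample
shape, the function names and the threshold are all assumptions made up for
illustration. It only shows the arithmetic: compare the latest restart counter
with the oldest sample still inside a time window.

```python
from typing import Dict, List, Tuple

# (unix_timestamp_seconds, cumulative_restart_count) samples for one app.
# Hypothetical shape for illustration, not any vendor's data model.
Sample = Tuple[float, int]


def restart_increase(samples: List[Sample], window_s: float) -> int:
    """Return how much the restart counter grew over the last window_s seconds."""
    if not samples:
        return 0
    ordered = sorted(samples)
    latest_ts, latest_count = ordered[-1]
    # The oldest sample still inside the window is the baseline.
    in_window = [count for ts, count in ordered if ts >= latest_ts - window_s]
    # Clamp at 0: the counter can reset when a pod is replaced.
    return max(0, latest_count - in_window[0])


def apps_to_alert(series: Dict[str, List[Sample]], window_s: float, threshold: int) -> List[str]:
    """Names of apps whose restarts grew by at least `threshold` within the window."""
    return sorted(
        app for app, samples in series.items()
        if restart_increase(samples, window_s) >= threshold
    )
```

A hosted tool would normally do this for you with a rate or delta function;
the sketch is only meant to show what such a condition computes.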

Does this help?

On Wednesday, August 8, 2018, David Rosenstrauch  wrote:

> As we're getting ready to go to production with our k8s-based system,
> we're trying to pin down exactly how we're going to do all the needed
> monitoring/alerting for it.  We can easily collect many of the metrics we
> need (using kube-state-metrics to feed into prometheus, and/or Datadog) and
> alert off of those.
>
> However, there's other important k8s-related info about our system that we
> need to be able to access, monitor, and alert on, most notably things like:
>
> * If a container crashes and is restarted by k8s
>
> * If k8s kills a container and restarts it (e.g., due to exceeding cpu or
> memory limits, or due to repeated failure of liveness check)
>
> * If k8s kills a container but cannot restart it
>
> * If an entire pod crashes and is restarted by k8s
>
> etc.
>
>
> How would one go about gaining access to those k8s-related events in an
> automated fashion, and setting up monitoring/alerting off of those?
>
> Thanks,
>
> DR
>
> --
> You received this message because you are subscribed to the Google Groups
> "Kubernetes user discussion and Q&A" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kubernetes-users+unsubscr...@googlegroups.com.
> To post to this group, send email to kubernetes-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/kubernetes-users.
> For more options, visit https://groups.google.com/d/optout.
>



Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread Agrawal, Punit
David,

What we do is export the Kubernetes cluster events to Cloud PubSub using
Stackdriver Export, and we have SumoLogic set up to ingest logs from PubSub.
We then use SumoLogic's Scheduled Search capabilities to send alerts based on
certain events.
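
As a rough illustration of "alerts based on certain events", here is a small
Python sketch that filters exported events. It does not use the PubSub or
SumoLogic APIs. It assumes each event has already been decoded into a dict
with the `type`, `reason`, `involvedObject` and `message` fields of a
Kubernetes Event object. The set of reasons to alert on and the function name
are my own illustrative choices, not something from our setup.

```python
from typing import Any, Dict, Iterable, List

# Event reasons to alert on; an illustrative choice, adjust to your cluster.
ALERT_REASONS = {"BackOff", "Unhealthy", "Failed", "OOMKilling"}


def alertable_events(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Return one summary line per Warning event whose reason is in ALERT_REASONS."""
    lines = []
    for ev in events:
        # Skip Normal events and reasons we do not care about.
        if ev.get("type") != "Warning" or ev.get("reason") not in ALERT_REASONS:
            continue
        obj = ev.get("involvedObject", {})
        lines.append(
            f"{ev['reason']}: {obj.get('kind', '?')} "
            f"{obj.get('namespace', '?')}/{obj.get('name', '?')}: {ev.get('message', '')}"
        )
    return lines
```

The same condition can usually be written directly as a scheduled search
query; the sketch just spells out the filter.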

Punit Agrawal
Site Reliability Engineer, Lead
New Product Development


From:  on behalf of Marcio Garcia 

Reply-To: "kubernetes-users@googlegroups.com" 

Date: Wednesday, August 8, 2018 at 2:16 PM
To: "kubernetes-users@googlegroups.com" 
Subject: Re: [kubernetes-users] How to monitor/alert on container/pod death or 
restart

David,

In Datadog events you can see the killed pods.

But if you have containers that need to be killed because they don't exit when
they receive a stop signal, you'll see a lot of events like KILLED and
DESTROYED. This is not necessarily an error; it could just be a container
being restarted, so keep that in mind.





On Wed, Aug 8, 2018 at 6:06 PM David Rosenstrauch <dar...@darose.net> wrote:
Thanks for the response, Marcio.  We've actually recently started using
Datadog already.  (At least in dev/qa.)  But DD is a bit of a sea of
metrics, and I'm not clear how we would accomplish one of the specific
tasks I've mentioned - for example, alerting when k8s has killed a
container or pod.  Any pointers on how I might go about setting up an
alert like that?

Thanks,

DR

On 08/08/2018 04:45 PM, Marcio Garcia wrote:
> Hi David,
>
> You can use Datadog to achieve this.
>
> On 8/8/18, David Rosenstrauch <dar...@darose.net> wrote:
>> As we're getting ready to go to production with our k8s-based system,
>> we're trying to pin down exactly how we're going to do all the needed
>> monitoring/alerting for it.  We can easily collect many of the metrics
>> we need (using kube-state-metrics to feed into prometheus, and/or
>> Datadog) and alert off of those.
>>
>> However, there's other important k8s-related info about our system that
>> we need to be able to access, monitor, and alert on, most notably things
>> like:
>>
>> * If a container crashes and is restarted by k8s

>> How would one go about gaining access to those k8s-related events in
>> an automated fashion, and setting up monitoring/alerting off of those?



Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread Marcio Garcia
David,

In Datadog events you can see the killed pods.

But if you have containers that need to be killed because they don't exit
when they receive a stop signal, you'll see a lot of events like KILLED and
DESTROYED. This is not necessarily an error; it could just be a container
being restarted, so keep that in mind.
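
One hedged way to tell a routine kill from a real failure is to look at how
the container ended rather than at the event name alone. The Python sketch
below does not model Datadog's event payload. It assumes you have the
`exitCode` and `reason` of a terminated container, as they appear in the
Kubernetes pod status. The four category labels and the function name are
hypothetical, and treating 137/143 as "signaled" is a convention (128 plus
the signal number), not a guarantee about why the signal was sent.

```python
from typing import Any, Dict

SIGKILL_EXIT = 128 + 9    # 137: process ended by SIGKILL
SIGTERM_EXIT = 128 + 15   # 143: process ended by SIGTERM


def classify_termination(terminated: Dict[str, Any]) -> str:
    """Classify one terminated-container record as 'clean', 'oom', 'signaled' or 'error'."""
    reason = terminated.get("reason", "")
    exit_code = terminated.get("exitCode")
    if reason == "OOMKilled":
        return "oom"        # killed for exceeding its memory limit
    if exit_code == 0:
        return "clean"      # exited normally
    if exit_code in (SIGKILL_EXIT, SIGTERM_EXIT):
        return "signaled"   # stopped by a signal; may be a routine restart
    return "error"          # any other non-zero exit
```

With something like this, an alert could fire on "oom" and "error" and stay
quiet for "signaled" terminations during a deploy.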





On Wed, Aug 8, 2018 at 6:06 PM David Rosenstrauch  wrote:

> Thanks for the response, Marcio.  We've actually recently started using
> Datadog already.  (At least in dev/qa.)  But DD is a bit of a sea of
> metrics, and I'm not clear how we would accomplish one of the specific
> tasks I've mentioned - for example, alerting when k8s has killed a
> container or pod.  Any pointers on how I might go about setting up an
> alert like that?
>
> Thanks,
>
> DR
>
> On 08/08/2018 04:45 PM, Marcio Garcia wrote:
> > Hi David,
> >
> > You can use Datadog to achieve this.
> >
> > On 8/8/18, David Rosenstrauch  wrote:
> >> As we're getting ready to go to production with our k8s-based system,
> >> we're trying to pin down exactly how we're going to do all the needed
> >> monitoring/alerting for it.  We can easily collect many of the metrics
> >> we need (using kube-state-metrics to feed into prometheus, and/or
> >> Datadog) and alert off of those.
> >>
> >> However, there's other important k8s-related info about our system that
> >> we need to be able to access, monitor, and alert on, most notably things
> >> like:
> >>
> >> * If a container crashes and is restarted by k8s
>
> >> How would one go about gaining access to those k8s-related events in
> >> an automated fashion, and setting up monitoring/alerting off of those?
>


Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread 'Tim Hockin' via Kubernetes user discussion and Q&A
Most of what you're asking for is available via the k8s API, if you watch
it.

On Wed, Aug 8, 2018 at 12:58 PM David Rosenstrauch 
wrote:

> As we're getting ready to go to production with our k8s-based system,
> we're trying to pin down exactly how we're going to do all the needed
> monitoring/alerting for it.  We can easily collect many of the metrics
> we need (using kube-state-metrics to feed into prometheus, and/or
> Datadog) and alert off of those.
>
> However, there's other important k8s-related info about our system that
> we need to be able to access, monitor, and alert on, most notably things
> like:
>
> * If a container crashes and is restarted by k8s
>

Represented in the pod.status block


> * If k8s kills a container and restarts it (e.g., due to exceeding cpu
> or memory limits, or due to repeated failure of liveness check)
>

Also in pod.status

> * If k8s kills a container but cannot restart it
>

In pod.status and/or events, depending on exactly what you want to know


> * If an entire pod crashes and is restarted by k8s
>

There's not really a concept of a pod "crashing", just containers being
restarted.
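
As a rough sketch of what reading pod.status could look like, here is a small
Python function. It works on a pod object that has already been decoded into
a dict (for example one item from `kubectl get pods -o json`), not on a live
watch. The field names (`containerStatuses`, `restartCount`, `lastState`,
`state.waiting`) are the ones in the pod status; the function name and the
output wording are just illustrative.

```python
from typing import Any, Dict, List


def restart_findings(pod: Dict[str, Any]) -> List[str]:
    """Summarise restarts and stuck containers from one pod object (as a dict)."""
    name = pod.get("metadata", {}).get("name", "?")
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        container = cs.get("name", "?")
        restarts = cs.get("restartCount", 0)
        # lastState.terminated describes the previous run of the container.
        last = cs.get("lastState", {}).get("terminated")
        if restarts > 0 and last:
            findings.append(
                f"{name}/{container}: restarted {restarts}x, "
                f"last exit {last.get('exitCode')} ({last.get('reason', 'unknown')})"
            )
        # state.waiting (e.g. reason CrashLoopBackOff) means it is not running now.
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            findings.append(f"{name}/{container}: waiting ({waiting.get('reason', 'unknown')})")
    return findings
```

Run against every pod on a schedule, or fed from a watch on the API, the
returned lines could be turned into alerts.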


>
> etc.
>
>
> How would one go about gaining access to those k8s-related events in
> an automated fashion, and setting up monitoring/alerting off of those?
>
> Thanks,
>
> DR
>


Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread David Rosenstrauch
Thanks for the response, Marcio.  We've actually recently started using 
Datadog already.  (At least in dev/qa.)  But DD is a bit of a sea of 
metrics, and I'm not clear how we would accomplish one of the specific 
tasks I've mentioned - for example, alerting when k8s has killed a 
container or pod.  Any pointers on how I might go about setting up an 
alert like that?


Thanks,

DR

On 08/08/2018 04:45 PM, Marcio Garcia wrote:

> Hi David,
>
> You can use Datadog to achieve this.
>
> On 8/8/18, David Rosenstrauch wrote:
>
>> As we're getting ready to go to production with our k8s-based system,
>> we're trying to pin down exactly how we're going to do all the needed
>> monitoring/alerting for it.  We can easily collect many of the metrics
>> we need (using kube-state-metrics to feed into prometheus, and/or
>> Datadog) and alert off of those.
>>
>> However, there's other important k8s-related info about our system that
>> we need to be able to access, monitor, and alert on, most notably things
>> like:
>>
>> * If a container crashes and is restarted by k8s
>
>> How would one go about gaining access to those k8s-related events in
>> an automated fashion, and setting up monitoring/alerting off of those?




Re: [kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread Marcio Garcia
Hi David,


You can use Datadog to achieve this.



On 8/8/18, David Rosenstrauch  wrote:
> As we're getting ready to go to production with our k8s-based system,
> we're trying to pin down exactly how we're going to do all the needed
> monitoring/alerting for it.  We can easily collect many of the metrics
> we need (using kube-state-metrics to feed into prometheus, and/or
> Datadog) and alert off of those.
>
> However, there's other important k8s-related info about our system that
> we need to be able to access, monitor, and alert on, most notably things
> like:
>
> * If a container crashes and is restarted by k8s
>
> * If k8s kills a container and restarts it (e.g., due to exceeding cpu
> or memory limits, or due to repeated failure of liveness check)
>
> * If k8s kills a container but cannot restart it
>
> * If an entire pod crashes and is restarted by k8s
>
> etc.
>
>
> How would one go about gaining access to those k8s-related events in
> an automated fashion, and setting up monitoring/alerting off of those?
>
> Thanks,
>
> DR
>


[kubernetes-users] How to monitor/alert on container/pod death or restart

2018-08-08 Thread David Rosenstrauch
As we're getting ready to go to production with our k8s-based system, 
we're trying to pin down exactly how we're going to do all the needed 
monitoring/alerting for it.  We can easily collect many of the metrics 
we need (using kube-state-metrics to feed into prometheus, and/or 
Datadog) and alert off of those.


However, there's other important k8s-related info about our system that 
we need to be able to access, monitor, and alert on, most notably things 
like:


* If a container crashes and is restarted by k8s

* If k8s kills a container and restarts it (e.g., due to exceeding cpu 
or memory limits, or due to repeated failure of liveness check)


* If k8s kills a container but cannot restart it

* If an entire pod crashes and is restarted by k8s

etc.


How would one go about gaining access to those k8s-related events in
an automated fashion, and setting up monitoring/alerting off of those?


Thanks,

DR
