Re: Resource limits - an "initial delay" or "max duration" would be really handy

2018-05-06 Thread Alan Christie
Thanks for the quick response, Clayton; very helpful.

For now I’ll just set higher limits as you suggest.

But just to be clear: it’s the “requests” value that is used for scheduling 
Pods onto nodes, not the “limits” value?

Alan


> On 5 May 2018, at 16:05, Clayton Coleman  wrote:
> 
> Resource limits are fixed because we need to make a good scheduling decision 
> for the initial burst you’re describing (the extremely high cpu at the 
> beginning).  Some applications might also need similar cpu on restart.  Your 
> workload needs to “burst”, so setting your cpu limit to your startup peak and 
> your cpu request to a reasonable percentage of the long term use is the best 
> way to ensure the scheduler can put you on a node that can accommodate you.  
> No matter what, if you want the cpu at the peak we have to schedule you 
> somewhere you can get the peak cpu.
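The advice above — a limit sized for the startup peak and a request sized for steady state — would look something like this (a hypothetical deployment fragment; the container values are illustrative, not from the thread):

```yaml
# The limit covers the initial burst; the request reflects typical
# steady-state use, so the scheduler places the pod on a node that can
# accommodate both. Numbers here are illustrative assumptions.
resources:
  requests:
    cpu: 200m      # long-term steady-state estimate
    memory: 256Mi
  limits:
    cpu: "2"       # sized for the startup peak
    memory: 512Mi
```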
> 
> The longer term approach that makes this less annoying is the feedback loop 
> between actual used resources on a node for running workloads and requested 
> workloads, and the eviction and descheduling agents (which attempt to 
> rebalance nodes by shuffling workloads around).
> 
> For the specific case of an app where you know for sure the processes will 
> use a fraction of the initial limit, you can always voluntarily limit your 
> own cpu at some time after startup.  That could be a side agent that puts a 
> more restrictive cgroup limit in place on the container after it has been up 
> a few minutes.
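The “side agent” idea can be sketched as a small script that converts a target millicore cap into a cgroup v1 CPU quota and applies it once the startup burst is over. The paths, timings, and values below are assumptions for illustration, not a tested OpenShift recipe:

```shell
#!/bin/sh
# Convert a millicore cap into a cgroup v1 CFS quota:
#   quota_us = millicores * period_us / 1000
millicores_to_quota_us() {
  echo $(( $1 * $2 / 1000 ))
}

PERIOD_US=100000                                     # default 100ms CFS period
QUOTA_US=$(millicores_to_quota_us 100 "$PERIOD_US")  # cap at 100m -> 10000us
echo "would apply ${QUOTA_US}us per ${PERIOD_US}us period"

# The agent itself would then do something like the following (requires
# privileges, and the cgroup path depends on the runtime -- both are
# assumptions here):
#   sleep 180   # wait out the startup burst
#   echo "$PERIOD_US" > /sys/fs/cgroup/cpu/cpu.cfs_period_us
#   echo "$QUOTA_US"  > /sys/fs/cgroup/cpu/cpu.cfs_quota_us
```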
> 
> On May 5, 2018, at 9:57 AM, Alan Christie wrote:
> 
>> I like the idea of placing resource limits on applications running in the 
>> cluster, but I wonder if there’s any advice for defining CPU “limits” that 
>> are more tolerant of application start-up behaviour? Something like the 
>> initial delay on a readiness or liveness probe, for example? It seems like 
>> a rather obvious property of any limit; the ones available are just too 
>> “hard”.
>> 
>> One example, and I’m sure this must be common to many applications, is an 
>> application that consumes a significant proportion of the CPU during 
>> initialisation but then, in its steady-state, falls back to a much lower and 
>> non-bursty behaviour. I’ve attached a screenshot of one such application. 
>> You can see that for a very short period of time, exclusive to 
>> initialisation, it consumes many more cycles than the post-initialisation 
>> stage.
>> 
>> 
>> During initialisation CPU demand tends to fall and memory use tends to rise.
>> 
>> I suspect that what I’m seeing during this time is OpenShift “throttling” my 
>> app (understandable, given the parameters available): it then fails to get 
>> through initialisation fast enough to satisfy the readiness/liveness probe 
>> and gets restarted. Again and again.
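One way to break a restart loop like this, in the meantime, is to give the probe itself more start-up slack. A hypothetical probe stanza (endpoint and numbers are illustrative assumptions):

```yaml
# Wait out the throttled start-up phase before the first check, and
# tolerate a few failed checks before a restart is triggered.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120   # no checks during the startup burst
  periodSeconds: 10
  failureThreshold: 6        # roughly another 60s of grace before restart
```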
>> 
>> I cannot use any sensible steady-state limit (i.e. one that prevents the 
>> normal steady-state behaviour from deviating) without the application 
>> constantly being forced to throttle, and potentially reboot, during 
>> initialisation.
>> 
>> In this example I’d like to set a perfectly reasonable CPU limit of 
>> something like 5m (because, after the first minute of execution, it should 
>> never deviate from the steady-state level). Sadly I cannot set such a low 
>> level because OpenShift will not let the application start (for reasons 
>> already explained): its initial, very brief, CPU load exceeds any 
>> “reasonable” level I set.
>> 
>> I can get around this by defining an abnormally large cpu limit but, to me, 
>> using an “abnormal” level sort of defeats the purpose of a limit.
>> 
>> Aren’t resource limits missing one vital parameter, a “duration” or 
>> “initial delay”?
>> 
>> Maybe this is beyond the resources feature and has to be deferred to 
>> something like Prometheus? But can Prometheus take actions rather than just 
>> monitor and alert? And even if it could, employing Prometheus may seem to 
>> some like “using a sledgehammer to crack a nut”.
>> 
>> Any advice on permitting bursty applications within the cluster but also 
>> using limits would be most appreciated.
>> 
>> Alan Christie
>> 

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

