I like the idea of placing resource limits on applications running in the 
cluster, but I wonder if there’s any advice for defining CPU “limits” that are 
more tolerant of application start-up behaviour? Something like the initial 
delay on a readiness or liveness probe, for example? It just seems like a rather 
obvious property of any limit. The ones available are just too “hard”.

One example, and I’m sure this must be common to many applications, is an 
application that consumes a significant proportion of the CPU during 
initialisation but then, in its steady-state, falls back to a much lower and 
non-bursty behaviour. I’ve attached a screenshot of one such application. You 
can see that for a very short period of time, exclusive to initialisation, it 
consumes many more cycles than the post-initialisation stage.


During initialisation CPU demand tends to fall and memory use tends to rise.

I suspect that what I’m seeing during this time is OpenShift “throttling” my 
app (understandable given the parameters available) and it then fails to pass 
through initialisation fast enough to satisfy the readiness/liveness probe and 
then gets restarted. Again, and again.
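
To put rough numbers on that suspicion, here is a minimal back-of-the-envelope 
sketch in Python. It assumes the CPU limit is enforced as a quota (so a limit of 
N millicores caps the container at N/1000 of a core on average) and that 
start-up is CPU-bound. The function names and the figures in the example are 
hypothetical, not measurements from my application; the probe arguments are 
meant to correspond to initialDelaySeconds, periodSeconds and failureThreshold, 
and the deadline calculation is only an approximation of when the last failing 
probe would fire.

```python
def min_startup_seconds(startup_cpu_seconds: float, limit_millicores: int) -> float:
    """Lower bound on wall-clock start-up time under a CPU limit.

    Assumption: the limit caps average CPU use at limit_millicores / 1000
    cores, so the work cannot finish faster than this, however it is scheduled.
    """
    if limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    return startup_cpu_seconds * 1000.0 / limit_millicores


def probe_deadline_seconds(initial_delay: float, period: float,
                           failure_threshold: int) -> float:
    """Approximate time of the probe failure that triggers a restart.

    Assumption: the first probe runs at initial_delay and then once per
    period, and failure_threshold consecutive failures cause a restart.
    """
    return initial_delay + period * (failure_threshold - 1)


def survives_probe(startup_cpu_seconds: float, limit_millicores: int,
                   initial_delay: float, period: float,
                   failure_threshold: int) -> bool:
    """True if the throttled start-up could finish before the deadline."""
    needed = min_startup_seconds(startup_cpu_seconds, limit_millicores)
    deadline = probe_deadline_seconds(initial_delay, period, failure_threshold)
    return needed <= deadline


if __name__ == "__main__":
    # Hypothetical figures: start-up needs 20 CPU-seconds of work and the
    # probe uses a 30 s initial delay, a 10 s period and 3 allowed failures.
    for limit in (100, 2000):
        print(limit, "m:",
              min_startup_seconds(20, limit), "s needed,",
              "survives" if survives_probe(20, limit, 30, 10, 3) else "restarted")
```

With figures like these, a low limit stretches start-up well past the point at 
which the probe gives up, which would match the restart loop I’m seeing, 
while a much larger limit lets the same work finish in time.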

I cannot use any sensible steady-state limit (i.e. one that prevents the normal 
steady-state behaviour from deviating) without the application constantly being 
throttled, and potentially restarted, during initialisation.

In this example I’d like to set a perfectly reasonable CPU limit of something 
like 5m (5 millicores) because, after the first minute of execution, it should 
never deviate from the steady-state level. Sadly I cannot set a low level 
because OpenShift will not let the application start (for the reasons already 
explained): its initial but very brief CPU load exceeds any “reasonable” level 
I set.

I can get around this by defining an abnormally large CPU limit but, to me, 
using an “abnormal” level sort of defeats the purpose of a limit.

Aren’t resource limits missing one vital parameter: “duration” or “initial 
delay”?

Maybe this is beyond the resources feature and has to be deferred to something 
like Prometheus? But can Prometheus take actions rather than just monitor and 
alert? And, even if it could, employing Prometheus may seem to some like “using 
a sledgehammer to crack a nut”.

Any advice on permitting bursty applications within the cluster but also using 
limits would be most appreciated.

Alan Christie
