Action health checks

2019-10-26 Thread Tyson Norris
Hi Whiskers –
We periodically have an unfortunate problem where a docker container (or worse, 
many of them) dies off unexpectedly, outside of HTTP usage from the invoker. In 
these cases, prewarm or warm containers may still have references at the 
Invoker, and eventually, if an activation arrives that matches those container 
references, the HTTP workflow starts and fails immediately since the node is 
not listening anymore, resulting in failed activations. An even worse 
situation can occur when a container failed earlier and a new container, 
initialized with a different action, is started on the same host and port 
(more likely a problem for k8s/mesos cluster usage).

To mitigate these issues, I put together a health check process [1] from the 
invoker to action containers, which lets us test

  *   prewarm containers periodically, to verify they are still operational, and
  *   warm containers immediately after resuming them (before HTTP requests are 
sent)

In case of prewarm failure, we should backfill the prewarms to the specified 
config count.
In case of warm failure, the activation is rescheduled to the ContainerPool, 
which typically either routes it to a different prewarm container or starts a 
new cold container.
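The failure handling described above can be modeled in a few lines. This is a minimal Python sketch, not the invoker's actual Scala code from the PR. `PrewarmPool`, `record_check`, `dispatch_after_resume` and the `reschedule` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PrewarmPool:
    """Hypothetical model of the prewarm containers for one runtime kind."""
    configured_count: int
    containers: List[str] = field(default_factory=list)

    def record_check(self, container_id: str, healthy: bool) -> int:
        """Apply one periodic prewarm check result.

        Returns how many new prewarm containers must be started to backfill
        the pool back to the configured count.
        """
        if not healthy and container_id in self.containers:
            # Forget the dead container so it can never be handed an activation.
            self.containers.remove(container_id)
        return max(0, self.configured_count - len(self.containers))


def dispatch_after_resume(activation: str, healthy: bool,
                          reschedule: Callable[[str], None]) -> bool:
    """Decide what to do with an activation after resuming a warm container.

    Returns True if the activation may be sent to the resumed container.
    Otherwise the activation is handed back via the hypothetical `reschedule`
    callback, standing in for the ContainerPool, which may pick another
    prewarm container or cold-start a new one.
    """
    if healthy:
        return True
    reschedule(activation)
    return False
```

For example, with `PrewarmPool(configured_count=2, containers=["p1", "p2"])`, a failed check for `"p1"` leaves one container and returns `1`, meaning one prewarm should be started.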

The test ping is a TCP connection only, since anything more would require 
updating the HTTP protocol implemented by all runtimes. This test is good enough 
for the worst case of “container has gone missing”, but cannot detect more 
subtle problems like “/run endpoint is broken”. We could add other checks in the 
future to improve the quality of the test, but I think most of them would 
require expanding the HTTP protocol and the state managed at the container, and 
I wanted to get basic functionality working first.
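A TCP-only ping of the kind described above amounts to opening and immediately closing a connection to the container's host and port. The following is a minimal Python sketch assuming a plain connect with a timeout. The function name `tcp_ping` and the 1-second default timeout are illustrative assumptions, not values taken from the PR.

```python
import socket


def tcp_ping(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    This only proves that *something* is listening on that port. It cannot
    tell whether /run still works, or whether the listener is the container
    the invoker expects.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timeout).
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A healthy container accepts the connection, and `tcp_ping` returns `True`. A container that has gone missing typically produces a refused connection or a timeout, and `tcp_ping` returns `False`.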

Let me know if you have opinions about this, and we can discuss here or in the 
PR.
Thanks
Tyson

[1] https://github.com/apache/openwhisk/pull/4698


[slack-digest] [2019-10-25] #general

2019-10-26 Thread OpenWhisk Team Slack
2019-10-25 01:50:31 UTC - Bill Zong: 1. Your invoker node (machine) should have 
more than `39000m` of memory.
2. The limits `whisk.limits.actionsInvokesConcurrent` and 
`whisk.limits.actionsInvokesPerminute` should be set to values larger than the 
default `60`. The largest value is `99`.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1571968231071300?thread_ts=1571917003.047300&cid=C3TPCAQG1

2019-10-25 12:54:26 UTC - Pepi Paraskevoulakou: Hello, I created an API for my 
sequence of actions, which is 

 , afterwards I tried to pass my parameters and the full command was: 
,,,
 but it raises an error. What do I have to do?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572008066071600?thread_ts=1572008066.071600&cid=C3TPCAQG1

2019-10-25 15:31:51 UTC - Narasimha Murthy: Thank you @Rodric Rabbah. I am 
able to run wskadmin itself as a Python script, but how do I set it up? It's 
throwing an error that it's missing a couple of properties. I am not sure where 
to get those values to run wskadmin. Is there any doc explaining the setup of 
wskadmin?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572017511071900?thread_ts=1571930090.049500&cid=C3TPCAQG1

2019-10-25 16:34:53 UTC - giusdp: Hi there, can you guys point me to some 
material about the load balancing part of OpenWhisk?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572021293073900

2019-10-25 16:35:06 UTC - giusdp: To better understand how it balances the 
load: does it know how heavily loaded the nodes are, which heuristics does it use, etc.?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572021306074100?thread_ts=1572021306.074100&cid=C3TPCAQG1

2019-10-25 16:42:27 UTC - Rob Allen: Can someone point me at the standalone 
jar please?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572021747074500

2019-10-25 17:15:41 UTC - Rob Allen: nm. `./gradlew :core:standalone:build` 
worked!
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572023741075000

2019-10-25 19:03:58 UTC - Rob Allen: I know I'm behind the times, but this 
standalone jar is quite handy
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1572030238075400