Re: Proposal on a future architecture of OpenWhisk

2018-08-16 Thread Tyson Norris
Thinking more about the singleton aspect, I guess this is mostly an issue for 
blackbox containers, since for manifest/managed containers the prewarm/stemcell 
containers will mitigate at least some of the singleton-failure delays. 

So in the case of singleton failure, the impacts would be:
- managed containers once prewarms are exhausted (this may be improved by sizing 
the prewarm pool more intelligently based on load, etc.)
- managed containers that don’t match any prewarms (similarly, if the prewarm pool 
is dynamically configured based on load, this is less of a problem)
- blackbox containers (no mitigation)

If failover of the singleton takes too long (I think it depends on cluster size; 
the oldest node becomes the singleton host, IIRC), I think we need to consider 
how containers can launch in the meantime. A first step might be to test the 
singleton failover behavior in clusters of various sizes.
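
For concreteness, a minimal sketch of how a ContainerManager singleton and its 
proxy could be wired up for such a test (assuming the Akka 2.5 classic 
cluster-singleton API; actor names are illustrative and cluster/seed-node 
configuration is omitted):

import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{
  ClusterSingletonManager, ClusterSingletonManagerSettings,
  ClusterSingletonProxy, ClusterSingletonProxySettings
}

// Placeholder for the real ContainerManager logic.
class ContainerManager extends Actor {
  def receive: Receive = Actor.emptyBehavior
}

object SingletonFailoverTest extends App {
  // Assumes akka.cluster configuration (seed nodes etc.) is supplied externally.
  val system = ActorSystem("openwhisk")

  // Runs exactly one ContainerManager on the oldest cluster node; on node loss
  // the singleton is recreated on the next-oldest node after hand-over.
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props[ContainerManager],
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system)),
    name = "containerManager")

  // Routers would talk to the singleton through a proxy, which buffers messages
  // while the singleton moves -- that buffering window is exactly the failover
  // delay worth measuring at different cluster sizes.
  val containerManager = system.actorOf(
    ClusterSingletonProxy.props(
      singletonManagerPath = "/user/containerManager",
      settings = ClusterSingletonProxySettings(system)),
    name = "containerManagerProxy")
}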

> On Aug 16, 2018, at 11:01 AM, Tyson Norris  wrote:
> 
> A couple comments on singleton:
> - use of a cluster singleton will introduce a new single point of failure: from 
> the time of the singleton node's failure to the singleton's resurrection on a 
> different instance, there will be an outage from the point of view of any 
> ContainerRouter that does not already have a warm+free container to service an activation
> - resurrecting the singleton will require transferring or rebuilding its 
> state when recovery occurs - in my experience this was tricky, and requires 
> replicating the data (which will be slightly stale, but better than 
> rebuilding from nothing); I don’t recall the handover delay (to transfer the 
> singleton to a new Akka cluster node) from when I last tried it, but I think 
> it was not as fast as I had hoped.
> 
> I don’t have a great suggestion for the singleton failure case, but would 
> like to consider this carefully, and discuss the ramifications (which may or 
> may not be tolerable) before pursuing this particular aspect of the design.
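
(For illustration only, a minimal sketch of one hypothetical recovery approach - 
the messages and actor shape below are invented, not existing OpenWhisk code: a 
resurrected ContainerManager asks every ContainerRouter to report the containers 
it currently holds, instead of rebuilding from nothing.)

import akka.actor.{Actor, ActorRef}

case object ReportContainers                            // sent to each router on recovery
case class ContainerInventory(containers: Set[String])  // a router's reply

class RecoveringContainerManager(routers: Seq[ActorRef]) extends Actor {
  // routerRef -> containerIds it currently manages (slightly stale, but not empty)
  private var assignments = Map.empty[ActorRef, Set[String]]

  override def preStart(): Unit =
    routers.foreach(_ ! ReportContainers)  // rebuild the view on start

  def receive: Receive = {
    case ContainerInventory(cs) => assignments += (sender() -> cs)
  }
}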
> 
> 
> On prioritization:
> - if concurrency is enabled for an action, this is another prioritization 
> aspect, of sorts - if the action supports concurrency, there is no reason 
> (except for destruction coordination…) that its container cannot be shared 
> across shards. This could be added later, but may be worth considering now, 
> since there is a general reuse problem: a series of activations that arrives at 
> different ContainerRouters will create a new container in each, while those 
> containers could be reused (avoiding the creation of new ones) if concurrency 
> is tolerated in them. This would only (ha ha) require changing how container 
> destruction works, so that a container cannot be destroyed until the last 
> ContainerRouter is done with it. And if container destruction is coordinated 
> in this way to increase reuse, it would also be good to coordinate 
> construction (don’t concurrently construct the same container for multiple 
> ContainerRouters IFF a single container would enable concurrent activations 
> once it is created). I’m not sure whether others want this level of 
> container reuse, but if so, it would be worth considering these aspects 
> (sharding/isolation vs. sharing/coordination) as part of any redesign.
> 
> 
> WDYT?
> 
> Thanks
> Tyson
> 
> On Aug 15, 2018, at 8:55 AM, Carlos Santana <csantan...@gmail.com> wrote:
> 
> I think we should add a section on prioritization for blocking vs. async
> invokes (non-blocking actions and triggers)
> 
> The front door has the luxury of knowing some intent from the incoming
> request. I feel it would make sense to give high priority to blocking invokes,
> while async invokes go straight to the queue to be picked up by the system and
> eventually run, even if that takes 10 times longer than a blocking invoke;
> for example, a web action might take 10ms vs. 100ms for a DB trigger fire or
> an async webhook.
> 
> Also, the controller takes time to convert a trigger and process the rules;
> this is something that can also be taken out of the hot path.
> 
> So I'm just saying we could optimize the system because we know whether the
> incoming request is on a hot or hotter path :-)
> 
> -- Carlos
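
(A tiny illustrative sketch of the front-door classification idea above - the 
names are hypothetical, not actual controller code:)

sealed trait Priority
case object Interactive extends Priority // blocking invokes: serve as soon as possible
case object Deferred    extends Priority // async invokes / trigger fires: queue for later

// The front door already knows whether the caller is waiting on the result.
def classify(blocking: Boolean, isTriggerFire: Boolean): Priority =
  if (blocking && !isTriggerFire) Interactive else Deferred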
> 
> 



Re: logging baby step -- worth pursuing?

2018-08-16 Thread David P Grove


Tyson Norris  wrote on 08/15/2018 08:29:48 PM:
>
> FWIW This won’t help with concurrent activations since the logs from
> concurrent activations will be interleaved (I think Dave was not
> suggesting to use this for concurrent activations). It will only
> help in the case where log processing is done outside of the
> invoker, and logs are not interleaved from multiple activations.

Agreed.  This is the case I'm attempting to optimize for on Kubernetes.
Kube clusters typically already have an external logging service deployed
and the current KubernetesContainerFactory log processing by the Invoker is
a major performance bottleneck.

> I’m not sure having a start sentinel is simpler than just including
> the activation id in the existing sentinel line (end of log segment,
> not the beginning), but it would probably be simpler for a human to read.

I think it makes the offline processing that adds the activationId to every
log line more efficient, because you will already have the activationId in
hand when you read each "real" log line.

> If people use blackbox actions, and if blackbox containers have
> different log collection than managed actions, I think that would be
> a reason to not do anything until there is better support for
> structured logging, since if you are still using the invoker to collect
> blackbox logs, you might as well use it to collect all logs? It may
> be that the majority of log collection is not blackbox, so you could get
> some efficiencies there, but the added mess of multiple log
> collection approaches may bring different problems (my logs behave
> differently for different types of actions, etc.).

On large deployments, we segregate blackbox/non-blackbox actions to
different invokers.  So if all the non-blackbox runtimes are updated to
have start sentinels, those invokers can be cut out of the log processing
entirely.  It's a slightly more complex deployment configuration, but I
think it will be worth it (at least on Kubernetes).

>
> One option might be to allow the /init endpoint to return some
> details about the container image, so that it can hint how it
> expects logs to be handled (if at all) at the invoker - currently the
> /init response is only interpreted in the case of a non-200 response.
> This same approach may be useful for other optional facilities like
> support for concurrency or GPUs, where the container can signal its
> support and fail early if there is a mismatch with the action being
> executed. This would not resolve the different-behavior problem, but
> would provide a smooth transition for older blackbox images.
>

An interesting idea.
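
(Purely hypothetical illustration of that idea - none of these fields exist in 
the current /init protocol, which only has its status code interpreted on failure:)

// Capability hints a runtime image could return from /init.
case class InitResponse(
  ok: Boolean,
  emitsLogSentinels: Boolean = false, // "I emit start/end sentinels, skip invoker log collection"
  maxConcurrency: Int = 1,            // how many concurrent activations the container tolerates
  gpu: Boolean = false)               // whether the image expects a GPU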

--dave


Re: logging baby step -- worth pursuing?

2018-08-16 Thread David P Grove

This was a pretty simple change, so to make things concrete I have PRs with
a prototype of the enabling change in the invoker [1] and a change to the
nodejs runtime to emit the start sentinels [2].

If we go ahead with this design, here's an example from an action that
writes one line to stdout and no lines to stderr:


stdout stream:
XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id
cafca5b74be94eb8bca5b74be9beb80f for namespace guest
Here's a friendly message
XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX


stderr stream:
XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id
cafca5b74be94eb8bca5b74be9beb80f for namespace guest
XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX


It should be really straightforward to write a streaming agent that reads
the JSON-formatted log streams, uses the START sentinels to keep track of
the activationId that should be injected into each log line and the
namespace to associate with it, and forwards the result to the platform
logging service. Arguably the namespace information is redundant, since all
activations that run in the container belong to the same namespace, but it
seemed like including it could make the processing marginally simpler.
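
As a rough sketch of such an agent (assuming Docker-style JSON log lines on
stdin whose "log" field carries the sentinel text; the regexes and output
format are illustrative only):

import scala.io.Source

object LogEnricher {
  private val Start =
    """.*XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id (\S+) for namespace (\S+).*""".r
  private val End = """.*XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX.*""".r

  def main(args: Array[String]): Unit = {
    var current: Option[(String, String)] = None // (activationId, namespace)
    for (line <- Source.stdin.getLines()) line match {
      case Start(id, ns) => current = Some((id, ns)) // remember context for following lines
      case End()         => current = None           // activation finished
      case other =>
        current.foreach { case (id, ns) =>
          // forward the "real" log line tagged with activationId and namespace
          println(s"""{"activationId":"$id","namespace":"$ns","entry":$other}""")
        }
    }
  }
}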


Pausing for feedback before doing any more of the runtimes...


--dave


[1] https://github.com/apache/incubator-openwhisk/pull/3974


[2] https://github.com/apache/incubator-openwhisk-runtime-nodejs/pull/81




Re: Proposal on a future architecture of OpenWhisk

2018-08-16 Thread Michael Marth
Thanks, Markus!

On 15.08.18, 16:30, "Markus Thömmes"  wrote:

Hi Michael,

Losing/adding a shard is essentially reconciled by the ContainerManager.
As it keeps track of all the ContainerRouters in the system, it can also
observe one going down/crashing or one coming up and joining the "cluster".

If one Router leaves the cluster, the ContainerManager knows which
containers were "managed" by that router and redistributes them across the
Routers left in the system.
If one Router joins the cluster, we can try to rebalance containers to take
load off existing ones. The precise algorithm is to be defined, but the
primitives should be in place to do that.
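
(A rough illustrative sketch of such a redistribution - plain round-robin over
the surviving Routers, not actual OpenWhisk code:)

// routerId -> containerIds; hands a lost router's containers to the remaining routers.
def redistribute(
    assignments: Map[String, Set[String]],
    lostRouter: String): Map[String, Set[String]] = {
  val orphaned  = assignments.getOrElse(lostRouter, Set.empty[String]).toSeq
  val remaining = (assignments - lostRouter).keys.toVector
  if (remaining.isEmpty) assignments - lostRouter
  else orphaned.zipWithIndex.foldLeft(assignments - lostRouter) {
    case (acc, (container, i)) =>
      val target = remaining(i % remaining.size)
      acc.updated(target, acc(target) + container)
  }
}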

Does that answer the question?

Cheers,
Markus

On Wed, Aug 15, 2018 at 16:18, Michael Marth wrote:

> Markus,
>
> I agree with your preference for making the state sharded instead of
> distributed (not only for the scalability reasons you quote but also for
> operational concerns).
> What are your thoughts about losing a shard (planned or crashed) or adding
> a shard?
>
> Michael
>
>
> On 15.08.18, 09:58, "Markus Thömmes"  wrote:
>
> Hi Dragos,
>
> thanks for your questions, good discussion :)
>
> On Tue, Aug 14, 2018 at 23:42, Dragos Dascalita Haut wrote:
>
> > Markus, I appreciate the enhancements you mentioned in the wiki, and I'm
> > very much in line with the ideas you brought in there.
> >
> >
> >
> > "...having the ContainerManager be a cluster singleton..."
> >
> > I was just in the process of replying with the same idea :)
> >
> > In addition, I was thinking we can leverage Akka Distributed Data [1] to
> > keep all ContainerRouter actors eventually consistent. When creating a new
> > container, the ContainerManager can write with a consistency of
> > "WriteAll"; it would be a little slower, but it would improve consistency.
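
(For reference, a minimal sketch of the suggestion above using Akka 2.5 classic
Distributed Data - the key name and message shape are illustrative only:)

import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ddata.{DistributedData, ORSet, ORSetKey}
import akka.cluster.ddata.Replicator.{Update, UpdateResponse, WriteAll}
import scala.concurrent.duration._

// Records newly created containers in a replicated ORSet, requiring every replica
// to acknowledge the write (WriteAll) before the update is considered successful.
class ContainerRegistry extends Actor {
  private implicit val cluster: Cluster = Cluster(context.system)
  private val replicator = DistributedData(context.system).replicator
  private val ContainersKey = ORSetKey[String]("warm-containers")

  def receive: Receive = {
    case containerId: String =>
      replicator ! Update(ContainersKey, ORSet.empty[String], WriteAll(5.seconds))(_ + containerId)
    case _: UpdateResponse[_] => // ack or failure from the replicator; ignored in this sketch
  }
}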
> >
>
> I think we need to quantify "a little slower". Note that "WriteAll" becomes
> slower and slower the more actors you add to the cluster. Scalability is in
> question then.
>
> Of course scalability is also in question if we make the ContainerManager a
> singleton. The ContainerManager has a 1:1 relationship to the
> Kubernetes/Mesos scheduler. Do we know how those are distributed? I think
> the Kubernetes scheduler is a singleton, but I'll need to double-check on
> that.
>
> I can see the possibility of moving the ContainerManager into each Router and
> having them communicate with each other to shard in the same way I'm
> proposing. As Dave is hitting on the very same points, I get the feeling we
> should/could break out that specific discussion if we can agree on some
> basic premises of the design (see my answers on the thread with Dave).
> WDYT?
>
>
> >
> >
> > The "edge-case" isn't clear to me b/c I'm coming from the assumption
> that
> > it doesn't matter which ContainerRouter handles the next request,
> given
> > that all actors have the same data. Maybe you can help me understand
> better
> > the edge-case ?
> >
>
> ContainerRouters do not have the same state specifically. The
> live-concurrency on a container is potentially very fast-changing data.
> Sharing that across a potentially unbounded number of routers is not viable
> performance-wise.
>
> Hence the premise is to manage that state locally and essentially shard the
> list of available containers between all routers, so each of them can keep
> its respective state local.
>
>
> >
> >
> > Re the Knative approach, can you expand on why the execution layer/data
> > plane would be replaced entirely by Knative Serving? I think Knative
> > Serving handles some cases like API requests very well, but it's not
> > designed to guarantee concurrency restrictions like "1 request at a time
> > per container" - something that AI Actions need.
> >
>
> You are right... today! I'm not saying Knative is necessarily a superior
> backend for OpenWhisk as it stands today. All I'm saying is that from an
> architecture point of view, Knative Serving replaces all of the concerns
> that the execution layer has.
>
>
> >
> >
> > Thanks,
> >
> > dragos
> >
> >
> > [1] - https://doc.akka.io/docs/akka/2.5/distributed-data.html
> >
>