Re: Notes & Video posted from today's Tech. Interchange meeting

2018-08-15 Thread Dascalita Dragos
Matt, echoing again my thanks for such great notes. Since I wasn't able to
attend today, it was valuable to get the summary between my flight
connections. Kudos!
On Wed, Aug 15, 2018 at 12:23 PM Matt Rutkowski  wrote:

> Thanks Ben for moderating a very full agenda with lots of good
> discussions.
>
> Notes:
>
> https://cwiki.apache.org/confluence/display/OPENWHISK/2018-08-15+OW+Tech+Interchange+-+Meeting+Notes
> Video: https://youtu.be/ZvESgK88TyQ
>
> Also, thanks to Tyson for volunteering to be our next moderator.
>
> Cheers,
> mr
>
>


Re: logging baby step -- worth pursuing?

2018-08-15 Thread Tyson Norris
Hi - 
FWIW this won’t help with concurrent activations, since their logs will be
interleaved (I think Dave was not suggesting to use this for concurrent
activations). It will only help in the case where log processing is done
outside of the invoker and logs are not interleaved from multiple
activations.
I’m not sure a start sentinel is simpler than just including the activation
id in the existing sentinel line (at the end of the log segment, not the
beginning), but it would probably be easier for a human to read.
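To make that comparison concrete, here is a minimal sketch (Scala) of how an
offline processor could attribute lines once both sentinels exist. The start
sentinel and its id-carrying format below are purely hypothetical; only an
end-of-log sentinel exists today:

    object SentinelLogParser {
      // Hypothetical markers: only an end-of-log sentinel exists today.
      val StartSentinel = "XXX_START_OF_A_WHISK_ACTIVATION_XXX"
      val EndSentinel   = "XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX"

      /** Folds a raw stdout/stderr stream into (activationId, line) pairs. */
      def attribute(lines: Iterator[String]): Iterator[(String, String)] = {
        var current: Option[String] = None
        lines.flatMap {
          case l if l.startsWith(StartSentinel) =>
            current = l.split(' ').lift(1) // "<sentinel> <activationId>"
            Iterator.empty
          case l if l.startsWith(EndSentinel) =>
            current = None
            Iterator.empty
          case l =>
            current.map(id => (id, l)).iterator
        }
      }
    }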

If people use blackbox actions, and if blackbox containers have different log
collection than managed actions, I think that would be a reason not to do
anything until there is better support for structured logging: if you are
still using the invoker to collect blackbox logs, you might as well use it to
collect all logs. It may be that the majority of log collection is not
blackbox, so you could get some efficiencies there, but the added mess of
multiple log collection approaches may bring different problems (my logs
behave differently for different types of actions, etc.).

One option might be to allow the /init endpoint to return some details about
the container image, so that it can hint how it expects logs to be handled (if
at all) at the invoker - currently the /init response is only interpreted in
the case of a non-200 response. This same approach may be useful for other
optional facilities like support for concurrency or GPUs, where the container
can signal its support and fail early if there is a mismatch with the action
being executed. This would not resolve the different-behavior problem, but it
would provide a smooth transition for older blackbox images.
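As a rough illustration of that handshake (Scala; all field names are
hypothetical, since today a successful /init response carries no such hints):

    // Hypothetical capability hints a runtime could return from /init.
    case class InitHints(
        logFormat: Option[String],   // e.g. Some("sentinel"), Some("structured")
        maxConcurrency: Option[Int], // concurrent activations the image supports
        gpu: Option[Boolean])

    // Invoker-side check: fail early if the action needs more than the
    // container signals it can handle.
    def validate(hints: InitHints, requiredConcurrency: Int): Either[String, Unit] =
      hints.maxConcurrency match {
        case Some(max) if max < requiredConcurrency =>
          Left(s"container supports $max concurrent activations, " +
            s"action needs $requiredConcurrency")
        case _ => Right(())
      }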

Thanks
Tyson

> On Aug 14, 2018, at 2:49 PM, Dragos Dascalita Haut 
>  wrote:
> 
> "...we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline..."
> 
> 
> +1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going 
> down this path as well. Having this natively supported by all OpenWhisk 
> runtimes can only make things easier.
> 
> 
> From: David P Grove 
> Sent: Tuesday, August 14, 2018 2:29:12 PM
> To: dev@openwhisk.apache.org
> Subject: logging baby step -- worth pursuing?
> 
> 
> 
> Even if we think structured logging is the right eventual goal, it could
> take a while to get there (especially since it is changing functionality
> users may have grown accustomed to).
> 
> However, for non-concurrent, non-blackbox runtimes we could make a small,
> not-user visible change, that could enable fully offline and streaming log
> processing.  We already generate an end-of-log sentinel to stdout/stderr
> for these runtimes.  If we also generated a start-of-log sentinel to
> stdout/stderr that included the activation id, we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline.
> 
> Is this worth pursuing? I'm motivated to get log processing out of the
> Invoker/ContainerRouter so we can push ahead with some of the scheduler
> redesign. Without tackling logging, I don't think we'll be able to assess
> the true scalability potential of the new scheduling architectures.
> 
> --dave



Notes & Video posted from today's Tech. Interchange meeting

2018-08-15 Thread Matt Rutkowski
Thanks Ben for moderating a very full agenda with lots of good 
discussions.

Notes: 
https://cwiki.apache.org/confluence/display/OPENWHISK/2018-08-15+OW+Tech+Interchange+-+Meeting+Notes
Video: https://youtu.be/ZvESgK88TyQ

Also, thanks to Tyson for volunteering to be our next moderator.

Cheers,
mr



Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th

2018-08-15 Thread Carlos Santana
I guess we ran out of time today; we can cover runtime and SDK changes at
the next meeting.

On Wed, Aug 15, 2018 at 9:54 AM Ben Browning  wrote:

> The only confirmed items on the agenda right now are a website update
> and knative update. So if you're on the fence about bringing up a
> discussion topic, we'll have plenty of available time!
>
> Thanks,
>
> Ben
>
>
> On Wed, Aug 15, 2018 at 9:42 AM Vadim Raskin 
> wrote:
> >
> > hi Ben,
> >
> > I won't be able to join; I have an overlapping appointment.
> > Perhaps Carlos could say a couple of sentences on the topic.
> >
> > cheers, Vadim
> >
> > On Wed, Aug 15, 2018 at 12:56 AM Dragos Dascalita Haut
> >  wrote:
> >
> > > Unfortunately I'm on a flight during the meeting and I won't be able to
> > > attend.
> > >
> > >
> > > RE AI Actions I agree with Rodric: the time might have been a bit short.
> > > I'm incorporating the feedback I'm receiving into the wiki page and
> > > hopefully it will be in better shape for our next meeting. If anyone has
> > > more thoughts pls let me know or update the wiki, or add comments directly.
> > >
> > >
> > > 
> > > From: Rodric Rabbah 
> > > Sent: Tuesday, August 14, 2018 6:23:24 AM
> > > To: dev@openwhisk.apache.org
> > > Subject: Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th
> > >
> > > Looks like a nice agenda. Thanks Ben for hosting this one. I can’t make it
> > > unfortunately but will catch the replay.
> > >
> > > Dragos’ AI actions might be another topic, although maybe the runway is too short.
> > >
> > > -r
> > >
> > > > On Aug 14, 2018, at 9:09 AM, Ben Browning 
> wrote:
> > > >
> > > > Greetings!
> > > >
> > > > Our next tech interchange call is Wednesday, August 15th at 11am US
> > > > Eastern - that's tomorrow! Use the attached .ics file if you'd like to
> > > > add a reminder to your calendar.
> > > >
> > > > Call details:
> > > > Web Meeting: Tech Interchange (bi-weekly):
> > > > - Day-Time: Wednesdays, 11AM EDT (Eastern US), 5PM CEST (Central Europe),
> > > > 3PM UTC, 11PM CST (Beijing)
> > > > - Zoom:
> > > > https://zoom.us/my/asfopenwhisk
> > > >
> > > > Based on recent mailing list and Slack discussions, here are some
> > > > proposed discussion topics. If you'd like to speak about one of these
> > > > or have another topic, please email or message me on Slack before our
> > > > meeting. I'll send out an updated agenda shortly before the meeting
> > > > tomorrow.
> > > >
> > > > * 0.9.0 release update (Vincent?)
> > > > * website update (Matt & Priti?)
> > > > * knative update (Ben or Markus)
> > > > * system env vars in user containers (Vadim, Markus, Rodric, Chetan,
> > > > Carlos, Tyson, Dragos?)
> > > > * pluggable API gateways (Henry, Rodric, Dragos?)
> > > > * AI actions (Dragos?)
> > > > * BDD function and performance tests (Rahul, Martin, Markus?)
> > > > * Recap of recent notable changes (?)
> > > > * Anything else - let me know!
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > Ben
> > > > 
> > >
>


Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Carlos Santana
I think we should add a section on prioritization for blocking vs. async
invokes (non-blocking actions and triggers).

The front door has the luxury of knowing some intent from the incoming
request, so I feel it would make sense to give high priority to blocking
invokes, while async ones go straight to the queue to be picked up by the
system to eventually run, even if that takes 10 times longer than a blocking
invoke; for example, a web action would take 10ms vs. a DB trigger fire or
an async webhook taking 100ms.

Also, the controller takes time to convert a trigger and process its rules;
this is something that can also be taken out of the hot path.

So I'm just saying we could optimize the system because we know whether the
incoming request is on a hot or hotter path :-)
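A tiny sketch of that front-door split (Scala; the types and the queue are
hypothetical, just to show the classification):

    import scala.collection.mutable

    object FrontDoor {
      final case class Invoke(blocking: Boolean, action: String)

      // Async invokes (trigger fires, webhooks) wait here to be picked up
      // eventually; blocking invokes never touch it.
      private val asyncQueue = mutable.Queue.empty[Invoke]

      def handle(i: Invoke): Unit =
        if (i.blocking) runNow(i)  // hot path: low latency, e.g. a web action
        else asyncQueue.enqueue(i) // hotter path: eventual execution is fine

      private def runNow(i: Invoke): Unit = println(s"dispatching ${i.action}")
    }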

-- Carlos




On Wed, Aug 15, 2018 at 10:30 AM Markus Thömmes 
wrote:

> Hi Michael,
>
> Losing/adding a shard is essentially reconciled by the ContainerManager.
> As it keeps track of all the ContainerRouters in the system, it can also
> observe one going down/crashing or one coming up and joining the "cluster".
>
> If one Router leaves the cluster, the ContainerManager knows which
> containers were "managed" by that router and redistributes them across the
> Routers left in the system.
> If one Router joins the cluster, we can try to rebalance containers to take
> load off existing ones. The precise algorithm is to be defined, but the
> primitives should be in place to be able to do that.
>
> Does that answer the question?
>
> Cheers,
> Markus
>
> Am Mi., 15. Aug. 2018 um 16:18 Uhr schrieb Michael Marth
> :
>
> > Markus,
> >
> > I agree with your preference of making the state sharded instead of
> > distributed (not only for the scalability reasons you quote but also for
> > operational concerns).
> > What are your thoughts about losing a shard (planned or crashed) or
> adding
> > a shard?
> >
> > Michael
> >
> >
> > On 15.08.18, 09:58, "Markus Thömmes"  wrote:
> >
> > Hi Dragos,
> >
> > thanks for your questions, good discussion :)
> >
> > Am Di., 14. Aug. 2018 um 23:42 Uhr schrieb Dragos Dascalita Haut
> > :
> >
> > > Markus, I appreciate the enhancements you mentioned in the wiki, and I'm
> > > very much in line with the ideas you brought in there.
> > >
> > >
> > >
> > > "...having the ContainerManager be a cluster singleton..."
> > >
> > > I was just in the process of replying with the same idea :)
> > >
> > > In addition, I was thinking we can leverage Akka Distributed Data [1] to
> > > keep all ContainerRouter actors eventually consistent. When creating a new
> > > container, the ContainerManager can write with a consistency "WriteAll"; it
> > > would be a little slower but it would improve consistency.
> > >
> >
> > I think we need to quantify "a little slower". Note that "WriteAll" becomes
> > slower and slower the more actors you add to the cluster. Scalability is in
> > question then.
> >
> > Of course scalability is also in question if we make the ContainerManager a
> > singleton. The ContainerManager has a 1:1 relationship to the
> > Kubernetes/Mesos scheduler. Do we know how those are distributed? I think
> > the Kubernetes scheduler is a singleton, but I'll need to double-check on
> > that.
> >
> > I can see the possibility to move the ContainerManager into each Router and
> > have them communicate with each other to shard in the same way I'm
> > proposing. As Dave is hitting on the very same points, I get the feeling we
> > should/could break out that specific discussion if we can agree on some
> > basic premises of the design (see my answers on the thread with Dave).
> > WDYT?
> >
> >
> > >
> > >
> > > The "edge-case" isn't clear to me b/c I'm coming from the
> assumption
> > that
> > > it doesn't matter which ContainerRouter handles the next request,
> > given
> > > that all actors have the same data. Maybe you can help me
> understand
> > better
> > > the edge-case ?
> > >
> >
> > ContainerRouters do not have the same state specifically. The
> > live-concurrency on a container is potentially very fast-changing data.
> > Sharing that across a potentially unbounded number of routers is not viable
> > performance-wise.
> >
> > Hence the premise is to manage that state locally and essentially shard the
> > list of available containers between all routers, so each of them can keep
> > its respective state local.
> >
> >
> > >
> > >
> > > Re the Knative approach, can you expand on why the execution layer/data plane
> > > would be replaced entirely by Knative serving? I think Knative serving
> > > handles some cases like API requests very well, but it's not designed to
> > > guarantee concurrency restrictions like "1 request at a time per container"
> > > - something that AI 

Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Markus Thömmes
Hi Michael,

Losing/adding a shard is essentially reconciled by the ContainerManager.
As it keeps track of all the ContainerRouters in the system, it can also
observe one going down/crashing or one coming up and joining the "cluster".

If one Router leaves the cluster, the ContainerManager knows which
containers were "managed" by that router and redistributes them across the
Routers left in the system.
If one Router joins the cluster, we can try to rebalance containers to take
load off existing ones. The precise algorithm is to be defined, but the
primitives should be in place to be able to do that.
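In rough Scala (all names hypothetical, rebalancing heuristics left out),
the reconciliation could look like:

    final case class Container(id: String)

    final class ContainerManager {
      // routerId -> containers currently "managed" by that router
      private var assignments = Map.empty[String, Set[Container]]

      def routerUp(routerId: String): Unit =
        assignments += routerId -> Set.empty[Container]

      def routerDown(routerId: String): Unit = {
        val orphaned = assignments.getOrElse(routerId, Set.empty[Container])
        assignments -= routerId
        val remaining = assignments.keys.toVector
        if (remaining.nonEmpty) {
          // naive round-robin redistribution of the orphaned containers
          orphaned.zipWithIndex.foreach { case (c, i) =>
            val target = remaining(i % remaining.size)
            assignments += target -> (assignments(target) + c)
          }
        }
      }
    }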

Does that answer the question?

Cheers,
Markus

Am Mi., 15. Aug. 2018 um 16:18 Uhr schrieb Michael Marth
:

> Markus,
>
> I agree with your preference of making the state sharded instead of
> distributed (not only for the scalability reasons you quote but also for
> operational concerns).
> What are your thoughts about losing a shard (planned or crashed) or adding
> a shard?
>
> Michael
>
>
> On 15.08.18, 09:58, "Markus Thömmes"  wrote:
>
> Hi Dragos,
>
> thanks for your questions, good discussion :)
>
> Am Di., 14. Aug. 2018 um 23:42 Uhr schrieb Dragos Dascalita Haut
> :
>
> > Markus, I appreciate the enhancements you mentioned in the wiki, and I'm
> > very much in line with the ideas you brought in there.
> >
> >
> >
> > "...having the ContainerManager be a cluster singleton..."
> >
> > I was just in the process of replying with the same idea :)
> >
> > In addition, I was thinking we can leverage Akka Distributed Data [1] to
> > keep all ContainerRouter actors eventually consistent. When creating a new
> > container, the ContainerManager can write with a consistency "WriteAll"; it
> > would be a little slower but it would improve consistency.
> >
>
> I think we need to quantify "a little slower". Note that "WriteAll" becomes
> slower and slower the more actors you add to the cluster. Scalability is in
> question then.
>
> Of course scalability is also in question if we make the ContainerManager a
> singleton. The ContainerManager has a 1:1 relationship to the
> Kubernetes/Mesos scheduler. Do we know how those are distributed? I think
> the Kubernetes scheduler is a singleton, but I'll need to double-check on
> that.
>
> I can see the possibility to move the ContainerManager into each Router and
> have them communicate with each other to shard in the same way I'm
> proposing. As Dave is hitting on the very same points, I get the feeling we
> should/could break out that specific discussion if we can agree on some
> basic premises of the design (see my answers on the thread with Dave).
> WDYT?
>
>
> >
> >
> > The "edge-case" isn't clear to me b/c I'm coming from the assumption
> that
> > it doesn't matter which ContainerRouter handles the next request,
> given
> > that all actors have the same data. Maybe you can help me understand
> better
> > the edge-case ?
> >
>
> ContainerRouters do not have the same state specifically. The
> live-concurrency on a container is potentially very fast-changing data.
> Sharing that across a potentially unbounded number of routers is not viable
> performance-wise.
>
> Hence the premise is to manage that state locally and essentially shard the
> list of available containers between all routers, so each of them can keep
> its respective state local.
>
>
> >
> >
> > Re the Knative approach, can you expand on why the execution layer/data plane
> > would be replaced entirely by Knative serving? I think Knative serving
> > handles some cases like API requests very well, but it's not designed to
> > guarantee concurrency restrictions like "1 request at a time per container"
> > - something that AI Actions need.
> >
>
> You are right... today! I'm not saying Knative is necessarily a superior
> backend for OpenWhisk as it stands today. All I'm saying is that from an
> architecture point-of-view, Knative serving replaces all of the concerns
> that the execution layer has.
>
>
> >
> >
> > Thanks,
> >
> > dragos
> >
> >
> > [1] - https://doc.akka.io/docs/akka/2.5/distributed-data.html
> >
> >
> > 
> > From: David P Grove 
> > Sent: Tuesday, August 14, 2018 2:15:13 PM
> > To: dev@openwhisk.apache.org
> > Subject: Re: Proposal on a future architecture of OpenWhisk
> >
> >
> >
> >
> > "Markus Thömmes"  wrote on 08/14/2018
> 10:06:49
> > AM:
> > >
> > > I just published a revision on the initial proposal I made. I
> still owe a
> > > lot of sequence diagrams for the container distribution, sorry for
> taking
> > > so long on that, I'm working on it.
> > >
> > > I did include a clear seperation of concerns into 

Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Michael Marth
Markus,

I agree with your preference of making the state sharded instead of
distributed (not only for the scalability reasons you quote but also for
operational concerns).
What are your thoughts about losing a shard (planned or crashed) or adding a 
shard?

Michael


On 15.08.18, 09:58, "Markus Thömmes"  wrote:

Hi Dragos,

thanks for your questions, good discussion :)

Am Di., 14. Aug. 2018 um 23:42 Uhr schrieb Dragos Dascalita Haut
:

> Markus, I appreciate the enhancements you mentioned in the wiki, and I'm
> very much in line with the ideas you brought in there.
>
>
>
> "...having the ContainerManager be a cluster singleton..."
>
> I was just in the process of replying with the same idea :)
>
> In addition, I was thinking we can leverage Akka Distributed Data [1] to
> keep all ContainerRouter actors eventually consistent. When creating a new
> container, the ContainerManager can write with a consistency "WriteAll"; it
> would be a little slower but it would improve consistency.
>

I think we need to quantify "a little slower". Note that "WriteAll" becomes
slower and slower the more actors you add to the cluster. Scalability is in
question then.

Of course scalability is also in question if we make the ContainerManager a
singleton. The ContainerManager has a 1:1 relationship to the
Kubernetes/Mesos scheduler. Do we know how those are distributed? I think
the Kubernetes scheduler is a singleton, but I'll need to double-check on
that.

I can see the possibility to move the ContainerManager into each Router and
have them communicate with each other to shard in the same way I'm
proposing. As Dave is hitting on the very same points, I get the feeling we
should/could break out that specific discussion if we can agree on some
basic premises of the design (see my answers on the thread with Dave). WDYT?


>
>
> The "edge-case" isn't clear to me b/c I'm coming from the assumption that
> it doesn't matter which ContainerRouter handles the next request, given
> that all actors have the same data. Maybe you can help me understand better
> the edge-case?
>

ContainerRouters do not have the same state specifically. The
live-concurrency on a container is potentially very fast-changing data.
Sharing that across a potentially unbounded number of routers is not viable
performance-wise.

Hence the premise is to manage that state locally and essentially shard the
list of available containers between all routers, so each of them can keep
its respective state local.


>
>
> Re the Knative approach, can you expand on why the execution layer/data plane
> would be replaced entirely by Knative serving? I think Knative serving
> handles some cases like API requests very well, but it's not designed to
> guarantee concurrency restrictions like "1 request at a time per container"
> - something that AI Actions need.
>

You are right... today! I'm not saying Knative is necessarily a superior
backend for OpenWhisk as it stands today. All I'm saying is that from an
architecture point-of-view, Knative serving replaces all of the concerns
that the execution layer has.


>
>
> Thanks,
>
> dragos
>
>
> [1] - https://doc.akka.io/docs/akka/2.5/distributed-data.html
>
>
> 
> From: David P Grove 
> Sent: Tuesday, August 14, 2018 2:15:13 PM
> To: dev@openwhisk.apache.org
> Subject: Re: Proposal on a future architecture of OpenWhisk
>
>
>
>
> "Markus Thömmes"  wrote on 08/14/2018 10:06:49
> AM:
> >
> > I just published a revision on the initial proposal I made. I still owe a
> > lot of sequence diagrams for the container distribution, sorry for taking
> > so long on that, I'm working on it.
> >
> > I did include a clear separation of concerns into the proposal, where
> > user-facing abstractions and the execution (load balancing, scaling) of
> > functions are loosely coupled. That enables us to exchange the execution
> > system while not changing anything in the Controllers at all (to an
> > extent). The interface to talk to the execution layer is HTTP.
> >
>
> Nice writeup!
>
> For me, the part of the design I'm wondering about is the separation of the
> ContainerManager and the ContainerRouter and having the ContainerManager be
> a cluster singleton. With Kubernetes blinders on, it seems more natural to
> me to fuse the ContainerManager into each of the ContainerRouter instances
> (since there is very little to the ContainerManager except (a) talking to
> Kubernetes and (b) keeping track of which Containers it has handed out to
> which ContainerRouters -- a task which is 

Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th

2018-08-15 Thread Ben Browning
The only confirmed items on the agenda right now are a website update
and knative update. So if you're on the fence about bringing up a
discussion topic, we'll have plenty of available time!

Thanks,

Ben


On Wed, Aug 15, 2018 at 9:42 AM Vadim Raskin  wrote:
>
> hi Ben,
>
> I won't be able to join; I have an overlapping appointment.
> Perhaps Carlos could say a couple of sentences on the topic.
>
> cheers, Vadim
>
> On Wed, Aug 15, 2018 at 12:56 AM Dragos Dascalita Haut
>  wrote:
>
> > Unfortunately I'm on a flight during the meeting and I won't be able to
> > attend.
> >
> >
> > RE AI Actions I agree with Rodric: the time might have been a bit short.
> > I'm incorporating the feedback I'm receiving into the wiki page and
> > hopefully it will be in better shape for our next meeting. If anyone has
> > more thoughts pls let me know or update the wiki, or add comments directly.
> >
> >
> > 
> > From: Rodric Rabbah 
> > Sent: Tuesday, August 14, 2018 6:23:24 AM
> > To: dev@openwhisk.apache.org
> > Subject: Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th
> >
> > Looks like a nice agenda. Thanks Ben for hosting this one. I can’t make it
> > unfortunately but will catch the replay.
> >
> > Dragos’ AI actions might be another topic, although maybe the runway is too short.
> >
> > -r
> >
> > > On Aug 14, 2018, at 9:09 AM, Ben Browning  wrote:
> > >
> > > Greetings!
> > >
> > > Our next tech interchange call is Wednesday, August 15th at 11am US
> > > Eastern - that's tomorrow! Use the attached .ics file if you'd like to
> > > add a reminder to your calendar.
> > >
> > > Call details:
> > > Web Meeting: Tech Interchange (bi-weekly):
> > > - Day-Time: Wednesdays, 11AM EDT (Eastern US), 5PM CEST (Central Europe),
> > > 3PM UTC, 11PM CST (Beijing)
> > > - Zoom:
> > > https://zoom.us/my/asfopenwhisk
> > >
> > > Based on recent mailing list and Slack discussions, here are some
> > > proposed discussion topics. If you'd like to speak about one of these
> > > or have another topic, please email or message me on Slack before our
> > > meeting. I'll send out an updated agenda shortly before the meeting
> > > tomorrow.
> > >
> > > * 0.9.0 release update (Vincent?)
> > > * website update (Matt & Priti?)
> > > * knative update (Ben or Markus)
> > > * system env vars in user containers (Vadim, Markus, Rodric, Chetan,
> > > Carlos, Tyson, Dragos?)
> > > * pluggable API gateways (Henry, Rodric, Dragos?)
> > > * AI actions (Dragos?)
> > > * BDD function and performance tests (Rahul, Martin, Markus?)
> > > * Recap of recent notable changes (?)
> > > * Anything else - let me know!
> > >
> > >
> > > Thanks!
> > >
> > > Ben
> > > 
> >


Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th

2018-08-15 Thread Vadim Raskin
hi Ben,

I won't be able to join; I have an overlapping appointment.
Perhaps Carlos could say a couple of sentences on the topic.

cheers, Vadim

On Wed, Aug 15, 2018 at 12:56 AM Dragos Dascalita Haut
 wrote:

> Unfortunately I'm on a flight during the meeting and I won't be able to
> attend.
>
>
> RE AI Actions I agree with Rodric: the time might have been a bit short.
> I'm incorporating the feedback I'm receiving into the wiki page and
> hopefully it will be in better shape for our next meeting. If anyone has
> more thoughts pls let me know or update the wiki, or add comments directly.
>
>
> 
> From: Rodric Rabbah 
> Sent: Tuesday, August 14, 2018 6:23:24 AM
> To: dev@openwhisk.apache.org
> Subject: Re: Reminder: Tech Interchange meeting tomorrow, Wed Aug 15th
>
> Looks like a nice agenda. Thanks Ben for hosting this one. I can’t make it
> unfortunately but will catch the replay.
>
> Dragos’ AI actions might be another topic, although maybe the runway is too short.
>
> -r
>
> > On Aug 14, 2018, at 9:09 AM, Ben Browning  wrote:
> >
> > Greetings!
> >
> > Our next tech interchange call is Wednesday, August 15th at 11am US
> > Eastern - that's tomorrow! Use the attached .ics file if you'd like to
> > add a reminder to your calendar.
> >
> > Call details:
> > Web Meeting: Tech Interchange (bi-weekly):
> > - Day-Time: Wednesdays, 11AM EDT (Eastern US), 5PM CEST (Central Europe),
> > 3PM UTC, 11PM CST (Beijing)
> > - Zoom:
> > https://zoom.us/my/asfopenwhisk
> >
> > Based on recent mailing list and Slack discussions, here are some
> > proposed discussion topics. If you'd like to speak about one of these
> > or have another topic, please email or message me on Slack before our
> > meeting. I'll send out an updated agenda shortly before the meeting
> > tomorrow.
> >
> > * 0.9.0 release update (Vincent?)
> > * website update (Matt & Priti?)
> > * knative update (Ben or Markus)
> > * system env vars in user containers (Vadim, Markus, Rodric, Chetan,
> > Carlos, Tyson, Dragos?)
> > * pluggable API gateways (Henry, Rodric, Dragos?)
> > * AI actions (Dragos?)
> > * BDD function and performance tests (Rahul, Martin, Markus?)
> > * Recap of recent notable changes (?)
> > * Anything else - let me know!
> >
> >
> > Thanks!
> >
> > Ben
> > 
>


Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Bertrand Delacretaz
Hi Markus,

On Wed, Aug 15, 2018 at 11:27 AM Markus Thömmes
 wrote:
> ...From the ContainerRouter's
> PoV, a container is just an IP address + Port, so that concern is
> encapsulated in the ContainerManager...

Cool, thanks for confirming. This means the ContainerManager can do
whatever allocation makes sense, great!

-Bertrand


Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Markus Thömmes
Hi Bertrand,

that's indeed something I haven't thought about yet, but as you say, the
ContainerManager could support multiple backends at once and schedule a
container wherever it thinks it makes sense. From the ContainerRouter's
PoV, a container is just an IP address + Port, so that concern is
encapsulated in the ContainerManager.
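A sketch of that encapsulation (Scala; the interface names are made up): the
router only ever sees an address, so the manager is free to allocate from
any orchestrator, e.g. per namespace as in your segregation use case:

    final case class ContainerAddress(ip: String, port: Int)

    trait ContainerBackend {
      def create(image: String): ContainerAddress
    }

    // The manager picks a backend per namespace, e.g. a locked-down
    // orchestrator for namespaces that don't trust shared isolation.
    final class ContainerManager(
        backends: Map[String, ContainerBackend],
        backendFor: String => String) {

      def allocate(namespace: String, image: String): ContainerAddress =
        backends(backendFor(namespace)).create(image)
    }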

Cheers,
Markus

Am Mi., 15. Aug. 2018 um 10:35 Uhr schrieb Bertrand Delacretaz <
bdelacre...@apache.org>:

> Hi,
>
> On Tue, Aug 14, 2018 at 4:07 PM Markus Thömmes
>  wrote:
> ...
> >
> https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture
> ...
>
> Very clear proposal, thank you! And thanks for bringing the discussion
> here.
>
> Is the ContainerManager meant to support multiple underlying
> orchestrators? I'm thinking of a use case where you want to segregate
> actions of a specific set of namespaces to a dedicated orchestrator.
> This can be useful for cases where people don't trust existing
> container isolation mechanisms.
>
> From my understanding it looks like this is covered, just wanted to check.
>
> -Bertrand
>


Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Bertrand Delacretaz
Hi,

On Tue, Aug 14, 2018 at 4:07 PM Markus Thömmes
 wrote:
...
> https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture
...

Very clear proposal, thank you! And thanks for bringing the discussion here.

Is the ContainerManager meant to support multiple underlying
orchestrators? I'm thinking of a use case where you want to segregate
actions of a specific set of namespaces to a dedicated orchestrator.
This can be useful for cases where people don't trust existing
container isolation mechanisms.

From my understanding it looks like this is covered, just wanted to check.

-Bertrand


Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Markus Thömmes
Hi Dragos,

thanks for your questions, good discussion :)

Am Di., 14. Aug. 2018 um 23:42 Uhr schrieb Dragos Dascalita Haut
:

> Markus, I appreciate the enhancements you mentioned in the wiki, and I'm
> very much in line with the ideas you brought in there.
>
>
>
> "...having the ContainerManager be a cluster singleton..."
>
> I was just in the process of replying with the same idea :)
>
> In addition, I was thinking we can leverage Akka Distributed Data [1] to
> keep all ContainerRouter actors eventually consistent. When creating a new
> container, the ContainerManager can write with a consistency "WriteAll"; it
> would be a little slower but it would improve consistency.
>

I think we need to quantify "a little slower". Note that "WriteAll" becomes
slower and slower the more actors you add to the cluster. Scalability is in
question then.
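For reference, the suggestion from [1] would look roughly like this (a
sketch against the Akka 2.5 classic API; the key and element are
illustrative). The update is only acknowledged once every node has it,
which is where the scaling cost comes from:

    import akka.actor.ActorSystem
    import akka.cluster.Cluster
    import akka.cluster.ddata.{DistributedData, ORSet, ORSetKey}
    import akka.cluster.ddata.Replicator.{Update, WriteAll}
    import scala.concurrent.duration._

    val system = ActorSystem("openwhisk")
    implicit val cluster = Cluster(system)
    val replicator = DistributedData(system).replicator
    val ContainersKey = ORSetKey[String]("free-containers")

    // Acknowledged only after *all* members have replicated the element,
    // so latency grows with the number of routers in the cluster.
    replicator ! Update(ContainersKey, ORSet.empty[String], WriteAll(5.seconds))(
      _ + "container-42")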

Of course scalability is also in question if we make the ContainerManager a
singleton. The ContainerManager has a 1:1 relationship to the
Kubernetes/Mesos scheduler. Do we know how those are distributed? I think
the Kubernetes scheduler is a singleton, but I'll need to double-check on
that.

I can see the possibility to move the ContainerManager into each Router and
have them communicate with each other to shard in the same way I'm
proposing. As Dave is hitting on the very same points, I get the feeling we
should/could break out that specific discussion if we can agree on some
basic premises of the design (see my answers on the thread with Dave). WDYT?


>
>
> The "edge-case" isn't clear to me b/c I'm coming from the assumption that
> it doesn't matter which ContainerRouter handles the next request, given
> that all actors have the same data. Maybe you can help me understand better
> the edge-case?
>

ContainerRouters do not have the same state specifically. The
live-concurrency on a container is potentially very fast-changing data.
Sharing that across a potentially unbounded number of routers is not viable
performance-wise.

Hence the premise is to manage that state locally and essentially shard the
list of available containers between all routers, so each of them can keep
its respective state local.
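As a sketch (Scala, hypothetical types), each router would then only track
the in-flight counts of its own shard, with no replication at all:

    final case class ContainerRef(ip: String, port: Int)

    final class ContainerRouter(maxConcurrency: Int) {
      // Local state only: containers the manager assigned to *this* router,
      // with their current in-flight activation count.
      private var inFlight = Map.empty[ContainerRef, Int]

      def assign(c: ContainerRef): Unit = inFlight += c -> 0

      /** Picks a container with spare capacity from the local shard, if any. */
      def tryAcquire(): Option[ContainerRef] =
        inFlight.collectFirst { case (c, n) if n < maxConcurrency => c }.map { c =>
          inFlight += c -> (inFlight(c) + 1)
          c
        }

      def release(c: ContainerRef): Unit =
        inFlight += c -> (inFlight(c) - 1)
    }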


>
>
> Re the Knative approach, can you expand on why the execution layer/data plane
> would be replaced entirely by Knative serving? I think Knative serving
> handles some cases like API requests very well, but it's not designed to
> guarantee concurrency restrictions like "1 request at a time per container"
> - something that AI Actions need.
>

You are right... today! I'm not saying Knative is necessarily a superior
backend for OpenWhisk as it stands today. All I'm saying is that from an
architecture point-of-view, Knative serving replaces all of the concerns
that the execution layer has.


>
>
> Thanks,
>
> dragos
>
>
> [1] - https://doc.akka.io/docs/akka/2.5/distributed-data.html
>
>
> 
> From: David P Grove 
> Sent: Tuesday, August 14, 2018 2:15:13 PM
> To: dev@openwhisk.apache.org
> Subject: Re: Proposal on a future architecture of OpenWhisk
>
>
>
>
> "Markus Thömmes"  wrote on 08/14/2018 10:06:49
> AM:
> >
> > I just published a revision on the initial proposal I made. I still owe a
> > lot of sequence diagrams for the container distribution, sorry for taking
> > so long on that, I'm working on it.
> >
> > I did include a clear separation of concerns into the proposal, where
> > user-facing abstractions and the execution (load balancing, scaling) of
> > functions are loosely coupled. That enables us to exchange the execution
> > system while not changing anything in the Controllers at all (to an
> > extent). The interface to talk to the execution layer is HTTP.
> >
>
> Nice writeup!
>
> For me, the part of the design I'm wondering about is the separation of the
> ContainerManager and the ContainerRouter and having the ContainerManager be
> a cluster singleton. With Kubernetes blinders on, it seems more natural to
> me to fuse the ContainerManager into each of the ContainerRouter instances
> (since there is very little to the ContainerManager except (a) talking to
> Kubernetes and (b) keeping track of which Containers it has handed out to
> which ContainerRouters -- a task which is eliminated if we fuse them).
>
> The main challenge is dealing with your "edge case" where the optimal
> number of containers to create to execute a function is less than the
> number of ContainerRouters.  I suspect this is actually an important case
> to handle well for large-scale deployments of OpenWhisk.  Having 20ish
> ContainerRouters on a large cluster seems plausible, and then we'd expect a
> long tail of functions where the optimal number of container instances is
> less than 20.
>
> I wonder if we can partially mitigate this problem by doing some amount of
> smart routing in the Controller.  For example, the first level of routing
> could be based on the kind of the action (nodejs:6, python, etc).  That
> could then vector to per-runtime ContainerRouters which 

Re: Proposal on a future architecture of OpenWhisk

2018-08-15 Thread Markus Thömmes
Hi Dave,

thanks a lot for your input! Greatly appreciated.

Am Di., 14. Aug. 2018 um 23:15 Uhr schrieb David P Grove :

>
>
>
> "Markus Thömmes"  wrote on 08/14/2018 10:06:49
> AM:
> >
> > I just published a revision on the initial proposal I made. I still owe a
> > lot of sequence diagrams for the container distribution, sorry for taking
> > so long on that, I'm working on it.
> >
> > I did include a clear separation of concerns into the proposal, where
> > user-facing abstractions and the execution (load balancing, scaling) of
> > functions are loosely coupled. That enables us to exchange the execution
> > system while not changing anything in the Controllers at all (to an
> > extent). The interface to talk to the execution layer is HTTP.
> >
>
> Nice writeup!
>
> For me, the part of the design I'm wondering about is the separation of the
> ContainerManager and the ContainerRouter and having the ContainerManager be
> a cluster singleton. With Kubernetes blinders on, it seems more natural to
> me to fuse the ContainerManager into each of the ContainerRouter instances
> (since there is very little to the ContainerManager except (a) talking to
> Kubernetes and (b) keeping track of which Containers it has handed out to
> which ContainerRouters -- a task which is eliminated if we fuse them).
>

As you say below, the main concern is dealing with the edge-case I laid out.


>
> The main challenge is dealing with your "edge case" where the optimal
> number of containers to create to execute a function is less than the
> number of ContainerRouters.  I suspect this is actually an important case
> to handle well for large-scale deployments of OpenWhisk.  Having 20ish
> ContainerRouters on a large cluster seems plausible, and then we'd expect a
> long tail of functions where the optimal number of container instances is
> less than 20.
>

I agree, in large scale environments that might well be an important case.


>
> I wonder if we can partially mitigate this problem by doing some amount of
> smart routing in the Controller.  For example, the first level of routing
> could be based on the kind of the action (nodejs:6, python, etc).  That
> could then vector to per-runtime ContainerRouters which dynamically
> auto-scale based on load.  Since there doesn't have to be a fixed division
> of actual execution resources to each ContainerRouter this could work.  It
> also lets us easily keep stemcells for multiple runtimes without worrying about
> wasting too many resources.
>

The premise I wanted to keep in my proposal is that you can route
essentially randomly between the routers. That's also why I use the overflow
queue essentially as a work-stealing queue, to balance load between the
routers if the discrepancies get too high.

My general gut feeling as to what can work here is: keep state local as
long as you can (in the individual ContainerRouters) to make the hot path as
fast as possible, and fall back to work-stealing (slower, more constrained)
once things get out of bounds.
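A sketch of those two paths (Scala, hypothetical types; the overflow queue
stands in for whatever shared medium we pick):

    import java.util.concurrent.ConcurrentLinkedQueue

    final case class Activation(id: String)

    trait LocalShard {
      def tryAcquire(): Option[String] // a container id, if capacity is free
    }

    final class Router(local: LocalShard,
                       overflow: ConcurrentLinkedQueue[Activation]) {
      def handle(a: Activation): Unit = local.tryAcquire() match {
        case Some(c) => println(s"run ${a.id} on $c") // hot path, local state only
        case None    => overflow.offer(a)             // out of capacity: share work
      }

      // Any router with idle capacity steals from the shared queue.
      def steal(): Unit = Option(overflow.poll()).foreach(handle)
    }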


>
> How do you want to deal with design alternatives?  Should I be adding to
> the wiki page?  Doing something else?
>

Good question. Feels like we could break a "Routing" work group out of
this? Part of my proposal was to build this out collaboratively. Maybe we
can try to find consensus on some general points (direct HTTP connection to
containers should be part of it, we'll need an overflow queue) and once/if
we agree on the general broader picture, we can break out discussions on
individual aspects of it? Would that make sense?


>
> --dave
>