Re: Hello and Suggested Technical Direction for Mesos

2021-09-12 Thread John Siegrist
Hi Qian,

There’s a deployment-time process in which the user uploads their functions to
the serverless platform. At that time, the system packages/containerizes the
function code and stores the container (or lightweight VM) in a registry. From
there, a serverless endpoint is set up and linked to your containerized
function. This function upload step works very similarly to how the Heroku PaaS
operates, where the service platform runs a build step before launching your
application.
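
As a rough illustration of the deployment flow described above, here is a
minimal Python sketch. The Registry and Platform classes, the deploy() method,
and the /functions/<name> endpoint path are all hypothetical names made up for
this example; they are not the API of any of the frameworks mentioned in this
thread, and the "build step" is reduced to bundling strings.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a container/VM image registry."""
    images: dict = field(default_factory=dict)

    def push(self, image_id: str, artifact: bytes) -> None:
        self.images[image_id] = artifact


@dataclass
class Platform:
    """Hypothetical serverless platform: a registry plus an endpoint table."""
    registry: Registry = field(default_factory=Registry)
    endpoints: dict = field(default_factory=dict)

    def deploy(self, name: str, source: str, runtime: str) -> str:
        # Build step: bundle the function source with its language runtime.
        # A real platform would produce a container or lightweight VM image.
        artifact = f"runtime={runtime}\n{source}".encode()
        image_id = hashlib.sha256(artifact).hexdigest()[:12]
        # Store the packaged function in the registry.
        self.registry.push(image_id, artifact)
        # Set up an endpoint and link it to the packaged function.
        path = f"/functions/{name}"
        self.endpoints[path] = image_id
        return path


platform = Platform()
path = platform.deploy("hello", "def handler(event): return 'hi'", "python3")
print(path, "->", platform.endpoints[path])
```

The point of the sketch is only the ordering: build, push to the registry, then
link an endpoint to the stored image, all before any request arrives.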

Since sending you the first email, I have learned that the AWS Lambda service
uses something called Firecracker, which uses a containerization technology
that is closer to VMs than to Docker containers. Hence the reference to
lightweight VMs above.

Kind regards,
John Siegrist


Re: Hello and Suggested Technical Direction for Mesos

2021-09-12 Thread Qian Zhang
Thanks John for the detailed info which really helps me understand the
requirements better.

> For each request, the serverless runtime launches one copy of the
> containerized function.


Can you please elaborate a bit more on this? What did you mean by the
`containerized function`? Is it just a normal function call inside a
container that packages the supported language runtime and the function, or
an on-demand, short-lived container launched specifically for each request?


Regards,
Qian Zhang



Re: Hello and Suggested Technical Direction for Mesos

2021-09-08 Thread John Siegrist
Hi Qian,

After looking into the open source options for widely-used serverless 
frameworks, I noted these five:
* Oracle-backed Fn: https://fnproject.io/
* Kubeless: https://kubeless.io/
* Fission: https://fission.io/
* OpenFaaS: https://www.openfaas.com/
* Apache OpenWhisk: https://openwhisk.apache.org/

From a brief look at the OpenWhisk documentation, Mesos is already supported
alongside Kubernetes and Docker Swarm. I haven’t studied OpenWhisk, so I don’t
know if it works differently from the mostly Kubernetes-based projects in this
space.

The way most of these ‘function as a service’ frameworks work is that you 
provide the function in one of the supported languages, and the system then 
packages your function with an appropriate runtime into a container. From 
there, the serverless runtime waits for a request to come in on the service 
endpoint. For each request, the serverless runtime launches one copy of the 
containerized function. Functions are supposed to run for extremely short 
durations, and the serverless runtime meters the running time. You’re billed by
the second or fraction of a second for function execution time. If you are
running one of these services on your own infrastructure, then obviously there
isn’t a billing component like the one that exists with the cloud providers.
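
To make the per-request metering concrete, here is a small Python sketch. It is
only an illustration under assumptions: invoke(), billed_units(), the 0.1 second
billing unit, and the price parameter are hypothetical and do not describe any
provider's actual billing rules. A plain function call stands in for launching
one copy of the containerized function.

```python
import math
import time


def billed_units(elapsed_seconds, unit_seconds=0.1):
    """Round a running time up to whole billing units (minimum of one unit)."""
    return max(1, math.ceil(elapsed_seconds / unit_seconds))


def invoke(handler, event, price_per_unit=0.0, unit_seconds=0.1):
    """Run one invocation and meter it.

    price_per_unit and unit_seconds are hypothetical billing parameters;
    a self-hosted deployment would simply leave the price at 0.
    """
    start = time.perf_counter()
    # Stands in for launching one copy of the containerized function.
    result = handler(event)
    elapsed = time.perf_counter() - start
    units = billed_units(elapsed, unit_seconds)
    return result, elapsed, units * price_per_unit


result, elapsed, cost = invoke(lambda event: event["x"] * 2, {"x": 21},
                               price_per_unit=0.000002)
print(result, f"{elapsed:.6f}s", cost)
```

Because the running time is rounded up, anything a function does before it can
answer (including container startup, if that is metered) counts against it.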

None of the major cloud providers have a good solution for using serverless 
functions that require persistent state. Each function invocation terminates 
when it’s finished, so nothing persists. Any state needs to be looked up or 
persisted to external storage. So far, I haven’t seen any good solution 
proposed for how to do this better. When you’re being charged by the second, 
you don’t really want your functions to have to query and fetch state over the 
network before being able to respond to inbound requests, but this is how it 
currently has to work.
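
A toy Python sketch of that pattern, for illustration only: ExternalStore is a
hypothetical in-memory stand-in for a remote key-value store, and the
round_trips counter marks where a real function would pay for a network call
on every invocation.

```python
class ExternalStore:
    """Hypothetical stand-in for a remote key-value store."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0  # each get/put would be a network call

    def get(self, key, default=None):
        self.round_trips += 1
        return self._data.get(key, default)

    def put(self, key, value):
        self.round_trips += 1
        self._data[key] = value


def count_visits(event, store):
    # Nothing survives between invocations, so state is fetched first...
    visits = store.get(event["user"], 0) + 1
    # ...and written back before the invocation terminates.
    store.put(event["user"], visits)
    return visits


store = ExternalStore()
for _ in range(3):
    latest = count_visits({"user": "alice"}, store)
print(latest, store.round_trips)
```

Three invocations cost six round trips here, all of which would fall inside the
metered execution time.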

Even if Mesos (and some new framework) doesn’t solve the stateful serverless
version of this problem, it would still be beneficial if the workload isolation
between serverless functions were just as good/secure as what is available on
Kubernetes, but without the overhead of function containerization and function
container startup times for each inbound request.

I understand what you said about some of the answer being in the scheduling 
framework rather than in Mesos itself. This Mesos framework would need to make 
a runtime decision about where to schedule the function execution based on the 
inbound request and where the data accessed by the function is distributed 
across the Mesos cluster. Of course, if the function itself has to be 
transferred across the network as part of the response, that just adds more to 
the function startup time.
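
The kind of runtime placement decision described above could look roughly like
the following Python sketch. Everything here is an assumption made for
illustration: pick_agent(), the {agent: free_cpus} shape for offers, and the
{agent: set_of_keys} shape for data placement are hypothetical and are not
Mesos API structures.

```python
def pick_agent(offers, data_locations, needed_keys, cpus_needed=0.1):
    """Pick the offered agent that already holds the most of the needed data.

    offers:         {agent_id: free_cpus}        (hypothetical shape)
    data_locations: {agent_id: set of data keys} (hypothetical shape)
    Returns None when no offered agent has enough free CPU.
    """
    candidates = [a for a, cpus in offers.items() if cpus >= cpus_needed]
    if not candidates:
        return None
    needed = set(needed_keys)
    # Prefer the largest overlap with the needed data; break ties by free CPU.
    return max(
        candidates,
        key=lambda a: (len(needed & data_locations.get(a, set())), offers[a]),
    )


offers = {"agent-1": 4.0, "agent-2": 1.0, "agent-3": 0.05}
data_locations = {"agent-2": {"users"}, "agent-3": {"users", "orders"}}
print(pick_agent(offers, data_locations, ["users", "orders"]))
```

In this example agent-3 holds all of the data but is skipped because it has too
little free CPU, so the function lands on agent-2 and would still have to fetch
part of its data over the network.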

Kind regards,
John



Re: Hello and Suggested Technical Direction for Mesos

2021-09-07 Thread Qian Zhang
Hi John,

Thanks for your suggestion!

> That is, how do you run serverless workloads that require access to
> persistent data and how do you schedule your serverless functions so that
> they execute with good data locality to ensure decent performance.

If you are talking about workload scheduling, I think it should be handled
by frameworks rather than by Mesos. As we all know, Mesos has a two-level
scheduling mechanism in which the Mesos master does the resource scheduling
for the frameworks running on top of it, and each framework does its own
workload scheduling after it receives resource offers from the Mesos
master. Could you please elaborate a bit more on the specific requirements
for Mesos to support serverless workloads?
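
For readers less familiar with the two-level mechanism, here is a toy Python
sketch of the split. It is a simplification under assumptions, not the Mesos
API: the Offer, Master, and Framework classes and their method names are
hypothetical, only CPU is modelled, and there is a single framework, so the
master's choice of which framework receives which offers is not shown.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float


class Master:
    """Level one: tracks free resources and offers them to a framework."""

    def __init__(self, agents):
        self.free = dict(agents)  # agent -> free cpus

    def make_offers(self):
        return [Offer(a, c) for a, c in self.free.items() if c > 0]

    def launch(self, offer, cpus):
        assert cpus <= self.free[offer.agent]
        self.free[offer.agent] -= cpus


class Framework:
    """Level two: decides which of its tasks run on the offered resources."""

    def __init__(self, tasks):
        self.pending = list(tasks)  # list of (task_name, cpus)

    def resource_offers(self, master, offers):
        placed = []
        for offer in offers:
            remaining = offer.cpus
            # Iterate over a copy because placed tasks are removed from pending.
            for task in list(self.pending):
                name, cpus = task
                if cpus <= remaining:
                    master.launch(offer, cpus)
                    remaining -= cpus
                    self.pending.remove(task)
                    placed.append((name, offer.agent))
        return placed


master = Master({"agent-1": 2.0, "agent-2": 1.0})
framework = Framework([("f1", 1.5), ("f2", 1.0), ("f3", 0.5)])
placed = framework.resource_offers(master, master.make_offers())
print(placed)
```

The master never looks at the tasks themselves; any policy such as data
locality would live in the framework's resource_offers() logic.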


Regards,
Qian Zhang


On Tue, Sep 7, 2021 at 8:27 PM John Siegrist  wrote:

> Hello All,
>
> In going through the mail archive before subscribing to this list, it
> seems there have been a number of discussions around what Mesos should do
> as a project. One use case that might be worth considering is ‘serverless’
> workloads. This would be something where the Kubernetes containerization
> doesn’t provide any advantages, and to some extent may actually be a
> hindrance (slower function startup times as the container spins up).
>
> In particular, there is an open problem having to do with supporting
> stateful serverless workloads. That is, how do you run serverless workloads
> that require access to persistent data and how do you schedule your
> serverless functions so that they execute with good data locality to ensure
> decent performance. A good serverless solution would increase the relevance
> of Mesos, and it is also a forward-looking direction that doesn’t try to
> reclaim lost territory related to container orchestration. I don’t know how
> much work would be needed to build function-as-a-service on Mesos, but
> since Mesos is already quite good at hosting data workloads it may not
> actually be all that difficult?
>
> Kind regards,
> John Siegrist