Re: Support deadline for tasks

2018-03-22 Thread James Peach


> On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
> 
> In our environment, we run a lot of batch jobs, some of which have tight
> timelines. If any task in a job runs longer than x hours, it does not make
> sense to run it anymore.
>  
> For instance, a team would submit a job which builds a weekly index and
> repeats every Monday. If the job does not finish before the next Monday for
> whatever reason, there is no point in keeping any of its tasks running.
>  
> We believe that distributing deadline tracking across our cluster makes
> more sense than tracking deadlines centrally, as it makes the system more
> scalable and also makes our centralized state machine simpler.
>  
> One idea I have right now is to add an optional TimeInfo deadline field to
> TaskInfo, and all default executors in Mesos can simply terminate the
> task once the deadline passes and send a proper StatusUpdate.
> 
> I summarized the above idea in MESOS-8725.
> 
> Please let me know what you think. Thanks! 

This sounds both useful and simple to implement. I’m happy to shepherd if
you’d like.

J

Re: Communicate with a container while using Mesos unified container runtime

2018-03-22 Thread Gilbert Song
Hi Karan,

It does not seem to me that launching more Mesos containers would add more
overhead.

If you want to achieve *docker exec* for debugging purposes, Mesos supports
that (not in the Mesos CLI yet /cc Armand and Kevin), but you could still
rely on the DC/OS CLI to do that, given you have the taskId.

Gilbert

On Wed, Mar 21, 2018 at 12:08 PM, Karan Pradhan 
wrote:

>
>
> On 2018/03/21 18:06:48, Gilbert Song  wrote:
> > Hi Karan,
> >
> > Before figuring out some ways to achieve this with Mesos, I would like to
> > better understand your use cases.
> >
> > Do you mean you rely on `docker attach/exec` to send commands to an
> > existing running container?
> >
> > Is there any reason that keeps you from launching a container for each
> > batch job?
> >
> > Gilbert
> >
> > On Wed, Mar 21, 2018 at 10:29 AM, karanprad...@gmail.com <
> > karanprad...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I was using Docker to run my batch jobs, following this
> > > approach:
> > >
> > > 1. Start the Docker container.
> > > 2. Send commands to the running Docker container with the help of the
> > > Docker Python client for each batch of objects.
> > > 3. After all the batches are processed by the container, shut down the
> > > container.
> > >
> > > I wanted to achieve the same with the help of Mesos and Marathon to
> > > spin up containers and submit commands per batch.
> > > But looking at the documentation, it looks like this behavior is not
> > > achievable: when Mesos spins up a Docker container with the help of the
> > > Mesos containerizer and docker/runtime isolation, you can submit only
> > > one command, after which the Mesos framework is killed.
> > >
> > > It would be great if someone could point me to a way to achieve this
> > > using the Mesos containerizer.
> > >
> > > Thanks,
> > > Karan
> > >
> >
> Hi Gilbert,
> Thanks for taking the time to answer my question.
>
> Yes, as you mentioned, I use docker exec to run commands in the container.
> There is no particular reason why we don't run a new Docker container per
> batch. Would that add overhead if I had multiple batches to process?
>
> Do you know if docker exec is possible on a mesos container running with
> docker/runtime isolation?
>
> Thanks,
> Karan
>


Mesos scalability

2018-03-22 Thread Karan Pradhan
Hi All,

I had the following questions:
1. I was wondering if it is possible to have multiple Mesos masters as elected
masters in a Mesos cluster so that the load can be balanced amongst the
masters. Is there a way to achieve this?
In general, can there be a load balancer for the Mesos masters?

2. I have seen spikes in the Mesos event queues while running Spark SQL
workloads with multiple stages, so I was wondering what a better way to handle
these scalability issues would be. I noticed that compute-intensive machines
were able to deal with those workloads better. Is there a particular hardware
requirement, or a requirement for the number of masters, for scaling a Mesos
cluster horizontally? After reading success stories which mention that Mesos
is deployed on ~10K machines, I was curious about the hardware used and the
number of masters in those cases.

It would be awesome if I could get some insight into these questions.

Thanks,
Karan



Re: Mesos on OS X

2018-03-22 Thread Sunil Shah
Thanks all, that sounds promising! We'll probably give it a go and see how
that works out...

On Wed, Mar 21, 2018 at 9:32 PM, Jie Yu  wrote:

> There's no isolation between containers on OS X. Process management is
> based on the POSIX process tree (unlike cgroups on Linux), which has some
> limitations.
>
> If you're fine with the above, then it should work.
>
> - Jie
>
> On Wed, Mar 21, 2018 at 9:23 PM, Benjamin Mahler 
> wrote:
>
>> macOS is a supported platform; you can see the supported versions here:
>> http://mesos.apache.org/documentation/latest/building/
>>
>> The containerization maintainers could probably chime in to elaborate on
>> the isolation caveats. For example, you won't have many of the resource
>> isolators available and the launcher cannot prevent processes from
>> "escaping" from the "container".
>>
>> On Wed, Mar 21, 2018 at 11:37 AM Ken Sipe  wrote:
>>
>>> I don’t have experience with long-running clusters, but I would expect
>>> it to work fine. The thing to be aware of is that under OS X there are no
>>> cgroup constraints. You may also want to review the __APPLE__ differences:
>>> https://github.com/apache/mesos/search?utf8=%E2%9C%93=__APPLE__=
>>> 
>>>
>>> Ken
>>>
>>>
>>> On Mar 21, 2018, at 1:25 PM, Sunil Shah  wrote:
>>>
>>> Hey all,
>>>
>>> We're contemplating setting up a small OS X Mesos cluster for running
>>> iOS tests. I know Mesos technically builds on Macs, but has anyone ever had
>>> experience with a long running cluster on OS X? Is it possible?
>>> Recommended? Not recommended?
>>>
>>> Thanks,
>>>
>>> Sunil
>>>
>>>
>>>
>


Support deadline for tasks

2018-03-22 Thread Zhitao Li
In our environment, we run a lot of batch jobs, some of which have tight
timelines. If any task in a job runs longer than x hours, it does not
make sense to run it anymore.

For instance, a team would submit a job which builds a weekly index and
repeats every Monday. If the job does not finish before the next Monday for
whatever reason, there is no point in keeping any of its tasks running.

We believe that distributing deadline tracking across our cluster makes
more sense than tracking deadlines centrally, as it makes the system more
scalable and also makes our centralized state machine simpler.

One idea I have right now is to add an *optional* *TimeInfo deadline* field to
TaskInfo, and all default executors in Mesos can simply terminate the
task once the deadline passes and send a proper *StatusUpdate*.
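
As a rough illustration of the executor-side behaviour, here is a minimal
Python sketch. It is not Mesos code and does not use any Mesos API:
`run_with_deadline`, `StatusSketch`, the grace period, and the choice of
which task state to report are all hypothetical, and the deadline is assumed
to be an absolute wall-clock time in seconds since the epoch. The sketch
starts a task process, waits until the deadline, and terminates the task if
it is still running.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List


@dataclass
class StatusSketch:
    """Hypothetical stand-in for a status update; not a Mesos type."""
    state: str
    message: str


def run_with_deadline(argv: List[str], deadline_epoch_s: float,
                      grace_s: float = 1.0) -> StatusSketch:
    """Run argv and stop it if it is still running at the deadline.

    Assumption: the deadline is an absolute time (seconds since the epoch).
    """
    proc = subprocess.Popen(argv)
    remaining = max(0.0, deadline_epoch_s - time.time())
    try:
        code = proc.wait(timeout=remaining)
    except subprocess.TimeoutExpired:
        # Deadline reached: ask the task to stop, then force it if needed.
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        # Which state a real executor should report is an open question;
        # TASK_KILLED is only an assumption made for this sketch.
        return StatusSketch("TASK_KILLED", "deadline exceeded")
    if code == 0:
        return StatusSketch("TASK_FINISHED", "exited normally")
    return StatusSketch("TASK_FAILED", f"exit code {code}")


if __name__ == "__main__":
    # A task that would sleep for 30 seconds, given a 0.5 second deadline.
    slow_task = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(slow_task, time.time() + 0.5))
```

Checking the deadline where the task runs means the scheduler does not need
its own timer per task, which is the motivation given above for distributing
the tracking.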

I summarized the above idea in MESOS-8725.

Please let me know what you think. Thanks!

-- 
Cheers,

Zhitao Li


Re: [Containerization WG] Call for Agenda, March 22nd, 2018

2018-03-22 Thread Gilbert Song
It seems there are no agenda items for tomorrow's WG meeting, so we will
cancel it this time.

On Wed, Mar 21, 2018 at 10:30 AM, Gilbert Song  wrote:

> Folks,
>
> We are planning for a WG meeting tomorrow at 9 am PST.
>
> Please add any agenda item or topic that you would like to discuss with the
> Containerization WG to the following list:
> https://docs.google.com/document/d/1z55a7tLZFoRWVuUxz1FZwgxkHeugtc2nHR89skFXSpU/edit#heading=h.j7quoqe53vwr
>
> Thanks,
> Gilbert
>