Re: Understanding Mesos Maintenance

2017-03-03 Thread Zameer Manji
Thanks for clearing that up.

I was accidentally setting a long refuse time.
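
For anyone hitting the same thing, a minimal sketch of declining an inverse
offer while keeping the filter short, assuming the v1 scheduler HTTP API's
DECLINE_INVERSE_OFFERS call; the master address, framework ID, inverse offer
ID, and stream ID below are placeholders:

```
import requests

MASTER = "http://master.example.com:5050"  # placeholder master address

def decline_inverse_offer(framework_id, inverse_offer_id, stream_id):
    """Decline an inverse offer without suppressing future ones.

    filters.refuse_seconds defaults to 5; accidentally passing a large
    value here is what stops the master from re-sending inverse offers.
    """
    call = {
        "framework_id": {"value": framework_id},
        "type": "DECLINE_INVERSE_OFFERS",
        "decline_inverse_offers": {
            "inverse_offer_ids": [{"value": inverse_offer_id}],
            "filters": {"refuse_seconds": 5.0},  # keep this small
        },
    }
    requests.post(
        MASTER + "/api/v1/scheduler",
        json=call,
        headers={"Mesos-Stream-Id": stream_id},  # from the SUBSCRIBE response
    ).raise_for_status()
```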

On Fri, Mar 3, 2017 at 6:08 PM, Joseph Wu  wrote:

> Inverse offers have the same offer cycle as normal offers.  They can
> be Accepted/Declined with a timeout (default 5 seconds).
>
> On Fri, Mar 3, 2017 at 5:29 PM, Zameer Manji  wrote:
> > Ben,
> >
> > Thanks for responding to my questions. I have a follow up on #3.
> >
> > I have a framework which accepts inverse offers but does not do anything
> > to the associated tasks. I noticed that the framework **does not** receive
> > another inverse offer within the allocation period. At what interval will
> > an inverse offer be resent to the framework if it was accepted? I took a
> > glance at `src/tests/master_maintenance_tests.cpp` and did not notice any
> > tests covering this.
> >
> > Are you sure that inverse offers are resent after they have been accepted
> > but before the tasks are removed from the host?
> >
> >
> > On Thu, Mar 2, 2017 at 4:14 PM, Benjamin Mahler wrote:
> >>
> >> Hey Zameer, great questions. Let us know if there's anything you think
> >> could be improved or documented better.
> >>
> >> Re 1:
> >>
> >> The 'Viewing maintenance status' section of the documentation should
> >> clarify this:
> >> http://mesos.apache.org/documentation/latest/maintenance/
> >>
> >> Re 2:
> >>
> >> Both of these sound reasonable, but the scheduler should not accept the
> >> maintenance if it's not yet safe for the machine to be downed. Otherwise
> >> a task failure may be mistakenly interpreted as a go-ahead to down the
> >> machine, despite the scheduler needing to get the task back running. If
> >> expensive or long-running work needs to finish (e.g. migrate data,
> >> replace instances in a manner that doesn't violate SLA, etc.), then I
> >> would suggest waiting until the work completes safely before accepting.
> >>
> >> We likely need a third state, like TENTATIVELY_ACCEPT, to signal to
> >> operators / mesos that the framework intends to comply but hasn't
> >> finished whatever it needs to do yet for it to be safe to down the
> >> machine.
> >>
> >> Also, one of the challenges here is when to take the action. Should the
> >> scheduler prepare itself for maintenance as soon as it safely can? Or as
> >> late (but not too late!) as it safely can? If the scheduler runs
> >> long-running services, as soon as safely possible makes sense. If the
> >> scheduler runs short-running batch jobs, as late as safely possible
> >> provides work-conservation.
> >>
> >> Re 3:
> >>
> >> The framework will receive another inverse offer if the framework still
> >> has resources allocated on that agent. If receiving a regular offer for
> >> available resources on the agent, an 'Unavailability' [1] will be
> >> included if the machine is scheduled for maintenance, so that the
> >> scheduler can be aware of the maintenance when placing new work.
> >>
> >> Re 4:
> >>
> >> It's not possible currently, and it's the operator's responsibility (the
> >> intention was for "operator" to be maintenance tooling). Ideally we can
> >> add automation of this decision into mesos, if decision criteria that
> >> are widely applicable can be established (e.g. if nothing is running and
> >> all relevant frameworks have accepted). Feel free to file a ticket for
> >> this or any other improvements!
> >>
> >> Ben
> >>
> >> [1]
> >> https://github.com/apache/mesos/blob/8f487beb9f8aaed8f27b0404279b1a2f97672ba1/include/mesos/v1/mesos.proto#L1416-L1426
> >>
> >> On Wed, Mar 1, 2017 at 5:41 PM, Zameer Manji  wrote:
> >>>
> >>> Hey,
> >>>
> >>> I'm trying to understand some nuances of the maintenance API. Here are
> >>> my questions:
> >>>
> >>> 1. The documentation mentions that accepting or declining an inverse
> >>> offer is a "hint" to the operator. How do operators see whether a
> >>> framework has declined, accepted, or ignored an inverse offer?
> >>>
> >>> 2. Should a framework accept an inverse offer and then start removing
> >>> tasks from an agent, or should the framework only accept the inverse
> >>> offer after the removal of tasks is complete? I think the former makes
> >>> sense, but it implies that operators need to poll the state of the
> >>> agent to ensure there are no active tasks, whereas the latter implies
> >>> operators only need to check if all inverse offers were accepted.
> >>>
> >>> 3. After accepting the inverse offer, will a framework get another
> >>> inverse offer for the same agent? Currently I'm trying to determine if
> >>> inverse offer information needs to be persisted so a framework can
> >>> continue its draining work between failovers, or if it can just wait
> >>> for an inverse offer after starting up.
> >>>
> >>> 4. Is it possible for the agent to automatically transition from DRAIN
> >>> to DOWN if at the start of the unavailability period the agent is free
> >>> of tasks, or is that still the operator's responsibility?

Re: Understanding Mesos Maintenance

2017-03-03 Thread Zameer Manji
Ben,

Thanks for responding to my questions. I have a follow up on #3.

I have a framework which accepts inverse offers but does not do anything to
the associated tasks. I noticed that the framework **does not** receive
another inverse offer within the allocation period. At what interval will
an inverse offer be resent to the framework if it was accepted? I took a
glance at `src/tests/master_maintenance_tests.cpp` and did not notice any
tests covering this.

Are you sure that inverse offers are resent after they have been accepted
but before the tasks are removed from the host?


On Thu, Mar 2, 2017 at 4:14 PM, Benjamin Mahler  wrote:

> Hey Zameer, great questions. Let us know if there's anything you think
> could be improved or documented better.
>
> Re 1:
>
> The 'Viewing maintenance status' section of the documentation should
> clarify this:
> http://mesos.apache.org/documentation/latest/maintenance/
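
For reference, a minimal sketch of that view from the operator side, assuming
the documented /maintenance/status master endpoint and the maintenance
ClusterStatus JSON shape; the master address is a placeholder:

```
import requests

MASTER = "http://master.example.com:5050"  # placeholder

def print_maintenance_status():
    """List draining machines and each framework's inverse-offer response."""
    status = requests.get(MASTER + "/maintenance/status").json()
    for machine in status.get("draining_machines", []):
        print("draining:", machine["id"])
        # 'statuses' holds the latest inverse-offer response per framework,
        # e.g. STATUS_ACCEPT or STATUS_DECLINE; an ignored inverse offer
        # simply has no entry yet.
        for s in machine.get("statuses", []):
            print("  ", s["framework_id"]["value"], "->", s["status"])
```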
>
> Re 2:
>
> Both of these sound reasonable, but the scheduler should not accept the
> maintenance if it's not yet safe for the machine to be downed. Otherwise a
> task failure may be mistakenly interpreted as a go-ahead to down the
> machine, despite the scheduler needing to get the task back running. If
> expensive or long-running work needs to finish (e.g. migrate data, replace
> instances in a manner that doesn't violate SLA, etc.), then I would suggest
> waiting until the work completes safely before accepting.
>
> We likely need a third state, like TENTATIVELY_ACCEPT, to signal to
> operators / mesos that the framework intends to comply but hasn't finished
> whatever it needs to do yet for it to be safe to down the machine.
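
A sketch of the accept-only-when-safe pattern on the framework side, assuming
the v1 scheduler HTTP API's ACCEPT_INVERSE_OFFERS / DECLINE_INVERSE_OFFERS
calls; agent_has_active_tasks() is a hypothetical helper implemented against
the framework's own state, and 'session' is assumed to carry the
Mesos-Stream-Id header from SUBSCRIBE:

```
def handle_inverse_offer(session, master, framework_id, inverse_offer):
    """Accept an inverse offer only once our tasks on the agent have
    drained; until then, decline with a short filter so the master
    re-sends the inverse offer and we can re-evaluate.
    """
    agent_id = inverse_offer["agent_id"]["value"]
    drained = not agent_has_active_tasks(agent_id)  # hypothetical helper

    call_type = "ACCEPT_INVERSE_OFFERS" if drained else "DECLINE_INVERSE_OFFERS"
    call = {
        "framework_id": {"value": framework_id},
        "type": call_type,
        # field name matches the call type: accept_/decline_inverse_offers
        call_type.lower(): {
            "inverse_offer_ids": [inverse_offer["id"]],
            "filters": {"refuse_seconds": 30.0},
        },
    }
    session.post(master + "/api/v1/scheduler", json=call).raise_for_status()
```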
>
> Also, one of the challenges here is when to take the action. Should the
> scheduler prepare itself for maintenance as soon as it safely can? Or as
> late (but not too late!) as it safely can? If the scheduler runs
> long-running services, as soon as safely possible makes sense. If the
> scheduler runs short-running batch jobs, as late as safely possible
> provides work-conservation.
>
> Re 3:
>
> The framework will receive another inverse offer if the framework still
> has resources allocated on that agent. If receiving a regular offer for
> available resources on the agent, an 'Unavailability' [1] will be included
> if the machine is scheduled for maintenance, so that the scheduler can be
> aware of the maintenance when placing new work.
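
To illustrate, a sketch of a scheduler consulting that field when placing new
work, assuming the v1 JSON form of Offer/Unavailability (nanosecond
timestamps); the runtime estimate is the framework's own:

```
import time

def offer_usable_for(offer, expected_runtime_secs):
    """Return False if the offered agent enters maintenance before a
    task with the given expected runtime would finish.
    """
    unavailability = offer.get("unavailability")
    if unavailability is None:
        return True  # no maintenance scheduled on this agent

    # v1 TimeInfo carries nanoseconds since the epoch.
    maintenance_start = unavailability["start"]["nanoseconds"] / 1e9
    return time.time() + expected_runtime_secs < maintenance_start
```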
>
> Re 4:
>
> It's not possible currently, and it's the operator's responsibility (the
> intention was for "operator" to be maintenance tooling). Ideally we can add
> automation of this decision into mesos, if decision criteria that are widely
> applicable can be established (e.g. if nothing is running and all relevant
> frameworks have accepted). Feel free to file a ticket for this or any other
> improvements!
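
For concreteness, a minimal sketch of the tooling side of that decision,
assuming the documented /machine/down master endpoint, which takes a JSON
list of MachineIDs; the master address and machine identity are placeholders:

```
import requests

MASTER = "http://master.example.com:5050"  # placeholder

def down_machine(hostname, ip):
    """Transition a machine from DRAINING to DOWN.

    Mesos currently leaves this decision to the operator (or the
    maintenance tooling); nothing transitions the machine automatically.
    """
    machines = [{"hostname": hostname, "ip": ip}]
    requests.post(MASTER + "/machine/down", json=machines).raise_for_status()
```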
>
> Ben
>
> [1] https://github.com/apache/mesos/blob/8f487beb9f8aaed8f27b0404279b1a2f97672ba1/include/mesos/v1/mesos.proto#L1416-L1426
>
> On Wed, Mar 1, 2017 at 5:41 PM, Zameer Manji  wrote:
>
>> Hey,
>>
>> I'm trying to understand some nuances of the maintenance API. Here are my
>> questions:
>>
>> 1. The documentation mentions that accepting or declining an inverse
>> offer is a "hint" to the operator. How do operators see whether a framework
>> has declined, accepted, or ignored an inverse offer?
>>
>> 2. Should a framework accept an inverse offer and then start removing
>> tasks from an agent or should the framework only accept the inverse offer
>> after the removal of tasks is complete? I think the former makes sense, but
>> it implies that operators need to poll the state of the agent to ensure
>> there are no active tasks whereas the latter implies operators only need to
>> check if all inverse offers were accepted.
>>
>> 3. After accepting the inverse offer, will a framework get another
>> inverse offer for the same agent? Currently I'm trying to determine if
>> inverse offer information needs to be persisted so a framework can continue
>> its draining work between failovers, or if it can just wait for an inverse
>> offer after starting up.
>>
>> 4. Is it possible for the agent to automatically transition from DRAIN to
>> DOWN if at the start of the unavailability period the agent is free of
>> tasks or is that still the operator's responsibility?
>>
>> --
>> Zameer Manji
>>
>> --
>> Zameer Manji
>>
>


Re: [VOTE] Release Apache Mesos 1.1.1 (rc2)

2017-03-03 Thread Vinod Kone
+1 (binding)

Since the perf issue I reported earlier doesn't seem to be a blocker.

On Fri, Mar 3, 2017 at 12:14 AM, Alex Rukletsov  wrote:

> Was this perf issue introduced by one of the fixes included in 1.1.1-rc2?
> If not, I would suggest we vote for 1.1.1-rc2 and backport the perf fix
> into 1.1.2. IIUC, time-based patch releases should *not be worse*, hence if
> the perf issue was already in 1.1.0 it is *fine* to fix it in 1.1.2. I
> would like to avoid postponing the already-belated 1.1.1 for even longer.
>
> On Wed, Mar 1, 2017 at 8:02 PM, Vinod Kone  wrote:
>
> > Tested on ASF CI.
> >
> > Saw 2 configurations fail with
> > https://issues.apache.org/jira/browse/MESOS-7160
> >
> > I think @jpeach and @bbannier were looking into this. Not sure about the
> > severity of the issue, so withholding my vote.
> >
> >
> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
> >
> >- refs/tags/1.1.1-rc2
> >
> > Configuration Matrix:
> >
> > OS            Configuration                              Build tool  gcc      clang
> > centos:7      --verbose --enable-libevent --enable-ssl   autotools   Success  Not run
> > centos:7      --verbose --enable-libevent --enable-ssl   cmake       Success  Not run
> > centos:7      --verbose                                  autotools   Success  Not run
> > centos:7      --verbose                                  cmake       Success  Not run
> > ubuntu:14.04  --verbose --enable-libevent --enable-ssl   autotools   Success  Failed
> > ubuntu:14.04  --verbose --enable-libevent --enable-ssl   cmake       Success  Success
> > ubuntu:14.04  --verbose                                  autotools   Success  Failed
> > ubuntu:14.04  --verbose                                  cmake       Success  Success
> >
> > On Mon, Feb 27, 2017 at 5:54 AM, Alex Rukletsov 

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-03 Thread Vinod Kone
+1 (binding)

Since the perf and flaky-test issues that I reported earlier don't seem to be
blockers.

On Fri, Mar 3, 2017 at 4:01 PM, Adam Bordelon  wrote:

> I haven't heard any -1's so I'm going to go ahead and vote myself, from a
> DC/OS perspective:
>
> +1 (binding)
>
> I ran 1.2.0-rc2 through the DC/OS integration tests on top of DC/OS
> 1.9.0-rc1, which covers many Mesos features and tests multiple frameworks.
> See CI results of https://github.com/dcos/dcos/pull/1295
>
> This was then merged into DC/OS 1.9.0-rc2 which passed another suite of
> integration tests. Available for testing at https://dcos.io/releases/1.9.0-rc2/
>
>
> On Thu, Mar 2, 2017 at 12:02 AM, Adam Bordelon  wrote:
>
>> TL;DR: No consensus yet. Let's extend the vote for a day or two, until we
>> have 3 +1s or a legit -1.
>> During that time we can test further, and investigate any issues that
>> have shown up.
>>
>> Here's a summary of what's been reported on the 1.2.0-rc2 vote thread:
>>
>> - There was a perf core dump on ASF CI, which is not necessarily a
>> blocker:
>> MESOS-7160  Parsing of perf version segfaults
>>   Perhaps fixed by backporting MESOS-6982: PerfTest.Version fails on
>> recent Arch Linux
>>
>> - There were a couple of (known/unsurprising) flaky tests:
>> MESOS-7185  DockerRuntimeIsolatorTest.ROOT_INTERNET_CURL_DockerDefaultEntryptRegistryPuller is flaky
>> MESOS-4570  DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
>>
>> - If we were to have an rc3, the following Critical bugs could be
>> included:
>> MESOS-7050  IOSwitchboard FDs leaked when containerizer launch fails --
>> leads to deadlock
>> MESOS-6982  PerfTest.Version fails on recent Arch Linux
>>
>> - Plus doc updates:
>> MESOS-7188 Add documentation for Debug APIs to Operator API doc
>> MESOS-7189 Add nested container launch/wait/kill APIs to agent API
>> docs.
>>
>>
>> On Wed, Mar 1, 2017 at 11:30 AM, Neil Conway 
>> wrote:
>>
>>> The perf core dump might be addressed if we backport this change:
>>>
>>> https://reviews.apache.org/r/56611/
>>>
>>> Although my guess is that this isn't a severe problem: for some
>>> as-yet-unknown reason, running `perf` on the host segfaulted, which
>>> caused the test to fail.
>>>
>>> Neil
>>>
>>> On Wed, Mar 1, 2017 at 11:09 AM, Vinod Kone 
>>> wrote:
>>> > Tested on ASF CI.
>>> >
>>> > Saw 2 configurations fail. One was the perf core dump issue. The other
>>> > is a known (since 0.28.0) flaky test with the Docker fetcher plugin.
>>> >
>>> > Withholding the vote until we know the severity of the perf core dump.
>>> >
>>> >
>>> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
>>> >
>>> >- refs/tags/1.1.1-rc2
>>> >
>>> > Configuration Matrix:
>>> >
>>> > OS            Configuration                              Build tool  gcc      clang
>>> > centos:7      --verbose --enable-libevent --enable-ssl   autotools   Success  Not run
>>> > centos:7      --verbose --enable-libevent --enable-ssl   cmake       Success  Not run
>>> > centos:7      --verbose                                  autotools   Success  Not run
>>> > centos:7      --verbose                                  cmake       Success  Not run
>>> > ubuntu:14.04  --verbose --enable-libevent --enable-ssl   autotools   Success  Failed
>>> > …

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-03 Thread Adam Bordelon
I haven't heard any -1's so I'm going to go ahead and vote myself, from a
DC/OS perspective:

+1 (binding)

I ran 1.2.0-rc2 through the DC/OS integration tests on top of DC/OS
1.9.0-rc1, which covers many Mesos features and tests multiple frameworks.
See CI results of https://github.com/dcos/dcos/pull/1295

This was then merged into DC/OS 1.9.0-rc2 which passed another suite of
integration tests. Available for testing at
https://dcos.io/releases/1.9.0-rc2/


On Thu, Mar 2, 2017 at 12:02 AM, Adam Bordelon  wrote:

> TL;DR: No consensus yet. Let's extend the vote for a day or two, until we
> have 3 +1s or a legit -1.
> During that time we can test further, and investigate any issues that have
> shown up.
>
> Here's a summary of what's been reported on the 1.2.0-rc2 vote thread:
>
> - There was a perf core dump on ASF CI, which is not necessarily a blocker:
> MESOS-7160  Parsing of perf version segfaults
>   Perhaps fixed by backporting MESOS-6982: PerfTest.Version fails on
> recent Arch Linux
>
> - There were a couple of (known/unsurprising) flaky tests:
> MESOS-7185  DockerRuntimeIsolatorTest.ROOT_INTERNET_CURL_DockerDefaultEntryptRegistryPuller is flaky
> MESOS-4570  DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
>
> - If we were to have an rc3, the following Critical bugs could be included:
> MESOS-7050  IOSwitchboard FDs leaked when containerizer launch fails --
> leads to deadlock
> MESOS-6982  PerfTest.Version fails on recent Arch Linux
>
> - Plus doc updates:
> MESOS-7188 Add documentation for Debug APIs to Operator API doc
> MESOS-7189 Add nested container launch/wait/kill APIs to agent API
> docs.
>
>
> On Wed, Mar 1, 2017 at 11:30 AM, Neil Conway 
> wrote:
>
>> The perf core dump might be addressed if we backport this change:
>>
>> https://reviews.apache.org/r/56611/
>>
>> Although my guess is that this isn't a severe problem: for some
>> as-yet-unknown reason, running `perf` on the host segfaulted, which
>> caused the test to fail.
>>
>> Neil
>>
>> On Wed, Mar 1, 2017 at 11:09 AM, Vinod Kone  wrote:
>> > Tested on ASF CI.
>> >
>> > Saw 2 configurations fail. One was the perf core dump issue. The other
>> > is a known (since 0.28.0) flaky test with the Docker fetcher plugin.
>> >
>> > Withholding the vote until we know the severity of the perf core dump.
>> >
>> >
>> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
>> >
>> >- refs/tags/1.1.1-rc2
>> >
>> > Configuration Matrix:
>> >
>> > OS            Configuration                              Build tool  gcc      clang
>> > centos:7      --verbose --enable-libevent --enable-ssl   autotools   Success  Not run
>> > centos:7      --verbose --enable-libevent --enable-ssl   cmake       Success  Not run
>> > centos:7      --verbose                                  autotools   Success  Not run
>> > centos:7      --verbose                                  cmake       Success  Not run
>> > ubuntu:14.04  --verbose --enable-libevent --enable-ssl   autotools   Success  Failed
>> > ubuntu:14.04  --verbose --enable-libevent --enable-ssl   cmake       Success  …

Re: isolation and task binary distributives

2017-03-03 Thread Zameer Manji
Another approach would be to create a bind mount from the host into each
container (say `/var/cache/data`). The first executor can copy data there,
and subsequent executors can check for its presence and re-use that
data.
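
A minimal sketch of that approach, assuming the standard host_path volume
support in ContainerInfo; the paths and IDs below are placeholders:

```
# v1 TaskInfo fragment (JSON form) that bind-mounts a shared host
# directory into the container.
task = {
    "name": "short-task",
    "task_id": {"value": "task-1"},      # placeholder
    "agent_id": {"value": "agent-1"},    # placeholder
    "resources": [],                     # filled in from the offer
    "container": {
        "type": "MESOS",
        "volumes": [{
            "container_path": "/var/cache/data",  # path inside the container
            "host_path": "/var/cache/data",       # shared location on the host
            "mode": "RW",
        }],
    },
}
```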

On Fri, Mar 3, 2017 at 8:32 AM, tommy xiao  wrote:

> Create a persistent disk to store the distribution, and keep the executors
> isolated.
>
> 2017-03-03 21:40 GMT+08:00 Egor Ryashin :
>
>> Hi All,
>>
>> I'm writing a custom scheduler which will be sending short-running tasks.
>> I need those tasks to be properly isolated, which means I shouldn't run
>> them in the same executor. Those tasks require large binary distributions,
>> and I suppose each run in a separate executor will spawn large sandboxes
>> with copies of those distributions on disk. Is there an easy way to
>> maintain isolation for those tasks while sharing a distribution between
>> them?
>>
>> Thanks,
>> Egor
>>
>>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>
> --
> Zameer Manji
> 
>


Re: isolation and task binary distributives

2017-03-03 Thread tommy xiao
Create a persistent disk to store the distribution, and keep the executors
isolated.
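
A minimal sketch of that approach, assuming the v1 CREATE offer operation on
reserved disk (Mesos persistent volumes); the role, principal, volume ID, and
size below are placeholders:

```
# Offer operation (JSON form) that turns reserved disk into a
# persistent volume which survives executor/task termination.
create_operation = {
    "type": "CREATE",
    "create": {
        "volumes": [{
            "name": "disk",
            "type": "SCALAR",
            "scalar": {"value": 2048},  # MB of reserved disk
            "role": "my-role",          # must match the reservation
            "disk": {
                "persistence": {"id": "distro-cache", "principal": "my-principal"},
                "volume": {"container_path": "distro", "mode": "RW"},
            },
        }],
    },
}
```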

2017-03-03 21:40 GMT+08:00 Egor Ryashin :

> Hi All,
>
> I'm writing a custom scheduler which will be sending short-running tasks.
> I need those tasks to be properly isolated, which means I shouldn't run
> them in the same executor. Those tasks require large binary distributions,
> and I suppose each run in a separate executor will spawn large sandboxes
> with copies of those distributions on disk. Is there an easy way to
> maintain isolation for those tasks while sharing a distribution between
> them?
>
> Thanks,
> Egor
>
>


-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com


isolation and task binary distributives

2017-03-03 Thread Egor Ryashin
Hi All,

I'm writing a custom scheduler which will be sending short-running tasks.
I need those tasks to be properly isolated, which means I shouldn't run them in
the same executor. Those tasks require large binary distributions, and I suppose
each run in a separate executor will spawn large sandboxes with copies of those
distributions on disk. Is there an easy way to maintain isolation for those
tasks while sharing a distribution between them?

Thanks,
Egor