Re: Welcome Guangya Liu as Mesos Committer and PMC member!

2016-12-19 Thread Guangya Liu
Thank you all! For sure, I'm looking forward to contributing more to the
community! We all want to make Mesos awesome!


On Mon, Dec 19, 2016 at 4:51 PM, tommy xiao <xia...@gmail.com> wrote:

> Congrats Guangya!
>
> 2016-12-19 11:26 GMT+08:00 Yan Xu <y...@jxu.me>:
>
>> Congrats!
>>
>> ---
>> Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>
>>
>> On Mon, Dec 19, 2016 at 1:31 AM, haosdent <haosd...@gmail.com> wrote:
>>
>>> Congrats Guangya!
>>>
>>> On Sun, Dec 18, 2016 at 10:02 PM, Klaus Ma <klaus1982...@gmail.com>
>>> wrote:
>>>
>>>> Congratulations!!
>>>>
>>>> On Sat, Dec 17, 2016 at 1:23 PM Dharmesh Kakadia <dhkaka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Congrats Guangya !
>>>>>
>>>>> Thanks,
>>>>> Dharmesh
>>>>>
>>>>> On Fri, Dec 16, 2016 at 5:03 PM, Dario Rexin <dre...@apple.com> wrote:
>>>>>
>>>>> Congrats!
>>>>>
>>>>> > On Dec 16, 2016, at 4:27 PM, Vinod Kone <vinodk...@apache.org>
>>>>> wrote:
>>>>> >
>>>>> > Congrats Guangya! Welcome to the PMC!
>>>>> >
>>>>> >> On Fri, Dec 16, 2016 at 7:03 PM, Sam <usultra...@gmail.com> wrote:
>>>>> >> congratulations Guangya
>>>>> >>
>>>>> >> Sent from my iPhone
>>>>> >>
>>>>> >>> On 17 Dec 2016, at 3:23 AM, Avinash Sridharan <
>>>>> avin...@mesosphere.io> wrote:
>>>>> >>>
>>>>> >>> Congrats Guangya !!
>>>>> >>>
>>>>> >>>> On Fri, Dec 16, 2016 at 11:20 AM, Greg Mann <g...@mesosphere.io>
>>>>> wrote:
>>>>> >>>> Congratulations Guangya!!! :D
>>>>> >>>>
>>>>> >>>>> On Fri, Dec 16, 2016 at 11:10 AM, Jie Yu <yujie@gmail.com>
>>>>> wrote:
>>>>> >>>>> Hi folks,
>>>>> >>>>>
>>>>> >>>>> Please join me in formally welcoming Guangya Liu as Mesos
>>>>> Committer and PMC
>>>>> >>>>> member.
>>>>> >>>>>
>>>>> >>>>> Guangya has worked on the project for more than a year now and
>>>>> has been a
>>>>> >>>>> very active contributor to the project. I think one of his most
>>>>> important
>>>>> >>>>> contributions to the community is that he helped grow the
>>>>> Mesos
>>>>> >>>>> community in China. He initiated the Xian-Mesos-User-Group and
>>>>> successfully
>>>>> >>>>> organized two meetups which attracted more than 100 people from
>>>>> Xi’an
>>>>> >>>>> China. He wrote a handful of blogs and articles in Chinese tech
>>>>> media which
>>>>> >>>>> attracted a lot of interest in Mesos. He has given several
>>>>> talks about
>>>>> >>>>> Mesos at conferences in China.
>>>>> >>>>>
>>>>> >>>>> His major coding contribution to the project was the docker
>>>>> volume driver
>>>>> >>>>> isolator. He has also been involved in allocator performance
>>>>> improvement,
>>>>> GPU support for the Docker containerizer, Mesos Tiers/Optimistic
>>>>> Offer design,
>>>>> >>>>> scarce resources discussion, and many others.
>>>>> >>>>>
>>>>> >>>>> His formal checklist is here:
>>>>> >>>>> https://docs.google.com/document/d/1tot79kyJCTTgJHBhzStFKrVkDK4pXqfl-LHCLOovNtI/edit?usp=sharing
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> - Jie
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> Avinash Sridharan, Mesosphere
>>>>> >>> +1 (323) 702 5245
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>
>>>> Regards,
>>>> 
>>>> Da (Klaus), Ma (马达), PMP® | Software Architect
>>>> IBM Platform Development & Support, STG, IBM GCG
>>>> +86-10-8245 4084 <+86%2010%208245%204084> | mad...@cn.ibm.com |
>>>> http://k82.me
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>


Re: Support for tasks groups aka pods in Mesos

2016-09-21 Thread Guangya Liu
The answer is no. A task group can be treated like a pod in Kubernetes: it
will be a set of containers co-located and co-managed on an agent that
share some resources (e.g., network namespace, volumes). For your case, you
may want to split your 50 tasks into smaller task groups.

Also, the `execute` CLI is just an example framework and it can only launch
one task group for now; you may want to enhance it if you want to launch
multiple task groups.
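As a rough illustration of splitting a large batch into smaller task groups, one could size each group to fit an agent's free CPUs. This is a hypothetical helper (the names `plan_task_groups` and its parameters are not part of the Mesos API), just a sketch of the idea:

```python
# Illustrative sketch only: chunk a batch of one-off tasks into task
# groups small enough to fit on a single agent. Hypothetical names,
# not Mesos API code.

def plan_task_groups(num_tasks, cpus_per_task, agent_free_cpus):
    """Return a list of group sizes, each fitting within one agent."""
    per_group = max(1, int(agent_free_cpus // cpus_per_task))
    groups = []
    remaining = num_tasks
    while remaining > 0:
        size = min(per_group, remaining)
        groups.append(size)
        remaining -= size
    return groups

# 50 tasks of 4 CPUs each on agents with 16 free CPUs:
# twelve groups of 4 tasks plus one group of 2 tasks.
print(plan_task_groups(50, 4, 16))
```

Each resulting group could then be launched as its own task group, leaving even spreading across agents to the scheduler's placement logic.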

On Wed, Sep 21, 2016 at 1:41 PM,  wrote:

> Hi,
>
> Very cool feature.
>
> I’ve seen some recent changes (MESOS-6096) in the mesos-execute
> cli supporting grouped tasks, which spawns the task group on an arbitrary or
> specified agent node. Tested this yesterday and it works great.  But if the
> task group holds a lot of one-off tasks (let’s say 50 tasks each 4 cores,…)
> how could one ensure that tasks are evenly distributed over several
> agents, depending on free resources? I don’t know whether this is related to
> pods, but do you think mesos-execute will support this?
>
>
>
> Currently I’m using chronos & cook for batch scheduling, but I’d like
> every batch/group of tasks to run in its own temporary
> framework, like it does with mesos-execute.
>
>
>
> Thanks in advance & kind regards,
>
> Hubert
>
>
>
> ——
>
> *Deutsches Zentrum für Luft- und Raumfahrt* e.V. (DLR)
>
> German Aerospace Center
>
>
>
>
>
> German Remote Sensing Data Center | International Ground Segment |
> Oberpfaffenhofen | 82234 Wessling | Germany
>
>
>
> *Hubert Asamer*
>
> phone  +49 (0) 8153 28-2894 | fax +49 (0) 8153 28-1443 |
> hubert.asa...@dlr.de
>
> www.DLR.de/eoc 
>
>
>
>
>
>
>
>
>
> *From:* Vinod Kone [mailto:vinodk...@apache.org]
> *Sent:* Dienstag, 9. August 2016 02:54
> *To:* dev; user
> *Subject:* Support for tasks groups aka pods in Mesos
>
>
>
> Hi folks,
>
>
>
> One of the most requested features in Mesos has been first class support
> for managing pod like containers. We finally have some time to focus and
> shepherd this work.
>
>
>
> The epic tracking this work is: https://issues.apache.org/jira/browse/MESOS-2449
>
>
>
> Design doc: https://issues.apache.org/jira/browse/MESOS-2449
>
>
>
> Your feedback on the design will be most welcome. Once we get agreement on
> the design, we can start breaking down the epic into tickets.
>
>
>
> Thanks,
>
> Vinod & Jie
>


Re: MESOS: Improve the performance of Resources.

2016-07-11 Thread Guangya Liu
Hi Joris,

For the `2x` number, can you please share more detail on how you ran and
evaluated the test? How did you compare the results? Do you have any test
code to share?

Thanks,

Guangya

On Mon, Jul 11, 2016 at 6:05 PM, Klaus Ma  wrote:

> Hi Joris,
>
> For `Scalars`, yes, they are also dynamically allocated in
> `Resources::mutable_scalar()`.
>
> For the `2x` number, do you have any patch to share? I tested some cases
> this afternoon: adding resources including 100 roles (1 CPU for each); the
> performance degrades a lot, so I agree with you that we should improve some
> algorithms in Resources/Sorter.
>
> For 'basic benchmarks', these are temporarily tracked in my personal GitHub (
> https://github.com/k82cn/mesos/blob/resources_benchmark/src/tests/resources_tests.cpp).
> The following cases are in my mind to add:
> 1. simple resources, e.g. 1 cpu
> 2. resources with port, e.g. [1-2], [3-4], ... [101-102]
> 3. resources with reservation, cpus(r1):1;cpus(r2):1;  cpus(r10):1
> 4. resources with diskInfo
> 5. resources with revocableInfo
>
> The operators will be +, +=, -, -=, cpus(), contains.
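For intuition, the operators listed above behave like arithmetic on named scalar quantities. Below is a minimal, hypothetical Python sketch modelling only scalars; it is not the actual C++ `Resources` class, which also handles ranges, sets, reservations, and DiskInfo:

```python
# Minimal sketch of Resources-style arithmetic on scalar quantities.
# Models only named scalars (cpus, mem, ...); hypothetical code, not
# the actual Mesos C++ Resources implementation.

def add(a, b):
    out = dict(a)
    for name, value in b.items():
        out[name] = out.get(name, 0) + value
    return out

def subtract(a, b):
    out = dict(a)
    for name, value in b.items():
        out[name] = out.get(name, 0) - value
    # Drop empty entries, as Resources drops empty Resource objects.
    return {n: v for n, v in out.items() if v > 0}

def contains(a, b):
    return all(a.get(n, 0) >= v for n, v in b.items())

total = add({"cpus": 4, "mem": 1024}, {"cpus": 2})
print(total)                          # {'cpus': 6, 'mem': 1024}
print(contains(total, {"cpus": 5}))   # True
print(subtract(total, {"cpus": 6}))   # {'mem': 1024}
```

A benchmark would time these operators in tight loops over inputs like the five resource shapes listed above.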
>
> I booked a weekly call to discuss allocation performance and sent the
> invitation to the dev@.
>
> If you have any comments, please let me know.
>
> 
> Da (Klaus), Ma (马达) | PMP® | Software Architect
> Platform OpenSource Technology, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Mon, Jul 11, 2016 at 3:28 PM, Joris Van Remoortere  > wrote:
>
>> +Dev
>>
>> Hey Klaus,
>>
>> Using Stout's `Optional` to represent the `optional` concept of a message
>> in protobuf is definitely a step in the right direction.
>> Regarding your comment in slack yesterday: From my version of the
>> protobuf generated code there definitely is dynamic allocation even for
>> scalars.
>>
>> It looks like in our case there is a minimum of 3 dynamic allocations per
>> Resource object:
>>
>>> void Resource::SharedDtor() {
>>>   if (name_ != &::google::protobuf::internal::kEmptyString) {
>>> delete name_;
>>>   }
>>>   if (role_ != _default_role_) {
>>> delete role_;
>>>   }
>>>   if (this != default_instance_) {
>>> delete scalar_;
>>> delete ranges_;
>>> delete set_;
>>> delete reservation_;
>>> delete disk_;
>>> delete revocable_;
>>>   }
>>> }
>>
>>
>>  The 2x number I mentioned came from running some of the existing
>> benchmarks. I didn't explore further because it didn't have as big an
>> impact as I had hoped. The first battle is simplifying some of the
>> algorithms in the Sorter / Resources. Once that is done then the resource
>> arithmetic will be more of a bounding factor.
>>
>> I agree with Ben that we should focus on writing some basic benchmarks
>> that represent the common uses of Resources in the allocator. We should
>> scale these benchmarks to represent some of the more stressful environments
>> that could occur. For example, had we had such a benchmark, we would have
>> realized much earlier on that we needed to aggregate only quantities in the
>> Sorter, and that using the existing form of Resources would have led to a
>> grinding halt if a reservation were made on every machine.
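The Sorter observation above can be illustrated with a small, hypothetical sketch: collapsing fully-annotated resources down to bare quantities keeps sorter state bounded even when every agent carries its own reservation. Plain Python dicts stand in for Resource objects here; this is not Mesos code:

```python
# Sketch of why aggregating only *quantities* keeps sorter state small.
# With full Resources, a distinct reservation per agent yields a
# distinct entry per (role, agent); stripping to bare name -> amount
# quantities collapses them all. Hypothetical structures.

def to_quantities(resources):
    """Strip reservation/role metadata, keeping only name -> amount."""
    totals = {}
    for r in resources:
        totals[r["name"]] = totals.get(r["name"], 0) + r["amount"]
    return totals

# 1000 agents, each with a distinct per-agent reservation of 1 cpu:
allocated = [{"name": "cpus", "amount": 1, "role": "r%d" % i}
             for i in range(1000)]
quantities = to_quantities(allocated)
print(len(allocated), "entries collapse to", len(quantities))
print(quantities)   # {'cpus': 1000}
```

A benchmark stressing exactly this shape (a reservation on every machine) would have exposed the quadratic blow-up early.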
>>
>> Is there a regular call that is scheduled to discuss this? I think there
>> are some other folks also working on benchmarks and interested in the
>> discussion.
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Sun, Jul 10, 2016 at 8:50 PM, Klaus Ma  wrote:
>>
>>> + more devs :).
>>>
>>> 
>>> Da (Klaus), Ma (马达) | PMP® | Software Architect
>>> Platform OpenSource Technology, STG, IBM GCG
>>> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>>>
>>> On Mon, Jul 11, 2016 at 7:43 AM, Klaus Ma 
>>> wrote:
>>>
 Hi Joris,

 I think `Option` is helpful for the performance improvement; it uses
 `placement new` to avoid dynamic allocation. This assumes you're using Option
 for optional members in protobuf, and using the class instance directly
 (operator=).

 I'm adding some benchmarks for `Resources`, especially for
 `Resources` with Ranges, DiskInfo and ReservationInfo.

 Draft PR for Benchmark of Resources:
 https://github.com/k82cn/mesos/commit/09ca215cb37b1f89eb7d68a8cf2249eb641c


 
 Da (Klaus), Ma (马达) | PMP® | Software Architect
 Platform OpenSource Technology, STG, IBM GCG
 +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

>>>
>>>
>>
>


Re: MESOS-4694

2016-07-07 Thread Guangya Liu
Hi Ben and Dario,

The reasons that we have the "SUPPRESS" call are as follows:
1) It acts as the complement to the current REVIVE call.
2) The HTTP API does not have an API to "Deactivate" a framework; we want to
use "SUPPRESS", "DECLINE" and "DECLINE_INVERSE_OFFERS" to implement the
call for "DeactivateFrameworkMessage".

You can also refer to https://issues.apache.org/jira/browse/MESOS-3037 for
details.

So I think that Dario's patch is good: we should remove the framework
clients on "SUPPRESS" and add them back on "REVIVE", so that the sorter
ignores those frameworks.
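A minimal sketch of this suppress/revive bookkeeping, using hypothetical simplified classes rather than the actual Mesos allocator code:

```python
# Sketch of treating SUPPRESS/REVIVE as sorter (de)activation, so the
# allocation loop only iterates over frameworks that want offers.
# Hypothetical simplified classes, not the actual Mesos allocator.

class Sorter:
    def __init__(self):
        self.clients = set()

    def add(self, framework_id):
        self.clients.add(framework_id)

    def remove(self, framework_id):
        self.clients.discard(framework_id)

class Allocator:
    def __init__(self):
        self.sorter = Sorter()

    def add_framework(self, fid):
        self.sorter.add(fid)

    def suppress_offers(self, fid):
        # SUPPRESS: drop the client so sorting skips it entirely.
        self.sorter.remove(fid)

    def revive_offers(self, fid):
        # REVIVE: add the client back so it receives offers again.
        self.sorter.add(fid)

allocator = Allocator()
for fid in ("f1", "f2", "f3"):
    allocator.add_framework(fid)
allocator.suppress_offers("f2")
print(sorted(allocator.sorter.clients))   # ['f1', 'f3']
allocator.revive_offers("f2")
print(sorted(allocator.sorter.clients))   # ['f1', 'f2', 'f3']
```

The speedup Dario measured follows directly: with suppressed frameworks removed from the sorter, each allocation round sorts and loops over fewer clients.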

@Vinod, any comments on this?

@Ben, regarding your concern that the benchmark test result is not easy to
understand, I have filed a JIRA ticket here
https://issues.apache.org/jira/browse/MESOS-5800 to track it.

Thanks,

Guangya



On Thu, Jul 7, 2016 at 6:01 AM, Dario Rexin  wrote:

> Hi Vinod,
>
> thanks for your reply. The reason it’s so much faster is because the
> sorting is a lot faster with fewer frameworks. Looping shouldn’t make a
> huge difference, as it used to just skip over the deactivated frameworks.
>
> I don’t know what effects deactivating the framework in the master would
> have. The framework is still active and listening for events / sending
> calls. Could you please elaborate?
>
> Thanks,
> --
>  Dario
>
> On Jul 6, 2016, at 2:56 PM, Benjamin Mahler  wrote:
>
> +implementer and shepherd of SUPPRESS
>
> Is there any reason we didn't already just "deactivate" frameworks that
> were suppressing offers? That seems to be the natural implementation,
> performance aside, because the meaning of "deactivated" is: not being sent
> any offers. The patch you posted seems to only take this half-way: suppress
> = deactivation in the allocator, but not in the master.
>
> Also, Dario it's a bit hard to interpret these numbers without reading the
> benchmark code. My interpretation of these numbers is that this change
> makes the allocation loop complete more quickly when there are many
> frameworks that are in the suppressed state, because we have to loop over
> fewer clients. Is this an accurate interpretation?
>
> On Wed, Jul 6, 2016 at 2:08 PM, Dario Rexin  wrote:
>
> Hi all,
>
> I would like to revive https://issues.apache.org/jira/browse/MESOS-4694,
> especially https://reviews.apache.org/r/43666/.
> We heavily depend on this patch and would love to see it merged. To show
> the value of this patch, I ran the benchmark from
> https://reviews.apache.org/r/49616/ 
> first on HEAD and then with the aforementioned patch applied. I took some
> lines out to make it easier to see the changes over time in the patched
> version and to keep this email shorter ;). I would love to get some
> feedback and discuss any necessary changes to get this patch merged.
>
> Here are the results:
>
> Mesos HEAD:
>
> Using 2000 agents and 200 frameworks
> round 0 allocate took 3.064665secs to make 199 offers
> round 1 allocate took 3.029418secs to make 198 offers
> round 2 allocate took 3.091427secs to make 197 offers
> round 3 allocate took 2.955457secs to make 196 offers
> round 4 allocate took 3.133789secs to make 195 offers
> [...]
> round 50 allocate took 3.109859secs to make 149 offers
> round 51 allocate took 3.062746secs to make 148 offers
> round 52 allocate took 3.146043secs to make 147 offers
> round 53 allocate took 3.042948secs to make 146 offers
> round 54 allocate took 3.097835secs to make 145 offers
> [...]
> round 100 allocate took 3.027475secs to make 99 offers
> round 101 allocate took 3.021641secs to make 98 offers
> round 102 allocate took 2.9853secs to make 97 offers
> round 103 allocate took 3.145925secs to make 96 offers
> round 104 allocate took 2.99094secs to make 95 offers
> [...]
> round 150 allocate took 3.080406secs to make 49 offers
> round 151 allocate took 3.109412secs to make 48 offers
> round 152 allocate took 2.992129secs to make 47 offers
> round 153 allocate took 3.405642secs to make 46 offers
> round 154 allocate took 4.153354secs to make 45 offers
> [...]
> round 195 allocate took 3.10015secs to make 4 offers
> round 196 allocate took 3.029347secs to make 3 offers
> round 197 allocate took 2.982825secs to make 2 offers
> round 198 allocate took 2.934595secs to make 1 offers
> round 199 allocate took 313212us to make 0 offers
>
> Mesos HEAD + allocator patch:
>
> Using 2000 agents and 200 frameworks
> round 0 allocate took 3.248205secs to make 199 offers
> round 1 allocate took 3.170852secs to make 198 offers
> round 2 allocate took 3.135146secs to make 197 offers
> round 3 allocate took 3.143857secs to make 196 offers
> round 4 allocate took 3.127641secs to make 195 offers
> [...]
> round 50 allocate took 2.492077secs to make 149 offers
> round 51 allocate took 2.435054secs to make 148 offers
> round 52 allocate took 2.472204secs to 

Allocator slack channel

2016-07-05 Thread Guangya Liu
Hi,

I created an #allocator slack channel in mesos.slack.com; please join
if you want to discuss allocator-related issues, such as
allocator performance, optimistic offers, revocable resources, etc.

@bmahler and @vinodkone,

I also posted an RR here https://reviews.apache.org/r/49660/ for adding this
to the "Resource Allocation" working group; can you please help review it?

Thanks,

Guangya


Re: Mesos CLI

2016-06-23 Thread Guangya Liu
Another advantage of using Python is that we can use stevedore
<http://docs.openstack.org/developer/stevedore/tutorial/loading.html> to
manage all of the CLI plugins for container, agent, cluster, etc.
stevedore is widely used in OpenStack.

Thanks,

Guangya

On Fri, Jun 24, 2016 at 9:56 AM, Jie Yu <yujie@gmail.com> wrote:

> I am actually fine with Python as long as we can figure out a way to
> install python executable without any dependency during make install (and
> subsequently bundle it into rpm/deb packages). According to Kevin, looks
> like pyinstall can achieve that.
>
> If we go for the Python route, I'd like to have a style guide for our
> python code. Looks like we can directly use the google python style guide
> <https://google.github.io/styleguide/pyguide.html>. Looks like pylint can
> also check the style automatically.
>
> - Jie
>
>
> On Thu, Jun 23, 2016 at 1:21 AM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> > +1 to using Python. With Python, we can debug the CLI without
> > recompiling; just update the CLI file and debug it with pdb. This is very
> > helpful for troubleshooting.
> >
> > On Thu, Jun 23, 2016 at 9:34 AM, Kevin Klues <klue...@gmail.com> wrote:
> >
> > > >
> > > > The best option may still be for it
> > > > to be in Python, this is why I'm asking if there are particular
> things
> > > that
> > > > our helper libraries don't provide which you are leveraging in
> python.
> > > >
> > >
> > > One thing we rely heavily on that is missing is `docopt`. We use docopt
> > for
> > > convenient / standardized command line parsing and help formatting.
> This
> > > makes it really easy to enforce a standard help format across plugins
> so
> > > the CLI has a consistent feel throughout all of its subcommands.
> > Supposedly
> > > there is a C++ implementation of this now, but it requires gcc 4.9+
> (for
> > > regex).
> > > https://github.com/docopt/docopt.cpp
> > >
> > > In addition to this, the plugin architecture we built was very easy to
> > > implement in python, and I'm worried it would be much more complicated
> > (and
> > > less readable) to get the same functionality out of C++. The existing
> CLI
> > > has some support for "plugins" (by looking for executables in the path
> > with
> > > a "mesos-" prefix and assuming they are an extension to the CLI that
> can
> > > exist as a subcommand). However, the implementation of this is pretty
> > > ad-hoc and error prone (though it could conceivably be redone to work
> > > better).
> > >
> > > To get the equivalent functionality out of C++ for the plugin
> > architecture
> > > we've built for python, each plugin would need to be implemented as a
> > > shared object that we dlopen() from the main program. Each module would
> > > define a set of global variables describing properties of the plugin
> > > (including help information) as well as create an instance of a class
> > that
> > > inherits from a `PluginBase` class to perform the actual functionality
> of
> > > the plugin. The main program would then load this module, integrate its
> > > help information and other meta data into its own metadata, and begin
> > > invoking functions on the plugin class.
> > >
> > > I'm not saying it's impossible to do in C++, just that python lends
> > itself
> > > better to doing this kind of stuff, and is much more readable when
> doing
> > > so.
> > >
> >
>
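The ad-hoc "mesos-" prefix discovery described above can be sketched roughly like this. This is simplified, hypothetical code (function and variable names are made up), not the actual Mesos CLI implementation:

```python
# Sketch of the ad-hoc "mesos-" prefix plugin discovery described
# above: any executable on PATH named "mesos-<sub>" is treated as a
# subcommand. Simplified and hypothetical; not the actual Mesos CLI.
import os
import stat
import tempfile

def discover_subcommands(path_dirs, prefix="mesos-"):
    subcommands = {}
    for d in path_dirs:
        if not os.path.isdir(d):
            continue
        for entry in os.listdir(d):
            full = os.path.join(d, entry)
            if (entry.startswith(prefix)
                    and not os.path.isdir(full)
                    and os.access(full, os.X_OK)):
                # First match on the path wins, like shell lookup.
                subcommands.setdefault(entry[len(prefix):], full)
    return subcommands

# Example with a temporary directory standing in for PATH:
tmp = tempfile.mkdtemp()
plugin = os.path.join(tmp, "mesos-ps")
with open(plugin, "w") as f:
    f.write("#!/bin/sh\necho tasks\n")
os.chmod(plugin, os.stat(plugin).st_mode | stat.S_IXUSR)
print(sorted(discover_subcommands([tmp])))   # ['ps']
```

The fragility Kevin mentions is visible here: any stray executable matching the prefix silently becomes a subcommand, with no declared help text or metadata, which is what a registered plugin architecture (e.g. entry points managed by stevedore) avoids.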


Re: Mesos CLI

2016-06-23 Thread Guangya Liu
+1 to using Python. With Python, we can debug the CLI without recompiling;
just update the CLI file and debug it with pdb. This is very helpful for
troubleshooting.

On Thu, Jun 23, 2016 at 9:34 AM, Kevin Klues  wrote:

> >
> > The best option may still be for it
> > to be in Python, this is why I'm asking if there are particular things
> that
> > our helper libraries don't provide which you are leveraging in python.
> >
>
> One thing we rely heavily on that is missing is `docopt`. We use docopt for
> convenient / standardized command line parsing and help formatting. This
> makes it really easy to enforce a standard help format across plugins so
> the CLI has a consistent feel throughout all of its subcommands. Supposedly
> there is a C++ implementation of this now, but it requires gcc 4.9+ (for
> regex).
> https://github.com/docopt/docopt.cpp
>
> In addition to this, the plugin architecture we built was very easy to
> implement in python, and I'm worried it would be much more complicated (and
> less readable) to get the same functionality out of C++. The existing CLI
> has some support for "plugins" (by looking for executables in the path with
> a "mesos-" prefix and assuming they are an extension to the CLI that can
> exist as a subcommand). However, the implementation of this is pretty
> ad-hoc and error prone (though it could conceivably be redone to work
> better).
>
> To get the equivalent functionality out of C++ for the plugin architecture
> we've built for python, each plugin would need to be implemented as a
> shared object that we dlopen() from the main program. Each module would
> define a set of global variables describing properties of the plugin
> (including help information) as well as create an instance of a class that
> inherits from a `PluginBase` class to perform the actual functionality of
> the plugin. The main program would then load this module, integrate its
> help information and other meta data into its own metadata, and begin
> invoking functions on the plugin class.
>
> I'm not saying it's impossible to do in C++, just that python lends itself
> better to doing this kind of stuff, and is much more readable when doing
> so.
>


Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-22 Thread Guangya Liu
Hi Elizabeth,

Just FYI, there is a JIRA tracking resource revocation here
https://issues.apache.org/jira/browse/MESOS-4967

And I'm also working on the short-term solution of excluding scarce
resources from the allocator (https://reviews.apache.org/r/48906/); with this
feature and Kevin's GPU_RESOURCES capability, Mesos can handle scarce
resources well.
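Conceptually, the GPU_RESOURCES capability guard works like the sketch below: agents that have GPUs are only offered to frameworks advertising the capability, so non-GPU workloads cannot consume all the other resources on GPU machines. This is hypothetical simplified code, not the actual patch:

```python
# Sketch of the GPU_RESOURCES capability guard: a GPU-capable agent is
# only offered to frameworks that advertise the capability, so non-GPU
# workloads cannot starve out GPU workloads. Hypothetical structures,
# not the actual Mesos allocator code.

GPU_RESOURCES = "GPU_RESOURCES"

def can_offer(agent_resources, framework_capabilities):
    if agent_resources.get("gpus", 0) > 0:
        return GPU_RESOURCES in framework_capabilities
    return True

gpu_agent = {"cpus": 8, "gpus": 2}
plain_agent = {"cpus": 8}

print(can_offer(gpu_agent, set()))             # False
print(can_offer(gpu_agent, {GPU_RESOURCES}))   # True
print(can_offer(plain_agent, set()))           # True
```

The downside noted below follows from the guard: if no GPU-aware framework is running, the CPU and memory on GPU agents sit idle.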

Thanks,

Guangya

On Wed, Jun 22, 2016 at 4:45 AM, Kevin Klues <klue...@gmail.com> wrote:

> As an FYI, preliminary support to work around this issue for GPUs will
> appear in the 1.0 release
> https://reviews.apache.org/r/48914/
>
> This doesn't solve the problem of scarce resources in general, but it
> will at least keep non-GPU workloads from starving out GPU-based
> workloads on GPU capable machines. The downside of this approach is
> that only GPU aware frameworks will be able to launch stuff on GPU
> capable machines (meaning some of their resources could go unused
> unnecessarily).  We decided this tradeoff is acceptable for now.
>
> Kevin
>
> On Tue, Jun 21, 2016 at 1:40 PM, Elizabeth Lingg
> <elizabeth_li...@apple.com> wrote:
> > Thanks, looking forward to discussion and review on your document. The
> main use case I see here is that some of our frameworks will want to
> request the GPU resources, and we want to make sure that those frameworks
> are able to successfully launch tasks on agents with those resources. We
> want to be certain that other frameworks that do not require GPU’s will not
> request all other resources on those agents (i.e. cpu, disk, memory) which
> would mean the GPU resources are not allocated and the frameworks that
> require them will not receive them. As Ben Mahler mentioned, "(2) Because
> we do not have revocation yet, if a framework decides to consume the
> non-GPU resources on a GPU machine, it will prevent the GPU workloads from
> running!” This will occur for us in clusters where we have higher
> utilization as well as different types of workloads running. Smart task
> placement then becomes more relevant (i.e. we want to be able to schedule
> with scarce resources successfully and we may have considerations like not
> scheduling too many I/O bound workloads on a single host or more stringent
> requirements for scheduling persistent tasks).
> >
> >  Elizabeth Lingg
> >
> >
> >
> >> On Jun 20, 2016, at 7:24 PM, Guangya Liu <gyliu...@gmail.com> wrote:
> >>
> >> Had some discussion with Ben M, for the following two solutions:
> >>
> >> 1) Ben M: Create sub-pools of resources based on machine profile and
> >> perform fair sharing / quota within each pool plus a framework
> >> capability GPU_AWARE
> >> to enable allocator filter out scarce resources for some frameworks.
> >> 2) Guangya: Adding new sorters for non scarce resources plus a framework
> >> capability GPU_AWARE to enable allocator filter out scarce resources for
> >> some frameworks.
> >>
> >> Both of the above solutions mean the same thing and there is no
> >> difference between them: creating sub-pools of resources will
> >> require different sorters for each sub-pool, so I will merge
> >> those two solutions into one.
> >>
> >> Also had some discussion with Ben about AlexR's solution of implementing
> >> "requestResource", this API should be treated as an improvement to the
> >> issues of doing resource allocation pessimistically. (e.g. we
> offer/decline
> >> the GPUs to 1000 frameworks before offering it to the GPU framework that
> >> wants it). And the "requestResource" is providing *more information* to
> >> mesos. Namely, it gives us awareness of demand.
> >>
> >> Even though for some cases, we can use the "requestResource" to get all
> of
> >> the scarce resources, and then once those scarce resources are in use,
> then
> >> the WDRF sorter will sort non scarce resources as normal, but the
> problem
> >> is that we cannot guarantee that the framework which have
> "requestResource"
> >> can always consume all of the scarce resources before those scarce
> resource
> >> allocated to other frameworks.
> >>
> >> I'm planning to draft a document based on solution 1) "Create sub-pools"
> >> for the long term solution, any comments are welcome!
> >>
> >> Thanks,
> >>
> >> Guangya
> >>
> >> On Sat, Jun 18, 2016 at 11:58 AM, Guangya Liu <gyliu...@gmail.com>
> wrote:
> >>
> >>> Thanks Du Fan. So you mean that we should have som

Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-20 Thread Guangya Liu
Had some discussion with Ben M about the following two solutions:

1) Ben M: Create sub-pools of resources based on machine profile and
perform fair sharing / quota within each pool, plus a framework capability
GPU_AWARE to enable the allocator to filter out scarce resources for some
frameworks.
2) Guangya: Add new sorters for non scarce resources, plus a framework
capability GPU_AWARE to enable the allocator to filter out scarce resources
for some frameworks.

Both of the above solutions mean the same thing and there is no
difference between them: creating sub-pools of resources will
require different sorters for each sub-pool, so I will merge
those two solutions into one.

Also had some discussion with Ben about AlexR's solution of implementing
"requestResource", this API should be treated as an improvement to the
issues of doing resource allocation pessimistically. (e.g. we offer/decline
the GPUs to 1000 frameworks before offering it to the GPU framework that
wants it). And the "requestResource" is providing *more information* to
mesos. Namely, it gives us awareness of demand.

Even though for some cases, we can use the "requestResource" to get all of
the scarce resources, and then once those scarce resources are in use, then
the WDRF sorter will sort non scarce resources as normal, but the problem
is that we cannot guarantee that a framework which uses "requestResource"
can always consume all of the scarce resources before those scarce resources
are allocated to other frameworks.

I'm planning to draft a document based on solution 1) "Create sub-pools"
for the long term solution, any comments are welcome!

Thanks,

Guangya

On Sat, Jun 18, 2016 at 11:58 AM, Guangya Liu <gyliu...@gmail.com> wrote:

> Thanks Du Fan. So you mean that we should have some clear rules in
> document or somewhere else to tell or guide cluster admin which resources
> should be classified as scarce resources, right?
>
> On Sat, Jun 18, 2016 at 2:38 AM, Du, Fan <fan...@intel.com> wrote:
>
>>
>>
>> On 2016/6/17 7:57, Guangya Liu wrote:
>>
>>> @Fan Du,
>>>
>>> Currently, I think that the scarce resources should be defined by cluster
>>> admin, s/he can specify those scarce resources via a flag when master
>>> start
>>> up.
>>>
>>
>> This is not what I mean.
>> IMO, it's not the cluster admin's call to decide which resources should be
>> marked as scarce; they can carry out the operation, but should be advised
>> based on a clear rule: to what extent the resource is scarce compared
>> with other resources, and how it will affect wDRF by causing starvation for
>> frameworks which hold scarce resources; that's my point.
>>
>> To my best knowledge here, a quantitative study of how wDRF behaves in
>> scenario of one/multiple scarce resources first will help to verify the
>> proposed approach, and guide the user of this functionality.
>>
>>
>>
>> Regarding to the proposal of generic scarce resources, do you have any
>>> thoughts on this? I can see that giving framework developers the options
>>> of
>>> define scarce resources may bring trouble to mesos, it is better to let
>>> mesos define those scarce but not framework developer.
>>>
>>
>


Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-17 Thread Guangya Liu
Thanks Du Fan. So you mean that we should have some clear rules in the
documentation or somewhere else to guide cluster admins on which resources
should be classified as scarce, right?

On Sat, Jun 18, 2016 at 2:38 AM, Du, Fan <fan...@intel.com> wrote:

>
>
> On 2016/6/17 7:57, Guangya Liu wrote:
>
>> @Fan Du,
>>
>> Currently, I think that the scarce resources should be defined by cluster
>> admin, s/he can specify those scarce resources via a flag when master
>> start
>> up.
>>
>
> This is not what I mean.
> IMO, it's not the cluster admin's call to decide which resources should be
> marked as scarce; they can carry out the operation, but should be advised
> based on a clear rule: to what extent the resource is scarce compared
> with other resources, and how it will affect wDRF by causing starvation for
> frameworks which hold scarce resources; that's my point.
>
> To the best of my knowledge, a quantitative study of how wDRF behaves in
> scenarios with one or multiple scarce resources will first help to verify
> the proposed approach, and guide the user of this functionality.
>
>
>
> Regarding to the proposal of generic scarce resources, do you have any
>> thoughts on this? I can see that giving framework developers the options
>> of
>> define scarce resources may bring trouble to mesos, it is better to let
>> mesos define those scarce but not framework developer.
>>
>


Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-16 Thread Guangya Liu
Thanks all for the input here!

@Hans van den Bogert,

Yes, I agree with Alex R: Mesos now uses coarse-grained mode to allocate
resources and the minimum unit is a single host, so you will always get CPU
and memory.

@Alex,

Yes, I was only listing sorters here. Ideally, I think the allocation
sequence should be:

1) Allocate quota non scarce resources
2) Allocate quota scarce resources
3) Allocate reserved non scarce resources
4) Allocate reserved scarce resources
5) Allocate revocable non scarce resources
6) Allocate revocable scarce resources

Regarding "requestResources", I think that even if we implement it,
scarce resources will still impact the WDRF sorter, as Ben M pointed out in
his use cases.

An ideal solution would be "exclude scarce resources from sorter" plus
"requestResources" for scarce resources. The "exclude scarce resources from
sorter" will focus on non scarce resources while "requestResources" focuses
on scarce resources.
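For intuition on the "exclude scarce resources from sorter" idea, here is a minimal, hypothetical sketch using plain Python dicts (not the actual Mesos sorter): excluding a scarce resource like GPUs from the dominant-share calculation avoids starving the framework that holds them.

```python
# Sketch of "exclude scarce resources from the sorter": DRF dominant
# shares are computed only over non-scarce resources, so holding a GPU
# does not distort a framework's share. Hypothetical code, not Mesos.

SCARCE = {"gpus"}

def dominant_share(allocation, total, exclude_scarce=True):
    shares = []
    for name, amount in allocation.items():
        if exclude_scarce and name in SCARCE:
            continue
        if total.get(name, 0) > 0:
            shares.append(amount / total[name])
    return max(shares, default=0.0)

total = {"cpus": 100, "mem": 1000, "gpus": 2}
gpu_fw = {"cpus": 1, "mem": 10, "gpus": 2}

# With gpus counted, the GPU framework's dominant share is 1.0 and it
# would sort last for further offers; excluding gpus it is only 0.01.
print(dominant_share(gpu_fw, total, exclude_scarce=False))  # 1.0
print(dominant_share(gpu_fw, total, exclude_scarce=True))   # 0.01
```

This is the starvation effect from Ben M's use cases in miniature: holding both of the cluster's two GPUs dominates the share calculation unless GPUs are kept out of the sorter.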

I can see that, until now, we have three solutions to handle scarce resources:
1) Ben M: Create sub-pools of resources based on machine profile and
perform fair sharing / quota within each pool, plus a framework capability
GPU_AWARE to enable the allocator to filter out scarce resources for some
frameworks.
2) Guangya: Add new sorters for non scarce resources, plus a framework
capability GPU_AWARE to enable the allocator to filter out scarce resources
for some frameworks.
3) Alex R: "requestResources" for scarce resources plus "exclude scarce
resources from the sorter" for non scarce resources (@Alex R, I added
"exclude scarce resources from the sorter" to your proposal; hope that is OK?)

Solution 1) may cause low resource utilization, as Ben M pointed out.
Both 2) and 3) keep all resources in a single pool, so resource
utilization is not affected.

Between 2) and 3), I have no strong preference. My only concern with 2)
is whether having many sorters could cause a performance issue, but since
we can assume there are not many scarce resources in a cluster, adding
another three sorters for scarce resources should not hurt performance
much.

For solution 3), the main problem with "requestResources" is that a
"greedy framework" could use it to consume all resources; we may want to
allow "requestResources" only for scarce resources at first, to limit
the impact of greedy frameworks.

Another problem with solutions 1) and 2) is that we would need to
introduce a framework capability for each scarce resource type so the
allocator can filter it out whenever a new such resource appears. I do
not think this matters much, though, since by definition there should
not be many scarce resource types in the future.

@Fan Du,

Currently, I think scarce resources should be defined by the cluster
admin, who could specify them via a flag at master startup.
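A sketch of that master-side split (the scarce-resource set, the flag
name in the comment, and the helper are all hypothetical; no such master
flag exists today):

```python
# Hypothetical: the operator lists scarce resource names, e.g. via a
# master flag such as --scarce_resources=gpus (not a real flag today).
SCARCE = {"gpus"}

def partition(resources):
    """Split an agent's resources into (non_scarce, scarce) dicts,
    so each part can be fed to its own set of sorters."""
    non_scarce = {k: v for k, v in resources.items() if k not in SCARCE}
    scarce = {k: v for k, v in resources.items() if k in SCARCE}
    return non_scarce, scarce
```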

Regarding to the proposal of generic scarce resources, do you have any
thoughts on this? I can see that giving framework developers the options of
define scarce resources may bring trouble to mesos, it is better to let
mesos define those scarce but not framework developer.

Thanks,

Guangya


On Fri, Jun 17, 2016 at 6:53 AM, Joris Van Remoortere 
wrote:

> @Fan,
>
> In the community meeting a question was raised around which frameworks
> might be ready to use this.
> Can you provide some more context for immediate use cases on the framework
> side?
>
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Jun 17, 2016 at 12:51 AM, Du, Fan  wrote:
>
> > A couple of rough thoughts in the early morning:
> >
> > a. Is there any quantitative way to decide a resource is kind of scare? I
> > mean how to aid operator to make this decision to use/not use this
> > functionality when deploying mesos.
> >
> > b. Scare resource extend from GPU to, name a few, Xeon Phi, FPGA, what
> > about make the proposal more generic and future proof?
> >
> >
> >
> > On 2016/6/11 10:50, Benjamin Mahler wrote:
> >
> >> I wanted to start a discussion about the allocation of "scarce"
> resources.
> >> "Scarce" in this context means resources that are not present on every
> >> machine. GPUs are the first example of a scarce resource that we support
> >> as
> >> a known resource type.
> >>
> >> Consider the behavior when there are the following agents in a cluster:
> >>
> >> 999 agents with (cpus:4,mem:1024,disk:1024)
> >> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> >>
> >> Here there are 1000 machines but only 1 has GPUs. We call GPUs a
> "scarce"
> >> resource here because they are only present on a small percentage of the
> >> machines.
> >>
> >> We end up with some problematic behavior here with our current
> allocation
> >> model:
> >>
> >>  (1) If a role wishes to use both GPU and non-GPU resources for
> tasks,
> >> consuming 1 GPU will lead DRF to consider the role to have a 100% share
> of
> >> 

Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-16 Thread Guangya Liu
Thanks Joris. Sorry, I forgot the case where scarce resources are also
requested via quota.

On second thought, not only quota'd resources but also reserved and
revocable resources can be scarce, so we may need to handle all of those
cases.

I think that in the future the allocator should allocate resources like
this:
1) Allocate resources for quota.
2) Allocate reserved resources.
3) Allocate revocable resources. (After the "revocable by default"
project, I think we will have only reserved and revocable resources.)

So based on the above analysis we need three steps to allocate all
resources, but once scarce resources are introduced, each of the three
kinds must be split in two: scarce and non-scarce.

Then there would be six sorters:
1) non-scarce quota sorter
2) non-scarce reserved sorter
3) non-scarce revocable sorter
4) scarce quota sorter
5) scarce reserved sorter
6) scarce revocable sorter

Since not many hosts have scarce resources, the last three sorters (for
scarce resources) should not impact performance much. Comments?

Thanks,

Guangya

On Thu, Jun 16, 2016 at 7:30 PM, Joris Van Remoortere <jo...@mesosphere.io>
wrote:

> With this 4th sorter approach, how does quota work for scarce resources?
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Thu, Jun 16, 2016 at 11:26 AM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> > Hi Ben,
> >
> > The pre-condition for four stage allocation is that we need to put
> > different resources to different sorters:
> >
> > 1) roleSorter only include non scarce resources.
> > 2) quotaRoleSorter only include non revocable & non scarce resources.
> > 3) revocableSorter only include revocable & non scarce resources. This
> will
> > be handled in MESOS-4923 <
> https://issues.apache.org/jira/browse/MESOS-4923
> > >
> > 4) scarceSorter only include scarce resources.
> >
> > Take your case above:
> > 999 agents with (cpus:4,mem:1024,disk:1024)
> > 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> >
> > The four sorters would be:
> > 1) roleSorter include 1000 agents with (cpus:4,mem:1024,disk:1024)
> > 2) quotaRoleSorter include 1000 agents with (cpus:4,mem:1024,disk:1024)
> > 3) revocableSorter include nothing as I have no revocable resources here.
> > 4) scarceSorter include 1 agent with (gpus:1)
> >
> > When allocate resources, even if a role got the agent with gpu resources,
> > its share will only be counter by scarceSorter but not other sorters, and
> > will not impact other sorters.
> >
> > The above solution is actually kind of enhancement to "exclude scarce
> > resources" as the scarce resources also obey the DRF algorithm with this.
> >
> > This solution can be also treated as diving the whole resources pool
> > logically to scarce and non scarce resource pool. 1), 2) and 3) will
> handle
> > non scarce resources while 4) focus on scarce resources.
> >
> > Thanks,
> >
> > Guangya
> >
> > On Thu, Jun 16, 2016 at 2:10 AM, Benjamin Mahler <bmah...@apache.org>
> > wrote:
> >
> > > Hm.. can you expand on how adding another allocation stage for only
> > scarce
> > > resources would behave well? It seems to have a number of problems
> when I
> > > think through it.
> > >
> > > On Sat, Jun 11, 2016 at 7:59 AM, Guangya Liu <gyliu...@gmail.com>
> wrote:
> > >
> > >> Hi Ben,
> > >>
> > >> For long term goal, instead of creating sub-pool, what about adding a
> > new
> > >> sorter to handle **scare** resources? The current logic in allocator
> was
> > >> divided to two stages: allocation for quota, allocation for non quota
> > >> resources.
> > >>
> > >> I think that the future logic in allocator would be divided to four
> > >> stages:
> > >> 1) allocation for quota
> > >> 2) allocation for reserved resources
> > >> 3) allocation for revocable resources
> > >> 4) allocation for scare resources
> > >>
> > >> Thanks,
> > >>
> > >> Guangy
> > >>
> > >> On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler <bmah...@apache.org
> >
> > >> wrote:
> > >>
> > >>> I wanted to start a discussion about the allocation of "scarce"
> > >>> resources. "Scarce" in this context means resources that are not
> > present on
> > >>> every machine. GPUs are the first example of a scarce resource

Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-16 Thread Guangya Liu
Hi Ben,

The precondition for the four-stage allocation is that we put different
resources into different sorters:

1) roleSorter includes only non-scarce resources.
2) quotaRoleSorter includes only non-revocable, non-scarce resources.
3) revocableSorter includes only revocable, non-scarce resources. This
will be handled in MESOS-4923
<https://issues.apache.org/jira/browse/MESOS-4923>.
4) scarceSorter includes only scarce resources.

Take your case above:
999 agents with (cpus:4,mem:1024,disk:1024)
1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)

The four sorters would be:
1) roleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
2) quotaRoleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
3) revocableSorter includes nothing, as there are no revocable resources
here.
4) scarceSorter includes 1 agent with (gpus:1)

When allocating resources, even if a role gets the agent with GPU
resources, its share is counted only by the scarceSorter, not by the
other sorters, so it does not affect them.

The above solution is essentially an enhancement of "exclude scarce
resources": with it, scarce resources also obey the DRF algorithm.

This solution can also be viewed as logically dividing the whole
resource pool into a scarce pool and a non-scarce pool: 1), 2), and 3)
handle non-scarce resources, while 4) focuses on scarce resources.
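The effect of the pool split can be illustrated with the numbers above
(an illustrative sketch of DRF share accounting, not Mesos code):

```python
def share(allocated, total):
    """DRF dominant share: the maximum, over resource kinds, of the
    fraction of that kind's pool the role holds."""
    return max((allocated.get(r, 0) / total[r] for r in total), default=0.0)

# Pool totals as described above: 1000 agents contribute cpu/mem/disk
# to the non-scarce pool; only 1 agent contributes gpus to the scarce
# pool.
non_scarce_total = {"cpus": 4000, "mem": 1024000, "disk": 1024000}
scarce_total = {"gpus": 1}

# A role that holds the lone GPU plus one agent's cpu/mem/disk:
role_non_scarce = {"cpus": 4, "mem": 1024, "disk": 1024}
role_scarce = {"gpus": 1}

# share(role_non_scarce, non_scarce_total) is ~0.001, while
# share(role_scarce, scarce_total) is 1.0.
```

With a single combined pool this role's dominant share would be 1.0 (it
holds 100% of the gpus); with the split, it stays near the bottom of the
non-scarce sort and keeps receiving non-scarce offers.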

Thanks,

Guangya

On Thu, Jun 16, 2016 at 2:10 AM, Benjamin Mahler <bmah...@apache.org> wrote:

> Hm.. can you expand on how adding another allocation stage for only scarce
> resources would behave well? It seems to have a number of problems when I
> think through it.
>
> On Sat, Jun 11, 2016 at 7:59 AM, Guangya Liu <gyliu...@gmail.com> wrote:
>
>> Hi Ben,
>>
>> For long term goal, instead of creating sub-pool, what about adding a new
>> sorter to handle **scare** resources? The current logic in allocator was
>> divided to two stages: allocation for quota, allocation for non quota
>> resources.
>>
>> I think that the future logic in allocator would be divided to four
>> stages:
>> 1) allocation for quota
>> 2) allocation for reserved resources
>> 3) allocation for revocable resources
>> 4) allocation for scare resources
>>
>> Thanks,
>>
>> Guangy
>>
>> On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler <bmah...@apache.org>
>> wrote:
>>
>>> I wanted to start a discussion about the allocation of "scarce"
>>> resources. "Scarce" in this context means resources that are not present on
>>> every machine. GPUs are the first example of a scarce resource that we
>>> support as a known resource type.
>>>
>>> Consider the behavior when there are the following agents in a cluster:
>>>
>>> 999 agents with (cpus:4,mem:1024,disk:1024)
>>> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
>>>
>>> Here there are 1000 machines but only 1 has GPUs. We call GPUs a
>>> "scarce" resource here because they are only present on a small percentage
>>> of the machines.
>>>
>>> We end up with some problematic behavior here with our current
>>> allocation model:
>>>
>>> (1) If a role wishes to use both GPU and non-GPU resources for
>>> tasks, consuming 1 GPU will lead DRF to consider the role to have a 100%
>>> share of the cluster, since it consumes 100% of the GPUs in the cluster.
>>> This framework will then not receive any other offers.
>>>
>>> (2) Because we do not have revocation yet, if a framework decides to
>>> consume the non-GPU resources on a GPU machine, it will prevent the GPU
>>> workloads from running!
>>>
>>> 
>>>
>>> I filed an epic [1] to track this. The plan for the short-term is to
>>> introduce two mechanisms to mitigate these issues:
>>>
>>> -Introduce a resource fairness exclusion list. This allows the
>>> shares of resources like "gpus" to be excluded from the dominant share.
>>>
>>> -Introduce a GPU_AWARE framework capability. This indicates that the
>>> scheduler is aware of GPUs and will schedule tasks accordingly. Old
>>> schedulers will not have the capability and will not receive any offers for
>>> GPU machines. If a scheduler has the capability, we'll advise that they
>>> avoid placing their additional non-GPU workloads on the GPU machines.
>>>
>>> 
>>>
>>> Longer term, we'll want a more robust way to manage scarce resources.
>>> The first thought we had was to have sub-pools of resources based on
>>> machine 

Re: Apply for assignment of jira MESOS-5425

2016-06-11 Thread Guangya Liu
You may want to post an email to dev@mesos.apache.org first to request
being added as a contributor.

Please take a look at
https://github.com/apache/mesos/blob/master/docs/newbie-guide.md#getting-started-guidance

Thanks,

Guangya

On Sun, Jun 12, 2016 at 10:44 AM, Yan Yan YY Hu  wrote:

>
> Dear Mesos team,
>
> We are now trying to use Mesos to manage container cluster in large scale.
> During our test, we found some performance issue about Ranges value
> subtraction. The current implementation is low-efficient and causes serious
> overhead to Mesos allocator. Joseph Wu has helped to file a jira to track
> this issue:https://issues.apache.org/jira/browse/MESOS-5425. We have
> prepared a fix by re-implementing the Ranges subtraction using IntervalSet
> data type and we hope to get assignment of this jira to contribute our fix
> back to Mesos. Thank you so much!
>
>
> Best regards!
> **
> Yanyan Hu(胡彦彦) Ph.D.
> Cloud Infrastructure & Technology Team
> Building 19 Zhongguancun Software Park, 8 Dongbeiwang WestRoad, Haidian
> District, Beijing,P.R.C.100094
> E-mail: yanya...@cn.ibm.com
> Tel: 8610-58748025
> ***
>
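For context on the MESOS-5425 fix mentioned above: representing ranges
as interval sets lets subtraction operate on endpoints instead of
enumerating every value in a range. A minimal, self-contained sketch of
the idea (not the actual Stout IntervalSet implementation):

```python
def subtract(ranges, remove):
    """Subtract closed integer ranges `remove` from `ranges`.

    Both arguments are lists of (begin, end) pairs; returns the
    remaining ranges, in order. Work is proportional to the number of
    intervals, not to the number of values they contain.
    """
    result = []
    for begin, end in ranges:
        pieces = [(begin, end)]
        for rb, re_ in remove:
            next_pieces = []
            for b, e in pieces:
                if re_ < b or rb > e:      # no overlap: keep as-is
                    next_pieces.append((b, e))
                    continue
                if b < rb:                 # keep the left remainder
                    next_pieces.append((b, rb - 1))
                if e > re_:                # keep the right remainder
                    next_pieces.append((re_ + 1, e))
            pieces = next_pieces
        result.extend(pieces)
    return result
```

For example, removing one port from an offered port range splits it into
two sub-ranges rather than touching a thousand individual values.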


Re: [GPU] [Allocation] "Scarce" Resource Allocation

2016-06-11 Thread Guangya Liu
Hi Ben,

As a long-term goal, instead of creating sub-pools, what about adding a
new sorter to handle **scarce** resources? The current allocator logic
is divided into two stages: allocation for quota, then allocation for
non-quota resources.

I think the future allocator logic would be divided into four stages:
1) allocation for quota
2) allocation for reserved resources
3) allocation for revocable resources
4) allocation for scarce resources

Thanks,

Guangya

On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler 
wrote:

> I wanted to start a discussion about the allocation of "scarce" resources.
> "Scarce" in this context means resources that are not present on every
> machine. GPUs are the first example of a scarce resource that we support as
> a known resource type.
>
> Consider the behavior when there are the following agents in a cluster:
>
> 999 agents with (cpus:4,mem:1024,disk:1024)
> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
>
> Here there are 1000 machines but only 1 has GPUs. We call GPUs a "scarce"
> resource here because they are only present on a small percentage of the
> machines.
>
> We end up with some problematic behavior here with our current allocation
> model:
>
> (1) If a role wishes to use both GPU and non-GPU resources for tasks,
> consuming 1 GPU will lead DRF to consider the role to have a 100% share of
> the cluster, since it consumes 100% of the GPUs in the cluster. This
> framework will then not receive any other offers.
>
> (2) Because we do not have revocation yet, if a framework decides to
> consume the non-GPU resources on a GPU machine, it will prevent the GPU
> workloads from running!
>
> 
>
> I filed an epic [1] to track this. The plan for the short-term is to
> introduce two mechanisms to mitigate these issues:
>
> -Introduce a resource fairness exclusion list. This allows the shares
> of resources like "gpus" to be excluded from the dominant share.
>
> -Introduce a GPU_AWARE framework capability. This indicates that the
> scheduler is aware of GPUs and will schedule tasks accordingly. Old
> schedulers will not have the capability and will not receive any offers for
> GPU machines. If a scheduler has the capability, we'll advise that they
> avoid placing their additional non-GPU workloads on the GPU machines.
>
> 
>
> Longer term, we'll want a more robust way to manage scarce resources. The
> first thought we had was to have sub-pools of resources based on machine
> profile and perform fair sharing / quota within each pool. This addresses
> (1) cleanly, and for (2) the operator needs to explicitly disallow non-GPU
> frameworks from participating in the GPU pool.
>
> Unfortunately, by excluding non-GPU frameworks from the GPU pool we may
> have a lower level of utilization. In the even longer term, as we add
> revocation it will be possible to allow a scheduler desiring GPUs to revoke
> the resources allocated to the non-GPU workloads running on the GPU
> machines. There are a number of things we need to put in place to support
> revocation ([2], [3], [4], etc), so I'm glossing over the details here.
>
> If anyone has any thoughts or insight in this area, please share!
>
> Ben
>
> [1] https://issues.apache.org/jira/browse/MESOS-5377
> [2] https://issues.apache.org/jira/browse/MESOS-5524
> [3] https://issues.apache.org/jira/browse/MESOS-5527
> [4] https://issues.apache.org/jira/browse/MESOS-4392
>
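Problem (1) in Ben's mail can be reproduced numerically. A sketch of the
dominant-share arithmetic (illustrative only, not allocator code):

```python
def dominant_share(allocated, total):
    # DRF: a role's share is its maximum share across resource kinds.
    return max(allocated.get(r, 0) / total[r] for r in total)

# Cluster totals for the 999 + 1 agents in the example above.
total = {"cpus": 4000, "gpus": 1, "mem": 1024000, "disk": 1024000}

# A role that consumes just the single GPU machine's resources:
role = {"cpus": 4, "gpus": 1, "mem": 1024, "disk": 1024}

# dominant_share(role, total) is 1.0: the role holds 100% of the gpus,
# so it sorts last and stops receiving offers, even though it uses only
# ~0.1% of the cluster's cpu, memory, and disk.
```

This is exactly what the proposed fairness exclusion list addresses: with
"gpus" excluded from the dominant-share computation, the same role's
share drops to roughly 0.001.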


Re: Welcome Anand and Joseph as new committers!

2016-06-09 Thread Guangya Liu
Congrats to both, well deserved!!

On Fri, Jun 10, 2016 at 5:00 AM, Vinod Kone  wrote:

> Hi folks,
>
> I'm happy to announce that the PMC has voted in *Anand Mazumdar *and
> *Joseph
> Wu* as committers and members of PMC for the Apache Mesos project.
>
> A little about the new committers.
>
> Anand has been working on the Apache Mesos project for about an year now
> and has shown incredible commitment to the project and the community. His
> significant contributions include implementing scheduler HTTP API,
> designing and implementing executor HTTP API and helping out with the
> operator HTTP API. His formal committer checklist is here
> <
> https://docs.google.com/document/d/1DGRO-z-0JKS1dIxcrzXD8QznEv-auSqmIP3-aiwhtFI/edit?usp=sharing
> >
> .
>
> Joseph's passion and dedication to the community is phenomenal. His
> significant contributions include Maintenance Primitives and Container
> Logger Modules. He has also been a valuable contributor and reviewer to our
> testing infrastructure and the Windows work. His formal committer checklist
> is here
> <
> https://docs.google.com/document/d/1o7qLQJQ7TZCaf49gSNc6SSl29qAFagYH2STDfhHDDPw/edit?usp=sharing
> >
> .
>
> Please join me in congratulating them on their new roles and especially
> responsibilities :)
>
> On behalf of the PMC,
> Vinod
>


Re: [1/2] mesos git commit: Added aufs provisioning backend.

2016-06-08 Thread Guangya Liu
The document is being tracked here
https://issues.apache.org/jira/browse/MESOS-5549

On Wed, Jun 8, 2016 at 7:29 PM, Neil Conway  wrote:

> Can you update the documentation for this change, please?
>
> Thanks,
> Neil
>
> On Tue, Jun 7, 2016 at 6:14 PM,   wrote:
> > Repository: mesos
> > Updated Branches:
> >   refs/heads/master 90871a48f -> e5358ed1c
> >
> >
> > Added aufs provisioning backend.
> >
> > Review: https://reviews.apache.org/r/47396/
> >
> >
> > Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
> > Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/e5358ed1
> > Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/e5358ed1
> > Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/e5358ed1
> >
> > Branch: refs/heads/master
> > Commit: e5358ed1c132923d5fa357d1e337e037d1f29c8a
> > Parents: ca09304
> > Author: Shuai Lin 
> > Authored: Mon Jun 6 18:05:15 2016 -0700
> > Committer: Jie Yu 
> > Committed: Tue Jun 7 09:14:22 2016 -0700
> >
> > --
> >  src/Makefile.am |   2 +
> >  .../containerizer/mesos/provisioner/backend.cpp |   9 +
> >  .../mesos/provisioner/backends/aufs.cpp | 227
> +++
> >  .../mesos/provisioner/backends/aufs.hpp |  70 ++
> >  .../containerizer/provisioner_backend_tests.cpp |  51 +
> >  src/tests/environment.cpp   |  13 ++
> >  6 files changed, 372 insertions(+)
> > --
> >
> >
> >
> http://git-wip-us.apache.org/repos/asf/mesos/blob/e5358ed1/src/Makefile.am
> > --
> > diff --git a/src/Makefile.am b/src/Makefile.am
> > index a08ea40..b02b901 100644
> > --- a/src/Makefile.am
> > +++ b/src/Makefile.am
> > @@ -1001,6 +1001,7 @@ MESOS_LINUX_FILES =
>\
> >slave/containerizer/mesos/isolators/filesystem/shared.cpp\
> >slave/containerizer/mesos/isolators/namespaces/pid.cpp   \
> >slave/containerizer/mesos/isolators/network/cni/cni.cpp  \
> > +  slave/containerizer/mesos/provisioner/backends/aufs.cpp  \
> >slave/containerizer/mesos/provisioner/backends/bind.cpp  \
> >slave/containerizer/mesos/provisioner/backends/overlay.cpp
> >
> > @@ -1024,6 +1025,7 @@ MESOS_LINUX_FILES +=
> \
> >slave/containerizer/mesos/isolators/filesystem/shared.hpp\
> >slave/containerizer/mesos/isolators/namespaces/pid.hpp   \
> >slave/containerizer/mesos/isolators/network/cni/cni.hpp  \
> > +  slave/containerizer/mesos/provisioner/backends/aufs.hpp  \
> >slave/containerizer/mesos/provisioner/backends/bind.hpp  \
> >slave/containerizer/mesos/provisioner/backends/overlay.hpp
> >
> >
> >
> http://git-wip-us.apache.org/repos/asf/mesos/blob/e5358ed1/src/slave/containerizer/mesos/provisioner/backend.cpp
> > --
> > diff --git a/src/slave/containerizer/mesos/provisioner/backend.cpp
> b/src/slave/containerizer/mesos/provisioner/backend.cpp
> > index b2a20b7..93a2c3a 100644
> > --- a/src/slave/containerizer/mesos/provisioner/backend.cpp
> > +++ b/src/slave/containerizer/mesos/provisioner/backend.cpp
> > @@ -25,6 +25,7 @@
> >  #include "slave/containerizer/mesos/provisioner/backend.hpp"
> >
> >  #ifdef __linux__
> > +#include "slave/containerizer/mesos/provisioner/backends/aufs.hpp"
> >  #include "slave/containerizer/mesos/provisioner/backends/bind.hpp"
> >  #endif
> >  #include "slave/containerizer/mesos/provisioner/backends/copy.hpp"
> > @@ -47,6 +48,14 @@ hashmap Backend::create(const
> Flags& flags)
> >  #ifdef __linux__
> >creators.put("bind", ::create);
> >
> > +  Try aufsSupported = fs::aufs::supported();
> > +  if (aufsSupported.isError()) {
> > +LOG(WARNING) << "Failed to check aufs availability: '"
> > + << aufsSupported.error();
> > +  } else if (aufsSupported.get()) {
> > +creators.put("aufs", ::create);
> > +  }
> > +
> >Try overlayfsSupported = fs::overlay::supported();
> >if (overlayfsSupported.isError()) {
> >  LOG(WARNING) << "Failed to check overlayfs availability: '"
> >
> >
> http://git-wip-us.apache.org/repos/asf/mesos/blob/e5358ed1/src/slave/containerizer/mesos/provisioner/backends/aufs.cpp
> > --
> > diff --git a/src/slave/containerizer/mesos/provisioner/backends/aufs.cpp
> b/src/slave/containerizer/mesos/provisioner/backends/aufs.cpp
> > new file mode 100644
> > index 000..54c0057
> > --- /dev/null
> > +++ b/src/slave/containerizer/mesos/provisioner/backends/aufs.cpp
> > @@ -0,0 +1,227 @@
> > +// Licensed to the 

Re: how to debug HTTP API

2016-06-07 Thread Guangya Liu
So how many agent nodes are there in your cluster? If you keep receiving
offers but never get an UPDATE message, the cause may be a mismatch
between your task definition and the offers, so the framework is
effectively declining every offer.

Can you please share your framework code for the "Event::OFFERS" logic?

Thanks,

Guangya

On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> On 06/07/2016 01:59 PM, Guangya Liu wrote:
> > I can see that your framework is now holding the offer, how did you
> launch
> > task?
>
> I execute an HTTP POST request in Python with json content-type:
>
>  {'type': 'ACCEPT',
> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
> 'accept': {
> 'operations': [
> {'type': 'LAUNCH',
> 'launch': {'container': {
> 'docker': {'image': u'centos:latest',
> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
> 'type': 1,
> 'volumes': [
> {'host_path': u'/a/b', 'container_path':
> u'/mnt/home', 'mode': 1},
> {'host_path': u'/a/b/c', 'container_path':
> u'/mnt/go-docker', 'mode': 1},
> {'host_path': u'/b/c/d', 'container_path':
> u'/mnt/god-data', 'mode': 2}
> ]
> },
> 'name': u'testr',
> 'task_id': {'value': '128'},
> 'command': {'uris': [{'value':
> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
> 'slave_id': {'value':
> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
> 'resources': [
> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
> ]
> } # end launch
> } # end operation
> ],
> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
> }
> }
>
> We can see that Mesos received the ACCEPT:
>
> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
> (tifenn.irisa.fr) for framework
>
>
> and I continue to receive new offers, so "connection" is OK. I should
> receive an UPDATE message even if there is an error, but I receive none
> (I track/log all messages received, whatever the type).
>
> Olivier
>
> >  Perhaps you can take a look at
> > https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311
> which
> > is an example framework using HTTP API
> >
> > Thanks,
> >
> > Guangya
> >
> > On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> > wrote:
> >
> >>
> >> On 06/07/2016 12:25 PM, Guangya Liu wrote:
> >>> Olivier,
> >>>
> >>> For such case, seems there is sth wrong with your framework? can you
> >> please
> >>> run the following two commands and check the output?
> >> I don't think it is a framework issue, I receive offers, heartbeats
> etc...
> >> It is only at task creation step, when I have no rejection nor update
> >> message.
> >>
> >> It could be (certainly) an issue with the json task message I sent in
> >> the ACCEPT, but as there is no error, I have no way to understand what's
> >> wrong with it.
> >>> curl "http://:5050/master/frameworks" 2>/dev/null|python
> >> -m
> >>> json.tool
> >> {
> >> "completed_frameworks": [],
> >> "frameworks": [
> >> {
> >> "active": true,
> >> "capabilities": [],
> >> "checkpoint": false,
> >> "completed_tasks": [],
> >> "executors": [],
> >> "failover_timeout": 0.0,
> >> "hostname": "",
> >> "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
> >> "name": "GoDocker HTTP Framework",
> >> "offered_resources": {
> >> "cpus": 4.0,
> >> "disk": 459470.0,
> >> "mem": 14898.0,
> >> "ports": "[31000-32000]"
> >> },
> >> "offers": [
> >>   

Re: how to debug HTTP API

2016-06-07 Thread Guangya Liu
I can see that your framework is currently holding the offer; how did
you launch the task? Perhaps you can take a look at
https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311,
which is an example framework that uses the HTTP API.

Thanks,

Guangya
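As a debugging aid, the ACCEPT payload being discussed in this thread
can be assembled by a small helper and sanity-checked before posting.
A sketch (the helper is mine; as I understand the v1 scheduler API, the
LAUNCH operation carries its tasks under `task_infos`):

```python
def build_accept(framework_id, offer_ids, task_infos):
    """Assemble a v1 scheduler ACCEPT call with one LAUNCH operation.

    `task_infos` is a list of TaskInfo-shaped dicts; keeping assembly in
    one place makes it easy to assert the structure before POSTing it to
    /api/v1/scheduler.
    """
    return {
        "type": "ACCEPT",
        "framework_id": {"value": framework_id},
        "accept": {
            "offer_ids": [{"value": oid} for oid in offer_ids],
            "operations": [
                {"type": "LAUNCH", "launch": {"task_infos": task_infos}}
            ],
        },
    }
```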

On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> On 06/07/2016 12:25 PM, Guangya Liu wrote:
> > Olivier,
> >
> > For such case, seems there is sth wrong with your framework? can you
> please
> > run the following two commands and check the output?
> I don't think it is a framework issue, I receive offers, heartbeats etc...
> It is only at task creation step, when I have no rejection nor update
> message.
>
> It could be (certainly) an issue with the json task message I sent in
> the ACCEPT, but as there is no error, I have no way to understand what's
> wrong with it.
> >
> > curl "http://:5050/master/frameworks" 2>/dev/null|python
> -m
> > json.tool
> {
> "completed_frameworks": [],
> "frameworks": [
> {
> "active": true,
> "capabilities": [],
> "checkpoint": false,
> "completed_tasks": [],
> "executors": [],
> "failover_timeout": 0.0,
> "hostname": "",
> "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
> "name": "GoDocker HTTP Framework",
> "offered_resources": {
> "cpus": 4.0,
> "disk": 459470.0,
> "mem": 14898.0,
> "ports": "[31000-32000]"
> },
> "offers": [
> {
> "framework_id":
> "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
> "id": "1f1486e3-43ee-44c5-b073-82a901add956-O0",
> "resources": {
> "cpus": 4.0,
> "disk": 459470.0,
> "mem": 14898.0,
> "ports": "[31000-32000]"
> },
> "slave_id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0"
> }
> ],
> "registered_time": 1465298174.2483,
> "resources": {
> "cpus": 4.0,
> "disk": 459470.0,
> "mem": 14898.0,
> "ports": "[31000-32000]"
> },
> "role": "*",
> "tasks": [],
> "unregistered_time": 0.0,
> "used_resources": {
> "cpus": 0.0,
> "disk": 0.0,
> "mem": 0.0
> },
> "user": "godocker_http_test",
> "webui_url": ""
> }
> ],
> "unregistered_frameworks": []
> }
>
>
> > curl "http://:5050/master/state" 2>/dev/null|python -m
> > json.tool
>
> {
> "activated_slaves": 1.0,
> "build_date": "2016-04-14 15:44:54",
> "build_time": 1460648694.0,
> "build_user": "root",
> "completed_frameworks": [],
> "deactivated_slaves": 0.0,
> "elected_time": 1465298164.01165,
> "flags": {
> "allocation_interval": "1secs",
> "allocator": "HierarchicalDRF",
> "authenticate": "false",
> "authenticate_http": "false",
> "authenticate_slaves": "false",
> "authenticators": "crammd5",
> "authorizers": "local",
> "framework_sorter": "drf",
> "help": "false",
> "hostname_lookup": "true",
> "http_authenticators": "basic",
> "initialize_driver_logging": "true",
> "log_auto_initialize": "true",
> "log_dir": "/var/log/mesos",
> "logbufsecs": "0",
> "logging_level": "INFO",

Re: how to debug HTTP API

2016-06-07 Thread Guangya Liu
Olivier,

In that case, it seems something may be wrong with your framework. Can
you please run the following two commands and check the output?

curl "http://:5050/master/frameworks" 2>/dev/null|python -m
json.tool
curl "http://:5050/master/state" 2>/dev/null|python -m
json.tool

Thanks,

Guangya

On Tue, Jun 7, 2016 at 6:04 PM, Olivier Sallou 
wrote:

> Hi,
> I am trying to switch from Python to HTTP API. I use mesos 0.28.1
>
> I could create framework to register, receive offers etc...  but I have
> an issue accepting offers.
>
> I send my ACCEPT message but I do not receive any UPDATE message, only
> new offers and hearbeat messages.
>
> On mesos master logs I see:
>
> I0607 11:45:15.873184 14896 http.cpp:312] HTTP POST for
> /master/api/v1/scheduler from 127.0.0.1:38298 with
> User-Agent='python-requests/2.9.1'
> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
> (tifenn.irisa.fr) for framework
> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020 (GoDocker HTTP Framework)
>
> There is a "Processing ACCEPT" and no error, but my task is not ran on
> mesos.
> No error on slave either.
>
> Response code to my ACCEPT is 202 as expected.
>
> Here is my HTTP json message:
>
> {'type': 'ACCEPT',
> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
> 'accept': {
> 'operations': [
> {'type': 'LAUNCH',
> 'launch': {'container': {
> 'docker': {'image': u'centos:latest',
> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
> 'type': 1,
> 'volumes': [
> {'host_path': u'/a/b', 'container_path':
> u'/mnt/home', 'mode': 1},
> {'host_path': u'/a/b/c', 'container_path':
> u'/mnt/go-docker', 'mode': 1},
> {'host_path': u'/b/c/d', 'container_path':
> u'/mnt/god-data', 'mode': 2}
> ]
> },
> 'name': u'testr',
> 'task_id': {'value': '128'},
> 'command': {'uris': [{'value':
> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
> 'slave_id': {'value':
> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
> 'resources': [
> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
> ]
> } # end launch
> } # end operation
> ],
> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
> }
> }
>
> There could be an issue with my task definition, but as no error is
> raised and I receive no UPDATE error message.
>
> Any hint on how to debug this?
>
> Thanks
>
>
> --
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
>
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>
>
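
One structural thing worth checking in the ACCEPT body above: in the v1 scheduler API, a LAUNCH operation nests its tasks in a `launch.task_infos` list rather than placing the TaskInfo fields directly under `launch`, which may explain a 202 response with no launch. A minimal sketch of a pre-POST shape check (`validate_accept` is a hypothetical helper, not a Mesos API):

```python
def validate_accept(call):
    """Sanity-check the shape of an ACCEPT scheduler call (dict form)
    before POSTing it. Returns a list of problems found; an empty
    list means the basic structure looks OK."""
    problems = []
    if call.get('type') != 'ACCEPT':
        problems.append("type must be 'ACCEPT'")
    if 'framework_id' not in call:
        problems.append('framework_id is missing')
    accept = call.get('accept', {})
    if not accept.get('offer_ids'):
        problems.append('accept.offer_ids is empty')
    for op in accept.get('operations', []):
        if op.get('type') == 'LAUNCH' and 'task_infos' not in op.get('launch', {}):
            # v1 API: Offer.Operation.Launch carries repeated TaskInfo
            # in the 'task_infos' field.
            problems.append('LAUNCH tasks should be nested under launch.task_infos')
    return problems
```

This only checks structure, not semantics; the master's log and the SUBSCRIBE event stream remain the authoritative place to look for UPDATE events.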


Re: [MesosCon][Slides] Any slides sharing plan for MesosCon

2016-06-06 Thread Guangya Liu
Great! Thanks Artem :D

On Monday, June 6, 2016, Artem Harutyunyan <ar...@mesosphere.io> wrote:

> Hi Guangya,
>
> You should see the slides if you go to individual talk pages, like this one
> [0]
> <
> http://mesosconna2016.sched.org/event/6jtl/containers-in-apache-mesos-present-and-future-jie-yu-tim-chen-mesosphere?iframe=yes=i:0;=yes=no
> >.
> Some of the speakers have uploaded their slides to Sched.
>
> Artem.
>
> [0] -
>
> http://mesosconna2016.sched.org/event/6jtl/containers-in-apache-mesos-present-and-future-jie-yu-tim-chen-mesosphere?iframe=yes=i:0;=yes=no
>
> On Fri, Jun 3, 2016 at 8:42 PM, Guangya Liu <gyliu...@gmail.com
> <javascript:;>> wrote:
>
> > Unlike last year's MesosCon, the slides have not been shared so far at
> > http://events.linuxfoundation.org/events/mesoscon-north-america . Is
> > there any plan to share those slides?
> >
> > Thanks,
> >
> > Guangya
> >
>


[MesosCon][Slides] Any slides sharing plan for MesosCon

2016-06-03 Thread Guangya Liu
Unlike last year's MesosCon, the slides have not been shared so far at
http://events.linuxfoundation.org/events/mesoscon-north-america . Is there
any plan to share those slides?

Thanks,

Guangya


Re: Documentation about debugging mesos-master : newbie

2016-06-02 Thread Guangya Liu
Hi Vinit,

Please check if you are encountering this issue:
https://github.com/Homebrew/homebrew-dupes/issues/221

Thanks,

Guangya

On Fri, Jun 3, 2016 at 2:24 AM, Vinit Mahedia 
wrote:

> Hi Gilbert,
>
> Thank you for replying.
>
> Yes, I did that.
>
>
>1.  ./configure --enable-debug --disable-java --disable-python
>2.  make
>3. ./bin/gdb-mesos-master.sh --ip=127.0.0.1 --work_dir=.
>
> Even after setting the source directory, I cannot set a breakpoint; I
> get a warning like this:
>
> (gdb) break master.cpp:2481
> Cannot access memory at address 0x714d40
>
>
> I also tried a few things: passing the "static" flag to libtool and
> passing "--enable-static".
>
> However, I got a linker error, where I saw libtool was not using the
> --static flag, and I do not know if doing that will fix it. I forgot to
> mention that I am building this on Mac OS.
>
> Thank you.
>
>
>
> On Thu, Jun 2, 2016 at 12:33 PM, Gilbert Song 
> wrote:
>
> > Hi Vinit,
> >
> > Did you configure with debug mode (e.g., ../configure --enable-debug)?
> >
> > Assuming you have gdb installed, you should be able to debug the mesos
> > master in gdb:
> >
> > ./bin/gdb-mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
> >
> >
> > Gilbert
> >
> > On Thu, Jun 2, 2016 at 9:30 AM, Vinit Mahedia 
> > wrote:
> >
> > > I have been trying to debug mesos-master using gdb-mesos-master.sh,
> > > although it does not load symbols or sources. I tried to set those
> > > paths as well, but it does not work since gdb thinks mesos-master, a
> > > libtool script, is the main binary.
> > >
> > > I just want to set the dev environment and try to fix a very stupid bug
> > to
> > > learn the work flow of test/debug/commit.
> > >
> > > If I can get it working, I can help write such documentation if it
> > > does not exist. I also tried to set it up in Eclipse CDT, but it can't
> > > handle libtool scripts.
> > >
> > > Thank you.
> > >
> >
>
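
One thing that can cause the "Cannot access memory" warning above is setting the breakpoint before the binary and its shared libraries (most of Mesos lives in libmesos) are loaded. A possible workaround, assuming a standard gdb, is to allow pending breakpoints so the location resolves once the library is mapped:

```
(gdb) set breakpoint pending on
(gdb) break master.cpp:2481
(gdb) run
```

If symbols still do not resolve after `run`, `info sharedlibrary` shows whether libmesos was loaded with symbols.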


Re: [VOTE] Release Apache Mesos 0.28.2 (rc1)

2016-05-29 Thread Guangya Liu
+1 to this.

Only one minor comment: I saw that there are indeed people using the Mesos
CLI, such as `mesos ps` etc. I recently made minor enhancements to the
Mesos CLI to improve debuggability; it would be great if we could have
those in 0.28.2.

The following are the three patches for the Mesos CLI enhancements:
https://reviews.apache.org/r/47636/ Fixed some coding errors in mesos-ps.
https://reviews.apache.org/r/47693/ Added more error info for mesos ps.
https://reviews.apache.org/r/47711/ Added a more verbose message when a
mesos command encounters an error.

Thanks,

Guangya

On Mon, May 30, 2016 at 7:53 AM, Jie Yu  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.28.2.
>
>
> 0.28.2 is a bug fix release. It includes the following:
>
> 
> ** Bug
>   * [MESOS-4705] - Linux 'perf' parsing logic may fail when OS
> distribution has perf backports.
>   * [MESOS-5239] - Persistent volume DockerContainerizer support assumes
> proper mount propagation setup on the host.
>   * [MESOS-5253] - Isolator cleanup should not be invoked if they are not
> prepared yet.
>   * [MESOS-5282] - Destroy container while provisioning volume images may
> lead to a race.
>   * [MESOS-5312] - Env `MESOS_SANDBOX` is not set properly for command
> tasks that changes rootfs.
>   * [MESOS-4885] - Unzip should force overwrite.
>   * [MESOS-5449] - Memory leak in SchedulerProcess.declineOffer.
>   * [MESOS-5380] - Killing a queued task can cause the corresponding
> command executor to never terminate.
>
> ** Improvement
>   * [MESOS-5307] - Sandbox mounts should not be in the host mount
> namespace.
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.2-rc1
>
> 
>
> The candidate for Mesos 0.28.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.28.2-rc1/mesos-0.28.2.tar.gz
>
> The tag to be voted on is 0.28.2-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.28.2-rc1
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.28.2-rc1/mesos-0.28.2.tar.gz.md5
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.28.2-rc1/mesos-0.28.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1140
>
> Please vote on releasing this package as Apache Mesos 0.28.2!
>
> The vote is open until Wed Jun  1 16:51:42 PDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.28.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> - Jie
>


Re: [Design Docs] A centralized place to store all design docs

2016-05-26 Thread Guangya Liu
It's here
https://cwiki.apache.org/confluence/display/MESOS/Design+docs+--+Shared+Links

On Thu, May 26, 2016 at 2:16 PM, Jay JN Guo  wrote:

>
> Hi folks,
>
> Do we have a place (e.g. a Google Drive directory) to aggregate all design
> docs? I think it makes life easier for people to navigate and dig up
> history.
>
> cheers,
> /J
>


Re: Mesos Modules callbacks

2016-05-24 Thread Guangya Liu
Hi ct,

There is not much documentation for this, but the following references may
help:

1)
https://github.com/apache/mesos/blob/master/docs/networking-for-mesos-managed-containers.md#writing-a-custom-network-isolator-module
2)
https://github.com/apache/mesos/blob/master/include/mesos/slave/isolator.hpp

Thanks,

Guangya

On Wed, May 25, 2016 at 12:09 PM, ct clmsn  wrote:

> I'm in the midst of writing a mesos module and wanted to know if there was
> documentation available (in source, or html, or in proposals) that
> explained the callbacks and the order in which they are called (prepare,
> recover, isolate, update). Thanks in advance!
>
> ct
>


Re: volume / mount point error with Unified Containerizer

2016-05-24 Thread Guangya Liu
Hi Zhitao,

This issue was already fixed in 0.28.x and later versions. I tried both
0.28.x and the master branch, and it works well for Olivier's case.

Thanks,

Guangya

On Tue, May 24, 2016 at 11:58 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Guangya,
>
> Is there a JIRA work board or document to track what's being fixed, working
> on or planned in 0.29 and further releases?
>
> On Tue, May 24, 2016 at 1:51 AM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> > Olivier,
> >
> > If it is not urgent, I think that you can use 0.29 in next week, the
> > unified container will have more features in 0.29 such as external
> storage
> > and network integration.
> >
> > Thanks,
> >
> > Guangya
> >
> > On Tue, May 24, 2016 at 2:17 AM, Olivier Sallou <olivier.sal...@irisa.fr
> >
> > wrote:
> >
> > >
> > >
> > > - Original Message -
> > > > From: "Guangya Liu" <gyliu...@gmail.com>
> > > > To: "dev" <dev@mesos.apache.org>, "Jie Yu" <j...@mesosphere.io>,
> > "Gilbert
> > > Song" <gilb...@mesosphere.io>
> > > > Sent: Monday 23 May 2016 17:34:41
> > > > Subject: Re: volume / mount point error with Unified Containerizer
> > > >
> > > > It is a bit strange to me, I also did some test and review code for
> > > > relative path, and found that relative path works well.
> > > >
> > > > In 0.28.1, if deploy a docker container with MesosContaineirizer,
> then
> > if
> > > > using absolute path as continer_path, the mesos agent will update the
> > > > container_path to a relative path by adding a prefix ./rootfs to the
> > > > container_path, e.g. /file/path = > ./rootfs/file/path.
> > > >
> > > > If deploy a docker container with MesosContaineirizer with relative
> > path
> > > as
> > > > container_path, then the mesos agent will not update the
> > container_path.
> > > >
> > > > So the final mount point for the container should be either
> > > >
> > > > 1) /tmp/mesos/slaves/agent_id/frameworks/framework_id/
> > > > executors/51/runs/container_id/.rootfs/file/path
> > > > 2)
> > /tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/
> > > > container_id/file/path
> > > >
> > > > The only difference is adding ./rootfs as a prefix or not, the test
> > > result
> > > > is that 1) does not work and 2) works well. And even the mount for 1)
> > > > failed, but I can see the mount point path does exist.
> > > >
> > >
> > > @Guangya
> > > I confirm that using relative path works fine, I get volumes in mesos
> > path
> > > (but it does not help for my implementation).
> > > If I use the Docker containerizer, absolute paths are fine, this is
> what
> > I
> > > use for the moment in my code, and am investigating to switch to
> unified
> > > container.
> > >
> > >
> > > > @Yu Jie and @Gilbert, any comments for this?
> > > >
> > > > @Oilivier,
> > > >
> > > > In order not to block your test, can you please use mesos after
> 0.28.1?
> > > You
> > > > can use either 0.28.2 or above version.
> > >
> > > Well, as this is not an urgent matter, I am waiting 0.29 to test
> against
> > > this release (with other features I am waiting for).
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Guangya
> > > >
> > > >
> > > > On Mon, May 23, 2016 at 10:30 PM, Guangya Liu <gyliu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks Olivier, I can reproduce this issue now and still checking
> > what
> > > is
> > > > > wrong.
> > > > >
> > > > > What I did is as following:
> > > > > 1)  Check out code with tag of 0.28.1
> > > > > 2) update mesos-execute to add a host path volume
> > > > > diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp
> > > > > index 81a0388..0ff913c 100644
> > > > > --- a/src/cli/execute.cpp
> > > > > +++ b/src/cli/execute.cpp
> > > > > @@ -72,6 +72,8 @@ using mesos::v1::TaskID;
> > > > >  using mesos::v1::TaskInfo;
> > > > >  using mesos::v1::TaskState;
> > > > >  using mesos::v1::Ta

Re: volume / mount point error with Unified Containerizer

2016-05-24 Thread Guangya Liu
Olivier,

If it is not urgent, I think you can use 0.29 next week; the unified
containerizer will have more features in 0.29, such as external storage
and network integration.

Thanks,

Guangya

On Tue, May 24, 2016 at 2:17 AM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> - Original Message -
> > From: "Guangya Liu" <gyliu...@gmail.com>
> > To: "dev" <dev@mesos.apache.org>, "Jie Yu" <j...@mesosphere.io>, "Gilbert
> Song" <gilb...@mesosphere.io>
> > Sent: Monday 23 May 2016 17:34:41
> > Subject: Re: volume / mount point error with Unified Containerizer
> >
> > It is a bit strange to me, I also did some test and review code for
> > relative path, and found that relative path works well.
> >
> > In 0.28.1, if deploy a docker container with MesosContaineirizer, then if
> > using absolute path as continer_path, the mesos agent will update the
> > container_path to a relative path by adding a prefix ./rootfs to the
> > container_path, e.g. /file/path = > ./rootfs/file/path.
> >
> > If deploy a docker container with MesosContaineirizer with relative path
> as
> > container_path, then the mesos agent will not update the container_path.
> >
> > So the final mount point for the container should be either
> >
> > 1) /tmp/mesos/slaves/agent_id/frameworks/framework_id/
> > executors/51/runs/container_id/.rootfs/file/path
> > 2) /tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/
> > container_id/file/path
> >
> > The only difference is adding ./rootfs as a prefix or not, the test
> result
> > is that 1) does not work and 2) works well. And even the mount for 1)
> > failed, but I can see the mount point path does exist.
> >
>
> @Guangya
> I confirm that using relative path works fine, I get volumes in mesos path
> (but it does not help for my implementation).
> If I use the Docker containerizer, absolute paths are fine, this is what I
> use for the moment in my code, and am investigating to switch to unified
> container.
>
>
> > @Yu Jie and @Gilbert, any comments for this?
> >
> > @Oilivier,
> >
> > In order not to block your test, can you please use mesos after 0.28.1?
> You
> > can use either 0.28.2 or above version.
>
> Well, as this is not an urgent matter, I am waiting 0.29 to test against
> this release (with other features I am waiting for).
>
> >
> > Thanks,
> >
> > Guangya
> >
> >
> > On Mon, May 23, 2016 at 10:30 PM, Guangya Liu <gyliu...@gmail.com>
> wrote:
> >
> > > Thanks Olivier, I can reproduce this issue now and still checking what
> is
> > > wrong.
> > >
> > > What I did is as following:
> > > 1)  Check out code with tag of 0.28.1
> > > 2) update mesos-execute to add a host path volume
> > > diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp
> > > index 81a0388..0ff913c 100644
> > > --- a/src/cli/execute.cpp
> > > +++ b/src/cli/execute.cpp
> > > @@ -72,6 +72,8 @@ using mesos::v1::TaskID;
> > >  using mesos::v1::TaskInfo;
> > >  using mesos::v1::TaskState;
> > >  using mesos::v1::TaskStatus;
> > > +using mesos::v1::Volume;
> > > +using mesos::v1::Parameters;
> > >
> > >  using mesos::v1::scheduler::Call;
> > >  using mesos::v1::scheduler::Event;
> > > @@ -572,6 +574,12 @@ private:
> > >  }
> > >}
> > >
> > > +  Volume* volume1 = containerInfo.add_volumes();
> > > +  volume1->set_container_path("/tmp/abcd");
> > > +  volume1->set_mode(Volume::RW);
> > > +  volume1->set_host_path("/root/convoy");
> > > +   cout << "Add Voume 1" << endl;
> > > +
> > >return containerInfo;
> > >  } else if (containerizer == "docker") {
> > >// 'docker' containerizer only supports 'docker' images.
> > > 3) launch a task with docker image, task failed.
> > >
> > > 4) Check sandbox:
> > > + /root/src/mesos/m1/mesos/build/src/mesos-containerizer mount
> > > --help=false --operation=make-rslave --path=/
> > > + grep -E /tmp/mesos/.+ /proc/self/mountinfo
> > > + grep -v 3239aafc-78d8-4f70-81e5-f32fb379
> > > + cut+  -d  -f5
> > > xargs --no-run-if-empty umount -l
> > > + mount -n --rbind
> > >
> /tmp/mesos/provisioner/containers/3239aafc-78d8-4f70-81e5-f32fb379/backends/copy/rootfses/5e8bf3fa-53b1-4bd5-bb3

Re: volume / mount point error with Unified Containerizer

2016-05-23 Thread Guangya Liu
This is a bit strange to me; I also did some tests and reviewed the code
for relative paths, and found that relative paths work well.

In 0.28.1, if you deploy a docker container with the MesosContainerizer
using an absolute path as container_path, the mesos agent will update the
container_path to a relative path by adding a ./rootfs prefix to the
container_path, e.g. /file/path => ./rootfs/file/path.

If you deploy a docker container with the MesosContainerizer using a
relative path as container_path, the mesos agent will not update the
container_path.

So the final mount point for the container should be either

1) /tmp/mesos/slaves/agent_id/frameworks/framework_id/
executors/51/runs/container_id/.rootfs/file/path
2) /tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/
container_id/file/path

The only difference is adding ./rootfs as a prefix or not; the test result
is that 1) does not work while 2) works well. And even though the mount for
1) failed, I can see that the mount point path does exist.
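
The two layouts described above can be sketched as a small helper. This is an illustration of the paths quoted in this thread (0.28.1, Mesos containerizer); `mount_point` is a hypothetical name, not Mesos code:

```python
import os

def mount_point(sandbox, container_path, has_rootfs):
    """Expected bind-mount target for a volume, per the two cases above.

    Case 1): an absolute container_path on an image-based container is
    rewritten under the container rootfs (the ".rootfs" directory in
    the sandbox). Case 2): a relative container_path is left under the
    sandbox directory unchanged."""
    if has_rootfs and os.path.isabs(container_path):
        return sandbox + '/.rootfs' + container_path
    return os.path.join(sandbox, container_path)
```

For example, `mount_point(sandbox, '/file/path', True)` lands under `.../runs/container_id/.rootfs/file/path`, matching the failing mount in the log below.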

@Yu Jie and @Gilbert, any comments for this?

@Oilivier,

In order not to block your test, can you please use a mesos version after
0.28.1? You can use 0.28.2 or any later version.

Thanks,

Guangya


On Mon, May 23, 2016 at 10:30 PM, Guangya Liu <gyliu...@gmail.com> wrote:

> Thanks Olivier, I can reproduce this issue now and still checking what is
> wrong.
>
> What I did is as following:
> 1)  Check out code with tag of 0.28.1
> 2) update mesos-execute to add a host path volume
> diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp
> index 81a0388..0ff913c 100644
> --- a/src/cli/execute.cpp
> +++ b/src/cli/execute.cpp
> @@ -72,6 +72,8 @@ using mesos::v1::TaskID;
>  using mesos::v1::TaskInfo;
>  using mesos::v1::TaskState;
>  using mesos::v1::TaskStatus;
> +using mesos::v1::Volume;
> +using mesos::v1::Parameters;
>
>  using mesos::v1::scheduler::Call;
>  using mesos::v1::scheduler::Event;
> @@ -572,6 +574,12 @@ private:
>  }
>}
>
> +  Volume* volume1 = containerInfo.add_volumes();
> +  volume1->set_container_path("/tmp/abcd");
> +  volume1->set_mode(Volume::RW);
> +  volume1->set_host_path("/root/convoy");
> +   cout << "Add Voume 1" << endl;
> +
>return containerInfo;
>  } else if (containerizer == "docker") {
>// 'docker' containerizer only supports 'docker' images.
> 3) launch a task with docker image, task failed.
>
> 4) Check sandbox:
> + /root/src/mesos/m1/mesos/build/src/mesos-containerizer mount
> --help=false --operation=make-rslave --path=/
> + grep -E /tmp/mesos/.+ /proc/self/mountinfo
> + grep -v 3239aafc-78d8-4f70-81e5-f32fb379
> + cut+  -d  -f5
> xargs --no-run-if-empty umount -l
> + mount -n --rbind
> /tmp/mesos/provisioner/containers/3239aafc-78d8-4f70-81e5-f32fb379/backends/copy/rootfses/5e8bf3fa-53b1-4bd5-bb3d-525ddc7900b6
> /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs
> + mount -n --rbind /root/convoy
> /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
> mount: mount point
> /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
> does not exist
> Failed to execute a preparation shell command
>
> Will check more for this.
>
> Thanks,
>
> Guangya
>
> On Mon, May 23, 2016 at 3:35 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>>
>> On 05/23/2016 09:33 AM, Olivier Sallou wrote:
>> >
>> > On 05/20/2016 03:26 PM, Guangya Liu wrote:
>> >> Since you are using docker image which means that your container will
>> have
>> >> rootfs, so it is not required to have the absolute path exist, the
>> linux
>> >> file system isolator will help create the path automatically
>> >>
>> https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402
>> >>
>> >> Can you please share your framework? How did you set the volume part in
>> >> your framework?
>> > @Guangya
>> >
>> > I use Python API.
>> >
>> > Here is related code:
>> >
>> > 
>> >  # Define container volumes
>> >  for v in job['container']['volumes']:
>> > volume = container.volumes.add()
>> >  

Re: volume / mount point error with Unified Containerizer

2016-05-23 Thread Guangya Liu
Thanks Olivier, I can reproduce this issue now and am still checking what
is wrong.

What I did is the following:
1) Check out the code at tag 0.28.1.
2) Update mesos-execute to add a host path volume:
diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp
index 81a0388..0ff913c 100644
--- a/src/cli/execute.cpp
+++ b/src/cli/execute.cpp
@@ -72,6 +72,8 @@ using mesos::v1::TaskID;
 using mesos::v1::TaskInfo;
 using mesos::v1::TaskState;
 using mesos::v1::TaskStatus;
+using mesos::v1::Volume;
+using mesos::v1::Parameters;

 using mesos::v1::scheduler::Call;
 using mesos::v1::scheduler::Event;
@@ -572,6 +574,12 @@ private:
 }
   }

+  Volume* volume1 = containerInfo.add_volumes();
+  volume1->set_container_path("/tmp/abcd");
+  volume1->set_mode(Volume::RW);
+  volume1->set_host_path("/root/convoy");
+   cout << "Add Volume 1" << endl;
+
   return containerInfo;
 } else if (containerizer == "docker") {
   // 'docker' containerizer only supports 'docker' images.
3) Launch a task with a docker image; the task failed.

4) Check sandbox:
+ /root/src/mesos/m1/mesos/build/src/mesos-containerizer mount --help=false
--operation=make-rslave --path=/
+ grep -E /tmp/mesos/.+ /proc/self/mountinfo
+ grep -v 3239aafc-78d8-4f70-81e5-f32fb379
+ cut+  -d  -f5
xargs --no-run-if-empty umount -l
+ mount -n --rbind
/tmp/mesos/provisioner/containers/3239aafc-78d8-4f70-81e5-f32fb379/backends/copy/rootfses/5e8bf3fa-53b1-4bd5-bb3d-525ddc7900b6
/tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs
+ mount -n --rbind /root/convoy
/tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
mount: mount point
/tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
does not exist
Failed to execute a preparation shell command

Will check more for this.

Thanks,

Guangya

On Mon, May 23, 2016 at 3:35 PM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> On 05/23/2016 09:33 AM, Olivier Sallou wrote:
> >
> > On 05/20/2016 03:26 PM, Guangya Liu wrote:
> >> Since you are using docker image which means that your container will
> have
> >> rootfs, so it is not required to have the absolute path exist, the linux
> >> file system isolator will help create the path automatically
> >>
> https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402
> >>
> >> Can you please share your framework? How did you set the volume part in
> >> your framework?
> > @Guangya
> >
> > I use Python API.
> >
> > Here is related code:
> >
> > 
> >  # Define container volumes
> >  for v in job['container']['volumes']:
> > volume = container.volumes.add()
> > volume.container_path = v['mount']
> > volume.host_path = v['path']
> > if v['acl'] == 'rw':
> > volume.mode = 1 # mesos_pb2.Volume.Mode.RW
> > else:
> > volume.mode = 2 # mesos_pb2.Volume.Mode.RO
> >
> > => In my test case, I add 2 volumes from a host shared directory,
> > mounted in container as /mnt/go-docker and /mnt/god-data.
> >
> > ...
> > # Define docker  image and network
> > docker = mesos_pb2.ContainerInfo.MesosInfo()
> > docker.image.type = 2 # Docker
> > docker.image.docker.name ='centos:latest'
> > # Request an IP from a network module
> > network_info = container.network_infos.add()
> > network_info_name = 'sampletest'
> > # Get an IP V4 address
> > ip_address = network_info.ip_addresses.add()
> > ip_address.protocol = 1
> > # The network group to join
> > group = network_info.groups.append(network_info_name)
> > port_list = [22]
> > if port_list:
> > for port in port_list:
> > job['container']['port_mapping'].append({'host':
> > port, 'container': port})
> > container.mesos.MergeFrom(docker)
> >
> > It results in error message:
> >
> > + mount -n --rbind
> >
> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d

Re: [REVIEW PROCESS] Proposal for new review process working group

2016-05-20 Thread Guangya Liu
+1, please count me in as well.

Thanks,

Guangya

On Sat, May 21, 2016 at 2:11 AM, Shivam Pathak 
wrote:

> Great! please add me to the group
>
> On Fri, May 20, 2016 at 11:07 AM, haosdent  wrote:
>
> > This sounds great, add me to the group please.
> >
> > On Sat, May 21, 2016 at 1:59 AM, Kevin Klues  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to propose starting a dedicated "review process" working
> > > group.  The goals of this working group will be to:
> > >
> > > 1) Discuss issues around the current review process
> > > 2) Propose improvements to the current review process
> > > 3) Implement / Monitor / Enforce the new process we come up with going
> > > forward
> > >
> > > Anyone who'd like to be involved, please respond to this thread so I
> > > can add you to the working group.  We will likely start actively
> > > discussing things after MesosCon.
> > >
> > > --
> > > ~Kevin
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>
>
>
> --
> *Shivam Pathak (Mr)*
> Software Engineer and Systems Architect
> Novatap Private Ltd.
> HP: +65 8543 2297
>


Re: volume / mount point error with Unified Containerizer

2016-05-20 Thread Guangya Liu
Since you are using a docker image, your container will have a rootfs, so
the absolute path is not required to exist; the linux filesystem isolator
will create the path automatically:
https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402

Can you please share your framework? How did you set the volume part in
your framework?

Thanks,

Guangya

On Fri, May 20, 2016 at 4:54 AM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> - Original Message -
> > From: "Gilbert Song" <gilb...@mesosphere.io>
> > To: "dev" <dev@mesos.apache.org>
> > Sent: Thursday 19 May 2016 01:57:16
> > Subject: Re: volume / mount point error with Unified Containerizer
> >
> > @Olivier,
> > In mesos 0.28.1, you are supposed to be able to bind mount a volume from
> > the host into the mesos container. Did you specify a docker image (we
> > determine the mount point differently depending on whether the container
> > has a rootfs)?
>
> Yes I specified an image, a Docker image URI.
>
> > How do you specify your 'container_path' (the mount point in the
> > container)? If it is an absolute path, we require that directory to
> > already exist. If it is a relative path, we will mkdir it.
>
> It is an absolute path, but it does not exist in the image (this is the
> issue). Images are custom Docker images (images containing tools for batch
> computing), and I want, for example, to mount some shared resources (user
> home dir, common data, etc.) in the image. Of course those directories do
> not pre-exist in container images as they are specific to the environment.
> Requiring the directory to exist in the image is an issue, as it prevents
> using any existing image from a repo.
>
> When using Docker containerizer it works fine, I can mount any external
> storage in the container.
>
> Olivier
>
>
> >
> > @Joshua,
> > Thanks for posting your workaround. As I mentioned above, in 0.28.1 or
> > older, we only mkdir for a container_path which is a relative path (not
> > starting with "/").
> > Because if no rootfs specified for a mesos container, the container
> shares
> > the host
> > root filesystem. Obviously we don't want any random files to be created
> > implicitly
> > on your host fs.
> > From mesos 0.29 (released by the end of this month), we will mkdir the
> > mount point in the container except for the command task case that
> > specifies an absolute container_path without a rootfs. This is because
> > we simplified the mounting logic, and the sandbox bind mount will only
> > be done in the container mount namespace instead of the host mount
> > namespace (what we did before). Please stay tuned.
> >
> > Cheers,
> > Gilbert
> >
> > On Wed, May 18, 2016 at 8:14 AM, Joshua Cohen <jco...@apache.org> wrote:
> >
> > > Hi Olivier,
> > >
> > > I touched on this issue as part of
> > > https://issues.apache.org/jira/browse/MESOS-5229. It would be nice if
> > > Mesos
> > > automatically created container mount points if they don't already
> exist.
> > > In the meantime, as a workaround for this, I've updated my filesystem
> > > images to include the path (e.g. in Dockerfile, add `RUN mkdir -p
> > > /some/mount/point`). Not the best solution, but the only thing I've
> seen
> > > that works at the moment.
> > >
> > > Cheers,
> > >
> > > Joshua
> > >
> > > On Wed, May 18, 2016 at 7:36 AM, Guangya Liu <gyliu...@gmail.com>
> wrote:
> > >
> > > > It's pretty simple for you from scratch with source code
> > > >
> > > >
> > >
> https://github.com/apache/mesos/blob/master/docs/getting-started.md#building-mesos
> > > > ;-)
> > > >
> > > > Thanks,
> > > >
> > > > Guangya
> > > >
> > > > On Wed, May 18, 2016 at 8:30 PM, Olivier Sallou <
> olivier.sal...@irisa.fr
> > > >
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > On 05/18/2016 02:31 PM, Guangya Liu wrote:
> > > > > > Just saw that you are working with 0.28.1, the "docker volume
> driver"
> > > > > code
> > > > > > was not in 0.28.1, can you please have a try with mesos master
> branch
> > > > if
> > > > > > you are only doing some test?
> > > > >

Re: volume / mount point error with Unified Containerizer

2016-05-18 Thread Guangya Liu
It's pretty simple to build from scratch with the source code:
https://github.com/apache/mesos/blob/master/docs/getting-started.md#building-mesos
;-)

Thanks,

Guangya

On Wed, May 18, 2016 at 8:30 PM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> On 05/18/2016 02:31 PM, Guangya Liu wrote:
> > Just saw that you are working with 0.28.1, the "docker volume driver"
> code
> > was not in 0.28.1, can you please have a try with mesos master branch if
> > you are only doing some test?
> this is indeed test only for the moment. But I will have to
> recompile/install mesos  :-(  (I used packages for install).
>
> I will try when possible, but thanks for the hint.
> >
> > Thanks,
> >
> > Guangya
> >
> > On Wed, May 18, 2016 at 8:28 PM, Guangya Liu <gyliu...@gmail.com> wrote:
> >
> >> Hi Olivier,
> >>
> >> I think that you need to enable the "docker volume isolator" if you
> >> want to use external storage with the unified containerizer. I was
> >> writing a document here https://reviews.apache.org/r/47511/; perhaps
> >> you can have a try according to the document and post some comments
> >> there if you find any issues.
> >>
> >> Also you can patch mesos-execute here
> https://reviews.apache.org/r/46762/ to
> >> have a try with mesos-execute.
> >>
> >> Thanks,
> >>
> >> Guangya
> >>
> >> On Wed, May 18, 2016 at 7:17 PM, Olivier Sallou <
> olivier.sal...@irisa.fr>
> >> wrote:
> >>
> >>> Answering (partially) to myself.
> >>>
> >>> It seems the issue is that container_path does not exist inside the
> >>> container. With Docker, the path is created and mounted. With pure
> >>> mesos, container_path must exist.
> >>>
> >>> mesos.proto says: "If the path is an absolute path, that path must
> >>> already exist."
> >>>
> >>> This is an issue, however: using Docker images, the path I want to
> >>> mount does not exist, and the image cannot be modified "on the fly".
> >>>
> >>> Is there a workaround for this?
> >>>
> >>>
> >>> On 05/18/2016 12:24 PM, Olivier Sallou wrote:
> >>>> Hi,
> >>>> I am trying unified containerizer on a single server (master/slave) on
> >>>> mesos 0.28.1, to switch from docker containerizer to mesos+docker
> image
> >>>> container.
> >>>>
> >>>> I have setup slave config as suggested in documentation:
> >>>>
> >>>> containerizers=docker,mesos
> >>>> image_providers=docker \
> >>>> isolation=filesystem/linux,docker/runtime
> >>>>
> >>>> However, when I execute my task with a volume I have an error:
> >>>>
> >>>> 
> >>>> + mount -n --rbind
> >>>>
> >>>
> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
> >>>
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
> >>>> + mount -n --rbind
> >>>>
> >>>
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> >>>
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> >>>> mount: mount point
> >>>>
> >>>
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> >>>> does not exist
> >>>> Failed to execute a preparation shell command
> >>>>
> >>>> Then, my task switches to FAILED.
> >>>>
> >>>> I define a local volume to bind mount in my "container"
> >>>>
> >>>
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> >>>> => /mnt/god-data
> >>>> My directory exists on local server.
> >>>> In mesos UI, I can see the .rootfs directory along stdout and stderr
> >>>> files, and inside .rootfs, I can see /mnt/god-data (empty).
> >>>>
> >>>> Running the same using Docker containerizer instead of mesos
> >>>> containerizer (with a Docker image) works fine.
> >>>>
> >>>> It seems it fails to mount my local directory in the container. Any
> idea
> >>>> of what is going wrong or how to debug this?
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>> --
> >>> Olivier Sallou
> >>> IRISA / University of Rennes 1
> >>> Campus de Beaulieu, 35000 RENNES - FRANCE
> >>> Tel: 02.99.84.71.95
> >>>
> >>> gpg key id: 4096R/326D8438  (keyring.debian.org)
> >>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> >>>
> >>>
>
> --
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
>
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>
>


Re: volume / mount point error with Unified Containerizer

2016-05-18 Thread Guangya Liu
Just saw that you are working with 0.28.1; the "docker volume driver" code
is not in 0.28.1. Can you have a try with the Mesos master branch if you are
only running some tests?

Thanks,

Guangya



Re: volume / mount point error with Unified Containerizer

2016-05-18 Thread Guangya Liu
Hi Olivier,

I think that you need to enable the "docker volume isolator" if you want to
use external storage with the unified containerizer. I was writing a document
here https://reviews.apache.org/r/47511/, perhaps you can have a try according
to the document and post some comments there if you find any issues.
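For reference, an agent configuration with the isolator enabled might look
like the sketch below. The isolator name `docker/volume` and its combination
with the other flags are assumptions based on the patches under review, so
please treat the in-progress document above as authoritative:

```
mesos-slave --master=<master-ip>:5050 \
  --containerizers=mesos \
  --image_providers=docker \
  --isolation=filesystem/linux,docker/runtime,docker/volume
```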

You can also patch mesos-execute with https://reviews.apache.org/r/46762/ to
have a try with mesos-execute.

Thanks,

Guangya



Re: Mesos admin REST API

2016-05-18 Thread Guangya Liu
No, but there is some discussion and a JIRA ticket tracking this:
https://issues.apache.org/jira/browse/MESOS-3220

On Wed, May 18, 2016 at 4:08 PM, Olivier Sallou 
wrote:

> Hi,
> Is there any operator/admin endpoint to kill a task via an admin API?
>
> I faced an issue where Mesos does not send any offers to my framework after
> a task failure (a task remains in staging, or Mesos can't contact an old
> framework). The result is that my framework cannot send new kill requests, etc.
>
> I'd like, as a Mesos admin, to send a kill request (or other kinds of
> requests), "bypassing" the framework.
>
> Thanks
>
> Olivier
>
> --
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
>
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>
>


[Unified Container] [Unit Test] Pay attention to your test case when using "alpine" docker image with INTERNET filter

2016-05-10 Thread Guangya Liu
Hi,

We got two JIRA tickets for the same issue, a test failure on CentOS 7:
1) https://issues.apache.org/jira/browse/MESOS-5351
2) https://issues.apache.org/jira/browse/MESOS-4810

Both tickets report test cases failing on CentOS 7. The root cause is that
on some Linux distributions, '/bin' is not in $PATH when certain shells are
used. Since the container image 'alpine' itself does not specify environment
variables, $PATH is inherited from the agent. As a result, when we exec, the
exec cannot find 'sh', because 'sh' lives under /bin in alpine but '/bin' is
not in $PATH.

I see that we are continually adding test cases for the unified containerizer
with the INTERNET CURL option, so I am sending this out as a reminder to help
you avoid this issue in your test cases.

To avoid this issue, please use
https://github.com/apache/mesos/blob/master/src/tests/containerizer/provisioner_docker_tests.cpp#L435-L443
as a reference: *use a non-shell command because 'sh' might not be in the
PATH.*

Thanks,

Guangya


[Allocator] Looking for a shepherd for MESOS-3765 (Make offer size adjustable)

2016-04-27 Thread Guangya Liu
Hi,

Can anyone help shepherd MESOS-3765? It would be a very important feature
for customers running Mesos on powerful machines.

Thanks,

Guangya


Re: Design Doc for Qemu/KVM containerizer

2016-04-07 Thread Guangya Liu
@haosdent,

You may see that I already linked MESOS-2717 to MESOS-3709; yes, we will
definitely consider the module mode.

Thanks,

Guangya

On Thu, Apr 7, 2016 at 11:00 PM, haosdent  wrote:

> Hi, @Abhishek. Thanks for your nice design document. I just took a look at
> your code in
>
> https://github.com/abdasgupta/mesos/commit/e845ee70602dfc774381996f884587578c07a25b
> . It looks like a prototype now. Did you ever consider implementing it as a
> module? Refer to the ticket [Modularize the containerizer
> interface](https://issues.apache.org/jira/browse/MESOS-3709) and the
> document [Containerizer Modularization Design](
>
> https://docs.google.com/document/d/1fj3G2-YFprqauQUd7fbHsD03vGAGg_k_EtH-s6fRkDo/edit?usp=sharing
> ).
>
> As you know, you have tickets about adding support for the rkt container(
> https://issues.apache.org/jira/browse/MESOS-2162) and the hyper container(
> https://issues.apache.org/jira/browse/MESOS-3435). We may want to support
> more container types in the future, and I think adding support for these
> container types through modules would be better.
>
> My current status on [Modularize the containerizer interface](
> https://issues.apache.org/jira/browse/MESOS-3709) is that I published all my
> patches a week ago and they have not yet been reviewed. And I could not
> reach Till to review my design doc in the past month, so the design may
> change in the future. Looking forward to your feedback and concerns about
> this; I would appreciate it.
>
> On Thu, Apr 7, 2016 at 8:34 PM, Abhishek Dasgupta <
> a10gu...@linux.vnet.ibm.com> wrote:
>
> > Hi all,
> >
> > I uploaded a design doc for Mesos-2717:
> >
> >
> >
> https://docs.google.com/document/d/1_VuFiJqxjlH_CA1BCMknl3sadlTZ69FuDe7qasDIOk0/edit?usp=sharing
> >
> > Need your views and comments on it.
> >
> > --
> >   Regards,
> >
> >
> >
> ---
> >   Abhishek Dasgupta
> >   Linux Software Developer - Linux Technology Centre
> >   IBM Systems Lab,
> >   IBM India Pvt. Ltd.
> >   Embassy Golf Link, D Block
> >   Koramongala - Off Indiranagar Ring Road
> >   Bangalore - 560 071
> >   Mobile: +91-8884107981
> >
> >
> ---
> >
> >
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: [Mesos Agent][Isolator][Storage] Docker Volume Driver Isolator Design is ready for review

2016-04-04 Thread Guangya Liu
Thanks Jie, I updated the doc to the latest version and also added more on
the recovery part; please take a look. ;-)

The recovery and checkpointing are not for recovering mounts but for the
mount points generated by dvdcli: we need to remove a mount point when there
are no containers using it. I also added one case to the recovery part; I
hope it is clear enough.

Thanks,

Guangya


On Mon, Apr 4, 2016 at 12:56 PM, Jie Yu <yujie@gmail.com> wrote:

> Thanks for the doc! I made some comments in the doc. Could you please make
> sure the doc is up-to-date according to our discussion last time? Thanks!
>
> - Jie
>
> On Sun, Apr 3, 2016 at 9:45 PM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> > Hi,
> >
> > A design document for MESOS-4355
> > <https://issues.apache.org/jira/browse/MESOS-4355>
> >
> >
> https://docs.google.com/document/d/1uhi1lf1_sEmnl0HaqHUCsqPb9m9jOKbRlXYW1S-tZis/edit?usp=sharing
> > is ready for review, any comments are welcome.
> >
> > This project enables Mesos to integrate with external storage by
> > leveraging the Docker volume drivers.
> >
> > Thanks,
> >
> > Guangya
> >
>


[Mesos Agent][Isolator][Storage] Docker Volume Driver Isolator Design is ready for review

2016-04-03 Thread Guangya Liu
Hi,

A design document for MESOS-4355

https://docs.google.com/document/d/1uhi1lf1_sEmnl0HaqHUCsqPb9m9jOKbRlXYW1S-tZis/edit?usp=sharing
is ready for review, any comments are welcome.

This project enables Mesos to integrate with external storage by
leveraging the Docker volume drivers.

Thanks,

Guangya


Re: On launching command tasks

2016-04-02 Thread Guangya Liu
+1 on using the Docker mode; this can help framework developers.

Setting the command twice can sometimes confuse people. When I was
working on the patch https://reviews.apache.org/r/1/ , I was also a
bit confused before going through the code in the agent.



On Sat, Apr 2, 2016 at 1:17 AM, haosdent  wrote:

> +1 for following the Docker behaviour; it is inconvenient to write the
> command twice.
>
> On Fri, Apr 1, 2016 at 10:12 PM, Alex Rukletsov 
> wrote:
>
> > When launching a command task without wrapping it in `/bin/sh -c` (i.e.
> > CommandInfo.shell=false), Mesos expects the first argument to be the same
> > as the command itself [1]. Though this is similar to how UNIX exec* calls
> > operate, it can be unclear to a user. Moreover, we do not validate this
> on
> > the master side, but rather let the command executor crash with a "bad
> > address" error. Docker, for example, requires the command only once in
> > their entrypoint specification [2].
> >
> > My suggestion is to change the command executor so that it ensures that
> the
> > first argument is always the command itself.
> >
> > Alternatively, if we prefer to keep the current behaviour, I would
> propose
> > to adjust the documentation to be more explicit and introduce a
> validation
> > check on the master.
> >
> > [1] Example snippet in C++
> >
> >commandInfo->set_value(command);
> >
> >commandInfo->add_arguments()->assign(command);
> >
> >
> > [2] https://docs.docker.com/engine/reference/builder/#entrypoint
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: [Proposal] Use dev mailing list for working groups

2016-03-24 Thread Guangya Liu
+1.

This is actually what OpenStack does today. OpenStack has many projects,
and every time we send an email to the dev list, we first specify whether
it is for dev or user, and then specify the project, as follows:

[openstack-dev][Magnum] Integrate Mesos Unified Container to Magnum.

For Mesos, we can do similarly:
[mesos-dev][allocator] Performance improvement => mesos dev allocator
working group for performance.
[mesos-user][container] How can I use appc => mesos user question about
containers.

Thanks,

Guangya

On Fri, Mar 25, 2016 at 10:40 AM, haosdent  wrote:

> +1. A subject like "[MESOS WG][WG_NAME] xxx" would be better.
>
> On Fri, Mar 25, 2016 at 6:55 AM, Jie Yu  wrote:
>
> > Hi,
> >
> > This came up during today's community sync.
> >
> > Mesos currently has a few working groups for various features:
> >
> >
> https://cwiki.apache.org/confluence/display/MESOS/Apache+Mesos+Working+Groups
> >
> > Some of those working groups are using separate mailing lists. That
> limits
> > the visibility of some discussions. Also, some people in the community
> are
> > not aware of those mailing lists (and the wiki page).
> >
> > Therefore, I am proposing that we consolidate all working groups mailing
> > lists to the dev mailing list. To distinguish discussions from different
> > working groups, please use a special subject format. For instance, if you
> > want to send an email to "Mesos GPU" working group, please use the
> subject:
> >
> > "[Mesos GPU WG] YOUR SUBJECT HERE"
> >
> > Let me know if you have any comments/thoughts on this!
> >
> > - Jie
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Compile with CFLAGS=-DWITH_NETWORK_ISOLATOR

2016-03-22 Thread Guangya Liu
I tried this feature before; you may want to follow
https://github.com/apache/mesos/blob/master/docs/network-monitoring.md#prerequisites
to install the right versions of the prerequisites first.
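One note that may help here: defining WITH_NETWORK_ISOLATOR by hand through
CXXFLAGS bypasses the configure checks that detect libnl and wire up the
required headers, which is a plausible cause of the TC_H_ROOT error. The
supported route (an assumption worth verifying against the docs above) is
the dedicated configure switch:

```
# After installing the prerequisites (libnl, libpcap; see the
# network-monitoring doc for the required versions):
../configure --with-network-isolator --disable-java --disable-python
make check
```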

On Tue, Mar 22, 2016 at 9:21 PM, Jay Guo  wrote:

> Hi,
>
> I got error trying to compile Mesos
> on Ubuntu
> with CFLAG WITH_NETWORK_ISOLATOR
>
> Here's what I did:
> 1. apt-get install libnl-dev
> 2. ./bootstrap
> 3. mkdir build && cd build
> 4. CXXFLAGS=-DWITH_NETWORK_ISOLATOR ../configure --disable-java
> --disable-python
> 5. make check
>
> Although I got following error:
>
> In file included from ../../src/linux/routing/filter/ip.hpp:35:0,
>  from
> ../../src/slave/containerizer/mesos/isolators/network/port_mapping.hpp:44,
>  from
> ../../src/slave/containerizer/mesos/containerizer.cpp:82:
> ../../src/linux/routing/handle.hpp:92:39: error: ‘TC_H_ROOT’ was not
> declared in this scope
>  constexpr Handle EGRESS_ROOT = Handle(TC_H_ROOT);
>^
> ../../src/linux/routing/handle.hpp:93:40: error: ‘TC_H_INGRESS’ was not
> declared in this scope
>  constexpr Handle INGRESS_ROOT = Handle(TC_H_INGRESS);
>
> Any ideas?
>
> Also, does this work with OSX? Is there any equivalent library as libnl?
>
> Cheers,
> /J
>


Re: RFC: RevocableInfo Changes

2016-03-21 Thread Guangya Liu
Some of my thinking here:

1) ThrottleInfo may need to belong to "Resources" rather than being limited
to "RevocableInfo": cpus resources can be throttled even if they are not
revocable resources.
2) There needs to be a flag to indicate whether the resources are
scavenge-able or best-effort. I have no strong inclination as to which is
better, as both seem clear enough to describe the resource type. The
Kubernetes document here
<https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/resource-qos.md#motivation>
also uses the scavenge and best-effort concepts.

message ThrottleInfo {}

message RevocableInfo {
  message ScavengeInfo {}

  // If set, indicates that the resources may be revoked at
  // any time. Scavenge-able resources can be used for tasks
  // that do not have strict performance requirements and are
  // capable of handling being revoked.
  optional ScavengeInfo scavenge_info = 1;
}

Thanks,

Guangya

On Tuesday, March 22, 2016 at 4:13:10 AM UTC+8, Benjamin Mahler wrote:
>
> Yeah that's definitely a question I've been asking myself, and we synced 
> on that with Niklas during the last meeting. The thought currently is that 
> we should choose a better name than ThrottleInfo. ThrottleInfo seems to 
> carry too strong of an implication about what the resources will 
> experience. Rather, we could pick a name like "ScavengeInfo" / 
> "BestEffortInfo" / etc that indicates that these resources are running 
> within the un-utilized portion of the machine and _may_ experience 
> degradation.
>
> On Mon, Mar 21, 2016 at 1:26 AM, Joris Van Remoortere <jo...@mesosphere.io
> > wrote:
>
>> @klaus:
>> I think @connor's question is whether we are absolutely sure we never 
>> want to support throttle-able but non-revocable resources.
>> It's clear from the protos that this is not supported, the question is 
>> whether we are sure that is what we want. If so, can you elaborate as to 
>> *why* we would never want that concept in Mesos.
>>
>> — 
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Sun, Mar 20, 2016 at 8:33 PM, Klaus Ma <klaus1982...@gmail.com> wrote:
>>
>>> Here's some input :).
>>>
>>> If throttling is tolerable but preemption is not, how would that be 
>>> expressed? (Is that supported?)
>>> [Klaus]: It's not supported; only revocable resources have this
>>> attribute: non-throttleable or throttleable. Throttleable revocable
>>> resources are reported by the ResourceEstimator, which means the
>>> resources may be throttled by their original owner.
>>>
>>> How does this work with the QoS controller? Will there be a new 
>>> correction type to indicate throttling, or does throttling happen "behind 
>>> the agent's back"?
>>> [Klaus]: The QoSController/ResourceEstimator only manages throttleable
>>> revocable resources; the other resources (regular resources and
>>> non-throttleable revocable resources) are managed by the allocator. Here
>>> "manage" means generation and destruction/eviction. Regarding when
>>> "throttling happens", good question. I think the throttling will depend
>>> on containers, let me double check it :).
>>>
>>> If any comments, please let me know.
>>>
>>> 
>>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer 
>>> Platform OpenSource Technology, STG, IBM GCG 
>>> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>>>
>>> On Sat, Mar 19, 2016 at 11:15 PM, <connor@gmail.com> wrote:
>>>
>>>> Thanks for the good explanations so far Ben and Klaus.  Apologies if 
>>>> you guys already covered these questions in the meeting:
>>>>
>>>> If throttling is tolerable but preemption is not, how would that be 
>>>> expressed? (Is that supported?)
>>>>
>>>> How does this work with the QoS controller? Will there be a new 
>>>> correction type to indicate throttling, or does throttling happen "behind 
>>>> the agent's back"?
>>>>
>>>> Thanks,
>>>> --
>>>> Connor
>>>>
>>>> > On Mar 19, 2016, at 04:01, Klaus Ma <klaus1982...@gmail.com> wrote:
>>>> >
>>>> > @team, in the latest meeting, we agree to keep current name 
>>>> ThrottleInfo.
>>>> >
>>>> > If any more comments, please let me know.
>>>> >
>>>> >> On Wednesday, March 16, 2016 at 9:32:37 PM UTC+8, Guangya Liu wrote:
>>>> >> 

Re: RFC: RevocableInfo Changes

2016-03-19 Thread Guangya Liu
Also, please share your comments, if any, on the name. The current name is
*ThrottleInfo*; the Kubernetes resource QoS design document uses scavenging
as the key word for this behaviour, so a possible name here could be
*ScavengeInfo*. Please comment on these two names, or propose a new name if
you prefer.

message RevocableInfo {
  message ThrottleInfo {}

  // If set, indicates that the resources may be throttled at
  // any time. Throttle-able resources can be used for tasks
  // that do not have strict performance requirements and are
  // capable of handling being throttled.
  optional ThrottleInfo throttle_info = 1;
}

On Wednesday, March 16, 2016 at 10:24:14 AM UTC+8, Klaus Ma wrote:
>
> The patches are updated accordingly; JIRA: MESOS-3888, RR:
> https://reviews.apache.org/r/40375/.
>
> Thanks
> klaus
>
> On Saturday, March 12, 2016 at 11:09:46 AM UTC+8, Benjamin Mahler wrote:
>>
>> Hey folks,
>>
>> In the resource allocation working group we've been looking into a few 
>> projects that will make the allocator able to offer out resources as 
>> revocable. For example:
>>
>> -We'll want to eventually allocate resources as revocable _by default_, 
>> only allowing non-revocable when there are guarantees put in place (static 
>> reservations or quota).
>>
>> -On the path to revocable by default, we can incrementally start to offer 
>> certain resources as revocable. Consider when quota is set but the role 
>> isn't using all of the quota. The unallocated quota can be offered to other 
>> roles, but it should be revocable because we may revoke them should the 
>> quota'ed role want to use the resources. Unused reservations fall into a 
>> similar category.
>>
>> -Going revocable by default also allows us to enforce fairness in a 
>> dynamically changing cluster by revoking resources as weights are changed, 
>> frameworks are added or removed, etc.
>>
>> In this context, "revocable" means that the resources may be taken away 
>> and the container will be destroyed. The meaning of "revocable" in the 
>> context of usage oversubscription includes this, but also the container may 
>> experience a throttling (e.g. lower cpu shares, less network priority, etc).
>>
>> For this reason, and because we internally need to distinguish revocable 
>> resources between the those that are generated by usage oversubscription 
>> and those that are generated by the allocator, we're thinking of the 
>> following change to the API:
>>
>>
>>
>> -  message RevocableInfo {}
>> +  message RevocableInfo {
>> +message ThrottleInfo {}
>> +
>> +// If set, indicates that the resources may be throttled at
>> +// any time. Throttle-able resources can be used for tasks
>> +// that do not have strict performance requirements and are
>> +// capable of handling being throttled.
>> +optional ThrottleInfo throttle_info;
>> +  }
>>
>>// If this is set, the resources are revocable, i.e., any tasks or
>> -  // executors launched using these resources could get preempted or
>> -  // throttled at any time. This could be used by frameworks to run
>> -  // best effort tasks that do not need strict uptime or performance
>> +  // executors launched using these resources could be terminated at
>> +  // any time. This could be used by frameworks to run
>> +  // best effort tasks that do not need strict uptime
>>// guarantees. Note that if this is set, 'disk' or 'reservation'
>>// cannot be set.
>>optional RevocableInfo revocable = 9;
>>
>>
>>
>> Essentially we want to distinguish between revocable and revocable + 
>> throttle-able. This is because usage-oversubscription generates 
>> throttle-able revocable resources, whereas the allocator does not. This 
>> also solves our problem of distinguishing between these two kinds of 
>> revocable resources internally.
>>
>> Feedback welcome!
>>
>> Ben
>>
>>

Looking for shepherd (MESOS-4355 - Docker Volume Isolator)

2016-03-19 Thread Guangya Liu
Hi,

I was now working on the FS for MESOS-4355 with some EMC guys, can anyone
help shepherd for this? There are some issues need to discuss with the
shepherd.

Thanks,

Guangya


Re: RFC: RevocableInfo Changes

2016-03-14 Thread Guangya Liu
This is the Google Doc that we use to track all discussion topics:
https://docs.google.com/document/d/1B_v52zCOFcwCpqCPhgYi9h630a0NE-QM9Br0nCOZUR4/edit?usp=sharing

This is the link to join the Google Hangouts video call:
https://plus.google.com/hangouts/_/calendar/a2xhdXMxOTgyLmNuQGdtYWlsLmNvbQ.rgasirkilgmn1bm69kdcqjsb2s
The time is March 15th at 5pm PST.

Thanks,

Guangya

On Tue, Mar 15, 2016 at 9:47 AM, Benjamin Mahler <bmah...@apache.org> wrote:

> Sounds good, the next one is tomorrow March 15th at 5pm PST (they are at
> 5pm PST to accommodate China time zone).
>
> Will that work?
>
> On Mon, Mar 14, 2016 at 10:53 AM, Niklas Nielsen <n...@qni.dk> wrote:
>
>> Ben, when do you have your next mesos allocator sync? We don't have our
>> next performance isolation sync lined up yet, so we could piggy back on
>> yours if you have it scheduled already.
>>
>> Niklas
>>
>> On Mon, Mar 14, 2016 at 9:32 AM, Jie Yu <yujie@gmail.com> wrote:
>>
>> > >
>> > > Just a quick note: Ian D. and the performance isolation working group
>> are
>> > > discussing similar annotations and we should meet and talk about the
>> > > options.
>> >
>> >
>> > +1
>> >
>> > Would love to understand the relationship between this and the
>> > task/executor level annotations.
>> >
>> > - Jie
>> >
>> > On Mon, Mar 14, 2016 at 9:29 AM, Niklas Nielsen <n...@qni.dk> wrote:
>> >
>> > > Hi Ben,
>> > >
>> > > Just a quick note: Ian D. and the performance isolation working group
>> are
>> > > discussing similar annotations and we should meet and talk about the
>> > > options.
>> > >
>> > > Niklas
>> > >
>> > > On Sat, Mar 12, 2016 at 12:05 AM, Klaus Ma <klaus1982...@gmail.com>
>> > wrote:
>> > >
>> > > > Yes, I think that's true for now; so we define `ThrottleInfo` as
>> > message
>> > > to
>> > > > be more flexible. In Optimistic Offer Phase 1, we only use it to
>> > > > distinguish usage oversubscriptions and allocation oversubscription,
>> > > > similar to bool :).
>> > > >
>> > > > Regarding the resources type, two questions after the discussion:
>> > > >
>> > > > 1. should we send different offer to the framework, so when
>> > > > usage/allocation oversubscription updated, only one type of offer
>> will
>> > be
>> > > > rescinded?
>> > > > 2. should we define framework's capability against `ThrottleInfo`?
>> > > >
>> > > > 
>> > > > Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> > > > Platform OpenSource Technology, STG, IBM GCG
>> > > > +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>> > > >
>> > > > On Sat, Mar 12, 2016 at 12:03 PM, Guangya Liu <gyliu...@gmail.com>
>> > > wrote:
>> > > >
>> > > > >
>> > > > > Hi Ben,
>> > > > >
>> > > > > I think that currently and even in the near future, the
>> > > __ThrottleInfo__
>> > > > > will only be used by the usage oversubscriptions and the
>> > > oversubscription
>> > > > > for allocator (Both quota and reservations) will not use this
>> value
>> > but
>> > > > > only using __RevocableInfo__ is enough.
>> > > > >
>> > > > > I can even think that the __ThrottleInfo__ as a boolean value in
>> > > > > optimistic offer phase 1 as it is mainly used to distinguish
>> > resources
>> > > > > between usage oversubscriptions and allocation oversubscription
>> > (Quota
>> > > > and
>> > > > > Reservations), comments?
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Guangya
>> > > > >
>> > > > On Saturday, March 12, 2016 at 11:09:46 AM UTC+8, Benjamin Mahler wrote:
>> > > > >
>> > > > >> Hey folks,
>> > > > >>
>> > > > >> In the resource allocation working group we've been looking into
>> a
>> > few
>> > > > >> projects that will make the allocator able to offer out
>> resources as
>> > > > >> revocable. For example:
>> > > 

Re: RFC: RevocableInfo Changes

2016-03-11 Thread Guangya Liu

Hi Ben,

I think that currently, and even in the near future, __ThrottleInfo__ will 
only be used for usage oversubscription; oversubscription in the allocator 
(both quota and reservations) will not use this value, as __RevocableInfo__ 
alone is enough there.

We could even treat __ThrottleInfo__ as a boolean value in optimistic offer 
phase 1, since it is mainly used to distinguish resources generated by usage 
oversubscription from those generated by allocation oversubscription (quota 
and reservations). Comments?

Thanks,

Guangya

On Saturday, March 12, 2016 at 11:09:46 AM UTC+8, Benjamin Mahler wrote:
>
> Hey folks,
>
> In the resource allocation working group we've been looking into a few 
> projects that will make the allocator able to offer out resources as 
> revocable. For example:
>
> -We'll want to eventually allocate resources as revocable _by default_, 
> only allowing non-revocable when there are guarantees put in place (static 
> reservations or quota).
>
> -On the path to revocable by default, we can incrementally start to offer 
> certain resources as revocable. Consider when quota is set but the role 
> isn't using all of the quota. The unallocated quota can be offered to other 
> roles, but it should be revocable because we may revoke them should the 
> quota'ed role want to use the resources. Unused reservations fall into a 
> similar category.
>
> -Going revocable by default also allows us to enforce fairness in a 
> dynamically changing cluster by revoking resources as weights are changed, 
> frameworks are added or removed, etc.
>
> In this context, "revocable" means that the resources may be taken away 
> and the container will be destroyed. The meaning of "revocable" in the 
> context of usage oversubscription includes this, but also the container may 
> experience a throttling (e.g. lower cpu shares, less network priority, etc).
>
> For this reason, and because we internally need to distinguish revocable 
> resources between the those that are generated by usage oversubscription 
> and those that are generated by the allocator, we're thinking of the 
> following change to the API:
>
>
>
> -  message RevocableInfo {}
> +  message RevocableInfo {
> +message ThrottleInfo {}
> +
> +// If set, indicates that the resources may be throttled at
> +// any time. Throttle-able resources can be used for tasks
> +// that do not have strict performance requirements and are
> +// capable of handling being throttled.
> +optional ThrottleInfo throttle_info;
> +  }
>
>// If this is set, the resources are revocable, i.e., any tasks or
> -  // executors launched using these resources could get preempted or
> -  // throttled at any time. This could be used by frameworks to run
> -  // best effort tasks that do not need strict uptime or performance
> +  // executors launched using these resources could be terminated at
> +  // any time. This could be used by frameworks to run
> +  // best effort tasks that do not need strict uptime
>// guarantees. Note that if this is set, 'disk' or 'reservation'
>// cannot be set.
>optional RevocableInfo revocable = 9;
>
>
>
> Essentially we want to distinguish between revocable and revocable + 
> throttle-able. This is because usage-oversubscription generates 
> throttle-able revocable resources, whereas the allocator does not. This 
> also solves our problem of distinguishing between these two kinds of 
> revocable resources internally.
>
> Feedback welcome!
>
> Ben
>
>
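A minimal sketch of the proposed distinction, using plain Python classes as stand-ins for the protobuf messages (the class and field names mirror the proposal above; none of this is the real Mesos API):

```python
# Illustrative stand-ins for the proposed message shapes, not the
# real Mesos protobufs.

class ThrottleInfo:
    pass

class RevocableInfo:
    def __init__(self, throttle_info=None):
        self.throttle_info = throttle_info  # optional ThrottleInfo

class Resource:
    def __init__(self, name, value, revocable=None):
        self.name = name
        self.value = value
        self.revocable = revocable  # optional RevocableInfo

def classify(resource):
    """Return how a framework should treat this resource."""
    if resource.revocable is None:
        return "non-revocable"
    if resource.revocable.throttle_info is not None:
        # Usage oversubscription: may be throttled at any time.
        return "revocable+throttleable"
    # Allocator-generated: may be revoked (container destroyed),
    # but not throttled.
    return "revocable"

cpus = Resource("cpus", 2.5)
slack = Resource("cpus", 1.0, RevocableInfo(ThrottleInfo()))
quota_slack = Resource("cpus", 1.0, RevocableInfo())

print(classify(cpus))         # non-revocable
print(classify(slack))        # revocable+throttleable
print(classify(quota_slack))  # revocable
```

A framework would treat "revocable+throttleable" resources as safe only for best-effort, throttle-tolerant work, while plain "revocable" resources only need to tolerate termination.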

Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-08 Thread Guangya Liu
There are also issues related to overlayfs: it will not work if the Linux
kernel version is greater than 4.2, and there is also no documentation for
overlayfs yet.

There are already patches and JIRA tickets for both issues.

https://reviews.apache.org/r/44421/
https://reviews.apache.org/r/44391/

Thanks,

Guangya

On Wed, Mar 9, 2016 at 7:30 AM, Joseph Wu  wrote:

> If we're re-cutting the release, can we also add this fix for maintenance?
> (still under review)
> https://reviews.apache.org/r/44258/
>
> On Tue, Mar 8, 2016 at 2:43 PM, Kevin Klues  wrote:
>
> > Here are the list of reviews/patches that have been called out in this
> > thread for inclusion in 0.28.0-rc2.  Some of them are still under
> > review and will need to land by Thursday to be included.
> >
> > Are there others?
> >
> > Jie's container image documentation (submitted):
> > commit 7de8cdd4d8ed1d222fa03ea0d8fa6740c4a9f84b
> > https://reviews.apache.org/r/44414
> >
> > Restore Mesos' ability to extract Docker assigned IPs (still under
> review):
> > https://reviews.apache.org/r/43093/
> >
> > Fixed the logic for default docker cmd case (submitted).
> > commit e42f740ccb655c0478a3002c0b6fa90c1144f41c
> > https://reviews.apache.org/r/44468/
> >
> > Implemented runtime isolator default cmd test (still under review).
> > https://reviews.apache.org/r/44469/
> >
> > Fixed a bug that causes the task stuck in staging state (still under
> > review).
> > https://reviews.apache.org/r/44435/
> >
> > On Tue, Mar 8, 2016 at 10:30 AM, Kevin Klues  wrote:
> > > Yes, will do.
> > >
> > > On Tue, Mar 8, 2016 at 10:26 AM, Vinod Kone 
> > wrote:
> > >> +kevin klues
> > >>
> > >> OK. I'm cancelling this vote since there are some show stopper issues
> > that
> > >> we need to cherry-pick. I'll cut another RC on Thursday.
> > >>
> > >> @shepherds: can you please make sure the blocker tickets are marked
> with
> > >> fix version and that they land today or tomorrow?
> > >>
> > >> @kevin: since you have volunteered to help with the release, can you
> > make
> > >> sure we have a list of commits to cherry pick for rc2?
> > >>
> > >> Thanks,
> > >>
> > >>
> > >> On Tue, Mar 8, 2016 at 12:05 AM, Shuai Lin 
> > wrote:
> > >>
> > >>> Maybe also https://issues.apache.org/jira/browse/MESOS-4877 and
> > >>> https://issues.apache.org/jira/browse/MESOS-4878 ?
> > >>>
> > >>>
> > >>> On Tue, Mar 8, 2016 at 9:13 AM, Jie Yu  wrote:
> > >>>
> >  I'd like to fix https://issues.apache.org/jira/browse/MESOS-4888 as
> > well
> >  if you guys plan to cut another RC
> > 
> >  On Mon, Mar 7, 2016 at 10:16 AM, Daniel Osborne <
> >  daniel.osbo...@metaswitch.com> wrote:
> > 
> > > -1
> > >
> > > If it doesn’t cause too much pain, I'm hoping we can squeeze a
> > > relatively small patch which restores Mesos' ability to extract
> > Docker
> > > assigned IPs. This has been broken with Docker 1.10's release over
> > a month
> > > ago, and prevents service discovery and DNS from working.
> > >
> > > Mesos-4370: https://issues.apache.org/jira/browse/MESOS-4370
> > > RB# 43093: https://reviews.apache.org/r/43093/
> > >
> > > I've built 0.28.0-rc1 with this patch and can confirm that it fixes
> > it
> > > as expected.
> > >
> > > Apologies for not bringing this to attention earlier.
> > >
> > > Thanks all,
> > > Dan
> > >
> > > -Original Message-
> > > From: Vinod Kone [mailto:vinodk...@apache.org]
> > > Sent: Thursday, March 3, 2016 5:44 PM
> > > To: dev ; user 
> > > Subject: [VOTE] Release Apache Mesos 0.28.0 (rc1)
> > >
> > > Hi all,
> > >
> > >
> > > Please vote on releasing the following candidate as Apache Mesos
> > 0.28.0.
> > >
> > >
> > > 0.28.0 includes the following:
> > >
> > >
> > >
> >
> 
> > >
> > >   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
> > > subsystem in
> > >
> > > Linux. The cgroups/net_cls isolator allows operators to provide
> > > network
> > >
> > >
> > > performance isolation and network segmentation for containers
> > within
> > > a Mesos
> > >
> > > cluster. To enable the cgroups/net_cls isolator, append
> > > `cgroups/net_cls` to
> > >
> > > the `--isolation` flag when starting the slave. Please refer to
> > >
> > >
> > > docs/mesos-containerizer.md for more details.
> > >
> > >
> > >
> > >
> > >
> > >   * [MESOS-4687] - The implementation of scalar resource values
> > (e.g.,
> > > "2.5
> > >
> > >
> > > CPUs") has changed. Mesos now reliably supports resources with
> > up to
> > > three
> 

Re: 0.28.0 release

2016-03-03 Thread Guangya Liu
I think that we also need to include https://reviews.apache.org/r/44258/ (
https://issues.apache.org/jira/browse/MESOS-4831) in 0.28, because this
issue leads to two inverse offers for one maintenance call.

Thanks,

Guangya

On Fri, Mar 4, 2016 at 9:44 AM, Vinod Kone  wrote:

> Release vote sent. The soft lock is released as well. Commit away!
>
> On Thu, Mar 3, 2016 at 4:58 PM, Timothy Chen  wrote:
>
>> Sorry I pushed a quick typo fix before seeing this email.
>>
>> Tim
>>
>> On Thu, Mar 3, 2016 at 4:15 PM, Vinod Kone  wrote:
>> > Alright, all the blockers are resolved. I'll be cutting the RC shortly.
>> >
>> > I'm also taking a soft lock on the 'master' branch. *Committers:*
>> *Please
>> > do not push any commits upstream until I release the lock.*
>> >
>> > Thanks,
>> >
>> > On Mon, Feb 29, 2016 at 1:36 PM, Vinod Kone 
>> wrote:
>> >
>> >> Hi folks,
>> >>
>> >> I'm volunteering to be the Release Manager for 0.28.0. Joris and Kevin
>> >> Klues have kindly agreed to help me out. The plan is cut an RC tomorrow
>> >> 03/01.
>> >>
>> >> The dashboard for the release is here:
>> >>
>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327751
>> >>
>> >> *If you have a ticket marked with "Fix Version 028.0" and is not in
>> >> "Resolved" state, verify if it's a blocker for 0.28.0. If not, please
>> unset
>> >> the Fix Version.*
>> >>
>> >>
>> >> Thanks,
>> >> Vinod
>> >>
>> >>
>>
>
>


Re: Making 'curl' a prerequisite for installing Mesos

2016-03-03 Thread Guangya Liu
libcurl automatically picks up certain environment variables and adjusts
its settings accordingly, so it honors http_proxy and https_proxy by
default. This is an important feature for anyone who needs a proxy to reach
the internet; for example, in China I cannot pull Google's docker images
without setting a proxy.

If we depend on "curl" when using the fetcher (I saw that we already
finished this in MESOS-2840), I think we also need to let the slave pass a
proxy through to curl, so that someone behind a firewall can still pull
Google's docker images. Does it make sense to file a JIRA to support an
HTTP proxy?

Thanks,

Guangya
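A sketch of what such proxy plumbing could look like. The `fetcher_command` helper and its flags are assumptions for illustration; only the `http_proxy`/`https_proxy` environment variables themselves are the ones curl and libcurl actually honor.

```python
import os

def fetcher_command(uri, output, proxy=None):
    """Build the argv and env for a hypothetical curl-based fetch,
    optionally routed through an operator-supplied proxy."""
    env = dict(os.environ)
    if proxy:
        # curl reads these environment variables by default.
        env["http_proxy"] = proxy
        env["https_proxy"] = proxy
    argv = ["curl", "-sSfL", "-o", output, uri]
    return argv, env

argv, env = fetcher_command(
    "https://example.com/image.tar", "/tmp/image.tar",
    proxy="http://proxy.corp.example:3128")
print(argv)  # ['curl', '-sSfL', '-o', '/tmp/image.tar', 'https://example.com/image.tar']
```

The agent would then run `argv` with `env` via its subprocess machinery, so operators behind a firewall get proxied fetches without any change to task definitions.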

On Fri, Mar 4, 2016 at 9:39 AM, Klaus Ma  wrote:

> +1 to add 'curl' dependency firstly.
>
> 
> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> Platform OpenSource Technology, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Fri, Mar 4, 2016 at 5:04 AM, Jojy Varghese  wrote:
>
> > +1
> >
> > On Thu, Mar 3, 2016 at 12:52 PM Jake Farrell 
> wrote:
> >
> > > +1
> > >
> > > -Jake
> > >
> > > On Thu, Mar 3, 2016 at 12:10 PM, Jie Yu  wrote:
> > >
> > > > Hi,
> > > >
> > > > I am proposing making 'curl' a prerequisite when installing Mesos.
> > > > Currently, we require 'libcurl' being present when installing Mesos (
> > > > http://mesos.apache.org/gettingstarted/). However, we found that it
> > does
> > > > not compose well with our asynchronous runtime environment (i.e.,
> it'll
> > > > block the current worker thread).
> > > >
> > > > Recent work on URI fetcher
> > > >  uses 'curl'
> > directly,
> > > > instead of using 'libcurl' to fetch artifacts, because it composes
> well
> > > > with our async runtime env. 'curl' is installed by default in most
> > > systems
> > > > (e.g., OSX, centos, RHEL).
> > > >
> > > > So I am proposing adding 'curl' to our prerequisite list. Let me know
> > if
> > > > you have any concern on this. I'll update the Getting Started doc if
> > you
> > > > are OK with this change.
> > > >
> > > > Thanks,
> > > > - Jie
> > > >
> > >
> >
>


Re: Mesos supports to get the available total resource per-roles ?

2016-02-22 Thread Guangya Liu
Does the /master/state endpoint help? It reports all per-role resources in
the cluster, including total, used, and reserved, for each role.

*"reserved_resources"*: {

*"r2"*: {

  *"mem"*: 0,

  *"disk"*: 0,

  *"cpus"*: 8

},

*"r1"*: {

  *"mem"*: 8000,

  *"disk"*: 0,

  *"cpus"*: 0

}

  }

Thanks,

Guangya
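A sketch of consuming this endpoint; the `state` dict below is a trimmed, hypothetical example matching the JSON shape shown above, not a real response.

```python
# Hypothetical, trimmed /master/state response; field names follow
# the snippet in this thread.
state = {
    "reserved_resources": {
        "r2": {"mem": 0, "disk": 0, "cpus": 8},
        "r1": {"mem": 8000, "disk": 0, "cpus": 0},
    }
}

def reserved_cpus_by_role(state):
    """Map each role to its reserved cpus."""
    return {
        role: resources.get("cpus", 0)
        for role, resources in state["reserved_resources"].items()
    }

print(reserved_cpus_by_role(state))  # {'r2': 8, 'r1': 0}
```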

On Mon, Feb 22, 2016 at 5:11 PM, 陈强  wrote:

> Hi all,
>
> Does Mesos support to get the available total resource for every roles
> now? if don't, who are focusing on this? thanks.
>
> --
>
> Best Regards,
> ChenQiang
>
>


Re: Question about "Framework directly access Meso agent"

2016-02-17 Thread Guangya Liu
Regarding your concern about large-scale clusters, there is indeed a JIRA
tracking this: https://issues.apache.org/jira/browse/MESOS-3548

Thanks,

Guangya

On Wed, Feb 17, 2016 at 12:14 PM, Suteng <sut...@huawei.com> wrote:

> Hi,
>
>
>
> Currently, a Mesos framework's task-related operations (launchTask,
> updateStatus, executorSendMessage, etc.) and resource-related operations
> (resourceOffer, etc.) all pass through the Mesos master.
>
> When the cluster and task count become huge, or when multiple frameworks
> launch tasks concurrently with optimistic resource offers, the Mesos master
> may become a bottleneck.
>
> Is it possible for a framework scheduler to access the Mesos agent directly,
> sending launchTask, updateStatus, and executor messages straight to the
> agent, bypassing the master?
>
> Would that conflict heavily with the current mechanism?
>
>
>
> Looking forward to your comments and opinions.
>
>
>
> Best Regards,
>
> Teng
>
>
>
>
>
>
>
> Su Teng  00241668
>
>
>
> Distributed and Parallel Software Lab
>
> Huawei Technologies Co., Ltd.
>
> Email:sut...@huawei.com
>
>
>
>
>



-- 
Guangya Liu (刘光亚)
Senior Software Engineer
DCOS and OpenStack Development
IBM Platform Computing
Systems and Technology Group



Re: [2/2] mesos git commit: Added documentation for labeled reserved resources.

2016-02-12 Thread Guangya Liu
Neil, I think the ticket you want for setting ReservationInfo on static
reservations is https://issues.apache.org/jira/browse/MESOS-4476

On Sat, Feb 13, 2016 at 2:16 AM, Neil Conway  wrote:

> Hi Ben,
>
> On Fri, Feb 12, 2016 at 2:34 AM, Benjamin Mahler 
> wrote:
> > Any plans to support labels for static reservations?
> >
> > Are we intentionally not supporting ReservationInfo for static
> > reservations? Or is this just outside of the initial scope?
>
> Labels for static reservations are not currently supported because
> `labels` is part of `ReservationInfo`, and the latter is not set for
> static reservations.
>
> Setting ReservationInfo for static reservations is
> https://issues.apache.org/jira/browse/MESOS-3486 . I didn't take this
> on right now, because there are some backward compatibility concerns
> with making this change. It is also unclear if we want to continue
> adding features to static reservations vs. continuing to enhance
> dynamic reservations to the point at which they can replace static
> reservations for most use cases.
>
> Neil
>


Re: Specifying a preferred host with a Resource Request

2016-02-05 Thread Guangya Liu
Hi Jagadish,

Even though Mesos has a "requestResources" interface, it is not implemented
in the built-in allocator at the moment, so calling "driver.requestResources
(resources);" will not work.

Could you update your framework logic as follows:
1) the framework gets resource offers from the Mesos master
2) the framework filters the resource offers based on its preferences

The drawback of this approach is that the framework may not always get its
preferred resources, since they may have been offered to other frameworks.

Can you please file a JIRA ticket requesting an implementation of
"requestResources"?
It would be great if you could include some background for your request so
that the community can evaluate how to move this forward.

Thanks,

Guangya
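The filtering approach above can be sketched as follows; the `Offer` class is a hypothetical stand-in for the scheduler API's offer type, and the selection policy is just one reasonable choice.

```python
class Offer:
    """Hypothetical stand-in for a Mesos resource offer."""
    def __init__(self, offer_id, hostname, cpus, mem):
        self.offer_id = offer_id
        self.hostname = hostname
        self.cpus = cpus
        self.mem = mem

def pick_offer(offers, preferred_host, cpus, mem):
    """Prefer an adequate offer on preferred_host; fall back to any
    adequate offer, since the preferred host's resources may have
    been offered to another framework."""
    adequate = [o for o in offers if o.cpus >= cpus and o.mem >= mem]
    for offer in adequate:
        if offer.hostname == preferred_host:
            return offer
    return adequate[0] if adequate else None

offers = [
    Offer("o1", "abc.aws.com", 4, 8192),
    Offer("o2", "xyz.aws.com", 2, 4096),
]
chosen = pick_offer(offers, "xyz.aws.com", cpus=1, mem=1024)
print(chosen.offer_id)  # o2
```

Unmatched offers would then be declined (ideally with a filter timeout) so the allocator can offer them to other frameworks.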


On Sat, Feb 6, 2016 at 6:45 AM, Jagadish Venkatraman  wrote:

> I have fair experience in writing frameworks on Yarn. In the Yarn world,
> the amClient supports a method where I can specify the preferredHost with
> the resource request.
>
> Is there a way to specify a preferred host with the resource request in
> Mesos?
>
> I currently do:
>
> driver.requestResources (resources);
>
> I don't find a way to associate a preferred hostname with a resource
> request. A code sample will be really helpful. (for example, I want 1G mem,
> 1cpu core preferrably on host: xyz.aws.com )
>
> Thanks,
> Jagadish
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: work on Mesos Containerizer to support docker containers

2016-01-19 Thread Guangya Liu
I think this is the entry point if you want to learn more about the latest
containerizer work:
https://github.com/apache/mesos/blob/master/docs/mesos-provisioner.md

On Tue, Jan 19, 2016 at 10:55 PM, Jan Schlicht  wrote:

> Hi Olivier,
>
> status for the "Unified Containerizer" project is tracked under this epic:
> https://issues.apache.org/jira/browse/MESOS-2840
> There's a design document linked in the epic, unfortunately I'm not able to
> access it.
>
> Cheers,
> Jan
>
> On Tue, Jan 19, 2016 at 3:06 PM, Qian Zhang  wrote:
>
> > Hi Olivier,
> >
> > Here is the doc of MesosContainerizer:
> > https://github.com/apache/mesos/blob/master/docs/mesos-containerizer.md
> >
> > And you may also find the following docs helpful:
> > https://github.com/apache/mesos/blob/master/docs/containerizer.md
> >
> https://github.com/apache/mesos/blob/master/docs/containerizer-internals.md
> >
> > And the code of MesosContainerizer is under:
> > src/slave/containerizer/mesos/
> >
> >
> > Regards,
> > Qian
> >
> >
> > On Tue, Jan 19, 2016 at 9:14 PM, Olivier Sallou  >
> > wrote:
> >
> > > Hi,
> > > I have seen there are some work on Mesos Containerizer to support
> docker
> > > containers instead of using Docker Containerizer, which would help
> > > support Docker network etc... with Calico for example.
> > > Is there any doc on this available somewhere ? Where is code of the
> > > Mesos Containerizer? (I found Docker one but can't find default Mesos
> > one).
> > >
> > > Thanks
> > >
> > > Olivier
> > >
> > > --
> > >
> > > gpg key id: 4096R/326D8438  (keyring.debian.org)
> > > Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> > >
> > >
> >
>
>
>
> --
> *Jan Schlicht*
> Distributed Systems Engineer, Mesosphere
>


Re: Reply: Request Mesos contributor role

2016-01-14 Thread Guangya Liu
You can also take a look at the following links for more detail:
https://github.com/apache/mesos/blob/master/docs/submitting-a-patch.md
https://github.com/apache/mesos/blob/master/docs/effective-code-reviewing.md

The last one, from OpenStack's git commit guidelines, is also a useful
reference:
https://wiki.openstack.org/wiki/GitCommitMessages


On Fri, Jan 15, 2016 at 3:16 PM, Adam Bordelon  wrote:

> Just run `git log` or `git log --oneline` for many more examples. :)
>
> On Thu, Jan 14, 2016 at 11:15 PM, Adam Bordelon 
> wrote:
>
> > You'll need to shorten your commit summary, which is derived from the
> > 'Summary' field in ReviewBoard. You can put a longer description in the
> > 'Description' field, and it will also be included in the final commit
> > message (although not in the first line, which is restricted to 72
> chars).
> >
> > Also, since the patch's summary ends up being the commit message, please
> > phrase it in terms of what you did to fix the problem, rather than just
> > restating the problem without a solution. For example:
> > "Added timestamp to DockerContainerizer's ResourceStatistics."
> >
> > On Thu, Jan 14, 2016 at 4:36 AM, pangbingqiang  >
> > wrote:
> >
> >> Thanks! Yeah, the hooks dir has a commit-msg file, so what should I do
> >> to fix this? The file's lines are no more than 72 chars.
> >>
> >> -----Original Message-----
> >> From: Benjamin Bannier [mailto:benjamin.bann...@mesosphere.io]
> >> Sent: January 14, 2016 20:20
> >> To: dev@mesos.apache.org
> >> 主题: Re: Request Mesos contributor role
> >>
> >> Hi,
> >>
> >> >> Error:
> >> >> 2016-01-14 09:19:38 URL:https://reviews.apache.org/r/42288/diff/raw/
> >> >> [612/612] -> "42288.patch" [1] Total errors found: 0 Checking 1 files
> >> >> Error: Commit message summary (the first line) must not exceed 72
> >> characters.
> >> >
> >> >> my patch first line is:
> >> >> diff --git a/src/slave/containerizer/docker.cpp
> >> >> b/src/slave/containerizer/docker.cpp
> >> >
> >> >> how could I to fix this?
> >>
> >> This refers to the commit message,
> >>
> >> Docker container REST API /monitor/statistics.json output have no
> >> timestamp field
> >>
> >> which is too long (I count 81 chars, but a hard max is put at 72 chars);
> >> the same automated check also rejects commit summaries not ending in a
> >> period `.`. Additionally, a human reviewer will likely ask you to use
> past
> >> tense (e.g., “Fixed … for …”).
> >>
> >> If you rerun `bootstrap` from the project root it should install local
> >> git hooks so that the same checks are run locally on your machine while
> you
> >> develop.
> >>
> >>
> >> HTH,
> >>
> >> Benjamin
> >>
> >
> >
>
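The checks described in this thread (72-character summary, trailing period) can be sketched as a standalone function. This mirrors the behavior the automated hook enforces rather than reproducing its actual code.

```python
def check_summary(message):
    """Return a list of problems with the commit message summary
    (the first line), per the checks described above."""
    summary = message.splitlines()[0] if message else ""
    problems = []
    if len(summary) > 72:
        problems.append("summary exceeds 72 characters")
    if not summary.endswith("."):
        problems.append("summary must end with a period")
    return problems

# The rejected summary from this thread (81 characters, no period):
bad = ("Docker container REST API /monitor/statistics.json "
       "output have no timestamp field")
# Adam's suggested rewording:
good = "Added timestamp to DockerContainerizer's ResourceStatistics."

print(check_summary(bad))   # both problems flagged
print(check_summary(good))  # []
```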


Re: Docker Executor in Mesos

2015-12-08 Thread Guangya Liu
Some comments in line.

Thanks,

Guangya

On Tue, Dec 8, 2015 at 2:13 PM, Du, Fan  wrote:

>
> So why not use one executor to launch docker tasks?
>>
>
> Each task resides(or runs to be more precisely) in its own docker
> container,
> Every docker container is an executor by its nature.
> If you launch 6 instances of task, there will be 6 docker
> containers(docker ps).
>
> I'm not sure what's the intention of "use one executor to launch docker
> tasks",
> and how to do this.

The Kubernetes+Mesos integration uses this approach: one executor manages
all of the pods on a slave host, which definitely reduces resource usage
overhead. The current docker executor wastes a lot of resources, as it
requests one extra container.

>
>
> On 2015/12/8 14:05, Klaus Ma wrote:
>
>> Hi team,
>>
>> Currently, if we run Docker in Mesos, we start the docker-executor, a
>> "docker run" process, and the container on the slave host. So why not use
>> one executor to launch all docker tasks? One reason I can imagine is
>> compatibility with the Docker API. If there are thousands of tasks on a
>> powerful host, do you see Docker startup performance issues?
>>
>> 
>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> Platform Symphony/DCOS Development & Support, STG, IBM GCG
>> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>>
>>


Re: Docker network support

2015-12-08 Thread Guangya Liu
There is a JIRA ticket here: https://issues.apache.org/jira/browse/MESOS-3828
You can add your requirements there to move this forward. Thanks!

On Tue, Dec 8, 2015 at 8:41 PM, Olivier Sallou 
wrote:

> Hi,
> what is the current/planned feature support for Docker network ?
>
> Docker network creates an overlay network to link multiple containers on
> multiple hosts. Is it supported/planned in mesos ? I do not find any
> such info for the moment in mesos.proto
>
> Thanks
>
> Olivier
>
> --
>
>
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>
>



Re: Configuration file?

2015-11-23 Thread Guangya Liu
+1000, introducing a configuration file for the Mesos master and slave
would let end users treat it as the single source of truth for all flags.

OpenStack manages its flags the same way: all flags live in a configuration
file, which includes examples for every flag. Most flags are disabled by
default, and the end user enables them as needed.

The flags in the configuration file can also be classified into groups for
easier management; Mesos could follow the same pattern and group its flags,
e.g., ACL, cluster, framework, etc.
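A sketch of what such a grouped configuration file could look like, parsed with Python's configparser; the section and flag names here are illustrative assumptions, not an actual Mesos configuration format.

```python
import configparser

# Hypothetical grouped config: section names echo the groups
# suggested in this thread (ACL, cluster, framework).
example = """
[cluster]
quorum = 1
work_dir = /var/lib/mesos

[acl]
authenticate_frameworks = true

[framework]
offer_timeout = 30secs
"""

parser = configparser.ConfigParser()
parser.read_string(example)

# Flatten into {group: {flag: value}} for inspection.
flags = {
    section: dict(parser[section]) for section in parser.sections()
}
print(flags["cluster"]["quorum"])  # 1
```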


On Mon, Nov 23, 2015 at 9:08 PM, Alexander Rojas 
wrote:

> Hey guys,
>
> Over the time I’ve been involved in Mesos I’ve seen that we went from a
> handful of flags to around 42 supported flags in the master. At this point
> I’m wondering if perhaps we should support a configuration file in
> conjunction (or instead of) with all the command flags.
>
> My intuition is that it will make it easier for operators as well as for
> debuggers to be able to replicate configurations easier.
>
> Any comments on this idea?


Mesos is helping OpenStack Container Deployment

2015-11-06 Thread Guangya Liu
Just FYI, the OpenStack Kolla team [1] is forming a small team to
investigate whether Mesos could be used as an orchestration engine in place
of Ansible [2]; Mesos plus Marathon might be the solution.

[1] https://github.com/openstack/kolla
[2] http://markmail.org/thread/vg6xkavfpp6c4odr

Thanks,

Guangya


Re: [Breaking bug fix] Binary in state endpoints

2015-11-02 Thread Guangya Liu
+1 to removing the field directly; one comment is that the upgrade document
may need to be updated.

From my understanding, the data is binary and I have not seen much demand
for retrieving binary data from the state endpoints.

Thanks!

On Sat, Oct 24, 2015 at 5:33 AM, Joseph Wu  wrote:

> Hello,
>
> The state endpoints, on master and agent, currently serialize two binary
> data fields in the ExecutorInfo and TaskInfo objects.  These fields are set
> by frameworks; and Mesos does not inspect their values.
>
> The data fields can be found in the state JSON blobs:
> /master/state -> frameworks[*].executors[*].data
> /slave/state ->
>
> frameworks[*].(executors|completed_executors)[*].(tasks|queued_tasks|completed_tasks)[*].data
>
> *Problem:*
> The state endpoints are JSON-ified in a non-standard way (i.e. not via our
> normal Protobuf-to-json methods).  When we serialize the binary "data"
> fields, the binary is dumped as a string, as is.  The resulting JSON may
> not be valid if the binary data includes random bytes (i.e. not unicode).
> Most JSON parsers will error on the state endpoints in this case.
>
> *Proposed solution *(and breaking change)*:*
> Simple -- remove the "data" fields from the state endpoints.  (And only
> from the state endpoints.  The ExecutorInfo and TaskInfo objects will not
> change.)
>
> *Question:*
> We believe that frameworks/tools do not rely on retrieving the "data"
> fields from the state endpoints.
>
> Is there any framework/tool that retrieves the "data" field from the state
> endpoints?
> And if so, is it critical to how the framework/tool works?
>
> More details here: https://issues.apache.org/jira/browse/MESOS-3771
>
> Thanks,
> ~Joseph
>
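A small sketch of the underlying problem and one standard workaround (base64); this assumes nothing about Mesos internals, only that the "data" fields hold arbitrary bytes.

```python
import base64
import json

# Framework-set "data" fields are arbitrary bytes; 0xff is not
# valid UTF-8, so dumping such bytes into JSON as-is produces
# output most parsers reject.
raw = bytes([0x00, 0xff, 0x80, 0x41])

try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print(decodable)  # False

# Base64 yields a JSON-safe ASCII string that round-trips cleanly.
blob = json.dumps({"data": base64.b64encode(raw).decode("ascii")})
print(blob)
assert base64.b64decode(json.loads(blob)["data"]) == raw
```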


Re: Community Sync Interval

2015-10-15 Thread Guangya Liu
+1 for bi-weekly

Thanks,

Guangya

On Fri, Oct 16, 2015 at 6:08 AM, Daniel Mercer 
wrote:

> +1 for weekly -- if this results in diminishing returns we can always reset
> to biweekly.
>
> On Thu, Oct 15, 2015 at 2:44 PM, Kapil Arya  wrote:
>
> > +1 for bi-weekly.
> >
> > On Thu, Oct 15, 2015 at 4:40 PM, Jan Schlicht  wrote:
> >
> > > +1 for weekly.
> > >
> > > On Thu, Oct 15, 2015 at 1:36 PM, Artem Harutyunyan <
> ar...@mesosphere.io>
> > > wrote:
> > >
> > > > +1 for weekly.
> > > >
> > > > On Thu, Oct 15, 2015 at 10:41 AM, haosdent 
> wrote:
> > > > > +1 for bi-weekly
> > > > >
> > > > > On Fri, Oct 16, 2015 at 1:19 AM, Michael Park 
> > > wrote:
> > > > >
> > > > >> We discussed whether the community syncs should be weekly or
> > bi-weekly
> > > > >> (once every 2 weeks).
> > > > >>
> > > > >> There were differing opinions on the subject during the community
> > sync
> > > > >> today.
> > > > >>
> > > > >> An argument for weekly: meetings can be shorter and missing a
> > meeting
> > > > won't
> > > > >> be as big a deal as missing a longer meeting.
> > > > >>
> > > > >> An argument for bi-weekly: there are many people involved in these
> > > > >> meetings, we should keep it infrequent so that it reduces people's
> > > time
> > > > >> commitments.
> > > > >>
> > > > >> This email is intended to capture your +1s or other ideas you
> might
> > > > have!
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> MPark.
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Haosdent Huang
> > > >
> > >
> > >
> > >
> > > --
> > > *Jan Schlicht*
> > > Distributed Systems Engineer, Mesosphere
> > >
> >
>


Re: Do we still need to add InverseOffer support to Scheduler API?

2015-09-14 Thread Guangya Liu
Hi Joris,

I think those APIs are still needed, as the HTTP API is mainly initiated by
the operator: the current HTTP API calls include TEARDOWN, ACCEPT, DECLINE,
REVIVE, KILL, SHUTDOWN, etc. But offer-related operations such as offers and
InverseOffers are initiated by the Mesos master, which needs to notify the
framework of those offers via the callbacks. Comments?

Thanks,

Guangya

On Mon, Sep 14, 2015 at 10:42 PM, Joris Van Remoortere 
wrote:

> Hi Qian,
>
> There is no current plan to add this to the old API. Those tickets were
> created pre-V1 API.
> Currently the goal is to encourage developers to use the V1 API to have
> access to new features such as maintenance primitives.
>
> Joris
>
> On Mon, Sep 14, 2015 at 10:22 AM, Qian AZ Zhang 
> wrote:
>
> >
> >
> > Hi,
> >
> > In the maintenance epic (MESOS-1474), I see there are 3 tasks created to
> > add InverseOffer support to Scheduler API:
> > MESOS-2063  Add InverseOffer to C++ Scheduler API
> > MESOS-2064  Add InverseOffer to Java Scheduler API
> > MESOS-2065  Add InverseOffer to Python Scheduler API
> >
> > I think we have already supported the Scheduler HTTP API, so do we still need
> to
> > update the C++ scheduler API (and the Java/Python binding) to support
> > InverseOffer? If so, I think we may need to update all the example
> > frameworks as well. Take C++ scheduler API as an example, we may need to
> > add a new callback inverseResourceOffers() in the Scheduler class, and
> each
> > example framework's scheduler needs to implement it.
> >
> >
> > Regards,
> > Qian Zhang
>


Re: Do we still need to add InverseOffer support to Scheduler API?

2015-09-14 Thread Guangya Liu
Thanks Haosdent and Joris. I see that the host maintenance patch (
https://reviews.apache.org/r/37180/diff/8#0) is also sending
"ResourceOffersMessage" to the framework, so the framework can still use
the "resourceOffers" callback to handle the inverse offer when it
receives one, right?

Thanks,

Guangya

On Mon, Sep 14, 2015 at 11:15 PM, haosdent <haosd...@gmail.com> wrote:

> Hi @Guangya Liu. V1 API support both mesos call frameworks or frameworks
> call mesos.
>
>
> https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit
>
> And I think the Java and Python API libraries will be deprecated and moved
> out to a better place to maintain in the future (and maybe more languages
> will be supported through the V1 API). Continuing to add to the old APIs
> may not be a good choice.
>
> On Mon, Sep 14, 2015 at 11:02 PM, Guangya Liu <gyliu...@gmail.com> wrote:
>
> > Hi Joris,
> >
> > I think that those APIs are still needed as HTTP API is mainly initiated
> by
> > operator, the current call for HTTP API including TEARDOWN, ACCEPT,
> > DECLINE, REVIVE, KILL, SHUTDOWN etc, but the offer related operations
> such
> > as offer and InverserOffers are initiatedby mesos master, the master need
> > notify the framework for those offers via the callbacks. Comments?
> >
> > Thanks,
> >
> > Guangya
> >
> > On Mon, Sep 14, 2015 at 10:42 PM, Joris Van Remoortere <
> > jo...@mesosphere.io>
> > wrote:
> >
> > > Hi Qian,
> > >
> > > There is no current plan to add this to the old API. Those tickets were
> > > created pre-V1 API.
> > > Currently the goal is to encourage developers to use the V1 API to have
> > > access to new features such as maintenance primitives.
> > >
> > > Joris
> > >
> > > On Mon, Sep 14, 2015 at 10:22 AM, Qian AZ Zhang <zhang...@cn.ibm.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > > In the maintenance epic (MESOS-1474), I see there are 3 tasks created
> > to
> > > > add InverseOffer support to Scheduler API:
> > > > MESOS-2063  Add InverseOffer to C++ Scheduler API
> > > > MESOS-2064  Add InverseOffer to Java Scheduler API
> > > > MESOS-2065  Add InverseOffer to Python Scheduler API
> > > >
> > > > I think we have already supported Schedule HTTP API, so do we still
> > need
> > > to
> > > > update the C++ scheduler API (and the Java/Python binding) to support
> > > > InverseOffer? If so, I think we may need to update all the example
> > > > frameworks as well. Take C++ scheduler API as an example, we may need
> > to
> > > > add a new callback inverseResourceOffers() in the Scheduler class,
> and
> > > each
> > > > example framework's scheduler needs to implement it.
> > > >
> > > >
> > > > Regards,
> > > > Qian Zhang
> > >
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Do we still need to add InverseOffer support to Scheduler API?

2015-09-14 Thread Guangya Liu
Just noticed that there is an example framework using the Call/Event lingo
of the V1 API:
https://github.com/apache/mesos/blob/master/src/examples/event_call_framework.cpp
It is a good reference for how to use the V1 API in a framework.
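The core shape of that example is an event dispatch loop. A rough sketch
(modeled loosely on event_call_framework.cpp; the event type names follow
the v1 scheduler API, but the state dict and exact field layout here are
illustrative assumptions, not the actual example code):

```python
# Sketch of the Call/Event dispatch pattern a v1 framework uses.
# Event type names follow the v1 scheduler API; the state dict and
# field layout are illustrative assumptions.

def dispatch(event, state):
    etype = event["type"]
    if etype == "SUBSCRIBED":
        # Master acknowledged our SUBSCRIBE call and assigned an ID.
        state["framework_id"] = event["subscribed"]["framework_id"]
    elif etype == "OFFERS":
        state["offers"].extend(event["offers"]["offers"])
    elif etype == "INVERSE_OFFERS":
        # Maintenance primitives: the master asks for resources back.
        state["inverse_offers"].extend(
            event["inverse_offers"]["inverse_offers"])
    elif etype == "ERROR":
        raise RuntimeError(event["error"]["message"])
    return state

state = {"offers": [], "inverse_offers": []}
dispatch({"type": "SUBSCRIBED",
          "subscribed": {"framework_id": {"value": "fw-1"}}}, state)
dispatch({"type": "OFFERS",
          "offers": {"offers": [{"id": {"value": "o-1"}}]}}, state)
```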

Thanks,

Guangya

On Tue, Sep 15, 2015 at 10:59 AM, Anand Mazumdar <an...@mesosphere.io>
wrote:

> Hi Qian,
>
> We currently don’t intend to move the old C++ Scheduler/Scheduler Driver
> <https://github.com/apache/mesos/blob/master/src/sched/sched.cpp>
> interface to use the Mesos V1 API.
>
> If you want to use the new V1 API’s , you can use the low-level C++
> Scheduler Library <
> https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp>
> that speaks the new Call/Event lingo. Joris already pointed you to a very
> good example of an existing test using the Inverse Offer functionality :
> https://reviews.apache.org/r/37283
>
> Let me know if this resolves your confusion.
>
> -anand
>
>
> > On Sep 14, 2015, at 7:27 PM, Qian AZ Zhang <zhang...@cn.ibm.com> wrote:
> >
> > If we keep the current C++ scheduler API as it is, then I think
> framework can never receive inverse offer in its "resourceOffers()"
> callback, the reason is, In SchedulerProcess::initialize(), we have the
> following code:
> > install<ResourceOffersMessage>(
> > &SchedulerProcess::resourceOffers,
> > &ResourceOffersMessage::offers,
> > &ResourceOffersMessage::pids);
> > In the above code, only "offers" and "pids" fields of
> ResourceOffersMessage are passed into SchedulerProcess::resourceOffers()
> when it is invoked, but the "inverse_offers" field of ResourceOffersMessage
> is NOT passed into it.
> >
> >
> > Regards,
> > Qian Zhang (张乾)
> > Developer, IBM Platform Computing
> >   Phone: 86-29-68797144 | Tie-Line: 87144
> > E-mail: zhang...@cn.ibm.com <mailto:zhang...@cn.ibm.com>
> > Chat: zhq527725
> > “An educated man should know everything about something and something
> about everything"
> >
> >
> >
> > Gaoxin District, Xi'an, Shaanxi Province
> > 3F, Zhongqing Building, No. 42 Gaoxin 6th Road
> > Xian, Shaanxi Province 710075
> > China
> >
> > Guangya Liu ---09/14/2015 23:29:45---Thanks Haosdent and Joris, I see
> that the host maintain patch ( https://reviews.apache.org/r/37180/d
> >
> > From: Guangya Liu <gyliu...@gmail.com>
> > To:   dev@mesos.apache.org
> > Date: 09/14/2015 23:29
> > Subject:  Re: Do we still need to add InverseOffer support to
> Scheduler API?
> >
> >
> >
> > Thanks Haosdent and Joris, I see that the host maintain patch (
> > https://reviews.apache.org/r/37180/diff/8#0) is also sending
> > "ResourceOffersMessage" to framework so the framework can still use
> > "ResourceOffer" to handle the inverseOffer when framework got the
> > inverseOffer, right?
> >
> > Thanks,
> >
> > Guangya
> >
> > On Mon, Sep 14, 2015 at 11:15 PM, haosdent <haosd...@gmail.com> wrote:
> >
> > > Hi @Guangya Liu. V1 API support both mesos call frameworks or
> frameworks
> > > call mesos.
> > >
> > >
> > >
> https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit
> > >
> > > And I think Java or Python API libraries would be deprecated and more
> out
> > > to a better place to maintain in the future(Also maybe support more
> > > languages through V1 API). Continue to add them to old APIs may be not
> a
> > > good choice.
> > >
> > > On Mon, Sep 14, 2015 at 11:02 PM, Guangya Liu <gyliu...@gmail.com>
> wrote:
> > >
> > > > Hi Joris,
> > > >
> > > > I think that those APIs are still needed as HTTP API is mainly
> initiated
> > > by
> > > > operator, the current call for HTTP API including TEARDOWN, ACCEPT,
> > > > DECLINE, REVIVE, KILL, SHUTDOWN etc, but the offer related operations
> > > such
> > > > as offer and InverserOffers are initiatedby mesos master, the master
> need
> > > > notify the framework for those offers via the callbacks. Comments?
> > > >
> > > > Thanks,
> > > >
> > > > Guangya
> > > >
> > > > On Mon, Sep 14, 2015 at 10:42 PM, Joris Van R

[Mesos][Patch] How to make the patches under review board link to a mesos bug

2015-08-05 Thread Guangya Liu
Hi Mesos,

I see that some patches under review can easily be linked to a Mesos bug
via a link on the right side of the review board. How can I get this for
my patch?

Thanks,

Guangya


Re: [Mesos][Patch] How to make the patches under review board link to a mesos bug

2015-08-05 Thread Guangya Liu
Got it. Thanks Kapil!

On Wed, Aug 5, 2015 at 6:07 PM, Kapil Arya ka...@mesosphere.io wrote:

 You need to click on the pencil icon near Bugs: and type in the Jira
 ticket number, e.g. MESOS-1234. You can do this only if you are logged in
 to ReviewBoard and have created the review request yourself.

 Kapil

 PS: The dev list doesn't allow images, so if you had attached any, they
 were discarded :-).

 On Wed, Aug 5, 2015 at 5:52 PM, Guangya Liu gyliu...@gmail.com wrote:

  Hi Mesos,
 
  I see that there are some patches can be easily be directed to a mesos
 bug
  by a link on the right side of the review board, how can I have this for
 my
  patch?
  ​
 
  Thanks,
 
  Guangya
 



[mesos][gdb] How to use gdb-mesos-master.sh to debug mesos master code?

2015-07-15 Thread Guangya Liu
Hi Mesos dev:

I'm now trying to use gdb-mesos-master.sh to debug the Mesos master, but I
cannot hit my breakpoint. Can someone help me see if there is something
wrong with my steps?

The following are my steps:

*1) Start gdb-mesos-master.sh*
root@devstack007:/home/gyliu/src/mesos/m1/mesos/build#
./bin/gdb-mesos-master.sh --ip=9.111.242.143 --work_dir=/var/lib/mesos
--log_dir=/root/mesos-log
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-linux-gnu.
Type show configuration for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type help.
Type apropos word to search for commands related to word...
Reading symbols from
/home/gyliu/src/mesos/m1/mesos/build/src/.libs/lt-mesos-master...done.
*2) Set break point*
(gdb) b master.cpp:2498
No source file named master.cpp.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) dir /home/gyliu/src/mesos/m1/mesos/src/master/
Source directories searched:
/home/gyliu/src/mesos/m1/mesos/src/master:$cdir:$cwd
(gdb) b Master::accept
Function Master::accept not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Master::accept) pending.
(gdb) i b
Num Type   Disp Enb AddressWhat
1   breakpoint keep y   PENDING  Master::accept
(gdb) b master.cpp:2498
No source file named master.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (master.cpp:2498) pending.
*3) Start mesos master*
(gdb) r
Starting program:
/home/gyliu/src/mesos/m1/mesos/build/src/.libs/lt-mesos-master
--ip=9.111.242.143 --work_dir=/var/lib/mesos --log_dir=/root/mesos-log
Traceback (most recent call last):
  File /usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.
so.6.0.19-gdb.py, line 63, in module
from libstdcxx.v6.printers import register_libstdcxx_printers




*4) Start up a Mesos slave in another console*

*5) Start up the Python test framework; the Mesos master code did not stop
at my breakpoint*
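A note on the likely cause: lt-mesos-master loads libmesos as a shared
library at run time, so symbols are not visible until the program starts,
and the Master class lives inside a namespace. A command sequence like the
following usually works (the fully qualified name below is an assumption
about the internal namespace; verify with `info functions accept` once the
program is running):

```
(gdb) set breakpoint pending on
(gdb) break mesos::internal::master::Master::accept
(gdb) run
# Once the shared library loads, gdb resolves the pending breakpoint.
# file:line breakpoints also resolve after startup, e.g.:
(gdb) break master.cpp:2498
```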
Thanks,

Guangya