Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Yan Xu
+1

It’s been quite a journey!

Yan

On Tue, Apr 6, 2021 at 7:49 PM Vinod Kone  wrote:

> Hi Rich,
>
> Thanks for chiming in and providing your perspective.
>
> Charles already did a great job summarizing some of the current sentiments
> in the community above. I wanted to add a couple more points based on my
> discussions with folks in the community and PMC.
>
> Yes, there are some folks who are still interested in making some (minor)
> contributions but for that they just need a single repo to collaborate on.
> ASF has been a great home and steward for the Mesos project, but at this
> stage in its lifecycle, the Mesos project could actually benefit from an
> ultra lightweight process and collaboration model. A public GitHub repo with
> requisite permissions for collaborators would serve these purposes well
> compared to the ASF process (PMC, voting, board reports etc).
>
> As an aside, would the ASF Board have any issue with the community forking
> the project and collaborating at https://github.com/mesos/mesos ?
>
> Thanks,
> Vinod
>
> On Tue, Apr 6, 2021 at 9:01 PM Samuel Marks  wrote:
>
> > Who runs this one? https://github.com/mesos
> >
> > Samuel Marks
> > Charity | consultancy <https://offscale.io> | open-source | LinkedIn
> >
> >
> > On Wed, Apr 7, 2021 at 11:42 AM Charles-François Natali <
> > cf.nat...@gmail.com>
> > wrote:
> >
> > > Hi Rich,
> > >
> > > FWIW, I'm one of those people who said they were interested, and I
> > > still voted to move it to the attic (even though my vote is non
> > > binding as I'm not a committer).
> > >
> > > Initially I also thought that we could try to revive it within the
> > > ASF, but it quickly became clear that *none* of the current committers
> > > is willing to go down that route, i.e. put in the effort needed to
> > > onboard new committers. And without that, there's just no way forward.
> > > Various people voiced other concerns as well, such as viability of the
> > > project when other alternatives like Kubernetes exist, lack of clear
> > > technical direction for the future, etc.
> > > While they're relevant questions, I think currently they don't really
> > > make sense since the current Mesos community is basically dead.
> > > Finally, I think that the project should be moved to the Attic de
> > > facto because AFAICT the Apache rules require at least 3 *active*
> > > committers, and that's definitely not the case.
> > >
> > > However I still do believe in the project for the reasons I outlined
> > > in some of the previous threads, and I'm still interested in
> > > contributing: I just think that the current structure of the project
> > > is not suited for that anymore. And to be honest, I just want to move
> > > on, I'm tired of those endless discussions - it's been almost 2 months
> > > > since the first thread started, and nothing happened.
> > >
> > > It's a shame that we won't be able to continue using
> > > https://github.com/apache/mesos though, as it creates a much higher
> > > barrier to continuing the project.
> > >
> > > However if that's really not possible, then I guess that leaves no
> > > other option: once the vote has passed, I guess I'll start a final
> > > thread to gather people who'd be interested to create a new project
> > > forked off master on github, so we can start from scratch with our own
> > > repository, bug tracker etc. I hope those people who said they're
> > > actually interested will be willing to take an active part.
> > >
> > > Cheers,
> > >
> > > Charles
> > >
> > >
> > >
> > >
> > >
> > > > On Wed, Apr 7, 2021 at 02:50, Rich Bowen wrote:
> > > >
> > > > > I hope y'all can forgive me for sticking my nose in, as a concerned
> > > > > member. Color me confused by this vote.
> > > > >
> > > > > A month ago, on this same list -
> > > > > https://lists.apache.org/thread.html/r307db648e201182fcf39b0de63ba224b94965501e20e6cbcecc085e4%40%3Cdev.mesos.apache.org%3E
> > > > > - Qian asked who was still interested in keeping the project going.
> > > > > SIX people responded that, given the chance, they'd step up and keep
> > > > > it going.
> > > > >
> > > > > Around that same time -
> > > > > https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cdev.mesos.apache.org%3E
> > > > > - Vinod observed that the too-high barrier to granting committer
> > > > > rights has been a major factor in the slowdown of the project.
> > > > >
> > > > > And yet, y'all are voting to attic the project.
> > > > >
> > > > > So, again, it's not my project, and I don't have a vote here, but the
> > > > > reason the Board asks projects to have these attic conversations on
> > > > > the Dev list is *specifically* so that interested people can say, hey,
> > > > > don't attic it, we'll take it from here. Which six people, plus Qian,
> > > > > have done.
> > > > >
> > > > > Maybe it's time to lower the barrier to entry, and let these willing
> > > > > people take

Re: Welcome Meng Zhu as PMC member and committer!

2018-11-01 Thread Yan Xu
Congratulations!


On Wed, Oct 31, 2018 at 4:50 PM Vinod Kone  wrote:

> Congrats Meng!
>
> Thanks,
> Vinod
>
> > On Oct 31, 2018, at 4:26 PM, Gilbert Song  wrote:
> >
> > Well deserved, Meng!
> >
> >> On Wed, Oct 31, 2018 at 2:36 PM Benjamin Mahler 
> wrote:
> >> Please join me in welcoming Meng Zhu as a PMC member and committer!
> >>
> >> Meng has been active in the project for almost a year and has been very
> productive and collaborative. He is now one of the few people who
> understand the allocator code well, as well as the roadmap for this area
> of the project. He has also found and fixed bugs, and helped users in slack.
> >>
> >> Thanks for all your work so far Meng, I'm looking forward to more of
> your contributions in the project.
> >>
> >> Ben
>


Re: Getting write access to our GitHub repo

2018-06-22 Thread Yan Xu
IIUC this wouldn't necessarily rule out RB reviews, just add better support
for GitHub PRs?

On Fri, Jun 22, 2018 at 9:13 PM Andrew Schwartzmeyer <
and...@schwartzmeyer.com> wrote:

> GitHub PR code reviews have gotten _significantly_ better over the last
> two years. You can actually open addressable issues now (like
> ReviewBoard), and assign reviewers, and "officially" mark it as
> signed-off (ship-it) too. They used to suck so bad that I preferred
> inline email comments to PRs, but they've improved.
>
> On 06/22/2018 9:01 pm, James Peach wrote:
> >> On Jun 22, 2018, at 7:34 PM, Jie Yu  wrote:
> >>
> >> +1
> >>
> >> Does this mean we can add CI webhooks to the git repo?
> >
> > FWIW, I'm hugely -1 on doing code reviews on GitHub. I'm cautiously
> > optimistic about other kinds of integration though.
> >
> >> On Thu, Jun 21, 2018 at 3:45 PM, James Peach  wrote:
> >>
> >>>
> >>>
> >>>> On Jun 20, 2018, at 7:58 PM, Vinod Kone wrote:
> >>>>
> >>>> Hi folks,
> >>>>
> >>>> Looks like ASF now supports giving write access to committers for
> >>>> their GitHub mirrors, which means we can merge PRs directly on GitHub!
> >>>
> >>> Are you proposing that we move to Github generally?
> >>>
> >>>> FWICT, this requires us moving our repo to a new gitbox server by
> >>>> filing an INFRA ticket. We probably need to update our CI and other
> >>>> tooling that references our git repo directly, so there will be work
> >>>> involved on our end as well.
> >>>>
> >>>> This has been one of the long requested features from several
> >>>> committers, so I'm gauging interest to see if folks think we should go
> >>>> down this route (several projects seem to be already moving) too.
> >>>>
> >>>> If there is enough interest, we could start a vote.
> >>>>
> >>>> Thanks,
> >>>> Vinod
> >>>
> >>>
>


Re: API working group

2018-01-31 Thread Yan Xu
I’m in. Thanks!

(mobile)

> On Jan 30, 2018, at 10:42 AM, Vinod Kone  wrote:
> 
> Hi folks,
> 
> We've had good success with our containerization, performance and community
> working groups and so we would like to keep the momentum going and spin up
> new WGs as necessary.
> 
> One of the recent proposals is to spin up an API working group to discuss
> API changes and infrastructure.
> 
> I'm sending this email to gauge interest from the broader community
> regarding this working group. Particularly I would like to know
> 1) Would you be interested in participating in the API WG?
> 2) Would you be interested in leading or co-leading the API WG?
> 
> Please reply to this thread if interested.
> 
> Thanks,
> Vinod


Reusing `reserve_resources` ACL for static reservation

2017-12-12 Thread Yan Xu
Hi,

In https://issues.apache.org/jira/browse/MESOS-8306 I am proposing that we
use an ACL to restrict the roles that agents can statically reserve
resources for to address a security concern in which a process on a
compromised host can impersonate an agent and then reserve
resources for arbitrary roles.

Reusing `reserve_resources` ACL for this purpose feels intuitive to me and
I don't think it interferes with its use for authorizing dynamic
reservations by the frameworks and operators.

Are there any concerns about it?

Also as part of this change I am revising the doc to change the wording on
static reservations so its use is not discouraged:
https://reviews.apache.org/r/64516/diff

Thanks,
Yan


Re: [Proposal] Fetcher extract path

2017-10-09 Thread Yan Xu
+1.

Could you file a JIRA laying out the problem and the proposal? Here's a link
about submitting a patch:
http://mesos.apache.org/documentation/latest/submitting-a-patch/

Note that there's already an output_file field that decides where the
artifacts will be dropped before the extraction but it's not useful for the
extract=true case.

---
@xujyan 

On Fri, Oct 6, 2017 at 8:04 AM, sigurd.spieckerm...@gmail.com <
sigurd.spieckerm...@gmail.com> wrote:

> Hi all,
>
> I'm using the Mesos fetcher to download artifacts (e.g. ZIP archives) to
> the sandbox prior to running a task. I noticed that the archive content is
> always extracted to the sandbox root directory (when extract=true) and
> there is currently no way to provide a different path where the content
> shall be extracted. I've written a patch that adds an additional parameter
> called "extract_path" and believe this feature may be valuable to others,
> too. First, is there any interest in adding this feature to Mesos? Second,
> what is the next step to start the review process of my patch?
>
> Thanks,
> Sigurd
>


Re: Updating running tasks in-place

2017-10-09 Thread Yan Xu
---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, Oct 4, 2017 at 11:50 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Thanks for taking the lead, Yan! Replying to your points inline:
>
> On Wed, Oct 4, 2017 at 11:11 AM, Yan Xu <y...@jxu.me> wrote:
>
> > Hi Mesos users/devs,
> >
> > I am curious about what use cases do folks in the community have about
> > updating running tasks? i.e., amending the current task without going
> > through the typical kill -> offer -> relaunch process.
> >
> > Typically you would only want to do that for the "pets
> > <https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/>"
> > in
> > your cluster as it adds complexity in managing the tasks' lifecycle but
> > nevertheless in some cases it is too expensive to relocate the app or
> even
> > relaunching it onto the same host later.
> >
> > https://issues.apache.org/jira/browse/MESOS-1280 has some context about
> > this. In particular, people have mentioned the desire to:
> >
> >- Dynamically reconfiguring the task without restarting it.
> >- Upgrading the task transparently (i.e., restarting without dropping
> >connections)
> >
>
> One possible use case we have on this is to upgrade service mesh components
> (consider something similar to haproxy): because these instances handle
> all connections on the machine, restarting without dropping connection is a
> must for them.
>
>
Yeah this is an interesting one. Something like this
<https://medium.com/@mattklein123/envoy-hot-restart-1d16b14555b5> or
SO_REUSEPORT
like you've mentioned before, right? Seems like this would require a period
of time where both processes are running inside the pod and connections are
gradually drained from the old process and established on the new process?
Have you already made it work outside of Mesos or on Mesos as separate tasks?


> >- Replacing tasks with another without going through offer cycles
> >
>
> We have concrete use case for this one.
>
>
> > >- Task resizing <https://issues.apache.org/jira/browse/MESOS-1279>
> > >(which is captured in another JIRA)
> > >- Certain metadata, e.g., labels (but I imagine not all metadata makes
> > >equal sense to be updatable).
> >
> > What other/specific use cases are folks interested in?
> >
> > Best,
> > Yan
> >
>
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread Yan Xu
Does it make sense to wrap the resources in a `Limitation` message in case
we add new fields for it?

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, Oct 9, 2017 at 10:56 AM, James Peach <jor...@gmail.com> wrote:

> Hi all,
>
> In https://reviews.apache.org/r/62644/, I am proposing to add an optional
> Resources field to the TaskStatus message named `limited_resources`.
>
> In the case that a task is killed because it violated a resource
> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> this field may be populated with the resource that triggered the
> limitation. This is intended to give better information to schedulers about
> task resource failures, in the expectation that it will help them bubble
> useful information up to the user or a monitoring system.
>
> diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
> index d742adbbf..559d09e37 100644
> --- a/include/mesos/v1/mesos.proto
> +++ b/include/mesos/v1/mesos.proto
> @@ -2252,6 +2252,13 @@ message TaskStatus {
>// status updates for tasks running on agents that are unreachable
>// (e.g., partitioned away from the master).
>optional TimeInfo unreachable_time = 14;
> +
> +  // If the reason field indicates a container resource limitation,
> +  // this field contains the resource whose limits were violated.
> +  //
> +  // NOTE: 'Resources' is used here because the resource may span
> +  // multiple roles (e.g. `"mem(*):1;mem(role):2"`).
> +  repeated Resource limited_resources = 16;
>  }
>
>
>
> cheers,
> James
>
>
>
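
To make the `Limitation` suggestion concrete, here is a rough sketch using plain C++ structs as simplified stand-ins for the protobuf messages. The `Limitation` type, the field layout and the `total` and `demo` helpers are assumptions made up for this illustration; they are not part of the proposed patch or of the Mesos API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the protobuf `Resource` message.
struct Resource
{
  std::string name;  // e.g. "mem"
  std::string role;  // e.g. "*" or "role"
  double scalar;     // scalar value of the resource
};

// Hypothetical wrapper: further limitation details could be added here
// later without adding more top-level fields to TaskStatus.
struct Limitation
{
  std::vector<Resource> resources;
};

// Simplified stand-in for TaskStatus, carrying the wrapper instead of a
// bare `limited_resources` field.
struct TaskStatus
{
  std::string reason;
  Limitation limitation;
};

// Sums the named resource across all roles in the limitation.
inline double total(const Limitation& limitation, const std::string& name)
{
  double sum = 0;
  for (const Resource& resource : limitation.resources) {
    if (resource.name == name) {
      sum += resource.scalar;
    }
  }
  return sum;
}

// Mirrors the "mem(*):1;mem(role):2" example from the patch comment.
inline void demo()
{
  Limitation limitation;
  limitation.resources.push_back(Resource{"mem", "*", 1.0});
  limitation.resources.push_back(Resource{"mem", "role", 2.0});

  TaskStatus status{"REASON_CONTAINER_LIMITATION_MEMORY", limitation};

  assert(total(status.limitation, "mem") == 3.0);
  assert(total(status.limitation, "disk") == 0.0);
}
```

A scheduler could call something like `total(status.limitation, "mem")` to report how much memory was involved, regardless of how the resource was split across roles.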


Updating running tasks in-place

2017-10-04 Thread Yan Xu
Hi Mesos users/devs,

I am curious what use cases folks in the community have for updating
running tasks, i.e., amending the current task without going through the
typical kill -> offer -> relaunch process.

Typically you would only want to do that for the "pets
<https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/>" in
your cluster, as it adds complexity in managing the tasks' lifecycle, but
nevertheless in some cases it is too expensive to relocate the app or even
relaunch it onto the same host later.

https://issues.apache.org/jira/browse/MESOS-1280 has some context about
this. In particular, people have mentioned the desire to:

   - Dynamically reconfigure the task without restarting it.
   - Upgrade the task transparently (i.e., restart without dropping
   connections)
   - Replace a task with another without going through offer cycles
   - Resize the task <https://issues.apache.org/jira/browse/MESOS-1279>
   (which is captured in another JIRA)
   - Update certain metadata, e.g., labels (but I imagine not all metadata
   makes equal sense to be updatable).

What other/specific use cases are folks interested in?

Best,
Yan


Welcome James Peach as a new committer and PMC member!

2017-09-06 Thread Yan Xu
Hi Mesos devs and users,

Please welcome James Peach as a new Apache Mesos committer and PMC member.

James has been an active contributor to Mesos for over two years now. He
has made many great contributions to the project which include XFS disk
isolator, improvement to Linux capabilities support and IPC namespace
isolator. He's super active on the mailing lists and slack channels, always
eager to help folks in the community and he has been helping with a lot of
Mesos reviews as well.

Here is his formal committer candidate checklist:

https://docs.google.com/document/d/19G5zSxhrRBdS6GXn9KjCznjX3cp0mUbck6Jy1Hgn3RY/edit?usp=sharing


Congrats James!

Yan


Re: [VOTE] Release Apache Mesos 1.4.0 (rc3)

2017-08-28 Thread Yan Xu
Also the libprocess refactor seems to have stability issues:
https://issues.apache.org/jira/browse/MESOS-7921

CI failures this crash caused:
https://lists.apache.org/list.html?bui...@mesos.apache.org:lte=1M:process%3A%3AEventQueue%3A%3AConsumer%3A%3Aempty%28%29%20

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, Aug 28, 2017 at 12:01 PM, Michael Park <mcyp...@gmail.com> wrote:

> -1
>
> I found MESOS-7922 <https://issues.apache.org/jira/browse/MESOS-7922>,
> which is an issue around the communication between
> old masters and new agents. Currently, the new agent re-registers with
> the tasks and executors in the new format. The new master conditionally
> upgrades the resources depending on whether the agent has the
> RESERVATION_REFINEMENT capability, but the old master does not know
> to do this. The old master would incorrectly interpret any reserved resources
> in the tasks / executors as unreserved.
>
> On Mon, Aug 28, 2017 at 10:32 AM Kapil Arya <ka...@mesosphere.io> wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.4.0.
>>
>> 1.4.0 includes the following:
>> 
>> 
>>   * Ability to recover the agent ID after a host reboot.
>>   * File-based and image-pull secrets.
>>   * Linux ambient and bounding capabilities support.
>>   * Ability to efficiently measure disk usage without enforcing usage
>> constraints.
>>   * Hierarchical resource allocation roles. [EXPERIMENTAL]
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.4.0-rc3
>>
>> The candidate for Mesos 1.4.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc3/mesos-1.4.0.tar.gz
>>
>> The tag to be voted on is 1.4.0-rc3:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.4.0-rc3
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc3/mesos-1.4.0.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc3/mesos-1.4.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1212
>>
>> Please vote on releasing this package as Apache Mesos 1.4.0!
>>
>> The vote is open until Thursday, Aug 31 11:59 PM PDT 2017 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.4.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Anand and Kapil
>>
>


Re: Offer operation reconciliation discussion notes

2017-08-23 Thread Yan Xu
Yeah a reason for failed operations is probably useful for all resource
operations. It looks like the task-style status update is still the best
approach.

---
@xujyan 

On Wed, Aug 23, 2017 at 11:40 AM, Jie Yu  wrote:

> We should continue the discussion here:
>
> I think I forgot to mention one important reason that I went for the
> operation based reconciliation API proposal. For new operations like
> CREATE_VOLUME/CREATE_BLOCK, not only do we need to know the end result (the
> resources) if it's successful, we also need to know the failure reason if
> it fails. For instance, imagine you're creating an EBS volume by talking to
> a CSI EBS plugin. Surfacing the creation error (e.g., retryable or not from
> the CSI plugin) will be useful for the scheduler to determine the next step.
>
> I don't think a resources based reconciliation API can address this. Maybe
> we can add both if we feel both are useful?
>
> Thoughts?
> - Jie
>
> On Wed, Aug 23, 2017 at 11:26 AM, Jie Yu  wrote:
>
>> Hi,
>>
>> We had a discussion on some very early proposal (see the attached slides)
>> on providing feedback for offer operations (e.g., CREATE/DESTROY,
>> RESERVE/UNRESERVE, etc.) with a bunch of folks from the community. Here are
>> the notes I captured in the meeting:
>>
>>
>>- One alternative approach discussed was to have best effort
>>feedback, and a resources based reconciliation API allowing framework to
>>query the resources on a given resource provider or agent. That way, we
>>don't necessarily need the status update mechanism for offer operations,
>>which causes complexity in the frameworks.
>>- In the current proposal, do we need agent_id (or resource provider
>>id) when performing reconciliation for that operation? The reason we
>>require that in the task reconciliation case is because agent might not
>>re-register yet during master failover.
>>- We need to mock up the operator API for this work.
>>- What's the order guarantee for the operations specified in one API
>>call?
>>- Wish list
>>   - Tie reservations to a framework instead of a role.
>>   - When a framework is torn down, automatically release resources
>>   reserved for that framework.
>>
>> If I missed anything, please reply to this thread! Thanks!
>>
>> https://docs.google.com/presentation/d/1Mef8K3aLIuzcFVc3MnAo64TkjpyTWarYVShtvCN4e48/edit?usp=sharing
>>
>> - Jie
>>
>
>


Re: [VOTE] Release Apache Mesos 1.4.0 (rc1)

2017-08-21 Thread Yan Xu
Note that https://issues.apache.org/jira/browse/MESOS-7714 would prevent
1.4.0 agents from being downgraded to 1.3. IMHO this is a pretty big deal
when considering known issues to accept in a release.

From the versioning doc:

Every (minor) release is a stable release and recommended for production
use. This means a release candidate will go through rigorous testing (unit
tests, integration tests, benchmark tests, cluster tests, scalability etc)
before being officially released. In the rare case that a regular release
is not deemed stable, a patch release will be released that will stabilize
it.

If we release 1.4.0 without fixing it (or workarounds to disable it), we
are probably not going to be able to adhere to this guideline.

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, Aug 21, 2017 at 11:53 AM, Vinod Kone <vinodk...@apache.org> wrote:

> Ran on ASF CI.
>
> Found 3 issues with tests.
>
> 1) GarbageCollectorIntegrationTest.ExitedFramework
> <https://issues.apache.org/jira/browse/MESOS-7905> : This one seems
> fairly new. *@Kapil can you confirm if this is a test issue or something
> in the code?*
>
> 2) DiskResource Persistent Volume tests seem to have interleaved output.
> This is a known issue <https://issues.apache.org/jira/browse/MESOS-6356>
> which we never got to the bottom of; I added logs from the CI. This is a
> CMake build FWIW.
>
> 3)  Double free corruption in python example framework test. Known issue
> <https://issues.apache.org/jira/browse/MESOS-7218> which we never got to
> the bottom of; added latest logs.
>
>
> *Revision*: b9187d54a97206b4a09fb5cb1d0834ab5fa5abd3
>
>- refs/tags/1.4.0-rc1
>
> Configuration Matrix
>
> OS            Configuration                             Build      gcc      clang
> centos:7      --verbose --enable-libevent --enable-ssl  autotools  Success  Not run
> centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
> centos:7      --verbose                                 autotools  Success  Not run
> centos:7      --verbose                                 cmake      Success  Not run
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Success  Failed
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Failed   Success
> ubuntu:14.04  --verbose                                 autotools  Success  Failed

Re: Performance working group meeting

2017-07-21 Thread Yan Xu
I'm in. Thanks!

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Fri, Jul 21, 2017 at 12:48 PM, Deepak Vij (A) <deepak@huawei.com>
wrote:

> Definitely interested in this. Thanks.
>
> Regards,
> Deepak Vij
>
> -Original Message-
> From: Benjamin Mahler [mailto:bmah...@apache.org]
> Sent: Friday, July 21, 2017 9:03 AM
> To: dev
> Cc: Dmitry Zhuk; Ilya Pronin; Michael Park; b...@apache.org;
> y...@apache.org
> Subject: Performance working group meeting
>
> Since there have been several folks working on performance related things
> lately, I'd like to try to schedule a meeting; this could be recurring if
> we find it useful.
>
> For an agenda, we could discuss:
>
> - ongoing work for libprocess optimizations and faster master failovers
> - existing performance related pain points
> - what people's priorities are, how much they can contribute
>
> Please join the #performance slack channel if you're interested in general,
> and if you'd like to join the meeting please reply here and include your
> time zone!
>
> Ben
>


Re: C++14 Upgrade

2017-07-19 Thread Yan Xu
+1!!

Thanks for the summary and collecting info for these distros.

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, Jul 19, 2017 at 3:34 PM, Michael Park <mp...@apache.org> wrote:

> I'd like to move us to C++14!
>
> The following I'd say are the important C++14 features for us:
>
>- Generic lambdas: [](const auto& x) { /* ... */ }
>- Extended lambda captures: [x = move(x)]() { /* ... */ }
>
> The following are some features that would be helpful for libprocess/stout:
>
>- Function return type deduction
>- Relaxed constexpr functions
>- std::integer_sequence + other meta-programming facilities
>
> The minimum GCC version would become 5, and minimum VS would be 2017
> (deprecation of VS 2015 is already in progress). Clang 3.5 is our current
> minimum Clang and it already implements C++14 so there's nothing to do
> there.
>
> As a bonus, we pick up the  header which is C++11 but hasn't been
> usable for us since it's not implemented in GCC 4.8.
>
> Here's the spreadsheet of available compilers from various distros and how
> to get them:
> https://docs.google.com/spreadsheets/d/1ocQ19Uv1d8wdb-QL4fDRAiQ12gPQwL3cIAzuV0csYwM/edit#gid=0
>
> Please suggest more distros we should consider, and
> provide feedback with your concerns!
>
> Thanks,
>
> MPark
>
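
For readers less familiar with the C++14 features listed in the proposal, here is a minimal sketch that touches each of them. It assumes a C++14-capable compiler (e.g. `-std=c++14`); the names `makeAdder`, `factorial`, `applyTuple` and `demo` are made up for this example and do not come from Mesos, libprocess or stout.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Function return type deduction, returning a generic lambda whose
// call operator is a template over the argument types.
auto makeAdder()
{
  return [](const auto& a, const auto& b) { return a + b; };
}

// Relaxed constexpr: loops and local mutation are allowed in C++14.
constexpr int factorial(int n)
{
  int result = 1;
  for (int i = 2; i <= n; ++i) {
    result *= i;
  }
  return result;
}

static_assert(factorial(5) == 120, "evaluated at compile time");

// std::index_sequence: unpack a tuple into the arguments of a call.
template <typename F, typename Tuple, std::size_t... Is>
decltype(auto) applyTupleImpl(F&& f, Tuple&& t, std::index_sequence<Is...>)
{
  return std::forward<F>(f)(std::get<Is>(std::forward<Tuple>(t))...);
}

template <typename F, typename Tuple>
decltype(auto) applyTuple(F&& f, Tuple&& t)
{
  constexpr std::size_t size = std::tuple_size<std::decay_t<Tuple>>::value;
  return applyTupleImpl(
      std::forward<F>(f),
      std::forward<Tuple>(t),
      std::make_index_sequence<size>{});
}

// Exercises each feature; call it from main() to run the checks.
inline void demo()
{
  auto add = makeAdder();
  assert(add(1, 2) == 3);
  assert(add(std::string("a"), std::string("b")) == "ab");

  // Extended (init) capture: move a move-only object into the closure.
  auto owned = std::make_unique<std::string>("mesos");
  auto length = [s = std::move(owned)]() { return s->size(); };
  assert(length() == 5);
  assert(owned == nullptr);

  assert(applyTuple(add, std::make_tuple(40, 2)) == 42);
}
```

The init capture is the piece that C++11 cannot express directly: moving a `std::unique_ptr` into a lambda there needs a workaround such as `std::bind` or a hand-written functor.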


Re: Agent reregistration timeout, no TASK_LOST messages

2017-07-17 Thread Yan Xu
On Mon, Jul 17, 2017 at 9:34 AM, Neil Conway  wrote:

> On Mon, Jul 17, 2017 at 9:20 AM, Ilya Pronin 
> wrote:
>
> > AFAIK the absence of TASK_LOST statuses is expected. Master registry
> > persists information only about agents. Tasks are recovered from
> > re-registering agents. Because of that the failed over master can't send
> > TASK_LOST for tasks that were running on the agent that didn't
> > re-register, it simply doesn't know about them. The only thing the master
> > can do in this situation is send LostSlaveMessage that will tell the
> > scheduler that tasks on this agent are LOST/UNREACHABLE.
> >
>
> +1.
>
> > The situation where the agent came back after reregistration timeout
> > doesn't sound good. The only way for the framework to learn about tasks
> > that are still running on such agent is either from status updates or via
> > implicit reconciliation. Perhaps, the master could send updates for tasks
> > it learned about when such agent is readmitted?
> >
>
> I agree this would be a good idea:
> https://issues.apache.org/jira/browse/MESOS-6406
>
> I haven't had a chance to implement it yet, but if someone is interested, I
> think this would be a pretty nicely scoped project.
>

The master should probably send updates about non-partition-aware framework
tasks as well. Especially in light of MESOS-7215 for which we are going to
stop killing tasks in all cases.


>
> Neil
>


Re: RFC: removing process implementations from common headers

2017-06-27 Thread Yan Xu
This sounds reasonable to me. Do others have comments?

---
@xujyan 

On Fri, Jun 23, 2017 at 4:23 PM, James Peach  wrote:

> Hi all,
>
> There is a common Mesos pattern where a subsystem is implemented by a
> facade class that forwards calls to an internal Process class, eg. Fetcher
> and FetcherProcess, or zookeeper::Group and zookeeper::GroupProcess. Since
> the Process is an internal implementation detail, I'd like to propose that
> we adopt a general policy that it should not be exposed in the primary
> header file. This has the following benefits:
>
> - reduces the number of symbols exposed to clients including the primary
> header file
> - reduces the number of header files needed in the primary header file
> - reduces the number of rebuilt dependencies when the process
> implementation changes
>
> Although each individual case of this practice may not improve build
> times, I think it is likely that over time, consistent application of this
> will help.
>
> In many cases, when FooProcess is only used by Foo, both the declaration
> and definitions of Foo can be inlined into "foo.cpp", which is already our
> common practice. If the implementation of the Process class is needed
> outside the facade (eg. for testing), the pattern I would propose is:
>
> foo.hpp - Primary API for Foo, forward declares FooProcess
> foo_process.hpp - Declarations for FooProcess
> foo_process.cpp - Definitions of FooProcess
>
> The "checks/checker.hpp" interface almost follows this pattern, but gives
> up the build benefits by including "checker_process.hpp" in "checker.hpp".
> This should be simple to fix however.
>
> thanks,
> James
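
The proposed file split can be sketched in a single translation unit as below. This is only an illustration under simplifying assumptions: it uses a plain `std::unique_ptr` and direct calls instead of libprocess (a real facade would spawn the process and dispatch to it), and the `increment` method and `demo` helper are hypothetical. Only the names `Foo` and `FooProcess` come from the proposal.

```cpp
#include <cassert>
#include <memory>

// ---- foo.hpp: primary API; only forward declares FooProcess ----
class FooProcess;

class Foo
{
public:
  Foo();
  ~Foo();  // Defined in foo.cpp, where FooProcess is a complete type.

  int increment();  // Hypothetical method, forwarded to the process.

private:
  std::unique_ptr<FooProcess> process;
};

// ---- foo_process.hpp: declaration of FooProcess ----
class FooProcess
{
public:
  int increment();

private:
  int count = 0;
};

// ---- foo_process.cpp: definitions of FooProcess ----
int FooProcess::increment() { return ++count; }

// ---- foo.cpp: the facade forwards each call to the process ----
Foo::Foo() : process(new FooProcess()) {}
Foo::~Foo() = default;
int Foo::increment() { return process->increment(); }

// ---- client code: needs only what foo.hpp declares ----
inline void demo()
{
  Foo foo;
  assert(foo.increment() == 1);
  assert(foo.increment() == 2);
}
```

Because clients see only the forward declaration, a change to the members of `FooProcess` would rebuild `foo.cpp`, `foo_process.cpp` and any tests that include `foo_process.hpp`, but not every file that includes `foo.hpp`.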


Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-06-02 Thread Yan Xu
+1 (binding)

Ran it in a test cluster.

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Thu, Jun 1, 2017 at 2:34 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> +1 (binding)
>
> Looks like ExamplesTest.DynamicReservationFramework is flaky,
> unfortunately
> wasn't able to get the logs for a failed run.
>
> On Thu, Jun 1, 2017 at 2:03 PM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > Not a blocker, but noticed the parallel test runner isn't bundled in the
> > release, if you configure with '--enable-parallel-test-execution':
> >
> > /Users/bmahler/Downloads/mesos-1.3.0/support/mesos-gtest-runner.py
> > --sequential=*ROOT_* ./stout-tests
> > /bin/sh: /Users/bmahler/Downloads/mesos-1.3.0/support/mesos-
> gtest-runner.py:
> > No such file or directory
> >
> > On Wed, May 31, 2017 at 1:48 PM, Vinod Kone <vinodk...@apache.org>
> wrote:
> >
> >> Thanks for the triage.
> >>
> >> +1 (binding)
> >>
> >> On Wed, May 31, 2017 at 1:33 PM, Neil Conway <neil.con...@gmail.com>
> >> wrote:
> >>
> >>> On Tue, May 30, 2017 at 3:43 PM, Neil Conway <neil.con...@gmail.com>
> >>> wrote:
> >>> > Attached is the test log for this failure. From a quick look, seems
> as
> >>> > though the agent starts to launch the task, including forking the
> >>> > child process, but no subsequent task status updates or error
> messages
> >>> > are observed. Gaston, have you seen this before?
> >>> >
> >>> > I filed https://issues.apache.org/jira/browse/MESOS-7589 to track
> >>> this.
> >>>
> >>> I wasn't able to repro this failure. Per Gaston's email, there isn't
> >>> enough information in the logs to understand what is going on here,
> >>> although it certainly seems weird that apparently the executor doesn't
> >>> start.
> >>>
> >>> I think this doesn't justify blocking the release, but we should watch
> >>> to see if the problem recurs.
> >>>
> >>> Neil
> >>>
> >>
> >>
> >
>


Re: Use of ACLs.RegisterAgent.agent

2017-05-24 Thread Yan Xu
I should have added that it will be inconvenient for us but not impossible
to cope. However I am not convinced about the need to rename so let's chat
about this on the review or the dev list.

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, May 24, 2017 at 10:29 AM, Yan Xu <y...@jxu.me> wrote:

> I made a comment on https://reviews.apache.org/r/59453/
>
> ---
> Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>
>
> On Wed, May 24, 2017 at 9:48 AM, Vinod Kone <vi...@mesosphere.io> wrote:
>
>> If it hasn't been released it should be ok for us to do the rename. There
>> are no backwards compatibility guarantees for such things. But a heads up is
>> always nice, so thanks for doing that.
>>
>> On Wed, May 24, 2017 at 12:44 PM, Neil Conway <neil.con...@gmail.com>
>> wrote:
>>
>>> FYI, I merged the change to rename this field into the master and
>>> 1.3.x branches; it will be included in the next 1.3.0 release
>>> candidate.
>>>
>>> Neil
>>>
>>>
>>> On Mon, May 22, 2017 at 10:43 AM, Alexander Rojas
>>> <alexan...@mesosphere.io> wrote:
>>> > Hey guys,
>>> >
>>> > We just noted that there was an error when the `RegisterAgent` act was
>>> > introduced. Namely, its object field is listed as `agent` when by
>>> convention
>>> > we have used plural, so it should be `agents`. This ACL hasn’t been
>>> part of
>>> > any released version of Mesos, so if no one is using it I will try to
>>> push
>>> > for a rename without going through any deprecation cycle.
>>> >
>>> > The big question is if any of you are using this particular ACL in
>>> > production right now?
>>> >
>>> > Alexander Rojas
>>> > alexan...@mesosphere.io
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>


Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Yan Xu
Congrats! Well deserved!

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, May 24, 2017 at 10:54 AM, Vinod Kone <vinodk...@apache.org> wrote:

> Congrats Gilbert!
>
> On Wed, May 24, 2017 at 1:32 PM, Neil Conway <neil.con...@gmail.com>
> wrote:
>
> > Congratulations Gilbert! Well-deserved!
> >
> > Neil
> >
> > On Wed, May 24, 2017 at 10:32 AM, Jie Yu <yujie@gmail.com> wrote:
> > > Hi folks,
> > >
> > > I'm happy to announce that the PMC has voted Gilbert Song as a new
> > committer
> > > and member of PMC for the Apache Mesos project. Please join me to
> > > congratulate him!
> > >
> > > Gilbert has been working on Mesos project for 1.5 years now. His main
> > > contribution is his work on unified containerizer, nested container
> (aka
> > > Pod) support. He also helped a lot of folks in the community regarding
> > their
> > > patches, questions, etc. He also played an important role organizing
> > > MesosCon Asia last year and this year!
> > >
> > > His formal committer checklist can be found here:
> > > https://docs.google.com/document/d/1iSiqmtdX_0CU-YgpViA6r6PU_
> > aMCVuxuNUZ458FR7Qw/edit?usp=sharing
> > >
> > > Welcome, Gilbert!
> > >
> > > - Jie
> >
>


Re: Use of ACLs.RegisterAgent.agent

2017-05-24 Thread Yan Xu
I made a comment on https://reviews.apache.org/r/59453/

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, May 24, 2017 at 9:48 AM, Vinod Kone <vi...@mesosphere.io> wrote:

> If it hasn't been released it should be ok for us to do the rename. There
> are no backwards compatibility guarantees for such things. But a heads up is
> always nice, so thanks for doing that.
>
> On Wed, May 24, 2017 at 12:44 PM, Neil Conway <neil.con...@gmail.com>
> wrote:
>
>> FYI, I merged the change to rename this field into the master and
>> 1.3.x branches; it will be included in the next 1.3.0 release
>> candidate.
>>
>> Neil
>>
>>
>> On Mon, May 22, 2017 at 10:43 AM, Alexander Rojas
>> <alexan...@mesosphere.io> wrote:
>> > Hey guys,
>> >
>> > We just noted that there was an error when the `RegisterAgent` act was
>> > introduced. Namely, its object field is listed as `agent` when by
>> convention
>> > we have used plural, so it should be `agents`. This ACL hasn’t been
>> part of
>> > any released version of Mesos, so if no one is using it I will try to
>> push
>> > for a rename without going through any deprecation cycle.
>> >
>> > The big question is if any of you are using this particular ACL in
>> > production right now?
>> >
>> > Alexander Rojas
>> > alexan...@mesosphere.io
>> >
>> >
>> >
>> >
>>
>
>


Re: [VOTE] Release Apache Mesos 1.3.0 (rc2)

2017-05-17 Thread Yan Xu
-1 (binding)

Let's address this blocker first. Neil's looking into it now.

Yan


Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-08 Thread Yan Xu
We work around autotools and protobuf bugs and glibc is only harder for
users and developers to upgrade. :)

I agree that we can establish the minimum glibc version/linux distro
releases etc we support but currently we don't and there are folks who use
Mesos that depend on this version.

We should follow http://mesos.apache.org/documentation/latest/versioning/
and when answers are not in there, we should seek consensus to improve the
process and update the doc and give folks adequate time to adjust to the
new (or better defined) process. In the meantime, I think we shouldn't break
the current users.

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, May 8, 2017 at 3:59 PM, Neil Conway <neil.con...@gmail.com> wrote:

> Personally, I'm not convinced that we need to fix MESOS-7378. The
> problem is essentially a bug in glibc that was fixed 6 years ago. (As
> a point of reference, the oldest version of g++ we support was
> released 2 years ago... :) )
>
> Neil
>
> On Mon, May 8, 2017 at 3:45 PM, Yan Xu <y...@jxu.me> wrote:
> > I am still hoping that we get
> > https://issues.apache.org/jira/browse/MESOS-7378 fixed before shipping
> > 0.13.0. :)
> >
> > ---
> > Jiang Yan Xu <y...@jxu.me> | @xujyan
> >
> > On Fri, May 5, 2017 at 6:31 PM, Michael Park <mp...@apache.org> wrote:
> >>
> >> Hi all,
> >>
> >> Please vote on releasing the following candidate as Apache Mesos 1.3.0.
> >>
> >>
> >> 1.3.0 includes the following:
> >>
> >> 
> 
> >>   - Multi-role framework support
> >>   - Executor authentication support
> >>   - Allow frameworks to modify their roles.
> >>   - Hierarchical roles (*EXPERIMENTAL*)
> >>
> >> The CHANGELOG for the release is available at:
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.3.0-rc1
> >>
> >> 
> 
> >>
> >> The candidate for Mesos 1.3.0 release is available at:
> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/
> mesos-1.3.0.tar.gz
> >>
> >> The tag to be voted on is 1.3.0-rc1:
> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=
> commit;h=1.3.0-rc1
> >>
> >> The MD5 checksum of the tarball can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/
> mesos-1.3.0.tar.gz.md5
> >>
> >> The signature of the tarball can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/
> mesos-1.3.0.tar.gz.asc
> >>
> >> The PGP key used to sign the release is here:
> >> https://dist.apache.org/repos/dist/release/mesos/KEYS
> >>
> >> The JAR is up in Maven in a staging repository here:
> >> https://repository.apache.org/content/repositories/orgapachemesos-1190
> >>
> >> Please vote on releasing this package as Apache Mesos 1.3.0!
> >>
> >> The vote is open until Wed May 10 11:59:59 PDT 2017 and passes if a
> >> majority of at least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Mesos 1.3.0
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >>
> >> MPark & Neil
> >
> >
>


Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-08 Thread Yan Xu
s/0.13.0/1.3.0/ :)

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, May 8, 2017 at 3:45 PM, Yan Xu <y...@jxu.me> wrote:

> I am still hoping that we get https://issues.apache.org/
> jira/browse/MESOS-7378 fixed before shipping 0.13.0. :)
>
> ---
> Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>
>
> On Fri, May 5, 2017 at 6:31 PM, Michael Park <mp...@apache.org> wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.3.0.
>>
>>
>> 1.3.0 includes the following:
>> 
>> 
>>   - Multi-role framework support
>>   - Executor authentication support
>>   - Allow frameworks to modify their roles.
>>   - Hierarchical roles (*EXPERIMENTAL*)
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.3.0-rc1
>> 
>> 
>>
>> The candidate for Mesos 1.3.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos-1.3.0.tar.gz
>>
>> The tag to be voted on is 1.3.0-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.0-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos
>> -1.3.0.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos
>> -1.3.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1190
>>
>> Please vote on releasing this package as Apache Mesos 1.3.0!
>>
>> The vote is open until Wed May 10 11:59:59 PDT 2017 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.3.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>>
>> MPark & Neil
>>
>
>


Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-08 Thread Yan Xu
I am still hoping that we get
https://issues.apache.org/jira/browse/MESOS-7378 fixed before shipping
0.13.0. :)

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Fri, May 5, 2017 at 6:31 PM, Michael Park <mp...@apache.org> wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.3.0.
>
>
> 1.3.0 includes the following:
> 
> 
>   - Multi-role framework support
>   - Executor authentication support
>   - Allow frameworks to modify their roles.
>   - Hierarchical roles (*EXPERIMENTAL*)
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.3.0-rc1
> 
> 
>
> The candidate for Mesos 1.3.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos-1.3.0.tar.gz
>
> The tag to be voted on is 1.3.0-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.0-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/
> mesos-1.3.0.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/
> mesos-1.3.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1190
>
> Please vote on releasing this package as Apache Mesos 1.3.0!
>
> The vote is open until Wed May 10 11:59:59 PDT 2017 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.3.0
> [ ] -1 Do not release this package because ...
>
> Thanks,
>
> MPark & Neil
>


Re: Welcome Kevin Klues as a Mesos Committer and PMC member!

2017-03-02 Thread Yan Xu
Congrats Kevin!

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Wed, Mar 1, 2017 at 2:05 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> Hi all,
>
> Please welcome Kevin Klues as the newest committer and PMC member of the
> Apache Mesos project.
>
> Kevin has been an active contributor in the project for over a year, and in
> this time he made a number of contributions to the project: Nvidia GPU
> support [1], the containerization side of POD support (new container init
> process), and support for "attach" and "exec" of commands within running
> containers [2].
>
> Also, Kevin took on an effort with Haris Choudhary to revive the CLI [3]
> via a better structured python implementation (to be more accessible to
> contributors) and a more extensible architecture to better support adding
> new or custom subcommands. The work also adds a unit test framework for the
> CLI functionality (we had no tests previously!). I think it's great that
> Kevin took on this much needed improvement with Haris, and I'm very much
> looking forward to seeing this land in the project.
>
> Here is his committer eligibility document for perusal:
> https://docs.google.com/document/d/1mlO1yyLCoCSd85XeDKIxTYyboK_
> uiOJ4Uwr6ruKTlFM/edit
>
> Thanks!
> Ben
>
> [1] http://mesos.apache.org/documentation/latest/gpu-support/
> [2]
> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482N
> SVk8V0D56sFMzU
> [3]
> https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVsc
> gYqrD07OyIglsA/
>


Re: How to consistent handle default values for message types

2017-02-16 Thread Yan Xu
On Mon, Feb 13, 2017 at 5:27 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> The way I think about this is that if the field is semantically required
>

"semantically required": good point and this should definitely be one of
the criteria.

I guess we still need to clarify this for each of these messages: there's
another class of "semantically required" fields that you DO need to set, and
we need to disambiguate the two.

I'll start by improving the comments on `Filters`.



> then there can be a default. If not, then the optionality has meaning. For
> example, if there is always a notion of filtering, then having a default
> filter makes sense. But if the absence of a filter means no filtering
> occurs, then absence of the optional field has a meaning and we don't
> interpret the overall message to have a default value.
>
> Also, if we want to move to proto3 syntax at some point, we'll have to push
> our defaults into our API handling code rather than in the proto file
> AFAICT.
>
> On Thu, Feb 2, 2017 at 12:06 PM, Yan Xu <xuj...@apple.com> wrote:
>
> > With protobuf you can specify custom default values for scalar types
> > (proto2 at least) but not message types, e.g.,
> >
> > ```
> > message Filters {
> >   // Time to consider unused resources refused. Note that all unused
> >   // resources will be considered refused and use the default value
> >   // (below) regardless of whether Filters was passed to
> >   // SchedulerDriver::launchTasks. You MUST pass Filters with this
> >   // field set to change this behavior (i.e., get another offer which
> >   // includes unused resources sooner or later than the default).
> >   optional double refuse_seconds = 1 [default = 5.0];
> > }
> > ```
> >
> > However, the message `Filters` essentially has a default value because
> *all*
> > its
> > fields have default values. It all depends on whether the receiver
> chooses
> > to check it is not set or directly accesses it and gets the default
> values.
> >
> > When we reference the type in other messages, e.g.,
> >
> > ```
> >   message Accept {
> >     repeated OfferID offer_ids = 1;
> >     repeated Offer.Operation operations = 2;
> >     optional Filters filters = 3;
> >   }
> > ```
> >
> > We are not explicitly telling users what's going to happen when `filters`
> > is not set. The master just directly uses it without checking.
> >
> > It does feel intuitive to me that "*if all the fields in a message have
> > default values, and it semantically feels like a config, then we can just
> > interpret them when unset as indication to use defaults*".
> >
> > However we probably should document it better.
> >
> > To generalize it further, for something like this with multiple fields
> >
> > ```
> > message ExponentialBackoff {
> >   optional double initial_interval_seconds = 1 [default = 0.5];
> >   optional double max_interval_seconds = 2 [default = 300.0];
> >   optional double randomization_factor = 3 [default = 0.5];
> >   optional double max_elapsed_seconds = 4 [default = 2592000.0];
> > }
> > ```
> >
> > we should be able to not require them to be set and assume the defaults?
> >
> > One step further, if the message has recursively nested messages with
> > default values, we can treat the parent message as having a default value
> > too?
> >
> > Thoughts?
> >
> > Yan
> >
>


Re: How to consistent handle default values for message types

2017-02-07 Thread Yan Xu

> On Feb 7, 2017, at 11:13 AM, Joris Van Remoortere <jo...@mesosphere.io> wrote:
> 
>> 
>> we should be able to not require them to be set and assume the defaults?
> 
> 
> What happens when we add a field without a default value to the message?
> If users can assume they don't need to set the message this may cause
> backwards compatibility issues in the future.

We can't. So this is why I said such a message should semantically sound like a
config object with defaults. (I think the backoff example I gave fits this 
criterion).
If we really need to add such a field, we would have to use a different message.

> 
> Maybe I am not understanding whether you mean leaving the message unset, or
> the field?

I meant leaving the message unset.

> 
> —
> *Joris Van Remoortere*
> Mesosphere
> 
> On Thu, Feb 2, 2017 at 3:06 PM, Yan Xu <xuj...@apple.com> wrote:
> 
>> With protobuf you can specify custom default values for scalar types
>> (proto2 at least) but not message types, e.g.,
>> 
>> ```
>> message Filters {
>> // Time to consider unused resources refused. Note that all unused
>> // resources will be considered refused and use the default value
>> // (below) regardless of whether Filters was passed to
>> // SchedulerDriver::launchTasks. You MUST pass Filters with this
>> // field set to change this behavior (i.e., get another offer which
>> // includes unused resources sooner or later than the default).
>> optional double refuse_seconds = 1 [default = 5.0];
>> }
>> ```
>> 
>> However, the message `Filters` essentially has a default value because *all*
>> its
>> fields have default values. It all depends on whether the receiver chooses
>> to check it is not set or directly accesses it and gets the default values.
>> 
>> When we reference the type in other messages, e.g.,
>> 
>> ```
>> message Accept {
>>   repeated OfferID offer_ids = 1;
>>   repeated Offer.Operation operations = 2;
>>   optional Filters filters = 3;
>> }
>> ```
>> 
>> We are not explicitly telling users what's going to happen when `filters`
>> is not set. The master just directly uses it without checking.
>> 
>> It does feel intuitive to me that "*if all the fields in a message have
>> default values, and it semantically feels like a config, then we can just
>> interpret them when unset as indication to use defaults*".
>> 
>> However we probably should document it better.
>> 
>> To generalize it further, for something like this with multiple fields
>> 
>> ```
>> message ExponentialBackoff {
>> optional double initial_interval_seconds = 1 [default = 0.5];
>> optional double max_interval_seconds = 2 [default = 300.0];
>> optional double randomization_factor = 3 [default = 0.5];
>> optional double max_elapsed_seconds = 4 [default = 2592000.0];
>> }
>> ```
>> 
>> we should be able to not require them to be set and assume the defaults?
>> 
>> One step further, if the message has recursively nested messages with
>> default values, we can treat the parent message as having a default value
>> too?
>> 
>> Thoughts?
>> 
>> Yan
>> 



Design for OnTerminationPolicy

2017-02-02 Thread Yan Xu
Hi all,

So after some discussions on the previous draft for restartable tasks and
considering other feature requests such as to allow frameworks to customize
the behavior when tasks in a task group terminate, we incorporated these
features into one OnTerminationPolicy concept, which governs the executor's
handling of task termination and the agent's handling of executor
termination.

Here is the new doc:
https://docs.google.com/document/d/1VxfoZ-DzMHnKY0gzoccHEhx1rvdC2-RATJfJUfiAwGY/edit?usp=sharing

Feedback welcome!

Yan
---
@xujyan 

On Tue, Nov 29, 2016 at 8:47 AM, Megha Sharma  wrote:

> Hi All,
>
> Thanks for your feedback on the design, here’s the revised design for
> Restartable Tasks.
>
> https://docs.google.com/document/d/1epYCznSjevbiA776Yr72xx365IGEz
> KRnoLnXBorm0J0/edit?usp=sharing
>
> Based on the feedback we have had on the old design and discussion with a
> bunch of committers, we have added the restart by executor and taskgroup
> restart to the design. Looking forward to your comments/feedback.
>
> Many Thanks
> Megha Sharma
>
> On Oct 26, 2016, at 6:23 PM, Benjamin Mahler  wrote:
>
> Thanks for publishing this! Saw some tickets being created and was
> wondering where this email was.. :)
>
> The higher level thing that strikes me is that I think the notion of a
> task restart policy should be managed by the executor (i.e. the executor
> restarts the task based on the policy). This is aligned with how the
> existing kill and health check policies work. This project seems to be
> something more along the lines of a restartable executor, alongside a
> change to perform agent recovery across reboot?
>
> Since this project is pretty complicated, it would be prudent to gather
> some committers to provide feedback and we can publish our notes to the
> lists.
>
> Ben
>
> On Wed, Oct 26, 2016 at 5:13 PM, Megha Sharma  wrote:
>
>> Hi All,
>>
>> We have been working on the design to allow tasks which need to be
>> restarted on the agent post its restart. Looking forward to your
>> comments/feedback.
>>
>> Design Doc:
>> https://docs.google.com/document/d/1YS_EBUNLkzpSru0dwn_hPUIe
>> TATiWckSaosXSIaHUCo/edit#heading=h.tlevdyt3yv0a
>>
>> JIRA:
>> https://issues.apache.org/jira/browse/MESOS-3545
>>
>> Many Thanks
>> Megha Sharma
>>
>>
>>
>>
>>
>
>


How to consistent handle default values for message types

2017-02-02 Thread Yan Xu
With protobuf you can specify custom default values for scalar types
(proto2 at least) but not message types, e.g.,

```
message Filters {
  // Time to consider unused resources refused. Note that all unused
  // resources will be considered refused and use the default value
  // (below) regardless of whether Filters was passed to
  // SchedulerDriver::launchTasks. You MUST pass Filters with this
  // field set to change this behavior (i.e., get another offer which
  // includes unused resources sooner or later than the default).
  optional double refuse_seconds = 1 [default = 5.0];
}
```

However, the message `Filters` essentially has a default value because *all* its
fields have default values. It all depends on whether the receiver chooses
to check it is not set or directly accesses it and gets the default values.

When we reference the type in other messages, e.g.,

```
  message Accept {
    repeated OfferID offer_ids = 1;
    repeated Offer.Operation operations = 2;
    optional Filters filters = 3;
  }
```

We are not explicitly telling users what's going to happen when `filters`
is not set. The master just directly uses it without checking.

It does feel intuitive to me that "*if all the fields in a message have
default values, and it semantically feels like a config, then we can just
interpret them when unset as indication to use defaults*".

However we probably should document it better.

To generalize it further, for something like this with multiple fields

```
message ExponentialBackoff {
  optional double initial_interval_seconds = 1 [default = 0.5];
  optional double max_interval_seconds = 2 [default = 300.0];
  optional double randomization_factor = 3 [default = 0.5];
  optional double max_elapsed_seconds = 4 [default = 2592000.0];
}
```

we should be able to not require them to be set and assume the defaults?

One step further, if the message has recursively nested messages with
default values, we can treat the parent message as having a default value
too?

Thoughts?

Yan
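
To make the proposed rule concrete, here is a rough sketch of receiver-side handling, assuming plain C++17 structs stand in for the generated protobuf classes. `RetryConfig` and `effectiveBackoff` are hypothetical names used only for illustration; the member initializers mirror the `[default = ...]` annotations on `ExponentialBackoff` above.

```cpp
#include <optional>

// Stand-in for the proto2 `ExponentialBackoff` message: every field
// carries the default from its `[default = ...]` annotation.
struct ExponentialBackoff
{
  double initial_interval_seconds = 0.5;
  double max_interval_seconds = 300.0;
  double randomization_factor = 0.5;
  double max_elapsed_seconds = 2592000.0;
};

// Hypothetical enclosing message with an optional `backoff` field.
struct RetryConfig
{
  std::optional<ExponentialBackoff> backoff;
};

// The receiver treats an unset message as "use all defaults" rather
// than as an error or as a distinct state.
ExponentialBackoff effectiveBackoff(const RetryConfig& config)
{
  return config.backoff.value_or(ExponentialBackoff{});
}
```

A sender that sets only `max_interval_seconds` still gets the defaults for the other three fields, which is the config-like behavior the rule relies on. Joris's concern applies here too: a field added later without a default would break the assumption that leaving the message unset is always valid.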


Re: Order of includes

2016-12-18 Thread Yan Xu
The example is helpful. Thanks!

I have no objection to sticking to the new rule then. But then we have to:

- For contributors and committers, start using the new style when creating
new files today.
- Fix the existing include order hopefully with the help of tools like
clang-tidy and have it enforce the style going forward.

Agreed?

---
@xujyan 

On Fri, Dec 16, 2016 at 8:54 PM, Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi,
>
> > How does putting your own header at the top (vs. ~the bottom) help ensure
> > "a header file always includes all symbols it requires”?
>
>
> Given an incomplete header
>
> // foo.hpp
> std::string f();
>
> // foo.cpp
> #include "foo.hpp"
> #include <string>
>
> std::string f() { return {}; }
>
> I get
>
> % clang++ -fsyntax-only foo.cpp --std=c++11
> In file included from foo.cpp:1:
> ./foo.hpp:1:1: error: use of undeclared identifier 'std'
> std::string f();
> ^
> 1 error generated.
>
> Swapping the include order makes this pass as `#include` is just textual
> replacement, and the `#include <string>` in `foo.cpp` would declare the
> symbol used in `foo.hpp`.
>
>
> Cheers,
>
> Benjamin
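
For comparison, a single-file sketch of the corrected version of Benjamin's example, assuming the fix is simply for the header to include what it uses. With `foo.hpp` self-contained, including it first in `foo.cpp` compiles regardless of what follows.

```cpp
// ---- foo.hpp (fixed): the header includes what it uses ----
#include <string>

std::string f();

// ---- foo.cpp: in a real tree this file would start with
// `#include "foo.hpp"` as its first include; the section above
// stands in for that here. ----
std::string f() { return {}; }
```

Because the own-header include comes first, a missing `#include <string>` in `foo.hpp` fails at compile time in `foo.cpp` itself instead of surfacing later in an unrelated client.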


Re: Welcome Guangya Liu as Mesos Committer and PMC member!

2016-12-18 Thread Yan Xu
Congrats!

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Mon, Dec 19, 2016 at 1:31 AM, haosdent <haosd...@gmail.com> wrote:

> Congrats Guangya!
>
> On Sun, Dec 18, 2016 at 10:02 PM, Klaus Ma <klaus1982...@gmail.com> wrote:
>
>> Congratulations!!
>>
>> On Sat, Dec 17, 2016 at 1:23 PM Dharmesh Kakadia <dhkaka...@gmail.com>
>> wrote:
>>
>>> Congrats Guangya !
>>>
>>> Thanks,
>>> Dharmesh
>>>
>>> On Fri, Dec 16, 2016 at 5:03 PM, Dario Rexin <dre...@apple.com> wrote:
>>>
>>> Congrats!
>>>
>>> > On Dec 16, 2016, at 4:27 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>> >
>>> > Congrats Guangya! Welcome to the PMC!
>>> >
>>> >> On Fri, Dec 16, 2016 at 7:03 PM, Sam <usultra...@gmail.com> wrote:
>>> >> congratulations Guangya
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >>> On 17 Dec 2016, at 3:23 AM, Avinash Sridharan <avin...@mesosphere.io>
>>> wrote:
>>> >>>
>>> >>> Congrats Guangya !!
>>> >>>
>>> >>>> On Fri, Dec 16, 2016 at 11:20 AM, Greg Mann <g...@mesosphere.io>
>>> wrote:
>>> >>>> Congratulations Guangya!!! :D
>>> >>>>
>>> >>>>> On Fri, Dec 16, 2016 at 11:10 AM, Jie Yu <yujie@gmail.com>
>>> wrote:
>>> >>>>> Hi folks,
>>> >>>>>
>>> >>>>> Please join me in formally welcoming Guangya Liu as Mesos
>>> Committer and PMC
>>> >>>>> member.
>>> >>>>>
>>> >>>>> Guangya has worked on the project for more than a year now and has
>>> been a
>>> >>>>> very active contributor to the project. I think one of the most
>>> important
>>> >>>>> contributions he has made to the community is that he helped grow the
>>> Mesos
>>> >>>>> community in China. He initiated the Xian-Mesos-User-Group and
>>> successfully
>>> >>>>> organized two meetups which attracted more than 100 people from
>>> Xi’an
>>> >>>>> China. He wrote a handful of blogs and articles in Chinese tech
>>> media which
>>> >>>>> attracted a lot of interests in Mesos. He had given several talks
>>> about
>>> >>>>> Mesos at conferences in China.
>>> >>>>>
>>> >>>>> His major coding contribution to the project was the docker volume
>>> driver
>>> >>>>> isolator. He has also been involved in allocator performance
>>> improvement,
>>> >>>>> gpu support for docker containerizer, Mesos Tiers/Optimistic Offer
>>> design,
>>> >>>>> scarce resources discussion, and many others.
>>> >>>>>
>>> >>>>> His formal checklist is here:
>>> >>>>> https://docs.google.com/document/d/1tot79kyJCTTgJHBhzStFKrVkDK4pX
>>> >>>>> qfl-LHCLOovNtI/edit?usp=sharing
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> - Jie
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Avinash Sridharan, Mesosphere
>>> >>> +1 (323) 702 5245
>>> >
>>>
>>>
>>> --
>>
>> Regards,
>> 
>> Da (Klaus), Ma (马达), PMP® | Software Architect
>> IBM Platform Development & Support, STG, IBM GCG
>> +86-10-8245 4084 <+86%2010%208245%204084> | mad...@cn.ibm.com |
>> http://k82.me
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Order of includes

2016-12-15 Thread Yan Xu
On Thu, Dec 15, 2016 at 7:44 AM, Michael Park  wrote:

> I would vote to keep the "include yourself first" rule, for reasons that
> Benjamin points out.
>
> I think that we (committers) shouldn't be actively (and silently) going
> against the rules we have in place.
> Aside from that... the constructive thing I can suggest is to help
> enforcement by continuing the work on
> clang-tidy (entirely blocked on me at this point).
>

Would be great if it can fix the existing include order. :)


>
> On Tue, Dec 13, 2016 at 11:15 PM, Benjamin Bannier <
> benjamin.bann...@mesosphere.io> wrote:
>
> > Hi Yan,
> >
> > I don’t feel too strongly about most of our style rules regarding include
> > ordering since they are just about style.
> >
> > > For a cpp file foo.cpp, our style guide instructs folks to put the
> header
> > > foo.hpp at the top of the include list:
> > > https://github.com/apache/mesos/blob/master/docs/c%2B%
> > 2B-style-guide.md#order-of-includes
> > >
> > > This is consistent with Google style guide but in reality most of our
> > > files follow the rule of "treat foo.hpp the same way as other project
> > > headers”.
> >
> > Among all our style rules regarding includes, this one actually does have
> > a solid technical justification: It helps ensure that a header file
> always
> > includes all symbols it requires (OK, possibly via discouraged transitive
> > includes in the header itself). Not strictly following this rule has led
> > to broken header files making their way into the code base (both in the
> > case of internal and public headers), see e.g.,
> >
> >   https://reviews.apache.org/r/54083/
> >   https://reviews.apache.org/r/54084/
> >   https://reviews.apache.org/r/54083/
> >
> > I’d rather have us follow a style that performs some automagic checking of
> > header completeness than rely on humans to catch all issues.
> >
> > Note that including `foo.hpp` first in `foo.cpp` is common practice, and I
> > expect following this rule would lead to _less friction_ for newcomers to
> > the Mesos code base, see e.g., (no particular order)
> >
> >   http://llvm.org/docs/CodingStandards.html#include-style
> >   https://github.com/bloomberg/bde/wiki/physical-code-organization#component-design-rules
> >   https://webkit.org/code-style-guidelines/#include-statements
> >   https://github.com/facebook/hhvm/blob/master/hphp/doc/coding-conventions.md#what-to-include
> >   https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
> >
> >
> > Cheers,
> >
> > Benjamin
>


Re: Order of includes

2016-12-15 Thread Yan Xu
On Wed, Dec 14, 2016 at 3:15 PM, Benjamin Bannier  wrote:

> Hi Yan,
>
> I don’t feel too strongly about most of our style rules regarding include
> ordering since they are just about style.
>
> > For a cpp file foo.cpp, our style guide instructs folks to put the header
> > foo.hpp at the top of the include list:
> > https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md#order-of-includes
> >
> > This is consistent with Google style guide but in reality most of our
> > files follow the rule of "treat foo.hpp the same way as other project
> > headers".
>
> Among all our style rules regarding includes, this one actually does have
> a solid technical justification: It helps ensure that a header file always
> includes all symbols it requires (OK, possibly via discouraged transitive
> includes in the header itself). Not strictly following this rule has led
> to broken header files making their way into the code base (both in the
> case of internal and public headers), see e.g.,
>
>   https://reviews.apache.org/r/54083/
>   https://reviews.apache.org/r/54084/
>   https://reviews.apache.org/r/54083/


How does putting your own header at the top (vs. ~the bottom) help ensure
"a header file always includes all symbols it requires"?

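A minimal sketch of why the position can matter, assuming a hypothetical `foo.hpp`/`foo.cpp` pair (the `greet` function is made up for illustration, and both files are folded into one listing so that it compiles as a single translation unit):

```cpp
// ---- foo.hpp (hypothetical header) ----
#ifndef FOO_HPP
#define FOO_HPP

// foo.hpp uses std::string in its own declaration, so it has to include
// <string> itself. Suppose this line were forgotten.
#include <string>

std::string greet(const std::string& name);

#endif // FOO_HPP

// ---- foo.cpp (hypothetical source file) ----
// With "include yourself first", `#include "foo.hpp"` is the first line of
// foo.cpp (its contents are inlined above). Nothing precedes it, so a
// missing <string> in foo.hpp fails to compile right away.
//
// With foo.hpp near the bottom of the include list, e.g.
//
//   #include <string>
//   #include "foo.hpp"
//
// the earlier <string> would mask the omission, and the breakage would only
// surface in some other file that includes foo.hpp without <string>.

std::string greet(const std::string& name)
{
  return "Hello, " + name;
}
```

In other words, every `foo.cpp` doubles as a compile-time check that `foo.hpp` is self-contained, which is the "automagic checking" mentioned in the quoted text.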

>
>
> I’d rather have us follow a style that performs some automagic checking of
> header completeness than rely on humans to catch all issues.
>
> Note that including `foo.hpp` first in `foo.cpp` is common practice, and I
> expect following this rule would lead to _less friction_ for newcomers to
> the Mesos code base, see e.g., (no particular order)
>
>   http://llvm.org/docs/CodingStandards.html#include-style
>   https://github.com/bloomberg/bde/wiki/physical-code-organization#component-design-rules
>   https://webkit.org/code-style-guidelines/#include-statements
>   https://github.com/facebook/hhvm/blob/master/hphp/doc/coding-conventions.md#what-to-include
>   https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
>
>
>
Yeah I have no issues with this practice. I am mainly commenting on the
status quo: there's an entrenched practice and the effort to change it
seemed to have gone nowhere. I don't personally like one approach better
than the other so I'd like to hear the community's thoughts (e.g., technical
benefit) on this and hope we reach a consensus.

> Cheers,
>
> Benjamin


Re: Order of includes

2016-12-13 Thread Yan Xu
Another related practice we should standardize: the Google style guide
suggests the following here
<https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes>:

1) If you rely on symbols from bar.h, don't count on the fact that you
included foo.h which (currently) includes bar.h: include bar.h yourself,
2) unless foo.h explicitly demonstrates its intent to provide you the
symbols of bar.h.
3) However, any includes present in the related header do not need to be
included again in the related cc (i.e., foo.cc can rely on foo.h's
includes).

We obviously follow 1) but AFAIK we don't follow 2) and 3), instead we have
been doing "include everything needed yourself, no exceptions".

It would be great to clarify this in the Mesos style guide as well.
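
To make the "include everything needed yourself" convention concrete, here is a small sketch that uses only standard headers. `sortedNames` is a hypothetical function, not something from the Mesos code base:

```cpp
// Every header whose symbols are used below is included directly, even
// though some standard library implementations may pull one in through
// another.
#include <algorithm> // std::sort
#include <string>    // std::string
#include <vector>    // std::vector

// Returns a sorted copy of `names`.
std::vector<std::string> sortedNames(std::vector<std::string> names)
{
  std::sort(names.begin(), names.end());
  return names;
}
```

Under rule 3) of the Google guide, a `.cpp` file could drop the includes that its own header already provides; under the "no exceptions" practice described above it would repeat them.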

---
@xujyan <https://twitter.com/xujyan>

On Wed, Dec 14, 2016 at 10:14 AM, Yan Xu <xuj...@apple.com> wrote:

> For a cpp file foo.cpp, our style guide instructs folks to put the header
> foo.hpp at the top of the include list:
> https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md#order-of-includes
>
> This is consistent with Google style guide but in reality most of our
> files follow the rule of "treat foo.hpp the same way as other project
> headers".
>
> Since this rule has been introduced to the Mesos style guide I haven't
> seen much adoption. Most committers still observe the "old" rule when
> creating new files and have been instructing contributors to do the same.
>
> Given the current status I'd suggest we revert the style guide on this?
> It's only adding confusion and I don't see the need to do a sweeping change
> to the codebase to comply with the new rule.
>
> Thoughts?
>
> Yan
>


Order of includes

2016-12-13 Thread Yan Xu
For a cpp file foo.cpp, our style guide instructs folks to put the header
foo.hpp at the top of the include list:
https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md#order-of-includes

This is consistent with Google style guide but in reality most of our
files follow the rule of "treat foo.hpp the same way as other project
headers".

Since this rule has been introduced to the Mesos style guide I haven't seen
much adoption. Most committers still observe the "old" rule when creating
new files and have been instructing contributors to do the same.

Given the current status I'd suggest we revert the style guide on this?
It's only adding confusion and I don't see the need to do a sweeping change
to the codebase to comply with the new rule.

Thoughts?

Yan


Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-28 Thread Yan Xu
So one thing that was brought up during offline conversations was that if
the host reboot is associated with hardware change (e.g., a new memory
stick):


   - Currently: the agent would skip the recovery (and the chance of
   running into incompatible agent info) and register as a new agent.
   - With the change: the agent could run into incompatible agent info due
   to resource change and flap indefinitely until the operator intervenes.


To mitigate this and maintain the current behavior, we can have the agent
run `rm -f /meta/slaves/latest` automatically upon recovery failure, but
only after the host has rebooted. This way the agent can restart as a new
agent without operator intervention.
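
The proposed fallback could look roughly like the sketch below. This is not the agent's actual recovery code: `resetToNewAgent`, its parameters, and the way the reboot and the recovery failure are detected are all assumptions made for illustration, and the caller passes in the path of the `latest` symlink.

```cpp
#include <cstdio> // std::remove
#include <string>

// Hypothetical helper: drops the `latest` symlink so that the next start
// registers as a new agent. It acts only when recovery failed AND the host
// has rebooted; in every other case it leaves the agent state untouched.
// Returns true if the symlink was removed.
bool resetToNewAgent(
    const std::string& latestSymlink,
    bool hostRebooted,
    bool recoveryFailed)
{
  if (!hostRebooted || !recoveryFailed) {
    return false;
  }

  // On POSIX systems std::remove deletes the link itself, not the
  // directory it points to.
  return std::remove(latestSymlink.c_str()) == 0;
}
```

Gating on both flags keeps the existing behavior for a plain agent restart, where an incompatible agent info should still be surfaced to the operator.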

Any thoughts?

BTW this speaks to the need for MESOS-1739.

Yan

On Tue, Nov 15, 2016 at 7:37 AM, Megha Sharma  wrote:

> Hi All,
>
> We have been working on the design for Restartable tasks (
> MESOS-3545) and allowing agents to recover and re-register post reboot is a
> pre-requisite for that.
> Today the agent doesn't recover its state (which includes its SlaveID) after
> a host reboot: it short-circuits the recovery upon discovering the reboot and
> registers with the master as a new agent. With Partition Awareness, the
> Mesos master even allows agents which have failed the master's health check
> pings (unreachable agents) to re-register with it and reconcile the
> tasks/executors. The executors on a rebooted host are terminated anyway, so
> there is no harm in letting such an agent recover and re-register with the
> master using its old SlaveID.
> We would like to hear from the folks here if you see any operational concerns
> with letting the agents recover after a host reboot.
>
> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>
> Many Thanks
> Megha Sharma
>
>
>


Re: [VOTE] Release Apache Mesos 1.0.2 (rc3)

2016-11-11 Thread Yan Xu
+1.

Tested `make check` on CentOS 6.

On Thu, Nov 10, 2016 at 9:39 AM, Till Toenshoff  wrote:

> +1
>
> Tested `make distcheck`:
>
> With SSL.
> MacOS 10.12.1 (16B2555) => OK
>
> With SSL and without.
> Centos 7 => OK
> Debian 8 => OK
> Fedora 23 => OK
> Ubuntu 14 => OK
> Ubuntu 12 => OK
> Ubuntu 15 => OK
> Ubuntu 16 => OK
>
>
> On Nov 7, 2016, at 8:24 PM, Vinod Kone  wrote:
>
> Hi all,
>
>
> Please vote on releasing the following candidate as Apache Mesos 1.0.2.
>
>
> This is a bug fix release.
>
>
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.2-rc3
>
> 
> 
>
>
> The candidate for Mesos 1.0.2 release is available at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc3/mesos-1.0.2.tar.gz
>
>
> The tag to be voted on is 1.0.2-rc3:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.2-rc3
>
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc3/mesos-1.0.2.tar.gz.md5
>
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc3/mesos-1.0.2.tar.gz.asc
>
>
> The PGP key used to sign the release is here:
>
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
>
> The JAR is up in Maven in a staging repository here:
>
> https://repository.apache.org/content/repositories/orgapachemesos-1168
>
>
> Please vote on releasing this package as Apache Mesos 1.0.2!
>
>
> The vote is open until Thu Nov 10 11:22:30 PST 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache Mesos 1.0.2
>
> [ ] -1 Do not release this package because ...
>
>
> Thanks,
>
>
>


Re: [VOTE] Release Apache Mesos 1.1.0 (rc1)

2016-10-24 Thread Yan Xu
-1 (binding)

Hope we can get https://issues.apache.org/jira/browse/MESOS-6446 in? It's
already a blocker for 1.0.2 and it could be annoying for operators.

If we cut another rc, maybe we can get this one (
https://issues.apache.org/jira/browse/MESOS-6455) in too? It's not a
blocker but it fails my build.

Yan

Jiang Yan Xu 

On Mon, Oct 24, 2016 at 11:49 AM, Alexander Rojas <alexan...@mesosphere.io>
wrote:

> +1 (non-binding)
>
> Ubuntu 16.04
>
> ../configure --enable-ssl --enable-libevent && sudo make check
>
> On Mon, Oct 24, 2016 at 3:41 PM, Gastón Kleiman <gas...@mesosphere.io>
> wrote:
>
> > +1 (non-binding), "make check" and also Marathon's integration tests pass
> > on OS X.
> >
> > -Gastón
> >
> > On Tue, Oct 18, 2016 at 10:01 PM, Till Toenshoff <toensh...@me.com>
> wrote:
> >
> > > Hi all,
> > >
> > > Please vote on releasing the following candidate as Apache Mesos 1.1.0.
> > >
> > >
> > > 1.1.0 includes the following:
> > > 
> > > 
> > >   * [MESOS-2449] - **Experimental** support for launching a group of
> > >     tasks via a new `LAUNCH_GROUP` Offer operation. Mesos will guarantee
> > >     that either all tasks or none of the tasks in the group are delivered
> > >     to the executor. Executors receive the task group via a new
> > >     `LAUNCH_GROUP` event.
> > >
> > >   * [MESOS-2533] - **Experimental** support for HTTP and HTTPS health
> > >     checks. Executors may now use the updated `HealthCheck` protobuf to
> > >     implement HTTP(S) health checks. Both default executors (command and
> > >     docker) leverage `curl` binary for sending HTTP(S) requests and
> > >     connect to `127.0.0.1`, hence a task must listen on all interfaces.
> > >     On Linux, For BRIDGE and USER modes, docker executor enters the
> > >     task's network namespace.
> > >
> > >   * [MESOS-3421] - **Experimental** Support sharing of resources across
> > >     containers. Currently persistent volumes are the only resources
> > >     allowed to be shared.
> > >
> > >   * [MESOS-3567] - **Experimental** support for TCP health checks.
> > >     Executors may now use the updated `HealthCheck` protobuf to implement
> > >     TCP health checks. Both default executors (command and docker)
> > >     connect to `127.0.0.1`, hence a task must listen on all interfaces.
> > >     On Linux, For BRIDGE and USER modes, docker executor enters the
> > >     task's network namespace.
> > >
> > >   * [MESOS-4324] - Allow access to persistent volumes as read-only or
> > >     read-write by tasks. Mesos doesn't allow persistent volumes to be
> > >     created as read-only but in 1.1 it starts allow tasks to use the
> > >     volumes as read-only. This is mainly motivated by shared persistent
> > >     volumes but applies to regular persistent volumes as well.
> > >
> > >   * [MESOS-5275] - **Experimental** support for linux capabilities.
> > >     Frameworks or operators now have fine-grained control over the
> > >     capabilities that a container may have. This allows a container to
> > >     run as root, but not have all the privileges associated with the
> > >     root user (e.g., CAP_SYS_ADMIN).
> > >
> > >   * [MESOS-5344] -- **Experimental** support for partition-aware Mesos
> > >     frameworks. In previous Mesos releases, when an agent is partitioned
> > >     from the master and then reregisters with the cluster, all tasks
> > >     running on the agent are terminated and the agent is shutdown. In
> > >     Mesos 1.1, partitioned agents will no longer be shutdown when they
> > >     reregister with the master. By default, tasks running on such agents
> > >     will still be killed (for backward compatibility); however,
> > >     frameworks can opt-in to the new PARTITION_AWARE capability. If they
> > >     do this, their tasks will not be killed when a partition is healed.
> > >     This allows frameworks to define their own policies for how to
> > >     handle partitioned tasks. Enabling the PARTITION_AWARE cap

Re: On Mesos versioning and deprecation policy

2016-10-14 Thread Yan Xu
On Fri, Oct 14, 2016 at 3:37 PM, Yan Xu <xuj...@apple.com> wrote:

> Thanks Alex for starting this!
>
> In addition to comments below, I think it'll be helpful to keep the
> existing versioning doc concise and user-friendly while having a dedicated
> doc for the "implementation details" where precise requirements and
> procedures go. Maybe some duplication/cross-referencing is needed but Mesos
> developers will find the latter much more helpful while the users/framework
> developer will find the former easy to read.
>
> e.g., a similar split:
> https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
> (which has a lot of details on how the kubernetes community is thinking
> about similar issues, which we can learn from)
>
> Jiang Yan Xu 
>
> On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com>
> wrote:
>
>> Folks,
>>
>> There have been a bunch of online [1, 2] and offline discussions about our
>> deprecation and versioning policy. I found that people—including
>> myself—read the versioning doc [3] differently; moreover some aspects are
>> not captured there. I would like to start a discussion around this topic by
>> sharing my confusions and suggestions. This will hopefully help us stay on
>> the same page and have similar expectations. The second goal is to
>> eliminate ambiguities from the versioning doc (thanks Vinod for
>> volunteering to update it).
>>
>
> +1 Let me know if there are things I can help with.
>
>
>>
>> 1. API vs. semantic changes.
>> The current versioning guide treats features (e.g. flags, metrics, endpoints)
>> and API differently: incompatible changes for the former are allowed after
>> 6 month deprecation cycle, while for the latter they require bumping a
>> major version. I suggest we consolidate these policies.
>>
>
> I feel that the distinction is not API vs. semantic changes. A
> backwards-compatible API guarantee should imply backwards-compatible
> semantics (of the API), i.e., if a change in the API doesn't cause the
> message to be dropped on the floor but leads to a behavior change that
> causes problems in the system, it still breaks compatibility.
>
> IMO the distinction is more between:
> - Compatibility between components that are impossible/very unpleasant to
> upgrade in lockstep - high priority for compatibility guarantee.
> - Compatibility between components that are generally bundled (modules) or
> things that usually aren't built into automated tooling (e.g., the /state
> endpoint) - more relaxed for now but we should explicitly exclude them from
> the guarantee.
>
>
>>
>> We should also define and clearly explain what changes require bumping the
>> major version. I have no strong opinion here and would love to hear what
>> people think. The original motivation for maintaining backwards
>> compatibility is to make sure vN schedulers can correctly work with vN API
>> without being updated. But what about semantic changes that do not touch
>> the API? For example, what if we decide to send fewer task health updates to
>> schedulers based on some health policy? It influences the flow of task
>> status updates, should such change be considered compatible? Taking it to
>> an extreme, we may not even be able to fix some bugs because someone may
>> already rely on this behaviour!
>>
>
> API changes should warrant a major version bump. Also the API is not just
> what the machine reads but all the documentation associated with it, right?
> It depends on what the documentation says; what the user _should_ expect.
>
> That said, I feel that these things are hard to talk about in the
> abstract. Even with a guideline, we still need to make case-by-case
> decisions. (e.g., has the documentation precisely defined this
> behavior? If not, is it reasonable for the users to expect some behavior
> because it's common sense? How bad is it if some behavior just changes a
> tiny bit?) Therefore we need to make sure the process for API changes is
> more rigorously defined.
>
> Whether something is a bug depends on whether the API does what it says
> it'll do. The line may sometimes be blurry but in general I don't feel it's
> a problem. If someone is relying on the behavior that is a bug, we should
> still help them fix it but the bug shouldn't count as "our guarantee".
>
>
>>
>> Another tightly related thing we should explicitly call out is
>> upgradability and rollback capabilities inside a major release. Committing
>> to this may significantly li

Re: On Mesos versioning and deprecation policy

2016-10-14 Thread Yan Xu
Thanks Alex for starting this!

In addition to comments below, I think it'll be helpful to keep the
existing versioning doc concise and user-friendly while having a dedicated
doc for the "implementation details" where precise requirements and
procedures go. Maybe some duplication/cross-referencing is needed but Mesos
developers will find the latter much more helpful while the users/framework
developer will find the former easy to read.

e.g., a similar split:
https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
(which has a lot of details on how the kubernetes community is thinking
about similar issues, which we can learn from)

Jiang Yan Xu 

On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com> wrote:

> Folks,
>
> There have been a bunch of online [1, 2] and offline discussions about our
> deprecation and versioning policy. I found that people—including
> myself—read the versioning doc [3] differently; moreover some aspects are
> not captured there. I would like to start a discussion around this topic by
> sharing my confusions and suggestions. This will hopefully help us stay on
> the same page and have similar expectations. The second goal is to
> eliminate ambiguities from the versioning doc (thanks Vinod for
> volunteering to update it).
>

+1 Let me know if there are things I can help with.


>
> 1. API vs. semantic changes.
> The current versioning guide treats features (e.g. flags, metrics, endpoints)
> and API differently: incompatible changes for the former are allowed after
> 6 month deprecation cycle, while for the latter they require bumping a
> major version. I suggest we consolidate these policies.
>

I feel that the distinction is not API vs. semantic changes. A
backwards-compatible API guarantee should imply backwards-compatible
semantics (of the API), i.e., if a change in the API doesn't cause the
message to be dropped on the floor but leads to a behavior change that
causes problems in the system, it still breaks compatibility.

IMO the distinction is more between:
- Compatibility between components that are impossible/very unpleasant to
upgrade in lockstep - high priority for compatibility guarantee.
- Compatibility between components that are generally bundled (modules) or
things that usually aren't built into automated tooling (e.g., the /state
endpoint) - more relaxed for now but we should explicitly exclude them from
the guarantee.


>
> We should also define and clearly explain what changes require bumping the
> major version. I have no strong opinion here and would love to hear what
> people think. The original motivation for maintaining backwards
> compatibility is to make sure vN schedulers can correctly work with vN API
> without being updated. But what about semantic changes that do not touch
> the API? For example, what if we decide to send fewer task health updates to
> schedulers based on some health policy? It influences the flow of task
> status updates, should such change be considered compatible? Taking it to
> an extreme, we may not even be able to fix some bugs because someone may
> already rely on this behaviour!
>

API changes should warrant a major version bump. Also the API is not just
what the machine reads but all the documentation associated with it, right?
It depends on what the documentation says; what the user _should_ expect.

That said, I feel that these things are hard to talk about in the
abstract. Even with a guideline, we still need to make case-by-case
decisions. (e.g., has the documentation precisely defined this
behavior? If not, is it reasonable for the users to expect some behavior
because it's common sense? How bad is it if some behavior just changes a
tiny bit?) Therefore we need to make sure the process for API changes is
more rigorously defined.

Whether something is a bug depends on whether the API does what it says
it'll do. The line may sometimes be blurry but in general I don't feel it's
a problem. If someone is relying on the behavior that is a bug, we should
still help them fix it but the bug shouldn't count as "our guarantee".


>
> Another tightly related thing we should explicitly call out is
> upgradability and rollback capabilities inside a major release. Committing
> to this may significantly limit what we can change within a major release;
> on the other side it will give users more time and a better experience
> about using and maintaining Mesos clusters.
>

According to the versioning doc upgradability depends on whether you depend
on deprecated/removed features.

That paragraph should be explained more precisely:
- "deprecated" means your system won't break but warnings are shown (Maybe
we should use some standard deprecation warning keywords so the operator
can monitor the 

Re: Deprecate MESOS_DIRECTORY executor environment variable

2016-10-07 Thread Yan Xu
The rules for APIs stipulate that the deprecation period doesn't start
until the next major release (2.0). I suppose environment variable changes
don't qualify (or do they? This is how the agent interfaces with the
executor and requires executor developers to change their programs). So
when should it start (1.1 vs. 2.0)?

On Fri, Oct 7, 2016 at 5:33 PM, Jie Yu  wrote:

> https://github.com/apache/mesos/blob/master/docs/versioning.md
>
> "The deprecation period for any given feature will be 6 months. Having a
> set period allows Mesos developers to not indefinitely accrue technical
> debt and allows users time to plan for upgrades."
>
> - Jie
>
> On Fri, Oct 7, 2016 at 5:28 PM, Zameer Manji  wrote:
>
> > Jie,
> >
> > Without commenting on this deprecation, how is this going to work now that
> > Mesos is 1.0?
> >
> > What is the definition of "deprecate" being used here? Is it something that
> > will be removed in Mesos 2.0?
> >
> > On Fri, Oct 7, 2016 at 4:49 PM, Jie Yu  wrote:
> >
> > > Hi,
> > >
> > > Want to initiate a discussion here. Before Mesos containerizer has
> > > container image support (all containers share the same host file system),
> > > $MESOS_DIRECTORY env variable is used to let executor know their sandbox
> > > location.
> > >
> > > Later, we introduced container image support to Mesos containerizer so that
> > > each container can have its own root filesystem. Due to some historical
> > > reason (thermos), we decided to keep $MESOS_DIRECTORY to be the path to the
> > > sandbox on the host filesystem (e.g., `/var/lib/mesos/slaves/...`) even if
> > > the container has its own root filesystem. And introduced a new
> > > $MESOS_SANDBOX to point to the sandbox in the container's root filesystem
> > > (e.g., `/mnt/mesos/sandbox`). If the container does not have a root
> > > filesystem, $MESOS_DIRECTORY == $MESOS_SANDBOX.
> > >
> > > Now, we plan to deprecate $MESOS_DIRECTORY because it'll be really
> > > confusing to executor writers, and it'll be an error if they try to access
> > > $MESOS_DIRECTORY if their container has a root filesystem defined.
> > >
> > > - Jie
> > >
> > > --
> > > Zameer Manji
> > >
> >
>


Re: Deprecate MESOS_DIRECTORY executor environment variable

2016-10-07 Thread Yan Xu
I agree that the executor shouldn't need to know about the sandbox host
path for itself but today it provides a way for the information to be
propagated to the scheduler or external systems.

I don't remember how thermos observer works but I guess the executor
registers itself to the observer and provides it with information about
the host path for its sandbox? I could be incorrect about thermos but this
usage pattern exists elsewhere as well and will break after this change.

Overall I agree that we should deprecate this usage pattern but it's worth
providing solutions to replace it. This sandbox host path information is
available through the agent's /state endpoint but I don't think it's
available via the agent HTTP API? The `GetExecutors` call doesn't return
this path but its `ListFiles` and `ReadFile` calls expect this path. Should
the host path be added to the result of GetExecutors call? (This being just
one example)

On Fri, Oct 7, 2016 at 4:49 PM, Jie Yu  wrote:

> Hi,
>
> Want to initiate a discussion here. Before Mesos containerizer has
> container image support (all containers share the same host file system),
> $MESOS_DIRECTORY env variable is used to let executor know their sandbox
> location.
>
> Later, we introduced container image support to Mesos containerizer so that
> each container can have its own root filesystem. Due to some historical
> reason (thermos), we decided to keep $MESOS_DIRECTORY to be the path to the
> sandbox on the host filesystem (e.g., `/var/lib/mesos/slaves/...`) even if
> the container has its own root filesystem. And introduced a new
> $MESOS_SANDBOX to point to the sandbox in the container's root filesystem
> (e.g., `/mnt/mesos/sandbox`). If the container does not have a root
> filesystem, $MESOS_DIRECTORY == $MESOS_SANDBOX.
>
> Now, we plan to deprecate $MESOS_DIRECTORY because it'll be really
> confusing to executor writers, and it'll be an error if they try to access
> $MESOS_DIRECTORY if their container has a root filesystem defined.
>
> - Jie
>
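
For executor writers, the migration could be as small as the sketch below: prefer `$MESOS_SANDBOX` and fall back to `$MESOS_DIRECTORY` only when the former is unset. The helper name `sandboxPath` is hypothetical, and the fallback assumes the case without a container root filesystem where, per the description above, both variables hold the same path.

```cpp
#include <cstdlib> // std::getenv
#include <string>

// Hypothetical helper for an executor: returns the sandbox path as seen
// from inside the container, or an empty string if neither variable is set.
std::string sandboxPath()
{
  if (const char* sandbox = std::getenv("MESOS_SANDBOX")) {
    return sandbox;
  }

  // Legacy fallback. Only meaningful when the container has no root
  // filesystem of its own, where $MESOS_DIRECTORY == $MESOS_SANDBOX.
  if (const char* directory = std::getenv("MESOS_DIRECTORY")) {
    return directory;
  }

  return "";
}
```

This covers only the executor's own view of its sandbox; it does not address the other use raised in this thread, i.e. handing the host path to a scheduler or an external system.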


Re: Question about the deprecated policy after 1.0

2016-09-22 Thread Yan Xu
Thanks @haosdent. Sorry, there have been offline chats and I was waiting for
them to circle back to the list.

Comments inlined. Besides this particular issue I think we all agree that we 
can improve the versioning doc to avoid confusion in the future. I'll send a RR 
for that but let's focus on this particular issue first.

> On Sep 14, 2016, at 7:42 PM, haosdent <haosd...@gmail.com> wrote:
> 
> Thx @yan's update!
> 
> >The remaining issue we'd like to discuss with the community is, of the 
> >things we plan to support in v1 API and hope to deprecate in Mesos 2.0, 
> >should we slate them for deprecation before we have defined their 
> >replacement?
> 
> IMO, if we are sure a feature will be deprecated in Mesos 2.0, we should
> deprecate it immediately even if we cannot give a clear replacement at that
> time.
> Then users would know that the feature is not recommended and avoid using it.

There have been misunderstandings about whether the v1 HTTP health check API is
supported but I thought that the eventual conclusion was that we are supporting
it in 1.x.
See
http://mesos.slackarchive.io/health-check/-/1472748463.51/1473696580.000102/147323563288/

But this is not even about it.

If the API is totally not supposed to be used, no need to wait until 2.0, 
remove it today.
If the API is supported in v1 and we are not sunsetting this feature, we need 
to provide an "upgrade path" whenever we plan to replace it with something else 
(in addition to conforming to the process here 
<http://mesos.apache.org/documentation/latest/versioning/>)

In either case, "deprecate this API in 2.0 without an alternative" doesn't make 
sense.

This is not a hypothetical issue - there are people who depend on this field.

> 
> Otherwise, we would encounter the same problem again:
> After a few releases, we finish the better replacement of that old feature.
> Unfortunately, we find it has been used by a lot of users because we didn't
> mark it deprecated before. Then we have to deprecate it in Mesos 3.0.
> A worse case is that we forget and ignore that TODO item as time goes by, and
> leave this tech debt to other folks. This is what we should avoid now if we can.

Think about the users who depend on this and then find out that the alternative
API didn't make it into v2 but the old API was removed in v2. What should
they do?

I don't think we should ever do this.

Like I said, it's perfectly fine if we want to deprecate this in 2.0. To do
that, let's make sure the alternative API makes it into 2.0 first. If we fail
to achieve that, why shouldn't we postpone the deprecation further?

> 
> Based on the above two points, I think we should slate the feature for
> deprecation although we haven't defined its replacement.
> 
> On Thu, Sep 15, 2016 at 4:31 AM, Yan Xu <xuj...@apple.com> wrote:
> To follow up on this, after discussions we (contributors and reviewers of 
> MESOS-6110) have agreed to support the existing HTTP health check API in 
> Mesos 1.x. (See https://reviews.apache.org/r/51803)
> 
> The remaining issue we'd like to discuss with the community is, of the things 
> we plan to support in v1 API and hope to deprecate in Mesos 2.0, should we 
> slate them for deprecation before we have defined their replacement?
> 
> Two contrasting examples in /r/51803:
> `HTTPCheckInfo::type`:
> - v1 API: setting `type` is not required.
> - Proposal: Deprecate the above in 2.0, i.e., requiring it to be set.
> - This is clearly OK because it's clear to the users how to migrate to the 
> new API -> just set the `type`.
> 
> `HTTPCheckInfo::statuses`:
> - v1 API: This field can be set, even though not acted on by Mesos default 
> executors, they can be used by custom executors.
> - Proposal: Deprecate the above in 2.0, i.e., removing it.
> - IMO we cannot slate this API for deprecation in 2.0 right now because 
> there's no replacement defined yet. I think ultimately we don't want to be in 
> the situation where we remove `statuses` without replacing it with something 
> else. If that's the case, I don't think we need to rush to decide that it has 
> to be deprecated in 2.0. What if the replacement doesn't make it into 2.0?
> 
> Feel free to provide your feedback here or on 
> https://reviews.apache.org/r/51803 where more context may help clarify things.
> 
> 
> > On Sep 6, 2016, at 12:05 PM, Yan Xu <xuj...@apple.com> wrote:
> >
> > Should we think of this not as "whether this change should be s

Re: To Mesos enthusiasts in China: about this year's MesosCon in Hangzhou

2016-09-19 Thread Yan Xu
Would surveymonkey.com be more accessible?

On Monday, September 19, 2016, Yan Yan YY Hu  wrote:

> We need a proxy to access Google in mainland China. But I guess this won't
> be a problem for us, since there are many tools that can help get across the
> Great Firewall :)
>
> Best regards!
> **
> Yanyan Hu(胡彦彦) Ph.D.
> Cloud Infrastructure & Technology Team
> Building 19 Zhongguancun Software Park, 8 Dongbeiwang WestRoad, Haidian
> District, Beijing,P.R.C.100094
> E-mail: yanya...@cn.ibm.com
> 
> Tel: 8610-58748025
> ***
>
> From: David Greenberg
> To: user
> Cc: "dev@mesos.apache.org" <dev@mesos.apache.org>
> Date: 2016-09-20 10:48 AM
> Subject: Re: To Mesos enthusiasts in China: about this year's MesosCon in Hangzhou
> --
>
>
>
> Specifically, we want to make sure that everyone will be able to access
> the forms if we put them on Google forms/docs.
>
> On Mon, Sep 19, 2016 at 7:04 PM Hechen Gao <hechen@autodesk.com> wrote:
>
>Hey David,
>
>I would love to contribute to your survey about the MesosCon, please
>count me in.
>
>Best regards,
>*Hechen Gao*
>Senior Software Engineer, Cloud Platforms - Engineering Core Services
>
>*Autodesk, Inc.*
>The Landmark @ One Market, Suite 500
>San Francisco, CA  94105
>*www.autodesk.com* 
>
>
>On Sep 19, 2016, at 5:57 PM, tommy xiao <xia...@gmail.com> wrote:
>
>  +1
>
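>  On September 20, 2016 at 8:22 AM, David Greenberg <dsg123456...@gmail.com> wrote:
>  As the chair of this MesosCon, I hope you will get to hear the talks and
>  presentations you like at this year's MesosCon in Hangzhou. So we are
>  preparing to send out a Google Forms survey, which will help us better
>  decide on the content of the talks and presentations. We hope you will
>  actively take part in the survey. Your opinions are very important to us.
>
>
>  Respectfully, David Greenberg, co-chair of MesosCon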
>
>
>
>  --
>  Deshi Xiao
>  Twitter: xds2000
>  E-mail: xiaods(AT)*gmail.com* 
>
>
>
>
>

-- 
Sent from mobile


Re: Question about the deprecated policy after 1.0

2016-09-14 Thread Yan Xu
To follow up on this, after discussions we (contributors and reviewers of 
MESOS-6110) have agreed to support the existing HTTP health check API in Mesos 
1.x. (See https://reviews.apache.org/r/51803)

The remaining issue we'd like to discuss with the community is: of the things 
we plan to support in the v1 API and hope to deprecate in Mesos 2.0, should we 
slate them for deprecation before we have defined their replacement?

Two contrasting examples in /r/51803: 
`HTTPCheckInfo::type`:
- v1 API: setting `type` is not required.
- Proposal: Deprecate the above in 2.0, i.e., requiring it to be set.
- This is clearly OK because it's clear to the users how to migrate to the new 
API -> just set the `type`.

`HTTPCheckInfo::statuses`:
- v1 API: This field can be set; even though it is not acted on by the Mesos 
default executors, it can be used by custom executors.
- Proposal: Deprecate the above in 2.0, i.e., removing it.
- IMO we cannot slate this API for deprecation in 2.0 right now because there's 
no replacement defined yet. I think ultimately we don't want to be in the 
situation where we remove `statuses` without replacing it with something else. 
If that's the case, I don't think we need to rush to decide that it has to be 
deprecated in 2.0. What if the replacement doesn't make it into 2.0?

Feel free to provide your feedback here or on 
https://reviews.apache.org/r/51803 where more context may help clarify things. 
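
To make the `type` example above concrete, here is a minimal sketch (in Python rather than Mesos's C++) of a field that stays optional at the message level while a code-level check warns in 1.x and rejects in 2.0. The `HTTPCheckInfo` dataclass, the `validate` helper, and the `enforce_type` flag are hypothetical illustrations, not Mesos APIs.

```python
import warnings
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HTTPCheckInfo:
    """Hypothetical stand-in for the message discussed above (not the real protobuf)."""
    type: Optional[str] = None                          # optional at the message level
    statuses: List[int] = field(default_factory=list)   # not acted on by this sketch


def validate(check: HTTPCheckInfo, enforce_type: bool = False) -> Optional[str]:
    """Return an error string, or None if the check is acceptable.

    enforce_type=False models 1.x (accept, but warn);
    enforce_type=True models the proposed 2.0 behavior (reject).
    """
    if check.type is None:
        if enforce_type:
            return "HTTPCheckInfo.type must be set"
        warnings.warn(
            "Leaving HTTPCheckInfo.type unset is slated for deprecation",
            DeprecationWarning,
        )
    return None
```

The migration path is visible in the sketch: a caller that sets `type` passes under both modes, which is why this deprecation is easy to announce early. No equivalent exists for `statuses` until a replacement is defined.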


> On Sep 6, 2016, at 12:05 PM, Yan Xu <xuj...@apple.com> wrote:
> 
> Should we think of this not as "whether this change should be subject to the 
> current policy" but rather "whether this change presents an exception case 
> that shouldn't be subject to the current policy"? 
> 
> We shouldn't make exceptions liberally but given that the community as a 
> whole is new to the 1.0 world, we should allow the policy to evolve into 
> something that works for both the developers and the frameworks 
> writers/users. If we discover exceptions we should amend the policy document.
> 
> The current policy is very clear on this (as Silas has cited): "The 
> deprecation clock for vN-1 API will start as soon as we release “N.0.0” 
> version of Mesos. We will strive to give enough time (e.g., 6 months) for 
> frameworks/operators to upgrade to vN API before we stop supporting vN-1 
> API." 
> 
> In this case it translates to "Start the deprecation for this change with 
> Mesos 2.0 and strive to not break it until 2.0.0 release date + 6 months".
> 
> Can we list the technical challenges that would make adhering to this policy 
> not worthwhile? If this is the case, how should we change the policy document 
> accordingly?
> 
> Separately from the details of this change but motivated by it, I think we 
> need to improve the policy document on:
> - Clearly state what in the versioned protobufs doesn't constitute the 
>   official API. IMO we should give the developers the flexibility to add to 
>   the API before the feature is fully developed (and they are not subject to 
>   the policy) but in the policy doc we should point out the words to look for 
>   to spot such exceptions, e.g., "Unimplemented. DO NOT USE." or 
>   "Experimental. Unstable API."
> - Clearly state what to do with deprecations by developers and users. People 
>   will likely appreciate early heads-ups so perhaps the developer should be 
>   able to start warning people as soon as the change is made, e.g., print a 
>   warning message that references a future release: "This API/flag will be 
>   deprecated in Mesos 2.0... "
> - To help folks discover deprecations we can have a live document that lists 
>   the deprecated features by version. Currently the CHANGELOG file only lists 
>   deprecations in the next release so there's no place to put the 
>   deprecations for 2.0 when we are only at 1.1.0 (WIP).
> 
> Thoughts?
> 
> Jiang Yan Xu 
> 
> On Tue, Sep 6, 2016 at 6:40 AM, Silas Snider <swsni...@apple.com> wrote:
> Responses inline
> 
> > On Sep 6, 2016, at 1:33 AM, haosdent <haosd...@gmail.com> wrote:
> >
> > Hi, Silas. Thanks a lot for helping to test the health check changes recently.
> >
> > As I understand your email, you mentioned two problems:
> >
> > 1. The bug that broke existing HTTP/command health checks, caused by r50812 
> > <https://reviews.apache.org/r/50812> and r50996 
> > <https://reviews.apache.org/r/50996>
> >
> > >It is now true that even with the proposed change (51560), we will still 
> > >get tasks rejected with TASK_ERROR in 1.1.

Re: Question about the deprecated policy after 1.0

2016-09-06 Thread Yan Xu
Should we think of this *not* as "whether this change should be subject to
the current policy" but rather "whether this change presents an *exception
case* that *shouldn't* be subject to the current policy"?

We shouldn't make exceptions liberally but given that the community as a
whole is new to the 1.0 world, we should allow the policy to evolve into
something that works for both the developers and the frameworks
writers/users. If we discover exceptions we should amend the policy
document.

The current policy is very clear on this (as Silas has cited): "The
deprecation clock for vN-1 API will start as soon as we release “N.0.0”
version of Mesos. We will strive to give enough time (e.g., 6 months) for
frameworks/operators to upgrade to vN API before we stop supporting vN-1
API."

In this case it translates to "Start the deprecation for this change with
Mesos 2.0 and strive to not break it until 2.0.0 release date + 6 months".

Can we list the technical challenges that would make adhering to this
policy not worthwhile? If this is the case, how should we change the policy
document accordingly?

Separately from the details of this change but motivated by it, I think we
need to improve the policy document on:

   - Clearly state what in the versioned protobufs doesn't constitute the
   official API. IMO we should give the developers the flexibility to add
   to the API before the feature is fully developed (and they are not subject
   to the policy) but in the policy doc we should point out the words to look
   for to spot such exceptions, e.g., "Unimplemented. DO NOT USE."
   or "Experimental. Unstable API."
   - Clearly state what to do with deprecations by developers and users.
   People will likely appreciate early heads-ups so perhaps the developer
   should be able to start warning people as soon as the change is made, e.g.,
   print a warning message that references a *future* release: "This
   API/flag will be deprecated in Mesos 2.0... "
   - To help folks discover deprecations we can have a live document that
   lists the deprecated features *by version*. Currently the CHANGELOG file
   only lists deprecations in the next release so there's no place to put
   the deprecations for 2.0 when we are only at 1.1.0 (WIP).
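
As a rough illustration of the last two points and of the "release date + 6 months" reading of the policy, the following Python sketch groups deprecations by the release that starts their clock, builds the early warning text, and computes the earliest removal date. The `Deprecation` record, the helper names, and the 182-day approximation of "6 months" are assumptions made for this sketch; nothing here is an existing Mesos tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List

# Assumption: "6 months" approximated as 182 days for this sketch.
GRACE_PERIOD = timedelta(days=182)


@dataclass(frozen=True)
class Deprecation:
    """Hypothetical record for one entry in a 'live' deprecation document."""
    feature: str
    deprecated_in: str  # release whose N.0.0 date starts the deprecation clock


def by_version(items: List[Deprecation]) -> Dict[str, List[str]]:
    """Group deprecated features by the release that deprecates them."""
    grouped: Dict[str, List[str]] = {}
    for item in items:
        grouped.setdefault(item.deprecated_in, []).append(item.feature)
    return grouped


def warning_message(item: Deprecation) -> str:
    """Early heads-up text that references a future release."""
    return f"{item.feature} will be deprecated in Mesos {item.deprecated_in}."


def earliest_removal(release_date: date) -> date:
    """Earliest date the old behavior may stop working under the policy."""
    return release_date + GRACE_PERIOD
```

A document generated from `by_version` would give 2.0 deprecations a home even while the project is at 1.1.0, which the per-release CHANGELOG cannot do.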

Thoughts?

Jiang Yan Xu 

On Tue, Sep 6, 2016 at 6:40 AM, Silas Snider <swsni...@apple.com> wrote:

> Responses inline
>
> > On Sep 6, 2016, at 1:33 AM, haosdent <haosd...@gmail.com> wrote:
> >
> > Hi, Silas. Thanks a lot for helping to test the health check changes recently.
> >
> > As I understand your email, you mentioned two problems:
> >
> > 1. The bug that broke existing HTTP/command health checks, caused by r50812
> > <https://reviews.apache.org/r/50812> and r50996
> > <https://reviews.apache.org/r/50996>
> >
> > >It is now true that even with the proposed change (51560), we will
> > >still get tasks rejected with TASK_ERROR in 1.1.0, despite the same exact
> > >code working in 1.0.0.
> > >Even in the case of the command health checks, which are once again
> > >supported in 51560, we now get deprecation warnings, suggesting that mesos
> > >will again break us in 1.4.
> >
> > As you mentioned, this is a bug and we definitely should fix it before
> > releasing 1.1.0.
> > I updated r51560 <https://reviews.apache.org/r/51560> yesterday
> > and verified that it fixes the problem via r51635. As you can see in
> > r51560 <https://reviews.apache.org/r/51560>, we made sure the
> > protobuf is compatible again and didn't lose any
> > fields. Would you help double-check whether it fixes your problem when you
> > are free?
> > It would be highly appreciated if you could help verify it.
> >
> > After this bug fix, we can ensure that all tasks with HTTP/command health
> > checks are not broken when upgrading to 1.1.0.
> >
>
> I see those changes now (I’m very very bad at the review board UI, so I’m
> sorry if it was always there and I missed it somehow).
>
> > 2. Should we make the `HealthCheck::type` required after v2 ?
> >
> > To be honest, I think 6 months should be enough and it also should be
> > changed in v1 because it is a minor change and we didn't make it `required`
> > at the protobuf message level. We are still keeping it `optional` in the
> > protobuf message definition and adding a check for it in the Mesos code.
> > But your concerns make sense as well, so let's see what other
> > users/developers say to see if we can reach an agreement on this.
> >
>
> This is an important point. It doesn’t make sense to me that the
> compatibility policy is talking about only whether a protobuf field is
> optional or required — it seems to me that any change that takes

Re: [VOTE] Release Apache Mesos 1.0.1 (rc1)

2016-08-15 Thread Yan Xu
+1 (binding)

Ran make check on macOS 10.11.5 and clang-703.0.31.
Additionally, although not rigorous enough as a proof, we deployed a version of 
head (> 1.0) that includes fixes in this release and it's working fine (checked 
that webUI redirect worked and our test workloads ran).

Yan

> On Aug 13, 2016, at 6:38 AM, haosdent  wrote:
> 
> +1 (non-binding)
> 
> Run `sudo make check` on CentOS 7.2 and Ubuntu 14.04
> 
> On Sat, Aug 13, 2016 at 6:07 AM, Kapil Arya wrote:
> +1 (binding)
> 
> You can find the rpm/deb packages here:
>   http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.1-rc1 
> 
> 
> The following docker tags (built off of ubuntu 14.04) are also available:
> mesosphere/mesos:1.0.1-rc1
> mesosphere/mesos-master:1.0.1-rc1
> mesosphere/mesos-slave:1.0.1-rc1
> 
> Kapil
> 
> On Fri, Aug 12, 2016 at 4:39 PM, Alex Rukletsov wrote:
> +1 (binding)
> 
> make check on Mac OS 10.11.6 with apple clang-703.0.31.
> 
> DockerFetcherPluginTest.INTERNET_CURL_FetchImage is flaky (MESOS-4570), but
> this does not seem to be a regression or a blocker.
> 
> On Fri, Aug 12, 2016 at 10:30 PM, Radoslaw Gruchalski wrote:
> 
> > I am trying to build Mesos 1.0.1 for Centos 7 in a Docker container but
> > I'm hitting this: https://issues.apache.org/jira/browse/MESOS-5925.
> >
> > Kind regards,
> >
> > Radek Gruchalski
> > ra...@gruchalski.com 
> > +4917685656526 
> >
> > *Confidentiality:*
> > This communication is intended for the above-named person and may be
> > confidential and/or legally privileged.
> > If it has come to you in error you must take no action based on it, nor
> > must you copy or show it to anyone; please delete/destroy and inform the
> > sender immediately.
> >
> > On Thu, Aug 11, 2016 at 2:32 AM, Vinod Kone wrote:
> >
> >> Hi all,
> >>
> >>
> >> Please vote on releasing the following candidate as Apache Mesos 1.0.1.
> >>
> >>
> >> The CHANGELOG for the release is available at:
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.1-rc1
> >>
> >>
> >> The candidate for Mesos 1.0.1 release is available at:
> >>
> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos-1.0.1.tar.gz 
> >> 
> >>
> >>
> >> The tag to be voted on is 1.0.1-rc1:
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.1-rc1 
> >> 
> >>
> >>
> >> The MD5 checksum of the tarball can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos-1.0.1.tar.gz.md5
> >>
> >>
> >> The signature of the tarball can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos-1.0.1.tar.gz.asc
> >>
> >>
> >> The PGP key used to sign the release is here:
> >>
> >> https://dist.apache.org/repos/dist/release/mesos/KEYS 
> >> 
> >>
> >>
> >> The JAR is up in Maven in a staging repository here:
> >>
> >> https://repository.apache.org/content/repositories/orgapachemesos-1155 
> >> 
> >>
> >>
> >> Please vote on releasing this package as Apache Mesos 1.0.1!
> >>
> >>
> >> The vote is open until Mon Aug 15 17:29:33 PDT 2016 and passes if a
> >> majority of at least 3 +1 PMC votes are cast.
> >>
> >>
> >> [ ] +1 Release this package as Apache Mesos 1.0.1
> >>
> >> [ ] -1 Do not release this package because ...
> >>
> >>
> >> Thanks,
> >>
> >
> >
> 
> 
> 
> 
> -- 
> Best Regards,
> Haosdent Huang



Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Yan Xu
I am OK with withdrawing -1 but I feel it's more prudent to go with 1b) or 2c) 
or the reason Jie mentioned. If we go with 1a) let's make sure we call out the 
known issue.

> On Jul 26, 2016, at 7:09 PM, Adam Bordelon <a...@mesosphere.io> wrote:
> 
> I don't like the idea of 2) bypassing the 72 hour voting period, so I'd
> suggest either we:
> 1a) ask Yan to cancel his -1 so we can cut 1.0.0 today and blog about it,
> then cut 1.0.1 soon after with this and other fixes.
> 1b) cut an rc5 now and the blogs posted tomorrow mention the rc rather than
> an official release. After 72hrs we can hopefully call rc5 the official 1.0
> (or maybe more blockers come up). We could have more blogs/press then about
> the official 1.0 release.
> 1c) push the press releases and announcements out a few more days. Not sure
> if this is possible at this point?
> I'd prefer 1c) if possible, or a/b otherwise.
> 
> On Tue, Jul 26, 2016 at 6:52 PM, Benjamin Hindman <
> benjamin.hind...@gmail.com> wrote:
> 
>> I agree with Vinod that we should go with option 1. I think redirect is a
>> valuable feature but it's not imperative for the operation of the cluster.
>> 
>> On Tue, Jul 26, 2016 at 5:39 PM, Vinod Kone <vinodk...@apache.org> wrote:
>> 
>>> We've the ASF press wire and other community blog posts lined up to be
>>> posted tomorrow, so it will be really hard to tell all those folks to
>>> postpone it this late. I've a couple options that I want to propose
>>> 
>>> 1) Fix the webui bug in 1.0.1 which we will cut as soon as we fix this
>>> bug.
>>> 
>>> 2) Try to fix the bug in the next couple hours, cut rc5, and vote it in
>>> tonight without doing the typical 72 hour voting period.
>>> 
>>> 
>>> I'm personally leaning towards 1) given the timing and the nature of the
>>> bug. What do others think? PMC?
>>> 
>>> On Tue, Jul 26, 2016 at 4:08 PM, Yan Xu <xuj...@apple.com> wrote:
>>> 
>>>> I don't mind if it's shepherded by folks with more front-end expertise.
>>>> Actually my original suggested solution on
>>>> https://issues.apache.org/jira/browse/MESOS-5911 seemed incorrect.
>>>> Let's discuss the actual fix on the ticket, I feel that a short term fix
>>>> shouldn't be more than a few lines to unblock the release.
>>>> 
>>>> On Jul 26, 2016, at 3:26 PM, Jie Yu <yujie@gmail.com> wrote:
>>>> 
>>>> Yan, are you going to shepherd the fix for this one? If yes, when do you
>>>> think it can be done?
>>>> 
>>>> - Jie
>>>> 
>>>> On Tue, Jul 26, 2016 at 3:05 PM, Yan Xu <xuj...@apple.com> wrote:
>>>> 
>>>> -1
>>>> 
>>>> We tested it in our testing environment but webUI redirect didn't work.
>>>> We
>>>> filed: https://issues.apache.org/jira/browse/MESOS-5911
>>>> 
>>>> Given that webUI is the portal for Mesos clusters I feel that we should
>>>> at
>>>> least have a basic fix (more context in the JIRA) before release 1.0.
>>>> Thoughts?
>>>> 
>>>> On Jul 26, 2016, at 2:52 PM, Kapil Arya <ka...@mesosphere.io> wrote:
>>>> 
>>>> +1 (binding)
>>>> 
>>>> OpenSUSE Tumbleweed:
>>>>   ./configure --disable-java --disable-python && make check
>>>> 
>>>> On Tue, Jul 26, 2016 at 4:44 PM, Zhitao Li <zhitaoli...@gmail.com>
>>>> wrote:
>>>> 
>>>> Also tested:
>>>> 
>>>> make check passes on OS X
>>>> 
>>>> One thing I found when testing RC4 debian with Aurora integration test
>>>> suite (on its master) is that a scheduler that previously expected GPU
>>>> resources will not receive offers without the new `GPU_RESOURCES`
>>>> capability, even if it's the only scheduler.
>>>> 
>>>> Given that GPU support is not technically released until 1.0, I don't
>>>> consider this a blocker, but it might be surprising to people
>>>> already testing GPU support.
>>>> 
>>>> On Tue, Jul 26, 2016 at 12:45 PM, Benjamin Mahler <bmah...@apache.org>
>>>> wrote:
>>>> 
>>>> +1 (binding)
>>>> 
>>>> OS X 10.11.6
>>>> ./configure --disable-python --disable-java
>>>> make check
>>>> 
>>>> On Tue, Jul 26, 2016 at 10:24 AM, Greg Mann <g...@mesosphere.io> wrote:
>>>> 
>>>> +

Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Yan Xu
I don't mind if it's shepherded by folks with more front-end expertise. Actually 
my original suggested solution on 
https://issues.apache.org/jira/browse/MESOS-5911 seemed incorrect. Let's 
discuss the actual fix on the ticket; I feel that a short-term fix shouldn't be 
more than a few lines to unblock the release.

> On Jul 26, 2016, at 3:26 PM, Jie Yu <yujie@gmail.com> wrote:
> 
> Yan, are you going to shepherd the fix for this one? If yes, when do you
> think it can be done?
> 
> - Jie
> 
> On Tue, Jul 26, 2016 at 3:05 PM, Yan Xu <xuj...@apple.com> wrote:
> 
>> -1
>> 
>> We tested it in our testing environment but webUI redirect didn't work. We
>> filed: https://issues.apache.org/jira/browse/MESOS-5911
>> 
>> Given that webUI is the portal for Mesos clusters I feel that we should at
>> least have a basic fix (more context in the JIRA) before release 1.0.
>> Thoughts?
>> 
>> On Jul 26, 2016, at 2:52 PM, Kapil Arya <ka...@mesosphere.io> wrote:
>> 
>> +1 (binding)
>> 
>> OpenSUSE Tumbleweed:
>>   ./configure --disable-java --disable-python && make check
>> 
>> On Tue, Jul 26, 2016 at 4:44 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>> 
>>> Also tested:
>>> 
>>> make check passes on OS X
>>> 
>>> One thing I found when testing RC4 debian with Aurora integration test
>>> suite (on its master) is that a scheduler that previously expected GPU
>>> resources will not receive offers without the new `GPU_RESOURCES`
>>> capability, even if it's the only scheduler.
>>> 
>>> Given that GPU support is not technically released until 1.0, I don't
>>> consider this a blocker, but it might be surprising to people
>>> already testing GPU support.
>>> 
>>> On Tue, Jul 26, 2016 at 12:45 PM, Benjamin Mahler <bmah...@apache.org>
>>> wrote:
>>> 
>>>> +1 (binding)
>>>> 
>>>> OS X 10.11.6
>>>> ./configure --disable-python --disable-java
>>>> make check
>>>> 
>>>> On Tue, Jul 26, 2016 at 10:24 AM, Greg Mann <g...@mesosphere.io> wrote:
>>>> 
>>>>> +1 (non-binding)
>>>>> 
>>>>> * Ran `sudo make distcheck` successfully on CentOS 7.1 with only one
>>>> test
>>>>> failure: ExamplesTest.PythonFramework fails for me the first time it's
>>>>> executed as part of the whole test suite, and then succeeds on
>>>> subsequent
>>>>> executions. I'm investigating further, and will file a ticket if
>>>> necessary.
>>>>> * Ran the upgrade testing script successfully from 0.28.2 -> 1.0.0-rc4
>>>>> 
>>>>> Cheers,
>>>>> Greg
>>>>> 
>>>>> On Tue, Jul 26, 2016 at 1:58 AM, haosdent <haosd...@gmail.com> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> * make check in CentOS 7.2
>>>>>> * make check in Ubuntu 14.04
>>>>>> * test upgrade from 0.28.2 to 1.0.0-rc4
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 26, 2016 at 8:33 AM, Kapil Arya <ka...@mesosphere.io>
>>>> wrote:
>>>>>> 
>>>>>>> One can find the deb/rpm packages here:
>>>>>>> 
>>>>>> http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.0-rc4
>>>>>>> 
>>>>>>> And here are the corresponding docker images based off of Ubuntu
>>>> 14.04:
>>>>>>>    mesosphere/mesos:1.0.0-rc4
>>>>>>>    mesosphere/mesos-master:1.0.0-rc4
>>>>>>>    mesosphere/mesos-slave:1.0.0-rc4
>>>>>>> 
>>>>>>> Kapil
>>>>>>> 
>>>>>>> On Sat, Jul 23, 2016 at 1:40 AM, Vinod Kone <vinodk...@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Please vote on releasing the following candidate as Apache Mesos
>>>>>> 1.0.0.
>>>>>>>> 
>>>>>>>> *The vote is open until Tue Jul 25 11:00:00 PDT 2016 and passes
>>>> if a
>>>>>>>> majority of at least 3 +1 PMC votes are cast.*
>>>>>>>> 
>>>>>>>> 1.0.0 includes 

Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Yan Xu
-1

We tested it in our testing environment but webUI redirect didn't work. We 
filed: https://issues.apache.org/jira/browse/MESOS-5911 


Given that webUI is the portal for Mesos clusters I feel that we should at 
least have a basic fix (more context in the JIRA) before release 1.0. Thoughts?

> On Jul 26, 2016, at 2:52 PM, Kapil Arya  wrote:
> 
> +1 (binding)
> 
> OpenSUSE Tumbleweed:
> ./configure --disable-java --disable-python && make check
> 
> On Tue, Jul 26, 2016 at 4:44 PM, Zhitao Li wrote:
> Also tested:
> 
> make check passes on OS X
> 
> One thing I found when testing RC4 debian with Aurora integration test suite 
> (on its master) is that a scheduler that previously expected GPU resources 
> will not receive offers without the new `GPU_RESOURCES` capability, even if 
> it's the only scheduler.
> 
> Given that GPU support is not technically released until 1.0, I don't 
> consider this a blocker, but it might be surprising to people 
> already testing GPU support.
> 
> On Tue, Jul 26, 2016 at 12:45 PM, Benjamin Mahler wrote:
> +1 (binding)
> 
> OS X 10.11.6
> ./configure --disable-python --disable-java
> make check
> 
> On Tue, Jul 26, 2016 at 10:24 AM, Greg Mann wrote:
> 
> > +1 (non-binding)
> >
> > * Ran `sudo make distcheck` successfully on CentOS 7.1 with only one test
> > failure: ExamplesTest.PythonFramework fails for me the first time it's
> > executed as part of the whole test suite, and then succeeds on subsequent
> > executions. I'm investigating further, and will file a ticket if necessary.
> > * Ran the upgrade testing script successfully from 0.28.2 -> 1.0.0-rc4
> >
> > Cheers,
> > Greg
> >
> > On Tue, Jul 26, 2016 at 1:58 AM, haosdent wrote:
> >
> >> +1
> >>
> >> * make check in CentOS 7.2
> >> * make check in Ubuntu 14.04
> >> * test upgrade from 0.28.2 to 1.0.0-rc4
> >>
> >>
> >> On Tue, Jul 26, 2016 at 8:33 AM, Kapil Arya wrote:
> >>
> >> > One can find the deb/rpm packages here:
> >> >
> >> http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.0-rc4 
> >> 
> >> >
> >> > And here are the corresponding docker images based off of Ubuntu 14.04:
> >> > mesosphere/mesos:1.0.0-rc4
> >> > mesosphere/mesos-master:1.0.0-rc4
> >> > mesosphere/mesos-slave:1.0.0-rc4
> >> >
> >> > Kapil
> >> >
> >> > On Sat, Jul 23, 2016 at 1:40 AM, Vinod Kone wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > >
> >> > > Please vote on releasing the following candidate as Apache Mesos
> >> 1.0.0.
> >> > >
> >> > > *The vote is open until Tue Jul 25 11:00:00 PDT 2016 and passes if a
> >> > > majority of at least 3 +1 PMC votes are cast.*
> >> > >
> >> > > 1.0.0 includes the following:
> >> > >
> >> > >   * Scheduler and Executor v1 HTTP APIs are now considered stable.
> >> > >
> >> > >   * [MESOS-4791] - **Experimental** support for v1 Master and Agent APIs.
> >> > >     These APIs let operators and services (monitoring, load balancers)
> >> > >     send HTTP requests to '/api/v1' endpoint on master or agent. See
> >> > >     `docs/operator-http-api.md` for details.
> >> > >
> >> > >   * [MESOS-4828] - **Experimental** support for a new `disk/xfs` isolator
> >> > >     has been added to isolate disk resources more efficiently. Please
> >> > >     refer to docs/mesos-containerizer.md for more details.
> >> > >
> >> > >   * [MESOS-4355] - **Experimental** support for Docker volume plugin. We
> >> > >     added a new isolator 'docker/volume' which allows users to use
> >> > >     external volumes in Mesos containerizer. Currently, the isolator
> >> > >     interacts with the Docker volume plugins using a tool called
> >> > >     'dvdcli'. By speaking the Docker volume plugin API, most of the
> >> > >     Docker volume plugins are supported.
> >> > >
> >> > >   * [MESOS-4641] - **Experimental** A new network isolator, the
> >> > >     `network/cni` isolator, has been introduced in the
> >> > >     `MesosContainerizer`. The
> >> > >     `network/cni` isolator implements the Container Network 

Re: getting added to contributors

2016-07-12 Thread Yan Xu
I don't think it's a requirement, but I agree we should make that explicit
so it's not perceived as one. (Vinod already made it clear in the initial
announcement, but we could also be explicit when asking individual
contributors whether they'd like to be added.)

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Tue, Jul 12, 2016 at 1:27 AM, Neil Conway <neil.con...@gmail.com> wrote:

> Do we really want everyone who wants to be assigned a JIRA to also add
> themselves to the YAML file? To me, this adds another step to a
> contribution process that probably has too many steps already.
>
> Neil
>
> On Mon, Jul 11, 2016 at 7:31 PM, Vinod Kone <vinodk...@apache.org> wrote:
> > Welcome to the community!
> >
> > Mind sending a PR to add yourself to
> > https://github.com/apache/mesos/blob/master/docs/contributors.yaml ?
> >
> > On Mon, Jul 11, 2016 at 10:28 AM, Lawrence Wu <lawren...@twitter.com.invalid> wrote:
> >
> >> Hi, I will be working on https://issues.apache.org/jira/browse/MESOS-5376.
> >> idownes already added me as a contributor but I'm sending this email just
> >> for reference.
> >>
>


Re: Persistent volume ownership issue

2016-06-21 Thread Yan Xu
+1 if no one is relying on the old behavior.

Jiang Yan Xu 

On Mon, Jun 20, 2016 at 11:25 PM, Jie Yu <yujie@gmail.com> wrote:

> Hi folks,
>
> Currently, the ownership of persistent volumes is set to be the same
> as the sandbox. In the implementation, we call `chown -R` on the persistent
> volume to match that of the sandbox each time before we mount it into the
> container.
>
> Recently, we realized that this behavior is not ideal, especially if a
> task created some files in the persistent volume and the owner of those
> files is different from the task's user. For instance, a task is
> running under root and it creates some database files under user 'database'
> and launches the database process under user 'database'. When the database
> process is restarted by the scheduler, the current behavior is that
> we'll do a 'chown -R root.root' on the persistent volume, causing the
> database files to be chowned to 'root'.
>
> The true fix for this problem is to allow frameworks to explicitly specify
> the owner of persistent volumes during creation. This is captured in this
> ticket:
> https://issues.apache.org/jira/browse/MESOS-4893
>
> In the short-term (for 1.0), I propose that, instead of doing a recursive
> chown, we do a non-recursive chown. That'll allow the new task to at least
> create new files under the persistent volume, but will not change ownership
> of files created by previous tasks. It should be a very simple fix which we
> can ship in 1.0. We'll ship MESOS-4893 after 1.0. What do you guys think?
>
> Thanks,
> - Jie
>
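
The difference between the current behavior and the short-term proposal quoted above can be sketched in a few lines of Python. This only approximates `chown -R` with `os.walk` and is not the Mesos implementation; the function names are hypothetical.

```python
import os


def chown_recursive(root: str, uid: int, gid: int) -> None:
    """Approximates the current behavior (`chown -R`): every entry is re-owned."""
    os.chown(root, uid, gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lchown changes a symlink itself rather than its target.
            os.lchown(os.path.join(dirpath, name), uid, gid)


def chown_top_level(root: str, uid: int, gid: int) -> None:
    """Proposed short-term behavior: only the volume root changes owner,
    so files created by previous tasks keep their existing ownership."""
    os.chown(root, uid, gid)
```

With `chown_top_level`, a new task running as the sandbox user can still create files in the volume root, while files written earlier by a different user (the 'database' user in the example) are left alone.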


Re: New external dependency

2016-06-20 Thread Yan Xu
It's not immediately clear from the ticket why the dependency changed from
optional to required, though. Could you summarize?

On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues  wrote:

> Thanks Zhitao,
>
> I just pushed out a review for upgrades.md and added you as a reviewer.
>
> The new dependence was added in the JIRA that haosdent linked, but the
> actual reason for adding the dependence is more related to:
> https://issues.apache.org/jira/browse/MESOS-5401
>
> On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
> > The related issue is "Change build to always enable Nvidia GPU support for
> > Linux".
> > My local build broke before Kevin sent out the email, and then I
> > found this change.
> >
> > On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
> wrote:
> >>
> >> Hi Kevin,
> >>
> >> Thanks for letting us know. It seems like this is not called out in
> >> upgrades.md, so can you please document this additional dependency
> there?
> >>
> >> Also, can you include the link to the JIRA or patch requiring this
> >> dependency so we can have some contexts?
> >>
> >> Thanks!
> >>
> >> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues 
> wrote:
> >>
> >> > Hello all,
> >> >
> >> > Just an FYI that the newest libmesos now has an external dependence on
> >> > libelf on Linux. This dependence can be installed via the following
> >> > packages:
> >> >
> >> > CentOS 6/7:   yum install elfutils-libelf.x86_64
> >> > Ubuntu 14.04: apt-get install libelf1
> >> >
> >> > Alternatively you can install from source:
> >> > https://directory.fsf.org/wiki/Libelf
> >> >
> >> > For developers, you will also need to install the libelf headers in
> >> > order to build master. This dependency can be installed via:
> >> >
> >> > CentOS: yum install elfutils-libelf-devel.x86_64
> >> > Ubuntu: apt-get install libelf-dev
> >> >
> >> > Alternatively, you can install from source:
> >> > https://directory.fsf.org/wiki/Libelf
> >> >
> >> > The getting started guide and the support/docker_build.sh scripts have
> >> > been updated appropriately, but you may need to update your local
> >> > environment if you don't yet have these packages installed.
> >> >
> >> > --
> >> > ~Kevin
> >> >
> >>
> >>
> >>
> >> --
> >> Cheers,
> >>
> >> Zhitao Li
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>
>
>
> --
> ~Kevin
>
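
For anyone wanting a quick sanity check after installing the packages named in the thread above, here is a minimal sketch in Python. It only asks the standard library whether a runtime libelf shared library is visible. It does not check for the development headers, and the function name `libelf_status` is made up for illustration.

```python
from ctypes.util import find_library


def libelf_status(finder=find_library):
    """Report whether a libelf shared library can be located.

    `finder` is injectable so the logic can be exercised without libelf
    installed; by default it is ctypes.util.find_library.
    """
    path = finder("elf")
    if path is None:
        return ("libelf not found: install elfutils-libelf (CentOS) "
                "or libelf1 (Ubuntu)")
    return "libelf found: " + path


if __name__ == "__main__":
    print(libelf_status())
```

What the script prints depends on the machine. A "not found" result only means the dynamic loader's search path has no libelf, which is the situation the package installs above are meant to fix.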


Re: Slack as the canonical chat channel

2016-06-17 Thread Yan Xu
+1 Slack.

On Friday, June 17, 2016, José Guilherme Vanz 
wrote:

> +1 Slack
>
> On Fri, 17 Jun 2016 at 17:05 Joris Van Remoortere 
> wrote:
>
> > +1 Slack
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Fri, Jun 17, 2016 at 10:04 PM, Vinod Kone 
> wrote:
> >
> > > Looks like people have jumped the gun here before I sent the email :)
> > >
> > > > Here is the context. During the community sync we discussed using
> > > > *Slack* or *HipChat* as our official chat channel instead of our
> > > > current #mesos IRC channel on freenode.
> > >
> > > The main reasons for using Slack/Hipchat are
> > >
> > >- In-client chat history
> > >- Discoverability of work group specific channels
> > >- Email notifications when offline
> > >- Modern UX and clients
> > >
> > > > During the sync most people preferred the move to *Slack*. I wanted to
> > > > get a sense from other community members as well through this email.
> > > > Please let us know what you think.
> > > >
> > > > Note that even if we move to Slack, we will make sure people can still
> > > > connect using IRC clients and that the chat history is publicly
> > > > available (per ASF guidelines). During the transition period, we might
> > > > mirror messages from Slack channel to IRC and vice-versa.
> > >
> > > Thoughts?
> > >
> > > On Fri, Jun 17, 2016 at 8:52 AM, Vinit Mahedia  >
> > > wrote:
> > >
> > > > +1 Slack.
> > > >
> > > > On Fri, Jun 17, 2016 at 12:59 AM, Jay JN Guo 
> > > > wrote:
> > > >
> > > > > +1 Slack!
> > > > >
> > > > > /J
> > > > >
> > > > > Vaibhav Khanduja  wrote on 06/16/2016
> > > > 22:26:27:
> > > > >
> > > > > > From: Vaibhav Khanduja 
> > > > > > To: dev@mesos.apache.org
> > > > > > Date: 06/16/2016 22:26
> > > > > > Subject: Re: Notification: Community Meeting @ Thu Jun 16, 2016
> 3pm
> > > > > > - 4pm (Apache Mesos)
> > > > > >
> > > > > > + 1 slack
> > > > > >
> > > > > > Sent from my iPhone. Please excuse for typos and brevity of this
> > > > message.
> > > > > >
> > > > > > > On Jun 16, 2016, at 6:46 PM, haosdent 
> > wrote:
> > > > > > >
> > > > > > > +1 For Slack.
> > > > > > >
> > > > > > >> On Fri, Jun 17, 2016 at 7:33 AM, Greg Mann <
> g...@mesosphere.io>
> > > > > wrote:
> > > > > > >>
> > > > > > >> Hello all,
> > > > > > >> Here are the notes from our community sync meeting this
> > afternoon:
> > > > > > >>
> > > > > > >> Attendees:
> > > > > > >>
> > > > > > >> Mesosphere: Joris, Greg, Haris, Artem, Joseph, Kapil, Anand,
> > > > Gilbert,
> > > > > > >> Harpreet, Kevin, Vinod, Jie, Joerg, MPark
> > > > > > >>
> > > > > > >> Uber: Zhitao Li
> > > > > > >>
> > > > > > >> Agenda/Note:
> > > > > > >>
> > > > > > >>   -
> > > > > > >>
> > > > > > >>   Reviewing the list of maintainers on
> > > > > > >>   http://mesos.apache.org/documentation/latest/committers/
> > > > > > >>   -
> > > > > > >>
> > > > > > >>  Add components for
> > > > > > >>  -
> > > > > > >>
> > > > > > >> Documentation (docs/*)
> > > > > > >> -
> > > > > > >>
> > > > > > >> Windows (*windows*)
> > > > > > >> -
> > > > > > >>
> > > > > > >> C++ standards (docs/c++-style-guide.md)
> > > > > > >> -
> > > > > > >>
> > > > > > >> HTTP API (http.*)
> > > > > > >> -
> > > > > > >>
> > > > > > >> Persistence
> > > > > > >> -
> > > > > > >>
> > > > > > >> Test infrastructure (src/tests/*)
> > > > > > >> -
> > > > > > >>
> > > > > > >> Build-related
> > > > > > >> -
> > > > > > >>
> > > > > > >>Autotools, CMake
> > > > > > >>-
> > > > > > >>
> > > > > > >> Subdivide Stout
> > > > > > >> -
> > > > > > >>
> > > > > > >> Subdivide Libprocess (3rdparty/libprocess/*)
> > > > > > >> -
> > > > > > >>
> > > > > > >> Subdivide Container-related things
> > > > (src/slave/containerizer/*)
> > > > > > >> -
> > > > > > >>
> > > > > > >>Networking
> > > > > > >>-
> > > > > > >>
> > > > > > >>Storage
> > > > > > >>-
> > > > > > >>
> > > > > > >> Resource allocation/Scheduler
> > > > > > >> -
> > > > > > >>
> > > > > > >> Development tools
> > > > > > >> -
> > > > > > >>
> > > > > > >>  Think about some tooling to facilitate this
> > > > > > >>  -
> > > > > > >>
> > > > > > >> http://lxr.free-electrons.com/source/MAINTAINERS
> > > > > > >> -
> > > > > > >>
> > > > > > >>  AI: Ben Mahler will send out an email to the mailing list
> > > about
> > > > > > >>  maintainers
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>   -
> > > > > > >>
> > > > > > >>   Reviewing and updating the roadmap.
> > > > > > >>
> > > > > > >> 

Re: [VOTE] Release Apache Mesos 1.0.0 (rc1)

2016-06-08 Thread Yan Xu
What do you think about this Vinod?

I think we can remove this major version check altogether.
Backwards-incompatible changes would warrant a major version bump, but not
vice versa. Plus, it's more standard to express and check dependency
versions outside of the code, through package metadata.

Yan
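
To make the concern concrete, here is a minimal sketch in Python of a strict major-version comparison of the kind described in the quoted message below. It is not the code in convert.cpp; the function names and version strings are assumptions chosen for illustration.

```python
def major_version(version):
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


def strict_major_check(native_version, jar_version):
    """Mimic a check that rejects any mismatch in major version."""
    return major_version(native_version) == major_version(jar_version)


if __name__ == "__main__":
    # A framework still bundling a 0.x jar against an upgraded 1.x native
    # library fails the check even if nothing it uses actually changed.
    print(strict_major_check("1.0.0", "0.28.2"))
    print(strict_major_check("1.0.0", "1.0.0"))
```

Under such a check, the jar and the native library have to cross the 0.x to 1.0 boundary together, which is the lockstep deployment problem raised in this thread.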

On Mon, Jun 6, 2016 at 7:32 PM, Robert Lacroix  wrote:

> Hi Vinod,
>
> In convert.cpp
> <https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153>
> we compare the major versions of the native library and the jar. This makes
> upgrading frameworks unnecessarily hard because you would have to deploy
> Mesos and frameworks in lockstep.
>
> Non-binding -1 , as this check isn’t strictly useful - especially given
> this is probably the last major upgrade where libmesos is even relevant.
>
>  Robert
>
> On Jun 1, 2016, at 12:38 AM, Vinod Kone  wrote:
>
> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.0.0.
>
>
> NOTE: The voting period for this release is 3 weeks. Also, we are willing
> to make API changes before the final release. So please test it thoroughly.
>
>
> 1.0.0 includes the following features:
>
>
> 
>
>  * Scheduler and Executor v1 HTTP APIs are now considered stable.
>
>  * [MESOS-4791] - **Experimental** support for v1 Master and Agent APIs.
>    These APIs let operators and services (monitoring, load balancers) send
>    HTTP requests to '/api/v1' endpoint on master or agent. These APIs look
>    similar to the v1 Scheduler and Executor APIs.
>
>  * [MESOS-4828] - **Experimental** support for a new `disk/xfs` isolator
>    has been added to isolate disk resources more efficiently. Please refer
>    to docs/mesos-containerizer.md for more details.
>
>  * [MESOS-4355] - **Experimental** support for Docker volume plugin. We
>    added a new isolator 'docker/volume' which allows users to use external
>    volumes in Mesos containerizer. Currently, the isolator interacts with
>    the Docker volume plugins using a tool called 'dvdcli'. By speaking the
>    Docker volume plugin API, most of the Docker volume plugins are
>    supported.
>
>  * [MESOS-4641] - **Experimental** A new network isolator, the
>    `network/cni` isolator, has been introduced in the `MesosContainerizer`.
>    The `network/cni` isolator implements the Container Network Interface
>    (CNI) specification proposed by CoreOS. With CNI the `network/cni`
>    isolator is able to allocate a network namespace to Mesos containers and
>    attach the container to different types of IP networks by invoking
>    network drivers called CNI plugins.
>
>  * [MESOS-2948, MESOS-5403] - The authorizer interface has been refactored
>    in order to decouple the ACLs definition language from the interface.
>    It additionally includes the option of retrieving `ObjectApprover`. An
>    `ObjectApprover` can be used to synchronously check authorizations for
>    a given object and is hence useful when authorizing a large number of
>    objects and/or large objects (which need to be copied using request
>    based authorization). NOTE: This is a **breaking change** for
>    authorizer modules.
>
>  * [MESOS-4931] - Authorization based HTTP endpoint filtering enables
>    operators to restrict what part of the cluster state a user is
>    authorized to see. Consider for example the `/state` master endpoint:
>    an operator can now authorize users to only see a subset of the running
>    frameworks, tasks, or executors.
>
>  * [MESOS-4909] - Tasks can now specify a kill policy. They are
>    best-effort, because machine failures or forcible terminations may
>    occur. Currently, the only available kill policy is how long to wait
>    between graceful and forcible task kill. In the future, more policies
>    may be available (e.g. hitting an HTTP endpoint, running a command,
>    etc). Note that it is the executor's responsibility to enforce kill
>    policies. For executor-less command-based tasks, the kill is performed
>    via sending a signal to the task process: SIGTERM for the graceful kill
>    and SIGKILL for the forcible kill. For docker executor-less tasks the
>    grace period is passed to 'docker stop --time'. This feature supersedes
>    the '--docker_stop_timeout', which is now deprecated.
>
>  * [MESOS-4908] - The task kill policy defined within 'TaskInfo' can now
>    be overridden when the scheduler kills the task. This can be used by
>    schedulers to forcefully kill a task which is already being killed,
>    e.g. if something went wrong during a graceful kill and a forcible kill
>    is desired. Note that it is the executor's responsibility to honor the
> 
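
The kill policy in the quoted release notes (SIGTERM for the graceful kill, then SIGKILL once the grace period runs out) can be illustrated with a small Python sketch. This is not Mesos executor code. `graceful_kill` and its parameters are hypothetical names, and the sketch assumes a POSIX system, where `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL.

```python
import signal
import subprocess
import sys


def graceful_kill(process, grace_seconds):
    """Ask a child process to stop, then force it after a grace period.

    Returns the child's return code; on POSIX a negative value is the
    number of the signal that ended the process.
    """
    process.terminate()  # graceful kill: SIGTERM on POSIX
    try:
        return process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # forcible kill: SIGKILL on POSIX
        return process.wait()


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    code = graceful_kill(child, grace_seconds=2.0)
    if code == -signal.SIGTERM:
        print("child stopped on SIGTERM within the grace period")
    else:
        print("child ended with return code", code)
```

A task that ignores SIGTERM ends up on the SIGKILL branch after `grace_seconds`. As the notes say, the policy is best-effort, since a machine failure can end the task before either signal is delivered.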

Re: [VOTE] Release Apache Mesos 1.0.0 (rc1)

2016-06-08 Thread Yan Xu
Awesome, I'll send a patch.

On Wed, Jun 8, 2016 at 9:25 AM, Vinod Kone <vinodk...@apache.org> wrote:

> Thanks for catching this bug. We need to fix convert.cpp to make
> scheduler/executor upgrades to 1.0 easier. @Robert/@Yan: Can one of you
> file a JIRA issue for this? A patch would be even better :)
>
> I'm cancelling the vote for 1.0-RC1 now. We'll cut RC2 after we fix this
> and any other issues that cropped up so far.
>
>
> On Tue, Jun 7, 2016 at 6:13 PM, Yan Xu <xuj...@apple.com> wrote:
>
>> What do you think about this Vinod?
>>
>> I think we can remove this major version check altogether.
>> Backwards-incompatible changes would warrant a major version bump, but not
>> vice versa. Plus, it's more standard to express and check dependency
>> versions outside of the code, through package metadata.
>>
>> Yan
>>
>> On Mon, Jun 6, 2016 at 7:32 PM, Robert Lacroix <rlacr...@apple.com>
>> wrote:
>>
>>> Hi Vinod,
>>>
>>> In convert.cpp
>>> <https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153> 
>>> we
>>> compare the major versions of the native library and the jar. This makes
>>> upgrading frameworks unnecessarily hard because you would have to deploy
>>> Mesos and frameworks in lockstep.
>>>
>>> Non-binding -1 , as this check isn’t strictly useful - especially
>>> given this is probably the last major upgrade where libmesos is even
>>> relevant.
>>>
>>>  Robert
>>>
>>> On Jun 1, 2016, at 12:38 AM, Vinod Kone <vinodk...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 1.0.0.
>>>
>>>
>>> NOTE: The voting period for this release is 3 weeks. Also, we are willing
>>> to make API changes before the final release. So please test it
>>> thoroughly.
>>>
>>>
>>> 1.0.0 includes the following features:
>>>
>>>
>>> 
>>>
>>>  * Scheduler and Executor v1 HTTP APIs are now considered stable.
>>>
>>>  * [MESOS-4791] - **Experimental** support for v1 Master and Agent APIs.
>>>    These APIs let operators and services (monitoring, load balancers) send
>>>    HTTP requests to '/api/v1' endpoint on master or agent. These APIs look
>>>    similar to the v1 Scheduler and Executor APIs.
>>>
>>>  * [MESOS-4828] - **Experimental** support for a new `disk/xfs` isolator
>>>    has been added to isolate disk resources more efficiently. Please refer
>>>    to docs/mesos-containerizer.md for more details.
>>>
>>>  * [MESOS-4355] - **Experimental** support for Docker volume plugin. We
>>>    added a new isolator 'docker/volume' which allows users to use external
>>>    volumes in Mesos containerizer. Currently, the isolator interacts with
>>>    the Docker volume plugins using a tool called 'dvdcli'. By speaking the
>>>    Docker volume plugin API, most of the Docker volume plugins are
>>>    supported.
>>>
>>>  * [MESOS-4641] - **Experimental** A new network isolator, the
>>>    `network/cni` isolator, has been introduced in the `MesosContainerizer`.
>>>    The `network/cni` isolator implements the Container Network Interface
>>>    (CNI) specification proposed by CoreOS. With CNI the `network/cni`
>>>    isolator is able to allocate a network namespace to Mesos containers and
>>>    attach the container to different types of IP networks by invoking
>>>    network drivers called CNI plugins.
>>>
>>>  * [MESOS-2948, MESOS-5403] - The authorizer interface has been refactored
>>>    in order to decouple the ACLs definition language from the interface.
>>>
>>

Fwd: [VOTE] Release Apache Mesos 1.0.0 (rc1)

2016-06-07 Thread Yan Xu
What do you think about this Vinod?

I think we can remove this major version check altogether.
Backwards-incompatible changes would warrant a major version bump, but not
vice versa. Plus, it's more standard to express and check dependency
versions outside of the code, through package metadata.

Yan

On Mon, Jun 6, 2016 at 7:32 PM, Robert Lacroix  wrote:

> Hi Vinod,
>
> In convert.cpp
> <https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153>
> we compare the major versions of the native library and the jar. This makes
> upgrading frameworks unnecessarily hard because you would have to deploy
> Mesos and frameworks in lockstep.
>
> Non-binding -1 , as this check isn’t strictly useful - especially given
> this is probably the last major upgrade where libmesos is even relevant.
>
>  Robert
>
> On Jun 1, 2016, at 12:38 AM, Vinod Kone  wrote:
>
> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.0.0.
>
>
> NOTE: The voting period for this release is 3 weeks. Also, we are willing
> to make API changes before the final release. So please test it thoroughly.
>
>
> 1.0.0 includes the following features:
>
>
> 
>
>  * Scheduler and Executor v1 HTTP APIs are now considered stable.
>
>  * [MESOS-4791] - **Experimental** support for v1 Master and Agent APIs.
>    These APIs let operators and services (monitoring, load balancers) send
>    HTTP requests to '/api/v1' endpoint on master or agent. These APIs look
>    similar to the v1 Scheduler and Executor APIs.
>
>  * [MESOS-4828] - **Experimental** support for a new `disk/xfs` isolator
>    has been added to isolate disk resources more efficiently. Please refer
>    to docs/mesos-containerizer.md for more details.
>
>  * [MESOS-4355] - **Experimental** support for Docker volume plugin. We
>    added a new isolator 'docker/volume' which allows users to use external
>    volumes in Mesos containerizer. Currently, the isolator interacts with
>    the Docker volume plugins using a tool called 'dvdcli'. By speaking the
>    Docker volume plugin API, most of the Docker volume plugins are
>    supported.
>
>  * [MESOS-4641] - **Experimental** A new network isolator, the
>    `network/cni` isolator, has been introduced in the `MesosContainerizer`.
>    The `network/cni` isolator implements the Container Network Interface
>    (CNI) specification proposed by CoreOS. With CNI the `network/cni`
>    isolator is able to allocate a network namespace to Mesos containers and
>    attach the container to different types of IP networks by invoking
>    network drivers called CNI plugins.
>
>  * [MESOS-2948, MESOS-5403] - The authorizer interface has been refactored
>    in order to decouple the ACLs definition language from the interface.
>    It additionally includes the option of retrieving `ObjectApprover`. An
>    `ObjectApprover` can be used to synchronously check authorizations for
>    a given object and is hence useful when authorizing a large number of
>    objects and/or large objects (which need to be copied using request
>    based authorization). NOTE: This is a **breaking change** for
>    authorizer modules.
>
>  * [MESOS-4931] - Authorization based HTTP endpoint filtering enables
>    operators to restrict what part of the cluster state a user is
>    authorized to see. Consider for example the `/state` master endpoint:
>    an operator can now authorize users to only see a subset of the running
>    frameworks, tasks, or executors.
>
>  * [MESOS-4909] - Tasks can now specify a kill policy. They are
>    best-effort, because machine failures or forcible terminations may
>    occur. Currently, the only available kill policy is how long to wait
>    between graceful and forcible task kill. In the future, more policies
>    may be available (e.g. hitting an HTTP endpoint, running a command,
>    etc). Note that it is the executor's responsibility to enforce kill
>    policies. For executor-less command-based tasks, the kill is performed
>    via sending a signal to the task process: SIGTERM for the graceful kill
>    and SIGKILL for the forcible kill. For docker executor-less tasks the
>    grace period is passed to 'docker stop --time'. This feature supersedes
>    the '--docker_stop_timeout', which is now deprecated.
>
>  * [MESOS-4908] - The task kill policy defined within 'TaskInfo' can now
>    be overridden when the scheduler kills the task. This can be used by
>    schedulers to forcefully kill a task which is already being killed,
>    e.g. if something went wrong during a graceful kill and a forcible kill
>    is desired. Note that it is the executor's responsibility to honor the
> 

Re: Hoping to join Mesos "contributors"

2016-05-24 Thread Yan Xu
What's your Apache JIRA <https://issues.apache.org/jira/browse/MESOS>
username?

---
Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan>

On Tue, May 24, 2016 at 12:28 AM, He, Tong <tong...@hpe.com> wrote:

> Dear committers,
>
> I hope to join the development of the GPU features
> (src/slave/containerizer/mesos/isolators/cgroups/devices/gpus), either
> creating new issues or resolving existing ones. According to the
> "Submitting a Patch" document
> (http://mesos.apache.org/documentation/latest/submitting-a-patch/), under
> "Before you start writing code", step 4 is "Assign the JIRA to yourself",
> which first requires being added to the list of Mesos "contributors" by a
> Mesos committer.
> Thank you very much.
>
> Best Wishes.
> Tong  He
>
>


Re: [proposal] MESOS-4610: MasterContender/MasterDetector should be loadable as modules

2016-03-28 Thread Yan Xu
My apologies: due to some email filter issues I only saw this just now.
Thanks, Kapil, for stepping in (I am probably unable to shepherd this right
now)!

On Fri, Mar 25, 2016 at 12:58 PM, Anurag Singh <
anurag.prakash.si...@gmail.com> wrote:

> Hello Kapil,
>
> I just wanted to know if you'd had a chance to take a look at the changes
> and had any comments on them. Thanks.
>
> On Wed, Mar 23, 2016 at 4:07 PM, Anurag Singh <
> anurag.prakash.si...@gmail.com> wrote:
>
>> Thanks, Kapil. I've added you as reviewer.
>>
>> On Wed, Mar 23, 2016 at 11:45 AM, Kapil Arya  wrote:
>>
>>> Anurag, Yan,
>>>
>>> I can also help in reviewing this stuff. If Yan, has cycles to shepherd
>>> it,
>>> great, otherwise, I can shepherd it too.
>>>
>>> Best,
>>> Kapil
>>>
>>> On Wed, Mar 23, 2016 at 2:38 PM, Anurag Singh <
>>> anurag.prakash.si...@gmail.com> wrote:
>>>
>>> > Yan, would you like me to add you as the reviewer on the ReviewBoard
>>> > changes? I'm assuming that you will be the shepherd. Please let me
>>> know if
>>> > that isn't confirmed yet.
>>> >
>>> > On Mon, Mar 21, 2016 at 1:08 PM, Benjamin Mahler 
>>> > wrote:
>>> >
>>> > > +Yan
>>> > >
>>> > > On Mon, Mar 21, 2016 at 10:28 AM, Anurag Singh <
>>> > > anurag.prakash.si...@gmail.com> wrote:
>>> > >
>>> > > > Joseph's suggestion is that since Ben Hindman may not have enough
>>> time
>>> > to
>>> > > > shepherd this issue, we should seek another one. Would anyone here
>>> be
>>> > > able
>>> > > > to shepherd https://issues.apache.org/jira/browse/MESOS-4610?
>>> > > >
>>> > > >
>>> > > > On Tue, Mar 15, 2016 at 1:13 PM, Anurag Singh <
>>> > > > anurag.prakash.si...@gmail.com> wrote:
>>> > > >
>>> > > > > As of now, we've got Ben Hindman as the designated shepherd.
>>> > > > > Joseph Wu has been helping us with the reviews and suggesting
>>> > > > > changes.
>>> > > > >
>>> > > > > On Tue, Mar 15, 2016 at 1:09 PM, Vinod Kone <
>>> vinodk...@apache.org>
>>> > > > wrote:
>>> > > > >
>>> > > > >> This is great to hear! @YanXu is this something you might be
>>> able to
>>> > > > >> shepherd?
>>> > > > >>
>>> > > > >> On Tue, Mar 15, 2016 at 1:03 PM, Anurag Singh <
>>> > > > >> anurag.prakash.si...@gmail.com> wrote:
>>> > > > >>
>>> > > > >> > Hi,
>>> > > > >> >
>>> > > > >> > We're inviting user and developer comments on a series of
>>> changes
>>> > we
>>> > > > >> have
>>> > > > >> > been working on that would modularize MasterContender and
>>> > > > >> MasterDetectors.
>>> > > > >> > The goal is to allow the use of detector and contender
>>> > > implementations
>>> > > > >> > other than the ones that are part of Mesos source (Standalone
>>> and
>>> > > > >> > Zookeeper). So if one would like to use a custom leader
>>> election
>>> > > > >> mechanism
>>> > > > >> > (e.g. one that relies on etcd, consul, etc.), it will be
>>> possible
>>> > to
>>> > > > >> load
>>> > > > >> > the implementation from a shared library. In practice, it
>>> > translates
>>> > > > to
>>> > > > >> > using the following command line options:
>>> > > > >> >
>>> > > > >> > For the mesos master:
>>> > > > >> >
>>> > > > >> > --master_contender: The value of this command line option is
>>> the
>>> > > name
>>> > > > >> of a
>>> > > > >> > symbol (defined in a module and referenced in the value of the
>>> > > > --modules
>>> > > > >> > flag). The symbol refers to an object of type
>>> > > Module.
>>> > > > >> For
>>> > > > >> > an example, please see the test_contender_module.cpp file in
>>> > > > >> > https://reviews.apache.org/r/44289/.
>>> > > > >> >
>>> > > > >> > For the mesos master and slave:
>>> > > > >> >
>>> > > > >> > --master_detector:  The value of this command line option is
>>> the
>>> > > name
>>> > > > >> of a
>>> > > > >> > symbol (defined in a module and referenced in the value of the
>>> > > > --modules
>>> > > > >> > flag). The symbol refers to an object of type
>>> > > Module.
>>> > > > >> For
>>> > > > >> > an example, please see the test_detector_module.cpp file in
>>> > > > >> > https://reviews.apache.org/r/44289/.
>>> > > > >> >
>>> > > > >> > The --modules option, in addition to pointing to the shared
>>> > library
>>> > > > and
>>> > > > >> > symbols, can be used to pass parameters (via the Parameters
>>> class)
>>> > > to
>>> > > > >> the
>>> > > > >> > modules in the form of key-value pairs.
>>> > > > >> >
>>> > > > >> > Also, please note that there is no change in the behavior of
>>> the
>>> > > > legacy
>>> > > > >> > --zk and --master options. They will continue to work as
>>> before.
>>> > > > >> >
>>> > > > >> > The following changes implement this functionality and have
>>> been
>>> > > under
>>> > > > >> > review (thanks to Joseph Wu (Mesosphere) for his input):
>>> > > > >> >
>>> > > > >> > https://reviews.apache.org/r/44287/
>>> > > > >> > https://reviews.apache.org/r/44288/
>>> > > > >> > https://reviews.apache.org/r/44543/
>>> > > > >> > https://reviews.apache.org/r/44544/
>>> > > > >> > 
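
As a rough illustration of how the flags in the quoted proposal fit together, here is a Python sketch that assembles a `--modules` value along with `--master_contender` and `--master_detector`. The JSON layout (`libraries`, `file`, `modules`, `name`, `parameters` with `key`/`value` pairs) follows my understanding of the Mesos modules documentation and should be treated as an assumption. The library path, module names, and parameter are invented for the example.

```python
import json


def modules_json(library_path, module_names, parameters):
    """Build the JSON passed to --modules for one shared library."""
    params = [{"key": key, "value": value}
              for key, value in sorted(parameters.items())]
    modules = [{"name": name, "parameters": params} for name in module_names]
    return json.dumps({"libraries": [{"file": library_path,
                                      "modules": modules}]})


def master_flags(library_path, contender, detector, parameters):
    """Return master command-line flags selecting the two modules."""
    return [
        "--modules=" + modules_json(library_path, [contender, detector],
                                    parameters),
        "--master_contender=" + contender,
        "--master_detector=" + detector,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    for flag in master_flags("/usr/lib/libexample_election.so",
                             "org_example_EtcdMasterContender",
                             "org_example_EtcdMasterDetector",
                             {"url": "etcd://127.0.0.1:2379/mesos"}):
        print(flag)
```

Per the proposal, an agent would pass only `--modules` and `--master_detector`, and the legacy `--zk` and `--master` options keep working as before.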

Re: Proposal: moving Mesos website to project codebase

2015-10-09 Thread Yan Xu
+1 for making it easier for contributors to understand the website code and
collaboratively maintain it!

--
Jiang Yan Xu <y...@jxu.me> @xujyan <http://twitter.com/xujyan>

On Fri, Oct 9, 2015 at 5:21 PM, Paul Brett <pbr...@twitter.com.invalid>
wrote:

> +1
>
> On Fri, Oct 9, 2015 at 8:59 AM, haosdent <haosd...@gmail.com> wrote:
>
> > +1!
> > On Oct 9, 2015 10:37 PM, "Kevin Sweeney" <kevi...@apache.org> wrote:
> >
> > > +1!
> > >
> > > On Fri, Oct 9, 2015 at 3:35 PM Marco Massenzio <ma...@mesosphere.io>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Dave - great stuff!
> > > >
> > > > *Marco Massenzio*
> > > >
> > > > *Distributed Systems Engineer*
> > > > http://codetrips.com
> > > >
> > > > On Fri, Oct 9, 2015 at 3:05 PM, Dave Lester <d...@davelester.org>
> > wrote:
> > > >
> > > > > As part of the #MesosCon Europe hackathon, my team has been making
> > > > > improvements to the website. Among these changes, we'd like to
> > propose
> > > > > changing where the website source files live by moving them to the
> > main
> > > > > Mesos codebase. Our current progress / working branch of this is
> > > > > available on GitHub:
> https://github.com/fayusohenson/mesos/tree/site
> > > > >
> > > > > * What does this mean? *
> > > > > We've added a /site directory to the Mesos codebase, which includes
> > the
> > > > > website source files. Today, these live in subversion. The rake
> file
> > > and
> > > > > other parts of building the website all work in this new
> environment,
> > > > > plus a number of related fixes (image linking, etc).
> > > > >
> > > > > For committers that are familiar with the current model for pushing
> > > > > the site live, this immediate change still requires us to
> > > > > `svn commit` the /publish directory for the website (static files
> > > > > that are generated).
> > > > >
> > > > > * Why this change? *
> > > > > 1. Today we do not have an easy process for the community to
> > contribute
> > > > > to the project website. By merging this with the Mesos codebase, it
> > > will
> > > > > be significantly easier to send a review or pull request.
> > > > > 2. It'll be easier for committers to manage the website, and check
> > that
> > > > > documentation changes render on the website properly before
> > committing.
> > > > > Because it's difficult to do today, this is often not checked. :(
> > > > > 3. It's a solid step toward an automated deployment of the website
> in
> > > > > the future: https://issues.apache.org/jira/browse/MESOS-1309
> > > > >
> > > > > * Who approves of this change? *
> > > > > As the Mesos website maintainer, I feel good about this change and
> > its
> > > > > direction for the project. Before committing this change, I'd like
> > > > > community support that including this in the main Mesos codebase
> > makes
> > > > > sense.
> > > > >
> > > > > Comments? Questions?
> > > > >
> > > > > Dave
> > > > >
> > > >
> > >
> >
>
>
>
> --
> @paul_b
>


Re: [VOTE] Release Apache Mesos 0.24.0 (rc2)

2015-09-04 Thread Yan Xu
+1 (binding)

Tested on our CI.

--
Jiang Yan Xu <y...@jxu.me> @xujyan <http://twitter.com/xujyan>

On Fri, Sep 4, 2015 at 12:11 PM, Bernd Mathiske <be...@mesosphere.io> wrote:

> +1 [binding]
>
> MacOS X (make check)
> CentOS 7 (make distcheck)
> Ubuntu 14.04 (make distcheck)
>
>
> On Sep 3, 2015, at 11:47 PM, Niklas Nielsen <nik...@mesosphere.io> wrote:
>
> +1 - tested on our CI
>
> On Tuesday, September 1, 2015, Vinod Kone <vinodk...@apache.org> wrote:
>
>> Hi all,
>>
>>
>> Please vote on releasing the following candidate as Apache Mesos 0.24.0.
>>
>>
>> 0.24.0 includes the following:
>>
>>
>> 
>>
>> Experimental support for v1 scheduler HTTP API!
>>
>> This release also wraps up support for fetcher.
>>
>> The CHANGELOG for the release is available at:
>>
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.0-rc2
>>
>>
>> 
>>
>>
>> The candidate for Mesos 0.24.0 release is available at:
>>
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.0-rc2/mesos-0.24.0.tar.gz
>>
>>
>> The tag to be voted on is 0.24.0-rc2:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.0-rc2
>>
>>
>> The MD5 checksum of the tarball can be found at:
>>
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.0-rc2/mesos-0.24.0.tar.gz.md5
>>
>>
>> The signature of the tarball can be found at:
>>
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.0-rc2/mesos-0.24.0.tar.gz.asc
>>
>>
>> The PGP key used to sign the release is here:
>>
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>>
>> The JAR is up in Maven in a staging repository here:
>>
>> https://repository.apache.org/content/repositories/orgapachemesos-1066
>>
>>
>> Please vote on releasing this package as Apache Mesos 0.24.0!
>>
>>
>> The vote is open until Fri Sep  4 17:33:05 PDT 2015 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Mesos 0.24.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> Thanks,
>>
>> Vinod
>>
>
>
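
For readers who want to check a release candidate against its published MD5 file before voting, here is a minimal Python sketch. It assumes the expected digest has already been copied out of the `.md5` file as a hex string; the exact layout of that file is not shown in this thread, and the helper names are hypothetical. An MD5 match only detects corruption. Verifying the `.asc` signature against the KEYS file is a separate step not covered here.

```python
import hashlib


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk


def md5_hex(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(chunks, expected_hex):
    """Compare the computed digest with an expected hex string."""
    return md5_hex(chunks) == expected_hex.strip().lower()
```

A typical call would be `checksum_matches(read_chunks("mesos-0.24.0.tar.gz"), expected)`, where `expected` is the digest published alongside the tarball.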


Re: Request to adding me to contributors list

2015-08-07 Thread Yan Xu
Done. Welcome!

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Thu, Aug 6, 2015 at 11:41 PM, Jian BJ Qiu qiuj...@cn.ibm.com wrote:

 Hi,

 Can anyone add me to the contributors list?
 My JIRA account name is : qiujian

 Regards

 Qiu Jian


Re: Regarding old frameworks in Mesos repository

2015-07-07 Thread Yan Xu
The cleanup has been committed:
https://issues.apache.org/jira/browse/MESOS-2640.

Jake,
Is it still against ASF policy if these frameworks are moved under
github.com/mesos but we make it very clear we are not taking contributions
to these repos and they are read-only? People can still fork them just like
they can fork github.com/apache/mesos but I don't think it's our intention
to block that right?

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Wed, Jun 24, 2015 at 3:48 PM, Marco Massenzio ma...@mesosphere.io
wrote:

 +1
 for deleting obsolete / unmaintained code
 (this is also a great courtesy to newbies who feel like they wasted time
 learning/reading code that they're then told is out-of-date)

 One of the great things about using git is that we can resurrect them (or
 parts thereof) at any time, should we wish to do so: where to graveyard
 them I would argue is sort of already implied here, and it's in this repo.

 Thanks for offering to do this!

 *Marco Massenzio*
 *Distributed Systems Engineer*

 On Wed, Jun 24, 2015 at 11:05 AM, Niklas Nielsen nik...@mesosphere.io
 wrote:

  On 24 June 2015 at 10:55, Yan Xu y...@jxu.me wrote:
 
   If we anticipated further development we wouldn't have proposed moving
   them out. :)
   So I think it's safe to say that we are just looking for a graveyard for
   them. Do the policies still apply?
   That said, if they cannot be moved out, we should still just delete them
   and point to the commit in which they are deleted.
  
 
  +1
 
 
  
   Most of the discussions around this are on this thread. Relevant notes
   from the community sync are in this google doc
   https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit
   (June 4). It's not about ec2 maintenance as the scripts in this repo are
   not up-to-date and people have created ec2 tools not based on these
   scripts. (See this
   http://mail-archives.apache.org/mod_mbox/mesos-dev/201505.mbox/%3ccak8jagmtmdetskckqmwatyab3dmautsxo5p+6p2f2s2k-zt...@mail.gmail.com%3E
   .)
  
   Yan
  
   --
   Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan
  
   On Tue, Jun 23, 2015 at 7:18 PM, Jake Farrell jfarr...@apache.org
  wrote:
  
(infra hat on) This can not be just shifted to a random github org and
still be maintained under the Apache Mesos project. All commits must occur
to Apache hardware before going to any mirrors such as github, which
currently does not support anything not under the Apache github org.

(Mesos hat back on) Sorry I missed the last sync up, I did not see any
notes from this brought back to the dev list. If there is a need for
maintainers for the ec2 section of the code base I would be happy to step
in and help either rework to cloudformation or answer any outstanding
questions
   
-Jake
   
   
   
On Tue, Jun 23, 2015 at 9:26 PM, Erik Weathers eweath...@groupon.com
wrote:
   
 Please maintain the git history for the files when you move them. They
 should not all appear to have been born into the new repos...

 - Erik

 On Tuesday, June 23, 2015, Yan Xu y...@jxu.me wrote:

  So I'd like to resurface this topic. The last attempt
  https://reviews.apache.org/r/33090/ to remove things under frameworks/
  was put off because scripts under ec2/ still reference these frameworks.

  However we seem to have reached the consensus that these unmaintained code
  need to be moved out to avoid confusion (People asking questions /
  reporting errors for things we don't maintain anymore). This point was
  reiterated during our last community sync and we decided to remove ec2/
  folder as well.

  Therefore, if there's no objection, I will delete these files and recreate
  them as individual projects under github.com/mesos. Our website will be
  updated with the links to them either deleted or replaced by similar
  external projects.
 
  --
  Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan
 
  On Fri, Apr 10, 2015 at 2:39 AM, Alexander Rojas alexan...@mesosphere.io
  wrote:
 
   +1 If they are not maintained they should be somewhere else.
  
On 06 Apr 2015, at 21:10, Yan Xu y...@jxu.me wrote:
   
There exist a couple of frameworks in the Mesos codebase under
/frameworks:
deploy_jar haproxy+apache mesos-submit   torque
(See https://github.com/apache/mesos/tree/master/frameworks)

Anyone still uses them?

These frameworks are not trivial implementations like the ones under
src/examples to demonstrate/test Mesos features and they rely on external
programs to run. Since we don't actively maintain them, they may have

Re: Regarding old frameworks in Mesos repository

2015-06-23 Thread Yan Xu
So I'd like to resurface this topic. The last attempt
https://reviews.apache.org/r/33090/ to remove things under frameworks/
was put off because scripts under ec2/ still reference these frameworks.

However we seem to have reached the consensus that these unmaintained code
need to be moved out to avoid confusion (People asking questions /
reporting errors for things we don't maintain anymore). This point was
reiterated during our last community sync and we decided to remove ec2/
folder as well.

Therefore, if there's no objection, I will delete these files and recreate
them as individual projects under github.com/mesos. Our website will be
updated with the links to them either deleted or replaced by similar
external projects.

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Fri, Apr 10, 2015 at 2:39 AM, Alexander Rojas alexan...@mesosphere.io
wrote:

 +1 If they are not maintained they should be somewhere else.

  On 06 Apr 2015, at 21:10, Yan Xu y...@jxu.me wrote:
 
  There exist a couple of frameworks in the Mesos codebase under /frameworks:
  deploy_jar haproxy+apache mesos-submit   torque
  (See https://github.com/apache/mesos/tree/master/frameworks)
 
  Anyone still uses them?
 
  These frameworks are not trivial implementations like the ones under
  src/examples to demonstrate/test Mesos features and they rely on external
  programs to run. Since we don't actively maintain them, they may have
  already stopped working with the current versions of these programs.
 
  We'd like to remove these from the Mesos repository. If there are folks
  who still use them and would like to contribute, the ideal place to host
  them is in their own repos. e.g., https://github.com/mesos/hadoop
 
  Any comments?
 
  --
  Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan




Review Request 33090: Removed unmaintained frameworks code.

2015-04-21 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33090/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-2640
https://issues.apache.org/jira/browse/MESOS-2640


Repository: mesos


Description
---

They should be maintained outside the core repo.


Diffs
-

  frameworks/deploy_jar/README.txt 37ccc1733a0689ab0eea802b91e76d8d7eb71a82 
  frameworks/deploy_jar/daemon_executor.py 
4720a6e972308ef324f3f84642607072f0e69f3b 
  frameworks/deploy_jar/daemon_executor.sh 
ac54d73fbcd225463e47fbad8ca5c4ffac48ca93 
  frameworks/deploy_jar/daemon_framework 
2bb9107791ffe8fd0bd7a97667f6a2c85993e511 
  frameworks/deploy_jar/daemon_scheduler.py 
a919717a25b050b465202c1de7be6dfeb5720cfa 
  frameworks/deploy_jar/haproxy.config.template 
4d537e9abf29c34f9b709384a5f259c6108d05a2 
  frameworks/deploy_jar/hw.jar aa134596f5d4f8dd4c7a2b3c99b7ba9f18159965 
  frameworks/haproxy+apache/README.txt 37ccc1733a0689ab0eea802b91e76d8d7eb71a82 
  frameworks/haproxy+apache/haproxy+apache 
05b975989fbec97d13eb0a181cd9d85c8c7c564e 
  frameworks/haproxy+apache/haproxy+apache.py 
39d4e7ba681b6437e978fadbf58e92e7e32345f1 
  frameworks/haproxy+apache/haproxy.config.template 
ca82ba998d645dbfb3b057afa873d1379543ff43 
  frameworks/haproxy+apache/startapache.py 
dce0ddd0e5bd655f9d745e88eefab1603a6a5e62 
  frameworks/haproxy+apache/startapache.sh 
6eb24560130f5aa6eafb1ed6abecd8ab804e1e85 
  frameworks/mesos-submit/executor dd8b98c8a1971dd4b1a5504ba607f2d052837f22 
  frameworks/mesos-submit/executor.py 38813b9f9d151fa5293b710bc077075a5b1571cf 
  frameworks/mesos-submit/mesos-submit 8cd5a674ced6a0441eb692db3bbf67a0f2bb7dfd 
  frameworks/mesos-submit/mesos_submit.py 
94d6232da504a86aca558b3655b3dc881e6d981d 
  frameworks/torque/README.txt dd671e221dc448f89f593017fdcc81fae2ab9aa8 
  frameworks/torque/hpl-24node.qsub 3df1a45126e71c17fd642e90352eca733fbd0006 
  frameworks/torque/hpl-48node.qsub 105a7bbe4d4fdafd4984ec170790591a1a11160f 
  frameworks/torque/hpl-8node-small.qsub 
b14dc95a172bcb61d3c631968106689dcc8b5172 
  frameworks/torque/hpl-8node.qsub b14dc95a172bcb61d3c631968106689dcc8b5172 
  frameworks/torque/mesos-hpl/24node/HPL.dat 
e9ab3fdc2295419a84540b9f0bd03b09a1527758 
  frameworks/torque/mesos-hpl/48node/HPL.dat 
d5c3e85b171698d86c8f588502b6a3839d23d248 
  frameworks/torque/mesos-hpl/8node-small/HPL.dat 
941726c1d77534928d01cdbc5fc1a74c91cd635f 
  frameworks/torque/mesos-hpl/8node/HPL.dat 
250563c4d5a021119d89578110b32d4d031c3d04 
  frameworks/torque/mesos-hpl/Make.Linux_PII_CBLAS 
de646bab3bb26edbc086b245b407c190c6bbd77f 
  frameworks/torque/mesos-hpl/README 17b7b3be298f2e9d8802e9a64b792a83a66551da 
  frameworks/torque/mpi_example.c 3f3bcd08978aee7dc6803fabdbf171353a6728e8 
  frameworks/torque/start_pbs_mom.py ef8ed7cd8df8ff7d8c5e7720a823be6e967b121c 
  frameworks/torque/start_pbs_mom.sh e0c81d0899e1f7a01590fec11cbc61e299ebef57 
  frameworks/torque/test_date_sleep_date_1node.qsub 
99d91da292e950b72ca527fc6d94b924b1a57ea4 
  frameworks/torque/test_date_sleep_date_2node.qsub 
bd7ea0f6d737c5ec30f0930538e9484b691e0066 
  frameworks/torque/test_date_sleep_date_3node.qsub 
b8391da262ce2d5d01e63a4427b0614aeb59c2f3 
  frameworks/torque/test_date_sleep_date_5node_10sec.qsub 
c0e6d28e24b514d6b440f4b54b4db96ba55777a0 
  frameworks/torque/test_date_sleep_date_5node_60sec.qsub 
17cd352ebeac5bbf5a68ed2d06e99a032ded35e4 
  frameworks/torque/torquelib.py b69df1c528c7ce32103d23e16bee8276cf3bffaf 
  frameworks/torque/torquesched.py c22ef9180c3aaf447c0288ee4948d0c8067791b7 
  frameworks/torque/torquesched.sh a40c7eafd71a11c8f974aa49a612fa81d0016102 

Diff: https://reviews.apache.org/r/33090/diff/


Testing
---

make check


Thanks,

Jiang Yan Xu



Re: Regarding old frameworks in Mesos repository

2015-04-07 Thread Yan Xu
You are right. Then the question is which CLI should it go to? Seems like
src/cli is superseded by https://github.com/mesosphere/mesos-cli too and
should be on the spring cleaning list? I personally don't mind cli to be
inside the core source tree but it seems like
https://github.com/mesosphere/mesos-cli is more feature complete already.

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Mon, Apr 6, 2015 at 2:46 PM, Benjamin Mahler benjamin.mah...@gmail.com
wrote:

 +1 on removing deploy_jar, haproxy+apache, torque.

 For mesos-submit, it seems that this should instead be a mesos CLI command.

 On Mon, Apr 6, 2015 at 2:04 PM, Vinod Kone vinodk...@apache.org wrote:

  +1
 
  On Mon, Apr 6, 2015 at 12:28 PM, Adam Bordelon a...@mesosphere.io wrote:
 
   +1 to moving these out to https://github.com/mesos/framework even if
   they are used, in which case we should open an issue tracker for each
   separate project and give write permissions to that repo to anyone
   willing to maintain it.
  
   On Mon, Apr 6, 2015 at 12:10 PM, Yan Xu y...@jxu.me wrote:
  
There exist a couple of frameworks in the Mesos codebase under
/frameworks:
deploy_jar haproxy+apache mesos-submit   torque
(See https://github.com/apache/mesos/tree/master/frameworks)

Anyone still uses them?

These frameworks are not trivial implementations like the ones under
src/examples to demonstrate/test Mesos features and they rely on external
programs to run. Since we don't actively maintain them, they may have
already stopped working with the current versions of these programs.

We'd like to remove these from the Mesos repository. If there are folks who
still use them and would like to contribute, the ideal place to host them
is in their own repos. e.g., https://github.com/mesos/hadoop
   
Any comments?
   
--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan
   
  
 



Regarding old frameworks in Mesos repository

2015-04-06 Thread Yan Xu
There exist a couple of frameworks in the Mesos codebase under /frameworks:
deploy_jar haproxy+apache mesos-submit   torque
(See https://github.com/apache/mesos/tree/master/frameworks)

Anyone still uses them?

These frameworks are not trivial implementations like the ones under
src/examples to demonstrate/test Mesos features and they rely on external
programs to run. Since we don't actively maintain them, they may have
already stopped working with the current versions of these programs.

We'd like to remove these from the Mesos repository. If there are folks who
still use them and would like to contribute, the ideal place to host them
is in their own repos. e.g., https://github.com/mesos/hadoop

Any comments?

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan


Re: Splitting reviews and build emails into their own mailing lists

2015-03-28 Thread Yan Xu
+1

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Fri, Mar 27, 2015 at 12:19 PM, Vinod Kone vinodk...@apache.org wrote:

 Hi,

 What do you guys think about moving the review and build (CI) emails to
 their own mailing lists (reviews@ and builds@).

 Quite a few people that I have talked to have mentioned that there is too
 much noise on our dev list to get at the signal (dev discussions, release
 announcements, API changes etc). We have taken the first step by moving
 JIRA emails (issues@) off the dev list. I think it's time we move
 reviews/build emails too to keep the fidelity of dev list high.

 Comments?



Re: Request to be added as mesos contributor

2015-02-08 Thread Yan Xu
Added you as a contributor on MESOS JIRA. Welcome!

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Sun, Feb 8, 2015 at 3:30 AM, Palak Choudhary palakchoudhar...@gmail.com
wrote:

 Hey,

 I have been following mesos development for quite some time and would like
 to contribute to the project.

 I am an undergrad with computer science majors and have some experience in
 kernel development, distributed computing and high-performance computing. I
 would love to contribute to mesos community in any way possible.

 My skill set includes C, C++, Java, Hadoop, Python and shell scripting.

 Please add me as a contributor.

 Let me know if you need any other specific information.

 --
 Regards
 Palak Choudhary



Re: Review Request 30511: Moved framework related rate limiters into Master::Frameworks.

2015-02-03 Thread Jiang Yan Xu


 On Feb. 2, 2015, 6:10 p.m., Ben Mahler wrote:
  I recall Yan and I discussed improving the BoundedRateLimiter abstraction 
  to provide an interface that returns failed futures when the bound is 
  reached, but I don't think we documented it or the implications of such a 
  change.
  
  Mind adding a small TODO for this?

It's documented here: https://issues.apache.org/jira/browse/MESOS-1723 but 
unfortunately not in the code.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30511/#review70695
---


On Feb. 2, 2015, 10:39 a.m., Vinod Kone wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30511/
 ---
 
 (Updated Feb. 2, 2015, 10:39 a.m.)
 
 
 Review request for mesos, Ben Mahler and Jiang Yan Xu.
 
 
 Bugs: MESOS-1148
 https://issues.apache.org/jira/browse/MESOS-1148
 
 
 Repository: mesos
 
 
 Description
 ---
 
 In the subsequent review a slave related rate limiter is added. So moved the 
 framework related rate limiters inside 'frameworks'.
 
 No functional changes.
 
 
 Diffs
 -
 
   src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 
   src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 
 
 Diff: https://reviews.apache.org/r/30511/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Vinod Kone
 




Re: Review Request 30511: Moved framework related rate limiters into Master::Frameworks.

2015-02-02 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30511/#review70601
---

Ship it!


Ship It!

- Jiang Yan Xu


On Feb. 2, 2015, 10:39 a.m., Vinod Kone wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30511/
 ---
 
 (Updated Feb. 2, 2015, 10:39 a.m.)
 
 
 Review request for mesos, Ben Mahler and Jiang Yan Xu.
 
 
 Bugs: MESOS-1148
 https://issues.apache.org/jira/browse/MESOS-1148
 
 
 Repository: mesos
 
 
 Description
 ---
 
 In the subsequent review a slave related rate limiter is added. So moved the 
 framework related rate limiters inside 'frameworks'.
 
 No functional changes.
 
 
 Diffs
 -
 
   src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 
   src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 
 
 Diff: https://reviews.apache.org/r/30511/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Vinod Kone
 




Re: Review Request 27832: Fixed MasterAuthorizationTest.DuplicateReregistration test.

2014-11-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27832/#review60709
---

Ship it!


Ship It!

- Jiang Yan Xu


On Nov. 10, 2014, 2:48 p.m., Vinod Kone wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/27832/
 ---
 
 (Updated Nov. 10, 2014, 2:48 p.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Bugs: MESOS-2008
 https://issues.apache.org/jira/browse/MESOS-2008
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 The bug here is that since we don't control the scheduler re-registration 
 retries, it could so happen that scheduler might send more than expected 
 re-registration (and hence authz) requests causing the expectations to fail. 
 Paused the clock as a fix.
 
 
 Diffs
 -
 
   src/tests/master_authorization_tests.cpp 
 5ae855e59036c6cbcec15db5449620a8e5b2aa44 
 
 Diff: https://reviews.apache.org/r/27832/diff/
 
 
 Testing
 ---
 
 make check MESOS_VERBOSE=1 
 GTEST_FILTER=*MasterAuthorizationTest.DuplicateReregistration* 
 GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1 GLOG_v=1
 
 
 Thanks,
 
 Vinod Kone
 




Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2508

2014-10-30 Thread Yan Xu
https://issues.apache.org/jira/browse/MESOS-2017

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Wed, Oct 29, 2014 at 10:56 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2508/changes
 

 Changes:

 [vinodkone] Added MesosCon'14 playlist to presentations.

 --
 [...truncated 55745 lines...]
 I1030 05:55:05.082120 24483 replica.cpp:508] Replica received write
 request for position 4
 I1030 05:55:05.082358 24486 hierarchical_allocator_process.hpp:679]
 Performed allocation for slave 20141030-055504-3142697795-40429-24459-S0 in
 632705ns
 I1030 05:55:05.082669 24483 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 514640ns
 I1030 05:55:05.082698 24483 replica.cpp:676] Persisted action at 4
 I1030 05:55:05.082751 24480 master.cpp:3795] Sending 1 offers to framework
 20141030-055504-3142697795-40429-24459- (default) at
 scheduler-4c373650-2b25-4f8d-90bf-ebce4293d044@67.195.81.187:40429
 I1030 05:55:05.083233 24477 sched.cpp:544] Scheduler::resourceOffers took
 130329ns
 I1030 05:55:05.083710 24483 replica.cpp:655] Replica received learned
 notice for position 4
 I1030 05:55:05.084239 24483 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 497420ns
 I1030 05:55:05.084332 24483 leveldb.cpp:401] Deleting ~2 keys from leveldb
 took 65118ns
 I1030 05:55:05.084355 24483 replica.cpp:676] Persisted action at 4
 I1030 05:55:05.084367 24477 master.cpp:2321] Processing reply for offers:
 [ 20141030-055504-3142697795-40429-24459-O0 ] on slave
 20141030-055504-3142697795-40429-24459-S0 at slave(202)@
 67.195.81.187:40429 (pomona.apache.org) for framework
 20141030-055504-3142697795-40429-24459- (default) at
 scheduler-4c373650-2b25-4f8d-90bf-ebce4293d044@67.195.81.187:40429
 I1030 05:55:05.084378 24483 replica.cpp:661] Replica learned TRUNCATE
 action at position 4
 I1030 05:55:05.084741 24486 hierarchical_allocator_process.hpp:659]
 Performed allocation for 1 slaves in 123939ns
 I1030 05:55:05.084908 24487 master.cpp:120] No whitelist given.
 Advertising offers for all slaves
 I1030 05:55:05.085345 24482 slave.cpp:2522] Received ping from
 slave-observer(183)@67.195.81.187:40429
 I1030 05:55:05.085438 24488 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
 (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024;
 ports(*):[31000-32000]) on slave 20141030-055504-3142697795-40429-24459-S0
 from framework 20141030-055504-3142697795-40429-24459-
 I1030 05:55:06.908022 24488 hierarchical_allocator_process.hpp:599]
 Framework 20141030-055504-3142697795-40429-24459- filtered slave
 20141030-055504-3142697795-40429-24459-S0 for 5secs
 I1030 05:55:06.910189 24459 master.cpp:677] Master terminating
 I1030 05:55:06.910305 24482 sched.cpp:745] Stopping framework
 '20141030-055504-3142697795-40429-24459-'
 I1030 05:55:06.910907 24475 slave.cpp:2607] master@67.195.81.187:40429
 exited
 W1030 05:55:06.910935 24475 slave.cpp:2610] Master disconnected! Waiting
 for a new master to be elected
 I1030 05:55:06.926357 24459 slave.cpp:484] Slave terminating
 [   OK ] MasterTest.OfferNotRescindedOnceDeclined (2135 ms)
 [--] 27 tests from MasterTest (73928 ms total)

 [--] 3 tests from DRFAllocatorTest
 [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
 Using temporary directory
 '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
 I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
 I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
 I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
 I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in
 2018ns
 I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in
 the db in 335ns
 I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log
 positions 0 - 0 with 1 holes and 0 unlearned
 I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
 I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
 I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status
 received a broadcasted recover request
 I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response
 from a replica in EMPTY status
 I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to
 STARTING
 I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes)
 to leveldb took 591981ns
 I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to
 STARTING
 I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
 I1030 05:55:06.940820 24489 master.cpp:312] Master
 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on
 67.195.81.187:40429
 I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing

Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2502

2014-10-29 Thread Yan Xu
https://issues.apache.org/jira/browse/MESOS-2007



--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Tue, Oct 28, 2014 at 4:49 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2502/changes
 

 Changes:

 [tstclair] Add --enable-debug and --enable-optimize flag for controlling
 building debug and optimized versions of libprocess

 [idownes] Include tests/setns_test_helper.hpp in Makefile.

 --
 [...truncated 12824 lines...]
 I1028 23:48:22.413159 31211 hierarchical_allocator_process.hpp:734]
 Offering cpus(*):2; mem(*):1024; disk(*):3.70122e+06;
 ports(*):[31000-32000] on slave 20141028-234822-3193029443-50043-31190-S0
 to framework 20141028-234822-3193029443-50043-31190-
 I1028 23:48:22.413290 31208 replica.cpp:508] Replica received write
 request for position 4
 I1028 23:48:22.413421 31211 hierarchical_allocator_process.hpp:679]
 Performed allocation for slave 20141028-234822-3193029443-50043-31190-S0 in
 346658ns
 I1028 23:48:22.413650 31208 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 336067ns
 I1028 23:48:22.413668 31208 replica.cpp:676] Persisted action at 4
 I1028 23:48:22.413797 31216 master.cpp:3795] Sending 1 offers to framework
 20141028-234822-3193029443-50043-31190- (default) at
 scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043
 I1028 23:48:22.414077 31212 replica.cpp:655] Replica received learned
 notice for position 4
 I1028 23:48:22.414356 31212 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 260401ns
 I1028 23:48:22.414403 31212 leveldb.cpp:401] Deleting ~2 keys from leveldb
 took 28541ns
 I1028 23:48:22.414417 31212 replica.cpp:676] Persisted action at 4
 I1028 23:48:22.414446 31212 replica.cpp:661] Replica learned TRUNCATE
 action at position 4
 I1028 23:48:22.414422 31207 sched.cpp:544] Scheduler::resourceOffers took
 310278ns
 I1028 23:48:22.415086 31214 master.cpp:2321] Processing reply for offers:
 [ 20141028-234822-3193029443-50043-31190-O0 ] on slave
 20141028-234822-3193029443-50043-31190-S0 at slave(34)@67.195.81.190:50043
 (pietas.apache.org) for framework
 20141028-234822-3193029443-50043-31190- (default) at
 scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043
 W1028 23:48:22.415163 31214 master.cpp:1969] Executor default for task 0
 uses less CPUs (None) than the minimum required (0.01). Please update your
 executor, as this will be mandatory in future releases.
 W1028 23:48:22.415186 31214 master.cpp:1980] Executor default for task 0
 uses less memory (None) than the minimum required (32MB). Please update
 your executor, as this will be mandatory in future releases.
 I1028 23:48:22.415256 31214 master.cpp:2417] Authorizing framework
 principal 'test-principal' to launch task 0 as user 'jenkins'
 I1028 23:48:22.416033 31219 master.hpp:877] Adding task 0 with resources
 cpus(*):1; mem(*):500 on slave 20141028-234822-3193029443-50043-31190-S0 (
 pietas.apache.org)
 I1028 23:48:22.416084 31219 master.cpp:2480] Launching task 0 of framework
 20141028-234822-3193029443-50043-31190- (default) at
 scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 with
 resources cpus(*):1; mem(*):500 on slave
 20141028-234822-3193029443-50043-31190-S0 at slave(34)@67.195.81.190:50043
 (pietas.apache.org)
 I1028 23:48:22.416317 31214 slave.cpp:1081] Got assigned task 0 for
 framework 20141028-234822-3193029443-50043-31190-
 I1028 23:48:22.416679 31215 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):1; mem(*):524; disk(*):3.70122e+06;
 ports(*):[31000-32000] (total allocatable: cpus(*):1; mem(*):524;
 disk(*):3.70122e+06; ports(*):[31000-32000]) on slave
 20141028-234822-3193029443-50043-31190-S0 from framework
 20141028-234822-3193029443-50043-31190-
 I1028 23:48:22.416721 31215 hierarchical_allocator_process.hpp:599]
 Framework 20141028-234822-3193029443-50043-31190- filtered slave
 20141028-234822-3193029443-50043-31190-S0 for 5secs
 I1028 23:48:22.416724 31214 slave.cpp:1191] Launching task 0 for framework
 20141028-234822-3193029443-50043-31190-
 I1028 23:48:22.418534 31214 slave.cpp:3871] Launching executor default of
 framework 20141028-234822-3193029443-50043-31190- in work directory
 '/tmp/AllocatorTest_0_SlaveReregistersFirst_QPPV21/slaves/20141028-234822-3193029443-50043-31190-S0/frameworks/20141028-234822-3193029443-50043-31190-/executors/default/runs/d593f433-3c16-4678-8f76-4038fe2841c4'
 I1028 23:48:22.420557 31214 exec.cpp:132] Version: 0.21.0
 I1028 23:48:22.420755 31213 exec.cpp:182] Executor started at:
 executor(22)@67.195.81.190:50043 with pid 31190
 I1028 23:48:22.420903 31214 slave.cpp:1317] Queuing task '0' for executor
 default of framework '20141028-234822-3193029443-50043-31190-
 I1028 23:48:22.420997 31214 slave.cpp:555] Successfully attached file
 '/tmp

Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #2221

2014-10-28 Thread Yan Xu
MasterTest.RecoverResources is flaky:
https://issues.apache.org/jira/browse/MESOS-2003

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Mon, Oct 27, 2014 at 7:41 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/2221/changes
 

 Changes:

 [tnachen] Added support for both 1.8 and earlier versions of svn library.

 --
 [...truncated 14964 lines...]
 I1028 02:38:04.510162 11893 log.cpp:680] Attempting to append 139 bytes to
 the log
 I1028 02:38:04.510406 11889 coordinator.cpp:340] Coordinator attempting to
 write APPEND action at position 1
 I1028 02:38:04.511181 11889 replica.cpp:508] Replica received write
 request for position 1
 I1028 02:38:04.511767 11889 leveldb.cpp:343] Persisting action (158 bytes)
 to leveldb took 550337ns
 I1028 02:38:04.511792 11889 replica.cpp:676] Persisted action at 1
 I1028 02:38:04.512370 11889 replica.cpp:655] Replica received learned
 notice for position 1
 I1028 02:38:04.512802 11889 leveldb.cpp:343] Persisting action (160 bytes)
 to leveldb took 411874ns
 I1028 02:38:04.512821 11889 replica.cpp:676] Persisted action at 1
 I1028 02:38:04.512841 11889 replica.cpp:661] Replica learned APPEND action
 at position 1
 I1028 02:38:04.513649 11895 registrar.cpp:490] Successfully updated the
 'registry' in 6.086144ms
 I1028 02:38:04.513772 11895 registrar.cpp:376] Successfully recovered
 registrar
 I1028 02:38:04.513991 11891 log.cpp:699] Attempting to truncate the log to
 1
 I1028 02:38:04.514045 11895 master.cpp:1100] Recovered 0 slaves from the
 Registry (101B) ; allowing 10mins for slaves to re-register
 I1028 02:38:04.514138 11889 coordinator.cpp:340] Coordinator attempting to
 write TRUNCATE action at position 2
 I1028 02:38:04.514847 11898 replica.cpp:508] Replica received write
 request for position 2
 I1028 02:38:04.515223 11898 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 349767ns
 I1028 02:38:04.515239 11898 replica.cpp:676] Persisted action at 2
 I1028 02:38:04.515666 11898 replica.cpp:655] Replica received learned
 notice for position 2
 I1028 02:38:04.516059 11898 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 377837ns
 I1028 02:38:04.516091 11898 leveldb.cpp:401] Deleting ~1 keys from leveldb
 took 16661ns
 I1028 02:38:04.516103 11898 replica.cpp:676] Persisted action at 2
 I1028 02:38:04.516118 11898 replica.cpp:661] Replica learned TRUNCATE
 action at position 2
 I1028 02:38:04.528028 11893 slave.cpp:169] Slave started on 28)@
 67.195.81.187:39931
 I1028 02:38:04.528062 11893 credentials.hpp:84] Loading credential for
 authentication from '/tmp/MasterTest_RecoverResources_vPPbEB/credential'
 I1028 02:38:04.528213 11893 slave.cpp:276] Slave using credential for:
 test-principal
 I1028 02:38:04.528563 11893 slave.cpp:289] Slave resources: cpus(*):2;
 mem(*):1024; disk(*):1024; ports(*):[1-10, 20-30]
 I1028 02:38:04.528663 11893 slave.cpp:318] Slave hostname:
 pomona.apache.org
 I1028 02:38:04.528677 11893 slave.cpp:319] Slave checkpoint: false
 W1028 02:38:04.528684 11893 slave.cpp:321] Disabling checkpointing is
 deprecated and the --checkpoint flag will be removed in a future release.
 Please avoid using this flag
 I1028 02:38:04.529392 11892 state.cpp:33] Recovering state from
 '/tmp/MasterTest_RecoverResources_vPPbEB/meta'
 I1028 02:38:04.532413 11875 sched.cpp:137] Version: 0.21.0
 I1028 02:38:05.495925 11903 hierarchical_allocator_process.hpp:697] No
 resources available to allocate!
 I1028 02:38:09.495995 11899 master.cpp:120] No whitelist given.
 Advertising offers for all slaves
 I1028 02:38:13.648022 11903 hierarchical_allocator_process.hpp:659]
 Performed allocation for 0 slaves in 8.152112733secs
 I1028 02:38:13.648339 11903 status_update_manager.cpp:197] Recovering
 status update manager
 I1028 02:38:13.648790 11903 sched.cpp:233] New master detected at
 master@67.195.81.187:39931
 I1028 02:38:13.648816 11903 sched.cpp:283] Authenticating with master
 master@67.195.81.187:39931
 I1028 02:38:13.648828 11900 slave.cpp:3456] Finished recovery
 I1028 02:38:13.649049 11894 authenticatee.hpp:133] Creating new client
 SASL connection
 I1028 02:38:13.649325 11901 master.cpp:3853] Authenticating
 scheduler-fa6faf5c-dace-42f2-a5ab-d32295e6006e@67.195.81.187:39931
 I1028 02:38:13.649443 11893 status_update_manager.cpp:171] Pausing sending
 status updates
 I1028 02:38:13.649458 11896 slave.cpp:602] New master detected at
 master@67.195.81.187:39931
 I1028 02:38:13.649487 11896 slave.cpp:665] Authenticating with master
 master@67.195.81.187:39931
 I1028 02:38:13.649515 11894 authenticator.hpp:161] Creating new server
 SASL connection
 I1028 02:38:13.649595 11896 slave.cpp:638] Detecting new master
 I1028 02:38:13.649634 11897 authenticatee.hpp:133] Creating new client
 SASL connection
 I1028 02:38:13.649688 11898 authenticatee.hpp:224] Received SASL
 authentication mechanisms: CRAM

Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2498

2014-10-28 Thread Yan Xu
Appears to be just the slowness of the build box (its disk):

I1028 02:38:46.849603  9003 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 4
I1028 02:38:46.849669  9009 slave.cpp:2522] Received ping from
slave-observer(182)@67.195.81.187:33379
I1028 02:38:46.850030  8999 hierarchical_allocator_process.hpp:442] Added
slave 20141028-023846-3142697795-33379-8981-S0 (pomona.apache.org) with
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
I1028 02:38:46.850214  9004 replica.cpp:508] Replica received write request
for position 4
I1028 02:38:46.850265  8999 hierarchical_allocator_process.hpp:734]
Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on
slave 20141028-023846-3142697795-33379-8981-S0 to framework
20141028-023846-3142697795-33379-8981-
../../src/tests/master_authorization_tests.cpp:244: Failure
Failed to wait 10secs for offers
../../src/tests/master_authorization_tests.cpp:238: Failure
Actual function call count doesn't match EXPECT_CALL(sched,
resourceOffers(driver, _))...
 Expected: to be called at least once
   Actual: never called - unsatisfied and active
I1028 02:38:46.850360  8998 master.cpp:3086] Registered slave
20141028-023846-3142697795-33379-8981-S0 at slave(203)@67.195.81.187:33379 (
pomona.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
I1028 02:38:46.850438  9009 slave.cpp:756] Registered with master
master@67.195.81.187:33379; given slave ID
20141028-023846-3142697795-33379-8981-S0
I1028 02:38:46.850680  9004 leveldb.cpp:343] Persisting action (16 bytes)
to leveldb took 440959ns
I1028 02:38:46.850715  8999 hierarchical_allocator_process.hpp:679]
Performed allocation for slave 20141028-023846-3142697795-33379-8981-S0 in
606258ns
I1028 02:38:51.348700  9000 master.cpp:120] No whitelist given. Advertising
offers for all slaves
I1028 02:39:07.113581  9004 replica.cpp:676] Persisted action at 4
I1028 02:39:07.114874  8996 status_update_manager.cpp:178] Resuming sending
status updates
I1028 02:39:07.115105  9009 slave.cpp:2522] Received ping from
slave-observer(182)@67.195.81.187:33379
I1028 02:39:07.115129  8999 hierarchical_allocator_process.hpp:659]
Performed allocation for 1 slaves in 235330ns
I1028 02:39:07.115429  8998 master.cpp:3795] Sending 1 offers to framework
20141028-023846-3142697795-33379-8981- (default) at
scheduler-5f51928e-4576-41a0-ac20-b629e535af9c@67.195.81.187:33379

Master sent the offer after the 10sec deadline.
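
To make the gap concrete, here is a minimal sketch (not part of Mesos) that converts two glog-style times of day to microseconds and subtracts them. The helper names toMicros and elapsedMicros are made up for illustration, and the sketch assumes both timestamps fall on the same day and carry exactly six fractional digits, as the lines quoted above do.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Converts a glog-style time of day ("HH:MM:SS.uuuuuu") to microseconds
// since midnight. Returns -1 if the string does not match that shape.
// Assumes exactly six fractional digits, as in the log lines above.
long long toMicros(const std::string& timeOfDay)
{
  int hours = 0, minutes = 0, seconds = 0, micros = 0;
  if (std::sscanf(timeOfDay.c_str(), "%2d:%2d:%2d.%6d",
                  &hours, &minutes, &seconds, &micros) != 4) {
    return -1;
  }
  return ((hours * 60LL + minutes) * 60LL + seconds) * 1000000LL + micros;
}

// Elapsed microseconds between two timestamps taken on the same day.
long long elapsedMicros(const std::string& from, const std::string& to)
{
  return toMicros(to) - toMicros(from);
}
```

Feeding it the allocator's "Offering" line (02:38:46.850265) and the master's "Sending 1 offers" line (02:39:07.115429) gives 20265164 microseconds, i.e. roughly 20.3 seconds, about twice the 10 second wait in the test.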

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Mon, Oct 27, 2014 at 7:41 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2498/changes
 

 Changes:

 [tnachen] Added support for both 1.8 and earlier versions of svn library.

 --
 [...truncated 64049 lines...]
 I1028 02:40:10.677944  8999 replica.cpp:508] Replica received write
 request for position 5
 I1028 02:40:10.678493  8999 leveldb.cpp:343] Persisting action (159 bytes)
 to leveldb took 518803ns
 I1028 02:40:10.678514  8999 replica.cpp:676] Persisted action at 5
 I1028 02:40:10.678982  9010 replica.cpp:655] Replica received learned
 notice for position 5
 I1028 02:40:10.679489  9010 leveldb.cpp:343] Persisting action (161 bytes)
 to leveldb took 485019ns
 I1028 02:40:10.679512  9010 replica.cpp:676] Persisted action at 5
 I1028 02:40:10.679532  9010 replica.cpp:661] Replica learned APPEND action
 at position 5
 I1028 02:40:10.680330  8997 registrar.cpp:490] Successfully updated the
 'registry' in 0ns
 I1028 02:40:10.680462  9000 master.cpp:4527] Removed slave
 20141028-024010-3142697795-33379-8981-S0 (pomona.apache.org)
 I1028 02:40:10.680486  9000 master.cpp:4545] Notifying framework
 20141028-024010-3142697795-33379-8981- (default) at
 scheduler-a122c3c4-95ac-4eba-851c-4fb6dba2fce6@67.195.81.187:33379 of
 lost slave 20141028-024010-3142697795-33379-8981-S0 (pomona.apache.org)
 after recovering
 I1028 02:40:10.680572  9005 log.cpp:699] Attempting to truncate the log to
 5
 I1028 02:40:10.680577  8999 sched.cpp:686] Lost slave
 20141028-024010-3142697795-33379-8981-S0
 I1028 02:40:10.680605  8999 sched.cpp:697] Scheduler::slaveLost took
 16548ns
 I1028 02:40:10.680644  9003 coordinator.cpp:340] Coordinator attempting to
 write TRUNCATE action at position 6
 I1028 02:40:10.681084  8998 replica.cpp:508] Replica received write
 request for position 6
 I1028 02:40:10.681591  8998 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 485952ns
 I1028 02:40:10.681612  8998 replica.cpp:676] Persisted action at 6
 I1028 02:40:10.682003  9002 replica.cpp:655] Replica received learned
 notice for position 6
 I1028 02:40:10.682461  9002 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 441310ns
 I1028 02:40:10.682526

Re: Review Request 26571: Fixed ZooKeeper 3.4.5 OSX Yosemite build.

2014-10-16 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26571/#review56957
---

Ship it!


Thanks Till!

- Jiang Yan Xu


On Oct. 16, 2014, 3:13 a.m., Till Toenshoff wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26571/
 ---
 
 (Updated Oct. 16, 2014, 3:13 a.m.)
 
 
 Review request for mesos.
 
 
 Bugs: MESOS-1797
 https://issues.apache.org/jira/browse/MESOS-1797
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Fixes build issue caused by htonll introduced by OSX 10.10 (Yosemite).
 
 This serves as a hotfix that saves us from having to upgrade the bundled 
 ZooKeeper altogether, which would require longer regression testing. Once 
 ZooKeeper is updated to a recent version (3.4.x >= 3.4.7, 3.5.x >= 3.5.1, or 
 >= 3.6.0), this patch should be removed.
 
 
 Diffs
 -
 
   3rdparty/zookeeper-3.4.5.patch PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26571/diff/
 
 
 Testing
 ---
 
 make check (OSX 10.10 BETA5)
 
 
 Thanks,
 
 Till Toenshoff
 




Re: Review Request 25663: MESOS-1392: MasterDetector now returns a None when it cannot read the content of the ZNode it has detected.

2014-09-17 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25663/
---

(Updated Sept. 17, 2014, 11:12 a.m.)


Review request for mesos and Ben Mahler.


Changes
---

New version: now Group::data() returns Future<Option<string> > instead of 
Future<string> to capture the NONODE case.


Bugs: MESOS-1392
https://issues.apache.org/jira/browse/MESOS-1392


Repository: mesos-git


Description
---

See summary.


Diffs (updated)
-

  src/log/network.hpp fc85a57a38f89190fe246f16cd1fde3168d70613 
  src/master/detector.cpp 6436b8ee7e1ab6451a6b999a1cfbb2f79190e6ca 
  src/tests/group_tests.cpp 7ed98956181f167e16cd723c049738f1c217c73b 
  src/zookeeper/group.hpp 16f9b7b390551402e3c1eddaf5657aa18766b47c 
  src/zookeeper/group.cpp 58491c01052b68ddaee6af32f33192d5a1f20e58 

Diff: https://reviews.apache.org/r/25663/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Re: Review Request 25663: MESOS-1392: MasterDetector now returns a None when it cannot read the content of the ZNode it has detected.

2014-09-17 Thread Jiang Yan Xu


 On Sept. 16, 2014, 11:20 a.m., Ben Mahler wrote:
  src/master/detector.cpp, lines 396-404
  https://reviews.apache.org/r/25663/diff/1/?file=689921#file689921line396
 
  Any way to explicitly ignore the no node case?
  
  Here we're ignoring all failures, which is a bit scary, since it's not 
  obvious why the more serious errors would be caught elsewhere.
  
  I haven't put a lot of thought into this, could we return a 
  Future<Option<string> > data? Or should we consider wrapping our things in 
  some kind of ZKOperation<string> which lets us look at the various error 
  cases?
  
  Happy to chat further!

Yeah, I was (unnecessarily) concerned about the ambiguity of returning a None if 
the method is changed to: Result<Option<string> > GroupProcess::doData(const 
Group::Membership& membership).
Group needs to interpret three different cases from the return value:
1. retryable error - retry
2. nonode - None
3. non-retryable error - Failure

It turns out it's not that hard and we have a precedent in the code base: 
https://github.com/apache/mesos/blob/190e87c51d25646fa501ffca0bf7150157982050/src/state/zookeeper.cpp#L427

So I revised the review to have Group return a None in the case of NONODE.
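
For readers following along, the three cases can be sketched roughly as below. This is an illustration only, not the code under review: ZkCode, ReadOutcome, retryable and interpret are hypothetical names, std::optional (C++17) stands in for stout's Option, and treating only connection loss as retryable is an assumption made for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for the ZooKeeper return codes discussed above;
// the real client defines its own constants.
enum class ZkCode { OK, NO_NODE, CONNECTION_LOSS, INVALID_STATE };

// Outcome of one read attempt, mirroring the three cases in the reply.
struct ReadOutcome
{
  enum class Kind { RETRY, VALUE, FAILURE } kind;
  std::optional<std::string> data;  // An empty optional models "None".
  std::string error;                // Non-empty only for FAILURE.
};

// Assumption for this sketch: only CONNECTION_LOSS is treated as retryable.
bool retryable(ZkCode code) { return code == ZkCode::CONNECTION_LOSS; }

ReadOutcome interpret(ZkCode code, const std::string& contents)
{
  if (code == ZkCode::OK) {
    return {ReadOutcome::Kind::VALUE, contents, ""};
  }
  if (code == ZkCode::NO_NODE) {
    // The znode is gone: a valid answer with no data, not an error.
    return {ReadOutcome::Kind::VALUE, std::nullopt, ""};
  }
  if (retryable(code)) {
    return {ReadOutcome::Kind::RETRY, std::nullopt, ""};
  }
  return {ReadOutcome::Kind::FAILURE, std::nullopt, "non-retryable error"};
}
```

The point of the shape is that "no node" travels on the success path as an empty value, so a caller can tell it apart from a real failure without inspecting error strings.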


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25663/#review53558
---


On Sept. 17, 2014, 11:12 a.m., Jiang Yan Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25663/
 ---
 
 (Updated Sept. 17, 2014, 11:12 a.m.)
 
 
 Review request for mesos and Ben Mahler.
 
 
 Bugs: MESOS-1392
 https://issues.apache.org/jira/browse/MESOS-1392
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/log/network.hpp fc85a57a38f89190fe246f16cd1fde3168d70613 
   src/master/detector.cpp 6436b8ee7e1ab6451a6b999a1cfbb2f79190e6ca 
   src/tests/group_tests.cpp 7ed98956181f167e16cd723c049738f1c217c73b 
   src/zookeeper/group.hpp 16f9b7b390551402e3c1eddaf5657aa18766b47c 
   src/zookeeper/group.cpp 58491c01052b68ddaee6af32f33192d5a1f20e58 
 
 Diff: https://reviews.apache.org/r/25663/diff/
 
 
 Testing
 ---
 
 make check.
 
 
 Thanks,
 
 Jiang Yan Xu
 




Re: Review Request 25663: MESOS-1392: MasterDetector now returns a None when it cannot read the content of the ZNode it has detected.

2014-09-17 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25663/
---

(Updated Sept. 17, 2014, 12:26 p.m.)


Review request for mesos and Ben Mahler.


Changes
---

Minor styling fix.


Bugs: MESOS-1392
https://issues.apache.org/jira/browse/MESOS-1392


Repository: mesos-git


Description
---

See summary.


Diffs (updated)
-

  src/log/network.hpp fc85a57a38f89190fe246f16cd1fde3168d70613 
  src/master/detector.cpp 6436b8ee7e1ab6451a6b999a1cfbb2f79190e6ca 
  src/tests/group_tests.cpp 7ed98956181f167e16cd723c049738f1c217c73b 
  src/zookeeper/group.hpp 16f9b7b390551402e3c1eddaf5657aa18766b47c 
  src/zookeeper/group.cpp 58491c01052b68ddaee6af32f33192d5a1f20e58 

Diff: https://reviews.apache.org/r/25663/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Review Request 25663: MESOS-1392: MasterDetector now returns a None when it cannot read the content of the ZNode it has detected.

2014-09-15 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25663/
---

Review request for mesos and Ben Mahler.


Bugs: MESOS-1392
https://issues.apache.org/jira/browse/MESOS-1392


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/master/detector.cpp 6436b8ee7e1ab6451a6b999a1cfbb2f79190e6ca 

Diff: https://reviews.apache.org/r/25663/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Re: Review Request 25588: Fixed flaky MasterTest.LaunchDuplicateOfferTest.

2014-09-12 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25588/#review53223
---

Ship it!


Thanks Niklas for fixing additional tests!

- Jiang Yan Xu


On Sept. 12, 2014, 10:46 a.m., Niklas Nielsen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25588/
 ---
 
 (Updated Sept. 12, 2014, 10:46 a.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Bugs: mesos-1783
 https://issues.apache.org/jira/browse/mesos-1783
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 A couple of races could occur in the launch tasks on multiple offers tests 
 where recovered resources from purposely-failed invocations turned into a 
 subsequent resource offer and oversaturated the expectations.
 
 
 Diffs
 -
 
   src/tests/master_tests.cpp 3d080b2 
 
 Diff: https://reviews.apache.org/r/25588/diff/
 
 
 Testing
 ---
 
 make check - however, I haven't been able to provoke the actual fault but 
 verified that the subsequent offers could occur (by hand).
 
 
 Thanks,
 
 Niklas Nielsen
 




Re: Review Request 25487: Increased session timeouts for ZooKeeper related tests.

2014-09-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25487/
---

(Updated Sept. 10, 2014, 11 a.m.)


Review request for mesos and Ben Mahler.


Changes
---

Minor fix per Dominic's review.


Bugs: MESOS-1676
https://issues.apache.org/jira/browse/MESOS-1676


Repository: mesos-git


Description
---

- On slower machines the ZooKeeper C client sometimes times out unexpectedly 
because either the test server or the client is too slow to respond. 
Increasing this value helps mitigate the problem.
- The effect of server->shutdownNetwork() is immediate, so this won't prolong 
the tests as long as they don't wait for session expiration without clock 
advances; I have checked and none do.


Diffs (updated)
-

  src/tests/master_contender_detector_tests.cpp 
9ac59aa446a132e734238e0e55801117c4ef31b4 
  src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe 

Diff: https://reviews.apache.org/r/25487/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Review Request 25511: Pulled the log line in ZooKeeperTestServer::shutdownNetwork() to above the shutdown call.

2014-09-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25511/
---

Review request for mesos and Ben Mahler.


Repository: mesos-git


Description
---

- When debugging ZooKeeper-related tests it's often more useful to know when 
the test is about to shut down the ZK server, in order to reason about the 
order of events. Otherwise client disconnections are often logged before this 
shutdown line, which can be confusing.


Diffs
-

  src/tests/zookeeper_test_server.cpp a8c9b1cd8a546abdeb4d89a8fe9ebc3b3d577665 

Diff: https://reviews.apache.org/r/25511/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Re: Review Request 25516: Fixed authorization tests to properly deal with registration retries.

2014-09-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25516/#review52960
---

Ship it!


Ship It!

- Jiang Yan Xu


On Sept. 10, 2014, 12:55 p.m., Vinod Kone wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25516/
 ---
 
 (Updated Sept. 10, 2014, 12:55 p.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Bugs: MESOS-1760 and MESOS-1766
 https://issues.apache.org/jira/browse/MESOS-1760
 https://issues.apache.org/jira/browse/MESOS-1766
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Since the authorization tests do not control the retry behavior of the 
 scheduler driver, it is possible for the driver to retry registrations and 
 thus 'register_framework' authorizations. The MockAuthorizer needs to account 
 for this by allowing all subsequent authorization attempts.
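
The idea in the description above can be sketched with a small hand-rolled test double. This is not Mesos' MockAuthorizer; ScriptedAuthorizer and its methods are hypothetical names used only to show the "scripted first answer, then allow every retry" behaviour. With gmock the same intent is usually expressed as a WillOnce action followed by a WillRepeatedly action.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical hand-rolled test double (not Mesos' MockAuthorizer): it
// replays scripted answers first, then allows every later request.
class ScriptedAuthorizer
{
public:
  // Queue the answer for the next not-yet-scripted authorization request.
  void expectOnce(bool allowed) { scripted.push_back(allowed); }

  // Called once per 'register_framework' authorization attempt.
  bool authorize()
  {
    ++calls;
    if (!scripted.empty()) {
      bool answer = scripted.front();
      scripted.pop_front();
      return answer;
    }
    return true;  // Retries beyond the script are allowed, never a failure.
  }

  std::size_t callCount() const { return calls; }

private:
  std::deque<bool> scripted;
  std::size_t calls = 0;
};
```

Because the driver's retry timing is outside the test's control, the double does not fix the number of calls; it only pins down the answers the test cares about and stays permissive afterwards.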
 
 
 Diffs
 -
 
   src/tests/master_authorization_tests.cpp 
 b9aa7bf4f53e414d84f8cf4e020a645db8e5d855 
 
 Diff: https://reviews.apache.org/r/25516/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Vinod Kone
 




Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2358

2014-09-09 Thread Yan Xu
This is https://issues.apache.org/jira/browse/MESOS-1766

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Fri, Sep 5, 2014 at 8:53 AM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2358/changes
 

 Changes:

 [tstclair] Minor update to include package config file

 --
 [...truncated 57484 lines...]
 I0905 15:53:16.220577 25788 replica.cpp:676] Persisted action at 1
 I0905 15:53:16.220588 25788 replica.cpp:661] Replica learned APPEND action
 at position 1
 I0905 15:53:16.221040 25794 registrar.cpp:479] Successfully updated
 'registry'
 I0905 15:53:16.221119 25795 log.cpp:699] Attempting to truncate the log to
 1
 I0905 15:53:16.221146 25794 registrar.cpp:372] Successfully recovered
 registrar
 I0905 15:53:16.221195 25791 coordinator.cpp:340] Coordinator attempting to
 write TRUNCATE action at position 2
 I0905 15:53:16.221336 25797 master.cpp:1063] Recovered 0 slaves from the
 Registry (102B) ; allowing 10mins for slaves to re-register
 I0905 15:53:16.221873 25795 replica.cpp:508] Replica received write
 request for position 2
 I0905 15:53:16.61 25795 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 362390ns
 I0905 15:53:16.81 25795 replica.cpp:676] Persisted action at 2
 I0905 15:53:16.222586 25789 replica.cpp:655] Replica received learned
 notice for position 2
 I0905 15:53:16.222740 25789 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 129933ns
 I0905 15:53:16.222772 25789 leveldb.cpp:401] Deleting ~1 keys from leveldb
 took 14255ns
 I0905 15:53:16.222786 25789 replica.cpp:676] Persisted action at 2
 I0905 15:53:16.222796 25789 replica.cpp:661] Replica learned TRUNCATE
 action at position 2
 I0905 15:53:16.376282 25769 sched.cpp:137] Version: 0.21.0
 I0905 15:53:16.376565 25789 sched.cpp:233] New master detected at
 master@67.195.81.186:49188
 I0905 15:53:16.376590 25789 sched.cpp:283] Authenticating with master
 master@67.195.81.186:49188
 I0905 15:53:16.376866 25784 authenticatee.hpp:128] Creating new client
 SASL connection
 I0905 15:53:16.376965 25784 master.cpp:3637] Authenticating
 scheduler-002519ef-8af3-45c8-bb43-fc4662045bc7@67.195.81.186:49188
 I0905 15:53:16.377059 25796 authenticator.hpp:156] Creating new server
 SASL connection
 I0905 15:53:16.377255 25789 authenticatee.hpp:219] Received SASL
 authentication mechanisms: CRAM-MD5
 I0905 15:53:16.377290 25789 authenticatee.hpp:245] Attempting to
 authenticate with mechanism 'CRAM-MD5'
 I0905 15:53:16.377455 25793 authenticator.hpp:262] Received SASL
 authentication start
 I0905 15:53:16.377508 25793 authenticator.hpp:384] Authentication requires
 more steps
 I0905 15:53:16.377614 25786 authenticatee.hpp:265] Received SASL
 authentication step
 I0905 15:53:16.377678 25786 authenticator.hpp:290] Received SASL
 authentication step
 I0905 15:53:16.377699 25786 auxprop.cpp:81] Request to lookup properties
 for user: 'test-principal' realm: 'penates.apache.org' server FQDN: '
 penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
 SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
 I0905 15:53:16.377710 25786 auxprop.cpp:153] Looking up auxiliary property
 '*userPassword'
 I0905 15:53:16.377723 25786 auxprop.cpp:153] Looking up auxiliary property
 '*cmusaslsecretCRAM-MD5'
 I0905 15:53:16.377737 25786 auxprop.cpp:81] Request to lookup properties
 for user: 'test-principal' realm: 'penates.apache.org' server FQDN: '
 penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
 SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
 I0905 15:53:16.377745 25786 auxprop.cpp:103] Skipping auxiliary property
 '*userPassword' since SASL_AUXPROP_AUTHZID == true
 I0905 15:53:16.377753 25786 auxprop.cpp:103] Skipping auxiliary property
 '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
 I0905 15:53:16.377768 25786 authenticator.hpp:376] Authentication success
 I0905 15:53:16.377856 25798 authenticatee.hpp:305] Authentication success
 I0905 15:53:16.377874 25796 master.cpp:3677] Successfully authenticated
 principal 'test-principal' at
 scheduler-002519ef-8af3-45c8-bb43-fc4662045bc7@67.195.81.186:49188
 I0905 15:53:16.378038 25798 sched.cpp:357] Successfully authenticated with
 master master@67.195.81.186:49188
 I0905 15:53:16.378059 25798 sched.cpp:476] Sending registration request to
 master@67.195.81.186:49188
 I0905 15:53:16.378139 25794 master.cpp:1324] Received registration request
 from scheduler-002519ef-8af3-45c8-bb43-fc4662045bc7@67.195.81.186:49188
 I0905 15:53:16.378170 25794 master.cpp:1284] Authorizing framework
 principal 'test-principal' to receive offers for role '*'
 I0905 15:53:16.378247 25794 master.cpp:1383] Registering framework
 20140905-155316-3125920579-49188-25769- at
 scheduler-002519ef-8af3-45c8-bb43-fc4662045bc7@67.195.81.186:49188
 I0905 15:53:16.378340 25783 sched.cpp:407] Framework

Review Request 25487: Increased session timeouts for ZooKeeper related tests.

2014-09-09 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25487/
---

Review request for mesos and Ben Mahler.


Bugs: MESOS-1676
https://issues.apache.org/jira/browse/MESOS-1676


Repository: mesos-git


Description
---

- On slower machines the ZooKeeper C client sometimes times out unexpectedly 
because either the test server or the client is too slow to respond. 
Increasing this value helps mitigate the problem.
- The effect of server->shutdownNetwork() is immediate, so this won't prolong 
the tests as long as they don't wait for session expiration without clock 
advances; I have checked and none do.


Diffs
-

  src/tests/master_contender_detector_tests.cpp 
9ac59aa446a132e734238e0e55801117c4ef31b4 
  src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe 

Diff: https://reviews.apache.org/r/25487/diff/


Testing
---

make check.


Thanks,

Jiang Yan Xu



Re: Review Request 25302: Count pending tasks as staging in the slave.

2014-09-03 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25302/#review52204
---

Ship it!


Ship It!

- Jiang Yan Xu


On Sept. 3, 2014, 10:21 a.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25302/
 ---
 
 (Updated Sept. 3, 2014, 10:21 a.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Bugs: MESOS-1716
 https://issues.apache.org/jira/browse/MESOS-1716
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/slave/slave.cpp 5c76dd1b9d3f7d262053aa4c20ebc2e8a00a0f4e 
 
 Diff: https://reviews.apache.org/r/25302/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 25303: Fixed a bug in the staging tasks metric.

2014-09-03 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25303/#review52206
---

Ship it!


Ship It!

- Jiang Yan Xu


On Sept. 3, 2014, 10:21 a.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25303/
 ---
 
 (Updated Sept. 3, 2014, 10:21 a.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Bugs: MESOS-1716
 https://issues.apache.org/jira/browse/MESOS-1716
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Tasks that are sent to the executor can still be in the staging state; not 
 counting these was a bug!
 
 
 Diffs
 -
 
   src/slave/slave.cpp 5c76dd1b9d3f7d262053aa4c20ebc2e8a00a0f4e 
 
 Diff: https://reviews.apache.org/r/25303/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 25304: Removed an unnecessarily introduced Option.

2014-09-03 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25304/#review52209
---

Ship it!


Ship It!

- Jiang Yan Xu


On Sept. 3, 2014, 10:21 a.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25304/
 ---
 
 (Updated Sept. 3, 2014, 10:21 a.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Looking back through the commit history, this Option was introduced as a 
 dependency for some changes that were never committed.
 
 This complicates the resource accounting I need to do to prevent overcommit, 
 so I'm reverting the use of an Option here.
 
 
 Diffs
 -
 
   src/slave/http.cpp 39f840026595547c36f2c8b630ea8bcfc7b8dfbc 
   src/slave/slave.hpp 9d4607ef126f40ade9c861e3ea0eb41f10a3dff9 
   src/slave/slave.cpp 5c76dd1b9d3f7d262053aa4c20ebc2e8a00a0f4e 
 
 Diff: https://reviews.apache.org/r/25304/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Ben Mahler
 




Review Request 25195: Fixed MasterZooKeeperTest.LostZooKeeperCluster

2014-08-29 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25195/
---

Review request for mesos and Jie Yu.


Repository: mesos-git


Description
---

- Should have placed the FUTURE_MESSAGE that attempts to capture this
message before the slave starts.


Diffs
-

  src/tests/master_tests.cpp 9de242432afd404fc820b3a99daca1043e43a844 

Diff: https://reviews.apache.org/r/25195/diff/


Testing
---

Ran MasterZooKeeperTest.LostZooKeeperCluster for 1000 iterations.


Thanks,

Jiang Yan Xu



Re: Review Request 24667: Added a user doc for framework rate limiting.

2014-08-14 Thread Jiang Yan Xu


 On Aug. 13, 2014, 4:44 p.m., Vinod Kone wrote:
  docs/framework-rate-limiting.md, line 61
  https://reviews.apache.org/r/24667/diff/1/?file=659575#file659575line61
 
  what do you mean by nature of the frameworks? do you mean messages 
  generated by the frameworks?

Removed these sentences.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24667/#review50526
---


On Aug. 14, 2014, 3:16 p.m., Jiang Yan Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24667/
 ---
 
 (Updated Aug. 14, 2014, 3:16 p.m.)
 
 
 Review request for mesos and Vinod Kone.
 
 
 Bugs: MESOS-1683
 https://issues.apache.org/jira/browse/MESOS-1683
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   docs/framework-rate-limiting.md PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24667/diff/
 
 
 Testing
 ---
 
 Rendered version: https://gist.github.com/xujyan/d2bdd2052fac489fb1a9
 
 
 Thanks,
 
 Jiang Yan Xu
 




Re: Review Request 24667: Added a user doc for framework rate limiting.

2014-08-14 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24667/
---

(Updated Aug. 14, 2014, 3:16 p.m.)


Review request for mesos and Vinod Kone.


Bugs: MESOS-1683
https://issues.apache.org/jira/browse/MESOS-1683


Repository: mesos-git


Description
---

See summary.


Diffs (updated)
-

  docs/framework-rate-limiting.md PRE-CREATION 

Diff: https://reviews.apache.org/r/24667/diff/


Testing
---

Rendered version: https://gist.github.com/xujyan/d2bdd2052fac489fb1a9


Thanks,

Jiang Yan Xu



Review Request 24667: Added a user doc for framework rate limiting.

2014-08-13 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24667/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-1683
https://issues.apache.org/jira/browse/MESOS-1683


Repository: mesos-git


Description
---

See summary.


Diffs
-

  docs/framework-rate-limiting.md PRE-CREATION 

Diff: https://reviews.apache.org/r/24667/diff/


Testing
---

Rendered version: https://gist.github.com/xujyan/d2bdd2052fac489fb1a9


Thanks,

Jiang Yan Xu



Re: Review Request 24582: Removed unused test file 'process_spawn.cpp'.

2014-08-11 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24582/#review50265
---

Ship it!


Ship It!

- Jiang Yan Xu


On Aug. 11, 2014, 4:32 p.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24582/
 ---
 
 (Updated Aug. 11, 2014, 4:32 p.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 This file was not referenced by anything.
 
 
 Diffs
 -
 
   src/tests/process_spawn.cpp db07ba0e5ddefe6e03683102c3696e46f862168f 
 
 Diff: https://reviews.apache.org/r/24582/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 24583: Added a missing test target in Makefile.am.

2014-08-11 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24583/#review50266
---

Ship it!


So 'values_tests.cpp' was written but never run before?

- Jiang Yan Xu


On Aug. 11, 2014, 4:32 p.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24583/
 ---
 
 (Updated Aug. 11, 2014, 4:32 p.m.)
 
 
 Review request for mesos and Jiang Yan Xu.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 I've added 'values_tests.cpp' to the targets, also alphabetized the list.
 
 
 Diffs
 -
 
   src/Makefile.am 39af0365e429b8d08addadb09ee18080a19625f8 
 
 Diff: https://reviews.apache.org/r/24583/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 24343: Improved framework rate limiting by imposing the max number of outstanding messages per framework principal.

2014-08-07 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24343/
---

(Updated Aug. 6, 2014, 11:26 p.m.)


Review request for mesos, Ben Mahler and Vinod Kone.


Changes
---

Vinod's comments. No need for review.


Bugs: MESOS-1578
https://issues.apache.org/jira/browse/MESOS-1578


Repository: mesos-git


Description
---

See summary.


Diffs (updated)
-

  include/mesos/mesos.proto 628cce12d2fae645d2ef55e4809631ca03a56207 
  src/examples/load_generator_framework.cpp 
7d94c49cf91bf327ac80f04d9f1a7370996b6ba4 
  src/master/master.hpp d8a4d9e04ecff60020b99ea6447055787d187797 
  src/master/master.cpp 97e4340f7949a261558f09ea533aac0bbb0e40f6 
  src/tests/rate_limiting_tests.cpp fc23a1946ad1a78e699552440df2193ea10dc472 

Diff: https://reviews.apache.org/r/24343/diff/


Testing
---

make check

./bin/mesos-tests.sh --verbose --gtest_filter=*RateLimit* --gtest_repeat=1000


Thanks,

Jiang Yan Xu



Re: Build failed in Jenkins: Mesos-Ubuntu-distcheck #256

2014-08-07 Thread Yan Xu
This is fixed now:
https://github.com/apache/mesos/commit/8c4f45d67be22cfe252ad6ed27a79ad4a1f972c6

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan


On Thu, Aug 7, 2014 at 3:42 AM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See https://builds.apache.org/job/Mesos-Ubuntu-distcheck/256/changes

 Changes:

 [yan] Improved framework rate limiting by imposing the max number of
 outstanding messages per framework principal.

 --
 [...truncated 42275 lines...]
 I0807 10:41:07.131455 13721 log.cpp:680] Attempting to append 139 bytes
 to the log
 I0807 10:41:07.131508 13721 coordinator.cpp:340] Coordinator attempting to
 write APPEND action at position 1
 I0807 10:41:07.131765 13721 replica.cpp:508] Replica received write
 request for position 1
 I0807 10:41:07.136440 13721 leveldb.cpp:343] Persisting action (158
 bytes) to leveldb took 4.654646ms
 I0807 10:41:07.136468 13721 replica.cpp:676] Persisted action at 1
 I0807 10:41:07.140547 13721 replica.cpp:655] Replica received learned
 notice for position 1
 I0807 10:41:07.144439 13721 leveldb.cpp:343] Persisting action (160 bytes)
 to leveldb took 3.871337ms
 I0807 10:41:07.144465 13721 replica.cpp:676] Persisted action at 1
 I0807 10:41:07.144479 13721 replica.cpp:661] Replica learned APPEND action
 at position 1
 I0807 10:41:07.144779 13721 registrar.cpp:479] Successfully updated
 'registry'
 I0807 10:41:07.144825 13721 registrar.cpp:372] Successfully recovered
 registrar
 I0807 10:41:07.144876 13721 log.cpp:699] Attempting to truncate the log to
 1
 I0807 10:41:07.144959 13721 master.cpp:1044] Recovered 0 slaves from the
 Registry (101B) ; allowing 10mins for slaves to re-register
 I0807 10:41:07.145009 13721 coordinator.cpp:340] Coordinator attempting to
 write TRUNCATE action at position 2
 I0807 10:41:07.145287 13721 replica.cpp:508] Replica received write
 request for position 2
 I0807 10:41:07.152439 13721 leveldb.cpp:343] Persisting action (16 bytes)
 to leveldb took 7.132694ms
 I0807 10:41:07.152467 13721 replica.cpp:676] Persisted action at 2
 I0807 10:41:07.160706 13723 replica.cpp:655] Replica received learned
 notice for position 2
 I0807 10:41:07.161252 13723 leveldb.cpp:343] Persisting action (18 bytes)
 to leveldb took 524024ns
 I0807 10:41:07.161283 13723 leveldb.cpp:401] Deleting ~1 keys from
 leveldb took 13607ns
 I0807 10:41:07.161296 13723 replica.cpp:676] Persisted action at 2
 I0807 10:41:07.161306 13723 replica.cpp:661] Replica learned TRUNCATE
 action at position 2
 I0807 10:41:07.173075 13689 containerizer.cpp:124] Using isolation:
 posix/cpu,posix/mem
 I0807 10:41:07.175858 13718 slave.cpp:167] Slave started on 128)@67.195.81.187:52767
 I0807 10:41:07.175883 13718 credentials.hpp:84] Loading credential for
 authentication from
 '/tmp/ResourceOffersTest_TaskUsesMoreResourcesThanOffered_7sF3nY/credential'
 I0807 10:41:07.175969 13718 slave.cpp:265] Slave using credential for:
 test-principal
 I0807 10:41:07.176086 13718 slave.cpp:278] Slave resources: cpus(*):2;
 mem(*):1024; disk(*):1024; ports(*):[31000-32000]
 I0807 10:41:07.176162 13718 slave.cpp:306] Slave hostname:
 pomona.apache.org
 I0807 10:41:07.176179 13718 slave.cpp:307] Slave checkpoint: false
 I0807 10:41:07.176616 13718 state.cpp:33] Recovering state from
 '/tmp/ResourceOffersTest_TaskUsesMoreResourcesThanOffered_7sF3nY/meta'
 I0807 10:41:07.176722 13718 status_update_manager.cpp:193] Recovering
 status update manager
 I0807 10:41:07.176794 13718 containerizer.cpp:287] Recovering containerizer
 I0807 10:41:07.177069 13718 slave.cpp:3175] Finished recovery
 I0807 10:41:07.177332 13718 slave.cpp:589] New master detected at
 master@67.195.81.187:52767
 I0807 10:41:07.177366 13718 slave.cpp:663] Authenticating with master
 master@67.195.81.187:52767
 I0807 10:41:07.177417 13718 slave.cpp:636] Detecting new master
 I0807 10:41:07.177467 13718 status_update_manager.cpp:167] New master
 detected at master@67.195.81.187:52767
 I0807 10:41:07.177515 13718 authenticatee.hpp:128] Creating new client
 SASL connection
 I0807 10:41:07.177620 13718 master.cpp:3605] Authenticating slave(128)@67.195.81.187:52767
 I0807 10:41:07.177719 13718 authenticator.hpp:156] Creating new server
 SASL connection
 I0807 10:41:07.177793 13718 authenticatee.hpp:219] Received SASL
 authentication mechanisms: CRAM-MD5
 I0807 10:41:07.177819 13718 authenticatee.hpp:245] Attempting to
 authenticate with mechanism 'CRAM-MD5'
 I0807 10:41:07.177858 13718 authenticator.hpp:262] Received SASL
 authentication start
 I0807 10:41:07.177902 13718 authenticator.hpp:384] Authentication requires
 more steps
 I0807 10:41:07.177947 13718 authenticatee.hpp:265] Received SASL
 authentication step
 I0807 10:41:07.178001 13718 authenticator.hpp:290] Received SASL
 authentication step
 I0807 10:41:07.178026 13718 auxprop.cpp:81] Request to lookup properties
 for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: '
 pomona.apache.org

Re: Review Request 23700: Added 'timed_tests.sh' script to help investigate the cause of hanging tests.

2014-08-07 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23700/
---

(Updated Aug. 7, 2014, 11:27 a.m.)


Review request for mesos and Vinod Kone.


Changes
---

Removed the 'make command' option, added a TODO for making it OSX compatible.


Bugs: MESOS-1559
https://issues.apache.org/jira/browse/MESOS-1559


Repository: mesos-git


Description
---

Example usage:

./support/timed_tests.sh -m make check GTEST_FILTER= MESOS_VERBOSE=1 make check GTEST_SHUFFLE=1 $((120 * 60))

# Bypass the 'make' stage.
./support/timed_tests.sh MESOS_VERBOSE=1 make check GTEST_SHUFFLE=1 3600

It works by starting the test script in a new session (a new sid), so that
all of its subprocesses belong to that session.
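
To illustrate the session idea, here is a minimal bash sketch. It is not the contents of support/timed_tests.sh (see the diff for that), and it assumes Linux with setsid(1) from util-linux. The helper name run_with_timeout, the one-second polling loop, and the 124 exit status (borrowed from timeout(1)) are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# Sketch only; run_with_timeout is a hypothetical helper, not part of Mesos.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
  local seconds=$1
  shift

  # Start the command in its own session. In a non-interactive script the
  # background child is not a process group leader, so setsid(1) does not
  # fork and $! is the PID of the new session (and process group) leader.
  setsid "$@" &
  local pid=$!

  local waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$seconds" ]; then
      # A negative PID signals the whole process group led by $pid, which
      # covers every subprocess that has not moved itself to another group.
      kill -KILL -- "-$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
      return 124  # assumed convention, mirrors timeout(1)
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # The command finished in time: report its own exit status.
  wait "$pid"
}
```

With this sketch, something like `run_with_timeout $((120 * 60)) make check GTEST_SHUFFLE=1` would give the test run two hours before killing it along with its subprocesses.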


Diffs (updated)
---

  support/timed_tests.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/23700/diff/


Testing
---

Tested on Linux with and without Jenkins.


Thanks,

Jiang Yan Xu


