Deprecating `--disable-zlib` in libprocess

2017-08-08 Thread Chun-Hung Hsiao
Hi all, In libprocess, we have an optional `--disable-zlib` flag, but it's currently not used for conditional compilation and we always use zlib in libprocess, and there's a requirement check in Mesos to make sure that zlib exists. Should this option be removed then? Or is there anyone working on

Adding process::Executor::execute()

2017-09-11 Thread Chun-Hung Hsiao
Hi, I'm thinking about extending `process::Executor` with a new `execute()` interface. The need for this new interface surfaced while I was working on https://issues.apache.org/jira/browse/MESOS-7964 Summary: 1. A disk GC might execute multiple `rmdirs` callbacks, and some of them are heavy duty. We

Re: Adding process::Executor::execute()

2017-09-12 Thread Chun-Hung Hsiao
still tie up a worker thread, > but only one of them. > > Either way it makes sense to add `process::Executor::execute()`. I'm > happy to shepherd that for you Chun, send me a patch! > > On Mon, Sep 11, 2017 at 7:32 PM, Chun-Hung Hsiao <chhs...@mesosphere.io> > wrote: >

Re: [VOTE] Release Apache Mesos 1.6.0 (rc1)

2018-05-10 Thread Chun-Hung Hsiao
+1 (binding) Tested on our internal CI (sudo make check) on Mac, CentOS 6/7, Debian 8/9 and Ubuntu 14/16/17, with gRPC/SSL disabled/enabled. Also manually tested "make distcheck" w/ autotools, and "ninja check" w/ CMake on Mac and CentOS 7 with gRPC enabled. Observed the following failures:

Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

2018-05-16 Thread Chun-Hung Hsiao
16, 2018 at 6:18 PM, Vinod Kone <vi...@mesosphere.io> wrote: > Can you paste some logs here too if you have? > > On Wed, May 16, 2018 at 5:53 PM, Chun-Hung Hsiao (JIRA) <j...@apache.org> > wrote: > > > > > [ https://issues

Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

2018-05-16 Thread Chun-Hung Hsiao
Unfortunately I don't have any log. IIRC the executor received the `KILL` event because this is printed: On Wed, May 16, 2018 at 6:18 PM, Vinod Kone <vi...@mesosphere.io> wrote: > Can you paste some logs here too if you have? > > On Wed, May 16, 2018 at 5:53 PM, Chun-Hung

Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

2018-05-16 Thread Chun-Hung Hsiao
when I examined it. On Wed, May 16, 2018 at 6:57 PM, Chun-Hung Hsiao <chhs...@mesosphere.io> wrote: > Unfortunately I don't have the log right now. IIRC the executor received > the `KILL` event because the log I saw contained this line: > https://github.com/ap

[Design Doc] External Resource Provider and CSI

2018-06-11 Thread Chun-Hung Hsiao
Folks, As a natural extension to prior work [1, 2] to improve storage support in Mesos, I'm working on the general design of external resource providers, and the specific design for external storage support through CSI [3]. The goal is to enable Mesos to manage cluster-wide resources such as EBS

Re: [VOTE] Release Apache Mesos 1.6.1 (rc1)

2018-06-29 Thread Chun-Hung Hsiao
-1 on https://issues.apache.org/jira/browse/MESOS-8830. This is a critical bug that would wipe out persistent data. I'm backporting this to 1.4, 1.5 and 1.6. On Fri, Jun 29, 2018 at 9:05 AM Greg Mann wrote: > The failures here are mostly command executor/default executor tests. > Looking at

Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-05 Thread Chun-Hung Hsiao
+1 (non-binding) Tested with `make distcheck` with grpc disabled and enabled on mac. Tested with `make distcheck DISTCHECK_CONFIGURE_FLAGS='--enable-grpc'` on centos 7. On Mon, Feb 5, 2018 at 8:33 PM, Vinod Kone wrote: > +1 (binding) > > Tested on ASF CI. The red builds

Re: API working group

2018-02-13 Thread Chun-Hung Hsiao
I'm in. Especially, I'd like to continue the work of adapting gRPC into libprocess, so we could have a gRPC-based API!

Re: 1.7 release manager?

2018-07-25 Thread Chun-Hung Hsiao
drafting the announcements. > > -Gastón > > On Tue, Jul 17, 2018 at 2:26 PM Vinod Kone wrote: > > > +dev > > > > -- Vinod > > > > > > On Tue, Jul 17, 2018 at 4:20 PM Chun-Hung Hsiao > > wrote: > > > > > I could volunteer unl

Mesos 1.7.0 Release

2018-08-06 Thread Chun-Hung Hsiao
Hi folks, We are considering cutting the 1.7.0 release on Monday, August 13th since there are not many blocker or critical issues targeting 1.7.0: We currently have 1 blocker, 1 critical issue and 45 major issues on the 1.7.0 release dashboard

Update: Mesos 1.7.0 Release

2018-08-13 Thread Chun-Hung Hsiao
Hi folks, I just created a new 1.7.x branch from the master. If you are committing patches for any 1.7.0 issues, please backport them to the 1.7.x branch and update the CHANGELOG. Currently there are still 12 unresolved issues targeting 1.7.0:

[VOTE] Release Apache Mesos 1.7.0 (rc1)

2018-08-21 Thread Chun-Hung Hsiao
Hi all, Please vote on releasing the following candidate as Apache Mesos 1.7.0. 1.7.0 includes the following: * Performance Improvements: * Master `/state` endpoint: ~130% throughput improvement through RapidJSON

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-08-23 Thread Chun-Hung Hsiao
I'm a bit concerned about the recovery logic and backward compatibility: The changes we're making shouldn't affect existing users, and we should try hard to avoid any future backward compatibility problem. Say if there is already some custom framework using task ID 'Hello%3AWorld', then if we
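The snippet above is truncated, so the following is only a minimal sketch of one way the `'Hello%3AWorld'` concern could arise, not the scheme proposed in the thread. It assumes a hypothetical encoder that rewrites only `:` as `%3A`, and compares it with Python's standard `urllib.parse.quote`, which also escapes `%`. The function names are illustrative and do not come from Mesos.

```python
from urllib.parse import quote


def encode_colon_only(task_id: str) -> str:
    """Hypothetical naive scheme: replace only ':' with its percent-encoding."""
    return task_id.replace(":", "%3A")


def encode_with_escape(task_id: str) -> str:
    """Percent-encode every reserved character, including '%' itself."""
    return quote(task_id, safe="")


old_id = "Hello%3AWorld"  # pre-existing task ID that contains a literal '%3A'
new_id = "Hello:World"    # new task ID that contains ':'

# Under the naive scheme both IDs map to the same encoded name.
collision = encode_colon_only(old_id) == encode_colon_only(new_id)

# Escaping '%' as well keeps the two encoded names distinct.
distinct = encode_with_escape(old_id) != encode_with_escape(new_id)
```

Even the second scheme changes the encoded name of the pre-existing ID, so anything already stored under the old literal name would need separate handling during recovery.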

[VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-24 Thread Chun-Hung Hsiao
Hi all, Please vote on releasing the following candidate as Apache Mesos 1.7.0. 1.7.0 includes the following: * Performance Improvements: * Master `/state` endpoint: ~130% throughput improvement through RapidJSON

Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-24 Thread Chun-Hung Hsiao
Hi all, Since there will be a weekend during the vote period, the vote will be open until Wed Aug 29 23:59:59 PDT 2018, so we can have more time testing. Best, Chun-Hung On Fri, Aug 24, 2018 at 4:42 PM Chun-Hung Hsiao wrote: > Hi all, > > Please vote on releasing the following

Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-28 Thread Chun-Hung Hsiao
Folks, This is a gentle reminder for 1.7.0-rc2. The vote is open until Wed Aug 29 23:59:59 PDT 2018 and passes if a majority of at least 3 +1 PMC votes are cast. Thanks! On Fri, Aug 24, 2018, 4:45 PM Chun-Hung Hsiao wrote: > Hi all, > > Since there will be a weekend during the vo
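As a rough sketch of the passing condition quoted in the reminder above, the check can be written as a small predicate. It assumes "majority" means more binding +1 votes than -1 votes, which is one reading of the sentence; the function name is hypothetical.

```python
def vote_passes(binding_plus_one: int, binding_minus_one: int) -> bool:
    """Return True when at least three binding +1 votes are cast
    and they outnumber the binding -1 votes (assumed meaning of 'majority')."""
    return binding_plus_one >= 3 and binding_plus_one > binding_minus_one
```

Under this reading, three binding +1 votes with no -1 votes is the smallest passing tally.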

Re: [VOTE] Release Apache Mesos 1.7.0 (rc1)

2018-08-22 Thread Chun-Hung Hsiao
Hi all, The URL for the JAR in the previous email is incorrect. The JAR is in a staging repository here: https://repository.apache.org/content/repositories/orgapachemesos-1232 Thanks, Chun-Hung On Tue, Aug 21, 2018 at 7:34 PM Chun-Hung Hsiao wrote: > Hi all, > > Please vote on

Re: Backport Policy

2018-07-17 Thread Chun-Hung Hsiao
I just have a comment on a special case: If a fix for a flaky test is easy to backport, IMO we probably should backport it, otherwise if someone backports another critical fix in the future, it would take them extra effort to check all CI failures. On Mon, Jul 16, 2018 at 11:39 AM Vinod Kone

Re: Operations Working Group - First Meeting

2018-07-17 Thread Chun-Hung Hsiao
Unfortunately the time conflicts with the CSI community sync so I'll have to skip :( On Tue, Jul 17, 2018 at 2:55 AM Abel Souza wrote: > Thank you for setting this up Gaston, > > Would you mind giving us a brief of what you have in mind for discussion? > > Thank you, > > Abel > > On 07/17/2018

Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-30 Thread Chun-Hung Hsiao
nce you already have the fix. > > Thanks, > Vinod > > > On Aug 29, 2018, at 8:44 PM, Chun-Hung Hsiao > wrote: > > > > I found two issues when compiling with clang 3.5: > > > > 1. The `-Wno-inconsistent-missing-override` option added in > https://reviews.apa

Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-29 Thread Chun-Hung Hsiao
like to ask for some feedback. Thanks! On Wed, Aug 29, 2018 at 10:18 AM James Peach wrote: > +1 (binding) > > Built and tested on Fedora 28 (clang). > > On Aug 24, 2018, at 4:42 PM, Chun-Hung Hsiao > wrote: > > Hi all, > > Please vote on releasing the following c

Re: [VOTE] Release Apache Mesos 1.6.1 (rc2)

2018-07-12 Thread Chun-Hung Hsiao
Seems you missed MESOS-9049. And this seems not just a bug fix release because of MESOS-8934? ;) On Wed, Jul 11, 2018, 9:37 PM Greg Mann wrote: > Whoops, I forgot to include the list of changes included in this release - > sorry! > > 1.6.1-rc2 includes the following notable bug fixes: > > *

Re: [VOTE] Release Apache Mesos 1.6.1 (rc2)

2018-07-13 Thread Chun-Hung Hsiao
ist to make it more digestible and provided a list of > "notable" bug fixes. The entire list of changes can be found in the > CHANGELOG. > > On Thu, Jul 12, 2018, 7:56 AM Chun-Hung Hsiao > wrote: > >> Seems you missed MESOS-9049. And this seems not just a bug

Re: [VOTE] Release Apache Mesos 1.5.0 (rc1)

2018-01-23 Thread Chun-Hung Hsiao
-1 for https://issues.apache.org/jira/browse/MESOS-8481 On Tue, Jan 23, 2018 at 9:38 AM, Jie Yu wrote: > +1 > > Verified in our internal CI that `sudo make check` passed in CentOS 6, > CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL enabled). > > - Jie > >

Re: Welcome Chun-Hung Hsiao as Mesos Committer and PMC Member

2018-03-12 Thread Chun-Hung Hsiao
Just a heads up, >>> don't let >>> > > that happen to you too! >>> > > >>> > > I look forward to continuing to work with you. >>> > > >>> > > Cheers, >>> > > >>> > > Andy >>>

Re: Welcome Zhitao Li as Mesos Committer and PMC Member

2018-03-12 Thread Chun-Hung Hsiao
Congrats Zhitao! On Mon, Mar 12, 2018 at 2:51 PM, Benjamin Mahler wrote: > Welcome Zhitao! Thanks for your contributions so far > > On Mon, Mar 12, 2018 at 2:02 PM, Gilbert Song wrote: > > > Hi, > > > > I am excited to announce that the PMC has voted

Convention for Backward Compatibility for New Operations in Mesos 1.6

2018-04-16 Thread Chun-Hung Hsiao
Hi all, As some of you might already know, we are currently working on patches to implement the new GROW_VOLUME and SHRINK_VOLUME operations [1]. One problem that has surfaced is that, since the new operations are not supported in Mesos 1.5, they will lead to an agent crash during the operation

Re: Convention for Backward Compatibility for New Operations in Mesos 1.6

2018-04-16 Thread Chun-Hung Hsiao
sion and reject > such operations at master? This is one of the main reasons we introduced > the concept of framework, master, agent capabilities. > > On Mon, Apr 16, 2018 at 2:04 PM, Chun-Hung Hsiao <chhs...@apache.org> > wrote: > > > Hi all, > > > > A

Re: Convention for Backward Compatibility for New Operations in Mesos 1.6

2018-04-16 Thread Chun-Hung Hsiao
her we do > option 1 as well, since although in this case it will still crash 1.5.0, at > least in the future we won't have to have this worry again. > > On 4/16/18, 2:04 PM, "Chun-Hung Hsiao" <chhs...@apache.org> wrote: > > Hi all, > > As some m

Re: API Review: Resize (persistent) volume support

2018-03-16 Thread Chun-Hung Hsiao
Thanks Zhitao for the summary. My thoughts are: For `SHRINK_VOLUME`, I feel option 2 is appropriate, as it lets the component that actually applies the operation decide what the resulting free disk space would become. Option 3 is also acceptable. For `GROW_VOLUME`, I actually prefer option 1

Re: API Review: Resize (persistent) volume support

2018-03-19 Thread Chun-Hung Hsiao
From the perspective of resource allocation, GROW takes two resources and merges them into one, while SHRINK takes one resource and splits it into two. So, having two separate calls could make it explicit to the framework what resources are being consumed. Jie also mentioned in the
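The two-in/one-out and one-in/two-out shapes described above can be illustrated with a small sketch. `Disk`, `grow` and `shrink` are hypothetical stand-ins invented for this illustration; they are not the Mesos `Resource` message or the actual `GROW_VOLUME` / `SHRINK_VOLUME` operation definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Disk:
    """Hypothetical disk resource: a size plus a flag marking it as a volume."""
    size_mb: int
    is_volume: bool


def grow(volume: Disk, addition: Disk) -> Disk:
    """Consume a volume and a free disk resource; produce one larger volume."""
    if not volume.is_volume or addition.is_volume:
        raise ValueError("grow expects a volume and a non-volume disk resource")
    return Disk(volume.size_mb + addition.size_mb, True)


def shrink(volume: Disk, subtract_mb: int) -> Tuple[Disk, Disk]:
    """Consume one volume; produce a smaller volume and a freed disk resource."""
    if not volume.is_volume:
        raise ValueError("shrink expects a volume")
    if not 0 < subtract_mb < volume.size_mb:
        raise ValueError("subtract_mb must be positive and smaller than the volume")
    return Disk(volume.size_mb - subtract_mb, True), Disk(subtract_mb, False)
```

Keeping the two functions separate makes the consumed inputs visible in each signature, which is the point the message makes about having two calls.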

Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
that `collect()` returns in the same order of their dependent futures, this can be avoided. On Mar 1, 2018 12:50 PM, "Benjamin Mahler" <bmah...@apache.org> wrote: > Could you explain the problem in more detail? > > On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao <chhs...@mesosphe

Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
Hi all, Meng found a bug in `slave.cpp`, where the proper fix requires collecting futures in order. Currently every `collect` call spawns its own actor, so for two `collect` calls, even though their futures are satisfied in order, they may finish out-of-order. So we need some libprocess changes

Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-02 Thread Chun-Hung Hsiao
This is a new behavior we have after solving MESOS-1720, and thus a new problem only in 1.5.x. Prior to 1.5, reordered tasks (to the same executor) will be launched because whoever comes first will launch the executor. Since 1.5, one might be dropped. On Mar 1, 2018 4:36 PM, "Gilbert Song"

Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-02 Thread Chun-Hung Hsiao
Gilbert I think you're right. The code path doesn't exist in 1.5.0. On Mar 2, 2018 9:36 AM, "Chun-Hung Hsiao" <chhs...@mesosphere.io> wrote: > This is a new behavior we have after solving MESOS-1720, and thus a new > problem only in 1.5.x. Prior to 1.5, reordered tasks

Proposal: Changing `CREATE_VOLUME` and `CREATE_BLOCK` to `CREATE_DISK`.

2018-06-28 Thread Chun-Hung Hsiao
Hi folks, *TL;DR* I'm proposing a breaking API change on experimental offer operations, as shown in the review request: https://reviews.apache.org/r/67779/ Reasons: 1. "Volume" is overloaded and leads to conflicting/inconsistent naming. 2. The concept of "PATH" disks does not exist in CSI, which

[RESULT][VOTE] Release Apache Mesos 1.7.0 (rc3)

2018-09-19 Thread Chun-Hung Hsiao
Hi all, The vote for Mesos 1.7.0 (rc3) has passed with the following votes. +1 (Binding) -- *** Alex Rukletsov *** Kapil Arya *** James Peach There were no 0 or -1 votes. Please find the release at: https://dist.apache.org/repos/dist/release/mesos/1.7.0 It is

Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Chun-Hung Hsiao
+1 (binding) `sudo make -j32 DISTCHECK_CONFIGURE_FLAGS='LIBS=-ldl --enable-ssl --enable-libevent --enable-grpc' distcheck` on Ubuntu 16.04. I got 4 known test failures on my machine: [ FAILED ] 4 tests, listed below: [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs [ FAILED ]

[VOTE] Release Apache Mesos 1.7.1 (rc2)

2019-01-15 Thread Chun-Hung Hsiao
Hi all, Please vote on releasing the following candidate as Apache Mesos 1.7.1. 1.7.1 includes the following: * This is a bug fix release. Also includes performance and API improvements: * **Allocator**:

[VOTE] Release Apache Mesos 1.7.1 (rc1)

2018-12-21 Thread Chun-Hung Hsiao
Hi all, Please vote on releasing the following candidate as Apache Mesos 1.7.1. 1.7.1 includes the following: * This is a bug fix release. Also includes performance and API improvements: * **Allocator**:

Discussion: Scheduler API for Operation Reconciliation

2018-12-11 Thread Chun-Hung Hsiao
Hi folks, Recently I've been discussing the problems of the current design of the experimental `RECONCILE_OPERATIONS` scheduler API with a couple of people. The discussion started from MESOS-9318: when a framework receives an

Re: [VOTE] Release Apache Mesos 1.5.2 (rc2)

2018-11-26 Thread Chun-Hung Hsiao
-1 for https://issues.apache.org/jira/browse/MESOS-8623 I'm working on a fix. On Thu, Nov 22, 2018 at 1:40 PM Meng Zhu wrote: > +1 > make check on Ubuntu 18.04 > > On Wed, Oct 31, 2018 at 4:26 PM Gilbert Song > wrote: > > > Hi all, > > > > Please vote on releasing the following candidate as

Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-03 Thread Chun-Hung Hsiao
ck passes on macOS 10.14.2 > >> > >> $ clang++ --version > >> Apple LLVM version 10.0.0 (clang-1000.10.44.4) > >> Target: x86_64-apple-darwin18.2.0 > >> Thread model: posix > >> InstalledDir: /Library/Developer/CommandLineTools/usr/bin > >>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-09-14 Thread Chun-Hung Hsiao
we later > find another character in use that also needs to be encoded, we can then > abstract the single encoding into a per-platform encoding set. > > Does this seem reasonable? > > Thanks, > > Andy > > P.S. Sorry this took a while to get back to, I was out last week. > &

Re: Discussion: Scheduler API for Operation Reconciliation

2019-01-24 Thread Chun-Hung Hsiao
>> > >> > As far as I know, we don't have any formal guarantees on which >> operations status changes the framework will receive without >> reconciliation. So putting on my framework-implementer hat it seems like >> I'd have no choice but to implement a continuously p

Re: Mesos on ssl

2019-04-05 Thread Chun-Hung Hsiao
I'm not sure if this is related: https://issues.apache.org/jira/browse/MESOS-7076 In summary, Ubuntu 18.04 ships libevent 2.1.x (for OpenSSL 1.1.x support). But libevent 2.1.x has an unknown bug that caused some Mesos tests to fail. As a workaround, the current Mesos master branch (will be 1.8

Re: [VOTE] Release Apache Mesos 1.5.3 (rc1)

2019-03-14 Thread Chun-Hung Hsiao
+1 (binding) `sudo make check` with `--enable-grpc --enable-ssl --enable-libevent` on Ubuntu 16.04 with the following known test failures: [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs [ FAILED ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen [ FAILED ]

[RESULT][VOTE] Release Apache Mesos 1.7.1 (rc2)

2019-01-28 Thread Chun-Hung Hsiao
Hi all, The vote for Mesos 1.7.1 (rc2) has passed with the following votes. +1 (Binding) -- *** Vinod Kone *** Gilbert Song *** Meng Zhu There were no 0 or -1 votes. Please find the release at: https://dist.apache.org/repos/dist/release/mesos/1.7.1 It is

Re: [VOTE] Release Apache Mesos 1.8.0 (rc3)

2019-05-02 Thread Chun-Hung Hsiao
From the log you attached, it seems that you're using Mesos containerizer, so a docker pull won't affect Mesos. Can you verify if the error occurs with the latest nvidia/cuda image? On Wed, May 1, 2019, 4:25 PM Chun-Hung Hsiao wrote: > Hi Jorge, > > Can you provide the output of

Re: [VOTE] Release Apache Mesos 1.8.0 (rc3)

2019-05-01 Thread Chun-Hung Hsiao
Hi Jorge, Can you provide the output of `docker run --rm -ti nvidia/cuda ls /usr/local/cuda-10.1/compat/`? It seems that the nvidia kernel driver installed on your host has version 418, but the image you're using is version 410. The latest `nvidia/cuda` image uses version 418 as well. Can you

Re: [VOTE] Release Apache Mesos 1.9.0 (rc2)

2019-08-29 Thread Chun-Hung Hsiao
-1 for https://issues.apache.org/jira/browse/MESOS-9956. I'm working on a fix for it. On Wed, Aug 28, 2019 at 4:13 AM Qian Zhang wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.9.0. > > > 1.9.0 includes the following: > >

Re: [VOTE] Release Apache Mesos 1.9.0 (rc3)

2019-09-04 Thread Chun-Hung Hsiao
+1 (binding) make distcheck on Ubuntu 16.04 and 18.04. On 18.04 I got the following known failure: [ FAILED ] DockerFetcherPluginTest.INTERNET_CURL_FetchBlob Also the mesos-gtest-runner invoked by make distcheck seems not to be working on either platform. On Tue, Sep 3, 2019 at 1:34 PM Gilbert Song