Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Chun-Hung Hsiao
+1 (binding)

`sudo make -j32 DISTCHECK_CONFIGURE_FLAGS='LIBS=-ldl --enable-ssl
--enable-libevent --enable-grpc' distcheck` on Ubuntu 16.04.
I got 4 known test failures on my machine:
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen
[  FAILED  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
[  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS

Note that with gcc 5.4.0, `LIBS=-ldl` is required for linking.
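
For anyone comparing their own run against a list of known failures like the one above, here is a minimal sketch that pulls the failed test names out of gtest-style output. The function names are hypothetical, and the pattern only covers plain `Suite.Test` names (parameterized names containing `/` would need a wider pattern).

```python
import re

# Matches per-test failure lines such as
# "[  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_CFS_EnableCfs".
# The summary line "[  FAILED  ] 4 tests, listed below:" does not match.
FAILED_LINE = re.compile(r"^\[\s+FAILED\s+\]\s+(\w+\.\w+)\s*$")


def failed_tests(log_text):
    """Return the set of 'Suite.Test' names reported as FAILED."""
    names = set()
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def unexpected_failures(log_text, known_flaky):
    """Return failures in the log that are not on the known-flaky list."""
    return failed_tests(log_text) - set(known_flaky)
```

An empty result from `unexpected_failures` would mean every failure in the log is already on the caller's known-flaky list.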

On Wed, Jan 16, 2019 at 12:03 PM Vinod Kone  wrote:

> +1  (binding)
>
> Passed in ASF CI. Known flaky tests, but otherwise builds look good.
>
> *Revision*: 3088295d4156eb58d092ad9b3529b85fd33bd36e
>
>    - refs/tags/1.5.2-rc3
>
> Configuration Matrix (result per compiler)
>
> centos:7
>   --verbose --enable-libevent --enable-ssl
>     autotools    gcc: Failed     clang: Not run
>     cmake        gcc: Success    clang: Not run
>   --verbose
>     autotools    gcc: Failed     clang: Not run
>     cmake        gcc: Success    clang: Not run
> ubuntu:14.04
>   --verbose --enable-libevent --enable-ssl
>     autotools    gcc: Failed     clang: Success
>     cmake        gcc: Success    clang: Success
>   --verbose
>     autotools    gcc: Success    clang: Success
>     cmake        gcc: Success    clang: Success
>
>
> On Wed, Jan 16, 2019 at 11:04 AM Jie Yu  wrote:
>
>> +1
>>
>> make dist check on macOS Mojave
>>
>> On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:
>>
>>>  Hi all,
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>>>
>>> 1.5.2 includes the following:
>>>
>>> 
>>> *Announce major bug fixes here*
>>> https://jira.apache.org/jira/issues/?filter=12345443
>>>
>>> The CHANGELOG for the release is available 

Re: [VOTE] Release Apache Mesos 1.7.1 (rc2)

2019-01-16 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. Failing builds are due to missed SSL dep in the docker
build file and a flaky test.

*Revision*: d5678c3c5500cec72e22e775d9d048c55c128954

   - refs/tags/1.7.1-rc2

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools    gcc: Success    clang: Not run
    cmake        gcc: Success    clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools    gcc: Success    clang: Not run
    cmake        gcc: Success    clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools    gcc: Failed     clang: Failed
    cmake        gcc: Failed     clang: Failed
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools    gcc: Success    clang: Failed
    cmake        gcc: Success    clang: Success


On Tue, Jan 15, 2019 at 8:30 PM Chun-Hung Hsiao  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.7.1.
>
>
> 1.7.1 includes the 

Re: Discussion: Scheduler API for Operation Reconciliation

2019-01-16 Thread Benjamin Bannier
Hi,

have we reached a conclusion here?

From the Mesos side of things I would be strongly in favor of proposal (III). 
This is not only consistent with what we do with task status updates, but would 
also allow us to provide improved operation status (e.g., 
`OPERATION_UNREACHABLE` instead of just `OPERATION_UNKNOWN`) to better 
distinguish non-terminal from terminal operation states. To accomplish that we 
wouldn’t need to introduce extra information leakage (e.g., explicitly keeping 
the master up to date on local resource provider state, with the associated 
internal consistency complications).

This approach should also simplify framework development as a framework would 
only need to watch a single channel to see operation status updates (no need to 
reconcile different information sources). The benefits of better status updates 
and simpler implementation IMO outweigh any benefits of the current approach 
(disclaimer: I filed the slightly inflammatory MESOS-9448).

What is keeping us from moving forward with (III) at this point?


Cheers,

Benjamin

> On Jan 3, 2019, at 11:30 PM, Benno Evers  wrote:
> 
> Hi Chun-Hung,
> 
> > imagine that there are 1k nodes and 10 active + 10 gone LRPs per node, then 
> > the master needs to maintain 20k entries for LRPs.
> 
> How big would the required additional storage be in this scenario? Even if 
> it's 1 KiB per LRP, using 20 MiB of extra memory doesn't sound too bad for 
> such a big cluster.
> 
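The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the thread; the 1 KiB per LRP figure is the hypothetical upper bound from the message, not a measured size.

```python
KIB = 1024
MIB = 1024 * KIB


def lrp_bookkeeping_bytes(nodes, lrps_per_node, bytes_per_lrp):
    """Total master-side storage if every LRP entry costs bytes_per_lrp."""
    return nodes * lrps_per_node * bytes_per_lrp


# 1k nodes, 10 active + 10 gone LRPs per node, 1 KiB per entry (assumed).
entries = 1000 * (10 + 10)
total = lrp_bookkeeping_bytes(1000, 10 + 10, 1 * KIB)
print(entries, round(total / MIB, 1))  # prints: 20000 19.5
```

So 20k entries at 1 KiB each come to roughly 19.5 MiB, in line with the "20 MiB" figure.
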
> In general, it seems hard to discuss the trade-offs between your proposals 
> without looking at the users of that API - do you know if there are any 
> frameworks out there that already use operation reconciliation, and if so 
> what do they do based on the reconciliation response?
> 
> As far as I know, we don't have any formal guarantees on which operation 
> status changes the framework will receive without reconciliation. So putting 
> on my framework-implementer hat it seems like I'd have no choice but to 
> implement a continuously polling background loop anyway if I care about 
> knowing the latest operation statuses. If this is indeed the case, having a 
> synchronous `RECONCILE_OPERATIONS` would seem to have little additional 
> benefit.
> 
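A continuously polling background loop of the kind described above might look roughly like the following sketch. `send_reconcile` is a hypothetical callable standing in for whatever the framework uses to issue its reconciliation call; nothing here is Mesos API.

```python
import threading


def reconcile_periodically(send_reconcile, interval_seconds, stop):
    """Call send_reconcile() every interval_seconds until stop is set.

    send_reconcile: hypothetical zero-argument callable that issues one
    reconciliation request. stop: a threading.Event used to end the loop.
    Returns the number of requests that were sent.
    """
    calls = 0
    while not stop.is_set():
        send_reconcile()
        calls += 1
        # Event.wait doubles as an interruptible sleep: it returns early
        # as soon as stop is set, so shutdown does not wait a full interval.
        stop.wait(interval_seconds)
    return calls
```

A framework would typically run this in a daemon thread and set `stop` on shutdown; the interval is a trade-off between staleness and load on the master.
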
> Best regards,
> Benno
> 
> On Wed, Dec 12, 2018 at 4:07 AM Chun-Hung Hsiao  wrote:
> Hi folks,
> 
> Recently I've been discussing the problems of the current design of the
> experimental `RECONCILE_OPERATIONS` scheduler API with a couple of people.
> The discussion started from MESOS-9318: when a framework receives an
> `OPERATION_UNKNOWN`, it doesn't know if it should retry the operation or not
> (further details described below). As the discussion evolved, we realized
> there are more issues to consider, design-wise and implementation-wise, so
> I'd like to reach out to the community to get your opinions.
> 
> Before I jump right into the issues I'd like to discuss, let me fill you in
> on some background of operation reconciliation. Since the design of this
> feature was informed by the pre-existing implementation of task
> reconciliation, I'll begin there.
> 
> *Task Reconciliation: Design*
> 
> The scheduler API has a `RECONCILE` call for a framework to query the
> current statuses of its tasks. This call supports the following modes:
> 
>    - *Explicit reconciliation*: The framework specifies the list of tasks
>      it wants to know about, and expects status updates for these tasks.
> 
>    - *Implicit reconciliation*: The framework does not specify a list of
>      tasks, and simply expects status updates for all tasks the master
>      knows about.
> 
> In both cases, the master looks into its in-memory task bookkeeping and
> sends *one or more `UPDATE` events* to respond to the reconciliation request.
> 
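To make the two modes concrete, here is a minimal sketch that builds the JSON body of a `RECONCILE` call. The field names follow the v1 scheduler HTTP API as I recall it and should be checked against the scheduler API documentation; the helper name and all IDs are hypothetical, and sending the request is left out.

```python
import json


def reconcile_call(framework_id, tasks=()):
    """Build the body of a RECONCILE call.

    tasks is an iterable of (task_id, agent_id) pairs. Passing tasks asks
    for explicit reconciliation; leaving it empty asks for implicit
    reconciliation of all tasks the master knows about.
    """
    return {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE",
        "reconcile": {
            "tasks": [
                {"task_id": {"value": task_id}, "agent_id": {"value": agent_id}}
                for task_id, agent_id in tasks
            ]
        },
    }


# Hypothetical IDs, for illustration only.
explicit = reconcile_call("framework-1", [("task-1", "agent-1")])
implicit = reconcile_call("framework-1")
print(json.dumps(implicit))
```

Either way the answer arrives later as separate `UPDATE` events rather than as a response to this request.
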
> *Task Reconciliation: Problems*
> 
> This API design of task reconciliation has the following shortcomings:
> 
>    - (1) There is no clear boundary of when the "reconciliation response"
>      ends, and thus there is *no 1-1 correspondence between the
>      reconciliation request and the response*. For explicit reconciliation,
>      the framework might wait for an extended period of time before it
>      receives all status updates; for implicit reconciliation, there is no
>      way for a framework to tell if it has learned about all of its tasks,
>      which could be inconvenient if the framework has lost its task
>      bookkeeping.
> 
>    - (2) The "reconciliation response" may be outdated. If an agent
>      reregisters after a task reconciliation has been responded to,
>      *the framework wouldn't learn about the tasks from this recovered
>      agent*. Mesos relies on the framework to call the `RECONCILE` call
>      *periodically* to get up-to-date task statuses.
> 
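Shortcoming (1) can be illustrated with a small sketch of the bookkeeping a framework ends up doing for explicit reconciliation: since nothing marks the end of the response, it can only track which requested tasks have not yet produced an update. The event shape (a dict with a `task_id` key) is a simplification assumed for this example, not the real `UPDATE` event format.

```python
def outstanding_after(requested_task_ids, update_events):
    """Return the requested task IDs for which no update has been seen.

    update_events is any iterable of simplified events, each a dict with
    a "task_id" key (an assumed shape for illustration).
    """
    pending = set(requested_task_ids)
    for event in update_events:
        pending.discard(event["task_id"])
        if not pending:
            # Every requested task has answered; only explicit
            # reconciliation can reach this point with certainty.
            break
    return pending
```

For implicit reconciliation there is no `requested_task_ids` to start from, which is exactly why the framework cannot tell when it has heard about everything.
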
> 
> 
> *Operation Reconciliation: Design & Problems*
> 
> When designing operation reconciliation, we made the `RECONCILE_OPERATIONS`
> call
> *asynchronous 

Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Vinod Kone
+1  (binding)

Passed in ASF CI. Known flaky tests, but otherwise builds look good.

*Revision*: 3088295d4156eb58d092ad9b3529b85fd33bd36e

   - refs/tags/1.5.2-rc3

Configuration Matrix (result per compiler)

centos:7
  --verbose --enable-libevent --enable-ssl
    autotools    gcc: Failed     clang: Not run
    cmake        gcc: Success    clang: Not run
  --verbose
    autotools    gcc: Failed     clang: Not run
    cmake        gcc: Success    clang: Not run
ubuntu:14.04
  --verbose --enable-libevent --enable-ssl
    autotools    gcc: Failed     clang: Success
    cmake        gcc: Success    clang: Success
  --verbose
    autotools    gcc: Success    clang: Success
    cmake        gcc: Success    clang: Success



On Wed, Jan 16, 2019 at 11:04 AM Jie Yu  wrote:

> +1
>
> make dist check on macOS Mojave
>
> On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:
>
>>  Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>>
>> 1.5.2 includes the following:
>>
>> 
>> *Announce major bug fixes here*
>> https://jira.apache.org/jira/issues/?filter=12345443
>>
>> The CHANGELOG for the release is available at:
>>
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc3
>>
>> 
>>
>> The candidate for Mesos 1.5.2 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz
>>
>> The tag to be voted on is 1.5.2-rc3:
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc3
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> 

Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Jie Yu
+1

make dist check on macOS Mojave

On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:

>  Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>
> 1.5.2 includes the following:
>
> 
> *Announce major bug fixes here*
> https://jira.apache.org/jira/issues/?filter=12345443
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc3
>
> 
>
> The candidate for Mesos 1.5.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz
>
> The tag to be voted on is 1.5.2-rc3:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc3
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1242
>
> Please vote on releasing this package as Apache Mesos 1.5.2!
>
> The vote is open until Fri Jan 18 00:52:44 PST 2019 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.5.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Gilbert
>
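
For voters who want to check the tarball against the published SHA512 checksum mentioned above, a minimal sketch follows. It assumes both files have already been downloaded, and that the first whitespace-separated token of the `.sha512` file is the hex digest; if the published file uses a different layout, the parsing would need adjusting. The function names are hypothetical.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published):
    """Compare a file against the text of a published checksum file.

    Assumes the first whitespace-separated token of `published` is the
    hex digest (an assumption about the file layout).
    """
    tokens = published.split()
    if not tokens:
        return False
    return sha512_hex(path) == tokens[0].lower()
```

Usage would be along the lines of `checksum_matches("mesos-1.5.2.tar.gz", open("mesos-1.5.2.tar.gz.sha512").read())`. This only checks integrity; verifying the `.asc` signature against the KEYS file is a separate step.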