Re: Welcome Greg Mann as a new committer and PMC member!

2017-06-13 Thread Neil Conway
Congratulations, Greg!! Very well-deserved. Looking forward to continuing to work with you on the project. Neil On Tue, Jun 13, 2017 at 2:42 PM, Vinod Kone wrote: > Hi folks, > > Please welcome Greg Mann as the newest committer and PMC member of the > Apache Mesos

Re: [Design doc] RPC: Fault domains in Mesos

2017-06-08 Thread Neil Conway
/framework API challenges around supporting domain opt-in w/o. The review chain for the MVP of this feature is up now (MESOS-7607). Neil On Mon, Apr 17, 2017 at 9:44 AM, Neil Conway <neil.con...@gmail.com> wrote: > Folks, > > I'd like to enhance Mesos to support a first-class notion

Re: June 3rd: MesosCon North America CFP due!

2017-06-03 Thread Neil Conway
Hi Jay, The CFP deadline has been extended to June 30. Neil On Sat, Jun 3, 2017 at 12:55 AM Jay Guo wrote: > According to the link CFP for MesosCon North America > > the CFP closes by

Re: RFC: Partition Awareness

2017-06-01 Thread Neil Conway
Hi Ben, The argument for changing the semantics is that correct frameworks should _always_ have accounted for the possibility that TASK_LOST tasks would go back to running (due to the non-strict registry semantics). The proposed change would just increase the probability of this behavior

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-31 Thread Neil Conway
On Tue, May 30, 2017 at 3:43 PM, Neil Conway <neil.con...@gmail.com> wrote: > Attached is the test log for this failure. From a quick look, seems as > though the agent starts to launch the task, including forking the > child process, but no subsequent task status updates or

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-31 Thread Neil Conway
On Tue, May 30, 2017 at 2:36 PM, Vinod Kone wrote: > Failed test: OneWayPartitionTest.MasterToSlave >

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-30 Thread Neil Conway
On Tue, May 30, 2017 at 2:36 PM, Vinod Kone wrote: > Ran on ASF CI. > > Found following issues. > > Failed test: CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled >

Re: [VOTE] Release Apache Mesos 1.3.0 (rc2)

2017-05-24 Thread Neil Conway
The vote has failed; we'll cut a new release shortly. The release blocker (MESOS-7521) has been investigated and fixed. The next RC will also include MESOS-7538, as well as the `register_agents` ACL change mentioned in a different thread. Neil On Wed, May 17, 2017 at 3:11 PM, Yan Xu

Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Neil Conway
Congratulations Gilbert! Well-deserved! Neil On Wed, May 24, 2017 at 10:32 AM, Jie Yu wrote: > Hi folks, > > I'm happy to announce that the PMC has voted Gilbert Song as a new committer > and member of PMC for the Apache Mesos project. Please join me to > congratulate him! >

Re: Use of ACLs.RegisterAgent.agent

2017-05-24 Thread Neil Conway
FYI, I merged the change to rename this field into the master and 1.3.x branches; it will be included in the next 1.3.0 release candidate. Neil On Mon, May 22, 2017 at 10:43 AM, Alexander Rojas wrote: > Hey guys, > > We just noted that there was an error when the

Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-08 Thread Neil Conway
Personally, I'm not convinced that we need to fix MESOS-7378. The problem is essentially a bug in glibc that was fixed 6 years ago. (As a point of reference, the oldest version of g++ we support was released 2 years ago... :) ) Neil On Mon, May 8, 2017 at 3:45 PM, Yan Xu wrote: > I

Version numbers in Mesos

2017-05-08 Thread Neil Conway
I would like to make a few changes to how Mesos handles version numbers: (1) Mesos versions (e.g., as defined in `configure.ac`) must be parseable as valid SemVer (see http://semver.org/). This has always been the case with version numbers assigned by the Apache Mesos project, but if your
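As an illustration of what "parseable as valid SemVer" means in practice, here is a minimal Python sketch of a validity check. The pattern follows the grammar published at semver.org; the function name `is_semver` is hypothetical, and this is not how Mesos itself parses versions.

```python
import re

# Pattern following the SemVer 2.0.0 grammar: MAJOR.MINOR.PATCH with
# optional "-prerelease" and "+build" parts. Numeric identifiers must
# not have leading zeros.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(version: str) -> bool:
    """Return True if the whole string is a valid SemVer version."""
    return SEMVER_RE.fullmatch(version) is not None


if __name__ == "__main__":
    for candidate in ["1.3.0", "1.3.0-rc1", "1.3", "01.3.0"]:
        print(candidate, is_semver(candidate))
```

Under this check a two-component version such as "1.3" is rejected, while a pre-release label such as "1.3.0-rc1" is accepted.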

Re: [Design doc] RPC: Fault domains in Mesos

2017-04-19 Thread Neil Conway
l we could implement this by identifying a fault domain with a simple > list of ids like ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"] > or ["US-EAST-2", "Building 1"]. Slaves would advertise their lowest-le

[Design doc] RPC: Fault domains in Mesos

2017-04-17 Thread Neil Conway
Folks, I'd like to enhance Mesos to support a first-class notion of "fault domains" -- i.e., identifying the "rack" and "region" (DC) where a Mesos agent or master is located. The goal is to enable two main features: (1) To make it easier to write "rack-aware" Mesos frameworks that are portable

Re: Welcome Kevin Klues as a Mesos Committer and PMC member!

2017-03-01 Thread Neil Conway
Congratulations Kevin! Very well-deserved. Neil On Wed, Mar 1, 2017 at 2:05 PM, Benjamin Mahler wrote: > Hi all, > > Please welcome Kevin Klues as the newest committer and PMC member of the > Apache Mesos project. > > Kevin has been an active contributor in the project for

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-01 Thread Neil Conway
The perf core dump might be addressed if we backport this change: https://reviews.apache.org/r/56611/ Although my guess is that this isn't a severe problem: for some as-yet-unknown reason, running `perf` on the host segfaulted, which causes the test to fail. Neil On Wed, Mar 1, 2017 at 11:09

Re: Welcome Neil Conway as Mesos Committer and PMC member!

2017-01-22 Thread Neil Conway
017 at 11:03 PM, Vinod Kone <vinodk...@apache.org> wrote: > >> Hi folks, >> >> Please welcome Neil Conway as the newest committer and PMC member of the >> Apache Mesos project. >> >> Neil has been an active contributor to Mesos for more than a year now. As

Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
On Mon, Dec 12, 2016 at 1:32 PM, Joris Van Remoortere wrote: > It sounds like using a multi_hashmap for now allows you to clean up the > code and avoid some bugs, without changing the existing behavior. Because we want cache-like behavior (bounded size + LRU replacement),
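For readers unfamiliar with the "bounded size + LRU replacement" behavior mentioned above, the following is a generic Python sketch of such a cache. The class name `BoundedLRU` and its methods are hypothetical and only illustrate the idea; this is not the data structure used in Mesos.

```python
from collections import OrderedDict


class BoundedLRU:
    """A map with a fixed capacity that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def put(self, key, value):
        # Re-inserting an existing key counts as a use.
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        # Evict from the "oldest" end once the bound is exceeded.
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def __len__(self):
        return len(self._items)
```

With a capacity of 2, inserting "a" and "b", reading "a", and then inserting "c" evicts "b", because "b" is the entry that was used least recently.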

Disabling the --registry_strict flag in 1.0

2016-07-12 Thread Neil Conway
Hi folks, I'd like to propose that we disable the --registry_strict flag for Mesos 1.0. You can find the rationale for this change here: https://issues.apache.org/jira/browse/MESOS-5833 Please let me know if you have any thoughts on whether we should make this change. Thanks, Neil

RFC: partitioned tasks and the strict registry

2016-07-11 Thread Neil Conway
Folks, We're working on some Mesos features that will allow frameworks to control how partitioned tasks are handled [1]. As part of designing how this will work, I'd love to hear from users and framework developers about how they handle partitioned tasks/agents. Specifically: (a) Have you enabled

Improving support for partitioned tasks

2016-06-21 Thread Neil Conway
Currently, Mesos implements a hardcoded policy for handling partitioned agents and tasks: * agents are deemed to be partitioned when they fail health checks (~75 seconds by default) * partitioned agents are removed from the cluster. Frameworks receive TASK_LOST for all tasks running on the

Re: source code compile failure mesos-0.28.0

2016-06-21 Thread Neil Conway
Can you post the content of "config.log"? Thanks, Neil On Tue, Jun 21, 2016 at 3:17 PM, Ali Aktar wrote: > Hi; > > All dependencies as per doc were installed. I’m using Centos 7: > Linux ip-172-31-46-249.eu-west-1.compute.internal 3.10.0-327.10.1.el7.x86_64 > #1 SMP

Re: Need CHANGELOG updates

2016-03-03 Thread Neil Conway
I sent https://reviews.apache.org/r/44348/ for the floating point math changes; if you'd prefer a different format or more/less details, just let me know. Thanks, Neil On Thu, Mar 3, 2016 at 10:57 AM, Vinod Kone wrote: > Hi guys, > > The 0.28.0 release is currently blocked

Re: Making 'curl' a prerequisite for installing Mesos

2016-03-03 Thread Neil Conway
No objection to the additional dependency, but using 'curl' instead of 'libcurl' seems unfortunate. Can you share some more detailed information about the problems that have been encountered using libcurl? e.g., was using the curl_multi_xxx() APIs explored? Neil On Thu, Mar 3, 2016 at 9:10

Re: [VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-02-29 Thread Neil Conway
As described (briefly) in the release emails, 0.27.2, 0.26.1, 0.25.1, and 0.24.2 contain a new feature: "reliable floating point for scalar resources" (MESOS-4687). To elaborate on that slightly, Mesos now only supports scalar resource values with three decimal digits of precision (e.g.,

Precision of scalar resources

2016-02-12 Thread Neil Conway
tl;dr: If you use resource values with more than three decimal digits of precision (e.g., you are launching a task that uses 2.5001 CPUs), please speak up! Mesos uses floating point to represent scalar resource values, such as the number of CPUs in a resource offer or dynamic reservation.
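To show why limiting scalar resources to three decimal digits helps, here is a small Python sketch that does the arithmetic on integers scaled by 1000 instead of on raw floating point values. The helper names `to_fixed` and `add_scalars` are hypothetical, and this is an illustration of the general fixed-point technique under that assumption, not Mesos's actual code.

```python
SCALE = 1000  # three decimal digits of precision


def to_fixed(value: float) -> int:
    """Convert a scalar to an integer count of thousandths.

    Note: Python's round() resolves exact ties to the nearest even integer.
    """
    return round(value * SCALE)


def add_scalars(a: float, b: float) -> float:
    """Add two scalar resource values using integer arithmetic."""
    return (to_fixed(a) + to_fixed(b)) / SCALE


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)             # plain floating point addition
    print(add_scalars(0.1, 0.2) == 0.3)  # fixed-point addition
    print(to_fixed(2.5001))              # the fourth decimal digit is dropped
```

In this sketch, a value such as 2.5001 CPUs becomes 2500 thousandths, so the fourth decimal digit is lost; that is the kind of usage the message asks people to report.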

Re: mesos 0.23, long term quering state.json data.

2016-02-01 Thread Neil Conway
There are some known performance problems with the implementation of the /state endpoint in prior versions of Mesos (see MESOS-2353 for details). In Mesos 0.27, the performance of /state should be much, much faster. Neil On Mon, Feb 1, 2016 at 8:02 AM, tommy xiao wrote: >

Re: Get Task's labels on reconciliation

2016-01-22 Thread Neil Conway
Hi Andrii, TaskStatus includes the task's labels [1], so what you're trying to do should work. BTW, we recently wrote up some suggestions on how to write highly available frameworks. The docs will be on the website the next time it is refreshed; in the meantime, you can find them here:

Re: Dynamic Reservations and Roles

2016-01-21 Thread Neil Conway
Hi John, I believe what you're attempting to do should be supported. Try reserving the resources with "principal = prod" and "role = dev". That will mean that the dev role will be allowed to use the resources, but only principals that are allowed to unreserve prod's resources (as configured by

Allocator API changes

2015-12-10 Thread Neil Conway
Hi everyone, The allocator API [1] is going to change in the forthcoming 0.26 release [2]. Custom allocators will need to implement several new API methods. Further changes to the allocator API are being contemplated for the 0.27 release [3]. If you have built a custom allocator, please speak

Re: Is it safe to replace mesos-master in fly

2015-11-25 Thread Neil Conway
On Tue, Nov 24, 2015 at 3:38 PM, Marco Massenzio wrote: > The closest I could find is [0], but granted, much more detail could be > desirable :) Agreed! See also https://issues.apache.org/jira/browse/MESOS-3995 Neil