Design Doc: Hierarchical Quota Guarantees and Limits

2017-10-11 Thread Benjamin Mahler
Hi folks, As part of the ongoing work for hierarchical role support, Michael Park and I have been working on a design doc that describes how the allocation algorithm needs to be updated to handle hierarchical quota guarantees. Also, as part of this work, we realized it makes sense to also make

Re: Are there any supported systems without O_CLOEXEC?

2017-09-29 Thread Benjamin Mahler
Is this altering the minimum Linux or OS X version we support? On Fri, Sep 29, 2017 at 9:15 AM, James Peach wrote: > > > On Sep 27, 2017, at 5:03 PM, James Peach wrote: > > > > Hi all, > > > > In MESOS-8027 and https://reviews.apache.org/r/62638/, I'm

Re: When is support for the AMD GPU driver on Mesos?

2017-09-06 Thread Benjamin Mahler
AMD support is not planned, no users have asked for it as far as I know. Nvidia support in mesos means: (1) Automatic detection of the GPUs via the NVML libraries. (2) Enforced isolation via device access. (3) Automatically making the nvidia driver libraries available within the container. We

Re: Welcome James Peach as a new committer and PMC member!

2017-09-06 Thread Benjamin Mahler
Thanks for all that you've done so far for the project James! On Wed, Sep 6, 2017 at 2:08 PM, Yan Xu wrote: > Hi Mesos devs and users, > > Please welcome James Peach as a new Apache Mesos committer and PMC member. > > James has been an active contributor to Mesos for over two

Re: [VOTE] Release Apache Mesos 1.4.0 (rc3)

2017-08-28 Thread Benjamin Mahler
-1 due to https://issues.apache.org/jira/browse/MESOS-7921 Thanks for reporting this Yan, it unfortunately went unnoticed despite CI failures since Aug 3rd. On Mon, Aug 28, 2017 at 12:29 PM, Yan Xu wrote: > Also the libprocess refactor seems to have stability issues: >

Re: TaskStatus.uuid for idempotent status handling

2017-08-28 Thread Benjamin Mahler
Yes, the UUID is how you would check for a duplicate due to re-transmission. These duplicates still need to be acknowledged. Ben On Mon, Aug 28, 2017 at 9:59 AM, Christoph Heer wrote: > Hi, > > as described in Mesos' documentation [1], a framework scheduler should >
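The reply above can be sketched as follows. This is a minimal illustration only, assuming a scheduler that keeps the set of status UUIDs it has already processed in memory; `StatusHandler` and its fields are hypothetical names, not part of any Mesos library, and the acknowledgement is recorded in a list rather than sent to the master.

```python
class StatusHandler:
    """Hypothetical helper: de-duplicates status updates by UUID."""

    def __init__(self):
        self.seen = set()       # UUIDs of updates already processed
        self.processed = []     # (task_id, state) pairs acted upon once
        self.acked = []         # UUIDs we would acknowledge to the master

    def handle(self, task_id, state, uuid):
        """Return True if the update was new, False if it was a duplicate."""
        is_duplicate = uuid in self.seen
        if not is_duplicate:
            self.seen.add(uuid)
            self.processed.append((task_id, state))
        # Duplicates caused by re-transmission are still acknowledged,
        # otherwise the sender would keep retrying the same update.
        self.acked.append(uuid)
        return not is_duplicate
```

A real scheduler would likely persist the seen UUIDs (or make its handling idempotent in some other way) so that the check survives a scheduler restart.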

Re: Beta 1.10

2017-08-15 Thread Benjamin Mahler
Looks like you're asking about DC/OS? Their user list is: us...@dcos.io On Fri, Aug 11, 2017 at 7:04 AM, Mclain, Warren wrote: > We (Optum) are interested in the Beta1.10 dcos. The one item that we are > looking at is whether the OpenId support is in the beta. > > > >

Re: Command Executor

2017-08-08 Thread Benjamin Mahler
You're free to write your own long lived executor that can process multiple tasks. The built in executors self-terminate after running the tasks they are launched with. On Tue, Aug 8, 2017 at 2:36 AM, Oeg Bizz wrote: > It is used to notify some services that the agents are

Re: Deprecating `--disable-zlib` in libprocess

2017-08-08 Thread Benjamin Mahler
Sorry, I think this was me, feel free to remove it from libprocess now that it's required. On Tue, Aug 8, 2017 at 10:57 AM, Chun-Hung Hsiao wrote: > Hi all, > > In libprocess, we have an optional `--disable-zlib` flag, but it's > currently not used > for conditional

Re: [VOTE] Release Apache Mesos 1.3.1 (rc1)

2017-08-02 Thread Benjamin Mahler
+1 (binding) ./configure CC=clang CXX=clang++ CXXFLAGS=-Wno-deprecated-declarations --disable-python --disable-java --with-apr=/usr/local/opt/apr/libexec --with-svn=/usr/local/opt/subversion && make check -j8 Ran into a known flaky test: https://issues.apache.org/jira/browse/MESOS-7739 On Tue,

Re: Latest Mesos (1.4.0) Fails to Build on Ubuntu 14.04.5

2017-08-01 Thread Benjamin Mahler
That file path looks valid from what I can tell: /opt/mesos/build/src/../../src/python/cli/src/mesos/__init__.py Is the file not there? Is the directory not there? On Fri, Jul 28, 2017 at 9:45 AM, Traiano Welcome wrote: > Hi All > > The latest version of mesos fails to

Re: Mesos Python Daemon Launch

2017-07-28 Thread Benjamin Mahler
This is generally not something we want users to do (i.e. leak something outside of their container). Mesos will kill all tasks in the cgroup if you're using cgroup isolation, so you would have to ensure the daemon escapes the cgroup. If you're using the posix isolation, you also need to be sure

Re: Agent Working Directory Best Practices

2017-06-26 Thread Benjamin Mahler
As a data point, as far as I'm aware, most users are using a local work directory, not an NFS mounted one. Would love to hear from anyone on the list if they are doing this, and if there are any subtleties that should be documented. On Thu, Jun 22, 2017 at 11:13 PM,

Re: Work group on Community

2017-06-15 Thread Benjamin Mahler
Thanks for kicking this off Vinod! (lists to bcc) I'm happy to join, I would add the following under this umbrella for now: --> Project PR (e.g. blog posts, twitter, etc) --> Events --> Website / documentation --> New contributor UX On Thu, Jun 15, 2017 at 10:57 AM, Vinod Kone

Re: [VOTE] Release Apache Mesos 1.2.1 (rc1)

2017-06-08 Thread Benjamin Mahler
a flaky test or a bug? On Thu, Jun 8, 2017 at 4:07 PM, Benjamin Mahler <bmah...@apache.org> wrote: > Vinod I think that's the getenv issue from: https://issues.apache.org/jira/browse/MESOS-6985 > > On Wed, May 17, 2017 at 5:57 PM, Till Toenshoff <toensh...@me.com>

Re: [VOTE] Release Apache Mesos 1.2.1 (rc1)

2017-06-08 Thread Benjamin Mahler
Vinod I think that's the getenv issue from: https://issues.apache.org/jira/browse/MESOS-6985 On Wed, May 17, 2017 at 5:57 PM, Till Toenshoff wrote: > +1 > > Ran it through DC/OS builds and integration tests; > https://github.com/dcos/dcos/pull/1530 => all green > > On May 17,

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-06-02 Thread Benjamin Mahler
Thanks Yan! On Fri, Jun 2, 2017 at 10:45 AM, Yan Xu <y...@jxu.me> wrote: > +1 (binding) > > Ran it in a test cluster. > > --- > Jiang Yan Xu <y...@jxu.me> | @xujyan <https://twitter.com/xujyan> > > On Thu, Jun 1, 2017 at 2:34 PM, Benjamin Mahler <

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-06-01 Thread Benjamin Mahler
+1 (binding) Looks like ExamplesTest.DynamicReservationFramework is flaky, unfortunately wasn't able to get the logs for a failed run. On Thu, Jun 1, 2017 at 2:03 PM, Benjamin Mahler <bmah...@apache.org> wrote: > Not a blocker, but noticed the parallel test runner isn't bundled in the

Re: RFC: Partition Awareness

2017-06-01 Thread Benjamin Mahler
If I understood correctly, the proposal is to not kill the tasks for non-partition aware frameworks? That seems like a pretty big change for frameworks that are not partition aware and expect the old killing semantics. It seems like we should just directly fix the issue, do you have a sense of

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-06-01 Thread Benjamin Mahler
Not a blocker, but noticed the parallel test runner isn't bundled in the release, if you configure with '--enable-parallel-test-execution': /Users/bmahler/Downloads/mesos-1.3.0/support/mesos-gtest-runner.py --sequential=*ROOT_* ./stout-tests /bin/sh:

Re: Nested container sessions

2017-06-01 Thread Benjamin Mahler
+Kevin On Wed, May 31, 2017 at 3:31 PM, Brad wrote: > Hi all, > > I'm interested in the container attach and exec feature added in version > 1.2.0. > > I'm using the LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_INPUT > calls on the operator API to launch an

Re: Plan for upgrading protobuf==3.2.0 in Mesos

2017-05-26 Thread Benjamin Mahler
Thanks Zhitao and Anand! I've been looking forward to using arena allocation to improve performance. On Fri, May 26, 2017 at 6:01 PM, Qian Zhang wrote: > Thanks Anand and Zhitao! > > So I think we can remove the code like below, and switch to use the native > maps supported

Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-26 Thread Benjamin Mahler
Thanks for all of your contributions to the project so far! It's been great having you in the community. On Wed, May 24, 2017 at 10:32 AM, Jie Yu wrote: > Hi folks, > > I'm happy to announce that the PMC has voted Gilbert Song as a new > committer and member of PMC for the

Re: [RESULT][VOTE] Release Apache Mesos 1.0.4 (rc2)

2017-05-25 Thread Benjamin Mahler
s] >> > <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel >> ease/32/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=-- >> verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu% >> 3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ub

Re: [RESULT][VOTE] Release Apache Mesos 1.1.2 (rc2)

2017-05-25 Thread Benjamin Mahler
I just was targeting a cherry pick and noticed the release isn't closed on JIRA (see 'Releasing the Version on JIRA' section of the release guide). I closed it and added a 1.1.3 version for folks to target for bug fixes. On Fri, May 19, 2017 at 5:36 AM, Alex Rukletsov wrote:

Re: High performance, low latency framework over mesos

2017-05-19 Thread Benjamin Mahler
OWLEDGE call with the uuid got in status > update) also seen in master only ~35ms (lines 18-19 below) after the call. > I’m starting to conclude the each call using the scheduler library (which > actually involves HTTP POST) takes ~40ms. > > > > To sum it up, it seems that the m

Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-09 Thread Benjamin Mahler
-1 (binding) Two upgrade blockers, these need to be backported to 1.2.x as well: https://issues.apache.org/jira/browse/MESOS-7478 (I have a patch already) https://issues.apache.org/jira/browse/MESOS-7460 (mpark is working on this) Re: MESOS-7378 Any updates on whether this will be fixed? On

Re: [VOTE] Release Apache Mesos 1.0.4 (rc2)

2017-05-02 Thread Benjamin Mahler
+1 make check passes on macOS 10.12.4 with clang On Tue, May 2, 2017 at 12:04 PM, Vinod Kone wrote: > Hi all, > > > Please vote on releasing the following candidate as Apache Mesos 1.0.4. > > > 1.0.4 includes the following: > >

Re: How to filter GET_TASKS api result

2017-04-19 Thread Benjamin Mahler
We can add a Call.GetTasks message to allow you to specify which task ids you would like to retrieve. But this isn't supported yet, the code needs to be written. E.g. message Call { enum Type { GET_TASKS = 13; // Retrieves the information about tasks, see `GetTasks` below. }

Re: Detect when mesos agent needs work directory cleanup

2017-03-23 Thread Benjamin Mahler
I would recommend avoiding a manual clean up of the work directory, since it's not guaranteed that this approach will remain correct as things evolve. To have the agent perform the cleanup using its own logic, you can run: mesos-agent --recover=cleanup --work_dir= --master= Also, there is

Re: Suspend a running task?

2017-03-16 Thread Benjamin Mahler
Hi Mark, No, there is no support for this currently. On Thu, Mar 16, 2017 at 2:11 PM, Mark Hammons wrote: > Can you suspend a running task with mesos? I see that it can be killed, > but it would be nice to have the ability to suspend tasks for a preemptive >

Re: High performance, low latency framework over mesos

2017-03-15 Thread Benjamin Mahler
lder() > > .setRefuseSeconds(0) > > .build())) > > .build()); > > } > > > > LOGGER.info("Completed handling offers"); > > } > > >

Re: High performance, low latency framework over mesos

2017-03-09 Thread Benjamin Mahler
Have you taken a look at the logs across your scheduler / master / agent to see where the latency is introduced? We can then discuss options to reduce the latency. Ben On Tue, Mar 7, 2017 at 5:03 AM, wrote: > Hi, > > > > I’m implementing my own framework (scheduler +

Re: Mesos do not assign all available resources

2017-03-02 Thread Benjamin Mahler
Hartmann <gabr...@mesosphere.io> wrote: > Possibly the suppress/revive problem. > > On Thu, Mar 2, 2017 at 4:30 PM Benjamin Mahler <bmah...@apache.org> wrote: > >> Can you upload the full logs somewhere and link to them here? >> >> How many frameworks are yo

Re: Mesos do not assign all available resources

2017-03-02 Thread Benjamin Mahler
Also, what is the allocation that each framework has when you reach your steady state? Are there frameworks that don't have any more work to do but have a really low share of the cluster? On Thu, Mar 2, 2017 at 4:29 PM, Benjamin Mahler <bmah...@apache.org> wrote: > Can you upload the

Re: Mesos do not assign all available resources

2017-03-02 Thread Benjamin Mahler
cpu per task). > > The problem is (we think): the mesos-master does not offers resources to > all the tasks all the time and the declined resources are not re-offered to > other tasks. Any idea to how to change the behavior or the rate to offer > resources to the tasks? > > FY

Re: Understanding Mesos Maintenance

2017-03-02 Thread Benjamin Mahler
Hey Zameer, great questions. Let us know if there's anything you think could be improved or documented better. Re 1: The 'Viewing maintenance status' section of the documentation should clarify this: http://mesos.apache.org/documentation/latest/maintenance/ Re 2: Both of these sound reasonable

Re: Mesos do not assign all available resources

2017-03-02 Thread Benjamin Mahler
Hi there, more clarification is needed: > I have close to 800 CPUs, but the system does not assign all the available > resources to all our tasks. > What do you mean precisely here? Can you describe what you're seeing? Also, you have more than 800GB of RAM, right? Ben On Thu, Mar 2, 2017 at 9:00

Re: resource offers after task failure

2017-02-26 Thread Benjamin Mahler
Hi Hendrik, > Is it normal that the reserved resources are only available a bit after > the task ended? Yes, that's normal since we don't block the forwarding of the terminal status update behind the allocation of the freed resources. Since the latter can take some time, we opt to forward the

Re: tag disks resources

2017-02-07 Thread Benjamin Mahler
For GPUs there have been requests to expose the hardware and topology information in a first class way, so that schedulers can consume it consistently. Use cases have been: handling heterogeneous GPU hardware, topology aware scheduling (critical for GPUs given NVLink vs PCI vs QPI communication

Re: Welcome Neil Conway as Mesos Committer and PMC member!

2017-01-22 Thread Benjamin Mahler
Congrats and welcome! On Fri, Jan 20, 2017 at 11:03 PM, Vinod Kone wrote: > Hi folks, > > Please welcome Neil Conway as the newest committer and PMC member of the > Apache Mesos project. > > Neil has been an active contributor to Mesos for more than a year now. As > part

[Design Doc] [RFC] Hierarchical Roles

2017-01-19 Thread Benjamin Mahler
As promised when publishing the multi-role framework design doc, here is the design doc for hierarchical roles. Design Doc: https://docs.google.com/document/d/1Ie2-6O400ayNXtRqipHq6_CCQ4wOoLWzoqql3b0Y6HU/edit?usp=sharing JIRA Epic: https://issues.apache.org/jira/browse/MESOS-6375 Take a look

Re: 答复: Optimize libprocess performance

2017-01-05 Thread Benjamin Mahler
the network is not the > bottleneck, so the RPC layer is too heavy. > > -邮件原件- > 发件人: Benjamin Mahler [mailto:bmah...@apache.org] > 发送时间: 2017年1月5日 9:26 > 收件人: dev > 抄送: user@mesos.apache.org > 主题: Re: Optimize libprocess performance > > Which area

Re: Optimize libprocess performance

2017-01-04 Thread Benjamin Mahler
Which areas does the performance not meet your needs? There are a lot of aspects to libprocess that can be optimized, so it would be good to focus on each of your particular use cases via benchmarks, this allows us to have a shared way to profile and measure improvements. Copy elimination is one

Re: Multi-agent machine

2016-12-09 Thread Benjamin Mahler
Maintenance should work in this case, it will just be applied to all agents on the machine. On Fri, Dec 9, 2016 at 1:20 PM, Charles Allen wrote: > Thanks for the insight. > > I take that to mean the maintenance primitives might not work right for > multi-agent

Re: [VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-12-01 Thread Benjamin Mahler
+1 (binding) On Wed, Nov 30, 2016 at 2:53 PM, Greg Mann wrote: > +1 (non-binding) > > Did `sudo make check` on CentOS 7. Aside from several > LinuxFilesystemIsolatorTests and two other flaky > tests, CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_ > DestroyTracedProcess >

Re: outstanding offers

2016-11-03 Thread Benjamin Mahler
Yes, if you re-register with the master, this will invalidate all outstanding offers. On Mon, Oct 31, 2016 at 2:28 PM, Hendrik Haddorp wrote: > Right, I have written my own scheduler and sometimes end up in a state > that Mesos believes that there are outstanding offers

Re: Features missing in the webui

2016-10-21 Thread Benjamin Mahler
to the UI. > Maintenance of nodes will be presented in a table. Code is on github but > need some tweaks and tests with large maintenance json. I'll prepare patch > shortly (when github DDoS will be over). > https://issues.apache.org/jira/browse/MESOS-6443 > > pt., 21.10.2016 o 20:30 użytk

Features missing in the webui

2016-10-21 Thread Benjamin Mahler
When adding features we try to ensure the webui is updated accordingly. However, there have been a few gaps where the webui has not been updated to reflect the addition of functionality. I filed the following epic to collect gaps in functionality: https://issues.apache.org/jira/browse/MESOS-6440

Re: Web UI no longer shows Tasks information

2016-09-28 Thread Benjamin Mahler
Thanks for reporting this Rodrick, do you see any errors in your browser's console? On Tue, Sep 27, 2016 at 4:29 AM, Rodrick Brown wrote: > > On Sep 27, 2016, at 3:43 AM, haosdent wrote: > > Hi, @Rodrick > > >"master/frameworks_connected": 0, > > Is

Re: Non-existing jobs being stuck in over capacity

2016-09-20 Thread Benjamin Mahler
You may get better help from the Marathon team: https://github.com/mesosphere/marathon#help On Mon, Sep 19, 2016 at 11:49 PM, Cecile, Adam wrote: > Hello Guys, > > We are sometime experiencing weird behavior between Mesos and Marathon. > Some jobs that does not seem to

Re: Master pailer failure 0.28.2

2016-08-16 Thread Benjamin Mahler
Hard to interpret the error message, it looks like it's pointing to our $scope variables 'offered_cpus' and 'idle_cpus'. Is the error consistent? When you say you get this error with the pailer, what does that mean? You see this in the pailer window? In your browser console after you click on the

Re: Programmatically retrieve stdout/stderr from a node

2016-08-12 Thread Benjamin Mahler
Also I believe the CLI work that Haris / Kevin have been doing would make this easy to do via the Mesos CLI (it's not integrated into the project yet). On Wed, Aug 10, 2016 at 9:57 AM, Erik Weathers wrote: > Just for completeness and to provide an alternative, you can

Re: 1.0.1 release

2016-08-09 Thread Benjamin Mahler
All of the issues I've been shepherding have been fixed. The only one I see remaining is this one, but doesn't look like a blocking issue: https://issues.apache.org/jira/browse/MESOS-5985 Anything else that needs to go in? On Mon, Aug 1, 2016 at 4:19 PM, Vinod Kone wrote:

Re: Attributes cause agent to fail

2016-07-29 Thread Benjamin Mahler
Unfortunately we log termination messages to stderr rather than the logging files. Can you show stderr? I suspect we're printing the exit message there. See: https://issues.apache.org/jira/browse/MESOS-5854 On Fri, Jul 29, 2016 at 5:57 PM, Douglas Nelson wrote: > It might

Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Benjamin Mahler
+1 (binding) OS X 10.11.6 ./configure --disable-python --disable-java make check On Tue, Jul 26, 2016 at 10:24 AM, Greg Mann wrote: > +1 (non-binding) > > * Ran `sudo make distcheck` successfully on CentOS 7.1 with only one test > failure: ExamplesTest.PythonFramework fails

Using Mesos?

2016-06-30 Thread Benjamin Mahler
Just a reminder. If you're using Mesos and want to be featured in our list of users, send a PR to get your organization added: https://github.com/apache/mesos/blob/master/docs/powered-by-mesos.md If you've built a framework, and would like it featured in our list of frameworks, send a PR to get

Re: [VOTE] Release Apache Mesos 0.26.2 (rc1)

2016-06-22 Thread Benjamin Mahler
+1 (binding) Make check on OS X 10.11.5. On Mon, Jun 20, 2016 at 5:10 PM, Kapil Arya wrote: > +1 (binding) Internal CI build. > > Here is a link to the deb/rpm packages: > > http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-0.26.2-rc1 > > > > On Mon, Jun 20, 2016

GPU Support: Library Injection

2016-06-21 Thread Benjamin Mahler
Moving this to a new thread (see some context below). It may be worth exploring adding a generic mechanism for doing label-based injection of volumes: if a container is tagged with a particular label, we will inject a particular volume into the container. For Nvidia GPU containers, the operator

Re: [Compatibility] More strict parsing of ranges, e.g. port of resources

2016-06-15 Thread Benjamin Mahler
Sounds OK to me if there are no objections, since it should not be a difficult adjustment for users to make and users can use the more expressive JSON format for resources already. (e.g. https://github.com/dcos/dcos/blob/1.7-open/gen/dcos-config.yaml#L95) Also, please document this in the

Re: Flat Protobuf hierarchy

2016-06-15 Thread Benjamin Mahler
Is this the right project? https://github.com/tomas-abrahamsson/gpb If so, it seems to support package namespacing, just needs to be enabled: "Gpb can optionally make use of the package attribute by prepending the name of the package to every contained message type (if defined), which is useful

Re: Master configuration in the registry

2016-06-13 Thread Benjamin Mahler
> is largely immutable. > > > > Another distinction is that some configuration flags control behavior > > that doesn't need to be consistent between master replicas (e.g., > > "--ip", "--port", "--advertise-ip", "--advertise-port"

Re: Welcome Anand and Joseph as new committers!

2016-06-10 Thread Benjamin Mahler
Welcome Anand and Joseph, thanks for all of your contributions! Looking forward to seeing your ongoing positive influences on the community and the project, let's build great software! On Thu, Jun 9, 2016 at 2:00 PM, Vinod Kone wrote: > Hi folks, > > I'm happy to announce

Re: Consequences of health-check timeouts?

2016-06-06 Thread Benjamin Mahler
I'll make sure this gets fixed for 1.0. Apologies for the pain, it looks like there is a significant amount of debt in the docker containerizer / executor. On Wed, May 18, 2016 at 10:51 AM, Steven Schlansker < sschlans...@opentable.com> wrote: > > > On May 18, 2016, at 10:44 AM, haosdent

Re: Mesos 0.24.1 on Raspberry Pi 3

2016-06-06 Thread Benjamin Mahler
Cool stuff Andrew, thanks for sharing! On Thu, Jun 2, 2016 at 11:50 AM, Andrew Spyker wrote: > FYI, based on the work others have done in the past, Netflix was able to > get Mesos agent building and running on Raspberry Pi natively and under > Docker containers.

Re: Status of MESOS-2533?

2016-05-04 Thread Benjamin Mahler
+AlexR On Mon, May 2, 2016 at 2:31 PM, Jeff Schroeder wrote: > Some frameworks like Aurora use custom executors to distribute the > healthchecks with the tasks. This allows the task to survive a network > partition without the scheduler setting it to TASK_LOST. > >

Re: stable remote branches

2016-04-25 Thread Benjamin Mahler
+user as an FYI Going forward we'll push directly to these branches as backport decisions are made. Since 0.28.x, 0.27.x, and 0.26.x have just been created, here is what was already marked for these versions, that we'll have to cherry-pick: The following need to be cherry-picked for 0.28.2:

Re: Question on slave recovery

2016-04-06 Thread Benjamin Mahler
guration options, I can see that there are two > options --strict and --recover but their defaults looks good. > > On Fri, Apr 1, 2016 at 2:40 AM, Benjamin Mahler <bmah...@apache.org> > wrote: > >> I'd recommend not using /tmp to store the meta-information because if

Re: [VOTE] Release Apache Mesos 0.25.1 (rc4)

2016-04-06 Thread Benjamin Mahler
+1 (binding) The following passes on OS X: $ ./configure CC=clang CXX=clang++ --disable-python --disable-java $ make check On Tue, Apr 5, 2016 at 11:41 PM, Michael Park wrote: > s/No changes from rc4/No changes from rc3/ > s/New fixes in rc5/New fixes in rc4/ > > On 5 April

Re: [VOTE] Release Apache Mesos 0.24.2 (rc5)

2016-04-06 Thread Benjamin Mahler
+1 (binding) The following passes on OS X: $ ./configure CC=clang CXX=clang++ --disable-python --disable-java $ make check On Tue, Apr 5, 2016 at 10:51 PM, Michael Park wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 0.24.2. > > > 0.24.2

Re: Question on slave recovery

2016-03-31 Thread Benjamin Mahler
I'd recommend not using /tmp to store the meta-information because if there is a tmpwatch it will remove things that we need for agent recovery. We probably should change the default --work_dir, or require that the user specify one. It's expected that wiping the work directory will cause the

Re: [VOTE] Release Apache Mesos 0.26.1 (rc3)

2016-03-31 Thread Benjamin Mahler
make check fails on OS X. Looks like we're missing the following: commit 363b0b059bdc7742b2258a33ebfe430fd03f4311 Author: Kapil Arya Date: Mon Jan 25 00:41:17 2016 -0500 Fixed non-linux build involving glog drop_log_memory flag. The variable

Re: [VOTE] Release Apache Mesos 0.25.1 (rc3)

2016-03-31 Thread Benjamin Mahler
make check fails on OS X. Looks like we're missing the following: commit 363b0b059bdc7742b2258a33ebfe430fd03f4311 Author: Kapil Arya Date: Mon Jan 25 00:41:17 2016 -0500 Fixed non-linux build involving glog drop_log_memory flag. The variable

Re: [VOTE] Release Apache Mesos 0.24.2 (rc4)

2016-03-31 Thread Benjamin Mahler
I'm seeing the following on OS X for the three RCs that were sent out: $ ./configure CC=clang CXX=clang++ --disable-python --disable-java ... $ make check -j7 ... ./mesos-tests dyld: Symbol not found: __ZN3fLB21FLAGS_drop_log_memoryE Referenced from:

Re: [VOTE] Release Apache Mesos 0.26.1 (rc2)

2016-03-23 Thread Benjamin Mahler
Also, I tagged https://issues.apache.org/jira/browse/MESOS-5021 with a fix version of 0.26.1. Can you include it? On Mon, Mar 21, 2016 at 1:59 PM, Benjamin Mahler <bmah...@apache.org> wrote: > Yes it has existed for a long time but has only been discovered recently. >

Re: [VOTE] Release Apache Mesos 0.25.1 (rc2)

2016-03-23 Thread Benjamin Mahler
Also, I tagged https://issues.apache.org/jira/browse/MESOS-5021 with a fix version of 0.25.1. Can you include it? On Sat, Mar 19, 2016 at 6:33 AM, Michael Park wrote: > As there are insufficient votes on this rc along with a request > from Evan Krall to include additional

Re: [VOTE] Release Apache Mesos 0.24.2 (rc2)

2016-03-23 Thread Benjamin Mahler
Also, I tagged https://issues.apache.org/jira/browse/MESOS-5021 with a fix version of 0.24.2. Can you include it? On Sat, Mar 19, 2016 at 6:30 AM, Michael Park wrote: > As there are insufficient votes on this rc along with a request > from Evan Krall to include additional

Re: 0.28.1

2016-03-23 Thread Benjamin Mahler
Thanks Jie, I've added a fix version of 0.28.1 to: https://issues.apache.org/jira/browse/MESOS-5021 On Fri, Mar 18, 2016 at 5:52 PM, Jie Yu wrote: > Hi, > > We recently noticed two bugs >

Re: [VOTE] Release Apache Mesos 0.26.1 (rc2)

2016-03-21 Thread Benjamin Mahler
a little > as to what the consequences are? > > Thanks! > > MPark > > On 18 March 2016 at 16:20, Benjamin Mahler <bmah...@apache.org> wrote: > >> These are captured under: >> https://issues.apache.org/jira/browse/MESOS-4979 >> >> On

Re: [VOTE] Release Apache Mesos 0.26.1 (rc2)

2016-03-19 Thread Benjamin Mahler
These are captured under: https://issues.apache.org/jira/browse/MESOS-4979 On Thu, Mar 17, 2016 at 5:04 PM, Benjamin Mahler <bmah...@apache.org> wrote: > Thanks for the hard work! Do we need to backport the rmdir fixes on the > outstanding release candidates

Re: [VOTE] Release Apache Mesos 0.24.2 (rc2)

2016-03-18 Thread Benjamin Mahler
+michael who is managing the release, he'll get back to you shortly, apologies for the delay! On Fri, Mar 11, 2016 at 11:35 AM, Evan Krall wrote: > I humbly request that the fixes for these issues are also included in > 0.24.2: > >

Re: How to kill tasks when memory exceeds the cgroup limit?

2016-03-18 Thread Benjamin Mahler
Interesting, why does it take down the slaves? Because a lot of organizations run with swap disabled (e.g. for more deterministic performance), we originally did not set the swap limit at all. When we introduced the '--cgroups_limit_swap' flag we had to make it default to false initially in case

Re: Re: Mesos 0.25 not increasing Staged/Started counters in the UI

2016-03-07 Thread Benjamin Mahler
Non-terminal states are gauges (instantaneous measurements) whereas the terminal states are counters (always increasing, at least for the lifetime of a master process). Hopefully this image doesn't get stripped, but we improved the wording here to clarify which are gauges and which are counters:
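The gauge/counter distinction above can be illustrated with a small sketch. It is an illustration only, not the master's actual metrics code: `TaskMetrics` and `transition` are hypothetical names, and the set of non-terminal states is limited to three for brevity.

```python
from collections import Counter

# Non-terminal task states; any other state is treated as terminal here.
NON_TERMINAL = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING"}


class TaskMetrics:
    """Hypothetical tracker: gauges for non-terminal, counters for terminal."""

    def __init__(self):
        self.gauges = Counter()    # instantaneous number of tasks per state
        self.counters = Counter()  # only ever increases while this object lives
        self._current = {}         # task_id -> current non-terminal state

    def transition(self, task_id, new_state):
        old_state = self._current.get(task_id)
        if old_state in NON_TERMINAL:
            # The task left its previous state, so that gauge goes down.
            self.gauges[old_state] -= 1
        if new_state in NON_TERMINAL:
            self.gauges[new_state] += 1
            self._current[task_id] = new_state
        else:
            # Terminal states are counted once and never decremented.
            self.counters[new_state] += 1
            self._current.pop(task_id, None)
```

Under this model a task that has already moved from staging to running no longer contributes to the staging gauge, which is why such a value can read zero even though tasks have passed through that state.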

Re: mesos agent not recovering after ZK init failure

2016-03-07 Thread Benjamin Mahler
> I don't have the exit status. We haven't seen a repeat yet, will catch the > exit status next time it happens. > > Yes, removing the metadata directory was the only way it was resolved. > This happened on multiple hosts requiring the same resolution. > > > On Thu, Feb 25, 20

Re: mesos agent not recovering after ZK init failure

2016-02-25 Thread Benjamin Mahler
the >> detector.cpp:481 log line. >> -The agents that continue to flap repaired with manual removal of >> contents in mesos-slave's working dir >> >> >> >> On Wed, Feb 10, 2016 at 9:43 AM, Benjamin Mahler <bmah...@apache.org> >> wrote: >>

Re: mesos agent not recovering after ZK init failure

2016-02-10 Thread Benjamin Mahler
Hey Sharma, I didn't quite follow the timeline of events here or how the agent logs you posted fit into the timeline of events. Here's how I interpreted: -Agent running fine with 0.24.1 -Transient ZK issues, slave flapping with zookeeper_init failure -ZK issue resolved -Most agents stop flapping

Re: Endpoint documentation is now published!

2016-02-08 Thread Benjamin Mahler
could try some api generators like http://swagger.io/ or > https://github.com/apidoc/apidoc > > On Tue, Feb 9, 2016 at 1:10 AM, Benjamin Mahler <bmah...@apache.org> > wrote: > >> We now have endpoint documentation published on the website: >> >> http://mesos.apa

Endpoint documentation is now published!

2016-02-08 Thread Benjamin Mahler
We now have endpoint documentation published on the website: http://mesos.apache.org/documentation/latest/endpoints/ https://issues.apache.org/jira/browse/MESOS-3831 A big thank you goes out to Kevin Klues who made this happen, thanks also goes out to Neil Conway for making the suggestion! Our

Re: [RESULT][VOTE] Release Apache Mesos 0.27.0 (rc2)

2016-02-03 Thread Benjamin Mahler
Great! Is a blog post on the way? On Sun, Jan 31, 2016 at 5:39 PM, Michael Park wrote: > Hi all, > > The vote for Mesos 0.27.0 (rc2) has passed with the > following votes. > > +1 (Binding) > -- > Vinod Kone > Joris Van Remoortere > Till Toenshoff >

Design Doc: Initial Support for GPGPU Resources

2016-02-02 Thread Benjamin Mahler
Hi folks, On behalf of the GPU working group [1] I'd like to share a design doc for adding some initial support for GPU resources in Mesos: JIRA Epic: https://issues.apache.org/jira/browse/MESOS-4424 Design Doc:

Re: mesos 0.23, long term quering state.json data.

2016-02-01 Thread Benjamin Mahler
It's unlikely that a single response took 5 minutes for the master to generate. It's more likely that the master was backlogged and it took the majority of the 5 minutes for the backlog to be processed. For example, if you have a number of webui instances open, they will each be polling

Re: Mesos sometimes not allocating the entire cluster

2016-01-29 Thread Benjamin Mahler
Hi Tom, I suspect you may be tripping the following issue: https://issues.apache.org/jira/browse/MESOS-4302 Please have a read through this and see if it applies here. You may also be able to apply the fix to your cluster to see if that helps things. Ben On Wed, Jan 20, 2016 at 10:19 AM, Tom

Re: Get Task's labels on reconciliation

2016-01-29 Thread Benjamin Mahler
I see that the following was filed: https://issues.apache.org/jira/browse/MESOS-4477 But this sounds like a bug: if the master knows about the task during reconciliation, labels should be sent. For example we had this same bug for health information:

Re: Tasks failing when restarting slave on Mesos 0.23.1

2016-01-19 Thread Benjamin Mahler
From the slave (now known as agent) logs: I0114 14:09:51.297840 23049 slave.cpp:3967] Sending reconnect request to executor thermos-1452181970177-USER-prod-JOB_NAME-0-99a16851-42d6-4a52-b768-359b4f499ff3 of framework 20150930-134812-84017418-5050-29407-0001 at executor(1)@NET.10:57730 I0114

Re: Share GPU resources via attributes or as custom resources (INTERNAL)

2016-01-19 Thread Benjamin Mahler
>> isolation). Our initial proposal is not exposing details of GPU but >> subsequently more detail of GPU resources like (topology, memory, core, >> bandwidth etc.) will be exposed to do better job scheduling. >> >> >> >> As Ben indicated very soon we will se

Re: Share GPU resources via attributes or as custom resources (INTERNAL)

2016-01-16 Thread Benjamin Mahler
There is a design proposal coming that will include guidance around using GPUs and better GPU support in mesos, so stay tuned. Mesos supports adding arbitrary resources, e.g. --resources=cpus(*):4;gpus(*):4 Mesos will then manage a scalar "gpu" resource with a value of 4. This means "gpu"
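To make the shape of the `--resources` value in the message above concrete, here is a rough sketch that splits such a string into its parts. It is a simplification for illustration, not Mesos's own parser: `parse_scalar_resources` is a hypothetical helper and it assumes only scalar entries of the form `name(role):value`, ignoring ranges, sets and the JSON format.

```python
import re

# One scalar entry such as "cpus(*):4" -> name, role, numeric value.
_SCALAR = re.compile(r"^(\w+)\(([^)]*)\):(\d+(?:\.\d+)?)$")


def parse_scalar_resources(text):
    """Parse 'name(role):value;...' into a list of (name, role, value)."""
    resources = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        match = _SCALAR.match(item)
        if match is None:
            raise ValueError("unsupported resource entry: %r" % item)
        name, role, value = match.groups()
        resources.append((name, role, float(value)))
    return resources
```

Calling `parse_scalar_resources("cpus(*):4;gpus(*):4")` yields one entry for `cpus` and one for `gpus`, each with role `*` and a value of 4.0. Note that on a shell command line the value would normally need quoting, since `;`, `(` and `*` are special to most shells.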

Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-10 Thread Benjamin Mahler
ce > 0.20 AFAIK. > - There is a simple workaround. > > Bernd > > On Dec 10, 2015, at 3:05 AM, Benjamin Mahler <benjamin.mah...@gmail.com> > wrote: > > I'd really like to pull in the fix for: > https://issues.apache.org/jira/browse/MESOS-4106 > > This has been a long s

Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-10 Thread Benjamin Mahler
, Dec 10, 2015 at 11:22 AM, Benjamin Mahler <benjamin.mah...@gmail.com > wrote: > What is the workaround? > > On Thu, Dec 10, 2015 at 4:37 AM, Bernd Mathiske <be...@mesosphere.io> > wrote: > >> I think that whereas this would clearly be a desirable bug fi

Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-09 Thread Benjamin Mahler
I'd really like to pull in the fix for: https://issues.apache.org/jira/browse/MESOS-4106 This has been a long standing bug that makes the health checking not function correctly some of the time. While it is rare in CI, it appeared in a colleague's cluster for about a third of the tasks he was

Re: GenOuest makes use of Mesos

2015-12-08 Thread Benjamin Mahler
Great to hear Olivier, would you like to be added to the powered by mesos list? https://github.com/apache/mesos/blob/master/docs/powered-by-mesos.md On Tue, Dec 8, 2015 at 1:04 PM, Arunabha Ghosh wrote: > Welcome to the community, Oliver. > > On Tue, Dec 8, 2015 at 5:49
