Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-04-03 Thread Benjamin Mahler
Just an update here, in late January we finished upstreaming our internal patches that were upstreamable. This amounted to 35 patches. The cgroups v2 work is ongoing, we're hoping to be mostly code complete by the end of this month. On Fri, Jan 12, 2024 at 6:01 PM Benjamin Mahler

Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-01-16 Thread Benjamin Mahler

Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-01-12 Thread Benjamin Mahler
+user@ On Fri, Jan 12, 2024 at 5:55 PM Benjamin Mahler wrote: > As part of upgrading to CentOS 9 at X/Twitter, Shatil / Devin (cc'ed) will > be working on: > > * Upgrading to Python 3 > * Cgroups v2 support > > We will attempt to upstream this work for the bene

Re: Next steps for Mesos

2023-03-20 Thread Benjamin Mahler
Also if you are still a user of mesos, please chime in. Qian, it might be worth having a more explicit email asking users to chime in as this email was tailored more for contributors. Twitter is still using mesos heavily, we upgraded from a branch based off of 1.2.x to 1.9.x in 2021, but haven't u

Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Benjamin Mahler
+1 (binding) Thanks to all who contributed to the project. On Mon, Apr 5, 2021 at 1:58 PM Vinod Kone wrote: > Hi folks, > > Based on the recent conversations > < > https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cuser.mesos.apache.org%3E > > >

Re: Feature requests for Mesos

2021-03-08 Thread Benjamin Mahler
I think the key issues have been brought up by Benjamin and Renan. Just to add to Benjamin's comments above, achieving those key markers of a healthy project requires serious corporate backing such that people are being employed to primarily work on Mesos. It takes a lot of work to keep the projec

Re: Slow communications between components

2020-11-08 Thread Benjamin Mahler
Which version? I'm not sure what you're observing but slower responses are usually due to backlogging from expensive requests (like /state), however we made several changes that have made it much less of a potential problem (see the blog posts). How much CPU is the master consuming? What kind of l

Re: Changing logging timestamp

2020-09-21 Thread Benjamin Mahler
Mesos uses Google's glog library for logging: https://github.com/google/glog I believe it just prints the local time. You can see that it's produced by a call to gettimeofday(&tv, NULL) and localtime_r (not gmtime_r): https://github.com/google/glog/blob/v0.4.0/src/logging.cc#L1267 https://github.
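The difference between the two calls named above can be sketched in Python, whose time.localtime and time.gmtime wrap the same C library conversions. The helper name glog_style_prefix is hypothetical, and the "mmdd hh:mm:ss.uuuuuu" layout is an assumption about glog's line prefix, not code taken from glog:

```python
import time


def glog_style_prefix(epoch_seconds, use_utc=False):
    """Format a timestamp as 'mmdd hh:mm:ss.uuuuuu' (assumed glog-like layout).

    use_utc=False mirrors localtime_r (local wall-clock time);
    use_utc=True mirrors gmtime_r (UTC).
    """
    whole = int(epoch_seconds)
    micros = int((epoch_seconds - whole) * 1_000_000)
    to_struct = time.gmtime if use_utc else time.localtime
    tm = to_struct(whole)
    return "%02d%02d %02d:%02d:%02d.%06d" % (
        tm.tm_mon, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec, micros)


if __name__ == "__main__":
    now = time.time()
    print("local:", glog_style_prefix(now))
    print("utc:  ", glog_style_prefix(now, use_utc=True))
```

On a host whose timezone is not UTC the two printed lines differ by the UTC offset, which is the effect being asked about in the thread.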

Re: No offers are being made -- how to debug Mesos?

2020-06-06 Thread Benjamin Mahler
Don't worry about that "Ignoring" message on the agent. When the framework information is updated, the master broadcasts it to the agents, and in this case the agent doesn't know about the framework since it has no tasks for it, and so it ignores the updated information. I can't quite tell from th

Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-27 Thread Benjamin Mahler
+1 (binding) On Mon, May 18, 2020 at 4:36 PM Andrei Sekretenko wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.10.0. > > 1.10.0 includes the following major improvements: > >

Re: [VOTE] Release Apache Mesos 1.7.3 (rc1)

2020-05-07 Thread Benjamin Mahler
+1 (binding) On Mon, May 4, 2020 at 1:48 PM Greg Mann wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.7.3. > > The CHANGELOG for the release is available at: > > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.3-rc1 > >

Re: Found no roles suitable for revive repetition.

2020-03-19 Thread Benjamin Mahler
on? > > Thanks, > Marc > > > > -Original Message- > From: Benjamin Mahler [mailto:bmah...@apache.org] > Sent: 18 March 2020 18:32 > To: user > Subject: Re: Found no roles suitable for revive repetition. > > Hi Marc, can you contact the marathon mailing

Re: registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime.

2020-03-18 Thread Benjamin Mahler
Same here, please reach out to marathon support channels and include additional context. On Wed, Mar 18, 2020 at 12:27 PM Marc Roos wrote: > > I am having these, has been reported already on Jira long time ago. How > to fix these? > > > > der mesosphere.marathon.api.v2.PodsResource will be ignor

Re: Found no roles suitable for revive repetition.

2020-03-18 Thread Benjamin Mahler
Hi Marc, can you contact the marathon mailing list or slack channel. Also, if there is a question here or some more context, please include that so they know what you need help with. On Wed, Mar 18, 2020 at 9:46 AM Marc Roos wrote: > > > Marathon is stuck on 'loading applications' > > > Mar 18

Re: Kill task, but not restarted

2020-02-03 Thread Benjamin Mahler
There's not enough information to understand the situation. How did you kill the task? Did the task get correctly marked as killed? Did the killed notification get correctly acknowledged? On Sun, Feb 2, 2020 at 9:04 AM Marc Roos wrote: > > > > Because the instance was not showing in the maratho

Welcome Andrei Sekretenko as a new committer and PMC member!

2020-01-21 Thread Benjamin Mahler
Please join me in welcoming Andrei Sekretenko as the newest committer and PMC member! Andrei has been active in the project for almost a year at this point and has been a productive and collaborative member of the community. He has helped out a lot with allocator work, both with code and investig

Re: Task Pinning

2019-10-22 Thread Benjamin Mahler
It's easier to do something custom for your own needs than to bring generic support into the project. For example, in kubernetes, as far as I can tell they offer two modes for the agent: "static" (i.e. pinning for integer requests) and "none" (regular shares / limit model). https://kubernetes.io/d

Re: large task scheduling on multi-framework cluster

2019-10-01 Thread Benjamin Mahler
Note that with the newest marathon that is capable of handling multiple roles, you would not need to run a dedicated marathon instance. On Tue, Oct 1, 2019 at 8:17 AM Grégoire Seux wrote: > Hello, > > I'm wondering how other mesos users deal with scheduling of large tasks > (using all resources

Re: reservations from terminated frameworks

2019-09-30 Thread Benjamin Mahler
Hi Hendrik, currently reservations are tied to a role, not framework. In this case, it's a static reservation which means you need to update the agent configuration and restart it destructively (we don't currently support a non-destructive non-additive agent resources change). If it was a dynamic r

Re: Attach Shared Volume to new tasks

2019-09-26 Thread Benjamin Mahler
Can you show the full resource information from the offer? On Tue, Sep 10, 2019 at 6:50 AM Harold Molina-Bulla wrote: > Hi everybody, > > We are implementing a Scheduler for Mesos in python, and we need to attach > a preconfigured shared volume to a new Task. The Shared volume is now > offered b

Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Benjamin Mahler
> We upgraded the version of the bundled boost very late in the release cycle Did we? We still bundle boost 1.65.0, just like we did during 1.8.x. We just adjusted our special stripped bundle to include additional headers. On Tue, Aug 27, 2019 at 1:39 PM Vinod Kone wrote: > -1 > > We upgraded t

[Performance / Resource Management WG] August Update

2019-08-21 Thread Benjamin Mahler
Can't make today's meeting, so sending out some notes: On the performance front: - Long Fei reported a slow master, and perf data indicates a lot of time is spent handling executor churn, this can be easily improved: https://issues.apache.org/jira/browse/MESOS-9948 On the resource management fro

Re: Mesos 1.9.0 release

2019-08-13 Thread Benjamin Mahler
Thanks for taking this on Qian! I seem to be unable to view the dashboard. Also, when are we aiming to make the cut? On Tue, Aug 13, 2019 at 10:58 PM Qian Zhang wrote: > Folks, > > It is time for Mesos 1.9.0 release and I am the release manager. Here is > the dashboard: > https://issues.apache.

Re: Should mesos 1.8 (and marathon 1.8) drain/migrate tasks or not?

2019-08-13 Thread Benjamin Mahler
(had to join the marathon-framework group to post to it, re-sending) On Tue, Aug 13, 2019 at 1:26 PM Benjamin Mahler wrote: > > I know DRAIN_AGENT is only for mesos 1.9. But what use it to post a > > maintenance schedule, see the node being marked as draining, and nothing > >

Re: Should mesos 1.8 (and marathon 1.8) drain/migrate tasks or not?

2019-08-13 Thread Benjamin Mahler
> I know DRAIN_AGENT is only for mesos 1.9. But what use it to post a > maintenance schedule, see the node being marked as draining, and nothing > happens with the tasks? The maintenance schedules require that schedulers implement support for them. Nothing happens if the scheduler does not have su

Re: Mesos-dns srv weight

2019-08-01 Thread Benjamin Mahler
Please seek support through the mesos DNS channels: https://github.com/mesosphere/mesos-dns#contact On Fri, Jul 26, 2019 at 9:50 AM Marc Roos wrote: > > Is it possible to configure a task with srv record weight? > > > [@ mesos-cni]# dig +short @192.168.10.151 _webchat._tcp.marathon.mesos > SRV >

[Performance / Resource Management WG] July Update

2019-07-17 Thread Benjamin Mahler
On the resource management front, Meng Zhu, Andrei Sekretenko, and myself have been working on quota limits and enhancing multi-role framework support: - A memory leak in the allocator was fixed: MESOS-9852 - Support for quota limits work is well underway, and at this point the major pieces are t

Re: Design doc: Agent draining and deprecation of maintenance primitives

2019-06-06 Thread Benjamin Mahler
> With the new proposal, it's going to be as difficult as before to have SLA-aware maintenances because it will need cooperation from the frameworks anyway and we know this is rarely a priority for them. We will also lose the ability to signal future maintenance in order to optimize allocations. P

[Performance / Resource management WG] Notes in lieu of tomorrow's meeting

2019-05-13 Thread Benjamin Mahler
I'm out of the country and so I'm sending out notes in lieu of tomorrow's performance / resource management meeting. Resource Management: - Work is underway for adding the UPDATE_FRAMEWORK scheduler::Call. - Some fixes and small performance improvements landed for the random sorter. - Perf data f

Re: Slack upgrade to Standard plan. Thanks Criteo

2019-04-25 Thread Benjamin Mahler
Thank you Criteo! On Tue, Apr 23, 2019 at 1:12 PM Vinod Kone wrote: > Hi folks, > > As you probably realized today, we got our Slack upgraded from "free" plan > to "standard" plan, which allows us to have unlimited message history and > better analytics among other things! This would be great fo

Performance / Resource Management Update

2019-04-17 Thread Benjamin Mahler
In lieu of today's meeting, this is an email update: The 1.8 release process is underway, and it includes a few performance related changes: - Parallel reads for the v0 API have been extended to all other v0 read only endpoints (e.g. /state-summary, /roles, etc). Whereas in 1.7.0, only /state had

Re: Subject: [VOTE] Release Apache Mesos 1.8.0 (rc1)

2019-04-15 Thread Benjamin Mahler
The CHANGELOG highlights seem a bit lacking? - For some reason, the task CLI command is listed in a performance section? - The parallel endpoint serving changes are in the longer list of items, seems like we highlight them in the performance section? Maybe we could be specific too about what we di

Re: Mesos Master Crashes when Task launched with LAUNCH_GROUP fails

2019-03-01 Thread Benjamin Mahler
For posterity: https://issues.apache.org/jira/browse/MESOS-9619 On Thu, Feb 28, 2019 at 6:02 PM Meng Zhu wrote: > Hi Nimi: > > Thanks for reporting this. > > From the log snippet, looks like, when de-allocating resources, the agent > does not have the port resources that is supposed to have been

Re: Enabling framework authentication Loaded deprecated flag 'authenticate'

2019-02-15 Thread Benjamin Mahler
The --authenticate master flag has been renamed: https://github.com/apache/mesos/blob/1.7.1/src/master/flags.cpp#L221 So yes, the documentation you linked to needs an update. On Fri, Feb 15, 2019 at 12:26 PM Marc Roos wrote: > > [@]# cat /etc/mesos-master/authenticate > true > > Is this page o

Re: centos7/el7 newer marathon rpms

2019-02-12 Thread Benjamin Mahler
Hi Marc, You can reach out to the marathon community to get this question answered: https://mesosphere.github.io/marathon/support.html Ben On Fri, Feb 8, 2019 at 6:33 PM Marc Roos wrote: > > Where can I get newer marathon rpms, currently I am getting them from > here > > > > http://repos.mesos

Re: Check failed: reservationScalarQuantities.contains(role)

2019-02-06 Thread Benjamin Mahler
Thanks for reporting this, we can help investigate this with you in JIRA. On Tue, Feb 5, 2019 at 5:40 PM Jeff Pollard wrote: > Thanks for the info. I did find the "Removed agent" line as you suspected, > but not much else in logging looked promising. I opened a JIRA to track > from here on out h

Re: Welcome Benno Evers as committer and PMC member!

2019-01-30 Thread Benjamin Mahler
Welcome Benno! Thanks for all the great contributions On Wed, Jan 30, 2019 at 6:21 PM Alex R wrote: > Folks, > > Please welcome Benno Evers as an Apache committer and PMC member of the > Apache Mesos! > > Benno has been active in the project for more than a year now and has made > significant co

Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.14.2 $ clang++ --version Apple LLVM version 10.0.0 (clang-1000.10.44.4) Target: x86_64-apple-darwin18.2.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin $ ./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarat

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-10 Thread Benjamin Mahler
I think we're agreed: -There are no schedulers modeling the existing per-agent time-based filters that mesos is tracking, and we shouldn't go in a direction that encourages frameworks to try to model and manage these. So, we should be very careful in considering something like CLEAR_FILTERS. W

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-05 Thread Benjamin Mahler
Thanks for bringing REQUEST_RESOURCES up for discussion, it's one of the mechanisms that we've been considering for further scaling pessimistic offers before we make the migration to optimistic offers. It's also been referred to as "demand" rather than "request", but for the sake of this discussion

Re: [API WG] Proposals for dealing with master subscriber leaks.

2018-11-11 Thread Benjamin Mahler
>- We can add heartbeats to the SUBSCRIBE call. > This would need to be > part of a separate operator Call, because one platform (browsers) that > might subscribe to the master does not support two-way streaming. This doesn't make sense to me, the heartbeats should still be part of the same c

Re: Rhythm - time-based job scheduler

2018-11-02 Thread Benjamin Mahler
Thanks for sharing Michał! could you tell us how you (or your employer) are using it? On Tue, Oct 30, 2018 at 10:34 AM Michał Łowicki wrote: > Hey! > > I would like to announce project I've been working on recently - > https://github.com/mlowicki/rhythm. It's a Cron-like scheduler with > couple

Welcome Meng Zhu as PMC member and committer!

2018-10-31 Thread Benjamin Mahler
Please join me in welcoming Meng Zhu as a PMC member and committer! Meng has been active in the project for almost a year and has been very productive and collaborative. He is now one of the few people who understand the allocator code well, as well as the roadmap for this area of the project. He

Re: Dedup mesos agent status updates at framework

2018-10-28 Thread Benjamin Mahler
scheduler will remove the status > update from the queue, and in case of failure, Mesos Master will send > status update again. > > > > On Sun, Oct 28, 2018 at 10:15 PM Benjamin Mahler > wrote: > > > Which version of mesos are you running? > > > > >

Re: Dedup mesos agent status updates at framework

2018-10-28 Thread Benjamin Mahler
h default backoff period from 10s -> 30s or > 60s, and simultaneously explore if dedup is an option. > > Thanks, > Varun > > On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler > wrote: > > > Hi Varun, > > > > What problem are you trying to solve precisely? The

Re: Dedup mesos agent status updates at framework

2018-10-28 Thread Benjamin Mahler
Hi Varun, What problem are you trying to solve precisely? There seems to be an implication that the duplicate acknowledgements are expensive. They should be low cost, so that's rather surprising. Do you have any data related to this? You can also tune the backoff rate on the agents, if the defaul

Re: Proposal: Adding health check definitions to master state output

2018-10-18 Thread Benjamin Mahler
> It's worth mentioning that I believe the original intention of the 'Task' > message was to contain most information contained in 'TaskInfo', except for > those fields which could grow very large, like the 'data' field. +1 all task / executor metadata should be exposed IMO. I look at the 'data' f

1.7.x Performance Improvements Blog Post

2018-10-09 Thread Benjamin Mahler
We published a blog post highlighting the performance improvements in Mesos 1.7.x, take a look! https://twitter.com/ApacheMesos/status/1049740950359044096 Ben

Re: Vote now for MesosCon 2018 proposals!

2018-09-25 Thread Benjamin Mahler
Voted! Thanks Jörg and the PC! On Thu, Sep 20, 2018 at 9:51 AM Jörg Schad wrote: > Dear Mesos Community, > > Please take a few minutes over the next few days and review what members > of the community have submitted for MesosCon 2018 > (which will be held in San Francis

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.13.6 with Apple LLVM version 9.1.0 (clang-902.0.39.2). Thanks Kapil! On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.4.2. > > 1.4.2 is a bug fix release. The CHANGELOG

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
This was fixed in https://github.com/apache/mesos/commit/02ad5c8cdd644ee8eec83bf887daa98bb163637d, I don't recall there being any issues due to it. On Mon, Aug 13, 2018 at 4:50 PM, Benjamin Mahler wrote: > Hm.. I ran make check on macOS saw the following:

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
Hm.. I ran make check on macOS and saw the following: [ RUN ] AwaitTest.AwaitSingleDiscard src/tests/collect_tests.cpp:275: Failure Value of: promise.future().hasDiscard() Actual: false Expected: true [ FAILED ] AwaitTest.AwaitSingleDiscard (0 ms) On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya

Re: Understand fixed resource estimator to get oversubscribe resources

2018-08-10 Thread Benjamin Mahler
The fixed resource estimator provides a fixed size revocable pool: if you tell it to create a 24 cpu revocable pool, there will be a 24 cpu revocable pool. It is not looking at utilization slack. On Mon, Aug 6, 2018 at 2:28 PM, Varun Gupta wrote: > Hi, > > I was reading the code >

Re: Backport Policy

2018-07-26 Thread Benjamin Mahler
ossible: keep behavior >>> > consistent >>> > >>> (and safe) within a release. With that as the goal of a branch in >>> > >>> maintenance mode, it makes sense to fix regressions, and make >>> > exceptions to >>> > >>

Mesos 1.7.x and JSON clients

2018-07-25 Thread Benjamin Mahler
TLDR: If you use a spec-compliant JSON parser, you will observe no change in Mesos 1.7.x and everything will continue to work as before. Longer version: JSON allows strings to be encoded in several different ways. For example, "/" can be encoded directly as "/", or "\/", "\u002F", or "\u002f". A
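The equivalence of those encodings can be checked with any spec-compliant parser. A minimal sketch using Python's standard json module (the variable names are illustrative only):

```python
import json

# Four JSON texts that all encode the one-character string "/".
encodings = ['"/"', r'"\/"', r'"\u002F"', r'"\u002f"']

# A spec-compliant parser decodes every form to the same value.
decoded = [json.loads(text) for text in encodings]

if __name__ == "__main__":
    for text, value in zip(encodings, decoded):
        print(text, "->", repr(value))
```

A client that compares raw response bytes, or that uses a hand-rolled parser handling only one of these forms, is the kind of client that could notice a difference.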

Re: Backport Policy

2018-07-12 Thread Benjamin Mahler
anks for the clarification. I'm in agreement with the points you > > made. > > > > Once we have consensus, would you mind updating the doc? > > > > On Wed, Jul 11, 2018 at 5:15 PM Benjamin Mahler > > wrote: > > > > > I realized recently that w

Re: Normalization of metric keys

2018-07-06 Thread Benjamin Mahler
pace character, which could very well appear in framework names. I > don't think we have a good reason on our side to substitute whitespace, but > perhaps its presence in the metric keys would cause issues with external > tooling? > > Greg > > [1] https://github.com/m

Re: Normalization of metric keys

2018-07-03 Thread Benjamin Mahler
I don't think the lack of principal normalization was intentional. Why spread that further? Don't we also have some normalization today? Having slashes show up in components complicates parsing (can no longer split on '/'), no? For example, if we were to introduce the ability to query a subset of
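The parsing concern can be illustrated with a small sketch. The key components below are hypothetical, and percent-encoding is used only as one possible normalization; it is an assumption for illustration, not the scheme Mesos uses:

```python
from urllib.parse import quote, unquote


def build_key(*components):
    """Join components with '/', escaping any '/' inside a component."""
    return "/".join(quote(part, safe="") for part in components)


def parse_key(key):
    """Split a normalized key back into its original components."""
    return [unquote(part) for part in key.split("/")]


# Hypothetical components: the middle one contains a slash and a space.
components = ["frameworks", "team/a job", "tasks"]

naive_key = "/".join(components)   # unnormalized
safe_key = build_key(*components)  # normalized

if __name__ == "__main__":
    print(naive_key.split("/"))  # 4 parts: the component boundary is lost
    print(parse_key(safe_key))   # 3 parts: round-trips cleanly
```

Without normalization, splitting on '/' yields four parts for three components, so a consumer can no longer tell where the framework name ends.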

Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-29 Thread Benjamin Mahler
On Wed, May 23, 2018 at 11:39 AM, Michael Park wrote: > >> Huh... 🤔 Super weird. I'll look into it. >> >> Thanks for checking! >> >> MPark >> >> On Wed, May 23, 2018 at 11:34 AM Vinod Kone wrote: >> >>> It's empty for

Re: [VOTE] Release Apache Mesos 1.5.1 (rc1)

2018-05-23 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.13.4 with Apple LLVM version 9.1.0 (clang-902.0.39.1) On Fri, May 11, 2018 at 12:35 PM, Gilbert Song wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.5.1. > > 1.5.1 includes the following: > --

Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-23 Thread Benjamin Mahler
Thanks Michael! Looks like the tar.gz is empty, is it just me? On Tue, May 22, 2018 at 10:09 PM, Michael Park wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.3.3. > > The CHANGELOG for the release is available at: > https://git-wip-us.apache.org/repos/asf

Re: Mesos Roles | Min or Max ?

2018-05-21 Thread Benjamin Mahler
Currently a role either has no guarantee and no limit, or a guarantee and limit set to the same amount of resources. The work is underway to allow setting limit distinct from guarantee: https://issues.apache.org/jira/browse/MESOS-8068 On Mon, May 21, 2018 at 4:17 PM Ken Sipe wrote: > Hey Trevor

Re: Operator ReadFile API

2018-05-05 Thread Benjamin Mahler
Yes, it's base64 encoded. The protobuf schema defines this field of type "bytes": https://github.com/apache/mesos/blob/1.5.0/include/mesos/v1/agent/agent.proto#L460 When converted to JSON, this follows the standard protobuf -> JSON conversion by converting "bytes" fields into base64 encoded strin
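A client therefore has to base64-decode the field after parsing the JSON. A minimal sketch in Python; the response below is constructed locally for illustration, and its exact shape is an assumption rather than captured agent output:

```python
import base64
import json

# Illustrative response body: a protobuf "bytes" field rendered as base64.
response_text = json.dumps({
    "type": "READ_FILE",
    "read_file": {
        "size": 12,
        "data": base64.b64encode(b"hello mesos\n").decode("ascii"),
    },
})


def decode_read_file(body):
    """Return the raw file bytes carried in a READ_FILE-style JSON body."""
    payload = json.loads(body)
    return base64.b64decode(payload["read_file"]["data"])


if __name__ == "__main__":
    print(decode_read_file(response_text))
```

Requesting the protobuf content type instead of JSON would avoid the base64 step, since the bytes are carried as-is in the binary encoding.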

Re: mesos-slave Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]

2018-05-03 Thread Benjamin Mahler
From the man page for bind: *EADDRINUSE* (Internet domain sockets) The port number was specified as zero in the socket address structure, but, upon attempting to bind to an ephemeral port, it was determined that all port numbers in the ephem
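Binding to port zero asks the kernel to pick an ephemeral port, which is why the error message shows 0.0.0.0:0. A minimal sketch of that behaviour in Python (the helper name bind_ephemeral is hypothetical, and the loopback address is used only to keep the example local):

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a TCP socket to port 0 and return it with the assigned port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 means "let the kernel choose an ephemeral port".
        sock.bind((host, 0))
    except OSError:
        # EADDRINUSE here would mean the ephemeral range is exhausted.
        sock.close()
        raise
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    sock, port = bind_ephemeral()
    print("kernel assigned port", port)
    sock.close()
```

On Linux, the ephemeral range can be inspected in /proc/sys/net/ipv4/ip_local_port_range when diagnosing exhaustion.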

Re: Reason of cascaded kill in a group

2018-04-10 Thread Benjamin Mahler
Are you saying that there was no reason previously, and there would be a reason after the change? If so, adding a reason where one did not exist is safe from a backwards compatibility perspective. On Mon, Apr 9, 2018 at 10:32 AM, Zhitao Li wrote: > Hi, > > We are considering adding a new reason

Re: Troubleshooting Mesos SSL setup

2018-04-10 Thread Benjamin Mahler
Are there bugs here? Is there anything that mesos could have logged / handled better? On Fri, Mar 16, 2018 at 11:46 AM, Renan DelValle wrote: > Follow up, we weren't able to get our wildcard certificate working but we > did get it to work when we used a certificate for a single hostname. > > Al

Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
illed after > some timeout. We currently have some logic in our scheduler to kill these > tasks. Would be nice to delegate this to the executor. > > - Sagar > > On Fri, Mar 23, 2018 at 3:29 PM, Benjamin Mahler > wrote: > > > Sagar, could you share your use case?

Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Sagar, could you share your use case? Or is it exactly the same as Zhitao's? On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan wrote: > +1 > > This will be useful for us(Yelp) as well. > > On Fri, Mar 23, 2018 at 1:31 PM, Benjamin Mahler > wrote: > >

Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Also, it's advantageous for mesos to be aware of a hard deadline when it comes to resource allocation. We know that some resources will free up and can make better decisions when it comes to pre-emption, for example. Currently, mesos doesn't know if a task will run forever or will run to completion

Re: Mesos scalability

2018-03-23 Thread Benjamin Mahler
Hi Karan, Only one master can be elected leader in the current architecture. It's unlikely we're at a point where we need to balance work across masters to push scalability further. That comes with a lot of complexity, and we still have a lot of room for performance improvements on a single leader

Re: Mesos on OS X

2018-03-21 Thread Benjamin Mahler
MacOS is a supported platform, you can see the supported versions here: http://mesos.apache.org/documentation/latest/building/ The containerization maintainers could probably chime in to elaborate on the isolation caveats. For example, you won't have many of the resource isolators available and th

Re: Reply: Reply: Status update: task 1 is in state TASK_ERROR

2018-03-16 Thread Benjamin Mahler
> > > And after that I met another problem: my task is always in staging, and > terminates after 1min due to timeout. I think there are many mini process > in a scheduler app including callbacks, such as connect, register, get > offers list, accept offer, etc. Is there a detail pr

Re: Welcome Zhitao Li as Mesos Committer and PMC Member

2018-03-12 Thread Benjamin Mahler
Welcome Zhitao! Thanks for your contributions so far On Mon, Mar 12, 2018 at 2:02 PM, Gilbert Song wrote: > Hi, > > I am excited to announce that the PMC has voted Zhitao Li as a new > committer and member of PMC for the Apache Mesos project. Please join me to > congratulate Zhitao! > > Zhitao h

Re: Welcome Chun-Hung Hsiao as Mesos Committer and PMC Member

2018-03-12 Thread Benjamin Mahler
Welcome Chun! It's been great discussing things with you so far and thanks for all the hard work! On Sat, Mar 10, 2018 at 9:14 PM, Jie Yu wrote: > Hi, > > I am happy to announce that the PMC has voted Chun-Hung Hsiao as a new > committer and member of PMC for the Apache Mesos project. Please

Re: Reply: Status update: task 1 is in state TASK_ERROR

2018-03-09 Thread Benjamin Mahler
ntroller)(reservations: > [(STATIC,controller)]):550264; ports(allocated: > controller):[31000-32000] > 233 Status update: task 1 is in state TASK_ERROR > > > > 罗辉 > > Infrastructure > -- > *From:* Benjamin Mahler > *Sent:* March 9, 2018 9:24:37

Re: Status update: task 1 is in state TASK_ERROR

2018-03-08 Thread Benjamin Mahler
Can you log the message provided in the TaskStatus? https://github.com/apache/mesos/blob/1.5.0/include/mesos/v1/mesos.proto#L2424 On Wed, Mar 7, 2018 at 11:23 PM, 罗 辉 wrote: > Hi guys: > > I got a mesos test app, mostly likely > > https://github.com/apache/mesos/blob/master/src/java/src/ >

Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Benjamin Mahler
Put another way, we currently don't guarantee in-order task delivery to the executor. Due to the changes for MESOS-1720, one special case of task re-ordering now leads to the re-ordered task being dropped (rather than delivered out-of-order as before). Technically, this is strictly better. However

Re: is there any docs to show how to secure http(s) for masters

2018-02-23 Thread Benjamin Mahler
+Alexander On Mon, Feb 19, 2018 at 11:00 AM Mclain, Warren wrote: > I am not finding any documentation that tells you how to actually > implement the following on the mesos masters and agents. > > > > authenticate=true > > authenticate_http_readonly=true > > authenticate_http_readwrite=true > >

Re: http://mesos.apache.org/downloads/ is not up to date

2018-02-12 Thread Benjamin Mahler
Thanks for pointing this out Adam, I've added mpark who is the release manager for 1.3.2. On Tue, Feb 6, 2018 at 6:12 AM, Adam Cecile wrote: > Hi guys, > > > Did you notice Mesos 1.3.2 is missing from the official download page ? > > http://mesos.apache.org/downloads/ > > > Regards, Adam. > >

Reminder: Design Doc for Mesos CLI Re-design

2018-02-12 Thread Benjamin Mahler
I've heard a lot of interest in there being investment in the mesos CLI. For those that are interested, please take a look at the re-design doc and share your feedback: https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVscgYqrD07OyIglsA/edit Feel free to make comments in the doc, sug

Re: Questions about Pods and the Mesos Containerizer

2018-01-29 Thread Benjamin Mahler
If moving the conversation to slack, it would be great to post back to the list with a summary! On Mon, Jan 29, 2018 at 1:38 PM, Vinod Kone wrote: > Hi David, > > It's probably worth having a synchronous discussion around your proposed > approach in our slack. I would like to understand if TASK_

Re: java driver/shutdown call

2018-01-17 Thread Benjamin Mahler
> > On Tue, Jan 16, 2018 at 6:40 PM, Benjamin Mahler > wrote: > >> Mohit, what are you trying to accomplish by going from KILL to SHUTDOWN? >> >> On Tue, Jan 16, 2018 at 5:15 PM, Joseph Wu wrote: >> >>> If a framework launches tasks, then it will use

Re: Mesos slave ID change after reboot

2018-01-16 Thread Benjamin Mahler
Yes, the agent used to check for the boot id having changed in order to decide whether to try to recover. On Wed, Jan 10, 2018 at 5:53 PM, Srikanth Viswanathan wrote: > I am trying to understand under what cases the mesos slave ID changes in > response to reboot. I noticed this note at http://m

Re: java driver/shutdown call

2018-01-16 Thread Benjamin Mahler
Mohit, what are you trying to accomplish by going from KILL to SHUTDOWN? On Tue, Jan 16, 2018 at 5:15 PM, Joseph Wu wrote: > If a framework launches tasks, then it will use an executor. Mesos > provides a "default" executor if the framework doesn't explicitly specify > an executor. (And the Sh

Re: Duplicate task ID for same framework on different agents

2017-12-21 Thread Benjamin Mahler
It's a known issue: https://issues.apache.org/jira/browse/MESOS-3070 Putting in place a protection mechanism sounds good, but is rather complicated. See the comment in this ticket: https://issues.apache.org/jira/browse/MESOS-6785 On Wed, Dec 20, 2017 at 8:26 PM, Zhitao Li wrote: > Hi all, > > W

Re: Mesos 1.5.0 Release

2017-12-21 Thread Benjamin Mahler
Meng is working on https://issues.apache.org/jira/browse/MESOS-8352 and we should land it tonight if not tomorrow. I can cherry pick if it's after your cut, and worst case it can go in 1.5.1. Have you guys gone over the unresolved items targeted for 1.5.0? I see a lot of stuff, might be good to st

Re: [VOTE] Release Apache Mesos 1.3.2 (rc1)

2017-12-14 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.13.2 with Apple LLVM version 9.0.0 (clang-900.0.39.2) On Thu, Dec 7, 2017 at 2:44 PM, Michael Park wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.3.2. > > The CHANGELOG for the release is available at: > https:/

December Performance Working Group Report

2017-12-11 Thread Benjamin Mahler
The December performance working group report is published on the website here: http://mesos.apache.org/blog/performance-working-group-progress-report/ This report highlights the progress we've made recently in the performance of master failover. Special thanks to Dmitry Zhuk, Michael Park and Yan

Re: Resource allocation cycle in DRF for multiple frameworks

2017-12-05 Thread Benjamin Mahler
framework, it immediately comes to next available framework even though > next framework's share is higher than the previous one. Is that by > implementation or I am getting something wrong here? > > Thanks > > > On Mon, Dec 4, 2017 at 2:37 PM, Benjamin Mahler > wrote:

Re: Resource allocation cycle in DRF for multiple frameworks

2017-12-04 Thread Benjamin Mahler
I don't think I understood the questions here, but let me add some explanation and we can go from there. Mesos will use DRF to choose an ordering amongst the roles that are actively interested in obtaining resources. Within a role, we currently use DRF again to choose an ordering amongst the frame
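The two-level ordering described in this message can be sketched roughly as follows. This is a minimal illustration, not the Mesos allocator: the names `dominant_share`, `drf_order` and `allocation_order` are hypothetical, weights and the "actively interested" filter are ignored, and shares are assumed to be computed against the cluster total at both levels.

```python
from typing import Dict, Iterable, List, Tuple

Resources = Dict[str, float]


def sum_resources(items: Iterable[Resources]) -> Resources:
    """Add up several resource dicts, name by name."""
    total: Resources = {}
    for resources in items:
        for name, amount in resources.items():
            total[name] = total.get(name, 0.0) + amount
    return total


def dominant_share(allocated: Resources, total: Resources) -> float:
    """Largest fraction of any single resource held by one client."""
    return max(
        (allocated.get(name, 0.0) / capacity
         for name, capacity in total.items() if capacity > 0),
        default=0.0,
    )


def drf_order(allocations: Dict[str, Resources], total: Resources) -> List[str]:
    """Clients sorted by ascending dominant share; ties broken by name."""
    return sorted(
        allocations,
        key=lambda client: (dominant_share(allocations[client], total), client),
    )


def allocation_order(
    roles: Dict[str, Dict[str, Resources]], total: Resources
) -> List[Tuple[str, str]]:
    """Order roles by DRF, then order frameworks within each role by DRF."""
    role_totals = {
        role: sum_resources(frameworks.values())
        for role, frameworks in roles.items()
    }
    order: List[Tuple[str, str]] = []
    for role in drf_order(role_totals, total):
        for framework in drf_order(roles[role], total):
            order.append((role, framework))
    return order


# Hypothetical cluster and allocations, for illustration only.
cluster = {"cpus": 10.0, "mem": 100.0}
roles = {
    "analytics": {
        "fw-a": {"cpus": 4.0, "mem": 10.0},
        "fw-b": {"cpus": 1.0, "mem": 30.0},
    },
    "web": {"fw-c": {"cpus": 1.0, "mem": 5.0}},
}
print(allocation_order(roles, cluster))
```

In this made-up example the "web" role holds the smaller dominant share, so its framework is considered first; within "analytics", "fw-b" comes before "fw-a" because its dominant share (memory) is lower than "fw-a"'s (CPU).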

Re: Documentation for Mesos On windows

2017-11-29 Thread Benjamin Mahler
+Andrew On Tue, Nov 28, 2017 at 5:41 PM, sweta Das wrote: > Hi > > Is there any other documentation than the one on mesos site > http://mesos.apache.org/documentation/latest/windows/ > > I was able to build mesos on AWS on a Windows 2016 server. But I am not > able to find any docs for starting

Re: [VOTE] Release Apache Mesos 1.2.3 (rc1)

2017-11-29 Thread Benjamin Mahler
+1 (binding) make check on macOS 10.13.1 On Wed, Nov 29, 2017 at 9:17 PM, Adam Bordelon wrote: > +1 (binding) > > Passed all tests in DC/OS integration CI, with a bump to 1.2.x at f8706e5, > just one changelog update before 1.2.3-rc1. > https://github.com/dcos/dcos/pull/2104#pullrequestreview-7

Re: Persistent volumes

2017-11-29 Thread Benjamin Mahler
+jpeach The polling mechanism is used by the "disk/du" isolator to handle the case where we don't have filesystem support for enforcing a quota on a per-directory basis. I believe the "disk/xfs" isolator will stop writes with EDQUOT without killing the task: http://mesos.apache.org/documentation/

Re: Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Benjamin Mahler
Welcome and thanks for your contributions so far! On Mon, Nov 27, 2017 at 11:00 PM, Joseph Wu wrote: > Hi devs & users, > > I'm happy to announce that Andrew Schwartzmeyer has become a new committer > and member of the PMC for the Apache Mesos project. Please join me in > congratulating him! >

Stripping Offer.AllocationInfo and Resource.AllocationInfo for non-MULTI_ROLE schedulers.

2017-11-15 Thread Benjamin Mahler
Hi folks, When we released MULTI_ROLE support, Offers and Resources within them included additional information, specifically the AllocationInfo which indicated which role was being allocated to: https://github.com/apache/mesos/blob/1.3.0/include/mesos/v1/mesos.proto#L907-L923 https://github.com

Re: 1.3.2 Release

2017-11-02 Thread Benjamin Mahler
Great! I cherry picked Gaston's fix for https://issues.apache.org/jira/browse/MESOS-8135. On Wed, Nov 1, 2017 at 6:57 PM, Michael Park wrote: > Please reply to this email if you have pending patches to be backported to > 1.3.x, I'm aiming to cut a 1.3.2 on Friday. > > Thanks, > > MPark >

Re: orphan executor

2017-11-02 Thread Benjamin Mahler
s? Any idea when this will be worked on? > > On Tue, Oct 31, 2017 at 5:22 PM, Benjamin Mahler > wrote: > >> The question was posed merely to point out that there is no notion of the >> executor "running away" currently, due to the answer I provided: there >

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
"orphan" executor in the list there, so framework can find > runaways and kill them(using Mesos provided API)? > > On Tue, Oct 31, 2017 at 3:49 PM, Benjamin Mahler > wrote: > >> What defines a runaway executor? >> >> Mesos does not know that this particula

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
terminates. However, we currently don't provide a great executor lifecycle API to enable schedulers to do this (it's long overdue). On Tue, Oct 31, 2017 at 2:47 PM, Mohit Jaggi wrote: > I was asking if this can happen automatically. > > On Tue, Oct 31, 2017 at 2:41 PM, Benjamin Mahler

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
here is a fix available now in Aurora/Thermos to try and exit in > such scenarios. But I am curious to know if Mesos agent has the > functionality to reap runaway executors. > > On Tue, Oct 31, 2017 at 12:08 PM, Benjamin Mahler > wrote: > >> Is my understanding correct tha
