Re: [VOTE] Release Apache Aurora 0.21.0 RC1

2018-09-10 Thread David McLaughlin
+1(ish) I have not been able to get the release verification script to work on my laptop for a while. Something with my local environment. But I ran the testing steps manually and they pass. On Mon, Sep 10, 2018 at 5:14 AM, Stephan Erb wrote: > +1 (binding). Release verification has passed for

Re: Proposal to add SOCKS functionality to Aurora CLI tools

2018-06-25 Thread David McLaughlin
Thanks for taking the time to write up your proposal! On Mon, Jun 25, 2018 at 3:25 AM, Mathias Sulser wrote: > Hi, > > I have mentioned this briefly a bit more than a week ago on the slack > channel: adding SOCKS functionality to the CLI tools would help me running > aurora and aurora-admin in

Re: [VOTE] Move project to Apache GitBox service

2018-06-12 Thread David McLaughlin
+1 On Tue, Jun 12, 2018 at 11:19 AM, Jordan Ly wrote: > +1 > > On Tue, Jun 12, 2018 at 11:16 AM, Renan DelValle wrote: > > Kicking the vote off with a +1 from me since I feel it will simplify our > > patch submission process and lower the difficulty bar for new > contributors. > > > > On Tue,

Re: [DISCUSS] Move project to GitBox service

2018-06-11 Thread David McLaughlin
+1 Thanks for kicking off the vote! On Mon, Jun 11, 2018 at 5:27 PM, Renan DelValle wrote: > All, > > I wanted to bring up for discussion moving the project from our current > ReviewBoard based workflow to a GitHub pull request based workflow through > the use of the ASF's GitBox service[1]. >

Re: Recovery instructions updates

2018-06-04 Thread David McLaughlin
We should definitely update that doc. Bill's patch makes this much easier (as can be seen by the e2e test) and we've been using it in our scale test environment. How does the site get updated? Is it auto-generated when we build releases? Having corrupted logs that frequently is concerning too, we

Re: [VOTE] Discontinue Official Binary Package releases

2018-05-22 Thread David McLaughlin
+1 On Mon, May 21, 2018 at 11:17 PM, Stephan Erb wrote: > +1 > > On 21.05.18, 17:00, "Santhosh Kumar Shanmugham" > > wrote: > > +1 > > On Fri, May 18, 2018, 7:03 PM Nicolas Donatucci < > ndonatu...@medallia.com> >

Re: [DISCUSS] State of the Community

2018-05-22 Thread David McLaughlin
I feel like not getting code reviews is often a symptom of some other fundamental issue with how change is introduced to a community. When I joined the Aurora team at Twitter there were some principles in place for getting your changes accepted to the community and I still feel like when you

Re: Slack IRC Gateway support ending

2018-03-14 Thread David McLaughlin
I don't have a strong opinion here, the whole chat space is very flavor of the month. Does Apache have a policy? On Tue, Mar 13, 2018 at 3:04 PM, Renan DelValle wrote: > Hi all, > > Slack has announced that their gateway for IRC will no longer be available > after May 15th,

Re: [VOTE] Release Apache Aurora 0.19.x packages

2018-02-21 Thread David McLaughlin
+1 from me. On Wed, Feb 21, 2018 at 9:57 AM, Renan DelValle wrote: > Another friendly reminder that we can't release the binary packages for > 0.19.x without at least three +1 binding votes. > > Not releasing a package for 0.19.x will create a problem for anyone trying > to

Re: StartJobUpdate vs JobCreate Thrift API Performance

2017-11-22 Thread David McLaughlin
The performance difference used to be extremely significant due to the overhead of the MyBatis stores. But with the recent changes to drop MyBatis the difference should be pretty negligible - just some added overhead for writing extra transactions to storage. The corner cases are just that

Re: [VOTE] Release Apache Aurora 0.19.0 RC0

2017-11-10 Thread David McLaughlin
k on the macOS build, a workaround is to > verify from vagrant: > > $ vagrant up > $ vagrant ssh > $ cd /vagrant > $ ./build-support/release/verify-release-candidate 0.19.0-rc0 > > > > On Wed, Nov 8, 2017 at 10:48 AM, David McLaughlin < > d

Re: [VOTE] Release Apache Aurora 0.19.0 RC0

2017-11-08 Thread David McLaughlin
+1 from me. The Mac OS breakage is disappointing, but I'm fine with it not being a blocker. On Tue, Nov 7, 2017 at 11:04 PM, Mohit Jaggi wrote: > +1 > > On Tue, Nov 7, 2017 at 10:51 PM, Bill Farner wrote: > > > +1 > > > > Successfully validated with

Re: 0.19.0 release preparation

2017-11-07 Thread David McLaughlin
Both fixes have now been committed. Looks good to move forward with the release. On Tue, Nov 7, 2017 at 11:31 AM, David McLaughlin <dmclaugh...@apache.org> wrote: > The bugs were reported last night. I'll have patches for both out by EOD. > > On Tue, Nov 7, 2017 at 10:22 AM, Bi

Re: 0.19.0 release preparation

2017-11-07 Thread David McLaughlin
like to make sure > we maintain momentum towards a release. > > On Tue, Nov 7, 2017 at 10:11 AM, David McLaughlin <dmclaugh...@apache.org> > wrote: > > > We have two outstanding regressions I'll get to this week, one is minor > and > > one is relatively seriou

Proposal: Integrate with Partition-Aware APIs in Aurora

2017-11-07 Thread David McLaughlin
Hey, I've built a prototype that uses the new Partition Aware APIs in Mesos to provide user-configurable partitioning policy in Aurora. I have written a proposal document here: https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit?usp=sharing And shared my

Re: 0.19.0 release preparation

2017-11-07 Thread David McLaughlin
us - we can cut point releases when necessary > to address this type of issue. > > On Mon, Oct 30, 2017 at 8:41 AM, David McLaughlin <dmclaugh...@apache.org> > wrote: > > > I'd like another week of feedback to incorporate changes of the new UI. > We > > are

Re: Redesign of the Aurora UI

2017-09-22 Thread David McLaughlin
probably hold off on further patches until the relicense is complete. On Fri, Aug 18, 2017 at 8:13 PM, David McLaughlin <da...@dmclaughlin.com> wrote: > Good to know they made the decision. My plan is to move forward with > Preact, with the major question how to do unit testing since

Re: Build failed in Jenkins: Aurora #1802

2017-09-13 Thread David McLaughlin
This was a flaky executor test, and unrelated to the change. On Wed, Sep 13, 2017 at 11:47 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > Changes: > > [david] HomePage implemented in Preact >

Re: [Design Doc] Hot Standby in Replicas to Reduce Failover Time

2017-09-05 Thread David McLaughlin
+1, this proposal looks sound to me. I'll leave any minor feedback on the doc but none of it will be blocking. On Mon, Sep 4, 2017 at 10:31 AM, Erb, Stephan wrote: > Thanks for the detailed design document and the in-depth walkthrough [1]! > Your proposal seems to

Re: Make drain MAX_STATUS_WAIT configurable

2017-09-05 Thread David McLaughlin
+1 On Tue, Sep 5, 2017 at 10:09 AM, Mauricio Garavaglia < mauriciogaravag...@gmail.com> wrote: > Hi folks, > > The aurora-admin drain command currently has a hardcoded limit of 5 minutes > waiting for a node to be drained, after that timeout it fails. > > This doesn't work very well when tasks
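
The change requested in the quoted message amounts to turning a fixed wait limit into a parameter. A minimal sketch of that idea follows; it is not the aurora-admin implementation. The names wait_for_drain, is_drained, max_wait_secs and poll_interval_secs are hypothetical, and only the 5-minute default comes from the message above.

```python
import time

# Default taken from the 5-minute limit mentioned in the quoted message.
DEFAULT_MAX_STATUS_WAIT_SECS = 5 * 60


def wait_for_drain(is_drained,
                   max_wait_secs=DEFAULT_MAX_STATUS_WAIT_SECS,
                   poll_interval_secs=5,
                   clock=time.monotonic,
                   sleep=time.sleep):
    """Poll is_drained() until it returns True or max_wait_secs has elapsed.

    is_drained is a hypothetical zero-argument callable supplied by the caller.
    Returns True if the host drained in time, False on timeout.
    """
    deadline = clock() + max_wait_secs
    while True:
        if is_drained():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_secs)
```

A caller with slow-stopping tasks could then pass a larger max_wait_secs instead of editing a constant. The clock and sleep parameters exist only so the loop can be exercised without real waiting.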

Re: Redesign of the Aurora UI

2017-08-18 Thread David McLaughlin
speed with the tech and > its > > usage in Aurora. > > > > Thanks a lot for driving this, David! > > > > On 21.07.17, 07:00, "Kai Huang" <texasred2...@hotmail.com> wrote: > > > > David - Sure, let's sync on the work when you are ready

Re: [VOTE] Release Apache Aurora 0.18.x packages

2017-08-02 Thread David McLaughlin
+1 On Wed, Aug 2, 2017 at 1:15 PM, Jordan Ly wrote: > +1 verification passed for me as well. > > On Wed, Aug 2, 2017 at 12:50 PM, Santhosh Kumar Shanmugham > wrote: > > All, > > > > We seem to have got only 2 votes (binding). > > > > +1

Re: Reducing Failover Time by Eagerly Reading/Replaying Log in Followers

2017-07-26 Thread David McLaughlin
e main > Storage. > > > On Wed, Jul 26, 2017 at 1:56 PM, Santhosh Kumar Shanmugham < > sshanmug...@twitter.com.invalid> wrote: > > > +1 > > > > This sets up the stage for more potential benefits by offloading work > from > > the leading scheduler that

Re: Reducing Failover Time by Eagerly Reading/Replaying Log in Followers

2017-07-26 Thread David McLaughlin
I'm +1 to this approach over my proposal. With the enforced daily failover, it's a much bigger win to make failovers "cheap" than making snapshots cheap, and this is going to be backwards compatible too. On Wed, Jul 26, 2017 at 9:51 AM, Jordan Ly wrote: > Hello everyone! >

Re: Redesign of the Aurora UI

2017-07-19 Thread David McLaughlin
ally for those unfamiliar with React. That said, perhaps we > could go with an alternate method for reviewing here, where people review > against your fork directly and only when they're comfortable do you post > the whole patch to reviewboard for what should, by that point, be a rubber > s

Redesign of the Aurora UI

2017-07-18 Thread David McLaughlin
Hey all, At Twitter we have had a long-standing desire to be able to put custom widgets and other UX enhancements into the Aurora UI. Recent prototype work to do this in a clean way has proved fruitful and I'd like to present this approach to the community and get feedback on the overall

Re: Aurora reconciliation and Master fail over

2017-07-17 Thread David McLaughlin
> >> Thx David for the follow up and confirmation. > >> We have started the thread on the mesos dev DL. > >> > >> So to get clarification on the recon, what is in general effect > >> during the recon. Does scheduling and activities like snapshot is &

Re: Aurora reconciliation and Master fail over

2017-07-15 Thread David McLaughlin
. On Sat, Jul 15, 2017 at 9:21 AM, David McLaughlin <dmclaugh...@apache.org> wrote: > Yes, we've confirmed this internally too (Santhosh did the work here): > > When an agent becomes unreachable while the master is running, it sends >> TASK_LOST events for each task o

Re: Aurora reconciliation and Master fail over

2017-07-15 Thread David McLaughlin
down agent at same time > 3. Wait for 10 mins > > What Renan and I saw in the logs were only agent lost and not task lost > sent. While in regular health check expire scenario both task lost and > agent lost were sent. > > So yes this is very concerning. > > Thx >

Re: Aurora reconciliation and Master fail over

2017-07-14 Thread David McLaughlin
en sent. > > Because either mesos is not sending the right status or aurora is not > handling it. > > Thx > > > On Jul 14, 2017, at 8:21 AM, David McLaughlin <dmclaugh...@apache.org> > wrote: > > > > "1. When mesos sends slave lost after 10 mins in this situation

Re: Aurora reconciliation and Master fail over

2017-07-14 Thread David McLaughlin
"1. When mesos sends slave lost after 10 mins in this situation , why does aurora not act on it?" Because Mesos also sends TASK_LOST for every task running on the agent whenever it calls slaveLost: When it is time to remove an agent, the master removes the agent from the list of registered

Re: Proposal for Pluggable Scheduling in Aurora

2017-07-05 Thread David McLaughlin
/e76862a39622ba5c236f0c9e8ba94c341c5c4da8 Thanks, David On Wed, Jul 5, 2017 at 11:39 AM, Renan DelValle <rdelv...@binghamton.edu> wrote: > Hi David, > > Any updates on the progress of this proposal/feature? > > -Renan > > > > On Mon, May 8, 2017 at 5:59 PM, David McLaughlin <dmclaugh.

Re: [VOTE] Release Apache Aurora 0.18.0 RC0

2017-06-14 Thread David McLaughlin
+1 On Wed, Jun 14, 2017 at 3:50 AM, Erb, Stephan wrote: > +1 > > Verification script passed & successfully deployed to production. > > In this release, the Aurora client will need to be updated before the > scheduler can be deployed. This is unfortunate, but I

Proposal for Pluggable Scheduling in Aurora

2017-05-08 Thread David McLaughlin
Hi all, I've posted a patch to enable replacing the scheduling algorithms in Aurora. The patch is relatively trivial but has some big implications. There is a document that outlines the motivation of this patch and some future work to make everything more user-friendly:

Re: Future of storage in Aurora

2017-03-30 Thread David McLaughlin
stores themselves are problematic (to say > the > > least); do we have evidence that returning to memory based stores will be > > an improvement on that? > > > > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin < > dmclaugh...@apache.org > > wrote: > > > >

Re: schedule task instances spreading them based on a host attribute.

2017-03-30 Thread David McLaughlin
; > > > > > On Thu, Mar 30, 2017 at 11:31 AM, Rick Mangi <r...@chartbeat.com> > wrote: > > > > > >> Yeah, we have a dozen or so kafka consumer jobs running in our > cluster, > > >> each having about 40 or so instances. > > >>

Re: schedule task instances spreading them based on a host attribute.

2017-03-30 Thread David McLaughlin
nale for having a pluggable scheduling > layer. Aurora is very flexible and people use it in many different ways. > Giving users more flexibility in how jobs are scheduled seems like it would > be a good direction for the project. > > >> On Mar 30, 2017, at 12:16 PM, Davi

Future of storage in Aurora

2017-03-30 Thread David McLaughlin
Hi all, I'd like to start a discussion around storage in Aurora. I think one of the biggest mistakes we made in migrating our storage to H2 was deleting the memory stores as we moved. We made a pretty big bet that we could eventually make H2/relational databases work. I don't think that bet has

Re: schedule task instances spreading them based on a host attribute.

2017-03-30 Thread David McLaughlin
I think this is more complicated than multiple scheduling algorithms. The problem you'll end up having if you try to solve this in the Scheduling loop is when resources are unavailable because there are preemptible tasks running in them, rather than hosts being down. Right now the fact that the

Re: Design Doc for Mesos Maintenance in Aurora

2017-03-13 Thread David McLaughlin
Design looks good to me also. On Mon, Mar 13, 2017 at 3:08 PM, Zameer Manji wrote: > Thanks for the feedback Stephan. > > I am going to cautiously assume that future feedback here will be along the > same lines. Therefore I have created a ticket [1] for the work proposed in >

Re: Dynamic Reservations

2017-03-08 Thread David McLaughlin
Ticket for replace task primitive already exists: https://issues.apache.org/jira/browse/MESOS-1280 On Wed, Mar 8, 2017 at 6:34 PM, David McLaughlin <dmclaugh...@apache.org> wrote: > Spoke with Zameer offline and he asked me to post additional thoughts > here. > > My moti

Re: Dynamic Reservations

2017-03-08 Thread David McLaughlin
) type API from Mesos. I'll bring this up within our team and see if we can put resources on adding such an API. Any feedback on this approach in the meantime is welcome. On Wed, Mar 8, 2017 at 5:30 PM, David McLaughlin <dmclaugh...@apache.org> wrote: > You don't have to store anythin

Re: Dynamic Reservations

2017-03-08 Thread David McLaughlin
ng more storage and storage operations is the ideal way of > solving this problem. Second, in a multi framework environment, a framework > needs to use dynamic reservations otherwise the resources might be taken by > another framework. > > On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin

Re: Dynamic Reservations

2017-03-08 Thread David McLaughlin
So I read the docs again and I have one major question - do we even need dynamic reservations for the current proposal? The current goal of the proposed work is to keep an offer on a host and prevent some other pending task from taking it before the next scheduling round. This exact problem is

Re: Idea: rolling restarts in Aurora

2017-03-03 Thread David McLaughlin
+1 for thinner client. Another reason rolling update was moved to the Scheduler was to have an audit trail of changes to the job. If we could also get these restarts appearing on the job page, it would be great. On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji wrote: > +1 > >

Re: [VOTE] Release Apache Aurora 0.17.0 RC0

2017-02-01 Thread David McLaughlin
Is anyone running this in production yet? For me there is no value in a release if it hasn't been vetted in production. On Wed, Feb 1, 2017 at 2:22 AM, Stephan Erb wrote: > All, > > I propose that we accept the following release candidate as the > official > Apache Aurora

Re: Support instance-specific TaskConfig in CreateJob API

2016-08-12 Thread David McLaughlin
Hi Min, I'd prefer to add support for ad-hoc jobs to startJobUpdate and completely remove the notion of job create. " Also, even the > StartJobUpdate API is not scalable to a job with 10K ~ 100K task instances > and each instance has different task config since we will have to invoke >

Re: Aurora now supports multiple executors

2016-08-05 Thread David McLaughlin
test different executors (including thermos) with Aurora with > real > >> examples. During the same time we will release production grade > >> docker-compose executor simulating running container pods with Aurora. > >> Expect a detailed blog in a month. > >> T

Re: Golang Aurora lib, multiple executor support, integrate mesos task related fields

2016-06-13 Thread David McLaughlin
ails on the mechanics here but > >>> generally positive towards supporting more TaskInfo features in > >>> Aurora. > >>> > >>> > >>>> On Sun, Jun 12, 2016 at 11:46 AM, <r...@chartbeat.com> wrote: > >>>> I genera

Re: Golang Aurora lib, multiple executor support, integrate mesos task related fields

2016-06-09 Thread David McLaughlin
On Thu, Jun 9, 2016 at 2:21 PM, Renan DelValle wrote: > Hello all, > > I'd like to (re-)introduce myself. My name's Renan DelValle and I've had > the pleasure of being part of the Aurora community for the last year or so. > > Last year I worked to allow Aurora to utilize

Re: NEWS Layout

2016-02-02 Thread David McLaughlin
+1 On Tue, Feb 2, 2016 at 11:25 AM, Jake Farrell wrote: > sounds good. thanks Stephan > > -Jake > > On Tue, Feb 2, 2016 at 2:05 PM, Erb, Stephan > wrote: > > > Hi everyone, > > > > I'd like to propose that we give our NEWS file a little bit

Re: JobConfig diff API

2015-10-02 Thread David McLaughlin
I'd like to propose an alternative - that we start off by having an API endpoint which simply returns the JobUpdateInstructions that describes the changes that would happen if a given JobUpdateRequest was applied. There is a lot of value in having clients ask the scheduler to tell them what is
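
As a rough illustration of what "tell the client what would change" could look like, here is a minimal sketch. It does not use Aurora's Thrift types: preview_update and its plain dict result are hypothetical stand-ins for the JobUpdateRequest and JobUpdateInstructions named above, and task configs are assumed to be any values comparable with ==.

```python
def preview_update(current, desired_config, desired_count):
    """Describe the changes an update would make, without applying it.

    current: dict mapping instance id -> task config currently running.
    desired_config: the single task config every instance should end up with.
    desired_count: number of instances wanted (ids 0 .. desired_count - 1).
    """
    desired_ids = set(range(desired_count))
    current_ids = set(current)
    kept = current_ids & desired_ids
    return {
        # Instances that do not exist yet and would be created.
        "add": sorted(desired_ids - current_ids),
        # Instances beyond the desired count that would be killed.
        "remove": sorted(current_ids - desired_ids),
        # Existing instances whose config differs and would be restarted.
        "update": sorted(i for i in kept if current[i] != desired_config),
        # Existing instances already running the desired config.
        "unchanged": sorted(i for i in kept if current[i] == desired_config),
    }
```

For example, preview_update({0: "v1", 1: "v2"}, "v2", 3) reports instance 0 under "update", instance 1 under "unchanged" and instance 2 under "add".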

Re: Aurora React UI Demo/Prototype

2015-07-16 Thread David McLaughlin
Thanks for doing this and sharing code! I'd be -1 to any proposal to switch our FE technology stack for several reasons. I find this stack way too advanced for what the scheduler UI currently does. This is also a problem with Angular, but we're already stuck with that now. I also don't think