Re: Review Request 64288: Add a SQL persistence implementation

2017-12-04 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64288/#review192722
---



Just took a quick peek at this, a couple of drive by questions...


src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistence.java
Lines 73 (patched)
<https://reviews.apache.org/r/64288/#comment270975>

Can this just be `DataSource dataSource` since we're not using it for 
anything other than `getConnection` and `close`?



src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistence.java
Lines 161 (patched)
<https://reviews.apache.org/r/64288/#comment270979>

Does MySQL support `NULLS FIRST`? I don't see any mention of it in their 
docs, and trying on a local MySQL instance (5.7.x) it doesn't seem to work?


- Joshua Cohen


On Dec. 4, 2017, 1:09 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64288/
> ---
> 
> (Updated Dec. 4, 2017, 1:09 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Introduces a `Persistence` implementation that uses a SQL database via JDBC.  
> I've opted to lean towards MySQL SQL dialect, as unfortunately there are 
> vendor differences for even the very simple SQL used here.
> 
> I chose [HikariCP](https://github.com/brettwooldridge/HikariCP) to serve as a 
> `DataSource` (connection pool) implementation.  We don't really need much out 
> of a connection pool aside from general connection lifecycle management (i.e. 
> not for concurrency).  I chose this library based on recent development 
> activity and several positive comparisons to other pools.
> 
> Note that the implementation is not yet wired into the scheduler application. 
>  That will come in a follow-up.
> 
> 
> Diffs
> -
> 
>   build.gradle af119910e84c48f75f2573ababcfa287c3b986fc 
>   config/spotbugs/excludeFilter.xml 51790cce8d9047e40741f05ee55af15dbdc3065e 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  85b2113631586f43d854c4d2812f43b7b864d452 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/Persistence.java 
> 9eb862c01bf451252bfbcc7a2eac60d2c965c9f0 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogPersistence.java 
> e70e6051582ca90ae72014626b983bbf4b8d5b48 
>   src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistence.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistenceModule.java
>  PRE-CREATION 
>   src/main/resources/org/apache/aurora/scheduler/storage/sql/schema.sql 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/sql/SqlPersistenceTest.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64288/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 63685: RFC: Use new scheduler UI as landing page

2017-11-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63685/#review190533
---



In general I think the spirit of this change makes sense, but as you mention, 
the HTTP redirect is definitely not ideal. It should certainly be possible to 
serve up the actual scheduler UI by default without the need to redirect. Off 
the top of my head I'm not sure exactly what would need to be done though.

- Joshua Cohen


On Nov. 8, 2017, 10:32 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63685/
> ---
> 
> (Updated Nov. 8, 2017, 10:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Is this something the community would be interested in?
> 
> Known issues in this minimal viable prototype:
> 
> * I did not manage to convince Jetty to perform the redirecting. Using the 
> HTML feels hacky but works.
> * An operator will have difficulties to navigate to /admin of a non-leading 
> instance. In particular if there is no leader at all.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java 
> 0f8528c3403b5f51f082aa54c16358f7568f439a 
>   src/main/resources/scheduler/assets/index.html 
> 533091395547a6b067dbf5f53e42a13a560971da 
>   ui/src/main/js/components/Navigation.js 
> 50881a9914cee2812807624cd28f24fa64207296 
>   ui/src/main/js/index.js 9f94d4bd6f649d74bdd9c3aa99783e01cae25d93 
>   ui/src/main/js/pages/Admin.js PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63685/diff/1/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> Screen Shot 2017-11-08 at 23.18.06.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/08/80ae2c5f-e431-4f86-8811-13bfe6a8627b__Screen_Shot_2017-11-08_at_23.18.06.png
> Screen Shot 2017-11-08 at 23.18.20.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/08/9fb0f2e0-49c9-427f-8b84-d3d1fc057a59__Screen_Shot_2017-11-08_at_23.18.20.png
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 63436: Enabling ErrorBoundary in Scheduler UI

2017-10-31 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63436/#review189756
---


Ship it!




Ship It!

- Joshua Cohen


On Oct. 31, 2017, 8:15 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63436/
> ---
> 
> (Updated Oct. 31, 2017, 8:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Kai Huang.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> # Enabling ErrorBoundary in Scheduler UI
> React 16 introduces a new concept of an “error boundary” that allows us to 
> limit the impact of an error and not unmount the whole component tree. I am 
> open to keeping or removing the stack trace.
> 
> from React docs:
> > As of React 16, errors that were not caught by any error boundary will 
> > result in unmounting of the whole React component tree.
> 
> 
> Diffs
> -
> 
>   ui/.eslintrc 5cdc4e67030a79c3f81c06f585cc9ff5ce959e52 
>   ui/src/main/js/components/ErrorBoundary.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/ErrorBoundary-test.js PRE-CREATION 
>   ui/src/main/js/index.js 9f94d4bd6f649d74bdd9c3aa99783e01cae25d93 
> 
> 
> Diff: https://reviews.apache.org/r/63436/diff/2/
> 
> 
> Testing
> ---
> 
> - Unit tests added.
> 
> ```
> # rmotamedi@tw-mbp-rmotamedi:~/oss/aurora on git:error-boundry-test ? 
> [13:06:52]
> ? ./gradlew ui:test
> 
> BUILD SUCCESSFUL in 0s
> 2 actionable tasks: 2 up-to-date
> 
> # rmotamedi@tw-mbp-rmotamedi:~/oss/aurora on git:error-boundry-test ? 
> [13:07:46]
> ? ./gradlew ui:lint
> 
> BUILD SUCCESSFUL in 5s
> 4 actionable tasks: 4 up-to-date
> ```
> 
> 
> - For more testing, I added the following component (that raises an error) 
> and installed it under different components. I attached the screen-shots of 
> how this will render. Clicking the link initializes the process of Jira 
> ticket creation.
> 
> ## ComponentWithError
> 
> ```
> import React from 'react'
> 
> export default class ComponentWithError extends React.Component {
>   render() {
> throw new Error('Crashed!');
> return ;
>   }
> }
> ```
> 
> 
> File Attachments
> 
> 
> Apache Jira ticket template filled with information on the bug
>   
> https://reviews.apache.org/media/uploaded/files/2017/10/31/0b50175f-0dd5-44fe-8bc4-c69ca1723729__Screen_Shot_2017-10-31_at_11.55.39_AM.png
> Catching exceptions on the router
>   
> https://reviews.apache.org/media/uploaded/files/2017/10/31/915182ef-4229-4b22-b7c6-20a5f24f8e1a__Screen_Shot_2017-10-31_at_11.44.56_AM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>



Re: Review Request 63436: Enabling ErrorBoundary in Scheduler UI

2017-10-30 Thread Joshua Cohen


> On Oct. 31, 2017, 2:08 a.m., David McLaughlin wrote:
> > ui/src/main/js/components/ErrorBoundary.js
> > Lines 28 (patched)
> > <https://reviews.apache.org/r/63436/diff/1/?file=1873286#file1873286line28>
> >
> > Link to the Apache JIRA? And ask them to post the stacktrace below?

Could even create a link that will pre-populate the ticket: 
https://confluence.atlassian.com/jirakb/creating-issues-via-direct-html-links-159474.html.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63436/#review189682
---


On Oct. 31, 2017, 1:58 a.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63436/
> ---
> 
> (Updated Oct. 31, 2017, 1:58 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Kai Huang.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> # Enabling ErrorBoundary in Scheduler UI
> React 16 introduces a new concept of an “error boundary” that allows us to 
> limit the impact of an error and not unmount the whole component tree. I am 
> open to keeping or removing the stack trace.
> 
> from React docs:
> > As of React 16, errors that were not caught by any error boundary will 
> > result in unmounting of the whole React component tree.
> 
> 
> Diffs
> -
> 
>   ui/src/main/js/components/ErrorBoundary.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/ErrorBoundary-test.js PRE-CREATION 
>   ui/src/main/js/index.js 9f94d4bd6f649d74bdd9c3aa99783e01cae25d93 
>   ui/src/main/sass/app.scss 8a10e1c5a501828c39ca7b9cf6b518a2d7a99d28 
>   ui/src/main/sass/components/_error-boundary.scss PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63436/diff/1/
> 
> 
> Testing
> ---
> 
> - Unit tests added. This however causes an error message to be logged on 
> stdout.
> 
> ```
> ? ./gradlew ui:test
> ...
> ASS src/main/js/pages/__tests__/Job-test.js
> PASS src/main/js/components/__tests__/InstanceViz-test.js
> PASS src/main/js/pages/__tests__/Update-test.js
> PASS src/main/js/components/__tests__/InstanceHistoryItem-test.js
> PASS src/main/js/components/__tests__/Pagination-test.js
> PASS src/main/js/components/__tests__/ErrorBoundary-test.js
>   ? Console
> 
> console.error node_modules/react-dom/cjs/react-dom.development.js:8305
>   The above error occurred in the  component:
>   in ComponentWithError
>   in ErrorBoundary (created by WrapperComponent)
>   in WrapperComponent
> 
>   React will try to recreate this component tree from scratch using the 
> error boundary you provided, ErrorBoundary.
> ...
> 
> BUILD SUCCESSFUL in 0s
> 2 actionable tasks: 2 up-to-date
> 
> # rmotamedi@tw-mbp-rmotamedi:~/oss/aurora on git:error-boundry ? [18:52:22]
> ? ./gradlew ui:lint
> 
> BUILD SUCCESSFUL in 6s
> 4 actionable tasks: 4 up-to-date
> ```
> 
> 
> - For more testing, I added the following component (that raises an error) 
> and installed it under `UpdateInstanceEvents`. I then caught the error at 
> various depths in the doom tree. I attached the screen-shots of how this will 
> render.
> 
> ## ComponentWithError
> 
> ```
> import React from 'react'
> 
> export default class ComponentWithError extends React.Component {
>   render() {
> throw new Error('Crashed!');
> return ;
>   }
> }
> ```
> 
> 
> File Attachments
> 
> 
> handling boundary on the main router
>   
> https://reviews.apache.org/media/uploaded/files/2017/10/31/d95e9f85-3dad-4332-ac33-3cb9f8a4455c__Screen_Shot_2017-10-30_at_6.30.22_PM.png
> handling boundary on a component deep in the doom tree.
>   
> https://reviews.apache.org/media/uploaded/files/2017/10/31/64d47905-b058-4ef5-9d19-fa3311a47f9d__Screen_Shot_2017-10-30_at_6.36.18_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>



Re: Review Request 63383: Suppress multiline logging from mesos callbacks

2017-10-27 Thread Joshua Cohen


> On Oct. 28, 2017, 12:08 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java
> > Lines 303-305 (patched)
> > 
> >
> > Does Mesos send any other important information on new lines? I am 
> > mostly worried about losing potentially valuable debugging info.
> 
> Bill Farner wrote:
> Not to my knowledge; historically the message (multi-line or not) has 
> been of little value even for debugging.

Is the info that we're truncating here also available in the Mesos logs 
somewhere?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63383/#review189497
---


On Oct. 27, 2017, 11:46 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63383/
> ---
> 
> (Updated Oct. 27, 2017, 11:46 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Log messages like this:
> ```console
> W1026 02:55:19.450 [Thread-4822, 
> MesosCallbackHandler$MesosCallbackHandlerImpl] Lost executor value: 
> "thermos-foo-prod-bar-2-40501be8-11fd-417a-8ad3-326c014e8466"
>  on slave value: "efe603c2-8b52-4cc1-b9f4-43a4eb06d29b-S1521"
>  with status 9
> ```
> become:
> ```console
> W1026 02:55:19.450 [Thread-4822, 
> MesosCallbackHandler$MesosCallbackHandlerImpl] Lost executor: 
> "thermos-foo-prod-bar-2-40501be8-11fd-417a-8ad3-326c014e8466" on slave: 
> "efe603c2-8b52-4cc1-b9f4-43a4eb06d29b-S1521" with status 9
> ```
> 
> And messages like this:
> ```console
> I1026 00:00:23.271 [Thread-1152062, 
> MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for 
> task foo-prod-bar-25-6ac5d5e9-2563-4cd0-a4a8-eb19a990bad4 in state 
> TASK_FAILED from SOURCE_AGENT with REASON_CONTAINER_LIMITATION_MEMORY: Memory 
> limit exceeded: Requested: 14464MB Maximum Used: 14464MB
> 
> MEMORY STATISTICS:
> cache 0
> rss 15166603264
> rss_huge 0
> mapped_file 0
> writeback 0
> pgpgin 6430645
> pgpgout 2727861
> pgfault 6441533
> pgmajfault 4
> inactive_anon 0
> active_anon 15166365696
> inactive_file 0
> active_file 0
> unevictable 0
> hierarchical_memory_limit 15166603264
> total_cache 0
> total_rss 15166603264
> total_rss_huge 0
> total_mapped_file 0
> total_writeback 0
> total_pgpgin 6430645
> total_pgpgout 2727861
> total_pgfault 6441533
> total_pgmajfault 4
> total_inactive_anon 0
> total_active_anon 15166365696
> total_inactive_file 0
> total_active_file 0
> total_unevictable 0
> 
> 
> ```
> 
> become:
> ```console
> I1027 18:48:40.793 [Thread-754, 
> MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for 
> task foo-prod-bar-11-c39221c6-fd87-40cf-a5a8-555f7bea33f8 in state 
> TASK_FAILED from SOURCE_AGENT with REASON_CONTAINER_LIMITATION_MEMORY: Memory 
> limit exceeded: Requested: 14464MB Maximum Used: 14464MB (truncated)
> ```
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> e93c4fa15a7cc1b025dcb0f29319bc774c62e2c9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> 51d0371007564bc0af32e256dbae8b57113536c2 
> 
> 
> Diff: https://reviews.apache.org/r/63383/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 60942: Remove task level resource fields from thrift interface and db

2017-10-19 Thread Joshua Cohen


> On Oct. 18, 2017, 7:38 p.m., Nicolás Donatucci wrote:
> > I found an issue a while back after talking to Joshua. 
> > It is regarding the downgrade script for my db migration.
> > It works just fine assuming there is only one entry of ram, cpu and disk 
> > per taskId on the task_resource table.
> > The thing is, during one of the tests, after upgrading from 18 to patched 
> > version and then downgrading to 18, the migration failed because that table 
> > had two entries of resources for some of the tasks, all of them with value 
> > of 0.
> >  
> > I was not able to tell how those entries got into the task_resource table 
> > but, for the patch's sake, values of ramMb, diskMb and numCpus in the 
> > task_configs table are no longer used in 0.18.0. In 0.17.0, those fields 
> > are backfilled if the new Resources are set.
> > 
> > One possible approach is to have the downgrade script only create the old 
> > fields and leave them in 0.
> > Another approach is ignoring entries with value 0 when backfilling in the 
> > downgrade script.
> > The best thing would be to know why there are nultiple entries in the 
> > task_resource table. Any ideas?
> 
> Bill Farner wrote:
> I suggest you wait for https://reviews.apache.org/r/62869/, which should 
> land soon.  This might make it easier to think about the upgrade path (i.e. 
> no need to think about SQL).

+1, once the above diff lands, the entirety of the DB migration problem in this 
diff goes away.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60942/#review188555
---


On Aug. 24, 2017, 7:05 p.m., Nicolás Donatucci wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60942/
> ---
> 
> (Updated Aug. 24, 2017, 7:05 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1707
> https://issues.apache.org/jira/browse/AURORA-1707
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Removed task level resource fields from the DB and the thrift-API.
> To do this, a new DB migration was added. When upgrading, it just drops the 
> task level resource fields. When downgrading, it creates the fields again and 
> populates them with information from the task_resource table.
> 
> IMPORTANT: One of the python client tests is failing (test_cron_diff). This 
> is not serious, I think it is a problem with the ordering of the elements of 
> Resources (had similar problems with other python client tests that instead 
> of comparing the Resources as a set, compared them as lists and thus order 
> mattered). Nevertheless, I could not fully understand the code of that test. 
> I was hoping someone could give me a hand with that. 
> But then again, it is a smaller issue and so the patch can start being 
> reviewed.
> 
> Issue Related: AURORA-1707
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md fd2618fee8ef05091849e177bd99fc321548be90 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 3749531b5412d7ca217736aa85eed8e6606225ad 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 186fa1b3a4780c0536fb486d50a33133258110cd 
>   src/main/java/org/apache/aurora/scheduler/storage/db/TaskConfigManager.java 
> d2eb6aa6e4a155b2d28debab2ca10dfc76d57213 
>   src/main/java/org/apache/aurora/scheduler/storage/db/TaskConfigMapper.java 
> cda55c55680a19ed421299a8949299b21949787b 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/migration/V004_CreateTaskResourceTable.java
>  af106a8a9ee8c14122e98bcc0ec44b616f21d61f 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/migration/V011_DropResourceFields.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/views/DbTaskConfig.java 
> 138cd5316adc73eed24fc7accc53885dd5d5bee5 
>   src/main/python/apache/aurora/client/cli/diff_formatter.py 
> 78717774aa3fbaf83a5fb850bc9f9f4a4038d70f 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/TaskConfigMapper.xml
>  5422183e4a1fe122fc0e1aa871aa75ae102e322d 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
> 7a86f47af67adb3e488381d30ddf424549deefbc 
>   src/test/java/org/apache/aurora/scheduler/http/TestUtils.java 
> 689482c9f6c49bcca7818

Re: Review Request 63052: Update list of Companies using Aurora.

2017-10-16 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63052/#review188252
---


Ship it!




Ship It!

- Joshua Cohen


On Oct. 16, 2017, 11:18 p.m., Derek Slager wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63052/
> ---
> 
> (Updated Oct. 16, 2017, 11:18 p.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update list of Companies using Aurora.
> 
> 
> Diffs
> -
> 
>   README.md 5bf48e8fe1c8fffa04ecc448cd941065af1ee54a 
> 
> 
> Diff: https://reviews.apache.org/r/63052/diff/1/
> 
> 
> Testing
> ---
> 
> Previewed markdown locally, tested link.
> 
> 
> Thanks,
> 
> Derek Slager
> 
>



Re: Review Request 62958: Add URL handling for tab switching on Job page

2017-10-13 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62958/#review187962
---



When I implemented this originally I wanted it to be a separate URL 
(http://aurora/scheduler/role/env/job/completed) rather than a query parameter 
(?tab=completed), but due to something in Angular's router it wasn't really 
feasible at the time. Presumably this is easier to accomplish w/ React?

I know [cool uris don't change](https://www.w3.org/Provider/Style/URI), but I 
think it'd be ok to drop the query support in favor of a full url (or, if we're 
really concerned about bookmarks have a handler that redirects the query param 
form to the url form)?

What do you think?

- Joshua Cohen


On Oct. 13, 2017, 12:10 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62958/
> ---
> 
> (Updated Oct. 13, 2017, 12:10 a.m.)
> 
> 
> Review request for Aurora, Kai Huang, Reza Motamedi, and Santhosh Kumar 
> Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This maintains the existing UI functionality where clicking on tabs updates 
> the browser location, to create shareable URLs to tab views.
> 
> 
> Diffs
> -
> 
>   ui/package.json cde8d106346d9ac498cfee9b5291ebc637fc6a2a 
>   ui/src/main/js/components/Tabs.js 43b1950b11b80dc3017730801992341bc527c39c 
>   ui/src/main/js/components/__tests__/Tabs-test.js 
> e028c2dd86739ed9762aa1d0be5c609d5487e06e 
>   ui/src/main/js/pages/Job.js fc400f7442a1f8a5f0ebfe366dfe40ef40e7108e 
>   ui/src/main/js/pages/__tests__/Job-test.js 
> 4cc76b8731d71ceca87a5d1e259360b2af8feba0 
> 
> 
> Diff: https://reviews.apache.org/r/62958/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew ui:lint
> ./gradlew ui:test
> 
> Unfortunately the testing here is only on one side (restoring the URL state). 
> I cannot simulate the change events from the unit tests because of 
> limitations with shallow rendering. When I figure out how to add integration 
> tests that work with React Router, I'll add coverage.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 62857: Fix broken end-to-end tests

2017-10-10 Thread Joshua Cohen


> On Oct. 10, 2017, 2:29 p.m., Joshua Cohen wrote:
> > I'd be shocked if they've been failing for over a year. We've had three 
> > releases since then and the release candidate verification script runs the 
> > end to end tests.
> > 
> > If I had to hazard a guess, I'd point the finger at something changing in 
> > curl w/ the High Sierra update? I'm running the e2e tests locally right now 
> > (I'm still on Sierra) to confirm they're working there.
> 
> Joshua Cohen wrote:
> Oh wait, the curl version being used is from within the Vagrant image, 
> not the host system. So... perhaps something changed in the curl version in 
> vagrant?

I was just able to verify the failure. So it's not specific to your system or 
host OS. It'd be interesting to bisect to see what actually caused the failure, 
but either way your fix lgtm.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62857/#review187535
---


On Oct. 10, 2017, 6:21 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62857/
> ---
> 
> (Updated Oct. 10, 2017, 6:21 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> TContentAwareServlet constrains the supported Content-Type headers,
> resulting in test_kerberos_end_to_end.sh failing with the error
> `Unsupported Content-Type: application/x-www-form-urlencoded`, which
> is the Content-Type header curl chooses when the --data-binary
> argument is passed
> 
> 
> Diffs
> -
> 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> ecb3d3f50e77b800eb0d05618c7740c42fca28c2 
> 
> 
> Diff: https://reviews.apache.org/r/62857/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Based on what i'm seeing, it appears that these tests should have been 
> failing since the introduction of `TContentAwareServlet` 
> [0105a15](https://github.com/apache/aurora/commit/0105a15) in Aug 2016.  Is 
> that true, or have others actually seen them passing somehow?
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 62857: Fix broken end-to-end tests

2017-10-10 Thread Joshua Cohen


> On Oct. 10, 2017, 2:29 p.m., Joshua Cohen wrote:
> > I'd be shocked if they've been failing for over a year. We've had three 
> > releases since then and the release candidate verification script runs the 
> > end to end tests.
> > 
> > If I had to hazard a guess, I'd point the finger at something changing in 
> > curl w/ the High Sierra update? I'm running the e2e tests locally right now 
> > (I'm still on Sierra) to confirm they're working there.

Oh wait, the curl version being used is from within the Vagrant image, not the 
host system. So... perhaps something changed in the curl version in vagrant?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62857/#review187535
---


On Oct. 10, 2017, 6:21 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62857/
> ---
> 
> (Updated Oct. 10, 2017, 6:21 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> TContentAwareServlet constrains the supported Content-Type headers,
> resulting in test_kerberos_end_to_end.sh failing with the error
> "Unsupported Content-Type: application/x-www-form-urlencoded", which
> is the Content-Type header curl chooses when the --data-binary
> argument is passed
> 
> 
> Diffs
> -
> 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> ecb3d3f50e77b800eb0d05618c7740c42fca28c2 
> 
> 
> Diff: https://reviews.apache.org/r/62857/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Based on what i'm seeing, it appears that these tests should have been 
> failing since the introduction of `TContentAwareServlet` 
> [0105a15](https://github.com/apache/aurora/commit/0105a15) in Aug 2016.  Is 
> that true, or have others actually seen them passing somehow?
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 62857: Fix broken end-to-end tests

2017-10-10 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62857/#review187536
---


Ship it!




Regardless... ship away!

- Joshua Cohen


On Oct. 10, 2017, 6:21 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62857/
> ---
> 
> (Updated Oct. 10, 2017, 6:21 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> TContentAwareServlet constrains the supported Content-Type headers,
> resulting in test_kerberos_end_to_end.sh failing with the error
> "Unsupported Content-Type: application/x-www-form-urlencoded", which
> is the Content-Type header curl chooses when the --data-binary
> argument is passed
> 
> 
> Diffs
> -
> 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> ecb3d3f50e77b800eb0d05618c7740c42fca28c2 
> 
> 
> Diff: https://reviews.apache.org/r/62857/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Based on what i'm seeing, it appears that these tests should have been 
> failing since the introduction of `TContentAwareServlet` 
> [0105a15](https://github.com/apache/aurora/commit/0105a15) in Aug 2016.  Is 
> that true, or have others actually seen them passing somehow?
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 62857: Fix broken end-to-end tests

2017-10-10 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62857/#review187535
---



I'd be shocked if they've been failing for over a year. We've had three 
releases since then and the release candidate verification script runs the end 
to end tests.

If I had to hazard a guess, I'd point the finger at something changing in curl 
w/ the High Sierra update? I'm running the e2e tests locally right now (I'm 
still on Sierra) to confirm they're working there.

- Joshua Cohen


On Oct. 10, 2017, 6:21 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62857/
> ---
> 
> (Updated Oct. 10, 2017, 6:21 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> TContentAwareServlet constrains the supported Content-Type headers,
> resulting in test_kerberos_end_to_end.sh failing with the error
> "Unsupported Content-Type: application/x-www-form-urlencoded", which
> is the Content-Type header curl chooses when the --data-binary
> argument is passed
> 
> 
> Diffs
> -
> 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> ecb3d3f50e77b800eb0d05618c7740c42fca28c2 
> 
> 
> Diff: https://reviews.apache.org/r/62857/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Based on what i'm seeing, it appears that these tests should have been 
> failing since the introduction of `TContentAwareServlet` 
> [0105a15](https://github.com/apache/aurora/commit/0105a15) in Aug 2016.  Is 
> that true, or have others actually seen them passing somehow?
> 
> 
> Thanks,
> 
> Bill Farner
> 
>



Re: Review Request 62720: Implement Instance pages in React

2017-10-05 Thread Joshua Cohen


> On Oct. 4, 2017, 9:42 p.m., Stephan Erb wrote:
> > Regarding the question nodejs vs js thrift: nodejs supports both binary 
> > (good) and compact (better) thrift protocols. This should reduce scheduler 
> > and client side serialization overhead and lead to a slightly more snappy 
> > UI.
> 
> David McLaughlin wrote:
> But it doesn't support XMLHttpRequest, so it can't be used in a UI bundle.
> 
> David McLaughlin wrote:
> Or maybe I misunderstood the comparison (between nodejs and js thrift)? 
> 
> To try and give a clearer picture of the problems:
> 
> ThriftJS library: absolutely required to deserialize the Thrift over HTTP 
> API the Scheduler exposes (unless we pin the UI to /apibeta). 
> 
> The files generated by the Thrift compiler (e.g. api_types.js, 
> ReadOnlyScheduler.js) are also required in order to build query objects (like 
> TaskQuery, etc.).
> 
> So when you generate the files, you have two options: node, js or jquery. 
> 
> Node:
> * Supports CommonJS modules, meaning you don't have to rely on global 
> scope and can instead import all Thrift types into your ES6 code (allowing 
> you to reuse Thrift types in the app and your test code). 
> * Assumes server-side environment, so uses the nodejs socket library to 
> perform client requests, rather than XMLHttpRequest.
> 
> jQuery:
> * Not only does it not support CommonJS, but the generated output doesn't 
> even use var declarations for the variables. This caues it to break strict 
> mode and means you can't even use node.vm to slurp it into the global context 
> in test-setup.js. 
> * Uses jQuery to perform all network requests, which uses the browser 
> environment. 
> 
> JS:
> * As jQuery, but all web requests are synchronous. 
> 
> 
> I tried an approach where I generated both jquery and node thrift files 
> and then imported them only for tests. But the compiled code also assumes the 
> ThriftJS library is also in the same format.. so it broke the tests and/or UI 
> bundle. At which point I just attached the copy and pasted structs and enums 
> to global in my test setup :)
> 
> Stephan Erb wrote:
> My comment above was originating based on some foggy memory of this 
> github comment on Thrift binary support for javascript 
> https://github.com/apache/thrift/pull/345#issuecomment-244588418 :)
> 
> Thanks for the thorough explanation.

Could we use the node.js code with a browser shim for the socket stuff? 
Something like: https://github.com/substack/http-browserify (not sure if 
that'll work w/ webpack, but in general it should be doable, and maybe 
preferable?)


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62720/#review187132
---


On Oct. 4, 2017, 5:43 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62720/
> ---
> 
> (Updated Oct. 4, 2017, 5:43 a.m.)
> 
> 
> Review request for Aurora, Kai Huang and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Instance page moved over to the new UI. 
> 
> One of the big issues I ran into here was dealing with Thrift types. 
> Currently the Thrift compiler generates a file for use in JS. The compiler 
> gives you two options for JavaScript - one is 'node' which plays well with 
> our module system but assumes you'll be using the built-in socket 
> implementation in node, and then one for 'jquery' which uses the global 
> namespace but doesn't use vars (so we can't eval as it breaks strict mode). 
> So the TL;DR is - to write tests against scheduler states, I've had to copy 
> and paste the Thrift enums into a test setup file. 
> 
> The alternative is to simply ignore Thrift types and duplicate it all without 
> relying on globals. Thus making sure the tests and application code are using 
> the same enums. However, this approach won't help you find any UI regressions 
> when you update Thrift - and even worse, you won't spot any API regressions 
> when developing the UI. So I've just gone for this approach for now.
> 
> 
> Diffs
> -
> 
>   ui/.eslintrc 8d37c60e1edd4528ce4abdf6dda18ec2dfc078f4 
>   ui/package.json fe0397a888b337137c1d13b6f9559dae13698b0a 
>   ui/src/main/js/components/InstanceHistory.js PRE-CREATION 
>   ui/src/main/js/components/InstanceHistoryItem.js PRE-CREATION 
>   ui/src/main/js/components/Layout.js 
> 4ca54e318786859e23dc0444d7d2750817428e81 
>   ui/src/main/js/components/StateMachine.js PRE-CREATION 
>   ui/src/main/js/components/TaskDetails.js PRE-CREATION 
>   ui/src/main/js/components/TaskStatus.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/Insta

Re: Review Request 62607: Replace Preact and custom testing with React + Enzyme

2017-09-27 Thread Joshua Cohen


> On Sept. 27, 2017, 8:35 p.m., Kai Huang wrote:
> > ui/src/main/js/components/Pagination.js
> > Lines 93 (patched)
> > 
> >
> > Should this be extracted into renderer? So that we don't need to pass 
> > isTable flags?
> 
> David McLaughlin wrote:
> I added a comment that describes why the user must provide this. You 
> cannot return a list of elements - you have to return a single container. So 
> Pagination needs to know what to wrap your items with.

Another "fwiw", I think in React 16 you *can* return a list of elements: 
https://facebook.github.io/react/blog/2017/09/26/react-v16.0.html#new-render-return-types-fragments-and-strings
 (note I haven't actually reviewed the code in question to see whether or not 
it makes sense in this situation).


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62607/#review186469
---


On Sept. 27, 2017, 9:31 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62607/
> ---
> 
> (Updated Sept. 27, 2017, 9:31 p.m.)
> 
> 
> Review request for Aurora, Kai Huang and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Replace Preact and custom testing with React + jest + Enzyme now that they 
> are MIT licensed.
> 
> Side-effects:
> 
> * The Reactable library is not compatible with React 16.0 (the MIT licensed 
> version we are forced to use) and attempts to flag this started as early as 
> June this year. But the author seems to have abandoned the library. So I had 
> to create my own Pagination class. I found myself repeatedly using Reactable 
> in places where I just wanted pagination (for lists of divs) anyway, so maybe 
> this is a good thing.
> * Deleted custom (and questionable) shallow renderer.
> * The download time will be reduced due to not having to download PhantomJS.
> 
> 
> Diffs
> -
> 
>   build.gradle 460500aaeab28dcfeb29ec602d057ea4bca12378 
>   ui/karma.conf.js d42cda7d025f8c5618039888f4fb49ff388c3af1 
>   ui/package.json d6802029e0aeb9f6692ec8cc065949ce17b2ca07 
>   ui/src/__mocks__/react.js PRE-CREATION 
>   ui/src/main/js/components/Pagination.js PRE-CREATION 
>   ui/src/main/js/components/RoleList.js 
> 32595601aae06730f537d37d95be2e746f779103 
>   ui/src/main/js/components/__tests__/Breadcrumb-test.js 
> 18af0fe6c360f375fb0fa1694304d821041d9e16 
>   ui/src/main/js/components/__tests__/Home-test.js 
> 8e6bc09050148ea9711dbfe6f52b83c8210262be 
>   ui/src/main/js/components/__tests__/Pagination-test.js PRE-CREATION 
>   ui/src/main/js/pages/__tests__/Home-test.js 
> 4f13f99ea24c0252b5f6cbb6980e0c96832d5120 
>   ui/src/main/js/utils/ShallowRender.js 
> 52e8bb259264e7dd47ee97cec35553acc147e818 
>   ui/src/main/js/utils/__tests__/ShallowRender-test.js 
> d5663a7c44de795a5aa0b28d4f373871ffcc3fd2 
>   ui/src/main/sass/components/_tables.scss 
> 58f176cffcdae34de7c776e31b3f0342efa3024f 
>   ui/test-setup.js PRE-CREATION 
>   ui/webpack.config.js e7cd672df62006bd5c5b60a6ba0903e16a767d13 
> 
> 
> Diff: https://reviews.apache.org/r/62607/diff/4/
> 
> 
> Testing
> ---
> 
> $ ./gradlew ui:lint
> $ ./gradlew ui:test
> 
> I also tested the UI in my Vagrant image.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 62607: Replace Preact and custom testing with React + Enzyme

2017-09-27 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62607/#review186425
---



FWIW there's also a version of React 15 that's been licensed under MIT: 
https://facebook.github.io/react/blog/2017/09/25/react-v15.6.2.html

- Joshua Cohen


On Sept. 27, 2017, 3:04 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62607/
> ---
> 
> (Updated Sept. 27, 2017, 3:04 p.m.)
> 
> 
> Review request for Aurora, Kai Huang and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Replace Preact and custom testing with React + jest + Enzyme now that they 
> are MIT licensed.
> 
> Side-effects:
> 
> * The Reactable library is not compatible with React 16.0 (the MIT licensed 
> version we are forced to use) and attempts to flag this started as early as 
> June this year. But the author seems to have abandoned the library. So I had 
> to create my own Pagination class. I found myself repeatedly using Reactable 
> in places where I just wanted pagination (for lists of divs) anyway, so maybe 
> this is a good thing.
> * Deleted custom (and questionable) shallow renderer.
> * The download time will be reduced due to not having to download PhantomJS.
> 
> 
> Diffs
> -
> 
>   ui/karma.conf.js d42cda7d025f8c5618039888f4fb49ff388c3af1 
>   ui/package.json d6802029e0aeb9f6692ec8cc065949ce17b2ca07 
>   ui/src/__mocks__/react.js PRE-CREATION 
>   ui/src/main/js/components/Pagination.js PRE-CREATION 
>   ui/src/main/js/components/RoleList.js 
> 32595601aae06730f537d37d95be2e746f779103 
>   ui/src/main/js/components/__tests__/Breadcrumb-test.js 
> 18af0fe6c360f375fb0fa1694304d821041d9e16 
>   ui/src/main/js/components/__tests__/Home-test.js 
> 8e6bc09050148ea9711dbfe6f52b83c8210262be 
>   ui/src/main/js/components/__tests__/Pagination-test.js PRE-CREATION 
>   ui/src/main/js/pages/__tests__/Home-test.js 
> 4f13f99ea24c0252b5f6cbb6980e0c96832d5120 
>   ui/src/main/js/utils/ShallowRender.js 
> 52e8bb259264e7dd47ee97cec35553acc147e818 
>   ui/src/main/js/utils/__tests__/ShallowRender-test.js 
> d5663a7c44de795a5aa0b28d4f373871ffcc3fd2 
>   ui/src/main/sass/components/_tables.scss 
> 58f176cffcdae34de7c776e31b3f0342efa3024f 
>   ui/test-setup.js PRE-CREATION 
>   ui/webpack.config.js e7cd672df62006bd5c5b60a6ba0903e16a767d13 
> 
> 
> Diff: https://reviews.apache.org/r/62607/diff/1/
> 
> 
> Testing
> ---
> 
> Unfortunately the Gradle 4.2 upgrade seems to have broken running builds from 
> gradle (will attempt to fix in another ticket)... So I had to manually verify 
> with:
> 
> $ npm run lint
> $ npm test
> 
> I also tested the UI in my Vagrant image.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 62135: HomePage implemented in Preact

2017-09-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62135/#review185082
---


Ship it!




lgtm overall. One question: do you think it's worthwhile to extract the shallow 
render to its own npm module that we depend on? Maybe that would make it 
possible/easier to have it upstreamed to preact? No idea if there are any 
challenges to that from an OSS perspective (i.e. we're ok to contribute to 
Aurora, but creating a new OSS project/publishing to npm/etc. may need an 
internal OSS review at Twitter?).


ui/src/main/js/components/Breadcrumb.js
Lines 12 (patched)
<https://reviews.apache.org/r/62135/#comment261299>

This is a change for the current behavior of the breadcrumbs in that 
currently the last crumb is not clickable. Is this the intent? I think the 
current behavior is preferable, but don't feel super strongly about it.

Here's how I handled it in the original react prototype (which I remember 
feeling was kind of hacky truth be told): 
https://github.com/jcohen/aurora-react/blob/master/src/components/Breadcrumbs.js



ui/src/main/js/components/Home.js
Line 4 (original), 3-5 (patched)
<https://reviews.apache.org/r/62135/#comment261300>

Just curious, why the move away from the ES6 style?


- Joshua Cohen


On Sept. 8, 2017, 11:40 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62135/
> ---
> 
> (Updated Sept. 8, 2017, 11:40 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Kai Huang, and Santhosh Kumar 
> Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Implementation of the home page in PreactJS. This was delayed heavily by the 
> lack of a shallow renderer 
> (https://facebook.github.io/react/docs/shallow-renderer.html) for testing. It 
> was too painful to test without it (e.g. Links when fully rendered must be 
> rendered within a Router context), so I hand-rolled one based on some 
> community attempts. The key motivation is just to allow us to decouple markup 
> from component logic, and also to avoid having to test child components 
> within component heirarchies. IMO the end result is very clean and easy to 
> read (and write!). 
> 
> If Preact gets a shallow renderer in the future, we can swap the one included 
> here out.
> 
> 
> Diffs
> -
> 
>   src/main/resources/scheduler/assets/images/aurora_logo_white.png 
> PRE-CREATION 
>   ui/.eslintrc 355b6a8a5f5d25dd4c71dc546dc7d4acfffdd506 
>   ui/package.json f712518b27477bccb03f49c86eac3ee5f769fc88 
>   ui/src/main/js/client/scheduler-client.js PRE-CREATION 
>   ui/src/main/js/components/Breadcrumb.js PRE-CREATION 
>   ui/src/main/js/components/Home.js 91d60b387cf3b1fb268e73b7b50922a83898c31f 
>   ui/src/main/js/components/Icon.js PRE-CREATION 
>   ui/src/main/js/components/Loading.js PRE-CREATION 
>   ui/src/main/js/components/Navigation.js PRE-CREATION 
>   ui/src/main/js/components/RoleList.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/Breadcrumb-test.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/Home-test.js 
> 2a80958d9303d1d3b9ae8f95013c66cb39f6bac3 
>   ui/src/main/js/index.js 2f7467b41ac3373a90c5453a7534d384a585b464 
>   ui/src/main/js/pages/Home.js PRE-CREATION 
>   ui/src/main/js/pages/__tests__/Home-test.js PRE-CREATION 
>   ui/src/main/js/utils/ShallowRender.js PRE-CREATION 
>   ui/src/main/js/utils/__tests__/ShallowRender-test.js PRE-CREATION 
>   ui/src/main/sass/app.scss PRE-CREATION 
>   ui/src/main/sass/components/_base.scss PRE-CREATION 
>   ui/src/main/sass/components/_breadcrumb.scss PRE-CREATION 
>   ui/src/main/sass/components/_home-page.scss PRE-CREATION 
>   ui/src/main/sass/components/_layout.scss PRE-CREATION 
>   ui/src/main/sass/components/_navigation.scss PRE-CREATION 
>   ui/src/main/sass/components/_tables.scss PRE-CREATION 
>   ui/src/main/sass/modules/_all.scss PRE-CREATION 
>   ui/src/main/sass/modules/_colors.scss PRE-CREATION 
>   ui/src/main/sass/modules/_typography.scss PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/62135/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew ui:test
> 
> Running in vagrant (screenshots attached).
> 
> 
> File Attachments
> 
> 
> preview
>   
> https://reviews.apache.org/media/uploaded/files/2017/09/08/011f99fb-a14e-4547-b90d-a5a1909c737c__Screen_Shot_2017-09-08_at_3.58.48_PM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 60942: Remove task level resource fields from thrift interface and db

2017-08-30 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60942/#review184226
---



Overall looks good to me. Just a couple of questions about the migration 
changes.

I also think it'd be good for someone at Twitter who's actively working on 
Aurora to take a look at this to be sure I didn't miss anything.


src/main/java/org/apache/aurora/scheduler/storage/db/migration/V004_CreateTaskResourceTable.java
Lines 34-41 (original), 33 (patched)
<https://reviews.apache.org/r/60942/#comment260316>

I wonder if a better option, than having this weird, empty migration, would 
be to leave the create/drop table in place, but simply remove the migrate 
script.

The issue here was because the migrate script depended on fields that are 
being removed, it wasn't feasible to leave this migration as-is, right?

At the very least we should add comments here explaining what's going on.



src/main/java/org/apache/aurora/scheduler/storage/db/migration/V011_DropResourceFields.java
Lines 59 (patched)
<https://reviews.apache.org/r/60942/#comment260317>

It seems strange to me that this migration depends on the task_resource 
table existing, but we're updating the above migration to never create it.

How does this work for someone who might be updating from an early version 
of Aurora where V0004 was never applied? I know generally we only support +/- 1 
version, so we can get away with this if necessary, but I think it's more 
argument for leaving the add table in the above migration, and just dropping 
the migrate script?


- Joshua Cohen


On Aug. 24, 2017, 7:05 p.m., Nicolás Donatucci wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60942/
> ---
> 
> (Updated Aug. 24, 2017, 7:05 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1707
> https://issues.apache.org/jira/browse/AURORA-1707
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Removed task level resource fields from the DB and the thrift-API.
> To do this, a new DB migration was added. When upgrading, it just drops the 
> task level resource fields. When downgrading, it creates the fields again and 
> populates them with information from the task_resource table.
> 
> IMPORTANT: One of the python client tests is failing (test_cron_diff). This 
> is not serious, I think it is a problem with the ordering of the elements of 
> Resources (had similar problems with other python client tests that instead 
> of comparing the Resources as a set, compared them as lists and thus order 
> mattered). Nevertheless, I could not fully understand the code of that test. 
> I was hoping someone could give me a hand with that. 
> But then again, it is a smaller issue and so the patch can start being 
> reviewed.
> 
> Issue Related: AURORA-1707
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md fd2618fee8ef05091849e177bd99fc321548be90 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 3749531b5412d7ca217736aa85eed8e6606225ad 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 186fa1b3a4780c0536fb486d50a33133258110cd 
>   src/main/java/org/apache/aurora/scheduler/storage/db/TaskConfigManager.java 
> d2eb6aa6e4a155b2d28debab2ca10dfc76d57213 
>   src/main/java/org/apache/aurora/scheduler/storage/db/TaskConfigMapper.java 
> cda55c55680a19ed421299a8949299b21949787b 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/migration/V004_CreateTaskResourceTable.java
>  af106a8a9ee8c14122e98bcc0ec44b616f21d61f 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/migration/V011_DropResourceFields.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/views/DbTaskConfig.java 
> 138cd5316adc73eed24fc7accc53885dd5d5bee5 
>   src/main/python/apache/aurora/client/cli/diff_formatter.py 
> 78717774aa3fbaf83a5fb850bc9f9f4a4038d70f 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/TaskConfigMapper.xml
>  5422183e4a1fe122fc0e1aa871aa75ae102e322d 
>   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
> 7a86f47af67adb3e488381d30ddf424549deefbc 
>   src/test/java/org/apache/aurora/scheduler/http/TestUtils.java 
> 689482c9f6c49bcca781834566edeb975d2f3af2 
>   
> src/test/java/org/apache/aurora/scheduler/storage/db/RowGarbag

Re: Review Request 61864: Bootstrap the build pipeline for new Preact UI.

2017-08-24 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61864/#review183788
---


Ship it!




lgtm modulo questions below.


src/main/resources/scheduler/assets/scheduler/new-index.html
Lines 21 (patched)
<https://reviews.apache.org/r/61864/#comment259837>

Should we include this locally, rather than pulling it down from Google?



ui/.eslintrc
Lines 9 (patched)
<https://reviews.apache.org/r/61864/#comment259842>

Are we using jest?


- Joshua Cohen


On Aug. 24, 2017, 7:15 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61864/
> ---
> 
> (Updated Aug. 24, 2017, 7:15 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Kai Huang, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Bootstrap the build pipeline for new Preact UI. Obviously due to the 
> disappointing decision by Facebook to keep their PATENTS clause, we can no 
> longer use React or jest or any of the other Facebook OSS projects. 
> Thankfully Preact has a compatibility layer that mostly hides this (see 
> code), so it's really only jest (which is so much simpler than Karma) that is 
> missed here. 
> 
> Right now we only insert a "Hello, World" message under a new path in the 
> Scheduler. The goal here is to verify that this works in the Apache 
> infrastructure, and that people can download the patch and confirm their 
> local development environment is built correctly by gradle. Will be 
> particularly useful if you don't have any of the modern Node/UI stack 
> installed, because the idea is that Gradle will do it all for you.
> 
> Note: this will add time, perhaps significant time, to the Scheduler build 
> process the first time it sets up the environment. I've made sure to set up 
> input/output directories in Gradle to make sure the UP-TO-DATE mechanism is 
> respected and the work isn't repeated. 
> 
> The following tasks become available:
> 
> ./gradlew ui:install (runs npm install and fetches 3rd party JS dependencies)
> ./gradlew ui:webpack (the main build step - performed by webpack)
> ./gradlew ui:test (runs Karma-Jasmine-Webpack-Chai powered test suite) 
> ./gradlew ui:lint (runs eslint on the source code)
> 
> And all of these are now performed as part of the root ./gradlew build step. 
> This also maintains feature parity with ./gradlew processResources 
> --continuous. 
> 
> This is the first time I've ever really had to do significant changes to our 
> Gradle build, so any feedback on my changes there are particularly 
> appreciated.
> 
> 
> Diffs
> -
> 
>   .gitignore b4e2bcbef6fdb1c049acda7c4fda4bd47988bea2 
>   build.gradle c2c402f3ed0043b1e9befb6f9c4423649ee5c105 
>   settings.gradle b097e2fd958fa0ce6076fc104eb3890c4029295d 
>   src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java 
> f7e72e77e50680d937d727cb4c0eb8940aabf03b 
>   src/main/resources/scheduler/assets/scheduler/new-index.html PRE-CREATION 
>   ui/.babelrc PRE-CREATION 
>   ui/.eslintrc PRE-CREATION 
>   ui/karma.conf.js PRE-CREATION 
>   ui/package.json PRE-CREATION 
>   ui/src/main/js/components/Home.js PRE-CREATION 
>   ui/src/main/js/components/__tests__/Home-test.js PRE-CREATION 
>   ui/src/main/js/index.js PRE-CREATION 
>   ui/webpack.config.js PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61864/diff/3/
> 
> 
> Testing
> ---
> 
> The Scheduler is running in my Vagrant image with the Hello World message 
> served under /beta.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 60306: Ensure Thermos is not overloaded by an unlimited number of lost processes

2017-06-21 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60306/#review178556
---


Ship it!




Ship It!

- Joshua Cohen


On June 21, 2017, 9:37 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60306/
> ---
> 
> (Updated June 21, 2017, 9:37 p.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Included changes:
> 
> * Thermos may consider launched processes to be LOST. Instead of
>   restarting those immediately, the restarts will now be at least
>   `min_duration` seconds apart. Restarts will also be capped at the
>   TOTAL_RUN_LIMIT of 100 restarts. This ensures neither Thermos nor the
>   observer will be overloaded by checkpoints. The handling of the LOST
>   state is now consistent with the handling of both FAILED and FINISHED.
> * Mark the success_transition and failure_transition as private. They
>   are only used within `TaskPlanner` itself.
> * Fix documented default of `min_duration` (i.e 5s rather than 15s).
> 
> 
> Diffs
> -
> 
>   docs/reference/configuration.md 6a9a3ff988dd2102aa9d22e27f22487f18423894 
>   src/main/python/apache/thermos/common/planner.py 
> da5120f8f0c2489927a03e9d78ccb4f9b6eb275f 
>   src/test/python/apache/thermos/common/test_task_planner.py 
> 132c1ec4977143b79df8d13804370e76a553c3b9 
> 
> 
> Diff: https://reviews.apache.org/r/60306/diff/1/
> 
> 
> Testing
> ---
> 
> ./pants test.pytest src/test/python::
> 
> 
> File Attachments
> 
> 
> massive_cpu_spike.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/21/57cbc6e6-2cd5-4e92-995a-e0e05a57c359__massive_cpu_spike.png
> massive_restart_count.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/21/c4cbab7c-1a48-4cf0-92ab-5fa9444813c7__massive_restart_count.png
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 60264: Add instructions to generate website for a release.

2017-06-21 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60264/#review178501
---




docs/development/committers-guide.md
Lines 103-104 (patched)
<https://reviews.apache.org/r/60264/#comment252521>

maybe just make `instructions` the link, rather than `instructions README`, 
which reads somewhat awkwardly.


- Joshua Cohen


On June 21, 2017, 6:46 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60264/
> ---
> 
> (Updated June 21, 2017, 6:46 a.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add instructions to generate website for a release.
> 
> 
> Diffs
> -
> 
>   docs/development/committers-guide.md 
> eb17efe8e53b7ea40d712d297765082cc29f31ab 
> 
> 
> Diff: https://reviews.apache.org/r/60264/diff/1/
> 
> 
> Testing
> ---
> 
> NA.
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 59640: Prioritize adding instances over updating instances during an update

2017-05-31 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59640/#review176457
---


Ship it!




lgtm overall. Might want to add one more test case on the ordering for the edge 
case where we're adding, updating *and* killing instances. E.g. imagine the 
scenario where a service has 5 instances, but out of band, someone has manually 
killed instances 0 and 1. Then we come along and we want to update to a new 
task config, but also reduce the instance count to 4. So, current instances: 2, 
3, 4, desired instances: 0, 1, 2, 3. In that scenario I believe we'd add 
instance 0 and 1, update instance 2 and 3, and kill instance 4.

- Joshua Cohen


On May 30, 2017, 9:21 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59640/
> ---
> 
> (Updated May 30, 2017, 9:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1928
> https://issues.apache.org/jira/browse/AURORA-1928
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Currently, when updating a job we choose to update instances naively by 
> ascending instance ID number.
> However, it would be better to add new instances before killing and updating 
> older instances.
> 
> This patch makes it so the job updater prefers to create new instances, then
> update instances, and finally kill instances.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 14c2d2de3186271819a5f4e527d3b30fd34d2b21 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 290385d737294e23e9dd50f2631303124aa7af7c 
>   
> src/test/java/org/apache/aurora/scheduler/updater/UpdateFactoryImplTest.java 
> 611f6b8681c8e0b286cd361bdb5ace1fea39d9a5 
> 
> 
> Diff: https://reviews.apache.org/r/59640/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit tests and integration tests.
> 
> Had to modify some integration tests since we now prefer to create over 
> update -- needed to change
> the ordering of actions. Additionally, some unit tests only specified configs 
> for one instance even
> though desiredInstances is always 2 -- had to make it so the range of 
> configurations is always 0-2
> when creating. Otherwise, it would try to create instances first even though 
> the test didn't really
> care.
> 
> Tested different update configurations on the Vagrant cluster: only adding 
> instances, only updating
> instances, only killing instances, create & update, update & kill.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 59495: Adding gpg key for santhk

2017-05-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59495/#review175854
---


Ship it!




Ship It!

- Joshua Cohen


On May 23, 2017, 9:09 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59495/
> ---
> 
> (Updated May 23, 2017, 9:09 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding gpg key for santhk
> 
> 
> Diffs
> -
> 
>   KEYS 396130ff8eaaeedc332eb49e940f6af3d98dbff6 
> 
> 
> Diff: https://reviews.apache.org/r/59495/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 59495: Adding gpg key for santhk

2017-05-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59495/#review175827
---



Please use a 4096 bit key to stay in line w/ the others.

- Joshua Cohen


On May 23, 2017, 6:23 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59495/
> ---
> 
> (Updated May 23, 2017, 6:23 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding gpg key for santhk
> 
> 
> Diffs
> -
> 
>   KEYS 396130ff8eaaeedc332eb49e940f6af3d98dbff6 
> 
> 
> Diff: https://reviews.apache.org/r/59495/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 59039: Add the ability to customize scheduling logic.

2017-05-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59039/#review174807
---


Ship it!




My main concern is the potential for breaking changes, but since it's clearly 
labeled as alpha, caveat emptor~.

- Joshua Cohen


On May 11, 2017, 9:46 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59039/
> ---
> 
> (Updated May 11, 2017, 9:46 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1920
> https://issues.apache.org/jira/browse/AURORA-1920
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add the ability to replace the first-fit scheduling algorithm and associated 
> first-fit preemption logic.
> 
> See design/proposal document here: 
> https://docs.google.com/document/d/1fVHLt9AF-YbOCVCDMQmi5DATVusn-tqY8DldKbjVEm0/edit?usp=sharing
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 4e930fb1c5b87bca3cf4d8de804d69301f013f07 
>   docs/development/design-documents.md 
> c942643d6220cb61eb97289b35d92facae06a682 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorModule.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterModule.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 66d20290d381c041ac2f28903d6ba1468b352172 
>   
> src/main/java/org/apache/aurora/scheduler/state/FirstFitTaskAssignerModule.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0186484208965c5b8942ec76eb078b30190c7d57 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> da064e3589951d4f4cbff96d1ed89d9d7e4d3882 
> 
> 
> Diff: https://reviews.apache.org/r/59039/diff/2/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> I've used this in one of our large-scale test clusters to plug in a custom 
> task assigner. We were able to remove our modifications to the OSS code-base 
> and use this mechanism to do what we need. Here is a rough gist of what it 
> looks like:
> https://gist.github.com/DavidMcLaughlin/5d5472cde15343aef8705478f644ddb9
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59039: Add the ability to customize scheduling logic.

2017-05-12 Thread Joshua Cohen


> On May 12, 2017, 2:20 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java
> > Line 130 (original), 174 (patched)
> > 
> >
> > Changes like these and others in this patch could potentially break our 
> > benchmarking scripts.
> > 
> > Could you please make sure that at least the `SchedulingBenchmarks` 
> > continue to run?
> > 
> > We already have several broken the benchmarks suites 
> > `ThriftApiBenchmarks`, `SnapshotBenchmarks` `StatusUpdateBenchmark` and 
> > `ThriftApiBenchmarks`. Would be a pitty if we break even more.

Not directly related here, but should we add something that regularly runs the 
benchmarks and alerts if they stop working (or if there's a major regression in 
terms of perf)?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59039/#review174790
---


On May 11, 2017, 9:46 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59039/
> ---
> 
> (Updated May 11, 2017, 9:46 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1920
> https://issues.apache.org/jira/browse/AURORA-1920
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add the ability to replace the first-fit scheduling algorithm and associated 
> first-fit preemption logic.
> 
> See design/proposal document here: 
> https://docs.google.com/document/d/1fVHLt9AF-YbOCVCDMQmi5DATVusn-tqY8DldKbjVEm0/edit?usp=sharing
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 4e930fb1c5b87bca3cf4d8de804d69301f013f07 
>   docs/development/design-documents.md 
> c942643d6220cb61eb97289b35d92facae06a682 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorModule.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterModule.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 66d20290d381c041ac2f28903d6ba1468b352172 
>   
> src/main/java/org/apache/aurora/scheduler/state/FirstFitTaskAssignerModule.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0186484208965c5b8942ec76eb078b30190c7d57 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> da064e3589951d4f4cbff96d1ed89d9d7e4d3882 
> 
> 
> Diff: https://reviews.apache.org/r/59039/diff/2/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> I've used this in one of our large-scale test clusters to plug in a custom 
> task assigner. We were able to remove our modifications to the OSS code-base 
> and use this mechanism to do what we need. Here is a rough gist of what it 
> looks like:
> https://gist.github.com/DavidMcLaughlin/5d5472cde15343aef8705478f644ddb9
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 58651: Extend operator documentation

2017-04-24 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58651/#review172831
---


Ship it!





docs/operations/installation.md
Lines 29 (patched)
<https://reviews.apache.org/r/58651/#comment245863>

Maybe s/dedicated/separate?

The way it's worded now makes it seem like you want to dedicate a single 
ensemble to both election and service discovery. This is obviously not the case 
based on the following sentence, but it's not entirely clear.


- Joshua Cohen


On April 23, 2017, 11:29 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58651/
> ---
> 
> (Updated April 23, 2017, 11:29 a.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Included changes:
> 
> * new cluster upgrade instructions
> * docs for several best practices collected on the mailinglist
> * extracted and extended troubleshooting guide for new cluster operators
> * several minor formatting fixes
> 
> 
> Diffs
> -
> 
>   docs/README.md dfd3a23c3341922a8d13303453086cf8f1774609 
>   docs/features/custom-executors.md 40fc118a864b4731998bafa4777399252695f4e2 
>   docs/features/webhooks.md 075aeec4159d486e4ea78ffdda96b3b12ece9212 
>   docs/operations/backup-restore.md da467c3c2bccacd5f3923d190b4786e9e397b876 
>   docs/operations/configuration.md 203f3be767347e7759f33e3e15bca50c9a37a9c5 
>   docs/operations/installation.md f9b04d4ebe920e846c6b71c223c9f5d77dc7a3cf 
>   docs/operations/storage.md c30922fef7075308802dabdd1d84a983ca508428 
>   docs/operations/troubleshooting.md PRE-CREATION 
>   docs/operations/upgrades.md PRE-CREATION 
>   docs/reference/scheduler-endpoints.md 
> d302e9001bbd441a3c303a9e445bee57dc969ac7 
> 
> 
> Diff: https://reviews.apache.org/r/58651/diff/1/
> 
> 
> Testing
> ---
> 
> Rendered version is available at 
> https://github.com/StephanErb/aurora/tree/operator-docs/docs
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58609: Switch Thermos runner to simple disk log layout

2017-04-21 Thread Joshua Cohen


> On April 21, 2017, 2:37 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/thermos_task_runner.py
> > Line 257 (original), 257 (patched)
> > <https://reviews.apache.org/r/58609/diff/1/?file=1697172#file1697172line257>
> >
> > I think a better place to do this might be in 
> > [thermos_runner.py](https://github.com/apache/aurora/blob/master/src/main/python/apache/thermos/runner/thermos_runner.py).
> > 
> > Instead of passing this as a command line arg, we could just:
> > 
> > from twitter.common.log.options import LogOptions
> > 
> > LogOptions.set_simple(True)
> > 
> > (We already do this in our internal fork, no idea why it's done there 
> > instead of the main project)
> 
> Stephan Erb wrote:
> There are some further options a few lines above:
> 
> ```
> params = dict(log_dir=LogOptions.log_dir(),
>   log_to_disk='DEBUG',
> ```
> 
> 
> Should I move those as well?
> 
> Stephan Erb wrote:
> Ah thinking about it, we probably still need the log_dir, to transfer the 
> log location from the executor to the runner.

Yeah, I'm fine with moving `log_to_disk` if you want though.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58609/#review172657
---


On April 21, 2017, 9:35 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58609/
> ---
> 
> (Updated April 21, 2017, 9:35 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Switch Thermos runner to simple disk log layout.
> 
> This collapses the individual files per loglevel. We will therefore have just 
> one log file for executor and one for the runner.
> 
> -- 
> 
> Not to reviwers: I am unsure why the executor log file defaults to __main__ 
> rather than the name of the binary. I decided not to dig deeper here, as it 
> is not my main problem during operations.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/thermos_task_runner.py 
> 8f88af4c24ddc603fa12587741af56a6c711e420 
> 
> 
> Diff: https://reviews.apache.org/r/58609/diff/1/
> 
> 
> Testing
> ---
> 
> ./pants test.pytest src/test/python::
> 
> 
> File Attachments
> 
> 
> after.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/04/21/8816f094-b473-45bd-8d9a-fd2f3fad7dfe__after.png
> before.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/04/21/f10b3500-a66b-4850-a571-f964b97b7b40__before.png
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58609: Switch Thermos runner to simple disk log layout

2017-04-21 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58609/#review172657
---




src/main/python/apache/aurora/executor/thermos_task_runner.py
Line 257 (original), 257 (patched)
<https://reviews.apache.org/r/58609/#comment245722>

I think a better place to do this might be in 
[thermos_runner.py](https://github.com/apache/aurora/blob/master/src/main/python/apache/thermos/runner/thermos_runner.py).

Instead of passing this as a command line arg, we could just:

from twitter.common.log.options import LogOptions

LogOptions.set_simple(True)

(We already do this in our internal fork, no idea why it's done there 
instead of the main project)


- Joshua Cohen


On April 21, 2017, 9:35 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58609/
> ---
> 
> (Updated April 21, 2017, 9:35 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Switch Thermos runner to simple disk log layout.
> 
> This collapses the individual files per loglevel. We will therefore have just 
> one log file for executor and one for the runner.
> 
> -- 
> 
> Not to reviwers: I am unsure why the executor log file defaults to __main__ 
> rather than the name of the binary. I decided not to dig deeper here, as it 
> is not my main problem during operations.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/thermos_task_runner.py 
> 8f88af4c24ddc603fa12587741af56a6c711e420 
> 
> 
> Diff: https://reviews.apache.org/r/58609/diff/1/
> 
> 
> Testing
> ---
> 
> ./pants test.pytest src/test/python::
> 
> 
> File Attachments
> 
> 
> after.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/04/21/8816f094-b473-45bd-8d9a-fd2f3fad7dfe__after.png
> before.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/04/21/f10b3500-a66b-4850-a571-f964b97b7b40__before.png
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58612: Improve cleanup hints in release and release-candidate scripts

2017-04-21 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58612/#review172656
---


Ship it!




Ship It!

- Joshua Cohen


On April 21, 2017, 11:24 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58612/
> ---
> 
> (Updated April 21, 2017, 11:24 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and John Sirois.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> I was running into some of those issues in during the last release,
> but only managed to file them now.
> 
> 
> Diffs
> -
> 
>   build-support/release/release 05a2e023ae1b32779248fe759518925684381c36 
>   build-support/release/release-candidate 
> 80b14eac8223a0d52d086eb1c80a8f83ad94e152 
> 
> 
> Diff: https://reviews.apache.org/r/58612/diff/1/
> 
> 
> Testing
> ---
> 
> I did *not* run the release script using the proposed changes
> as it requires an un-released release candidate.
> 
> There could therefor be a small chance that I broke something.
> I beleive it is good enough to correct those during the next
> release.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58524: Allow task to be in STARTING state during watch_secs.

2017-04-19 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58524/#review172340
---


Ship it!




lgtm


src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java
Lines 51 (patched)
<https://reviews.apache.org/r/58524/#comment245450>

0L


- Joshua Cohen


On April 19, 2017, 9:07 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58524/
> ---
> 
> (Updated April 19, 2017, 9:07 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Tasks are gated on health-checks before they transistion into `RUNNING`
> and hence it is possible for a task to stay in `STARTING` during the
> watch duration of an instance's update. So include `STARTING` into the
> possible states for a task when watching it after an update. Without
> this tasks will wait for watch_secs after the task moves to RUNNING
> extending the update time.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> df1f8394b824dbb7b2745fcccdab5adaafdf6e6c 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
> 32a8c89d3e71395696fc613da96b871330891c42 
> 
> 
> Diff: https://reviews.apache.org/r/58524/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 58467: Update to Mesos 1.2.0

2017-04-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58467/#review172085
---


Ship it!




Ship It!

- Joshua Cohen


On April 15, 2017, 10:44 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58467/
> ---
> 
> (Updated April 15, 2017, 10:44 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update to Mesos 1.2.0
> 
> Changelog: https://github.com/apache/mesos/blob/1.2.0/CHANGELOG
> 
> The items that stand to me out the most:
> 
> * Unified containerizer support for specifying Docker images by Image ID. 
> (MESOS-3505)
> * new posix/rlimits isolator (MESOS-6402)
> * new IPC namespace isolator (MESOS-6557)
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD 7648ac8ca81ef1bfef13d840334a03f4bb7b8198 
>   RELEASE-NOTES.md 5babea532760e908d80235e1d6b8a1548c57cce3 
>   Vagrantfile d1c536bc3868409e184c9c97845d65c5a3e1722c 
>   build-support/packer/build.sh 548cf37e097c6ed56fc6cc718a642b105afb9331 
>   build.gradle bca669881e95e1415f5848f298dc4bab4fb65ba0 
> 
> 
> Diff: https://reviews.apache.org/r/58467/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> ./gradlew -Pq build
> ./pants test.pytest src/{main,test}/python:: -- -v
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58265: Add automatic browser tab open feature for aurora update start

2017-04-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58265/#review171356
---


Ship it!




Ship It!

- Joshua Cohen


On April 7, 2017, 3:44 p.m., Takuya Kuwahara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58265/
> ---
> 
> (Updated April 7, 2017, 3:44 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Mehrdad Nurolahzade, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add automatic browser tab open feature for aurora update start
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/cli/update.py 
> c70119680ecc72beb8f1af2d86079dc25bfce650 
>   src/test/python/apache/aurora/client/cli/test_supdate.py 
> 92ab37595950cea7c1b84c20d630d81624323791 
> 
> 
> Diff: https://reviews.apache.org/r/58265/diff/1/
> 
> 
> Testing
> ---
> 
> I add test for this feature and pass it.
> 
> 
> Thanks,
> 
> Takuya Kuwahara
> 
>



Re: Review Request 58036: Ensure enum tables are complete afer a snapshot restore.

2017-03-29 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58036/#review170462
---


Ship it!




Ship It!

- Joshua Cohen


On March 29, 2017, 7:51 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58036/
> ---
> 
> (Updated March 29, 2017, 7:51 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Bugs: AURORA-1912
> https://issues.apache.org/jira/browse/AURORA-1912
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In our in memory database, we model enums as two column tables. The two 
> columns
> would be `id` which corresponds to the integer value in the thrift enum and
> `name` which is the all caps string name of the enum. For example to model the
> `JobUpdateStatus` enum we have a table called `job_update_statuses`. In there
> the `ROLLING_FORWARD` enum is modeled as a row `(0, "ROLLING_FORWARD")`. Other
> tables reference the enum table via the id.
> 
> When we prepare storage on startup the `DbStorage` starts up. It does two
> things:
> 1. Load in the schema.
> 2. Populate the enum tables.
> 
> This ensures that when we insert values into the database, the enum refernces
> will be valid.
> 
> However, after we restore from a Snapshot with the `dbScript` field, we blow 
> all
> of that data away and restore what was in the snapshot:
> 
> try (Connection c = ((DataSource) 
> store.getUnsafeStoreAccess()).getConnection()) {
>   LOG.info("Dropping all tables");
>   try (PreparedStatement drop = c.prepareStatement("DROP ALL OBJECTS")) \
> drop.executeUpdate();
>   }
> 
> 
> This means that if we add a new enum value, and then restore from a snapshot,
> that enum value will not exist in the table any more. We could address this by
> saying that every enum value addition requires a migration. However instead I
> propose not blowing away the work done by `DbStorage` instead and re-hydrating
> the enum tables.
> 
> To do this I extracted the logic into a new class `EnumBackfill`. Restoring 
> from
> a snapshot calls this after the migrations are done. The underlying SQL was
> changed from `INSERT` to `MERGE` to make this work.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  36a1bd5c784ed0febebccfd22e5064f0b2e3106f 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbModule.java 
> d1a196419b67108ee2bb778f83a2993e2e5ee83b 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java 
> 923e904f396724b9dde4a330ef312a6aae2c02a6 
>   src/main/java/org/apache/aurora/scheduler/storage/db/EnumBackfill.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 81a8cca6974e33c774473a4990e0e981cf6ddee6 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/EnumValueMapper.xml 
> 153fd26c27275c46b190e71d8a5736153f2c2d18 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 42615da54096d5b06c7989cb30fc3cfbe59bc1b9 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DbStorageTest.java 
> f26529c76214c8f22563f04a197798c82d341b49 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  ca9525665805a33b4a322a72022ff037f0dd2a94 
> 
> 
> Diff: https://reviews.apache.org/r/58036/diff/2/
> 
> 
> Testing
> ---
> 
> existing tests and e2e tests
> 
> I also added a new enum value to `JobUpdateStatus` and observed it was 
> correctly loaded in.
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 58036: Ensure enum tables are complete afer a snapshot restore.

2017-03-29 Thread Joshua Cohen


> On March 29, 2017, 7:34 p.m., Joshua Cohen wrote:
> > This may be an existing problem with the current impl as well, but what 
> > happens if we drop an enum value? Are we just assuming some other migration 
> > will have removed it (since presumably other tables will need to be updated 
> > to have their FK relations fixed)?
> 
> Zameer Manji wrote:
> Yes, exactly. Dropping an enum value requires removing other tables. I 
> presume we will just never use enums instead of actually deleting them.

Might want to document that somewhere (even if just a comment on the backfill 
that it's only for additive changes). Otherwise the general approach looks good 
to me.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58036/#review170456
---


On March 29, 2017, 7:21 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58036/
> ---
> 
> (Updated March 29, 2017, 7:21 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Bugs: AURORA-1912
> https://issues.apache.org/jira/browse/AURORA-1912
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In our in memory database, we model enums as two column tables. The two 
> columns
> would be `id` which corresponds to the integer value in the thrift enum and
> `name` which is the all caps string name of the enum. For example to model the
> `JobUpdateStatus` enum we have a table called `job_update_statuses`. In there
> the `ROLLING_FORWARD` enum is modeled as a row `(0, "ROLLING_FORWARD")`. Other
> tables reference the enum table via the id.
> 
> When we prepare storage on startup the `DbStorage` starts up. It does two
> things:
> 1. Load in the schema.
> 2. Populate the enum tables.
> 
> This ensures that when we insert values into the database, the enum refernces
> will be valid.
> 
> However, after we restore from a Snapshot with the `dbScript` field, we blow 
> all
> of that data away and restore what was in the snapshot:
> 
> try (Connection c = ((DataSource) 
> store.getUnsafeStoreAccess()).getConnection()) {
>   LOG.info("Dropping all tables");
>   try (PreparedStatement drop = c.prepareStatement("DROP ALL OBJECTS")) \
> drop.executeUpdate();
>   }
> 
> 
> This means that if we add a new enum value, and then restore from a snapshot,
> that enum value will not exist in the table any more. We could address this by
> saying that every enum value addition requires a migration. However instead I
> propose not blowing away the work done by `DbStorage` instead and re-hydrating
> the enum tables.
> 
> To do this I extracted the logic into a new class `EnumBackfill`. Restoring 
> from
> a snapshot calls this after the migrations are done. The underlying SQL was
> changed from `INSERT` to `MERGE` to make this work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  36a1bd5c784ed0febebccfd22e5064f0b2e3106f 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbModule.java 
> d1a196419b67108ee2bb778f83a2993e2e5ee83b 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java 
> 923e904f396724b9dde4a330ef312a6aae2c02a6 
>   src/main/java/org/apache/aurora/scheduler/storage/db/EnumBackfill.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 81a8cca6974e33c774473a4990e0e981cf6ddee6 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/EnumValueMapper.xml 
> 153fd26c27275c46b190e71d8a5736153f2c2d18 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 42615da54096d5b06c7989cb30fc3cfbe59bc1b9 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DbStorageTest.java 
> f26529c76214c8f22563f04a197798c82d341b49 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  ca9525665805a33b4a322a72022ff037f0dd2a94 
> 
> 
> Diff: https://reviews.apache.org/r/58036/diff/1/
> 
> 
> Testing
> ---
> 
> existing tests and e2e tests
> 
> I also added a new enum value to `JobUpdateStatus` and observed it was 
> correctly loaded in.
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 58036: Ensure enum tables are complete afer a snapshot restore.

2017-03-29 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58036/#review170456
---



This may be an existing problem with the current impl as well, but what happens 
if we drop an enum value? Are we just assuming some other migration will have 
removed it (since presumably other tables will need to be updated to have their 
FK relations fixed)?

- Joshua Cohen


On March 29, 2017, 7:21 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58036/
> ---
> 
> (Updated March 29, 2017, 7:21 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Bugs: AURORA-1912
> https://issues.apache.org/jira/browse/AURORA-1912
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In our in memory database, we model enums as two column tables. The two 
> columns
> would be `id` which corresponds to the integer value in the thrift enum and
> `name` which is the all caps string name of the enum. For example to model the
> `JobUpdateStatus` enum we have a table called `job_update_statuses`. In there
> the `ROLLING_FORWARD` enum is modeled as a row `(0, "ROLLING_FORWARD")`. Other
> tables reference the enum table via the id.
> 
> When we prepare storage on startup the `DbStorage` starts up. It does two
> things:
> 1. Load in the schema.
> 2. Populate the enum tables.
> 
> This ensures that when we insert values into the database, the enum refernces
> will be valid.
> 
> However, after we restore from a Snapshot with the `dbScript` field, we blow 
> all
> of that data away and restore what was in the snapshot:
> 
> try (Connection c = ((DataSource) 
> store.getUnsafeStoreAccess()).getConnection()) {
>   LOG.info("Dropping all tables");
>   try (PreparedStatement drop = c.prepareStatement("DROP ALL OBJECTS")) \
> drop.executeUpdate();
>   }
> 
> 
> This means that if we add a new enum value, and then restore from a snapshot,
> that enum value will not exist in the table any more. We could address this by
> saying that every enum value addition requires a migration. However instead I
> propose not blowing away the work done by `DbStorage` instead and re-hydrating
> the enum tables.
> 
> To do this I extracted the logic into a new class `EnumBackfill`. Restoring 
> from
> a snapshot calls this after the migrations are done. The underlying SQL was
> changed from `INSERT` to `MERGE` to make this work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  36a1bd5c784ed0febebccfd22e5064f0b2e3106f 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbModule.java 
> d1a196419b67108ee2bb778f83a2993e2e5ee83b 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java 
> 923e904f396724b9dde4a330ef312a6aae2c02a6 
>   src/main/java/org/apache/aurora/scheduler/storage/db/EnumBackfill.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 81a8cca6974e33c774473a4990e0e981cf6ddee6 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/EnumValueMapper.xml 
> 153fd26c27275c46b190e71d8a5736153f2c2d18 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 42615da54096d5b06c7989cb30fc3cfbe59bc1b9 
>   src/test/java/org/apache/aurora/scheduler/storage/db/DbStorageTest.java 
> f26529c76214c8f22563f04a197798c82d341b49 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  ca9525665805a33b4a322a72022ff037f0dd2a94 
> 
> 
> Diff: https://reviews.apache.org/r/58036/diff/1/
> 
> 
> Testing
> ---
> 
> existing tests and e2e tests
> 
> I also added a new enum value to `JobUpdateStatus` and observed it was 
> correctly loaded in.
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 57524: Support setting the rootfs on Mesos Containers.

2017-03-13 Thread Joshua Cohen


> On March 11, 2017, 12:28 p.m., Stephan Erb wrote:
> > I need a little bit more context to understand what is going on here:
> > 
> > * Do you plan to use this with Thermos or an alternative executor? Or both?
> > * It seems like we don't need this for Thermos as we already create 
> > mountpoints when needed (see do_mount() in 
> > aurora/executor/common/sandbox.py)
> > * Switching the flag will run the executor within the filesystem image and 
> > thus require Python and all libmesos dependencies within the image. This 
> > sounds like a big downfall just for gaining the mkdir.

+1, it's unclear to me why this is necessary. If it is, I'd consider it a bug 
in the sandbox code that prepares mounts. It should be making any directories 
that don't already exist under taskfs:


# If we're mounting a file into the task filesystem, the mount call will 
fail if the mount
# point doesn't exist. In that case we'll create an empty file to mount 
over.
if os.path.isfile(source) and not os.path.exists(destination):
  safe_mkdir(os.path.dirname(destination))
  touch(destination)
else:
  safe_mkdir(destination)

https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/sandbox.py#L284-L290


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57524/#review168709
---


On March 11, 2017, 12:27 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57524/
> ---
> 
> (Updated March 11, 2017, 12:27 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar and Stephan Erb.
> 
> 
> Bugs: AURORA-1903
> https://issues.apache.org/jira/browse/AURORA-1903
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The mesos unified containerizer does not support absolute container path 
> mounts if no rootfs is set. This allows operators to switch between our 
> current behaviour (mounting images as a volume) and setting the rootfs. See 
> AURORA-1903 for more detailed analysis.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  4dac9757a65e144142d36ee921b85a02a5311fe5 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
>  5c987fd051728486172c8afd34219e86d56f00d5 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 0d639f66db456858278b0485c91c40975c3b45ac 
>   src/main/java/org/apache/aurora/scheduler/mesos/TestExecutorSettings.java 
> e1cd81e6fbd98f23046e6e775be268be4310c62a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 93cc34cf8393f969087cd0fd6f577228c00170e9 
> 
> 
> Diff: https://reviews.apache.org/r/57524/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 53333: Add DSL and E2E changes for per task volume mounts.

2017-02-14 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5/#review165512
---


Ship it!




lgtm modulo some truly nitpicky nits below ;).


RELEASE-NOTES.md (line 7)
<https://reviews.apache.org/r/5/#comment237354>

s/*/-

Also I think, "Add support for per-task volume mounts for Mesos containers 
to the Aurora config DSL." reads a little bit better?



docs/reference/configuration.md (lines 475 - 476)
<https://reviews.apache.org/r/5/#comment237355>

Can you fix spacing so the columns line up? ;)


- Joshua Cohen


On Feb. 14, 2017, 2:26 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5/
> ---
> 
> (Updated Feb. 14, 2017, 2:26 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1107
> https://issues.apache.org/jira/browse/AURORA-1107
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Enables the client DSL to set per task volume mounts. This also adds a E2E 
> test that tests per task volume mounting.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 3e9880249aa40215a08424b58f85a634337cc67e 
>   docs/reference/configuration.md 6c7114256ea6f57d6e09e07896e9f4fde19139ca 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 91f0f0c18062cb8ca36c104a270b1d16ff13cd52 
>   src/main/python/apache/aurora/config/schema/base.py 
> b15b93942f6d92d8f4c84c82d61950414f4f3d34 
>   src/main/python/apache/aurora/config/thrift.py 
> 3539469d243638c0acd08bf0859d0ce858d8977c 
>   src/test/python/apache/aurora/config/test_thrift.py 
> e213184739167e01f3614c20a809af39b3a6b3d6 
>   src/test/sh/org/apache/aurora/e2e/Dockerfile.netcat 
> c8b2f46a36d2ddecf030bc7f005afc1f7fe5e99d 
>   src/test/sh/org/apache/aurora/e2e/check-fs.sh PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
> b2b977b0c248e7156061653aacec43947cc49cd5 
>   src/test/sh/org/apache/aurora/e2e/run-server.sh 
> 1fe09090bfa3eeb0f6897e8c895782fb1ff949a1 
> 
> Diff: https://reviews.apache.org/r/5/diff/
> 
> 
> Testing
> ---
> 
> sh ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 54459: Add message parameter to killTasks

2017-02-02 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54459/#review163995
---


Ship it!




Ship It!

- Joshua Cohen


On Feb. 2, 2017, 12:57 a.m., Cody Gibb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54459/
> ---
> 
> (Updated Feb. 2, 2017, 12:57 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1846
> https://issues.apache.org/jira/browse/AURORA-1846
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> RPC's such as pauseJobUpdate include a parameter for "a user-specified 
> message to include with the induced job update state change." This diff 
> provides a similar optional parameter for the killTasks RPC, which allows 
> users to indicate the reason why a task was killed, and later inspect that 
> reason when consuming task events.
> 
> Example usage from Aurora CLI: `$ aurora job killall 
> devcluster/www-data/prod/hello --message "Some message"`
> 
> In the task event, the supplied message (if provided) is appended to the 
> existing template "Killed by ", separated by a newline. For the above 
> example, this looks like: "Killed by aurora\nSome message".
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c3ec96584d7acdc50c80fc629f8cff61f2d206d7 
>   src/main/java/org/apache/aurora/scheduler/thrift/AuditMessages.java 
> a7aecedcab7a5ae116b0536fe7831689e065ea80 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  a40114d6c49b3892f21aedddcdd3a414675e39f3 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/aop/AnnotatedAuroraAdmin.java
>  bfc3dc868b2b52d3ccf6a7c54039ed75fe0501a1 
>   src/main/python/apache/aurora/client/api/__init__.py 
> e1dde638bd1d686269fbcd88cb083a52e7f5dbfc 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 8ba41aa914f7ddd301891f67dd763ba50a977a9d 
>   src/main/python/apache/aurora/client/hooks/hooked_api.py 
> 86c3a09481d0504690439b608c269285b9cc2a38 
>   
> src/test/java/org/apache/aurora/scheduler/http/api/security/HttpSecurityIT.java
>  b20900da82915f429c35a68775b2106fb15394ff 
>   
> src/test/java/org/apache/aurora/scheduler/http/api/security/ShiroAuthorizingParamInterceptorTest.java
>  05f4a18938be8075e487478cab06ff51874e08a7 
>   src/test/java/org/apache/aurora/scheduler/thrift/AuditMessagesTest.java 
> 9c8460ce206eb13b4b9910d76d6fcade439c9c61 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  b7574e45368b877807b6b0b391e0ae4a1610b780 
>   src/test/python/apache/aurora/api_util.py 
> 983cde48a48bc2388e8535c3d17b867373c8dcaa 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> f6018caa4f431e85a9e9ff203ac3d4b6c33f40ef 
>   src/test/python/apache/aurora/client/cli/test_kill.py 
> 564ea440983359ae3c382c58aa249965752fedbc 
>   src/test/python/apache/aurora/client/hooks/test_hooked_api.py 
> eb97c611d7e17668a3e646722684a847fc491d1a 
>   src/test/python/apache/aurora/client/hooks/test_non_hooked_api.py 
> 04b2257cb7e4e78ab90564ab27b160de2b1c8912 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> 736d1fcf1d99226efd252336f1630e1f033073ca 
> 
> Diff: https://reviews.apache.org/r/54459/diff/
> 
> 
> Testing
> ---
> 
> Added a unit test in the scheduler, and a test in the client.
> 
> Also manually tested using the Vagrant environment.
> 
> 
> File Attachments
> 
> 
> "Some message" shown in UI
>   
> https://reviews.apache.org/media/uploaded/files/2016/12/07/9ab25047-4418-4121-906c-380ded9e1962__Screen_Shot_2016-12-06_at_4.56.43_PM.png
> 
> 
> Thanks,
> 
> Cody Gibb
> 
>



Re: Review Request 56131: Suppress role deprecation warning as replacement is not yet ready.

2017-01-31 Thread Joshua Cohen


> On Jan. 31, 2017, 7:09 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java,
> >  line 155
> > 
> >
> > Can you link to the commit that did this or something? It would make it 
> > much easier to determine later if we can remove it.

+1.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56131/#review163697
---


On Jan. 31, 2017, 6:41 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56131/
> ---
> 
> (Updated Jan. 31, 2017, 6:41 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The role field was prematurely deprecated in the Mesos project. 
> https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L257
> 
> Suppress deprecation warnings.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  49877682e501d0af76f6e2583b59e93b1bd90137 
> 
> Diff: https://reviews.apache.org/r/56131/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 54754: Fixed starting cron jobs when using default_docker_parameters

2017-01-27 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54754/#review163334
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 15, 2016, 5:48 p.m., Steve Niemitz wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54754/
> ---
> 
> (Updated Dec. 15, 2016, 5:48 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1684
> https://issues.apache.org/jira/browse/AURORA-1684
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Fixed starting cron jobs when using default_docker_parameters
> 
> The code was previously attempting to re-sanitize the configuration read from 
> storage rather than just using it as is.  This causes issues if after 
> sanitization the job no longer passes sanitization (which is the case here w/ 
> default_docker_parameters).
> 
> We've been running this in our branch forever.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> 5873983479c39a011eb363f7fd442867f0794b17 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronLifecycle.java 
> 650facecc2e02be7bb3cd5ea9ff0f094e006bcb3 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> fb06c282c4e94639c521658682c5da3e4dd4baf7 
> 
> Diff: https://reviews.apache.org/r/54754/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Niemitz
> 
>



Re: Review Request 55020: AURORA-1835 Expose finer grained offer veto stats

2017-01-25 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55020/#review163005
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 23, 2016, 10:46 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55020/
> ---
> 
> (Updated Dec. 23, 2016, 10:46 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Bugs: AURORA-1835
> https://issues.apache.org/jira/browse/AURORA-1835
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1835 Expose finer grained offer veto stats
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java 
> 6351cc74c152d1f902078154ad14376c19c6ef1a 
>   src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 
> 05cd78f4c7c7d8dd6eeb6f2f9a3e8f7a167f274d 
> 
> Diff: https://reviews.apache.org/r/55020/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep scheduling_veto
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
> 100 502370 502370 0  8672k  0 --:--:-- --:--:-- --:--:-- 9811k
> scheduling_veto_insufficient_resources 5
> scheduling_veto_static 5
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55951: Use --launch_info when invoking MesosContainerizer.

2017-01-25 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55951/#review163003
---




build-support/packer/build.sh (line 20)
<https://reviews.apache.org/r/55951/#comment234383>

This should probably be reverted?



src/main/python/apache/thermos/common/process_util.py (lines 25 - 37)
<https://reviews.apache.org/r/55951/#comment234384>

Rather than continuing to manually construct a json representation of a 
protobuf, we might want to look at just creating the actual `LaunchInfo` 
protobuf and converting it directly to json?

I looked into this at one point, and it was problematic due to the version 
of the protobuf library we use (tied to the version in use by Mesos). I 
remember thinking the problem wasn't intractable though. Would you mind taking 
a swing at that now that the json object is more complex?


- Joshua Cohen


On Jan. 25, 2017, 7:02 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55951/
> ---
> 
> (Updated Jan. 25, 2017, 7:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Karthik Anantha 
> Padmanabhan.
> 
> 
> Bugs: AURORA-1882
> https://issues.apache.org/jira/browse/AURORA-1882
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> MesosContainerizer has updated the command line parameters and
> consolidated the individual arguments into a single ContainerLaunchInfo
> proto buf message. Update ThermosExecutor to use the new `--launch_info`
> parameter to be compatible with MesosContainerizer.
> 
> 
> Diffs
> -
> 
>   build-support/packer/build.sh 548cf37e097c6ed56fc6cc718a642b105afb9331 
>   src/main/python/apache/thermos/common/process_util.py 
> 54e716b726fc02f3901f1b9143d3fa253511e29b 
>   src/test/python/apache/thermos/core/test_process.py 
> 520390217f691b9136cb4d36262be3d372a16509 
> 
> Diff: https://reviews.apache.org/r/55951/diff/
> 
> 
> Testing
> ---
> 
> build-support/jenkins/build.sh
> 
> TBD: End-to-end test needs Mesos 1.2.0 which has not been released?
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 55902: Capture health check output

2017-01-25 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/#review162993
---


Ship it!




lgtm pending a clean reviewbot run.

- Joshua Cohen


On Jan. 25, 2017, 6:37 p.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55902/
> ---
> 
> (Updated Jan. 25, 2017, 6:37 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1881
> https://issues.apache.org/jira/browse/AURORA-1881
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Users really could really benefit from seeing the output of the shell health 
> check failure, so plumbing through the output.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/common/health_check/shell.py 
> 58470a48a7a14092eeb8837aada9e358c6922c93 
>   src/test/python/apache/aurora/common/health_check/test_shell.py 
> 792ef40028cda112db5b93c4cc37e305937bc351 
> 
> Diff: https://reviews.apache.org/r/55902/diff/
> 
> 
> Testing
> ---
> 
> added unit tests
> e2e tests
> screenshot attached.
> 
> 
> File Attachments
> 
> 
> Updated screenshot
>   
> https://reviews.apache.org/media/uploaded/files/2017/01/25/90d6ff4f-84f9-4b4d-9b3d-56dadf7027ae__Screen_Shot_2017-01-24_at_5.32.08_PM.png
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>



Re: Review Request 55471: AURORA-1876 Expose stats on scheduler rate limiter

2017-01-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55471/#review162695
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 23, 2017, 5 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55471/
> ---
> 
> (Updated Jan. 23, 2017, 5 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1876
> https://issues.apache.org/jira/browse/AURORA-1876
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch exposes stats on `rateLimiter.acquire()` blocking events in 
> `TaskGroups`. Hence, providing visibility into whether scheduling rate is 
> above/below `MAX_SCHEDULE_ATTEMPTS_PER_SEC`.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> 2d548b0adc9219080aa8e0e8b329f57edda01c13 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 566e0d9ee8976a5eadd3088d34280fcfb0de3df0 
> 
> Diff: https://reviews.apache.org/r/55471/diff/
> 
> 
> Testing
> ---
> 
> ```
> $ curl localhost:8081/vars | grep schedule_attempts_blocks
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
> 100 279760 279760 0  2836k  0 --:--:-- --:--:-- --:--:-- 3415k
> schedule_attempts_blocks 0
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 54995: AURORA-1828 Expose stats on the number of offers evaluated before a task is assigned

2017-01-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54995/#review162694
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 22, 2016, 8:56 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54995/
> ---
> 
> (Updated Dec. 22, 2016, 8:56 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1828
> https://issues.apache.org/jira/browse/AURORA-1828
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1828 Expose stats on the number of offers evaluated before a task is 
> assigned
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> c1e7b0377b6bc0084b3810351e81916b7a7e4374 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> b482be52518c40a0e9d2206bf52bcbb4ad94f46d 
> 
> Diff: https://reviews.apache.org/r/54995/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 54288: Make leader elections resilient to ZK disconnections.

2017-01-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54288/#review162693
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 23, 2017, 8:21 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54288/
> ---
> 
> (Updated Jan. 23, 2017, 8:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1669
> https://issues.apache.org/jira/browse/AURORA-1669
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates
> leadership if the ZK connection is lost or if there is a timeout. This is not
> compatible with the commons based implementation which would only abdicate
> leadership if the ZK session timeout occurred.
> 
> This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a
> custom listener that only loses leadership if a connection loss occurs.
> 
> 
> Diffs
> -
> 
>   
> commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java
>  50acaeba82e163f8f2970a264cbd889c9eb3b5ed 
>   
> src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java
>  c378172c850aafe0a9381552b5067277b40dbfab 
>   
> src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java
>  a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 
>   
> src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java
>  6ea49b0c690d288ff59d1d4798144bfa2d153d3a 
> 
> Diff: https://reviews.apache.org/r/54288/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 55684: Fix command escaping when using the Mesos containerizer

2017-01-19 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55684/#review162338
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 18, 2017, 8:54 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55684/
> ---
> 
> (Updated Jan. 18, 2017, 8:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1782
> https://issues.apache.org/jira/browse/AURORA-1782
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The important bit is the change to call the Mesos containerizer with 
> `shell=False`. Getting rid of manual json encoding and eliminating shlex 
> might have helped as well, but was more motivated by clarity rather than 
> correctness.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/common/health_check/shell.py 
> 6ac9021c3ef3324c64f23f068b31b0c239a0349e 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> a5fb18f367c8ad1321267a8f75e3b9aca0ad1ff5 
>   src/main/python/apache/thermos/common/process_util.py 
> 637b025dd98348b2741aa1c03117e1bc899022d5 
>   src/main/python/apache/thermos/core/process.py 
> 496b5401b0656115750513cf038104fbc1b3c654 
>   src/test/python/apache/aurora/common/health_check/test_shell.py 
> 4f028787d7ec5fa73263718d8bddff4c8ccdd477 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> 86a71c7be44ef442fa839a9677300155386550b9 
>   src/test/python/apache/thermos/core/test_process.py 
> 2fd3d9618cddefdd7863a3a88e294d5e79562b2e 
>   src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
> 1985f89ffc9142877bc168c93a7efd19b6bcd4aa 
> 
> Diff: https://reviews.apache.org/r/55684/diff/
> 
> 
> Testing
> ---
> 
> I have verifed that the new e2e fail with the code on master but succeed with 
> this patch.
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 55646: Ensure Aurora thrift support js and html.

2017-01-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55646/#review161968
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 17, 2017, 9:30 p.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55646/
> ---
> 
> (Updated Jan. 17, 2017, 9:30 p.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Bugs: AURORA-1875
> https://issues.apache.org/jira/browse/AURORA-1875
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We use these for the Aurora UI and the API docs.
> 
>  build-support/thrift/thriftw | 2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> Diffs
> -
> 
>   build-support/thrift/thriftw 278576f22b6127da306e2b12ece3686981a317d3 
> 
> Diff: https://reviews.apache.org/r/55646/diff/
> 
> 
> Testing
> ---
> 
> Missed these 2 in https://reviews.apache.org/r/55536/
> 
> Our custom thrift build does not rebuild since it passes the new checks:
> ```
> $ ./build-support/thrift/thriftw 0.9.1 --which
> /home/jsirois/dev/apache/jsirois-aurora/build-support/thrift/thrift-0.9.1/compiler/cpp/thrift
> $ ./build-support/thrift/thriftw 0.9.1 -help 2>&1 | grep -iE "javascript|html"
>   html (HTML):
> standalone:  Self-contained mode, includes all CSS in the HTML files.
>  Generates no style.css file, but HTML files will be 
> larger.
>   js (Javascript):
> ```
> 
> 
> Thanks,
> 
> John Sirois
> 
>



Re: Review Request 55071: Disable H2 logging via SLF4J.

2017-01-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55071/#review161954
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 28, 2016, 11:28 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55071/
> ---
> 
> (Updated Dec. 28, 2016, 11:28 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Sending H2 log messages to SLF4J is expensive as the messages will be 
> assembled in memory, only to be discarded by logback afterwards. With this 
> change H2 errors will be printed to stdout instead. In this case H2 is clever 
> enough to only assemble debug messages when necessary. 
> 
> I have never seen H2 error messages in the wild, so I believe the change is 
> safe. If one of you is using H2 logging regularly, we could consider exposing 
> the trace level as a commandline flag. 
> 
> On the performance side, we gain about a factor 2-3 in our micro benchmarks 
> (excluding the TaskStoreBenchmarks which are skewed by caching).
> 
> Before the change:
> ```
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   1  thrpt5  61431.706 ± 13049.185  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   5  thrpt5  30442.256 ± 19143.475  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A  10  thrpt5  0.087 ± 0.022  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 1000 N/A N/A  thrpt5151.684 ±77.845  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 5000 N/A N/A  thrpt5 39.098 ± 3.030  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
>1 N/A N/A  thrpt5 20.233 ± 2.385  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run  1 
>  N/A N/A N/A  thrpt5 39.556 ±13.071  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run 10 
>  N/A N/A N/A  thrpt5 39.307 ± 9.991  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run100 
>  N/A N/A N/A  thrpt5 24.029 ± 5.149  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run   1000 
>  N/A N/A N/A  thrpt5 17.671 ± 6.065  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A  10 N/A  thrpt5 37.791 ±13.941  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A 100 N/A  thrpt5 37.189 ±26.962  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A1000 N/A  thrpt5 26.881 ±15.143  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A   1 N/A  thrpt5 13.874 ± 4.928  ops/s
> ```
> 
> With this change:
> ```
> Benchmark (instanceOverrides)  
> (instances)  (metadata)  (numTasks)   Mode  Cnt  Score   Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   1  thrpt5  64510.645 ± 11207.229  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   5  thrpt5  53270.516 ± 21560.208  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A  10  thrpt5  40471.488 ± 36196.208  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 1000 N/A N/A  thrpt5471.175 ±   436.428  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 5000 N/A N/A  thrpt5120.241 ±   113.142  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
>1 N/A N/A  thrpt5 67.512 ±28.800  ops/s
> UpdateStor

Re: Review Request 55536: Improve `thriftw` robustness.

2017-01-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55536/#review161945
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 14, 2017, 6:19 p.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55536/
> ---
> 
> (Updated Jan. 14, 2017, 6:19 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1875
> https://issues.apache.org/jira/browse/AURORA-1875
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Now the selected thrift is checked both for the proper version and
> support of the gen langs Aurora requires. In addition, all thrifts on
> the `PATH` are and an existing locally built thrift is always verified
> to protect Aurora thrift requirement changes (if we ever add a gen lang
> again).
> 
>  build-support/thrift/thriftw | 54 
> +++---
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> 
> Diffs
> -
> 
>   build-support/thrift/thriftw 1185298b902b5f1075816d0113ddc34dd4c26a89 
> 
> Diff: https://reviews.apache.org/r/55536/diff/
> 
> 
> Testing
> ---
> 
> I have thrift latest (0.10.0) installed as a system package:
> ```
> $ ./build-support/thrift/thriftw 0.10.0 --which
> /usr/bin/thrift
> $ ./build-support/thrift/thriftw 0.10.0 -help 2>&1 | grep Swift
>   swift (Swift):
> $ ./build-support/thrift/thriftw 0.9.1 --which
> /home/jsirois/dev/apache/jsirois-aurora/build-support/thrift/thrift-0.9.1/compiler/cpp/thrift
> $ ./build-support/thrift/thriftw 0.9.1 -help 2>&1 | grep Swift
> $
> ```
> 
> I also edited the "java (Java)" gen identifier to "java (Java2)" and
> observed the script forcing a clean recompile of 0.9.1 with a subsequent
> failure to find a compatible thrift.
> 
> 
> Thanks,
> 
> John Sirois
> 
>



Re: Review Request 55578: Log process sampling failures with debug severity

2017-01-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55578/#review161934
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 16, 2017, 2:44 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55578/
> ---
> 
> (Updated Jan. 16, 2017, 2:44 p.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Bugs: AURORA-1541
> https://issues.apache.org/jira/browse/AURORA-1541
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The observer's logs consist of lots of warnings about being unable to find 
> PIDs. This is expected when running with the PID isolator, or when tasks when 
> the checkpoint is out of date (e.g. if processes were killed by the OOM). 
> 
> W0116 14:42:54.694221 3253 process_collector_psutil.py:75] Error during 
> process sampling: psutil.NoSuchProcess process no longer exists (pid=27727)
> W0116 14:42:54.717905 3253 process_collector_psutil.py:42] Error during 
> process sampling [pid=10960]: psutil.NoSuchProcess process no longer exists 
> (pid=10960)
> W0116 14:42:54.718089 3253 process_collector_psutil.py:75] Error during 
> process sampling: psutil.NoSuchProcess process no longer exists (pid=10960)
> W0116 14:42:54.718245 3253 process_collector_psutil.py:42] Error during 
> process sampling [pid=10026]: psutil.NoSuchProcess process no longer exists 
> (pid=10026)
> W0116 14:42:54.718334 3253 process_collector_psutil.py:75] Error during 
> process sampling: psutil.NoSuchProcess process no longer exists (pid=10026)
> 
> This change adopts the proposal of David Robinson to decrease the severity 
> level to debug.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/thermos/monitoring/process_collector_psutil.py 
> 4a1b159637ae18113aa7f4d10a0864be0fd1e920 
> 
> Diff: https://reviews.apache.org/r/55578/diff/
> 
> 
> Testing
> ---
> 
> ./pants test.pytest src/{main,test}/python:: -- -v
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 55434: Reduce logging by ChainedStatusChecker and StatusManager when they're on the happy path.

2017-01-11 Thread Joshua Cohen


> On Jan. 11, 2017, 10:05 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/common/status_checker.py, line 134
> > <https://reviews.apache.org/r/55434/diff/1/?file=1603106#file1603106line134>
> >
> > Just like Java, Python can do the formatting for you when needed. Just 
> > pass your arguments directly to the info function and the logger will do 
> > the rest if needed.

done, thanks for the nudge.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55434/#review161310
-------


On Jan. 11, 2017, 10:18 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55434/
> ---
> 
> (Updated Jan. 11, 2017, 10:18 p.m.)
> 
> 
> Review request for Aurora and Santhosh Kumar Shanmugham.
> 
> 
> Bugs: AURORA-1878
> https://issues.apache.org/jira/browse/AURORA-1878
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Reduce logging by ChainedStatusChecker and StatusManager when they're  on the 
> happy path.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/status_checker.py 
> f278825e58bba40c3b3ec735173705feb42bf165 
>   src/main/python/apache/aurora/executor/status_manager.py 
> 8b536a925e3f209c03b3eb44257096a0c0e497e0 
> 
> Diff: https://reviews.apache.org/r/55434/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 55434: Reduce logging by ChainedStatusChecker and StatusManager when they're on the happy path.

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55434/
---

(Updated Jan. 11, 2017, 10:18 p.m.)


Review request for Aurora and Santhosh Kumar Shanmugham.


Changes
---

Clean up logging.


Bugs: AURORA-1878
https://issues.apache.org/jira/browse/AURORA-1878


Repository: aurora


Description
---

Reduce logging by ChainedStatusChecker and StatusManager when they're  on the 
happy path.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/common/status_checker.py 
f278825e58bba40c3b3ec735173705feb42bf165 
  src/main/python/apache/aurora/executor/status_manager.py 
8b536a925e3f209c03b3eb44257096a0c0e497e0 

Diff: https://reviews.apache.org/r/55434/diff/


Testing
---


Thanks,

Joshua Cohen



Re: Review Request 55058: AURORA-1859 Expose stats on statically banned offers

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55058/#review161319
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 28, 2016, 6:25 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55058/
> ---
> 
> (Updated Dec. 28, 2016, 6:25 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Zameer Manji.
> 
> 
> Bugs: AURORA-1859
> https://issues.apache.org/jira/browse/AURORA-1859
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1859 Expose stats on statically banned offers
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 6c2b6d20658a7fe75725487c9a983e884d9ddfe5 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 5e570b6341c55be8ef27469077932d1ea8378b55 
> 
> Diff: https://reviews.apache.org/r/55058/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep statically_banned_offers
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
> 100 343790 343790 0  6747k  0 --:--:-- --:--:-- --:--:-- 8393k
> statically_banned_offers 1
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review161318
---


Ship it!




Ship It!

- Joshua Cohen


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> ---
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on 
> scheduler landing page (`/scheduler`) causes a call to 
> `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very 
> inefficient as it requires a sequential scan of the task store and mapping to 
> their respective job keys. In Twitter clusters this method is currently 
> taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply 
> returning an immutable copy of the key set of the existing secondary index 
> `job`.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> ---
> 
> Using the following modified version of existing 
> `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks 
> `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends 
> AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
>   storage = Guice.createInjector(
>   Modules.combine(
>   DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new 
> InMemStoresModule(PLAIN))),
>   new AbstractModule() {
> @Override
> protected void configure() {
>   bind(StatsProvider.class).toInstance(new 
> FakeStatsProvider());
>   bind(Clock.class).toInstance(new FakeClock());
> }
>   }))
>   .getInstance(Storage.class);
> 
> }
> 
> @Setup(Level.Iteration)
> public void setUpIteration() {
>   createTasks(numTasks);
> }
> 
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
>   deleteTasks();
> }
> 
> @Benchmark
> public Iterable run() {
>   return storage.read(store -> store.getTaskStore().getJobKeys());
> }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark   (numTasks)   Mode  Cnt
> Score Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run   1  thrpt5  
> 572.761 ± 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run   5  thrpt5   
> 80.697 ±   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run  10  thrpt5   
> 25.102 ±   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark   (numTasks)   Mode  Cnt   
> Score   Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run   1  thrpt5  
> 327336.998 ± 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run   5  thrpt5  
> 320506.958 ± 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run  10  thrpt5  
> 287962.695 ± 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 54967: AURORA-1856 Expose stats on deleted job updates in JobUpdateHistoryPruner

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54967/#review161315
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 22, 2016, 7:37 a.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54967/
> ---
> 
> (Updated Dec. 22, 2016, 7:37 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Bugs: AURORA-1856
> https://issues.apache.org/jira/browse/AURORA-1856
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1856 Expose stats on deleted job updates in JobUpdateHistoryPruner
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/pruning/JobUpdateHistoryPruner.java 
> 6ab39ca0a64dfab9fe2fdec79fef1b4d320b6dc6 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/JobUpdateHistoryPrunerTest.java
>  20f790ef4d04cd8aaa7cdab4442040a31fa72838 
> 
> Diff: https://reviews.apache.org/r/54967/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55357: AURORA-1867 Consider reserving for multiple tasks per preemption round

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55357/#review161307
---


Ship it!




Since the default behavior is changing (formerly defaulted to batch size of 1, 
now defaulting to batch size of 5), I believe this should be mentioned in 
RELEASE-NOTES.md.

Other than that and the typo below, lgtm.


src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java (line 
41)
<https://reviews.apache.org/r/55357/#comment232539>

s/UNMATECHED/UNMATCHED


- Joshua Cohen


On Jan. 11, 2017, 4:48 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55357/
> ---
> 
> (Updated Jan. 11, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1867
> https://issues.apache.org/jira/browse/AURORA-1867
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> To be fair, PendingTaskProcessor interleaves tasks from different groups. 
> However, this fairness comes at the price of increasing reservation time. 
> Even if reservations are being made for the same task group, the processor 
> would still restart iterating through slaves for each task instance. This 
> results in reevaluating all slaves already rejected in a previous search 
> before it finds a new viable candidate.
> 
> This patch improves `PendingTaskProcessor` performance by reducing slave 
> search/evaluation time, at the cost of reduced fairness. 
> `PendingTaskProcessor` now does reservation for a configurable maximum of _N_ 
> candidates per task group in each iteration over the list of slaves.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> fa37236e68657b539b182519b9d46d96d5b0953a 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java 
> f59f3fd8959b1ba3726b55a2943fb9228a049ac5 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java 
> 67822cafbe89f4798b4ea6da3856663cc4872798 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 23d1c120657d5cb9d294a80c63e8a04512d361ca 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java
>  d11ae5883f2a00dca4c4b36f0ab58ea95c7ecb2e 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 67b6d69e3ddd1028dfe9ff451b171cd888674920 
> 
> Diff: https://reviews.apache.org/r/55357/diff/
> 
> 
> Testing
> ---
> 
> As is, the cluster setup in our existing preemption benchmark does not 
> reflect the improvements resulting from this patch. Currently, all existing 
> victims can be preempted, therefore all `PendingTaskProcessor` has to is look 
> at the next slave.
> 
> ```
> BEFORE
> Benchmark   
> (numPendingTasks)   Mode  Cnt   Score   Error  Units
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>   1  thrpt   10  75.386 ± 2.984  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>  10  thrpt   10  74.584 ± 2.598  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
> 100  thrpt   10  79.731 ± 2.182  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark   
> 1000  thrpt   10  66.386 ± 1.833  ops/s
> 
> AFTER
> Benchmark   
> (numPendingTasks)   Mode  Cnt   Score   Error  Units
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>   1  thrpt   10  78.266 ± 3.290  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>  10  thrpt   10  76.743 ± 2.073  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
> 100  thrpt   10  75.343 ± 1.943  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark   
> 1000  thrpt   10  68.284 ± 2.413  ops/s
> ```
> 
> I need to further imprpve the cluster setup for this benchmark to reflect the 
> improvements in the patch. A more representative cluster setup would be one 
> in which only a subset of potential victims pass 
> `PreemptionVictimFilter.filterPreemptionVictims()` test.
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55434: Reduce logging by ChainedStatusChecker and StatusManager when they're on the happy path.

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55434/#review161305
---



@ReviewBot retry

- Joshua Cohen


On Jan. 11, 2017, 9:41 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55434/
> ---
> 
> (Updated Jan. 11, 2017, 9:41 p.m.)
> 
> 
> Review request for Aurora and Santhosh Kumar Shanmugham.
> 
> 
> Bugs: AURORA-1878
> https://issues.apache.org/jira/browse/AURORA-1878
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Reduce logging by ChainedStatusChecker and StatusManager when they're  on the 
> happy path.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/status_checker.py 
> f278825e58bba40c3b3ec735173705feb42bf165 
>   src/main/python/apache/aurora/executor/status_manager.py 
> 8b536a925e3f209c03b3eb44257096a0c0e497e0 
> 
> Diff: https://reviews.apache.org/r/55434/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Review Request 55434: Reduce logging by ChainedStatusChecker and StatusManager when they're on the happy path.

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55434/
---

Review request for Aurora and Santhosh Kumar Shanmugham.


Bugs: AURORA-1878
https://issues.apache.org/jira/browse/AURORA-1878


Repository: aurora


Description
---

Reduce logging by ChainedStatusChecker and StatusManager when they're  on the 
happy path.


Diffs
-

  src/main/python/apache/aurora/executor/common/status_checker.py 
f278825e58bba40c3b3ec735173705feb42bf165 
  src/main/python/apache/aurora/executor/status_manager.py 
8b536a925e3f209c03b3eb44257096a0c0e497e0 

Diff: https://reviews.apache.org/r/55434/diff/


Testing
---


Thanks,

Joshua Cohen



Re: Review Request 55409: AURORA-1873 CuratorServiceGroupMonitor.LOG should use its own logger name

2017-01-11 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55409/#review161250
---


Ship it!




Thanks for cleaning this up!

- Joshua Cohen


On Jan. 11, 2017, 9:27 a.m., Bing-Qian Luan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55409/
> ---
> 
> (Updated Jan. 11, 2017, 9:27 a.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1873 CuratorServiceGroupMonitor.LOG should use its own logger name
> 
> 
> Diffs
> -
> 
>   
> commons/src/main/java/org/apache/aurora/common/application/ShutdownRegistry.java
>  5ee740fed58175f0b42c45320a4d76c2d5260a63 
>   
> src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceGroupMonitor.java
>  0b86fb6eb95d74a464358e8c1fcd473b52a9afbc 
>   
> src/main/java/org/apache/aurora/scheduler/discovery/ServiceDiscoveryModule.java
>  07a63023511c5bd05a5d9c8d7a421a1091854d4a 
>   
> src/main/java/org/apache/aurora/scheduler/scheduling/RescheduleCalculator.java
>  18edb01113f605398837ac3957a5662abc3bb755 
>   src/main/java/org/apache/aurora/scheduler/storage/backup/StorageBackup.java 
> 92b7b82a2eeaf7688606645cedc837d4467ee1bc 
> 
> Diff: https://reviews.apache.org/r/55409/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> Please NOTE: I didn't run any integration tests.
> 
> 
> Thanks,
> 
> Bing-Qian Luan
> 
>



Re: Review Request 55347: Ensure destination exists when mounting files into a filesystem image.

2017-01-10 Thread Joshua Cohen


> On Jan. 10, 2017, 6:15 p.m., Aurora ReviewBot wrote:
> > Master (d4ebb56) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >  I0110 18:14:35.522526 17409 executor_base.py:45] 
> > Executor [None]:Reason: test_message
> >   INFO] Executor []:Reason: Task initialization 
> > failed: Task failed: Thermos exited for unknown reason (exit status: 1)
> >   generated xml file: 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/aaf4d108c31293299a0839bdc404a91802f80937.xml
> >  
> >   1 failed, 775 passed, 7 skipped, 1 warnings 
> > in 298.66 seconds 
> >  
> > FAILURE
> > 
> > 
> > 18:15:01 05:42   [complete]
> >FAILURE
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"

@ReviewBot retry


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55347/#review161091
---


On Jan. 10, 2017, 6 p.m., Joshua Cohen wrote:
> 
> 

Re: Review Request 55347: Ensure destination exists when mounting files into a filesystem image.

2017-01-10 Thread Joshua Cohen


> On Jan. 9, 2017, 7:28 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/common/sandbox.py, lines 285-290
> > <https://reviews.apache.org/r/55347/diff/1/?file=1600157#file1600157line285>
> >
> > This looks suspicious. Creating a directory and then checking it does 
> > not exist.

Fixed.


> On Jan. 9, 2017, 7:28 p.m., Stephan Erb wrote:
> > src/test/python/apache/aurora/executor/common/test_sandbox.py, lines 404-410
> > <https://reviews.apache.org/r/55347/diff/1/?file=1600158#file1600158line404>
> >
> > With that amount of mocking the test is both hard to understand and 
> > brittle to changes of the production code.
> > 
> > Have you considered extending this 
> > https://github.com/apache/aurora/blob/master/examples/vagrant/upstart/aurora-scheduler.conf#L43
> >  list instead?

Added to startup config and added e2e test as well.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55347/#review160933
---


On Jan. 10, 2017, 6 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55347/
> ---
> 
> (Updated Jan. 10, 2017, 6 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When testing filesystem isolation internally, we ran into an issue where 
> mounting a regular file into the task filesystem failed with exit code 32 
> since the mount destination did not exist. To account for this, we'll touch 
> an empty file in the taskfs.
> 
> 
> Diffs
> -
> 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> e68a801017ae02e0ed581129e12a645ccc1e35d4 
>   src/main/python/apache/aurora/executor/common/sandbox.py 
> 9d6e73c4221302c56596ef3591bdeab41c81535e 
>   src/test/python/apache/aurora/executor/common/test_sandbox.py 
> a441d2a60e413395b74b62445846e3871784e729 
>   src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
> 438eb0f03128d21b3201d0a843adebe09422c75e 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> 46309371a33d26bdd467b5fa131634354be2a978 
> 
> Diff: https://reviews.apache.org/r/55347/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 55347: Ensure destination exists when mounting files into a filesystem image.

2017-01-10 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55347/
---

(Updated Jan. 10, 2017, 6 p.m.)


Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.


Changes
---

- Fix bug when creating directories for mount points.
- Add e2e test.
- Minor cleanup on e2e tests to fix i/o redirection on tear down.


Repository: aurora


Description
---

When testing filesystem isolation internally, we ran into an issue where 
mounting a regular file into the task filesystem failed with exit code 32 since 
the mount destination did not exist. To account for this, we'll touch an empty 
file in the taskfs.


Diffs (updated)
-

  examples/vagrant/upstart/aurora-scheduler.conf 
e68a801017ae02e0ed581129e12a645ccc1e35d4 
  src/main/python/apache/aurora/executor/common/sandbox.py 
9d6e73c4221302c56596ef3591bdeab41c81535e 
  src/test/python/apache/aurora/executor/common/test_sandbox.py 
a441d2a60e413395b74b62445846e3871784e729 
  src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
438eb0f03128d21b3201d0a843adebe09422c75e 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
46309371a33d26bdd467b5fa131634354be2a978 

Diff: https://reviews.apache.org/r/55347/diff/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Joshua Cohen



Re: Review Request 55347: Ensure destination exists when mounting files into a filesystem image.

2017-01-09 Thread Joshua Cohen


> On Jan. 9, 2017, 10:57 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/aurora/executor/common/sandbox.py, lines 285-291
> > <https://reviews.apache.org/r/55347/diff/1/?file=1600157#file1600157line285>
> >
> > I tried a `mkdir` followed by `touch` and the path still continues to 
> > be a directory. Following this the mount actually fails.
> > 
> > 
> > $ touch source
> > $ mkdir dest
> > $ touch dest/
> > 
> > $ ls -lrth
> > total 4.0K
> > -rw-r--r-- 1 sshanmugham employee0 Jan  9 22:52 source
> > drwxr-xr-x 2 sshanmugham employee 4.0K Jan  9 22:52 dest
> > 
> > $ sudo mount -n --rbind source dest
> > mount: Not a directory
> > 
> > $ echo $?
> > 32

Yeah, this is a bug Stephan caught above, introduced when I cleaned up the 
internal impl for this patch without paying close attention. Have it fixed 
locally, just working on an e2e test.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55347/#review160995
---


On Jan. 9, 2017, 5:38 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55347/
> ---
> 
> (Updated Jan. 9, 2017, 5:38 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When testing filesystem isolation internally, we ran into an issue where 
> mounting a regular file into the task filesystem failed with exit code 32 
> since the mount destination did not exist. To account for this, we'll touch 
> an empty file in the taskfs.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/sandbox.py 
> 9d6e73c4221302c56596ef3591bdeab41c81535e 
>   src/test/python/apache/aurora/executor/common/test_sandbox.py 
> a441d2a60e413395b74b62445846e3871784e729 
> 
> Diff: https://reviews.apache.org/r/55347/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Review Request 55347: Ensure destination exists when mounting files into a filesystem image.

2017-01-09 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55347/
---

Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.


Repository: aurora


Description
---

When testing filesystem isolation internally, we ran into an issue where 
mounting a regular file into the task filesystem failed with exit code 32 since 
the mount destination did not exist. To account for this, we'll touch an empty 
file in the taskfs.


Diffs
-

  src/main/python/apache/aurora/executor/common/sandbox.py 
9d6e73c4221302c56596ef3591bdeab41c81535e 
  src/test/python/apache/aurora/executor/common/test_sandbox.py 
a441d2a60e413395b74b62445846e3871784e729 

Diff: https://reviews.apache.org/r/55347/diff/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Joshua Cohen



Re: Review Request 55105: AURORA-1870 Add finer grained timings to the Snapshot process

2017-01-04 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55105/#review160521
---




src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
(lines 520 - 532)
<https://reviews.apache.org/r/55105/#comment231669>

Can these be default methods on the interface rather than converting to an 
abstract class?


- Joshua Cohen


On Dec. 30, 2016, 9:18 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55105/
> ---
> 
> (Updated Dec. 30, 2016, 9:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Bugs: AURORA-1870
> https://issues.apache.org/jira/browse/AURORA-1870
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1870 Add finer grained timings to the Snapshot process
> 
> I gave up on `@Timed` interceptor approach because major refactoring is 
> required in order to have snapshot fields instantiated by Guice through 
> `Provider` or `@Provides` interfaces. The abstract class approach is much 
> easier/cleaner.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 7aa111ec14696ae40f518c42f3c7f45d8ab0e94c 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  f56a1624c7188da175ad3e6de323c3442802f2ea 
> 
> Diff: https://reviews.apache.org/r/55105/diff/
> 
> 
> Testing
> ---
> 
> ```
> $ curl localhost:8081/vars | grep snapshot_restore
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
> snapshot_restore_crons_events 1
> snapshot_restore_crons_events_per_sec 0.0
> snapshot_restore_crons_nanos_per_event 0.0
> snapshot_restore_crons_nanos_total 73648
> snapshot_restore_crons_nanos_total_per_sec 0.0
> snapshot_restore_dbscript_events 1
> snapshot_restore_dbscript_events_per_sec 0.0
> snapshot_restore_dbscript_nanos_per_event 0.0
> snapshot_restore_dbscript_nanos_total 1148842021
> snapshot_restore_dbscript_nanos_total_per_sec 0.0
> snapshot_restore_hosts_events 1
> snapshot_restore_hosts_events_per_sec 0.0
> snapshot_restore_hosts_nanos_per_event 0.0
> snapshot_restore_hosts_nanos_total 76166
> snapshot_restore_hosts_nanos_total_per_sec 0.0
> snapshot_restore_job_updates_events 1
> snapshot_restore_job_updates_events_per_sec 0.0
> snapshot_restore_job_updates_nanos_per_event 0.0
> snapshot_restore_job_updates_nanos_total 49482
> snapshot_restore_job_updates_nanos_total_per_sec 0.0
> snapshot_restore_locks_events 1
> snapshot_restore_locks_events_per_sec 0.0
> snapshot_restore_locks_nanos_per_event 0.0
> snapshot_restore_locks_nanos_total 125084
> snapshot_restore_locks_nanos_total_per_sec 0.0
> snapshot_restore_quota_events 1
> snapshot_restore_quota_events_per_sec 0.0
> snapshot_restore_quota_nanos_per_event 0.0
> snapshot_restore_quota_nanos_total 52305
> snapshot_restore_quota_nanos_total_per_sec 0.0
> snapshot_restore_scheduler_metadata_events 1
> snapshot_restore_scheduler_metadata_events_per_sec 0.0
> snapshot_restore_scheduler_metadata_nanos_per_event 0.0
> snapshot_restore_scheduler_metadata_nanos_total 70816
> snapshot_restore_scheduler_metadata_nanos_total_per_sec 0.0
> snapshot_restore_tasks_events 1
> snapshot_restore_tasks_events_per_sec 0.0
> snapshot_restore_tasks_nanos_per_event 0.0
> snapshot_restore_tasks_nanos_total 91377
> snapshot_restore_tasks_nanos_total_per_sec 0.0
> ```
> 
> ```
> $ aurora_admin scheduler_snapshot devcluster
>  INFO] Response from scheduler: OK (message: )
>  
> $ curl localhost:8081/vars | grep snapshot_save
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
> 100 482260 482260 0  3266k  0 --:--:-- --:--:-- --:--:-- 3363k
> snapshot_save_crons_events 1
> snapshot_save_crons_events_per_sec 0.0
> snapshot_save_crons_nanos_per_event 0.0
> snapshot_save_crons_nanos_total 466181
> snapshot_save_crons_nanos_total_per_sec 0.0
> snapshot_save_dbscript_events 1
> snapshot_save_dbscript_events_per_sec 0.0
> snapshot_save_dbscript_nanos_per_event 0.0
> snapshot_save_dbscript_nanos_total 1

Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler

2017-01-04 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160515
---


Ship it!




LGTM pending additional testing per Stephan's comments.

- Joshua Cohen


On Jan. 4, 2017, 5:16 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55179/
> ---
> 
> (Updated Jan. 4, 2017, 5:16 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1820
> https://issues.apache.org/jira/browse/AURORA-1820
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `TimedOutTaskHandler` acquires storage write lock for every task every time 
> they transition to a transient state. It then verifies after a default 
> time-out period of 5 minutes if the task has transitioned out of the 
> transient state.
> 
> The verification step takes place while holding the storage write lock. In 
> over 99% of cases the logic short-circuits and returns from 
> `StateManagerImpl.updateTaskAndExternalState()` once it learns task has 
> transitioned out of the transient state.
> 
> This patch reduces storage write lock contention by adopting Double-Checked 
> Locking pattern in `TimedOutTaskHandler.run()`.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/reconciliation/TaskTimeout.java 
> 2dc9bc2c6916595270187f0f29d5bd8c5ba7e9ad 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java 
> 1006ddb6caea015c2d4e014bd044f2933541c84f 
> 
> Diff: https://reviews.apache.org/r/55179/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 22759
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  25m36.144s
> user  0m1.358s
> sys   0m0.595s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler

2017-01-04 Thread Joshua Cohen


> On Jan. 4, 2017, 5:39 p.m., Stephan Erb wrote:
> > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java,
> >  line 130
> > <https://reviews.apache.org/r/55179/diff/1/?file=1596711#file1596711line130>
> >
> > Given the motivation of the patch, we should probably check that we did 
> > not try to acquire the storage lock.

+1


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160510
---


On Jan. 4, 2017, 5:16 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55179/
> ---
> 
> (Updated Jan. 4, 2017, 5:16 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1820
> https://issues.apache.org/jira/browse/AURORA-1820
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `TimedOutTaskHandler` acquires storage write lock for every task every time 
> they transition to a transient state. It then verifies after a default 
> time-out period of 5 minutes if the task has transitioned out of the 
> transient state.
> 
> The verification step takes place while holding the storage write lock. In 
> over 99% of cases the logic short-circuits and returns from 
> `StateManagerImpl.updateTaskAndExternalState()` once it learns task has 
> transitioned out of the transient state.
> 
> This patch reduces storage write lock contention by adopting Double-Checked 
> Locking pattern in `TimedOutTaskHandler.run()`.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/reconciliation/TaskTimeout.java 
> 2dc9bc2c6916595270187f0f29d5bd8c5ba7e9ad 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java 
> 1006ddb6caea015c2d4e014bd044f2933541c84f 
> 
> Diff: https://reviews.apache.org/r/55179/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 22759
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  25m36.144s
> user  0m1.358s
> sys   0m0.595s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 54883: Move snapshots into a separate log

2016-12-27 Thread Joshua Cohen


> On Dec. 27, 2016, 10:50 p.m., Joshua Cohen wrote:
> > Overall this looks good to me. I think some work would be required to 
> > productionize it (e.g. make it optional to start). Obviously want to vet 
> > this in a test cluster and we would need some doc changes to go with this 
> > and RELEASE-NOTES.md, etc. but that's easy enough to take care of.
> > 
> > I also want to be sure I understand the behavior in the event of a problem 
> > with one, or both replicated logs. So... scenarios:
> > 
> > 1) One log or the other is missing/corrupt: our state becomes the sum of 
> > whatever's found in the log that's present/valid.
> > 2) There's no chance of a snapshot being persisted that references a log 
> > position that hasn't been persisted to the operation log, right? E.g. 
> > there's no realistic way that appending the noop transaction could take 
> > longer than creating/persisting the snapshot (in which case the snapshot 
> > could be successfully persisted referencing that position, yet that 
> > position would not exist in the operation log).
> 
> David McLaughlin wrote:
> 1) It's exactly the same as today if the log is corrupt - you need to 
> restore from a backup. 
> 2) No, that is not possible. Log appends are synchronous. 
> 
> I'm -1 to making this optional, purely because that doesn't reduce the 
> risk in any way - you'd just be putting off adding flags to your start-up 
> script. But yeah, it means I'll have to vet this change in a production-like 
> environment before this gets committed.

I'd like to hear what others running Aurora in production think about putting 
this behind a flag. This feels like a big change to force on people and I think 
if this came from someone outside of our org we'd like the flexibility to 
enable this on our own timeframe rather than being blocked from picking up all 
updates until we were comfortable/ready.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54883/#review160188
---


On Dec. 22, 2016, 9:26 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54883/
> ---
> 
> (Updated Dec. 22, 2016, 9:26 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, John Sirois, Santhosh Kumar 
> Shanmugham, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> See here for more details: 
> https://docs.google.com/document/d/1QVSEfeoCyt2D6cCmTCxy8-epufcuzIfnqRUkyT1betY/edit?usp=sharing
> 
> The motivation for this patch is the realisation that if we store Snapshots 
> outside of the main transaction log, we only ever to need to hold the storage 
> lock during snapshot creation. This would reduce the time writes are blocked 
> during snapshots by 80% for a large production cluster. I'm also doing this 
> with a view to future work, which is described in the document above. 
> 
> As long as everything works fine, this change is as simple as adding two new 
> args to the scheduler JVM. But it's worth noting that once this change is 
> deployed and the first snapshot is written, you will no longer be able to 
> rollback without first restoring from a backup.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 9e4213f13255a182df938bea44ca87fa03a25318 
>   examples/vagrant/aurorabuild.sh dbec54d3e677db8cb0f02849e5373e619629fc7c 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> e68a801017ae02e0ed581129e12a645ccc1e35d4 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 43cc5b4645bc26b0fc6b23726ad3292699048ded 
>   src/main/java/org/apache/aurora/scheduler/log/Log.java 
> dc77eb435e5f8fdce56727a2f679e8e1907e977c 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/DualLogModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java 
> 21855e184fe20dc339713978b32344b6950701ec 
>   
> src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
> 6704a328a4023a178ed8f86ae4772cb04eb2fa8e 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
> cfa9c56c909bac2e885e33a9fe772cb41cbd9fa9 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
> 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/Snapsho

Re: Review Request 54883: Move snapshots into a separate log

2016-12-27 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54883/#review160188
---



Overall this looks good to me. I think some work would be required to 
productionize it (e.g. make it optional to start). Obviously want to vet this 
in a test cluster and we would need some doc changes to go with this and 
RELEASE-NOTES.md, etc. but that's easy enough to take care of.

I also want to be sure I understand the behavior in the event of a problem with 
one, or both replicated logs. So... scenarios:

1) One log or the other is missing/corrupt: our state becomes the sum of 
whatever's found in the log that's present/valid.
2) There's no chance of a snapshot being persisted that references a log 
position that hasn't been persisted to the operation log, right? E.g. there's 
no realistic way that appending the noop transaction could take longer than 
creating/persisting the snapshot (in which case the snapshot could be 
successfully persisted referencing that position, yet that position would not 
exist in the operation log).


src/main/java/org/apache/aurora/scheduler/log/mesos/DualLogModule.java (lines 
33 - 34)
<https://reviews.apache.org/r/54883/#comment231274>

Is there any benefit in having `MesosLogStreamModule` create the directory 
if an operator has to run `mesos-log --initialize` out of band? Should we just 
add an `@Exists` annotation to the args and drop the `mkdirs` call in 
`MesosLogStreamModule`?



src/main/java/org/apache/aurora/scheduler/log/mesos/DualLogModule.java (line 38)
<https://reviews.apache.org/r/54883/#comment231273>

fix this to reference snapshot, rather than native log data.



src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
(line 107)
<https://reviews.apache.org/r/54883/#comment231263>

nit, `Class` isn't reserved as part of a longer variable name, so just 
`logTypeClass`? Or better yet, `Class` is implied by the type, so just 
`logType`?



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java (line 533)
<https://reviews.apache.org/r/54883/#comment231268>

Use `{}` for log replacements instead of string concat (I thought we had 
checkstyle set up to alert on these now?).



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java (line 542)
<https://reviews.apache.org/r/54883/#comment231269>

This is probably a naive question, but why do we need to noop transaction? 
Can we not just `truncateBefore(snapshotPosition - 1)`? Are the log positions 
just not incremented in this fashion?



src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java (line 
87)
<https://reviews.apache.org/r/54883/#comment231270>

nit: blank line between description and `@param`/`@return`



src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java (lines 
92 - 94)
<https://reviews.apache.org/r/54883/#comment231271>

same here.



src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
(line 106)
<https://reviews.apache.org/r/54883/#comment231272>

+blank line before


- Joshua Cohen


On Dec. 22, 2016, 9:26 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54883/
> -------
> 
> (Updated Dec. 22, 2016, 9:26 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, John Sirois, Santhosh Kumar 
> Shanmugham, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> See here for more details: 
> https://docs.google.com/document/d/1QVSEfeoCyt2D6cCmTCxy8-epufcuzIfnqRUkyT1betY/edit?usp=sharing
> 
> The motivation for this patch is the realisation that if we store Snapshots 
> outside of the main transaction log, we only ever to need to hold the storage 
> lock during snapshot creation. This would reduce the time writes are blocked 
> during snapshots by 80% for a large production cluster. I'm also doing this 
> with a view to future work, which is described in the document above. 
> 
> As long as everything works fine, this change is as simple as adding two new 
> args to the scheduler JVM. But it's worth noting that once this change is 
> deployed and the first snapshot is written, you will no longer be able to 
> rollback without first restoring from a backup.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 9e4213f13255a182df938bea44ca87fa03a25318 
>   examples/vagrant/aurorabuild.sh dbec54d3e677db8cb0f02849e5373e619629fc7c 
>   examples/vagrant/upstart/aurora-sch

Re: Review Request 54624: Expose stats on ZooKeeper connection state

2016-12-27 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54624/#review160175
---


Fix it, then Ship it!




lgtm modulo the below.


src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java
 (lines 68 - 77)
<https://reviews.apache.org/r/54624/#comment231247>

It's prefereble to inject the `StatsProvider` instance and use that to 
create the counters in the constructor rather than to reference 
`Stats.STATS_PROVIDER` directly (if we ever bound another instance of 
`StatsProvider` in `AppModule` this class would not be aware of that, nor would 
we be able to use a mock stats provider for tests if we so desired.


- Joshua Cohen


On Dec. 20, 2016, 9:33 a.m., Jing Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54624/
> ---
> 
> (Updated Dec. 20, 2016, 9:33 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Mehrdad Nurolahzade, Stephan Erb, 
> and Zameer Manji.
> 
> 
> Bugs: AURORA-1838
> https://issues.apache.org/jira/browse/AURORA-1838
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Expose stats on ZooKeeper connection state
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java
>  999a542796858dcfe9e31601c47239189043fd52 
> 
> Diff: https://reviews.apache.org/r/54624/diff/
> 
> 
> Testing
> ---
> 
> http://192.168.33.7:8081/vars
> ```
> zk_connection_state_CONNECTED 1
> zk_connection_state_CONNECTED_counter 1
> zk_connection_state_LOST 0
> zk_connection_state_LOST_counter 0
> zk_connection_state_READ_ONLY 0
> zk_connection_state_READ_ONLY_counter 0
> zk_connection_state_RECONNECTED 0
> zk_connection_state_RECONNECTED_counter 0
> zk_connection_state_SUSPENDED 0
> zk_connection_state_SUSPENDED_counter 0
> ```
> 
> * zk_connection_state_STATE shows 1 if STATE is current connection state, 
> otherwise 0.
> * zk_connection_state_STATE_counter represents occurence times of the STATE 
> since scheduler state
> 
> 
> Thanks,
> 
> Jing Chen
> 
>



Re: Review Request 54847: Remove ignored snapshot stats. Add high-level timings on storage start-up lifecycle.

2016-12-19 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54847/#review159616
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 18, 2016, 11:51 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54847/
> ---
> 
> (Updated Dec. 18, 2016, 11:51 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The stats I added around SnapshotFields are not applied because they are not 
> Guice-created instances. I've just removed them for now (adding timings 
> manually would be very repetitive here so there's probably a better 
> abstraction we can use).
> 
> I added metrics to cover two pretty large parts of the Scheduler start-up 
> time: storage.prepare and storage.start.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
> 9c9218ca6294eaddeb47b274393603bf47af61a8 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 239f2eb475bec20acfcc2990d5d933f5bec83ed4 
> 
> Diff: https://reviews.apache.org/r/54847/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 54774: Avoid double writing job updates to the Scheduler Snapshot

2016-12-16 Thread Joshua Cohen


> On Dec. 16, 2016, 1:13 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java,
> >  lines 69-72
> > <https://reviews.apache.org/r/54774/diff/6/?file=1586548#file1586548line69>
> >
> > I feel like we should change the default to an empty set. The largest 
> > number of Aurora users will not need those fields and there is therefore no 
> > need to populate those by default.

I think at least for one release it should match the current behavior. After 
0.18.0 we can default it to empty?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54774/#review159433
---


On Dec. 15, 2016, 8:04 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54774/
> ---
> 
> (Updated Dec. 15, 2016, 8:04 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1861
> https://issues.apache.org/jira/browse/AURORA-1861
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Motivation: Thanks to the mybatis query metrics we added, we found that 
> double writing Snapshot fields for H2 stores adds considerable overhead to 
> our snapshot creation time. 
> 
> Snapshots are also written as backups, and many operators choose to process 
> backups offline for analytics, rather than query the live scheduler (due to 
> not being able to scale reads horizontally). So this allows operators to 
> enable/disable the hydrated fields as needed.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 90c4793b218f54ad92ab4da5049d5e92c3a104b3 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  3fa408e283f91b313633959ea6d2e730d4dc0771 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
> 7dcd1bfcf303cf374e9a6627cb6c632ccea098f2 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 853780bc68400489578ed3692920aafcec42c999 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 7a11850e217dcb0148e4a4d33542c95b2e53a726 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  cf0a8f3ea11e9c48d1f16441af54dc781b33bdfc 
> 
> Diff: https://reviews.apache.org/r/54774/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> Applied this to an internal test cluster with the update hydration disabled 
> and verified snapshots and backups and failover.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 54774: Avoid double writing job updates to the Scheduler Snapshot

2016-12-15 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54774/#review159347
---


Fix it, then Ship it!




lgtm modulo the below.


src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
(line 128)
<https://reviews.apache.org/r/54774/#comment230310>

This `@Timed` annotation is on the wrong method now.


- Joshua Cohen


On Dec. 15, 2016, 6:16 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54774/
> ---
> 
> (Updated Dec. 15, 2016, 6:16 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1861
> https://issues.apache.org/jira/browse/AURORA-1861
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Motivation: Thanks to the mybatis query metrics we added, we found that 
> double writing Snapshot fields for H2 stores adds considerable overhead to 
> our snapshot creation time. 
> 
> Snapshots are also written as backups, and many operators choose to process 
> backups offline for analytics, rather than query the live scheduler (due to 
> not being able to scale reads horizontally). So this allows operators to 
> enable/disable the hydrated fields as needed.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  3fa408e283f91b313633959ea6d2e730d4dc0771 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
> 7dcd1bfcf303cf374e9a6627cb6c632ccea098f2 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 853780bc68400489578ed3692920aafcec42c999 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 7a11850e217dcb0148e4a4d33542c95b2e53a726 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  cf0a8f3ea11e9c48d1f16441af54dc781b33bdfc 
> 
> Diff: https://reviews.apache.org/r/54774/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> I'll apply this to one of our test clusters before merging to master too.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 54774: Avoid double writing job updates to the Scheduler Snapshot

2016-12-15 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54774/#review159305
---



I think there's some relevant context in the original review for this feature: 
https://reviews.apache.org/r/44471/. There was concern in that review with the 
removal of the native fields from snapshots for the reasons you mentioned in 
the associated ticket. I think that's still a concern, and there's the 
possibility that tooling exists outside of Aurora that depends on these fields 
existing in the snapshot.

I'm generally in favor of the changes you've outlined, as they should 
dramatically improve snapshot performance, however perhaps it makes sense to 
put this behind a command line flag? Maybe a single flag that allows for 
control over which snapshot fields are emitted in addition to the dbScript 
(this would require adding something like `getName` to `SnapshotField`). I'd 
also be fine with providing some tooling that converts the dbScript in a 
snapshot to a fully hydrated thrift snapshot. Either way, I don't think we 
should ship this without some accommodation for that use case. Maybe start with 
the flag (as it's much easier to add), and follow up with the additional 
tooling at a later date (at which point we could kill the flag and all of the 
`SnapshotField`s that are already encompassed by the `dbScript`.)


src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
(lines 315 - 317)
<https://reviews.apache.org/r/54774/#comment230279>

Calling `hasDbSnapshot` has nothing to do with whether or not there's a 
memory store. It has to do with whether or not there's a `dbScript` in the 
snapshot. That's a relatively recent addition to snapshots. Prior to that we 
fetched everything from H2 and dumped it to the snapshot as individual thrift 
entries.


- Joshua Cohen


On Dec. 15, 2016, 7:57 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54774/
> ---
> 
> (Updated Dec. 15, 2016, 7:57 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1861
> https://issues.apache.org/jira/browse/AURORA-1861
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Motivation: Thanks to the mybatis query metrics we added, we found that the 
> redundant query I'm removing here currently adds 20s to the snapshot creation 
> time. This is an extra 20s unavailability every time we snapshot on our prod 
> clusters. 
> 
> ===
> 
> Avoid double writing job updates to the Scheduler Snapshot. This should be 
> low-risk because:
> 
> 1) Job Updates have *never* had an in-memory store implementation. 
> 2) They are also relatively new, so the chances of some old code processing 
> job updates from backups is fairly low?
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> d2c859055f905ac76ee8eb387dca103b9857ddbe 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 7a11850e217dcb0148e4a4d33542c95b2e53a726 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  cf0a8f3ea11e9c48d1f16441af54dc781b33bdfc 
> 
> Diff: https://reviews.apache.org/r/54774/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> I'll apply this to one of our test clusters before merging to master too.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 54773: Add finer grained timings to the Snapshot process.

2016-12-15 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54773/#review159302
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 15, 2016, 7:14 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54773/
> ---
> 
> (Updated Dec. 15, 2016, 7:14 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add finer grained timings to the Snapshot process. I also added some log 
> output, as I found those existing numbers handy when investigating our long 
> snapshot times. 
> 
> Related ticket: https://issues.apache.org/jira/browse/AURORA-1861
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
> 39924c62108f6a68f545f90d77ceab4265d41fad 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> d2c859055f905ac76ee8eb387dca103b9857ddbe 
> 
> Diff: https://reviews.apache.org/r/54773/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 52669: Move the H2 database off heap.

2016-12-13 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52669/#review159028
---



Just a quick note: I enabled this to faciliate something else I'm working on, 
and I discovered that the `/h2console` endpoint doesn't seem to work with these 
changes applied. That endpoint is not well documented, but it's still important 
to have available. We probably need to do some work to get it working reliably 
in vagrant (i.e. no hacks to disable security) and then add some e2e tests 
around it.

- Joshua Cohen


On Oct. 11, 2016, 6:17 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52669/
> ---
> 
> (Updated Oct. 11, 2016, 6:17 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, John Sirois, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This experiment is inspired by David's comment: "I don’t think the
> storage engine matters. We just need to be able to offload it from
> the Scheduler JVM. The problem with H2 isn’t SQL or anything else,
> it’s the GC pressure."
> 
> Basic idea is to switch to another storage backend: "nioMemFS stores
> data outside of the VM's heap - useful for large memory DBs without
> incurring GC costs" (http://www.h2database.com/html/advanced.html)
> 
> Our micro-benchmarks look promising
> 
> Current Master (on-heap db with latest versions):
> 
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   1  thrpt5  72851.249 ± 15794.210  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   5  thrpt5  31626.929 ± 17326.988  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A  10  thrpt5  0.078 ± 0.013  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.runN/A 
>  N/A N/A   1  thrpt5414.135 ±   315.838  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.runN/A 
>  N/A N/A   5  thrpt5 68.643 ±24.303  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.runN/A 
>  N/A N/A  10  thrpt5 32.032 ±13.870  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 1000 N/A N/A  thrpt5143.981 ±78.985  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
> 5000 N/A N/A  thrpt5 35.224 ±25.593  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A 
>1 N/A N/A  thrpt5 18.869 ± 3.318  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run  1 
>  N/A N/A N/A  thrpt5 36.013 ±19.743  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run 10 
>  N/A N/A N/A  thrpt5 33.813 ±11.216  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run100 
>  N/A N/A N/A  thrpt5 20.516 ±10.526  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run   1000 
>  N/A N/A N/A  thrpt5 16.564 ± 2.993  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A  10 N/A  thrpt5 32.399 ±21.310  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A 100 N/A  thrpt5 35.518 ± 7.468  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A1000 N/A  thrpt5 19.757 ±10.035  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run  N/A 
>  N/A   1 N/A  thrpt5 10.849 ±10.660  ops/s
> 
> This patch (off-heap db):
> 
> Benchmark (instanceOverrides)  
> (instances)  (metadata)  (numTasks)   Mode  Cnt  Score   Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run N/A 
>  N/A N/A   1  thrpt5  77746.436 ± 47191.240  ops/s
> TaskStoreBenchmarks.DBFetchTas

Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-12 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> > Why do we only sample active updates, seems like we could miss data 
> > points? Especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will be 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty, historically how many updates were active at any given 
> time. But anecdotatlly it's a small fraction of the total number of updates 
> in the store, generously speaking I'd say 1-2%. That being the case, by 
> including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair trade off to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into, this can give a much more accurate 
> value.
> 
> Joshua Cohen wrote:
> I think that would filter out updates that are currently active but have 
> not have an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial 
> example would be an update that processes batches of one instance where each 
> instance takes more than a minute to update.
> 
> Santhosh Kumar Shanmugham wrote:
> I am talking about this part of the code.
> 
>  .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
>       && timeFrame.contains(taskEvent.getTimestamp()))
>   
> I think I misspoke about the event type, it is a `TaskEvent`.
> 
> Joshua Cohen wrote:
> I'm not sure I follow? The discussion above was related to whether we 
> should be filtering completed updates, or whether we should iterate all 
> updates in the store. Querying for only task events active in the last 
> `SLA_REFRESH_INTERVAL` wouldn't be helpful for a few reasons:
> 
> 1) We already have the full list of tasks in `MetricCalculator` for use 
> in other SLA calculations.
> 2) By the time we iterate task events to find the `ASSIGNED` event, we've 
> already iterated the update details.
> 2) The number of task events at this point will be small... maybe 5 or 6 
> tops.
> 
> Is there something I'm missing?
> 
> Santhosh Kumar Shanmugham wrote:
> I feel that the current logic has correctness issues, since it can miss 
> arbitrary number of data points and makes the metric unreliable. My 
> suggestion is to lead to the creation a metric that is reliable at the 
> expense of maybe more work.
> 
> David McLaughlin wrote:
> Oh so I was not proposing querying all updates rather than just active 
> ones. Santhosh's suggestion was more in line with what I was thinking as an 
> alternative to avoid the data loss. We might not have storage support for 
> such a query right now (or maybe we do?), but could be worth adding to 
> support this?

I'm still not clear what you guys are asking for. Are you just saying rather 
than filtering out all non-active updates, we should instead filter out all 
updates that have been not been active with some timeframe?

If so, using just `SLA_REFRESH_INTERVAL` would not work, since that raises the 
issues in my original response to Santhosh (i.e. an update that is active, but 
where instances take a long time to transition to running would be missed). We 
could pick just pick some abritrary value to try and minimize the chances of 
that occurring, e.g. any update that has instance update events within the past 
10 minutes is included... This still has the chance of missing some data, 
though unlikely unless under a very specific set of circumstances.

I think the current impl is also not likely to miss much data though. Since 
we're timing from `UPDATING` to `ASSIGNED`,  that means the task would have to 
be assigned and the entire update would need to complete within the same 
`SLA_REFRESH_INTERVAL` for the da

Review Request 54669: Fix thrift bootstrap to use python2.7.

2016-12-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54669/
---

Review request for Aurora and John Sirois.


Repository: aurora


Description
---

Fix thrift bootstrap to use python2.7.


Diffs
-

  build-support/thrift/Makefile 02e983582fb1a89efeee60a2de08f9050f8d17c3 

Diff: https://reviews.apache.org/r/54669/diff/


Testing
---

Verified this patch on a machine where python2.7 is not the default system 
python.


Thanks,

Joshua Cohen



Re: Review Request 54624: Expose stats on ZooKeeper connection state

2016-12-10 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54624/#review158773
---



I think the idea behind the ticket was to track these as stats, not just into 
the log.

- Joshua Cohen


On Dec. 10, 2016, 11:04 a.m., Jing Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54624/
> ---
> 
> (Updated Dec. 10, 2016, 11:04 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1838
> https://issues.apache.org/jira/browse/AURORA-1838
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Expose stats on ZooKeeper connection state
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java
>  999a542796858dcfe9e31601c47239189043fd52 
> 
> Diff: https://reviews.apache.org/r/54624/diff/
> 
> 
> Testing
> ---
> 
> State Change is logged into aurora-scheduler.log:
> ```
> I1210 10:44:37.973 [main-EventThread, ConnectionStateManager] State change: 
> CONNECTED
> ```
> 
> 
> Thanks,
> 
> Jing Chen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> > Why do we only sample active updates, seems like we could miss data 
> > points? Especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will be 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty, historically how many updates were active at any given 
> time. But anecdotatlly it's a small fraction of the total number of updates 
> in the store, generously speaking I'd say 1-2%. That being the case, by 
> including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair trade off to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into, this can give a much more accurate 
> value.
> 
> Joshua Cohen wrote:
> I think that would filter out updates that are currently active but have 
> not have an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial 
> example would be an update that processes batches of one instance where each 
> instance takes more than a minute to update.
> 
> Santhosh Kumar Shanmugham wrote:
> I am talking about this part of the code.
> 
>  .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
>   && timeFrame.contains(taskEvent.getTimestamp()))
>   
> I think I misspoke about the event type, it is a `TaskEvent`.

I'm not sure I follow? The discussion above was related to whether we should be 
filtering completed updates, or whether we should iterate all updates in the 
store. Querying for only task events active in the last `SLA_REFRESH_INTERVAL` 
wouldn't be helpful for a few reasons:

1) We already have the full list of tasks in `MetricCalculator` for use in 
other SLA calculations.
2) By the time we iterate task events to find the `ASSIGNED` event, we've 
already iterated the update details.
2) The number of task events at this point will be small... maybe 5 or 6 tops.

Is there something I'm missing?


- Joshua


-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 9:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 9:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.

Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> > Why do we only sample active updates, seems like we could miss data 
> > points? Especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will be 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty, historically how many updates were active at any given 
> time. But anecdotatlly it's a small fraction of the total number of updates 
> in the store, generously speaking I'd say 1-2%. That being the case, by 
> including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair trade off to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into, this can give a much more accurate 
> value.

I think that would filter out updates that are currently active but have not 
have an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial example 
would be an update that processes batches of one instance where each instance 
takes more than a minute to update.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 9:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 9:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54567: Fixup prepare_binary.sh to work under modern bash.

2016-12-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54567/#review158638
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 9, 2016, 3:58 a.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54567/
> ---
> 
> (Updated Dec. 9, 2016, 3:58 a.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Previously pushd/popd were used and these emit data to stdout muddying
> pants output and breaking setup of the thrift serve dir structure.
> 
>  build-support/thrift/prepare_binary.sh | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> 
> Diffs
> -
> 
>   build-support/thrift/prepare_binary.sh 
> e3b46aa1f36ed3eecc9bb7794073d18dd950eb1c 
> 
> Diff: https://reviews.apache.org/r/54567/diff/
> 
> 
> Testing
> ---
> 
> Reproduced the current master CI failure before this change on my linux 
> machine.
> Fixed with this change.
> 
> 
> Thanks,
> 
> John Sirois
> 
>



Review Request 54569: Change back to thrift directory silently after executing pants command.

2016-12-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54569/
---

Review request for Aurora and John Sirois.


Repository: aurora


Description
---

Change back to thrift directory silently after executing pants command.


Diffs
-

  build-support/thrift/prepare_binary.sh 
e3b46aa1f36ed3eecc9bb7794073d18dd950eb1c 

Diff: https://reviews.apache.org/r/54569/diff/


Testing
---

Ran this in vagrant.


Thanks,

Joshua Cohen



Re: Review Request 54550: Specify python2.7 when extracting options from pants.ini during thrift bootstrap.

2016-12-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54550/
---

(Updated Dec. 9, 2016, 2:30 a.m.)


Review request for Aurora and Santhosh Kumar Shanmugham.


Changes
---

set -o xtrace


Repository: aurora


Description
---

We have some CI machines internally where the default python doesn't have the 
`json` module, so this script fails.


Diffs (updated)
-

  build-support/thrift/prepare_binary.sh 
a96b33e4564604f72d8fcf284f0cf1adea1866a9 

Diff: https://reviews.apache.org/r/54550/diff/


Testing
---


Thanks,

Joshua Cohen



Re: Review Request 54550: Specify python2.7 when extracting options from pants.ini during thrift bootstrap.

2016-12-08 Thread Joshua Cohen


> On Dec. 9, 2016, 2:22 a.m., Zameer Manji wrote:
> > build-support/thrift/prepare_binary.sh, line 3
> > <https://reviews.apache.org/r/54550/diff/2/?file=1580208#file1580208line3>
> >
> > To be consistent can you do `set -o xtrace`?

TIL


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54550/#review158625
---


On Dec. 9, 2016, 2:15 a.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54550/
> ---
> 
> (Updated Dec. 9, 2016, 2:15 a.m.)
> 
> 
> Review request for Aurora and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We have some CI machines internally where the default python doesn't have the 
> `json` module, so this script fails.
> 
> 
> Diffs
> -
> 
>   build-support/thrift/prepare_binary.sh 
> a96b33e4564604f72d8fcf284f0cf1adea1866a9 
> 
> Diff: https://reviews.apache.org/r/54550/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54550: Specify python2.7 when extracting options from pants.ini during thrift bootstrap.

2016-12-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54550/
---

(Updated Dec. 9, 2016, 2:15 a.m.)


Review request for Aurora and Santhosh Kumar Shanmugham.


Changes
---

I also ran into an issue where running pants from outside of the root directory 
fails:

$ pwd
/home/jcohen/aurora/build-support/thrift
$ /home/jcohen/aurora/pants
Exception caught: ()
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/bin/pants", 
line 11, in 
sys.exit(main())
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/pants_exe.py",
 line 44, in main
PantsRunner(exiter).run()
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/pants_runner.py",
 line 57, in run
options_bootstrapper=options_bootstrapper)
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/pants_runner.py",
 line 46, in _run
return LocalPantsRunner(exiter, args, env, 
options_bootstrapper=options_bootstrapper).run()
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/local_pants_runner.py",
 line 53, in run
self._maybe_profiled(self._run)
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/local_pants_runner.py",
 line 50, in _maybe_profiled
runner()
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/local_pants_runner.py",
 line 59, in _run
options, build_config = OptionsInitializer(options_bootstrapper, 
exiter=self._exiter).setup()
  File 
"/home/jcohen/.cache/pants/setup/bootstrap-Linux-x86_64/unspecified/lib/python2.7/site-packages/pants/bin/options_initializer.py",
 line 142, in setup
.format(global_bootstrap_options.pants_version, pants_version())
Exception message: Version mismatch: Requested version was 1.3.0.dev3, our 
version is 1.1.0.
$ cd ../..
$ pwd
/home/jcohen/aurora
$ /home/jcohen/aurora/pants
No goals specified.
Use `pants goals` to list goals.

Updated the script to cd to the root before getting the thrift options.


Repository: aurora


Description
---

We have some CI machines internally where the default python doesn't have the 
`json` module, so this script fails.


Diffs (updated)
-

  build-support/thrift/prepare_binary.sh 
a96b33e4564604f72d8fcf284f0cf1adea1866a9 

Diff: https://reviews.apache.org/r/54550/diff/


Testing
---


Thanks,

Joshua Cohen



Review Request 54550: Specify python2.7 when extracting options from pants.ini during thrift bootstrap.

2016-12-08 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54550/
---

Review request for Aurora and Santhosh Kumar Shanmugham.


Repository: aurora


Description
---

We have some CI machines internally where the default python doesn't have the 
`json` module, so this script fails.


Diffs
-

  build-support/thrift/prepare_binary.sh 
a96b33e4564604f72d8fcf284f0cf1adea1866a9 

Diff: https://reviews.apache.org/r/54550/diff/


Testing
---


Thanks,

Joshua Cohen



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Joshua Cohen


> On Dec. 7, 2016, 6:21 p.m., Mehrdad Nurolahzade wrote:
> > src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java, line 
> > 261
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578838#file1578838line261>
> >
> > I find the calculation logic in some of these test cases difficult to 
> > follow. 
> > 
> > Could you maybe add comments explaining in plain math what is the 
> > justification behind the hard-coded expected values?

Agree that it's hard to follow the calculations. I'll add some comments 
(reluctantly, since these are the types of comments that quickly get out of 
sync with reality if the underlying numbers change).


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158360
-------


On Dec. 7, 2016, 5:50 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 5:50 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> > Why do we only sample active updates, seems like we could miss data 
> > points? Especially for small updates.

My thinking was that the vast majority of updates in the store will be 
completed hours or days ago, so there's no need to consider them when 
calculating the mttu. You're right, this does mean that we might lose some data 
points for tasks that moved to `ASSIGNED` in the same `SLA_REFRESH_INTERVAL` 
(defaults to one minute) in which the entire update completed.

For reference, some general stats from one of our clusters: currently at 
off-peak hours, .02% of all updates in the update store are active. It's hard 
to say with certainty, historically how many updates were active at any given 
time. But anecdotatlly it's a small fraction of the total number of updates in 
the store, generously speaking I'd say 1-2%. That being the case, by including 
only active updates in the calculation, we reduce the work to be done by 
anywhere from 98 to 99.98 percent.

I feel like this is a fair trade off to make, but I'm not steadfast in that 
opinion.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
-------


On Dec. 7, 2016, 5:50 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 5:50 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54520: Revert BUILD changes in 0c177058.

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54520/#review158497
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 8, 2016, 4:39 a.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54520/
> ---
> 
> (Updated Dec. 8, 2016, 4:39 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The changes caused `python_tests` target to lose their sources which in
> turn caused tests not to run.
> 
>  src/main/python/apache/aurora/admin/BUILD   | 1 +
>  src/main/python/apache/aurora/client/BUILD  | 1 +
>  src/main/python/apache/aurora/common/BUILD  | 1 +
>  src/main/python/apache/aurora/config/BUILD  | 1 +
>  src/main/python/apache/aurora/executor/BUILD| 1 +
>  src/main/python/apache/aurora/kerberos/BUILD| 1 +
>  src/main/python/apache/aurora/tools/BUILD   | 1 +
>  src/main/python/apache/thermos/cli/BUILD| 1 +
>  src/main/python/apache/thermos/common/BUILD | 1 +
>  src/main/python/apache/thermos/config/BUILD | 1 +
>  src/main/python/apache/thermos/core/BUILD   | 1 +
>  src/main/python/apache/thermos/monitoring/BUILD | 1 +
>  src/main/python/apache/thermos/observer/BUILD   | 1 +
>  src/main/python/apache/thermos/runner/BUILD | 1 +
>  src/main/python/apache/thermos/testing/BUILD| 1 +
>  src/test/python/apache/aurora/BUILD | 1 +
>  src/test/python/apache/aurora/admin/BUILD   | 2 ++
>  src/test/python/apache/aurora/client/BUILD  | 2 ++
>  src/test/python/apache/aurora/client/api/BUILD  | 2 ++
>  src/test/python/apache/aurora/client/cli/BUILD  | 2 ++
>  src/test/python/apache/aurora/client/docker/BUILD   | 2 ++
>  src/test/python/apache/aurora/client/hooks/BUILD| 3 ++-
>  src/test/python/apache/aurora/common/BUILD  | 2 ++
>  src/test/python/apache/aurora/common/health_check/BUILD | 2 ++
>  src/test/python/apache/aurora/config/BUILD  | 2 ++
>  src/test/python/apache/aurora/executor/BUILD| 2 ++
>  src/test/python/apache/aurora/executor/bin/BUILD| 2 ++
>  src/test/python/apache/aurora/executor/common/BUILD | 2 ++
>  src/test/python/apache/aurora/tools/BUILD   | 2 ++
>  src/test/python/apache/thermos/cli/BUILD| 2 ++
>  src/test/python/apache/thermos/cli/commands/BUILD   | 2 ++
>  src/test/python/apache/thermos/common/BUILD | 2 ++
>  src/test/python/apache/thermos/config/BUILD | 2 ++
>  src/test/python/apache/thermos/core/BUILD   | 2 ++
>  src/test/python/apache/thermos/monitoring/BUILD | 2 ++
>  src/test/python/apache/thermos/observer/BUILD   | 2 ++
>  src/test/python/apache/thermos/observer/http/BUILD  | 2 ++
>  37 files changed, 58 insertions(+), 1 deletion(-)
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/admin/BUILD 
> 60bfbeda8369db71099a2056c509ea1f34eadaa6 
>   src/main/python/apache/aurora/client/BUILD 
> 452efd45beeb61042b24cf116690221eb95d02ca 
>   src/main/python/apache/aurora/common/BUILD 
> 85b5b45a099556a21d090e180645f0c680d62db3 
>   src/main/python/apache/aurora/config/BUILD 
> 14364ef77eeaa9fe5cd8df731c7047ba18beb361 
>   src/main/python/apache/aurora/executor/BUILD 
> bb6d4a3e21075ee4d56615e083059d7cae2e3e99 
>   src/main/python/apache/aurora/kerberos/BUILD 
> d1e6ded02ad8f71e56659fa4455dbbe14b965249 
>   src/main/python/apache/aurora/tools/BUILD 
> c970bec2c64226b22b624f39d21b2cedcaf4d60d 
>   src/main/python/apache/thermos/cli/BUILD 
> 784b05c6811a86589a5a1a64bc3d15302bb22479 
>   src/main/python/apache/thermos/common/BUILD 
> 023b87ff4ac28c58914a1a8d3943fb0554b1a259 
>   src/main/python/apache/thermos/config/BUILD 
> 729c6533d3e059c0e65bb69da3af464f1dd9d49f 
>   src/main/python/apache/thermos/core/BUILD 
> e800ccabfb780e8962686434f18caba522f6516b 
>   src/main/python/apache/thermos/monitoring/BUILD 
> a1c16ac782da3df78b325f5273f76c47a6a1febf 
>   src/main/python/apache/thermos/observer/BUILD 
> 714955d91e97fefc30841a2dd975f63962a444df 
>   src/main/python/apache/thermos/runner/BUILD 
> d51587842deeadfc316072b6c7fe4a584412db0a 
>   src/main/python/apache/thermos/testing/BUILD 
> 0964a7ddcb6558b6e685b3fa71

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review158490
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 8, 2016, 12:15 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> ---
> 
> (Updated Dec. 8, 2016, 12:15 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
> https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> It is possible to set the health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially a task fails health checks during the
> `initial_interval_secs` and an additional `max_consecutive_failures`,
> and then perform a successful health check to become healthy.
> 
> To be backward compatible to the above configuration, include the
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> d01fcb9594552eb6cdfbdbab2d03707738df3443 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> 12af9d8635a553eabe918a86508aa6ce2fd78a49 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> ---
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apacher/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 54255: Update to Mesos 1.1.0.

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54255/#review158355
---


Ship it!




Can you add links to the Mesos release notes to the review?


RELEASE-NOTES.md (line 5)
<https://reviews.apache.org/r/54255/#comment229120>

1.1.0 ;)


- Joshua Cohen


On Dec. 7, 2016, 11:04 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54255/
> ---
> 
> (Updated Dec. 7, 2016, 11:04 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1813
> https://issues.apache.org/jira/browse/AURORA-1813
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Included changes:
> 
> * Handle new task states introduced in the latest Mesos release.
> * Prevent NullPointer exception when inspecting an empty/invalid executor 
> config in a test.
>   Probably this is due to a change in the Mesos protobufs.
> * Fix bug preventing the teardown of Vagrant boxes started by the egg build.
> * Increase resources for the Mesos egg builds. The build for all distribution 
> now takes 2h in total.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD c572da02e68957567eca0fc3cde9c518aa4fe126 
>   RELEASE-NOTES.md 90c4793b218f54ad92ab4da5049d5e92c3a104b3 
>   Vagrantfile c20c7861095de0f81dbf57c617640e24257b52b4 
>   build-support/packer/build.sh 2fb4723ed4d57be25371c6b60375044c857afabd 
>   build-support/python/make-mesos-native-egg 
> bcc6e85abb7536002afb3071c08e2008bcc1ecca 
>   build.gradle 7064bcace19a183c41e04c4ac7a6944df6063d2a 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 4a1e0876609ce8eda457c0174e3ea3fbb8609376 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
>  7dc9eff2c7058e90eb93f7d7f4f378e8883989e5 
> 
> Diff: https://reviews.apache.org/r/54255/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> ./pants test.pytest src/{main,test}/python:: -- -v
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 53836: Get pants using the same thrift binary as gradle.

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53836/#review158399
---


Ship it!




Ship It!

- Joshua Cohen


On Dec. 7, 2016, 12:30 a.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53836/
> ---
> 
> (Updated Dec. 7, 2016, 12:30 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This required an upgrade to the latest pants dev release to correct
> an issue with the setup.py binary packager we use to generate sdists.
> 
> This is for sanity sake, and, once the TODO in
> `build-support/thrift/prepare_binary.sh` is addressed, it will also
> allow pants-patch-free addition of new platforms (thinking ARM).
> 
>  api/src/main/thrift/org/apache/aurora/gen/BUILD|  3 
> +++
>  api/src/main/thrift/org/apache/thermos/BUILD   |  1 +
>  build-support/thrift/.gitignore|  1 +
>  build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch| 42 
> +--
>  build-support/thrift/AURORA-1727.lib.py.setup.py.patch | 45 
> +
>  {api/src/main/thrift/org/apache/thermos => build-support/thrift}/BUILD | 18 
> -
>  build-support/thrift/Makefile  |  2 
> +-
>  build-support/thrift/prepare_binary.sh | 68 
> ++
>  build-support/thrift/thriftw   | 12 
> +++--
>  pants.ini  |  7 
> +-
>  10 files changed, 158 insertions(+), 41 deletions(-)
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/BUILD 
> f4004a3ca36078c6d24991db4beb68903a05652c 
>   api/src/main/thrift/org/apache/thermos/BUILD 
> cb655a2fd35d22f7d7e80c4311742bad763d8614 
>   build-support/thrift/.gitignore 9a3adb69210ba3dbf3c1f408895561da37e8f4c3 
>   build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
> b69e3fef137cd73c6f2b73201463a0705ef8082a 
>   build-support/thrift/AURORA-1727.lib.py.setup.py.patch PRE-CREATION 
>   build-support/thrift/BUILD PRE-CREATION 
>   build-support/thrift/Makefile 48b174ad622288d2738a5fa37bbb72385fcc3a27 
>   build-support/thrift/prepare_binary.sh PRE-CREATION 
>   build-support/thrift/thriftw 50d6dfdeb16ca8bf14aaff7aa826e3d69c5e13f0 
>   pants.ini cecdb277f327f77b2652f76a30fc8d4ffd9ff1db 
>   src/main/python/apache/aurora/admin/BUILD 
> b5d37f718a25d43d7ac07e30f789cf526bdfdcc2 
>   src/main/python/apache/aurora/client/BUILD 
> 1411765ed47d4286e3ba0636bc3f3c1b29afac02 
>   src/main/python/apache/aurora/common/BUILD 
> 0e4c51020946dc21953493bd43b944177c444c28 
>   src/main/python/apache/aurora/config/BUILD 
> 12e7fe973f456d0847ce63d3b293131a7f4c3bdd 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
>   src/main/python/apache/aurora/kerberos/BUILD 
> 847852fdf7762789579f8818c677dc1f79a76c73 
>   src/main/python/apache/aurora/tools/BUILD 
> 6717f92c65d7bc1d71c67e37f57750c28f10fda3 
>   src/main/python/apache/thermos/cli/BUILD 
> a4932b2757a53ab8654c5d9c0fbae4446c9c5383 
>   src/main/python/apache/thermos/common/BUILD 
> 0adabbfa3f0230dae95aca5002c128832916fdd6 
>   src/main/python/apache/thermos/config/BUILD 
> 6e52c4bbe111529e14327de699218d9340e66fa9 
>   src/main/python/apache/thermos/core/BUILD 
> 82448cec3380e237001f2fb02a28df4c5e5adb30 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/observer/BUILD 
> 95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
>   src/main/python/apache/thermos/runner/BUILD 
> 085b6ce35e8f67d9db0f6838bafeeb525191e6e9 
>   src/main/python/apache/thermos/testing/BUILD 
> 24261701c5334999431a6cd41a6a711753180a89 
>   src/test/python/apache/aurora/BUILD 
> 626f1108a24e25747f2c57b4a9f21acbc32fd2d5 
>   src/test/python/apache/aurora/admin/BUILD 
> 093537ed13406be3e8dff3245283b4d8d53481a3 
>   src/test/python/apache/aurora/client/BUILD 
> 0dc7b8ed19e1216d027095afc43e063d17bedad1 
>   src/test/python/apache/aurora/client/api/BUILD 
> e9742240ec4847b4256b554a01dd2369f6fb62f5 
>   src/test/python/apache/aurora

Re: Review Request 54459: Add message parameter to killTasks

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54459/#review158290
---



Thanks for the patch! Looks good to me overall, a few comments below, almost 
entirely style-based. +1 to David's suggestion to add to the end to end tests, 
and if you could also add a note to RELEASE-NOTES.md, that'd be great.


src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
(lines 429 - 430)
<https://reviews.apache.org/r/54459/#comment229059>

Formatting for continuation indents should be:

public Response killTasks(
@Nullable JobKey mutableJob,
...
...) {

  // code continues here, after blank line



src/main/python/apache/aurora/client/hooks/hooked_api.py (line 55)
<https://reviews.apache.org/r/54459/#comment229061>

Let's not change the argument order on these methods? It's possible someone 
has a custom hook defined somewhere that isn't passing these as named args.



src/test/java/org/apache/aurora/scheduler/thrift/AuditMessagesTest.java (lines 
35 - 37)
<https://reviews.apache.org/r/54459/#comment229062>

Fix indentation (should be 4 spaces for continuation).



src/test/java/org/apache/aurora/scheduler/thrift/AuditMessagesTest.java (lines 
52 - 53)
<https://reviews.apache.org/r/54459/#comment229063>

Same here.



src/test/java/org/apache/aurora/scheduler/thrift/AuditMessagesTest.java (lines 
67 - 72)
<https://reviews.apache.org/r/54459/#comment229064>

And here.



src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 (lines 608 - 612)
<https://reviews.apache.org/r/54459/#comment229065>

Fix indent.


- Joshua Cohen


On Dec. 7, 2016, 2:20 a.m., Cody Gibb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54459/
> ---
> 
> (Updated Dec. 7, 2016, 2:20 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1846
> https://issues.apache.org/jira/browse/AURORA-1846
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> RPC's such as pauseJobUpdate include a parameter for "a user-specified 
> message to include with the induced job update state change." This diff 
> provides a similar optional parameter for the killTasks RPC, which allows 
> users to indicate the reason why a task was killed, and later inspect that 
> reason when consuming task events.
> 
> Example usage from Aurora CLI: `$ aurora job killall 
> devcluster/www-data/prod/hello --message "Some message"`
> 
> In the task event, the supplied message (if provided) is appended to the 
> existing template "Killed by ", separated by a newline. For the above 
> example, this looks like: "Killed by aurora\nSome message".
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c3ec96584d7acdc50c80fc629f8cff61f2d206d7 
>   src/main/java/org/apache/aurora/scheduler/thrift/AuditMessages.java 
> a7aecedcab7a5ae116b0536fe7831689e065ea80 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  16b1b52f8691d978a9ec1bf7aa0c9716b3484cf0 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/aop/AnnotatedAuroraAdmin.java
>  bfc3dc868b2b52d3ccf6a7c54039ed75fe0501a1 
>   src/main/python/apache/aurora/client/api/__init__.py 
> e1dde638bd1d686269fbcd88cb083a52e7f5dbfc 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 8ba41aa914f7ddd301891f67dd763ba50a977a9d 
>   src/main/python/apache/aurora/client/hooks/hooked_api.py 
> 86c3a09481d0504690439b608c269285b9cc2a38 
>   
> src/test/java/org/apache/aurora/scheduler/http/api/security/HttpSecurityIT.java
>  b20900da82915f429c35a68775b2106fb15394ff 
>   
> src/test/java/org/apache/aurora/scheduler/http/api/security/ShiroAuthorizingParamInterceptorTest.java
>  05f4a18938be8075e487478cab06ff51874e08a7 
>   src/test/java/org/apache/aurora/scheduler/thrift/AuditMessagesTest.java 
> 9c8460ce206eb13b4b9910d76d6fcade439c9c61 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  b28cd2489a52041a8e7e53f298fad8d8cd29406f 
>   src/test/python/apache/aurora/api_util.py 
> 983cde48a48bc2388e8535c3d17b867373c8dcaa 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> f6018caa4f431e85a9e9ff203ac3d4b6c33f40ef 
>   src/test/python/apache/aurora/client/cli/test_kill.py 
> 564ea440983359ae3c382c5

Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/
---

(Updated Dec. 7, 2016, 5:50 p.m.)


Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, and 
Zameer Manji.


Changes
---

Add task index to the mttu calculation to avoid iterating tasks for every job 
update.


Repository: aurora


Description
---

The metric is calculated from the time of the `INSTANCE_UPDATING` event to the 
subsequent `ASSIGNED` event for the task with the same instance id that matches 
the desired task config from the update details.

My original approach to this involved converting `GroupType` and 
`AlgorithmType` from enums (which cannot be generic) to static classes (which, 
of course, can). This allowed me to avoid unnecessarily passing update details 
to the `calculate` method of `SlaAlgorithm` since it's ignored in all but the 
one, new case. However, that ended up being a lot of churn, and since it turns 
out we need both the task details and the update details to calculate this 
metric, I went with the below approach. If anyone feels strongly, I could go 
back to generics and create an container class that's gives access to both the 
tasks and update details.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
9a56cda809fbbcb07e6dd12c7a0feb272542491d 
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
5d8d5bd8f705770979f284d26d2e932aabe707e5 
  src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
6fbd4e962b3bb6eeb0831c810a321478fd52172c 
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
953b65f28a585375e36e305dea6f9f94f99abc93 
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
2e719ac6b7aea86faa22deff2cc6b5f73135761c 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
341e346e794c9cf9a2789b8799f38fff900ec9b3 
  src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
78f440f7546de9ed6842cb51db02b3bddc9a74ff 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
21d26b3930ea965487b2dec48a48a98677ba022b 
  src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 

Diff: https://reviews.apache.org/r/54439/diff/


Testing
---

./gradlew build -Pq
e2e tests.


Thanks,

Joshua Cohen



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Joshua Cohen


> On Dec. 7, 2016, 4:42 p.m., Mehrdad Nurolahzade wrote:
> > A general side note: SLA metrics calculation is currently the most 
> > expensive cpu-bound operation handled by the scheduler (it can take as much 
> > as 50% master cpu cycles). The calculators seem like a good fit for 
> > micro-benchmarking with JMH.

Do you think that's a blocker to shipping this metric, or just a general 
nice-to-have?


> On Dec. 7, 2016, 4:42 p.m., Mehrdad Nurolahzade wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, lines 
> > 189-193
> > <https://reviews.apache.org/r/54439/diff/2/?file=1578124#file1578124line189>
> >
> > This lookup seems like something that can be improved by an index? 
> > I.e., a mapping from `instanceId -> IScheduledTask`.

Yeah, I agree. I debated adding that when I was writing it, thanks for the 
nudge.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158343
-------


On Dec. 7, 2016, 2:36 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 2:36 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-06 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/
---

(Updated Dec. 6, 2016, 8:58 p.m.)


Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.


Changes
---

Remove unused imports.


Repository: aurora


Description
---

The metric is calculated from the time of the `INSTANCE_UPDATING` event to the 
subsequent `ASSIGNED` event for the task with the same instance id that matches 
the desired task config from the update details.

My original approach to this involved converting `GroupType` and 
`AlgorithmType` from enums (which cannot be generic) to static classes (which, 
of course, can). This allowed me to avoid unnecessarily passing update details 
to the `calculate` method of `SlaAlgorithm` since it's ignored in all but the 
one, new case. However, that ended up being a lot of churn, and since it turns 
out we need both the task details and the update details to calculate this 
metric, I went with the below approach. If anyone feels strongly, I could go 
back to generics and create an container class that's gives access to both the 
tasks and update details.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
9a56cda809fbbcb07e6dd12c7a0feb272542491d 
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
5d8d5bd8f705770979f284d26d2e932aabe707e5 
  src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
6fbd4e962b3bb6eeb0831c810a321478fd52172c 
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
953b65f28a585375e36e305dea6f9f94f99abc93 
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
2e719ac6b7aea86faa22deff2cc6b5f73135761c 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
341e346e794c9cf9a2789b8799f38fff900ec9b3 
  src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
78f440f7546de9ed6842cb51db02b3bddc9a74ff 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
21d26b3930ea965487b2dec48a48a98677ba022b 
  src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 

Diff: https://reviews.apache.org/r/54439/diff/


Testing
---

./gradlew build -Pq
e2e tests.


Thanks,

Joshua Cohen



Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-06 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/
---

Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
---

The metric is calculated from the time of the `INSTANCE_UPDATING` event to the 
subsequent `ASSIGNED` event for the task with the same instance id that matches 
the desired task config from the update details.

My original approach to this involved converting `GroupType` and 
`AlgorithmType` from enums (which cannot be generic) to static classes (which, 
of course, can). This allowed me to avoid unnecessarily passing update details 
to the `calculate` method of `SlaAlgorithm` since it's ignored in all but the 
one, new case. However, that ended up being a lot of churn, and since it turns 
out we need both the task details and the update details to calculate this 
metric, I went with the below approach. If anyone feels strongly, I could go 
back to generics and create an container class that's gives access to both the 
tasks and update details.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
9a56cda809fbbcb07e6dd12c7a0feb272542491d 
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
5d8d5bd8f705770979f284d26d2e932aabe707e5 
  src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
6fbd4e962b3bb6eeb0831c810a321478fd52172c 
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
953b65f28a585375e36e305dea6f9f94f99abc93 
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
2e719ac6b7aea86faa22deff2cc6b5f73135761c 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
341e346e794c9cf9a2789b8799f38fff900ec9b3 
  src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
78f440f7546de9ed6842cb51db02b3bddc9a74ff 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
21d26b3930ea965487b2dec48a48a98677ba022b 
  src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 

Diff: https://reviews.apache.org/r/54439/diff/


Testing
---

./gradlew build -Pq
e2e tests.


Thanks,

Joshua Cohen



Review Request 54428: Fix invalid logging that was causing pmd to NPE.

2016-12-06 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54428/
---

Review request for Aurora and Stephan Erb.


Repository: aurora


Description
---

Fix invalid logging that was causing pmd to NPE.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
ae7d019097f3e7049622623b48f3731198a08fb0 

Diff: https://reviews.apache.org/r/54428/diff/


Testing
---


Thanks,

Joshua Cohen



  1   2   3   4   5   6   7   8   >