Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/#review144223
---


Ship it!




Ship It!

- Joshua Cohen


On July 29, 2016, 10:46 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50617/
> ---
> 
> (Updated July 29, 2016, 10:46 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Added missing test cases
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50617/diff/
> 
> 
> Testing
> ---
> 
> `./pants test src/test/python/apache/aurora/client/cli:cli`
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/#review144215
---



Master (e14d0a3) is red with this patch.
  ./build-support/jenkins/build.sh

  Test coverage missing for 
org/apache/aurora/scheduler/storage/log/StreamManagerImpl
  Test coverage missing for 
org/apache/aurora/scheduler/storage/log/EntrySerializer$EntrySerializerImpl
  Test coverage missing for 
org/apache/aurora/scheduler/storage/log/LogStorage$ScheduledExecutorSchedulingService
  Test coverage missing for 
org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator$SnapshotDeduplicatorImpl
  Test coverage missing for org/apache/aurora/scheduler/storage/log/Entries
  Test coverage missing for 
org/apache/aurora/scheduler/storage/log/WriteAheadStorage
  Test coverage missing for org/apache/aurora/scheduler/storage/log/LogManager
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory$1
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/StorageBackup$StorageBackupImpl$BackupConfig
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/StorageBackup$StorageBackupImpl
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory
  Test coverage missing for 
org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl$PendingRecovery
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$7
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$6
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$5
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$4
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$3
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$2
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$1
  Test coverage missing for org/apache/aurora/scheduler/HostOffer
  Test coverage missing for 
org/apache/aurora/scheduler/SchedulerLifecycle$SchedulerCandidateImpl
  Test coverage missing for 
org/apache/aurora/scheduler/SchedulerLifecycle$DefaultDelayedActions
  Test coverage missing for org/apache/aurora/scheduler/TaskVars$Counter
  Test coverage missing for org/apache/aurora/scheduler/TaskVars$1
  Test coverage missing for org/apache/aurora/scheduler/HostOffer$1
  Test coverage missing for 
org/apache/aurora/scheduler/TaskIdGenerator$TaskIdGeneratorImpl
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle
  Test coverage missing for 
org/apache/aurora/scheduler/storage/db/typehandlers/JobUpdateStatusTypeHandler
  Test coverage missing for 
org/apache/aurora/scheduler/storage/db/typehandlers/MaintenanceModeTypeHandler
  Test coverage missing for 
org/apache/aurora/scheduler/storage/db/typehandlers/CronCollisionPolicyTypeHandler
  Test coverage missing for 
org/apache/aurora/scheduler/storage/db/typehandlers/JobUpdateActionTypeHandler

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
==

BUILD FAILED

Total time: 3 mins 39.138 secs


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 29, 2016, 10:46 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50617/
> ---
> 
> (Updated July 29, 2016, 10:46 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Added missing test cases
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50617/diff/
> 
> 
> Testing
> ---
> 
> `./pants test src/test/python/apache/aurora/client/cli:cli`
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/
---

(Updated July 29, 2016, 3:46 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Sorry, forgot to commit :)


Bugs: AURORA-1741
https://issues.apache.org/jira/browse/AURORA-1741


Repository: aurora


Description
---

AURORA-1741 Added missing test cases


Diffs (updated)
-

  src/test/python/apache/aurora/client/test_config.py 
4742fa28e3156e5b20791b80f2db8392f7f2f4bf 

Diff: https://reviews.apache.org/r/50617/diff/


Testing
---

`./pants test src/test/python/apache/aurora/client/cli:cli`


Thanks,

Mehrdad Nurolahzade



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/
---

(Updated July 29, 2016, 3:44 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Now using a truely unbound mustache like `{{_unbound_}}` instead of 
`{{thermos.ports[http]}}` that would not throw 
`ThermosTaskValidator.assert_all_refs_bound()` off on a `config.job()` call.


Bugs: AURORA-1741
https://issues.apache.org/jira/browse/AURORA-1741


Repository: aurora


Description
---

AURORA-1741 Added missing test cases


Diffs (updated)
-

  src/test/python/apache/aurora/client/test_config.py 
4742fa28e3156e5b20791b80f2db8392f7f2f4bf 

Diff: https://reviews.apache.org/r/50617/diff/


Testing
---

`./pants test src/test/python/apache/aurora/client/cli:cli`


Thanks,

Mehrdad Nurolahzade



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/#review144212
---



Props to Joshua for finding a problem with these test scenarios. 
I'll sibmit a new version shortly.

- Mehrdad Nurolahzade


On July 29, 2016, 2:21 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50617/
> ---
> 
> (Updated July 29, 2016, 2:21 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Added missing test cases
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50617/diff/
> 
> 
> Testing
> ---
> 
> `./pants test src/test/python/apache/aurora/client/cli:cli`
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/#review144206
---


Ship it!




Master (e14d0a3) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 29, 2016, 9:21 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50617/
> ---
> 
> (Updated July 29, 2016, 9:21 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Added missing test cases
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50617/diff/
> 
> 
> Testing
> ---
> 
> `./pants test src/test/python/apache/aurora/client/cli:cli`
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/#review144202
---


Ship it!




Ship It!

- David McLaughlin


On July 29, 2016, 9:21 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50617/
> ---
> 
> (Updated July 29, 2016, 9:21 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Added missing test cases
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50617/diff/
> 
> 
> Testing
> ---
> 
> `./pants test src/test/python/apache/aurora/client/cli:cli`
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Review Request 50617: AURORA-1741 Added missing test cases

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50617/
---

Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Bugs: AURORA-1741
https://issues.apache.org/jira/browse/AURORA-1741


Repository: aurora


Description
---

AURORA-1741 Added missing test cases


Diffs
-

  src/test/python/apache/aurora/client/test_config.py 
4742fa28e3156e5b20791b80f2db8392f7f2f4bf 

Diff: https://reviews.apache.org/r/50617/diff/


Testing
---

`./pants test src/test/python/apache/aurora/client/cli:cli`


Thanks,

Mehrdad Nurolahzade



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-29 Thread Mehrdad Nurolahzade


> On July 29, 2016, 2:03 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/client/config.py, line 138
> > 
> >
> > Isn't this undoing the bug fix to compare w/ `Empty` instead of `None`?
> 
> Mehrdad Nurolahzade wrote:
> You are right, not sure why this is happening.
> 
> But that's not part of my new change-set, should be review board mix up!

I'll create a new RB.


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review144191
---


On July 29, 2016, 2 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 29, 2016, 2 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-29 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review144195
---


Ship it!




Master (e14d0a3) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 29, 2016, 9 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 29, 2016, 9 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-29 Thread Mehrdad Nurolahzade


> On July 29, 2016, 2:03 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/client/config.py, line 138
> > 
> >
> > Isn't this undoing the bug fix to compare w/ `Empty` instead of `None`?

You are right, not sure why this is happening.

But that's not part of my new change-set, should be review board mix up!


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review144191
---


On July 29, 2016, 2 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 29, 2016, 2 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-29 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review144191
---




src/main/python/apache/aurora/client/config.py 


Isn't this undoing the bug fix to compare w/ `Empty` instead of `None`?


- Joshua Cohen


On July 29, 2016, 9 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 29, 2016, 9 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/test/python/apache/aurora/client/test_config.py 
> 4742fa28e3156e5b20791b80f2db8392f7f2f4bf 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/
---

(Updated July 29, 2016, 2 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Added unbound moustache bindings to test cases


Bugs: AURORA-1741
https://issues.apache.org/jira/browse/AURORA-1741


Repository: aurora


Description
---

AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710


Diffs (updated)
-

  src/test/python/apache/aurora/client/test_config.py 
4742fa28e3156e5b20791b80f2db8392f7f2f4bf 

Diff: https://reviews.apache.org/r/50432/diff/


Testing
---

```
./build-support/jenkins/build.sh

```


Thanks,

Mehrdad Nurolahzade



Re: Review Request 50168: Add rollback functionality to the scheduler

2016-07-29 Thread Igor Morozov


> On July 29, 2016, 7:25 p.m., Maxim Khutornenko wrote:
> > Finally had time to go over this review in detail. I am afraid the proposed 
> > approach of rolling back the arbitrary update may not work in general case 
> > and result in unexpected outcome, inconsistent job state and hard to 
> > troubleshoot update failures. Once the global update lock is lifted (e.g. 
> > the update finishes), subsequent updates and/or kill/add instance RPC calls 
> > will drift the job state making the history obsolete. 
> > 
> > For example, consider updates changing instance counts:
> > U1: 5 -> 10
> > U2: 10 -> 20 (or addInstances(0, 10))
> > 
> > If we now attempt to rollback U1 we will end up with a hole [5-9] in the 
> > instance count: 0-4, 10-19 AND likely unexpected extra 10 instances left 
> > (10-19). 
> > 
> > Following the above example we can come up with plenty of other intricate 
> > and hard to reason scenarios. By allowing randomly old update rollback we 
> > are giving our users a powerful tool that is very hard to understand or 
> > ensure correctness. 
> > 
> > The only way this _may_ be improved if we approach rollbackJobUpdate RPC as 
> > the one creating a completely new update request with the old initial state 
> > used as the new desired. This way _all_ current job instances are being 
> > considered, job diff gives correct output and quota checks are properly 
> > done. The latter btw is another problem as rollbackJobUpdate does not call 
> > `validateTaskLimits` to ensure there is enough quota to rollback something 
> > like 10 -> 5. The open problem: in case of multiple initial states to 
> > rollback to, which one to choose as the new desired one? We can only choose 
> > one according to our thrift schema (for a good reason). 
> > 
> > We have thought long and hard about the rollback cases in our internal 
> > deployment orchestration tool and ended up with the approach that operates 
> > at the .aurora config (JobUpdateRequest) level by creating a _new_ update 
> > with the target job config (JobUpdateRequest) of the _old_ one. This way we 
> > can safely rollback to the previously deployed version without violating 
> > job/cluster invariants. 
> > 
> > Given the concerns in this CR, I suggest we move this discussion to dev 
> > list or a design doc.
> > 
> > To seed our further discussion: we can consider an approach where instead 
> > of rolling back a job update itself we rollback _to_ that update's target 
> > state. For example (desired state === desiredState + settings + 
> > instanceCount):
> > U1: desired state A
> > U2: desired state B
> > U3: desired state C
> > 
> > To rollback to state B: U4: use desired state of U2
> > To rollback to state A: U4: use desired state of U1
> > 
> > While not ideal (e.g. there is no way to rollback to state A-1), this would 
> > be safer and easier to reason about. It'd also require a smaller diff as 
> > most of the required code paths already exist.
> 
> Igor Morozov wrote:
> I think the semantics of rollback is pretty well defined: it is to 
> rollback a job(in whatever state it is right now) to the state defined as 
> initial in the update job. Should not scheduler try to kill extra instances 
> in your example as they violate the job configuration it is rolling back to? 
> In this example that simply means having just 5 instances running after 
> rollback.
> 
> Maxim Khutornenko wrote:
> > Should not scheduler try to kill extra instances in your example as 
> they violate the job configuration it is rolling back to?
> 
> Unfortunately, that's not possible in the current implementation. Once an 
> update object is created, the initial and desired states contain only the 
> working instance set calculated by the `JobDiff`. Any instances 
> created/killed after the update finishes are no longer considered. The 
> working set can be further reduced within a given update if 
> `updateOnlyTheseInstances` value is specified. 
> 
> In other words, the updater goes over a narrowly defined set of instances 
> completely ignoring the state of the job after the update has started.

gotcha. Yes, then it is going to be a problem. Let's discuss it in a ticket. I 
have another suggestion.


- Igor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50168/#review144137
---


On July 28, 2016, 10:54 p.m., Igor Morozov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50168/
> ---
> 
> (Updated July 28, 2016, 10:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, Bill Farner, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1721
> https://issues.apache.org/jira/browse/AURORA-1721
> 
> 
> 

Re: Review Request 50584: Upgrade to Mesos 1.0.0

2016-07-29 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50584/#review144185
---


Ship it!




Ship It!

- Maxim Khutornenko


On July 28, 2016, 9:18 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50584/
> ---
> 
> (Updated July 28, 2016, 9:18 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Steve Niemitz.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> I also took this as an opportunity to switch us from mesos.native to 
> mesos.executor on the Python side, meaning Docker containers will no longer 
> require all mesos deps.
> 
> Release notes here: http://mesos.apache.org/blog/mesos-1-0-0-released/
> Upgrade notes here: http://mesos.apache.org/documentation/latest/upgrades/
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD a44624c6cf3dcbdbfcc0504dd5e3dbcbfe6812de 
>   RELEASE-NOTES.md d46420ef876b88d9ab68c4816f1c2d6780aba702 
>   Vagrantfile 5aa2128b8645402b9892ee8dc17439bf58ea19ab 
>   build-support/packer/build.sh 08938ec13f3054c0c5ec6a75c1515f195533bc6a 
>   build-support/python/make-mesos-native-egg 
> 736685f9fc57b2d758edc752dc15bffb426a1879 
>   build.gradle 95401953c51dfabb9ce388d66b5aac109507f74c 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> e5bafa441cc848eb7eabdfe8c9e22dea3fac2ab7 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 8fe2b7997c0dfcbf9abdb467eea0fc768be6745c 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  e137daa2221c49e6fcd2b62b8ad0ccff1b8cb759 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> cbbd6be94aa857b02cd7b45bfb2f0216d9a1cec3 
>   src/main/python/apache/aurora/executor/BUILD 
> 0be65fc7c180d15f5a94bef268652ef7b868dcf7 
>   src/main/python/apache/aurora/executor/bin/thermos_executor_main.py 
> 0ef3856abc0df5403e3443ac35ba8d6940de8938 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  448d57a06b80fe1222abbc5149acf4e752d839a5 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 500fd435b4c72b25abd8df7eea6b3850edc96e99 
>   src/test/sh/org/apache/aurora/e2e/Dockerfile 
> 6fdea3d28760f59235c51c5b6913d2ee0172ef1a 
> 
> Diff: https://reviews.apache.org/r/50584/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ran e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 50584: Upgrade to Mesos 1.0.0

2016-07-29 Thread Maxim Khutornenko


> On July 29, 2016, 7:37 p.m., Maxim Khutornenko wrote:
> > build-support/packer/build.sh, line 141
> > 
> >
> > Is there a release note or a jira link that would support mesos.native 
> > -> mesos.executor transition is safe?
> 
> Joshua Cohen wrote:
> I saw no mention of it in any release notes anywhere, but when building 
> the mesos.native egg I saw it had a different name than it previously did 
> (platform was dropped) which led me to dig into it a bit and discuss w/ 
> Steve. He pointed out that mesos.native is now just a thin wrapper on top of 
> mesos.scheduler and mesos.executor, so there's no longer any reason for the 
> executor to pull in mesos.native which drags along all of the unnecessary 
> scheduler deps.

Thanks for explaining.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50584/#review144164
---


On July 28, 2016, 9:18 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50584/
> ---
> 
> (Updated July 28, 2016, 9:18 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Steve Niemitz.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> I also took this as an opportunity to switch us from mesos.native to 
> mesos.executor on the Python side, meaning Docker containers will no longer 
> require all mesos deps.
> 
> Release notes here: http://mesos.apache.org/blog/mesos-1-0-0-released/
> Upgrade notes here: http://mesos.apache.org/documentation/latest/upgrades/
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD a44624c6cf3dcbdbfcc0504dd5e3dbcbfe6812de 
>   RELEASE-NOTES.md d46420ef876b88d9ab68c4816f1c2d6780aba702 
>   Vagrantfile 5aa2128b8645402b9892ee8dc17439bf58ea19ab 
>   build-support/packer/build.sh 08938ec13f3054c0c5ec6a75c1515f195533bc6a 
>   build-support/python/make-mesos-native-egg 
> 736685f9fc57b2d758edc752dc15bffb426a1879 
>   build.gradle 95401953c51dfabb9ce388d66b5aac109507f74c 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> e5bafa441cc848eb7eabdfe8c9e22dea3fac2ab7 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 8fe2b7997c0dfcbf9abdb467eea0fc768be6745c 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  e137daa2221c49e6fcd2b62b8ad0ccff1b8cb759 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> cbbd6be94aa857b02cd7b45bfb2f0216d9a1cec3 
>   src/main/python/apache/aurora/executor/BUILD 
> 0be65fc7c180d15f5a94bef268652ef7b868dcf7 
>   src/main/python/apache/aurora/executor/bin/thermos_executor_main.py 
> 0ef3856abc0df5403e3443ac35ba8d6940de8938 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  448d57a06b80fe1222abbc5149acf4e752d839a5 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 500fd435b4c72b25abd8df7eea6b3850edc96e99 
>   src/test/sh/org/apache/aurora/e2e/Dockerfile 
> 6fdea3d28760f59235c51c5b6913d2ee0172ef1a 
> 
> Diff: https://reviews.apache.org/r/50584/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ran e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 50168: Add rollback functionality to the scheduler

2016-07-29 Thread Maxim Khutornenko


> On July 29, 2016, 7:25 p.m., Maxim Khutornenko wrote:
> > Finally had time to go over this review in detail. I am afraid the proposed 
> > approach of rolling back the arbitrary update may not work in general case 
> > and result in unexpected outcome, inconsistent job state and hard to 
> > troubleshoot update failures. Once the global update lock is lifted (e.g. 
> > the update finishes), subsequent updates and/or kill/add instance RPC calls 
> > will drift the job state making the history obsolete. 
> > 
> > For example, consider updates changing instance counts:
> > U1: 5 -> 10
> > U2: 10 -> 20 (or addInstances(0, 10))
> > 
> > If we now attempt to rollback U1 we will end up with a hole [5-9] in the 
> > instance count: 0-4, 10-19 AND likely unexpected extra 10 instances left 
> > (10-19). 
> > 
> > Following the above example we can come up with plenty of other intricate 
> > and hard to reason scenarios. By allowing randomly old update rollback we 
> > are giving our users a powerful tool that is very hard to understand or 
> > ensure correctness. 
> > 
> > The only way this _may_ be improved if we approach rollbackJobUpdate RPC as 
> > the one creating a completely new update request with the old initial state 
> > used as the new desired. This way _all_ current job instances are being 
> > considered, job diff gives correct output and quota checks are properly 
> > done. The latter btw is another problem as rollbackJobUpdate does not call 
> > `validateTaskLimits` to ensure there is enough quota to rollback something 
> > like 10 -> 5. The open problem: in case of multiple initial states to 
> > rollback to, which one to choose as the new desired one? We can only choose 
> > one according to our thrift schema (for a good reason). 
> > 
> > We have thought long and hard about the rollback cases in our internal 
> > deployment orchestration tool and ended up with the approach that operates 
> > at the .aurora config (JobUpdateRequest) level by creating a _new_ update 
> > with the target job config (JobUpdateRequest) of the _old_ one. This way we 
> > can safely rollback to the previously deployed version without violating 
> > job/cluster invariants. 
> > 
> > Given the concerns in this CR, I suggest we move this discussion to dev 
> > list or a design doc.
> > 
> > To seed our further discussion: we can consider an approach where instead 
> > of rolling back a job update itself we rollback _to_ that update's target 
> > state. For example (desired state === desiredState + settings + 
> > instanceCount):
> > U1: desired state A
> > U2: desired state B
> > U3: desired state C
> > 
> > To rollback to state B: U4: use desired state of U2
> > To rollback to state A: U4: use desired state of U1
> > 
> > While not ideal (e.g. there is no way to rollback to state A-1), this would 
> > be safer and easier to reason about. It'd also require a smaller diff as 
> > most of the required code paths already exist.
> 
> Igor Morozov wrote:
> I think the semantics of rollback is pretty well defined: it is to 
> rollback a job(in whatever state it is right now) to the state defined as 
> initial in the update job. Should not scheduler try to kill extra instances 
> in your example as they violate the job configuration it is rolling back to? 
> In this example that simply means having just 5 instances running after 
> rollback.

> Should not scheduler try to kill extra instances in your example as they 
> violate the job configuration it is rolling back to?

Unfortunately, that's not possible in the current implementation. Once an 
update object is created, the initial and desired states contain only the 
working instance set calculated by the `JobDiff`. Any instances created/killed 
after the update finishes are no longer considered. The working set can be 
further reduced within a given update if `updateOnlyTheseInstances` value is 
specified. 

In other words, the updater goes over a narrowly defined set of instances 
completely ignoring the state of the job after the update has started.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50168/#review144137
---


On July 28, 2016, 10:54 p.m., Igor Morozov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50168/
> ---
> 
> (Updated July 28, 2016, 10:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, Bill Farner, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1721
> https://issues.apache.org/jira/browse/AURORA-1721
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add rollback functionality to the scheduler
> 
> Thic change also includes an addition to storage replicated log. I needed a 
> 

Re: Review Request 50168: Add rollback functionality to the scheduler

2016-07-29 Thread Igor Morozov


> On July 29, 2016, 7:25 p.m., Maxim Khutornenko wrote:
> > Finally had time to go over this review in detail. I am afraid the proposed 
> > approach of rolling back the arbitrary update may not work in general case 
> > and result in unexpected outcome, inconsistent job state and hard to 
> > troubleshoot update failures. Once the global update lock is lifted (e.g. 
> > the update finishes), subsequent updates and/or kill/add instance RPC calls 
> > will drift the job state making the history obsolete. 
> > 
> > For example, consider updates changing instance counts:
> > U1: 5 -> 10
> > U2: 10 -> 20 (or addInstances(0, 10))
> > 
> > If we now attempt to rollback U1 we will end up with a hole [5-9] in the 
> > instance count: 0-4, 10-19 AND likely unexpected extra 10 instances left 
> > (10-19). 
> > 
> > Following the above example we can come up with plenty of other intricate 
> > and hard to reason scenarios. By allowing randomly old update rollback we 
> > are giving our users a powerful tool that is very hard to understand or 
> > ensure correctness. 
> > 
> > The only way this _may_ be improved if we approach rollbackJobUpdate RPC as 
> > the one creating a completely new update request with the old initial state 
> > used as the new desired. This way _all_ current job instances are being 
> > considered, job diff gives correct output and quota checks are properly 
> > done. The latter btw is another problem as rollbackJobUpdate does not call 
> > `validateTaskLimits` to ensure there is enough quota to rollback something 
> > like 10 -> 5. The open problem: in case of multiple initial states to 
> > rollback to, which one to choose as the new desired one? We can only choose 
> > one according to our thrift schema (for a good reason). 
> > 
> > We have thought long and hard about the rollback cases in our internal 
> > deployment orchestration tool and ended up with the approach that operates 
> > at the .aurora config (JobUpdateRequest) level by creating a _new_ update 
> > with the target job config (JobUpdateRequest) of the _old_ one. This way we 
> > can safely rollback to the previously deployed version without violating 
> > job/cluster invariants. 
> > 
> > Given the concerns in this CR, I suggest we move this discussion to dev 
> > list or a design doc.
> > 
> > To seed our further discussion: we can consider an approach where instead 
> > of rolling back a job update itself we rollback _to_ that update's target 
> > state. For example (desired state === desiredState + settings + 
> > instanceCount):
> > U1: desired state A
> > U2: desired state B
> > U3: desired state C
> > 
> > To rollback to state B: U4: use desired state of U2
> > To rollback to state A: U4: use desired state of U1
> > 
> > While not ideal (e.g. there is no way to rollback to state A-1), this would 
> > be safer and easier to reason about. It'd also require a smaller diff as 
> > most of the required code paths already exist.

I think the semantics of rollback is pretty well defined: it is to rollback a 
job(in whatever state it is right now) to the state defined as initial in the 
update job. Should not scheduler try to kill extra instances in your example as 
they violate the job configuration it is rolling back to? In this example that 
simply means having just 5 instances running after rollback.


- Igor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50168/#review144137
---


On July 28, 2016, 10:54 p.m., Igor Morozov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50168/
> ---
> 
> (Updated July 28, 2016, 10:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, Bill Farner, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1721
> https://issues.apache.org/jira/browse/AURORA-1721
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add rollback functionality to the scheduler
> 
> Thic change also includes an addition to storage replicated log. I needed a 
> way to re-insert lock token for an existing job update.
> 
> 
> Diffs
> -
> 
>   CHANGELOG fc6a46d77ebf889c8f60402c95e8a7472980ba1c 
>   RELEASE-NOTES.md 29d224d5596eb060356f1343cf347db79b1ad687 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d66208490aff6ea8af4c737845fa2cf13617529 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 9e4213f13255a182df938bea44ca87fa03a25318 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
> 52c3c6618a3cf1009435ca8a9cece36365913e55 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java 
> d2673e6b328cb1e249fbe91d18e0d9e935636eaa 
>   

Re: Review Request 50584: Upgrade to Mesos 1.0.0

2016-07-29 Thread Joshua Cohen


> On July 29, 2016, 7:37 p.m., Maxim Khutornenko wrote:
> > build-support/packer/build.sh, line 141
> > 
> >
> > Is there a release note or a jira link that would support mesos.native 
> > -> mesos.executor transition is safe?

I saw no mention of it in any release notes anywhere, but when building the 
mesos.native egg I saw it had a different name than it previously did (platform 
was dropped) which led me to dig into it a bit and discuss w/ Steve. He pointed 
out that mesos.native is now just a thin wrapper on top of mesos.scheduler and 
mesos.executor, so there's no longer any reason for the executor to pull in 
mesos.native which drags along all of the unnecessary scheduler deps.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50584/#review144164
---


On July 28, 2016, 9:18 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50584/
> ---
> 
> (Updated July 28, 2016, 9:18 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Steve Niemitz.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> I also took this as an opportunity to switch us from mesos.native to 
> mesos.executor on the Python side, meaning Docker containers will no longer 
> require all mesos deps.
> 
> Release notes here: http://mesos.apache.org/blog/mesos-1-0-0-released/
> Upgrade notes here: http://mesos.apache.org/documentation/latest/upgrades/
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD a44624c6cf3dcbdbfcc0504dd5e3dbcbfe6812de 
>   RELEASE-NOTES.md d46420ef876b88d9ab68c4816f1c2d6780aba702 
>   Vagrantfile 5aa2128b8645402b9892ee8dc17439bf58ea19ab 
>   build-support/packer/build.sh 08938ec13f3054c0c5ec6a75c1515f195533bc6a 
>   build-support/python/make-mesos-native-egg 
> 736685f9fc57b2d758edc752dc15bffb426a1879 
>   build.gradle 95401953c51dfabb9ce388d66b5aac109507f74c 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> e5bafa441cc848eb7eabdfe8c9e22dea3fac2ab7 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 8fe2b7997c0dfcbf9abdb467eea0fc768be6745c 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  e137daa2221c49e6fcd2b62b8ad0ccff1b8cb759 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> cbbd6be94aa857b02cd7b45bfb2f0216d9a1cec3 
>   src/main/python/apache/aurora/executor/BUILD 
> 0be65fc7c180d15f5a94bef268652ef7b868dcf7 
>   src/main/python/apache/aurora/executor/bin/thermos_executor_main.py 
> 0ef3856abc0df5403e3443ac35ba8d6940de8938 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  448d57a06b80fe1222abbc5149acf4e752d839a5 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 500fd435b4c72b25abd8df7eea6b3850edc96e99 
>   src/test/sh/org/apache/aurora/e2e/Dockerfile 
> 6fdea3d28760f59235c51c5b6913d2ee0172ef1a 
> 
> Diff: https://reviews.apache.org/r/50584/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ran e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 50530: AURORA-1656 Document tier concept

2016-07-29 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50530/#review144167
---



Can we git this merged?

- Mehrdad Nurolahzade


On July 27, 2016, 3:41 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50530/
> ---
> 
> (Updated July 27, 2016, 3:41 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Stephan Erb.
> 
> 
> Bugs: AURORA-1656
> https://issues.apache.org/jira/browse/AURORA-1656
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1656 Document tier concept
> 
> 
> Diffs
> -
> 
>   docs/features/multitenancy.md 62bcd535d3bf39c9ea7e6d5958f6ae8ba0867c0d 
>   docs/reference/configuration.md 64c076d862453545652fbaa2d3e29f284ddd164d 
> 
> Diff: https://reviews.apache.org/r/50530/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>