Re: Review Request 57524: Support setting the rootfs on Mesos Containers.

2017-03-13 Thread Zameer Manji


> On March 11, 2017, 4:28 a.m., Stephan Erb wrote:
> > I need a little bit more context to understand what is going on here:
> > 
> > * Do you plan to use this with Thermos or an alternative executor? Or both?
> > * It seems like we don't need this for Thermos as we already create 
> > mountpoints when needed (see do_mount() in 
> > aurora/executor/common/sandbox.py)
> > * Switching the flag will run the executor within the filesystem image and 
> > thus require Python and all libmesos dependencies within the image. This 
> > sounds like a big downfall just for gaining the mkdir.
> 
> Joshua Cohen wrote:
> +1, it's unclear to me why this is necessary. If it is, I'd consider it a 
> bug in the sandbox code that prepares mounts. It should be making any 
> directories that don't already exist under taskfs:
> 
> 
> # If we're mounting a file into the task filesystem, the mount call 
> will fail if the mount
> # point doesn't exist. In that case we'll create an empty file to 
> mount over.
> if os.path.isfile(source) and not os.path.exists(destination):
>   safe_mkdir(os.path.dirname(destination))
>   touch(destination)
> else:
>   safe_mkdir(destination)
> 
> 
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/sandbox.py#L284-L290

I see that I have been unclear here. I will take some time to better document 
the problem to explain why I think this is necesssary.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57524/#review168709
---


On March 13, 2017, 9:36 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57524/
> ---
> 
> (Updated March 13, 2017, 9:36 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Bugs: AURORA-1903
> https://issues.apache.org/jira/browse/AURORA-1903
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The mesos unified containerizer does not support absolute container path 
> mounts if no rootfs is set. This allows operators to switch between our 
> current behaviour (mounting images as a volume) and setting the rootfs. See 
> AURORA-1903 for more detailed analysis.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  4dac9757a65e144142d36ee921b85a02a5311fe5 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
>  5c987fd051728486172c8afd34219e86d56f00d5 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 0d639f66db456858278b0485c91c40975c3b45ac 
>   src/main/java/org/apache/aurora/scheduler/mesos/TestExecutorSettings.java 
> e1cd81e6fbd98f23046e6e775be268be4310c62a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 93cc34cf8393f969087cd0fd6f577228c00170e9 
> 
> 
> Diff: https://reviews.apache.org/r/57524/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 55951: Use --launch_info when invoking MesosContainerizer.

2017-03-13 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55951/#review168855
---


Ship it!




Master (2f08d91) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On March 14, 2017, 1:57 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55951/
> ---
> 
> (Updated March 14, 2017, 1:57 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Karthik Anantha 
> Padmanabhan.
> 
> 
> Bugs: AURORA-1882
> https://issues.apache.org/jira/browse/AURORA-1882
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> MesosContainerizer has updated the command line parameters in 1.2.0 and
> consolidated the individual arguments into a single ContainerLaunchInfo
> proto buf message. Update ThermosExecutor to use the new `--launch_info`
> parameter to be compatible with MesosContainerizer also check the
> containerizer binary interface to determine to be backward-compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 1142f75bbdd488a3e4966f187c0c55879241989b 
>   src/main/python/apache/thermos/common/process_util.py 
> 54e716b726fc02f3901f1b9143d3fa253511e29b 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> a3b5a22a805753d42e97f8f1e4e3dab7d3c9c507 
>   src/test/python/apache/thermos/core/test_process.py 
> 520390217f691b9136cb4d36262be3d372a16509 
> 
> 
> Diff: https://reviews.apache.org/r/55951/diff/3/
> 
> 
> Testing
> ---
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 55951: Use --launch_info when invoking MesosContainerizer.

2017-03-13 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55951/
---

(Updated March 13, 2017, 6:57 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Karthik Anantha 
Padmanabhan.


Changes
---

Use sh supported `if` in the shell script. Bash syntax is not working.


Bugs: AURORA-1882
https://issues.apache.org/jira/browse/AURORA-1882


Repository: aurora


Description
---

MesosContainerizer has updated the command line parameters in 1.2.0 and
consolidated the individual arguments into a single ContainerLaunchInfo
proto buf message. Update ThermosExecutor to use the new `--launch_info`
parameter to be compatible with MesosContainerizer also check the
containerizer binary interface to determine to be backward-compatible.


Diffs (updated)
-

  RELEASE-NOTES.md 1142f75bbdd488a3e4966f187c0c55879241989b 
  src/main/python/apache/thermos/common/process_util.py 
54e716b726fc02f3901f1b9143d3fa253511e29b 
  src/test/python/apache/aurora/executor/common/test_health_checker.py 
a3b5a22a805753d42e97f8f1e4e3dab7d3c9c507 
  src/test/python/apache/thermos/core/test_process.py 
520390217f691b9136cb4d36262be3d372a16509 


Diff: https://reviews.apache.org/r/55951/diff/3/

Changes: https://reviews.apache.org/r/55951/diff/2-3/


Testing
---

build-support/jenkins/build.sh
src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 57524: Support setting the rootfs on Mesos Containers.

2017-03-13 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57524/#review168835
---


Ship it!




As long as the change is behind the flag this change looks pretty innocuous to 
me.

Can you add some detail as to which use-case this change will enable that is 
not currently possible?

- Santhosh Kumar Shanmugham


On March 13, 2017, 9:36 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57524/
> ---
> 
> (Updated March 13, 2017, 9:36 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Bugs: AURORA-1903
> https://issues.apache.org/jira/browse/AURORA-1903
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The mesos unified containerizer does not support absolute container path 
> mounts if no rootfs is set. This allows operators to switch between our 
> current behaviour (mounting images as a volume) and setting the rootfs. See 
> AURORA-1903 for more detailed analysis.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  4dac9757a65e144142d36ee921b85a02a5311fe5 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
>  5c987fd051728486172c8afd34219e86d56f00d5 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 0d639f66db456858278b0485c91c40975c3b45ac 
>   src/main/java/org/apache/aurora/scheduler/mesos/TestExecutorSettings.java 
> e1cd81e6fbd98f23046e6e775be268be4310c62a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 93cc34cf8393f969087cd0fd6f577228c00170e9 
> 
> 
> Diff: https://reviews.apache.org/r/57524/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 56395: Change Resource Validation in ConfigurationManager so that it validates the Resource Set instead of deprecated fields

2017-03-13 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56395/#review168828
---


Ship it!




Ship It!

- Stephan Erb


On March 13, 2017, 5:46 p.m., Nicolás Donatucci wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56395/
> ---
> 
> (Updated March 13, 2017, 5:46 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The Resource validation in ConfigurationManager is now done against the 
> Resource set instead of the NumCpus, RamMb and DiskMb fields.
> 
> To do this I moved the GetResource method from the ThriftBackfill to the 
> ResourceManager so it can be used in the ConfigurationManager without 
> exposing it on the ThriftBackfill.
> 
> Related Issue: Aurora-1707
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  80f0aebd6de0c2d7c30a58ec09702d1c6d519787 
>   src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java 
> c8838438ef5aa94fbcc407101e7f04abc75cd96e 
>   src/main/python/apache/aurora/config/thrift.py 
> 3539469d243638c0acd08bf0859d0ce858d8977c 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  5419d7edd83c14f155b105b87c5521d7a938e248 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  0cdd9829417bb3ef0215278a9458a3dd78a49a20 
> 
> 
> Diff: https://reviews.apache.org/r/56395/diff/3/
> 
> 
> Testing
> ---
> 
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Nicolás Donatucci
> 
>



Re: Review Request 56395: Change Resource Validation in ConfigurationManager so that it validates the Resource Set instead of deprecated fields

2017-03-13 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56395/#review168784
---


Ship it!




Master (a07b9ed) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On March 13, 2017, 4:46 p.m., Nicolás Donatucci wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56395/
> ---
> 
> (Updated March 13, 2017, 4:46 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The Resource validation in ConfigurationManager is now done against the 
> Resource set instead of the NumCpus, RamMb and DiskMb fields.
> 
> To do this I moved the GetResource method from the ThriftBackfill to the 
> ResourceManager so it can be used in the ConfigurationManager without 
> exposing it on the ThriftBackfill.
> 
> Related Issue: Aurora-1707
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  80f0aebd6de0c2d7c30a58ec09702d1c6d519787 
>   src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java 
> c8838438ef5aa94fbcc407101e7f04abc75cd96e 
>   src/main/python/apache/aurora/config/thrift.py 
> 3539469d243638c0acd08bf0859d0ce858d8977c 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  5419d7edd83c14f155b105b87c5521d7a938e248 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  0cdd9829417bb3ef0215278a9458a3dd78a49a20 
> 
> 
> Diff: https://reviews.apache.org/r/56395/diff/3/
> 
> 
> Testing
> ---
> 
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Nicolás Donatucci
> 
>



Re: Review Request 56395: Change Resource Validation in ConfigurationManager so that it validates the Resource Set instead of deprecated fields

2017-03-13 Thread Nicolás Donatucci

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56395/
---

(Updated March 13, 2017, 4:46 p.m.)


Review request for Aurora, Stephan Erb and Zameer Manji.


Changes
---

Code style fix.


Repository: aurora


Description
---

The Resource validation in ConfigurationManager is now done against the 
Resource set instead of the NumCpus, RamMb and DiskMb fields.

To do this I moved the GetResource method from the ThriftBackfill to the 
ResourceManager so it can be used in the ConfigurationManager without exposing 
it on the ThriftBackfill.

Related Issue: Aurora-1707


Diffs (updated)
-

  
src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
 80f0aebd6de0c2d7c30a58ec09702d1c6d519787 
  src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java 
c8838438ef5aa94fbcc407101e7f04abc75cd96e 
  src/main/python/apache/aurora/config/thrift.py 
3539469d243638c0acd08bf0859d0ce858d8977c 
  
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
 5419d7edd83c14f155b105b87c5521d7a938e248 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 0cdd9829417bb3ef0215278a9458a3dd78a49a20 


Diff: https://reviews.apache.org/r/56395/diff/3/

Changes: https://reviews.apache.org/r/56395/diff/2-3/


Testing
---

src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Nicolás Donatucci



Re: Review Request 56395: Change Resource Validation in ConfigurationManager so that it validates the Resource Set instead of deprecated fields

2017-03-13 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56395/#review168775
---



Master (a07b9ed) is red with this patch.
  ./build-support/jenkins/build.sh

:commons:processResources
:commons:classes
:commons:jar
:compileJava/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74:
 Note: Wrote forwarder 
org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing 
file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing 
file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor

:generateBuildProperties
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java:33:8:
 Unused import - org.apache.aurora.scheduler.resources.ResourceManager. 
[UnusedImports]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java:67:
 Line is longer than 100 characters (found 104). [LineLength]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleMain'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 32.452 secs


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On March 13, 2017, 3:53 p.m., Nicolás Donatucci wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56395/
> ---
> 
> (Updated March 13, 2017, 3:53 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The Resource validation in ConfigurationManager is now done against the 
> Resource set instead of the NumCpus, RamMb and DiskMb fields.
> 
> To do this I moved the GetResource method from the ThriftBackfill to the 
> ResourceManager so it can be used in the ConfigurationManager without 
> exposing it on the ThriftBackfill.
> 
> Related Issue: Aurora-1707
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  80f0aebd6de0c2d7c30a58ec09702d1c6d519787 
>   src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java 
> c8838438ef5aa94fbcc407101e7f04abc75cd96e 
>   src/main/python/apache/aurora/config/thrift.py 
> 3539469d243638c0acd08bf0859d0ce858d8977c 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  5419d7edd83c14f155b105b87c5521d7a938e248 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  0cdd9829417bb3ef0215278a9458a3dd78a49a20 
> 
> 
> Diff: https://reviews.apache.org/r/56395/diff/2/
> 
> 
> Testing
> ---
> 
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Nicolás Donatucci
> 
>



Re: Review Request 56395: Change Resource Validation in ConfigurationManager so that it validates the Resource Set instead of deprecated fields

2017-03-13 Thread Nicolás Donatucci

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56395/
---

(Updated March 13, 2017, 3:53 p.m.)


Review request for Aurora, Stephan Erb and Zameer Manji.


Changes
---

I moved getReource back to the ThriftBackfill and instead used the QuantityOf 
method to do the validations. For this, the validations had to be done against 
the "config" instead of the backfilled "builder". This is ok given that the 
resource part of the backfill will be removed soon. 
By doing this, I found a bug in the client. If "numGpus" was 0, then the whole 
task.resources would be the empty set. That was later corrected by the 
backfill. A small change to the client stops that from happening.


Repository: aurora


Description
---

The Resource validation in ConfigurationManager is now done against the 
Resource set instead of the NumCpus, RamMb and DiskMb fields.

To do this I moved the GetResource method from the ThriftBackfill to the 
ResourceManager so it can be used in the ConfigurationManager without exposing 
it on the ThriftBackfill.

Related Issue: Aurora-1707


Diffs (updated)
-

  
src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
 80f0aebd6de0c2d7c30a58ec09702d1c6d519787 
  src/main/java/org/apache/aurora/scheduler/storage/log/ThriftBackfill.java 
c8838438ef5aa94fbcc407101e7f04abc75cd96e 
  src/main/python/apache/aurora/config/thrift.py 
3539469d243638c0acd08bf0859d0ce858d8977c 
  
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
 5419d7edd83c14f155b105b87c5521d7a938e248 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 0cdd9829417bb3ef0215278a9458a3dd78a49a20 


Diff: https://reviews.apache.org/r/56395/diff/2/

Changes: https://reviews.apache.org/r/56395/diff/1-2/


Testing
---

src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Nicolás Donatucci



Re: Review Request 57524: Support setting the rootfs on Mesos Containers.

2017-03-13 Thread Joshua Cohen


> On March 11, 2017, 12:28 p.m., Stephan Erb wrote:
> > I need a little bit more context to understand what is going on here:
> > 
> > * Do you plan to use this with Thermos or an alternative executor? Or both?
> > * It seems like we don't need this for Thermos as we already create 
> > mountpoints when needed (see do_mount() in 
> > aurora/executor/common/sandbox.py)
> > * Switching the flag will run the executor within the filesystem image and 
> > thus require Python and all libmesos dependencies within the image. This 
> > sounds like a big downfall just for gaining the mkdir.

+1, it's unclear to me why this is necessary. If it is, I'd consider it a bug 
in the sandbox code that prepares mounts. It should be making any directories 
that don't already exist under taskfs:


# If we're mounting a file into the task filesystem, the mount call will 
fail if the mount
# point doesn't exist. In that case we'll create an empty file to mount 
over.
if os.path.isfile(source) and not os.path.exists(destination):
  safe_mkdir(os.path.dirname(destination))
  touch(destination)
else:
  safe_mkdir(destination)

https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/sandbox.py#L284-L290


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57524/#review168709
---


On March 11, 2017, 12:27 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57524/
> ---
> 
> (Updated March 11, 2017, 12:27 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar and Stephan Erb.
> 
> 
> Bugs: AURORA-1903
> https://issues.apache.org/jira/browse/AURORA-1903
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The mesos unified containerizer does not support absolute container path 
> mounts if no rootfs is set. This allows operators to switch between our 
> current behaviour (mounting images as a volume) and setting the rootfs. See 
> AURORA-1903 for more detailed analysis.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  4dac9757a65e144142d36ee921b85a02a5311fe5 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
>  5c987fd051728486172c8afd34219e86d56f00d5 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 0d639f66db456858278b0485c91c40975c3b45ac 
>   src/main/java/org/apache/aurora/scheduler/mesos/TestExecutorSettings.java 
> e1cd81e6fbd98f23046e6e775be268be4310c62a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 93cc34cf8393f969087cd0fd6f577228c00170e9 
> 
> 
> Diff: https://reviews.apache.org/r/57524/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 56575: AURORA-1837 Improve task history pruning

2017-03-13 Thread Mehrdad Nurolahzade


> On Feb. 15, 2017, 9:40 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java
> > Lines 134-135 (original), 134-135 (patched)
> > 
> >
> > Can you explain the reasoning behind doing this in a loop per job 
> > rather than a single loop that queries all tasks?
> 
> Mehrdad Nurolahzade wrote:
> That's how I originally did it (see revision 1). The concern expressed by 
> reviewers (read above) was around the following areas:
> 1. We are trying to load/filter all terminal state tasks at once which 
> might cause heap pressure. Looking at one job at a time might be slow and 
> inefficient from a throughput standpoint (which is a secondary concern) but 
> can help with putting less pressure on the heap.
> 2. `MemStore` does not have a scheduling status index therefore it has to 
> sequentially scan all tasks to find matching tasks.
> 
> David McLaughlin wrote:
> Were those concerns backed by data? Do we have performance numbers from 
> before and after that change? It doesn't seem to me like a given that moving 
> from a single query to N+1 queries leads to less work and less objects being 
> created. Especially when the tasks are already on heap (DbTaskStore is still 
> considered experimental and not production ready. We should not optimize for 
> it). 
> 
> For point (2) - this does not prevent us scanning every single task in 
> the store. It just divides the full scan into multiple queries, and each of 
> those queries (and subsequent filtering) has the overhead of objects, 
> streams, sorted copies (yikes) and collections that are used for filtering 
> inside the task processing loop.
> 
> Mehrdad Nurolahzade wrote:
> No, I did not verify performance/heap profile of the original version. 
> I'm going to deploy it to a test cluster and see if there is a noticable 
> difference in run-time behavior.
> 
> I agree on both points regarding `MemTaskStore`. Now, I am seeing value 
> in separating prunable task selection implementations for `MemTaskStore` and 
> `DbTaskStore`. But, that can be addressed in a follow-up ticket.
> 
> Stephan Erb wrote:
> Were you able to run a version of this patch on your scale cluster? Any 
> new insights?

I did naive observations under a test cluster and both implementations (prune 
all vs prune by job) seem to work fine. However, for the final word on this, we 
are still waiting to have our production-like test cluster to be ready (soon) 
before we can verify which implementation produces the least unwanted 
side-effect.


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56575/#review165733
---


On Feb. 15, 2017, 9:07 a.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56575/
> ---
> 
> (Updated Feb. 15, 2017, 9:07 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1837
> https://issues.apache.org/jira/browse/AURORA-1837
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch addressed efficiency issues in the current implementation of 
> `TaskHistoryPruner`. The new design is similar to that of 
> `JobUpdateHistoryPruner`: (a) Instead of registering a `DelayExecutor` run 
> upon terminal task state transitions, it runs on preconfigured intervals, 
> finds all terminal state tasks that meet pruning criteria and deletes them. 
> (b) Makes the initial task history pruning delay configurable so that it does 
> not hamper scheduler upon start.
> 
> The new design addressed the following two efficiecy problems:
> 
> 1. Upon scheduler restart/failure, the in-memory state of task history 
> pruning scheduled with `DelayExecutor` is lost. `TaskHistoryPruner` learns 
> about these dead tasks upon restart when log is replayed. These expired tasks 
> are picked up by the second call to `executor.execute()` that performs job 
> level pruning immediately (i.e., without delay). Hence, most task history 
> pruning happens after scheduler restarts and can severely hamper scheduler 
> performance (or cause consecutive fail-overs on test clusters when we put 
> load test on scheduler).
> 
> 2. Expired tasks can be picked up for pruning multiple times. The 
> asynchronous nature of `BatchWorker` which used to process task deletions 
> introduces some delay between delete enqueue and delete execution. As a 
> result, tasks already queued for deletion in a previous evaluation round 
> might get picked up, evaluated and enqueued for deletion again. This is 
> evident in `task