[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2017-01-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: Implemented a Mesos module and resource estimator; the source has been
posted to https://github.com/ct-clmsn/mesos-cpusets
Added performance-counter-enabled tools and a Mesos executor; the source has
been posted to https://github.com/ct-clmsn/mesos-papi)

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, 
> mentor, perfomance
>
> The cgroups isolator currently lacks support for binding (also called
> pinning) containers to a set of cores. The Linux kernel scheduler can make
> sub-optimal core assignments for processes and threads, and poor assignments
> hurt program performance, particularly cache locality.
> Applications that require GPU resources can also benefit from this feature by
> getting access to the cores closest to the GPU hardware, which reduces
> CPU-GPU copy latency.
> Most HPC cluster management systems (e.g., SLURM) provide both cgroup
> isolation and CPU binding; this feature would provide similar capabilities in
> Mesos. Current interest in supporting Intel's Cache Allocation Technology,
> and the advent of Intel's Knights-series processors, will require deciding
> which of the Mesos agent's processor cores each container runs on. This
> feature is a step toward a robust solution.
> The improvement in this JIRA ticket will detect the hardware topology, track
> container-to-core utilization in a histogram, and use a mathematical
> optimization technique to select cores for each container based on latency
> and that utilization histogram.
> For GPU tasks, the improvement will prioritize cores with low latency to the
> GPU in order to minimize copy latency.
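As a rough illustration of two host-side building blocks the description mentions (topology detection and binding), the Python sketch below groups logical CPUs by socket using the Linux sysfs file `topology/physical_package_id` and pins the caller with `os.sched_setaffinity`. It is Linux-only and is not the Mesos isolator: a container-level implementation would more likely write the chosen cores to the cpuset cgroup's `cpuset.cpus` file. The function names `cores_by_package` and `pin_caller` are hypothetical.

```python
import os
import re
from typing import Dict, Set


def cores_by_package(cpu_root: str = "/sys/devices/system/cpu") -> Dict[int, Set[int]]:
    """Group logical CPU ids by physical package (socket) using Linux sysfs."""
    packages: Dict[int, Set[int]] = {}
    if not os.path.isdir(cpu_root):
        return packages  # sysfs not mounted, or not Linux
    for name in os.listdir(cpu_root):
        match = re.fullmatch(r"cpu(\d+)", name)
        if match is None:
            continue  # skips cpufreq/, cpuidle/, online, possible, ...
        path = os.path.join(cpu_root, name, "topology", "physical_package_id")
        try:
            with open(path) as fh:
                package = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # offline CPU or no topology information exposed
        packages.setdefault(package, set()).add(int(match.group(1)))
    return packages


def pin_caller(cpus: Set[int]) -> Set[int]:
    """Restrict the caller to `cpus`, intersected with the CPUs it may already use.

    pid 0 means "the caller"; on Linux the mask applies to the calling thread
    and is inherited by threads and child processes created afterwards.
    """
    allowed = os.sched_getaffinity(0) & set(cpus)
    if not allowed:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, allowed)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    topology = cores_by_package()
    print("cores per package:", topology)
    usable = os.sched_getaffinity(0)
    for package, cores in sorted(topology.items()):
        if cores & usable:
            print(f"pinned to package {package}:", sorted(pin_caller(cores)))
            break
```

Intersecting with the current affinity mask keeps the sketch from requesting CPUs that an outer cpuset has already excluded.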



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2017-01-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815211#comment-15815211
 ] 

Chris commented on MESOS-5342:
--

Implemented a Mesos module and resource estimator; the source has been posted
to https://github.com/ct-clmsn/mesos-cpusets
Added performance-counter-enabled tools and a Mesos executor; the source has
been posted to https://github.com/ct-clmsn/mesos-papi

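For GPU tasks, the issue description proposes preferring cores with low latency to the GPU. On Linux, a coarse version of that signal is available because the kernel exposes, for each PCI device, the CPUs of the device's local NUMA node in `/sys/bus/pci/devices/<address>/local_cpulist`. The minimal sketch below assumes NUMA-node locality is an acceptable proxy for "closest cores" (it ignores finer PCIe-switch distances); the helper names and the utilization map are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse the kernel's cpulist format, e.g. "0-3,8,10-11" -> {0, 1, 2, 3, 8, 10, 11}.

    Only plain ids and ranges are handled (no stride syntax).
    """
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def gpu_local_cpus(pci_address: str) -> Set[int]:
    """Read the CPUs the kernel reports as local to a PCI device such as a GPU."""
    with open(f"/sys/bus/pci/devices/{pci_address}/local_cpulist") as fh:
        return parse_cpulist(fh.read())


def rank_cores_for_gpu(candidates: Iterable[int], gpu_local: Set[int],
                       utilization: Dict[int, float]) -> List[int]:
    """Order candidates: GPU-local cores first, then lower utilization, then lower id."""
    return sorted(candidates,
                  key=lambda c: (c not in gpu_local, utilization.get(c, 0.0), c))


if __name__ == "__main__":
    # Hypothetical data: cores 2-3 are local to the GPU; utilization is in [0, 1].
    print(rank_cores_for_gpu(range(4), {2, 3}, {2: 0.5, 3: 0.1}))  # [3, 2, 0, 1]
```

Ranking by a tuple key keeps locality strictly ahead of utilization; a weighted score would let a very busy local core lose to an idle remote one, which may or may not be desirable.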





[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Labels: cgroups cpu cpu-usage gpu isolation isolator mentor perfomance  
(was: )



[jira] [Updated] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5358:
-
Labels: cgroups cpu cpu-usage documentation gpu isolation isolator mentor 
newbie performance  (was: documentation mentor newbie performance)

> Design Doc for CPU pinning/binding support (MESOS-5342)
> ---
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: cgroups, cpu, cpu-usage, documentation, gpu, isolation, 
> isolator, mentor, newbie, performance
>
> Develop design document for MESOS-5342.





[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278570#comment-15278570
 ] 

Chris commented on MESOS-5358:
--

Requesting a shepherd!



[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: Documentation for this ticket is MESOS-5358)



[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278416#comment-15278416
 ] 

Chris commented on MESOS-5358:
--

Implementation ticket is MESOS-5342 
(https://issues.apache.org/jira/browse/MESOS-5342)



[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-

Documentation for this ticket is MESOS-5358



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278405#comment-15278405
 ] 

Chris commented on MESOS-5342:
--

Note: this is my first design document for Mesos, so it's not perfect.



[jira] [Commented] (MESOS-5358) DesignDoc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278400#comment-15278400
 ] 

Chris commented on MESOS-5358:
--

Design document posted here:

https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing



[jira] [Issue Comment Deleted] (MESOS-5358) DesignDoc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5358:
-
Comment: was deleted

(was: 
https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing)



[jira] [Commented] (MESOS-5358) DesignDoc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278399#comment-15278399
 ] 

Chris commented on MESOS-5358:
--

https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277463#comment-15277463
 ] 

Chris commented on MESOS-5342:
--

[~kaysoky] Sure thing. I've done some of this work in a local README prior to
writing the code, so it shouldn't be too much trouble to transpose that
information onto Google Docs. Also, should the source be posted on GitHub
under a separate branch for review?



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277207#comment-15277207
 ] 

Chris commented on MESOS-5342:
--

[~kaysoky] Where are design documents supposed to be posted? I've gone through
the patch submission documentation and will review the testing documentation
and style guides.



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276658#comment-15276658
 ] 

Chris commented on MESOS-5342:
--

Forgot to mention: a shepherd is needed to support integration of this feature!



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276521#comment-15276521
 ] 

Chris commented on MESOS-5342:
--

For background on submodular functions (and why submodular optimization was
selected for this problem), I strongly suggest watching at least this YouTube
lecture (ideally the entire video series), publicly available from MLSS Iceland
2014: https://youtu.be/6ThMzlHdKsI
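For readers new to the term, the textbook definition of submodularity is a diminishing-returns condition on a set function f over a ground set V (in this setting V would plausibly be the agent's cores, though that mapping is an interpretation, not something stated in the ticket). The second line is the standard guarantee for the greedy algorithm when f is also monotone with f(empty set) = 0 and at most k elements may be chosen; it is a general result, not a claim about the specific objective in the design document.

```latex
f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
  \qquad \forall\, A \subseteq B \subseteq V,\; x \in V \setminus B

f\bigl(S_k^{\mathrm{greedy}}\bigr) \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)
  \max_{S \subseteq V,\ |S| \le k} f(S)
```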




[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: For background on submodular functions (and why submodular optimization
was selected for this problem), I strongly suggest watching at least this
YouTube lecture (ideally the entire video series), publicly available from MLSS
Iceland 2014: https://youtu.be/6ThMzlHdKsI)



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276518#comment-15276518
 ] 

Chris commented on MESOS-5342:
--

For background on submodular functions (and why submodular optimization was
selected for this problem), I strongly suggest watching at least this YouTube
lecture (ideally the entire video series), publicly available from MLSS Iceland
2014: https://youtu.be/6ThMzlHdKsI
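A minimal sketch of greedy core selection over a submodular objective, assuming a hypothetical objective: each core is weighted 1 / (1 + n), where n is the number of containers the container-to-core histogram records on that core, and weights are summed per shared-cache domain under a square root, so a core adds less when its domain already has selected cores than it would on its own. The objective, the domain map, and the names `greedy_select` and `contention_objective` are illustrative assumptions, not the design in MESOS-5342; any monotone submodular objective could be passed to `greedy_select` instead.

```python
import math
from typing import Callable, Dict, FrozenSet, Iterable, List

SetFunction = Callable[[FrozenSet[int]], float]


def greedy_select(candidates: Iterable[int], k: int, f: SetFunction) -> List[int]:
    """Pick up to k elements, each time adding the one with the largest marginal gain.

    For a monotone submodular f with f(empty set) = 0, the result is within a
    factor (1 - 1/e) of the best k-element set.
    """
    pool = sorted(set(candidates))  # sorted so ties go to the lowest core id
    chosen: List[int] = []
    current = f(frozenset())
    for _ in range(min(k, len(pool))):
        best, best_gain = None, -math.inf
        for c in pool:
            if c in chosen:
                continue
            gain = f(frozenset(chosen) | {c}) - current
            if gain > best_gain:
                best, best_gain = c, gain
        chosen.append(best)
        current = f(frozenset(chosen))
    return chosen


def contention_objective(histogram: Dict[int, int], domain_of: Dict[int, int]) -> SetFunction:
    """Hypothetical objective built from a container-to-core histogram.

    Each core gets weight 1 / (1 + containers already on it); weights are summed
    per shared-cache domain and square-rooted (concave over modular, hence
    monotone submodular).
    """
    def f(cores: FrozenSet[int]) -> float:
        per_domain: Dict[int, float] = {}
        for c in cores:
            d = domain_of[c]
            per_domain[d] = per_domain.get(d, 0.0) + 1.0 / (1 + histogram.get(c, 0))
        return sum(math.sqrt(w) for w in per_domain.values())
    return f


if __name__ == "__main__":
    # Hypothetical 4-core agent: cores 0-1 share one cache domain, cores 2-3 another.
    histogram = {0: 0, 1: 1, 2: 3, 3: 9}  # containers currently pinned per core
    domains = {0: 0, 1: 0, 2: 1, 3: 1}
    print(greedy_select(domains, 2, contention_objective(histogram, domains)))  # [0, 2]
```

Here the two picks are spread across cache domains, which reduces shared-cache contention; if a workload benefits more from packing cores into one domain for cache reuse, that preference would need a different objective.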



[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: For background on submodular functions (and why submodular optimization
was selected for this problem), I strongly suggest watching this YouTube video:
https://youtu.be/6ThMzlHdKsI)



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276514#comment-15276514
 ] 

Chris commented on MESOS-5342:
--

For information about submodular functions (and why they were selected for this 
problem), I strongly suggest reviewing this YouTube video: 
https://youtu.be/6ThMzlHdKsI



[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: Fixed a small bug in the greedy submodular subset selection algorithm. 
The "submodular cost" of selecting a core was being used in the knapsack budget 
test (cores currently have an at-most-budget-cost of "1.0"). The correct cost 
is now being used in the test.)



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276500#comment-15276500
 ] 

Chris commented on MESOS-5342:
--

Fixed a small bug in the greedy submodular subset selection algorithm. The 
"submodular cost" of selecting a core was being used in the knapsack budget 
test (each core currently counts at most 1.0 against the budget). The correct 
budget cost is now used in the test.
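
A minimal Python sketch of the distinction described above, under assumed names (`greedy_under_budget`, `gain` and `budget_cost` are hypothetical, not identifiers from the patch): candidates are ranked by marginal gain per unit of cost, but the knapsack budget test charges each core only its budget cost, never its marginal gain.

```python
from typing import Callable, Dict, List, Set


def greedy_under_budget(
    cores: List[int],
    gain: Callable[[Set[int], int], float],
    budget: float,
    budget_cost: Dict[int, float],
) -> List[int]:
    """Greedily add cores by gain per unit cost while the knapsack budget allows.

    The budget test uses budget_cost[core] (the knapsack weight, e.g. 1.0 per
    core), not the marginal gain used to rank candidates.
    """
    selected: List[int] = []
    chosen: Set[int] = set()
    spent = 0.0
    remaining = set(cores)
    while remaining:
        # Rank candidates by marginal gain per unit of budget cost
        # (ties go to the lowest core id because candidates are sorted).
        best = max(sorted(remaining), key=lambda c: gain(chosen, c) / budget_cost[c])
        remaining.discard(best)
        # Knapsack budget test: charge the core's budget cost.
        if spent + budget_cost[best] <= budget:
            selected.append(best)
            chosen.add(best)
            spent += budget_cost[best]
    return selected
```

With a coverage-style gain (a standard submodular function) and a cost of 1.0 per core, a budget of 2.0 selects exactly two cores, which is the behaviour the fix restores.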



[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Description: 
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, specifically in terms of cache locality. Applications requiring 
GPU resources can benefit from this feature by getting access to cores closest 
to the GPU hardware, which reduces cpu-gpu copy latency.

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology, and the advent of Intel's Knights-series processors, will require 
making choices about where containers are going to run on the cores of the 
mesos-agent's processor(s); this feature is a step toward developing a robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.

  was:
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, specifically in terms of cache locality. Applications requiring GPU 
resources can benefit from this feature by getting access to cores closest to 
the GPU hardware, which reduces cpu-gpu copy latency.

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.




[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Description: 
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, specifically in terms of cache locality. Applications requiring GPU 
resources can benefit from this feature by getting access to cores closest to 
the GPU hardware, which reduces cpu-gpu copy latency.

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.

  was:
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, particularly in the case of applications requiring GPU resources. 

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.




[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276195#comment-15276195
 ] 

Chris commented on MESOS-5342:
--

The implementation has been posted to Review Board and successfully passed 'make 
check'. It requires the hwloc library to perform machine hardware topology 
discovery and CPU binding; updates were made to configure.ac and Makefile.am.
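
The patch relies on hwloc for topology discovery. As a rough, dependency-free illustration of the same idea on Linux (not the patch's code), logical CPUs can be grouped into physical cores by reading the standard sysfs topology files. The function name `pus_per_core` and its `sysfs_root` parameter are hypothetical; the parameter exists only so the sketch can be exercised against a fake directory tree.

```python
import os
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# (package_id, core_id) -> logical CPU (PU) ids that share that physical core
CoreMap = Dict[Tuple[int, int], List[int]]


def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read())


def pus_per_core(sysfs_root: str = "/sys/devices/system/cpu") -> CoreMap:
    """Group logical CPUs into physical cores using Linux sysfs topology files."""
    cores: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"cpu(\d+)", entry)
        if match is None:
            continue  # skip cpufreq/, cpuidle/, online, possible, ...
        topo = os.path.join(sysfs_root, entry, "topology")
        if not os.path.isdir(topo):
            continue  # offline CPUs may not expose a topology directory
        # core_id is only unique within a package, hence the tuple key.
        key = (read_int(os.path.join(topo, "physical_package_id")),
               read_int(os.path.join(topo, "core_id")))
        cores[key].append(int(match.group(1)))
    return {key: sorted(pus) for key, pus in sorted(cores.items())}
```

The length of each list is the "number of processing units available on each core" that the comment uses to scale per-core costs; hwloc additionally reports caches, NUMA nodes and distances, which this sketch does not attempt.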

The implementation is a new "device" under the cgroups isolator directory called 
"hwloc". It detects the topology and computes the total number of cores 
required by the container (it also checks whether the container requires a GPU). 
If the container requires a GPU, the topology information is used to find the 
"closest" cores based on latency. If the container only requires CPUs, a 
histogram of task assignments to cores is checked. 
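
A minimal sketch of the GPU branch, assuming two things the comment does not spell out: that a fractional `cpus` request is rounded up to whole cores, and that a per-core latency to the GPU is already available (for example from hwloc distance information). `cores_required` and `closest_cores_to_gpu` are hypothetical names.

```python
import math
from typing import Dict, List


def cores_required(cpus: float) -> int:
    """Whole cores needed for a fractional `cpus` request (assumed: round up, at least 1)."""
    return max(1, math.ceil(cpus))


def closest_cores_to_gpu(gpu_latency: Dict[int, float], cpus: float) -> List[int]:
    """Pick the cores with the lowest latency to the GPU; ties go to the lower core id."""
    n = cores_required(cpus)
    if n > len(gpu_latency):
        raise ValueError(f"need {n} cores, only {len(gpu_latency)} available")
    ranked = sorted(gpu_latency, key=lambda core: (gpu_latency[core], core))
    return sorted(ranked[:n])
```

For example, with latencies `{0: 20.0, 1: 10.0, 2: 10.0, 3: 30.0}` and `cpus=1.5`, two cores are needed and cores 1 and 2 are chosen.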

If the histogram is "empty" (all cores have a value of 1.0), then a random core 
is selected, the latency matrix is used to find the cores that are "closest" to 
the random core, and the histogram is updated. If the histogram is "not empty", 
then a greedy submodular subset selection algorithm is used to select N cores 
using the latency matrix and a "per-core" cost value. The "per-core" cost value 
is a normalized version of the histogram divided by the number of processing 
units available on each core. Greedy submodular subset selection algorithms use 
the "diminishing returns" property to find a near-optimal subset of items under 
a knapsack constraint.
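
A sketch of both branches under stated assumptions, since the comment does not name the objective or the normalization: the histogram is normalized by its total, and the submodular objective is facility location over a similarity derived from latency (`worst - latency`), a standard monotone submodular function chosen here only for illustration. Each core consumes 1.0 of a budget of N, so the knapsack constraint reduces to picking N cores, ranked by marginal gain divided by per-core cost. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # latency[i][j] between cores i and j (assumed input)


def per_core_cost(histogram: Sequence[float], pus_per_core: Sequence[int]) -> List[float]:
    """Histogram normalized by its total (an assumption), divided by PUs per core."""
    total = sum(histogram)
    return [(h / total) / p for h, p in zip(histogram, pus_per_core)]


def seed_from_random_core(latency: Matrix, n: int, rng: random.Random) -> List[int]:
    """'Empty' histogram case: a random core plus its n - 1 lowest-latency neighbours."""
    start = rng.randrange(len(latency))
    others = sorted((j for j in range(len(latency)) if j != start),
                    key=lambda j: (latency[start][j], j))
    return sorted([start] + others[: n - 1])


def facility_location(selected: Sequence[int], sim: Matrix) -> float:
    """f(S) = sum_j max_{i in S} sim[i][j]; monotone submodular when sim >= 0."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for i in selected) for j in range(len(sim)))


def greedy_select(latency: Matrix, cost: Sequence[float], n: int, eps: float = 1e-9) -> List[int]:
    """Cost-benefit greedy: add the core with the best marginal gain / cost."""
    if n > len(latency):
        raise ValueError("not enough cores")
    worst = max(max(row) for row in latency)
    sim = [[worst - x for x in row] for row in latency]  # low latency -> high similarity
    selected: List[int] = []
    spent, budget = 0.0, float(n)
    while spent + 1.0 <= budget:  # knapsack test: each core costs 1.0 of the budget
        base = facility_location(selected, sim)
        candidates = [c for c in range(len(latency)) if c not in selected]
        best = max(candidates,
                   key=lambda c: (facility_location(selected + [c], sim) - base) / (cost[c] + eps))
        selected.append(best)
        spent += 1.0
    return sorted(selected)
```

On a two-socket, four-core example where core 0 already carries three tasks, the greedy step avoids core 0 and, because facility location rewards coverage of the whole machine, places its two picks on different sockets. An objective that rewards packing onto one socket would be needed if shared-cache locality is the goal; the choice of objective is the main design decision this sketch leaves open.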

When the list of cores is returned, a bit vector representing a cpuset is bound 
to the container's pid_t. When the container is cleaned up, the histogram is 
updated by decrementing the current task count on each core assigned to the 
pid_t by 1.0.
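
A minimal sketch of the binding and bookkeeping steps. The patch binds through hwloc; here `os.sched_setaffinity` (Linux only, a wrapper around sched_setaffinity(2)) stands in for it. Note that sched_setaffinity applies to a single thread ID and is inherited by children created with fork(2), so already-running threads of a multithreaded process are not rebound by one call. `cpuset_mask`, `mask_to_cpus`, `bind` and `CoreHistogram` are hypothetical names; the 1.0 starting value mirrors the comment's "empty" convention.

```python
import os
from typing import Dict, Iterable, List


def cpuset_mask(pus: Iterable[int]) -> int:
    """Bit vector with bit k set for each logical CPU (PU) k."""
    mask = 0
    for pu in pus:
        mask |= 1 << pu
    return mask


def mask_to_cpus(mask: int) -> List[int]:
    return [k for k in range(mask.bit_length()) if (mask >> k) & 1]


def bind(pid: int, mask: int) -> None:
    """Linux only: restrict `pid` (0 = calling thread) to the CPUs in `mask`."""
    os.sched_setaffinity(pid, mask_to_cpus(mask))


class CoreHistogram:
    """Per-core task counts: incremented on assignment, decremented by 1.0 on cleanup."""

    def __init__(self, num_cores: int) -> None:
        self.counts = [1.0] * num_cores  # 1.0 everywhere == "empty" histogram
        self.assigned: Dict[int, List[int]] = {}  # pid -> cores bound to it

    def assign(self, pid: int, cores: List[int]) -> None:
        self.assigned[pid] = list(cores)
        for core in cores:
            self.counts[core] += 1.0

    def cleanup(self, pid: int) -> None:
        for core in self.assigned.pop(pid, []):
            self.counts[core] -= 1.0
```

Keeping the pid-to-cores map next to the counts means cleanup can undo exactly the assignment that was made, even if the topology or cost values change in between.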



[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: implementation has been posted to review board. successfully passed 'make 
check'. requires use of the hwloc library to perform machine hardware topology 
discovery and cpu binding. updates were made to configure.ac and Makefile.am.

implementation is a new "device" under the cgroups isolator directory called 
"hwloc". Implementation detects topology, computes total number of cores 
required by the container (also checks if the container requires gpu). if the 
container requires gpu, the topology information is used to find the "closest" 
cores based on latency. If the container only requires cpu, a histogram of task 
assignment to cores is checked. If the histogram is "empty" (all cores have a 
value of 1.0) then a random core is selected and the latency matrix is used to 
find cores that are "closest" to the random core. The histogram is updated. If 
the histogram is "not empty" then a greedy submodular subset selection 
algorithm is used to select N cores using the latency matrix and a "per-core" 
cost value. The "per-core" cost value is a normalized version of the histogram 
divided by the number of processing units available on each core.  Greedy 
submodular subset selection algorithms use a "diminishing returns property" to 
find an optimal subset of items under a knapsack constraint.

When the list of cores is returned, a bit vector representing a cpuset is bound 
to the container's pid_t. When the container is cleaned up, the histogram is 
updated by reducing the current task counts on each core assigned to the pid_t 
by -1.0.)



[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276194#comment-15276194
 ] 

Chris commented on MESOS-5342:
--

The implementation has been posted to Review Board and successfully passed 'make 
check'. It requires the hwloc library to perform machine hardware topology 
discovery and CPU binding; updates were made to configure.ac and Makefile.am.

The implementation is a new "device" under the cgroups isolator directory called 
"hwloc". It detects the topology and computes the total number of cores 
required by the container (it also checks whether the container requires a GPU). 
If the container requires a GPU, the topology information is used to find the 
"closest" cores based on latency. If the container only requires CPUs, a 
histogram of task assignments to cores is checked. If the histogram is "empty" 
(all cores have a value of 1.0), then a random core is selected, the latency 
matrix is used to find the cores that are "closest" to the random core, and the 
histogram is updated. If the histogram is "not empty", then a greedy submodular 
subset selection algorithm is used to select N cores using the latency matrix 
and a "per-core" cost value. The "per-core" cost value is a normalized version 
of the histogram divided by the number of processing units available on each 
core. Greedy submodular subset selection algorithms use the "diminishing 
returns" property to find a near-optimal subset of items under a knapsack 
constraint.

When the list of cores is returned, a bit vector representing a cpuset is bound 
to the container's pid_t. When the container is cleaned up, the histogram is 
updated by decrementing the current task count on each core assigned to the 
pid_t by 1.0.



[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-09 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Description: 
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, particularly in the case of applications requiring GPU resources. 

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.

  was:
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance; particularly in the case of applications requiring GPU resources. 

Most cluster management systems from the HPC community (SLURM) provide both 
cgroup isolation and cpu binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.




[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-08 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275892#comment-15275892
 ] 

Chris commented on MESOS-5342:
--

I've implemented code to support this particular feature and need to submit it 
for review.



[jira] [Created] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-08 Thread Chris (JIRA)
Chris created MESOS-5342:


 Summary: CPU pinning/binding support for CgroupsCpushareIsolatorProcess
 Key: MESOS-5342
 URL: https://issues.apache.org/jira/browse/MESOS-5342
 Project: Mesos
  Issue Type: Improvement
  Components: cgroups, containerization
Affects Versions: 0.28.1
Reporter: Chris


The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, particularly in the case of applications requiring GPU resources. 
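
Binding itself can be expressed through the cpuset cgroup controller or per 
process. A sketch, assuming a cgroup v1 cpuset hierarchy where each container 
has its own directory (that directory layout is an assumption, not the Mesos 
layout): `to_cpuset_list` renders core ids in the kernel's list format, 
`pin_cgroup` writes it to `cpuset.cpus`, and `pin_process` uses Python's 
`os.sched_setaffinity` as a per-process alternative. All three function names 
are hypothetical.

```python
import os


def to_cpuset_list(cores):
    """Render core ids in the kernel's cpuset list format, e.g. "0-2,5"."""
    ids = sorted(set(cores))
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for core in ids[1:]:
        if core == prev + 1:
            prev = core  # still inside a contiguous run
            continue
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = core
    ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(ranges)


def pin_cgroup(cgroup_dir, cores):
    """Write the core list into a cgroup v1 cpuset directory (assumed path)."""
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(to_cpuset_list(cores))


def pin_process(pid, cores):
    """Per-process alternative: set the CPU affinity mask (Linux only)."""
    os.sched_setaffinity(pid, set(cores))
```

`to_cpuset_list({0, 1, 2, 5})` gives `"0-2,5"`. On cgroup v1, `cpuset.mems` must 
also be populated before tasks can be attached to the cgroup, and writing these 
files normally requires root.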

Most cluster management systems from the HPC community (e.g., SLURM) provide 
both cgroup isolation and CPU binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.
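
Hardware topology detection could start from the CPU topology files Linux 
exposes in sysfs. A sketch, assuming only the `physical_package_id` and 
`core_id` files under `/sys/devices/system/cpu/cpuN/topology/` are needed 
(cache and NUMA information would have to be read separately); 
`detect_topology` is a hypothetical name, and the `root` parameter exists so 
the function can be pointed at a test fixture.

```python
import os
import re
from collections import defaultdict


def detect_topology(root="/sys/devices/system/cpu"):
    """Group logical CPUs by socket and physical core using Linux sysfs.

    Returns {package_id: {core_id: [logical cpu ids]}}.
    """
    topology = defaultdict(lambda: defaultdict(list))
    for name in os.listdir(root):
        match = re.fullmatch(r"cpu(\d+)", name)
        if not match:
            continue  # skip entries such as cpufreq, cpuidle, online
        topo_dir = os.path.join(root, name, "topology")
        if not os.path.isdir(topo_dir):
            continue  # e.g. a CPU without exposed topology information
        with open(os.path.join(topo_dir, "physical_package_id")) as f:
            package = int(f.read())
        with open(os.path.join(topo_dir, "core_id")) as f:
            core = int(f.read())
        topology[package][core].append(int(match.group(1)))
    # Convert to plain dicts with sorted CPU lists for stable output.
    return {
        pkg: {core: sorted(cpus) for core, cpus in cores.items()}
        for pkg, cores in topology.items()
    }
```

Logical CPUs that share a `core_id` within the same package are hyperthread 
siblings on one physical core.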

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.





[jira] [Commented] (MESOS-4027) Improve task-node affinity

2015-12-18 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064705#comment-15064705
 ] 

Chris commented on MESOS-4027:
--

Great considerations; I really like the idea of public/private label fields. 
What would be a starting point for implementation? Adding security flags to 
labels?

> Improve task-node affinity
> --
>
> Key: MESOS-4027
> URL: https://issues.apache.org/jira/browse/MESOS-4027
> Project: Mesos
>  Issue Type: Wish
>  Components: allocation, general
>Reporter: Chris
>Priority: Trivial
>
> Improve task-to-node affinity and anti-affinity (e.g., running Hadoop or Spark 
> jobs on a node currently running HDFS, or avoiding running Ceph on HDFS nodes).
> Provide a user-mutable Attribute in TaskInfo (the Attribute is modified by a 
> Framework Scheduler) that can describe what a Task is running.
> The Attribute would propagate to a Task at execution. The Attribute would be 
> passed to Framework Schedulers as part of an Offer's Attributes list.
> A Framework Scheduler could then filter out or accept Offers from Nodes that 
> are currently labeled with a desired Attribute or set of Attributes.





[jira] [Commented] (MESOS-4027) Improve task-node affinity

2015-12-11 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052902#comment-15052902
 ] 

Chris commented on MESOS-4027:
--

[~qianzhang] I just noticed that; I really should have spent more time 
reviewing the documentation in the protobuf file. I'd like to see the Task's 
labels field get propagated into an Offer so customized schedulers can have some 
view into what is running on a Mesos Node (e.g., scheduler logic may filter out 
Offers associated with Nodes involved in Spark processing, or schedulers may 
want to filter out Offers that are not associated with HDFS). I figured out a 
way to do this without adding more fields to the protobuf IDL for TaskInfo. 
Thanks for pointing this out!



[jira] [Updated] (MESOS-4027) Improve task-node affinity

2015-11-30 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-4027:
-
Summary: Improve task-node affinity  (was: Improve task-node affinit)



[jira] [Created] (MESOS-4027) Improve task-node affinit

2015-11-30 Thread Chris (JIRA)
Chris created MESOS-4027:


 Summary: Improve task-node affinit
 Key: MESOS-4027
 URL: https://issues.apache.org/jira/browse/MESOS-4027
 Project: Mesos
  Issue Type: Wish
  Components: allocation, general
Reporter: Chris
Priority: Trivial


Improve task-to-node affinity and anti-affinity (e.g., running Hadoop or Spark 
jobs on a node currently running HDFS, or avoiding running Ceph on HDFS nodes).

Provide a user-mutable Attribute in TaskInfo (the Attribute is modified by a 
Framework Scheduler) that can describe what a Task is running.

The Attribute would propagate to a Task at execution. The Attribute would be 
passed to Framework Schedulers as part of an Offer's Attributes list.

A Framework Scheduler could then filter out or accept Offers from Nodes that 
are currently labeled with a desired Attribute or set of Attributes.
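
The scheduler-side filtering step might look like the following sketch, written 
for illustration only. It assumes the propagated task information reaches the 
scheduler as a set of string labels per offer; `Offer` and `filter_offers` are 
hypothetical stand-ins, not the Mesos scheduler API or the `Offer` protobuf.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical stand-in for a Mesos Offer. `labels` are assumed to be
    the propagated task attributes describing what the node is running."""
    hostname: str
    labels: frozenset = field(default_factory=frozenset)


def filter_offers(offers, require=frozenset(), avoid=frozenset()):
    """Affinity: keep offers from nodes carrying every label in `require`.
    Anti-affinity: drop offers from nodes carrying any label in `avoid`."""
    require, avoid = frozenset(require), frozenset(avoid)
    return [
        offer for offer in offers
        if require <= offer.labels and not (avoid & offer.labels)
    ]
```

With offers from nodes labeled `{"hdfs"}`, `{"hdfs", "spark"}` and no labels, 
`require={"hdfs"}` keeps the first two (affinity for Hadoop or Spark next to 
HDFS), while `avoid={"hdfs"}` keeps only the unlabeled node (anti-affinity, as 
in the Ceph case).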





[jira] [Comment Edited] (MESOS-4027) Improve task-node affinity

2015-11-30 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032577#comment-15032577
 ] 

Chris edited comment on MESOS-4027 at 11/30/15 10:04 PM:
-

I've developed a patch in support of this ticket (it's been pushed to my GitHub 
mesos fork). The patch requires testing (among other things!) before submitting 
for review.


was (Author: ct.clmsn):
I've developed a patch in support of this ticket (it's been pushed to my GitHub 
mesos fork). The patch requires testing (among other things, e.g., a shepherd) 
before submitting for review.



[jira] [Commented] (MESOS-4027) Improve task-node affinity

2015-11-30 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032577#comment-15032577
 ] 

Chris commented on MESOS-4027:
--

I've developed a patch in support of this ticket. The patch requires testing 
(among other things, e.g., a shepherd) before submitting for review.



[jira] [Comment Edited] (MESOS-4027) Improve task-node affinity

2015-11-30 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032577#comment-15032577
 ] 

Chris edited comment on MESOS-4027 at 11/30/15 10:04 PM:
-

I've developed a patch in support of this ticket (it's been pushed to my GitHub 
mesos fork). The patch requires testing (among other things, e.g., a shepherd) 
before submitting for review.


was (Author: ct.clmsn):
I've developed a patch in support of this ticket. The patch requires testing 
(among other things, e.g., a shepherd) before submitting for review.



[jira] [Comment Edited] (MESOS-3059) Allow http endpoint to dynamically change the slave attributes

2015-11-23 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022795#comment-15022795
 ] 

Chris edited comment on MESOS-3059 at 11/23/15 7:19 PM:


Has there been any recent work on this capability?


was (Author: ct.clmsn):
Has there been any work on this capability?

> Allow http endpoint to dynamically change the slave attributes
> --
>
> Key: MESOS-3059
> URL: https://issues.apache.org/jira/browse/MESOS-3059
> Project: Mesos
>  Issue Type: Wish
>Reporter: Nitin
>
> It is well understood that changing attributes dynamically is not safe 
> without a restart, because the slave itself may not know which older framework 
> tasks running on it depend on the previous attributes. 
> However, a full restart deletes a lot of other history. We need to support 
> dynamic attribute changes with a soft restart. 
> It would be good to expose a REST endpoint, on either the slave or the 
> mesos-master, that directly changes the state in ZooKeeper.
> USE-CASE
> We use slave attributes/roles to direct each framework's scheduling to 
> specific slaves according to its requirements. The Mesos scheduler creates 
> offers only on the basis of resources.
> In our use case, we categorize our Spark frameworks or jobs running under 
> frameworks (like Marathon) based on multiple factors. We want jobs or 
> frameworks belonging to one category to run on their own specific cluster 
> of resources. We want to dynamically manage the slaves in these logical 
> sub-clusters.
> Since the number of jobs that will be submitted, and when they will be 
> submitted, is very dynamic, it makes sense to be able to dynamically assign 
> roles or attributes to slaves. It is not possible to gauge the requirements 
> at cluster provisioning time. Static role or attribute assignment leads to 
> sub-optimal use of the cluster.





[jira] [Commented] (MESOS-3059) Allow http endpoint to dynamically change the slave attributes

2015-11-23 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022795#comment-15022795
 ] 

Chris commented on MESOS-3059:
--

Has there been any work on this capability?
