[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401435#comment-15401435 ] Fan Du commented on MESOS-5545: --- Changing the label makes no sense for the time being; the implementation details will be discussed with the shepherd eventually. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Epic > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Du updated MESOS-5545: -- Issue Type: Story (was: Epic) > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401422#comment-15401422 ] Fan Du edited comment on MESOS-5545 at 8/1/16 1:20 AM: --- [~haosd...@gmail.com] What's the intention of changing label from *Story* to *Epic*? was (Author: fan.du): [~haosd...@gmail.com] What's the intention of changing label from *Story* to "Epic"? > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Epic > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401422#comment-15401422 ] Fan Du commented on MESOS-5545: --- [~haosd...@gmail.com] What's the intention of changing label from *Story* to "Epic"? > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Epic > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321408#comment-15321408 ] Fan Du commented on MESOS-5545: --- [~brugidou] Thanks for sharing; I will look into it! > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320245#comment-15320245 ] Fan Du commented on MESOS-5545: --- [~jvanremoortere] Thanks for your constructive advice and suggestions! Yes, this will be a long road, but it's fun to experiment with the idea. :) How about we sync up at the next community meeting on 6/16? Honestly, it's not attributes that I dislike, but the lack of automation and the tedious maintenance effort that comes with it. I will update my design doc to enhance the current attributes with these goals: a. Automatically probe the rack topology, with modular plugins for popular networks, e.g. Ethernet, InfiniBand, etc. b. Use the rack topology information to arrange agents on a per-rack basis. c. Design a common, friendly attribute scheme for frameworks to interpret. d. Use ACLs to enforce security. By the way, may I ask whether you can shepherd this ticket? We could then work shoulder to shoulder. Thanks! > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
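To make goals (b) and (c) in the comment above a little more concrete, here is a minimal Python sketch, not taken from the design doc. It assumes a hypothetical `rack` key inside agent attribute text of the `key:value;key:value` form that the agent's `--attributes` flag accepts. The key name, the helper names, and the agent IDs are all illustrative assumptions.

```python
from collections import defaultdict


def parse_attributes(text):
    """Parse 'key:value;key:value' attribute text into a dict.

    Values are kept as plain strings; typed values (scalars, ranges)
    are outside the scope of this sketch.
    """
    attributes = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("attribute without ':' separator: %r" % item)
        attributes[key.strip()] = value.strip()
    return attributes


def group_agents_by_rack(agents, rack_key="rack"):
    """Group agent IDs by rack; agents without the key land under None.

    `rack_key` is a hypothetical attribute name, not a Mesos convention.
    """
    racks = defaultdict(list)
    for agent_id, attribute_text in agents.items():
        rack = parse_attributes(attribute_text).get(rack_key)
        racks[rack].append(agent_id)
    return dict(racks)


# Hypothetical agents; agent-4 has not reported a rack yet.
agents = {
    "agent-1": "rack:r1;os:centos7",
    "agent-2": "rack:r2",
    "agent-3": "rack:r1",
    "agent-4": "os:centos7",
}
by_rack = group_agents_by_rack(agents)
```

Keeping agents without a rack under `None` rather than dropping them leaves the choice of what to do with them (allocate anyway, or wait) to the caller.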
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320125#comment-15320125 ] Fan Du commented on MESOS-5545: --- [~adam-mesos] Thanks for sharing your thoughts here; they are profound and impressive! Mesos performs the lower-level resource scheduling, so exporting the network topology falls within Mesos's role. It's up to a framework scheduler like [Firmament|https://github.com/camsas/firmament] to make more sophisticated scheduling decisions based on a qualitative approach. I will think more about this and am happy to discuss with you if anything shiny pops into my mind. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320093#comment-15320093 ] Fan Du commented on MESOS-5545: --- [~avin...@mesosphere.io] Thanks for the comments; apparently you did your LLDP homework :) The topology here refers only to the access layer, that is, the switch the agent is directly connected to. lldptool will take care of parsing LLDP packets in various ways, so to the best of my knowledge this will not touch the libprocess part. You are right that LLDP is bounded by the next bridge, i.e. it travels only one hop. In the scenario where Open vSwitch is involved and Mesos runs inside a KVM guest, I can think of two ways: 1. It's the LLDP packets sent by the OVS bridge that matter, because the OVS bridge is now the access bridge, and the lldpad daemon will broadcast LLDP packets. 2. After commit [784b58a3|https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=784b58a327ad16967ab64bbfa558df81980d31e9], sys knobs can be tweaked to forward LLDP packets. I don't have any comments about using labels/attributes for the time being; I will work out something more appealing based on them. I will let you know my thoughts! > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
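Since LLDP only reaches the directly attached (access) switch, one way to read the comment above is that the switch's identity can stand in for the rack. The Python sketch below illustrates that reading only. It assumes the LLDP neighbor data has already been parsed (for example by lldptool) into hypothetical records with `chassis_id` and `port_id` fields. The record layout, the function names, and the rule of joining several switch IDs for dual-homed agents are assumptions, not part of the design doc.

```python
def access_switches(neighbors):
    """Return the sorted, de-duplicated chassis IDs of directly attached switches.

    `neighbors` is a hypothetical list of already-parsed LLDP neighbor
    records, one per local interface. Records without a chassis ID are skipped.
    """
    return sorted({n["chassis_id"].lower() for n in neighbors if n.get("chassis_id")})


def rack_id(neighbors):
    """Derive a rack identifier from the access switch(es), or None if unknown.

    Assumption: one access switch (or one redundant pair) per rack, so the
    joined chassis IDs give dual-homed agents in the same rack the same ID.
    """
    switches = access_switches(neighbors)
    if not switches:
        return None  # no LLDP neighbor seen yet
    return "+".join(switches)


# Hypothetical neighbor records for a single-homed agent.
single_homed = [{"chassis_id": "00:1B:21:AA:BB:CC", "port_id": "Eth1/1"}]
print(rack_id(single_homed))
```

An agent that has not yet seen any LLDP frame gets `None`, which matches the "no rack information" starting state discussed elsewhere in this thread.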
[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316113#comment-15316113 ] Fan Du edited comment on MESOS-5545 at 6/7/16 4:52 AM: --- The design doc to elaborate the story will be published soon for community to review. Please hold on. Design doc: https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing was (Author: fan.du): The design doc to elaborate the story will be published soon for community to review. Please hold on. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Du updated MESOS-5545: -- Attachment: RackAwarenessforMesos-Lite.pdf Rack Awareness Design doc(pdf) > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317763#comment-15317763 ] Fan Du edited comment on MESOS-5545 at 6/7/16 3:35 AM: --- [~vinodkone] Thanks for the comments. Rack topology information does not fall within the scope of the network isolator, because it is not something that can or should be isolated. Here is why rack topology information can be updated: the rack information state can only transition from no rack information to valid rack information. In other words, tasks may be using resources without rack information when agents later report a rack id to the master. The logic could then follow one or all of these design decisions: a) notify the corresponding frameworks of the updated rack id for previously allocated resources, b) tag subsequent allocations with the agents' rack id, c) attach the rack id to resources freed by a framework for the next allocation round. This scenario is simpler and cleaner than attribute updates. Alternatively, only activate agents for resource allocation once they have a valid rack id. Using attributes is one way to export the rack information, but I don't think that is feasible in production: at a scale of +1 servers, setting attributes with rack information from third-party logic and then starting agents?! Automatically exposing the rack information could save a lot of deployment and maintenance effort. Apologies, it seems I don't quite get the meaning of a first-class field. Influencing allocation decisions is not the intention of this ticket; I believe that work is out of scope, which is why I put it in the Future section of the design doc. The allocation strategy DOES honor DRF; the current implementation allocates on a per-agent basis, and we could investigate different allocation modes. In addition, I'd prefer arranging agents on a per-rack basis, because randomly shuffling agents at a scale of +1 nodes in every allocation iteration is no good. IIRC, this number has grown. All in all, IMHO, it's a good feature for Mesos; the question is how to do it elegantly. :) > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317763#comment-15317763 ] Fan Du commented on MESOS-5545: --- [~vinodkone] Thanks for the comments. Rack topology information does not fall within the scope of the network isolator, because it is not something that can or should be isolated. Here is why rack topology information can be updated: the rack information state can only transition from no rack information to valid rack information. In other words, tasks may be using resources without rack information when agents later report a rack id to the master. The logic could then follow one or all of these design decisions: a) notify the corresponding frameworks of the updated rack id for previously allocated resources, b) tag subsequent allocations with the agents' rack id, c) attach the rack id to resources freed by a framework for the next allocation round. This scenario is simpler and cleaner than attribute updates. Alternatively, only activate agents for resource allocation once they have a valid rack id. Using attributes is one way to export the rack information, but I don't think that is feasible in production: at a scale of +1 servers, setting attributes with rack information from third-party logic and then starting agents?! Automatically exposing the rack information could save a lot of deployment and maintenance effort. Apologies, it seems I don't quite get the meaning of a first-class field. Influencing allocation decisions is not the intention of this ticket; I believe that work is out of scope, which is why I put it in the Future section of the design doc. The allocation strategy DOES honor DRF; the current implementation allocates on a per-agent basis, and we could investigate different allocation modes. In addition, I'd prefer arranging agents on a per-rack basis, because randomly shuffling agents at a scale of +1 nodes in every allocation iteration is no good. IIRC, this number has grown. All in all, IMHO, it's a good feature for Mesos; the question is how to do it elegantly. :) > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
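The one-way transition described in the comment above (no rack information, then valid rack information, and nothing else) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mesos code: the function and exception names are hypothetical, and rejecting a changed rack id is one interpretation of "can only transition from no rack information to valid rack information".

```python
class RackInfoError(ValueError):
    """Raised when an agent reports a rack id that conflicts with the stored one."""


def update_rack_id(current, reported):
    """Apply the one-way transition: unset -> valid rack id.

    Returns the rack id the master would store for the agent. A missing
    report leaves the stored value unchanged.
    """
    if not reported:
        return current  # nothing new reported
    if current is None:
        return reported  # first valid rack id: the only allowed transition
    if current != reported:
        # Assumption: a changed rack id is rejected rather than applied.
        raise RackInfoError("rack id changed from %r to %r" % (current, reported))
    return current


def is_allocatable(rack, require_rack=False):
    """Model the alternative: only offer agents that already have a rack id."""
    return rack is not None or not require_rack


stored = update_rack_id(None, "rack-7")   # agent reports its rack later on
stored = update_rack_id(stored, None)     # a later empty report changes nothing
```

With `require_rack=True`, `is_allocatable` models the stricter option from the comment, where an agent is only activated for allocation once it has a valid rack id.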
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316120#comment-15316120 ] Fan Du commented on MESOS-5545: --- Labels require manual configuration or tools like Ansible or Puppet. This ticket aims to probe the cluster's rack topology automatically. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316113#comment-15316113 ] Fan Du commented on MESOS-5545: --- The design doc to elaborate the story will be published soon for community to review. Please hold on. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5545) Add rack awareness support for Mesos resources
Fan Du created MESOS-5545: - Summary: Add rack awareness support for Mesos resources Key: MESOS-5545 URL: https://issues.apache.org/jira/browse/MESOS-5545 Project: Mesos Issue Type: Story Components: hadoop, master Reporter: Fan Du Resources managed by the Mesos master carry no topology information about the cluster, for example, rack topology, while many data center applications have a rack awareness feature to provide data locality, fault tolerance, and intelligent task placement. This ticket investigates how to add rack awareness to the Mesos resource topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257405#comment-15257405 ] Fan Du commented on MESOS-4492: --- [~bmahler] Could you please help review this ticket? RR: https://reviews.apache.org/r/44255/ Thanks a lot! > Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation > -- > > Key: MESOS-4492 > URL: https://issues.apache.org/jira/browse/MESOS-4492 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Fan Du >Assignee: Fan Du >Priority: Minor > > This ticket aims to enable user or operator to inspect operation statistics > such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only > supports LAUNCH. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255998#comment-15255998 ] Fan Du commented on MESOS-4705: --- [~bmahler] Ping ;) > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245167#comment-15245167 ] Fan Du commented on MESOS-4705: --- [~haosd...@gmail.com] [~bmahler] I have elaborated on the comments; please review again: https://reviews.apache.org/r/44379/ Thanks a lot! > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243029#comment-15243029 ] Fan Du commented on MESOS-4705: --- {quote} Which patch? This one? https://reviews.apache.org/r/44379/ It still does not contain the information related to perf stat formats that haosdent provided earlier in this thread. Can you add that? {quote} [~haosd...@gmail.com] I think I have added the format you mentioned in the first reply in the comments, {{value,unit,event,cgroup}}, and this format also matches what you describe in [MESOS-4655|https://issues.apache.org/jira/browse/MESOS-4655], right? > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5163) LKVM Containerization
[ https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234769#comment-15234769 ] Fan Du commented on MESOS-5163: --- AFAIK, Clear Container carries additional feature enhancements and bug fixes for lkvm that may be absent from the upstream lkvm version. That's why I asked whether this ticket is intended for Clear Container. > LKVM Containerization > - > > Key: MESOS-5163 > URL: https://issues.apache.org/jira/browse/MESOS-5163 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Vaibhav Khanduja > Labels: container, containerizer > > LKVM is lightweight kernel based hypervisors. The hypervisor is eventually > designed to land inside kernel code, it may be good step to consider > supporting as one the container option. LKVM comes with the advantage of been > light weight container along with its own kernel footprint. Having a separate > kernel footprint goes way forward in solving issue of security with > containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234588#comment-15234588 ] Fan Du commented on MESOS-4705: --- [~bmahler] I have updated the RR to use the token count to parse the perf stat output format; please review. By the way, could you also help review https://reviews.apache.org/r/44255/? I emailed [~jieyu] weeks ago, but he may be busy with something else. Thanks a lot! [~haosd...@gmail.com] I have added you as co-author. Thanks for the comments and challenges. > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5163) LKVM Containerization
[ https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234420#comment-15234420 ] Fan Du commented on MESOS-5163: --- [~vaibhav_khanduja] Is this ticket intended for Intel Clear Container, which is based on lkvm? > LKVM Containerization > - > > Key: MESOS-5163 > URL: https://issues.apache.org/jira/browse/MESOS-5163 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Vaibhav Khanduja > Labels: container, containerizer > > LKVM is lightweight kernel based hypervisors. The hypervisor is eventually > designed to land inside kernel code, it may be good step to consider > supporting as one the container option. LKVM comes with the advantage of been > light weight container along with its own kernel footprint. Having a separate > kernel footprint goes way forward in solving issue of security with > containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5163) LKVM Containerization
[ https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234420#comment-15234420 ] Fan Du edited comment on MESOS-5163 at 4/11/16 2:31 AM: [~vaibhav_khanduja] Is this ticket intened for Intel Clear Container, which based on lkvm? was (Author: fan.du): [~vaibhav_khanduja] Does this ticket is intened for Intel Clear Container, which based on lkvm? > LKVM Containerization > - > > Key: MESOS-5163 > URL: https://issues.apache.org/jira/browse/MESOS-5163 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Vaibhav Khanduja > Labels: container, containerizer > > LKVM is lightweight kernel based hypervisors. The hypervisor is eventually > designed to land inside kernel code, it may be good step to consider > supporting as one the container option. LKVM comes with the advantage of been > light weight container along with its own kernel footprint. Having a separate > kernel footprint goes way forward in solving issue of security with > containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227977#comment-15227977 ] Fan Du commented on MESOS-4981: --- [~bmahler] You are correct, I totally missed this. Please review the new RR: https://reviews.apache.org/r/45808/ The Linux kernel uses a Suggested-by: tag to indicate that an idea came from someone else; I didn't find an equivalent in Mesos, so I noted it in the commit message instead. Thanks for reviewing. > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5129) Supporting Container Images in Mesos Containerizer doesn't work
[ https://issues.apache.org/jira/browse/MESOS-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227792#comment-15227792 ] Fan Du commented on MESOS-5129: --- Then you need to install hadoop on your agent first. > Supporting Container Images in Mesos Containerizer doesn't work > --- > > Key: MESOS-5129 > URL: https://issues.apache.org/jira/browse/MESOS-5129 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.29.0 >Reporter: wangqun > > Hi > I try to test the feature of Supporting Container Images in Mesos > Containerizer according to > https://github.com/apache/mesos/blob/master/docs/container-image.md#test-it-out. > But it doesn't work. >I use the mesos 0.29 version. > The following is my step: > 1) sudo bin/mesos-master.sh --log_dir=/var/log/mesos --ip=9.5.124.139 > --work_dir=/tmp/mesos/master > 2) sudo bin/mesos-slave.sh --master=9.5.124.139:5050 --ip=9.5.124.139 > --hostname=mesos --isolation=docker/runtime,filesystem/linux > --work_dir=/tmp/mesos/slave --log_dir=/var/log/mesos --image_providers=docker > --executor_environment_variables="{}" > 3)sudo src/mesos-execute --master=9.5.124.139:5050 --name=test > --docker_image=library/redis --shell=false > WARNING: Logging before InitGoogleLogging() is written to STDERR > W0406 03:33:05.730432 5886 scheduler.cpp:157] > ** > Scheduler driver bound to loopback interface! Cannot communicate with remote > master(s). You might want to set 'LIBPROCESS_IP' environment variable to use > a routable IP address. > ** > I0406 03:33:05.730623 5886 scheduler.cpp:172] Version: 0.29.0 > Subscribed with ID '79b6ed58-46a9-4760-a589-a28061f4f1e9- > task test submitted to agent 7184bc3a-243c-4ca7-8897-c98e81836ed6-S1 > Received status update TASK_RUNNING for task test > 4) sudo vim lt-mesos-slave.mesos.root.log.ERROR > Command 'hadoop version 2>&1' failed; this is the output: > sh: 1: hadoop: not found -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5129) Supporting Container Images in Mesos Containerizer doesn't work
[ https://issues.apache.org/jira/browse/MESOS-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227778#comment-15227778 ] Fan Du commented on MESOS-5129: --- The log speaks for itself: this has nothing to do with Mesos; your Hadoop environment is probably not set up correctly. {code} Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found {code} Refer: https://mail-archives.apache.org/mod_mbox/mesos-user/201511.mbox/%3c563acaf7.1030...@intel.com%3E > Supporting Container Images in Mesos Containerizer doesn't work > --- > > Key: MESOS-5129 > URL: https://issues.apache.org/jira/browse/MESOS-5129 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.29.0 >Reporter: wangqun > > Hi > I try to test the feature of Supporting Container Images in Mesos > Containerizer according to > https://github.com/apache/mesos/blob/master/docs/container-image.md#test-it-out. > But it doesn't work. >I use the mesos 0.29 version. > The following is my step: > 1) sudo bin/mesos-master.sh --log_dir=/var/log/mesos --ip=9.5.124.139 > --work_dir=/tmp/mesos/master > 2) sudo bin/mesos-slave.sh --master=9.5.124.139:5050 --ip=9.5.124.139 > --hostname=mesos --isolation=docker/runtime,filesystem/linux > --work_dir=/tmp/mesos/slave --log_dir=/var/log/mesos --image_providers=docker > --executor_environment_variables="{}" > 3)sudo src/mesos-execute --master=9.5.124.139:5050 --name=test > --docker_image=library/redis --shell=false > WARNING: Logging before InitGoogleLogging() is written to STDERR > W0406 03:33:05.730432 5886 scheduler.cpp:157] > ** > Scheduler driver bound to loopback interface! Cannot communicate with remote > master(s). You might want to set 'LIBPROCESS_IP' environment variable to use > a routable IP address. 
> ** > I0406 03:33:05.730623 5886 scheduler.cpp:172] Version: 0.29.0 > Subscribed with ID '79b6ed58-46a9-4760-a589-a28061f4f1e9- > task test submitted to agent 7184bc3a-243c-4ca7-8897-c98e81836ed6-S1 > Received status update TASK_RUNNING for task test > 4) sudo vim lt-mesos-slave.mesos.root.log.ERROR > Command 'hadoop version 2>&1' failed; this is the output: > sh: 1: hadoop: not found -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221062#comment-15221062 ] Fan Du commented on MESOS-4981: --- [~bmahler] & [~vinodkone] How about not distinguishing {{messages_register_framework}} from {{messages_reregister_framework}} so strictly? The flow of {{subscribe}} would become:
{code}
1. Bump messages_register_framework
2. Various sanity checks
3. Newborn framework?
  3a. Add the new framework
  3b. Return
4. Bump messages_reregister_framework
5. Otherwise the framework is re-registering
  5a. Update the framework
  5b. Return
{code}
> Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221056#comment-15221056 ] Fan Du commented on MESOS-4492: --- [~jieyu] I'm wondering if you have any cycles for the final review? thanks! > Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation > -- > > Key: MESOS-4492 > URL: https://issues.apache.org/jira/browse/MESOS-4492 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Fan Du >Assignee: Fan Du >Priority: Minor > > This ticket aims to enable user or operator to inspect operation statistics > such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only > supports LAUNCH. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209672#comment-15209672 ] Fan Du commented on MESOS-4981: --- Hmm, here is the scenario; let me explain :) When a framework calls SUBSCRIBE, it may be registering a newborn framework, or it may be updating (re-registering) an existing one. For {{subscribe}} the flow is:
{code}
1. Bump messages_register_framework
2. Various sanity checks
3. Newborn framework?
  3a. Add the new framework
  3b. Return
4. Roll back messages_register_framework, and bump messages_reregister_framework
5. Otherwise the framework is re-registering
  5a. Update the framework
  5b. Return
{code}
That's why I asked the two questions above: q1. Do the metrics have to count failure cases such as failed sanity checks? If not, we can simply bump the metrics once we are sure the operation is good/clean, in 3a and 5a. But judging by how other metrics are counted, metrics include failure cases such as sanity checks. q2. Is it OK to leave messages_register_framework bumped even when the operation turns out to be one that should bump messages_reregister_framework? In other words, can we skip rolling back messages_register_framework? > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
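The {{subscribe}} flow described in the comment above can be sketched as follows. This is only an illustrative model, not the Mesos master code: the `Metrics` and `Master` classes, the `rollback` parameter, the string return values, and the "framework must have a name" sanity check are all assumptions made for the example. Plain integers stand in for the counters, so decrementing is trivially possible here, which may not hold for the real metrics library.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Metrics:
    # Hypothetical stand-ins for the two master counters named in the ticket.
    messages_register_framework: int = 0
    messages_reregister_framework: int = 0


@dataclass
class Master:
    metrics: Metrics = field(default_factory=Metrics)
    frameworks: Dict[str, dict] = field(default_factory=dict)

    def subscribe(self, framework_id: str, info: dict, rollback: bool = True) -> str:
        # 1. Bump the register counter before anything else.
        self.metrics.messages_register_framework += 1

        # 2. Sanity check (placeholder rule: a framework must carry a name).
        if not info.get("name"):
            return "rejected"

        # 3. Newborn framework: add it and return.
        if framework_id not in self.frameworks:
            self.frameworks[framework_id] = info
            return "registered"

        # 4. Known framework: optionally undo step 1, then count the re-registration.
        if rollback:
            self.metrics.messages_register_framework -= 1
        self.metrics.messages_reregister_framework += 1

        # 5. Update the existing framework and return.
        self.frameworks[framework_id] = info
        return "reregistered"
```

In this sketch a rejected call still leaves the register counter bumped, which is the situation q1 asks about, and passing `rollback=False` models the behavior q2 asks about, where the register counter is not rolled back on re-registration.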
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209584#comment-15209584 ] Fan Du commented on MESOS-4981: --- [~bmahler] May I have your comments here? then I can move forward on this ticket. > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209572#comment-15209572 ] Fan Du commented on MESOS-4492: --- Done! Thanks to [~greggomann] and [~jieyu] for taking the time to review. > Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation > -- > > Key: MESOS-4492 > URL: https://issues.apache.org/jira/browse/MESOS-4492 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Fan Du >Assignee: Fan Du >Priority: Minor > > This ticket aims to enable user or operator to inspect operation statistics > such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only > supports LAUNCH. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205960#comment-15205960 ] Fan Du commented on MESOS-4981: --- [~bbannier] Thanks for the quick review! :) [~bmahler] Actually, I have two questions first: 1. Do we need to bump the metrics for failed operations, e.g. failed parameter sanity checks or authentication/authorization? 2. In this ticket, we handle {{registerFramework}} and {{reregisterFramework}} together in {{[subscribe|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/master/master.cpp;h=e6290ea686ccf17813d6faeaf2f2012f79cf3b7f;hb=HEAD#l2256]}}; do we need to strictly differentiate the metrics of {{registerFramework}} and {{reregisterFramework}}? If the answer to both questions is "yes", then IMO we DO need Counter to support decrementing in this case, to accommodate the implementation :) I didn't know [~wangcong] had submitted [r44473 | https://reviews.apache.org/r/44473/]; I think it would benefit at least my case here. Here is my understanding of Counter and Gauge, though the Linux kernel doesn't differentiate them: going by their names and meanings, use Counter for events or messages, and use Gauge to take a snapshot of resources. Swapping them loses those semantics. > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205908#comment-15205908 ] Fan Du commented on MESOS-4492: --- [~greggomann] Any further comments about the review? :) > Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation > -- > > Key: MESOS-4492 > URL: https://issues.apache.org/jira/browse/MESOS-4492 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Fan Du >Assignee: Fan Du >Priority: Minor > > This ticket aims to enable user or operator to inspect operation statistics > such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only > supports LAUNCH. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205905#comment-15205905 ] Fan Du commented on MESOS-4981: --- [~anandmazumdar] Thanks, I have added [~vinodkone] as reviewer. > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. We should correctly be incrementing these counters for PID based > frameworks as was the case previously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887 ] Fan Du edited comment on MESOS-4981 at 3/21/16 8:34 AM: [~anandmazumdar] I happened to look a deep look at this, here is the fix works on my env. Please review: https://reviews.apache.org/r/45096 https://reviews.apache.org/r/45097 was (Author: fan.du): [~anandmazumdar] I happened to look a deep look at this, here is the fix works on my env. Please review: https://reviews.apache.org/r/45094/ > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. Either, we should think about adding new counter(s) for > {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the > existing code to correctly increment the counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887 ] Fan Du edited comment on MESOS-4981 at 3/21/16 8:19 AM: [~anandmazumdar] I happened to look a deep look at this, here is the fix works on my env. Please review: https://reviews.apache.org/r/45094/ was (Author: fan.du): [~anandmazumdar] I happened to look a deep look at this, here is fix works on my env. Please review: https://reviews.apache.org/r/45094/ > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. Either, we should think about adding new counter(s) for > {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the > existing code to correctly increment the counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887 ] Fan Du commented on MESOS-4981: --- [~anandmazumdar] I happened to look a deep look at this, here is fix works on my env. Please review: https://reviews.apache.org/r/45094/ > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. Either, we should think about adding new counter(s) for > {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the > existing code to correctly increment the counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Du reassigned MESOS-4981: - Assignee: Fan Du > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar >Assignee: Fan Du > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. Either, we should think about adding new counter(s) for > {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the > existing code to correctly increment the counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4955) Generize perf event parsing to match PerfStatistics filed name for "perf stat"
[ https://issues.apache.org/jira/browse/MESOS-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199148#comment-15199148 ] Fan Du commented on MESOS-4955: --- Really sweet, this is exactly what I need. Thanks for the pointer. > Generize perf event parsing to match PerfStatistics filed name for "perf stat" > -- > > Key: MESOS-4955 > URL: https://issues.apache.org/jira/browse/MESOS-4955 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Fan Du >Assignee: Fan Du > > Current > [design|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/mesos.proto;h=deb9c0910a27afd67276f54b3f666a878212727b;hb=HEAD#l981] > does not support event like: > {{SUBSYS/EVENT <- Most notable intel_cqm/llc_occupancy/}} > {{SUSSYS:EVENT <- All Tracepoint event}} > This gap could be fulfilled with a bit by matching EVENT with PerfStatistics > Proto Message name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4955) Generize perf event parsing to match PerfStatistics filed name for "perf stat"
[ https://issues.apache.org/jira/browse/MESOS-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196627#comment-15196627 ] Fan Du commented on MESOS-4955: --- I have posted an RFC review request to evaluate whether this ticket is worth pursuing further: https://reviews.apache.org/r/44881/ By the way, I currently use {{intel_cqm/llc_occupancy/}} and {{sched:intel_cqm/llc_occupancy/}} as examples only; other events could easily be added later on. > Generize perf event parsing to match PerfStatistics filed name for "perf stat" > -- > > Key: MESOS-4955 > URL: https://issues.apache.org/jira/browse/MESOS-4955 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Fan Du >Assignee: Fan Du > > Current > [design|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/mesos.proto;h=deb9c0910a27afd67276f54b3f666a878212727b;hb=HEAD#l981] > does not support event like: > {{SUBSYS/EVENT <- Most notable intel_cqm/llc_occupancy/}} > {{SUSSYS:EVENT <- All Tracepoint event}} > This gap could be fulfilled with a bit by matching EVENT with PerfStatistics > Proto Message name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192782#comment-15192782 ] Fan Du commented on MESOS-4705: --- Hi Benjamin, could you please review the updated RR? Thanks for your time! https://reviews.apache.org/r/44379/ > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4753) Add executor state when reporting resource usage
[ https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Du updated MESOS-4753: -- Component/s: (was: slave) oversubscription > Add executor state when reporting resource usage > > > Key: MESOS-4753 > URL: https://issues.apache.org/jira/browse/MESOS-4753 > Project: Mesos > Issue Type: Improvement > Components: oversubscription, statistics >Reporter: Fan Du >Assignee: Fan Du >Priority: Minor > > Slave reports resource usage of each executor for resource estimator to feed > master with revocable resource, it's better to append executor state as well > when reporting usage, which in turn resource estimator would easily focus on > the *RUNNING* executor only. > it's possible to call {{Slave:: getExecutor}} in estimator, but it's possible > not sync up with the resource usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186741#comment-15186741 ] Fan Du commented on MESOS-4705: --- I have another thought. Looking at the perf stat format across different kernel versions, it is one of these:
1. value,event,cgroup
2. value,unit,event,cgroup
3. value,unit,event,cgroup,running,ratio
For old kernel versions maintained by OS vendors, the perf stat output elements don't change order anyway; new elements are only appended at the end. So why not drop the meaningless kernel version check and just take the needed elements:
{code}
if tokens.size == 3
  return tokens[0], tokens[1], tokens[2]
if tokens.size == 4 or tokens.size == 6
  return tokens[0], tokens[2], tokens[3]
{code}
[~bmahler] and [~haosdent], any comments? > Slave failed to sample container with perf event > > > Key: MESOS-4705 > URL: https://issues.apache.org/jira/browse/MESOS-4705 > Project: Mesos > Issue Type: Bug > Components: cgroups, isolation >Affects Versions: 0.27.1 >Reporter: Fan Du >Assignee: Fan Du > > When sampling container with perf event on Centos7 with kernel > 3.10.0-123.el7.x86_64, slave complained with below error spew: > {code} > E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: > Failed to parse perf sample: Failed to parse perf sample line > '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': > Unexpected number of fields > {code} > it's caused by the current perf format [assumption | > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] > with kernel version below 3.12 > On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below: > value,unit,event,cgroup,running,ratio > A local modification fixed this error on my test bed, please review this > ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
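The token-count dispatch proposed in the comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the parser in Mesos' perf.cpp: the names `PerfSample` and `parse_perf_stat_line` are hypothetical, values are kept as strings, and the sketch assumes that no field (in particular the cgroup name) contains a comma.

```python
from typing import NamedTuple


class PerfSample(NamedTuple):
    # Hypothetical container for the three fields the comment says are needed.
    value: str
    event: str
    cgroup: str


def parse_perf_stat_line(line: str) -> PerfSample:
    """Pick value, event and cgroup based only on the number of CSV tokens."""
    tokens = line.strip().split(",")

    if len(tokens) == 3:
        # Format 1: value,event,cgroup
        value, event, cgroup = tokens
    elif len(tokens) in (4, 6):
        # Format 2: value,unit,event,cgroup
        # Format 3: value,unit,event,cgroup,running,ratio
        value, _unit, event, cgroup = tokens[:4]
    else:
        raise ValueError(f"Unexpected number of fields: {len(tokens)}")

    return PerfSample(value, event, cgroup)
```

Fed the sample line from the ticket description, which has six tokens and an empty unit field, the sketch yields the value `25871993253`, the event `cycles`, and the cgroup `mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8` without consulting the kernel version at all.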
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182695#comment-15182695 ]

Fan Du commented on MESOS-4492:

[~greggomann] I see this ticket has not been accepted by a committer so far. Could you please help with that, so that I can then update the JIRA workflow? One more question: now that you have given a "ship it", what do I need to do before [~jieyu] merges the patch? Thanks a lot for your review :)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179476#comment-15179476 ]

Fan Du commented on MESOS-4705:

Hi [~bmahler], this is a follow-up bug fix for [MESOS-2834|https://issues.apache.org/jira/browse/MESOS-2834]. I am wondering if you could shepherd this issue, using the easy fix I posted above? :)
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179431#comment-15179431 ]

Fan Du commented on MESOS-4705:

Here is the RR to fix this: https://reviews.apache.org/r/44379/
I'm looking for a shepherd to review it.
[jira] [Created] (MESOS-4846) Add Memory Bandwidth Monitoring (MBM) perf support
Fan Du created MESOS-4846:

Summary: Add Memory Bandwidth Monitoring (MBM) perf support
Key: MESOS-4846
URL: https://issues.apache.org/jira/browse/MESOS-4846
Project: Mesos
Issue Type: Improvement
Components: oversubscription, statistics
Reporter: Fan Du
Assignee: Fan Du

This ticket tracks support for Intel Memory Bandwidth Monitoring (MBM) in the current PerfStatistics. The per-task memory bandwidth usage will be analyzed by the QoS controller to make better correction decisions.
[jira] [Comment Edited] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168583#comment-15168583 ]

Fan Du edited comment on MESOS-4492 at 3/2/16 5:16 AM:

Here goes the RR (discarded): https://reviews.apache.org/r/44058/
Updated RR with a documentation fix and added test code: https://reviews.apache.org/r/44255/

was (Author: fan.du):
Here goes the RR: https://reviews.apache.org/r/44058/
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168587#comment-15168587 ]

Fan Du commented on MESOS-4492:

Thanks for the kind notice :)
[jira] [Updated] (MESOS-4753) Add executor state when reporting resource usage
[ https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fan Du updated MESOS-4753:

Description:
The slave reports the resource usage of each executor so that the resource estimator can feed the master with revocable resources. It would be better to include the executor state in the usage report as well, so that the resource estimator can easily focus on *RUNNING* executors only.
It's possible to call {{Slave::getExecutor}} in the estimator, but the result may not be in sync with the resource usage.

was:
The slave reports the resource usage of each executor so that the resource estimator can feed the master with revocable resources. It would be better to include the executor state in the usage report as well, so that the resource estimator can easily focus on *RUNNING* executors only.
It's possible to call {code} Slave::getExecutor {code} in the estimator, but the result may not be in sync with the resource usage.
[jira] [Commented] (MESOS-4753) Add executor state when reporting resource usage
[ https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160334#comment-15160334 ]

Fan Du commented on MESOS-4753:

[~nnielsen] IMHO, the resource estimator and QoS controller in Serenity need to count the resource usage of RUNNING executors only. I'm thinking about making this change and then enhancing the Serenity age filter. May I get some comments from you? :)
[jira] [Comment Edited] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160322#comment-15160322 ]

Fan Du edited comment on MESOS-4492 at 2/24/16 7:44 AM:

[~jieyu] After reviewing the [code | https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/master/master.cpp;h=8d6d3c6468c6b85fe09c33cf9747cc3d1f515ab9;hb=HEAD#l3027] here, I would like to fill the gap. I'm wondering if you could review the ticket? Thanks.

was (Author: fan.du):
[~jieyu] After reviewing the code here, I would like to fill the gap. I'm wondering if you could review the ticket? Thanks.
[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160322#comment-15160322 ]

Fan Du commented on MESOS-4492:

[~jieyu] After reviewing the code here, I would like to fill the gap. I'm wondering if you could review the ticket? Thanks.
[jira] [Created] (MESOS-4753) Add executor state when reporting resource usage
Fan Du created MESOS-4753:

Summary: Add executor state when reporting resource usage
Key: MESOS-4753
URL: https://issues.apache.org/jira/browse/MESOS-4753
Project: Mesos
Issue Type: Improvement
Components: slave, statistics
Reporter: Fan Du
Assignee: Fan Du
Priority: Minor

The slave reports the resource usage of each executor so that the resource estimator can feed the master with revocable resources. It would be better to include the executor state in the usage report as well, so that the resource estimator can easily focus on *RUNNING* executors only.
It's possible to call {code} Slave::getExecutor {code} in the estimator, but the result may not be in sync with the resource usage.
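To illustrate why carrying the executor state alongside the usage would help an estimator, here is a minimal sketch. All types are hypothetical stand-ins (they are not the Mesos `ResourceUsage` protobuf or the slave's internal executor type), and the `state` field is the assumed addition this ticket asks for.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical executor states, for illustration only.
enum class ExecutorState { REGISTERING, RUNNING, TERMINATING, TERMINATED };

// Hypothetical per-executor usage record; `state` is the proposed extra field.
struct ExecutorUsage {
  std::string executorId;
  ExecutorState state;
  double cpusUsed;
};

// Sum CPU usage over RUNNING executors only, as an estimator might do
// once the state is reported together with the usage sample.
double runningCpus(const std::vector<ExecutorUsage>& usages) {
  double total = 0.0;
  for (const ExecutorUsage& usage : usages) {
    if (usage.state == ExecutorState::RUNNING) {
      total += usage.cpusUsed;
    }
  }
  return total;
}
```

Because the state travels in the same record as the usage numbers, the filter cannot observe a state that is newer or older than the sample it is applied to, which is the synchronisation concern raised in the description.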
[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153540#comment-15153540 ]

Fan Du commented on MESOS-4705:

Many local cloud service providers in China still use the 2.6.32 kernel, a version we already support. In any case, it's easy to catch any exception in the last step.
[jira] [Created] (MESOS-4705) Slave failed to sample container with perf event
Fan Du created MESOS-4705:

Summary: Slave failed to sample container with perf event
Key: MESOS-4705
URL: https://issues.apache.org/jira/browse/MESOS-4705
Project: Mesos
Issue Type: Bug
Components: cgroups, isolation
Affects Versions: 0.27.1
Reporter: Fan Du
Assignee: Fan Du

When sampling a container with perf events on CentOS 7 with kernel 3.10.0-123.el7.x86_64, the slave complained with the error below:
{code}
E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': Unexpected number of fields
{code}
This is caused by the current perf format [assumption | https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] for kernel versions below 3.12.
On the 3.10.0-123.el7.x86_64 kernel, the output has 6 tokens:
value,unit,event,cgroup,running,ratio
A local modification fixed this error on my test bed. Please review this ticket.
[jira] [Created] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} offer operation
Fan Du created MESOS-4492:

Summary: Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} offer operation
Key: MESOS-4492
URL: https://issues.apache.org/jira/browse/MESOS-4492
Project: Mesos
Issue Type: Improvement
Components: master
Reporter: Fan Du
Assignee: Fan Du
Priority: Minor

This ticket aims to enable users or operators to inspect statistics for operations such as RESERVE, UNRESERVE, CREATE and DESTROY; the current implementation only supports LAUNCH.
[jira] [Updated] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
[ https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fan Du updated MESOS-4492:

Summary: Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation (was: Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} offer operation)
[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role
[ https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108147#comment-15108147 ]

Fan Du commented on MESOS-4389:

Based on a review of the code, this is by design, and it doesn't matter much in practice. Just a random puzzle :)
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108142#comment-15108142 ]

Fan Du commented on MESOS-4339:

[~adam-mesos] and [~bbannier], based on the proposal document from MESOS-4284, it's well justified to enable a weighted DRF framework sorter in a multi-role scenario, to keep allocation decisions fair across roles and frameworks. The work to support a weighted DRF framework sorter is independent of multi-role frameworks in its design logic (which is what I thought before, incompletely), but in implementation it apparently needs to be done *AFTER* multi-role frameworks. So, if you don't mind, I would still like to contribute this ticket as part of the multi-role frameworks work.
[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role
[ https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101514#comment-15101514 ]

Fan Du commented on MESOS-4389:

Thanks for the note about implicit roles; I will give it a try. The two slaves are configured with default roles (busybox and ubuntu, respectively), and the master has not set any {{roles}} on the command line. I realize that when {{roles}} is set, those roles go on the whitelist, which means they will show up when querying the roles endpoint.
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101267#comment-15101267 ]

Fan Du commented on MESOS-4339:

Thanks for your kind reminder, I got it :)
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101251#comment-15101251 ]

Fan Du commented on MESOS-4339:

You understood my intention clearly, and thanks for the in-depth comments on the background. As for why to do this: the user scenario of letting frameworks prioritize each other within a role should be supported, following the same rationale as weighted roles. The veto is based on the deployment assumption that a role can have exactly one framework attached to it; I'm not sure how this is going to change after MESOS-4284. Anyway, please add more comments.
Here are my early thoughts on what a weighted framework sorter should support/respect:
* Respect framework re-registration for weight updates
* Provide an operator endpoint for dynamic reweighting (I didn't mention this in the ticket's description, though it was already on my mind)
* In the presence of multi-role frameworks, a per-role weight makes more sense
[MESOS-4284|https://issues.apache.org/jira/browse/MESOS-4284] had a design proposal published yesterday, which I need to dive into first to understand possible concerns from [~bbannier].
[jira] [Comment Edited] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101267#comment-15101267 ]

Fan Du edited comment on MESOS-4339 at 1/15/16 6:12 AM:

Thanks for your kind reminder, I got it :) It seems I can't switch it back to OPEN...

was (Author: fan.du):
Thanks for your kind reminder, I got it :)
[jira] [Created] (MESOS-4389) Master "roles" endpoint only shows active role
Fan Du created MESOS-4389:

Summary: Master "roles" endpoint only shows active role
Key: MESOS-4389
URL: https://issues.apache.org/jira/browse/MESOS-4389
Project: Mesos
Issue Type: Improvement
Components: HTTP API, master
Reporter: Fan Du

Register two slaves with the master with roles "busybox" and "ubuntu" respectively, then run Marathon with role "busybox". After this, checking the master "roles" endpoint only returns the default role and the active role. Could this be improved to show all available roles, for easier checking?
{code}
{
  "roles": [
    {
      "frameworks": [],
      "name": "*",
      "resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "weight": 1.0
    },
    {
      "frameworks": [
        "2caebb14-161f-4941-b8ab-8990cef01ac0-"
      ],
      "name": "busybox",
      "resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "weight": 1.0
    }
  ]
}
{code}
[jira] [Created] (MESOS-4339) Add weight support for framework sorter
Fan Du created MESOS-4339:

Summary: Add weight support for framework sorter
Key: MESOS-4339
URL: https://issues.apache.org/jira/browse/MESOS-4339
Project: Mesos
Issue Type: Improvement
Components: allocation
Reporter: Fan Du

The current framework sorter doesn't take weights into account when sorting frameworks belonging to a particular role, i.e., all frameworks have an equal weight of 1. Considering that the role weight is controlled by the operator, enabling framework weights does not let greedy frameworks affect role-level allocation decisions, but it will benefit frameworks that should get more resources within a specific role.
The framework weight will come from the FrameworkInfo message when the framework registers, and the framework sorters will "add" the framework with its weight. This will eventually result in a weighted framework sorting flow when the master makes the final allocation decision.
Please review this ticket, which I will work on if it's considered acceptable. Thanks a lot.
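As a rough illustration of what a weighted framework sort could look like, the sketch below orders frameworks by dominant share divided by weight, which is one common way to realise weighted DRF. It is an assumption-laden toy: `FrameworkEntry` and `sortByWeightedShare` are hypothetical names, the weight is assumed to be positive, and nothing here is taken from the Mesos sorter implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-framework record within a single role.
struct FrameworkEntry {
  std::string id;
  double dominantShare;  // share of the dominant resource currently held
  double weight;         // assumed per-framework weight, must be > 0
};

// Return framework ids ordered by weighted share (share / weight),
// smallest first, so under-served or heavily weighted frameworks are
// offered resources earlier.
std::vector<std::string> sortByWeightedShare(
    std::vector<FrameworkEntry> entries) {
  std::stable_sort(
      entries.begin(), entries.end(),
      [](const FrameworkEntry& a, const FrameworkEntry& b) {
        return a.dominantShare / a.weight < b.dominantShare / b.weight;
      });

  std::vector<std::string> order;
  for (const FrameworkEntry& entry : entries) {
    order.push_back(entry.id);
  }
  return order;
}
```

With all weights equal to 1 this reduces to plain DRF ordering; giving a framework weight 2 halves its effective share, so it is sorted ahead of a peer that holds a smaller raw share but has weight 1.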
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093592#comment-15093592 ]

Fan Du commented on MESOS-4339:

The role sorter is weighted DRF; the framework sorter is DRF without weights. When a new framework is added with a role, both the role sorter and the framework sorter come into play (I am not sure whether Mesos community courtesy allows pasting code snippets):
{code}
void HierarchicalAllocatorProcess::addFramework(
    const FrameworkID& frameworkId,
    const FrameworkInfo& frameworkInfo,
    const hashmap<SlaveID, Resources>& used)
{
  CHECK(initialized);

  const string& role = frameworkInfo.role();

  // If this is the first framework to register as this role,
  // initialize state as necessary.
  if (!activeRoles.contains(role)) {
    activeRoles[role] = 1;
    roleSorter->add(role, roleWeight(role));
    frameworkSorters[role] = frameworkSorterFactory();
  } else {
    activeRoles[role]++;
  }

  CHECK(!frameworkSorters[role]->contains(frameworkId.value()));
  frameworkSorters[role]->add(frameworkId.value());
{code}
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093812#comment-15093812 ]

Fan Du commented on MESOS-4339:
---

bq. since all the weights inside a role are identical, right?

For the current implementation, yes. Frameworks will behave just like weighted roles if we pass a weight when adding a new framework.

My understanding of the current allocation behavior is a triple iteration, as follows:
[HierarchicalAllocatorProcess::allocate|https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1254]
* Foreach Slave in the Slaves Vector
** Foreach Role sorted by the role sorter with role weights
*** Foreach Framework sorted by the framework sorter with identical weights within the same role

The intention of this ticket is to enable Frameworks to be sorted by weight, i.e. the last iteration. I think this is where our views differ. Please correct me if I missed something.

bq. Also, currently frameworks can only have a single role.

Yes, but only temporarily. That will be changed by [MESOS-1763|https://issues.apache.org/jira/browse/MESOS-1763].
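Editor's note: the triple iteration described in the comment above can be sketched roughly as follows. `RoleEntry` and `visitOrder` are hypothetical names for illustration; the sketch assumes the roles and the frameworks within each role have already been ordered by their respective sorters, and it only records the visiting order rather than allocating anything.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the Mesos allocator's types.
struct RoleEntry {
  std::string name;
  // Frameworks of this role, assumed already ordered by the framework sorter.
  std::vector<std::string> frameworks;
};

// Returns the order in which (agent, role, framework) triples are visited.
// `roles` is assumed to be already ordered by the role sorter.
std::vector<std::string> visitOrder(
    const std::vector<std::string>& agents,
    const std::vector<RoleEntry>& roles)
{
  std::vector<std::string> order;
  for (const std::string& agent : agents) {                   // Foreach Slave
    for (const RoleEntry& role : roles) {                     // Foreach Role
      for (const std::string& framework : role.frameworks) {  // Foreach Framework
        order.push_back(agent + "/" + role.name + "/" + framework);
      }
    }
  }
  return order;
}
```

In this picture, adding framework weights changes only the order of `frameworks` inside each `RoleEntry`; the two outer loops are unaffected.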
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095460#comment-15095460 ]

Fan Du commented on MESOS-4339:
---

Of course, weighted sorting has been supported ever since the DRF sorter was created, but the framework sorter instance *NEVER* uses it. In addition, this ticket involves a minimal, clean change to the current design, whereas the modification in MESOS-4284 is quite invasive. I don't see any obvious reason why this ticket should be postponed until MESOS-4284; the two are unrelated in high-level design and functionality. Please elaborate on the reasoning behind your point of view.
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093477#comment-15093477 ]

Fan Du commented on MESOS-3765:
---

Sure, will do.

> Make offer size adjustable (granularity)
> ---
>
>                 Key: MESOS-3765
>                 URL: https://issues.apache.org/jira/browse/MESOS-3765
>             Project: Mesos
>          Issue Type: Improvement
>          Components: allocation
>            Reporter: Alexander Rukletsov
>            Assignee: Guangya Liu
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it
> always allocates the entire remaining agent resources to a single framework.
> This may heavily impact allocation fairness in some cases, for example in
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators to explicitly specify
> granularity via allocator flags. While this can be tricky for non-standard
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093455#comment-15093455 ]

Fan Du commented on MESOS-3765:
---

[~gyliu] The proposal document states "DRF will be disabled with Fine-Grained Resource Offers." I am wondering why fine-grained offers should bypass WDRF in practice. As I understand it, implementing fine-grained offers fits well inside the current WDRF logic, because of the current allocation behavior:

Foreach Slave
  Foreach Role
    Foreach Framework within the role
      compute agent resources for the revocable case OR
      compute agent resources for the non-revocable case   <- (*A)
      offer the agent resources to the current framework   <- (*B)

If there are no revocable frameworks, each slave grants at most one allocation offer, to the first framework within a role. Otherwise, each slave grants at most two allocation offers: one to a non-revocable framework and one to a revocable framework.
If we apply granularity between (*A) and (*B), the loops would go on to iterate over the remaining frameworks, and the goal of spreading agent resources across frameworks would be achieved.
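Editor's note: the idea of applying a granularity between (*A) and (*B) can be sketched as below for a single scalar resource such as cpus. `ChunkedOffers` and `offerInChunks` are hypothetical names, not Mesos APIs, and the sketch makes a single pass over frameworks that are assumed to be already sorted; it ignores revocable resources, roles, and re-sorting after each offer.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type for illustration only.
struct ChunkedOffers {
  std::map<std::string, double> offered;  // cpus offered per framework
  double leftover;                        // cpus not offered in this pass
};

// Offer at most `granularity` cpus of one agent to each framework in turn,
// instead of handing the whole agent to the first framework.
ChunkedOffers offerInChunks(
    double available,
    double granularity,
    const std::vector<std::string>& sortedFrameworks)
{
  ChunkedOffers result{{}, available};
  if (granularity <= 0.0) {
    return result;  // Nothing sensible to offer with a non-positive chunk.
  }
  for (const std::string& framework : sortedFrameworks) {
    if (result.leftover <= 0.0) {
      break;  // The agent is exhausted; later frameworks get nothing.
    }
    const double chunk = std::min(granularity, result.leftover);
    result.offered[framework] = chunk;
    result.leftover -= chunk;
  }
  return result;
}
```

Setting `granularity` to the full agent size reproduces the coarse-grained behavior, where the first framework receives everything; a smaller value lets the loop reach the remaining frameworks.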