[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-07-31 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401435#comment-15401435
 ] 

Fan Du commented on MESOS-5545:
---

It makes no sense to change the label for the time being; implementation 
details will be discussed with the shepherd eventually.

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Epic
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no information about the 
> cluster topology, for example the rack topology, whereas many data center 
> applications have a rack awareness feature to provide data locality, fault 
> tolerance and intelligent task placement. This ticket investigates how to add 
> rack awareness to the Mesos resource topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5545) Add rack awareness support for Mesos resources

2016-07-31 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-5545:
--
Issue Type: Story  (was: Epic)






[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

2016-07-31 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401422#comment-15401422
 ] 

Fan Du edited comment on MESOS-5545 at 8/1/16 1:20 AM:
---

[~haosd...@gmail.com]
What's the intention of changing the label from *Story* to *Epic*?










[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321408#comment-15321408
 ] 

Fan Du commented on MESOS-5545:
---

[~brugidou] Thanks for sharing; I will look into it!






[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320245#comment-15320245
 ] 

Fan Du commented on MESOS-5545:
---

[~jvanremoortere] Thanks for your constructive advice and suggestions!

Yes, this will be a long road, but it's fun to experiment with the idea. :)
How about we sync up at the next community meeting on 6/16?

At heart, it's not the attributes that I dislike, but the lack of a way to set 
them automatically, which leaves a boring maintenance burden.
I will update my design doc to enhance the current attributes with these goals:
a. Automatically probe the rack topology, with modular plugins for popular 
networks, e.g. Ethernet, InfiniBand, etc.
b. Use the rack topology information to arrange agents on a per-rack basis.
c. Design a common, friendly attribute scheme for frameworks to interpret.
d. Enforce security with ACLs.

BTW, may I ask whether you can shepherd this ticket? Then we can work shoulder 
to shoulder.
Thanks!
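Goal (c) could look roughly like the sketch below, assuming the probed rack ID is published through the agent's existing `--attributes` flag, whose value is a list of `key:value` pairs separated by `;`. The attribute names `rack` and `switch`, the helper functions and the probe result are all hypothetical; the point is only that a fixed, machine-generated scheme is easy for frameworks to parse back.

```python
from typing import Dict


def format_attributes(attrs: Dict[str, str]) -> str:
    """Render attributes as an agent --attributes string, e.g. "rack:r1;zone:west"."""
    for key, value in attrs.items():
        # ':' and ';' are separators in this format, so reject them inside keys/values.
        if any(sep in key + value for sep in (":", ";")):
            raise ValueError(f"separator in attribute {key!r}={value!r}")
    # Sort keys so every agent emits the same layout for the same data.
    return ";".join(f"{key}:{value}" for key, value in sorted(attrs.items()))


def parse_attributes(text: str) -> Dict[str, str]:
    """Inverse of format_attributes: what a framework would interpret."""
    attrs: Dict[str, str] = {}
    for pair in filter(None, text.split(";")):  # skip empty pieces, e.g. a trailing ';'
        key, _, value = pair.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


# Hypothetical probe result: the access switch this agent is attached to.
probed = {"rack": "r12", "switch": "tor-r12"}
flag = format_attributes(probed)
print(flag)                    # rack:r12;switch:tor-r12
print(parse_attributes(flag))  # {'rack': 'r12', 'switch': 'tor-r12'}
```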







[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320125#comment-15320125
 ] 

Fan Du commented on MESOS-5545:
---

[~adam-mesos] Thanks for sharing your thoughts here, profound and impressive! 

Mesos performs the lower-level resource scheduling, so exporting the network 
topology falls within Mesos's role. It is then up to a framework scheduler such 
as [Firmament|https://github.com/camsas/firmament] to make more sophisticated 
scheduling decisions based on a qualitative approach.
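To make the split of responsibilities concrete, here is a framework-side sketch of one such decision: given offers that already carry a rack label (how the label reaches the framework is assumed here, not specified), take at most one offer per rack so that replicas land on distinct racks for fault tolerance. The function name, offer IDs and rack names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def spread_across_racks(offers: Iterable[Tuple[str, str]], replicas: int) -> List[str]:
    """Pick up to `replicas` offer IDs, using at most one offer per rack.

    `offers` holds (offer_id, rack) pairs; the first offer seen on each rack
    wins, so the result is deterministic for a given input order.
    """
    if replicas <= 0:
        return []
    chosen: Dict[str, str] = {}  # rack -> offer_id, in first-seen order
    for offer_id, rack in offers:
        if rack not in chosen:
            chosen[rack] = offer_id
            if len(chosen) == replicas:
                break
    return list(chosen.values())


offers = [("o1", "r1"), ("o2", "r1"), ("o3", "r2"), ("o4", "r3")]
print(spread_across_racks(offers, 2))  # ['o1', 'o3']
```

If fewer racks than replicas are offered, the result is simply shorter; a real scheduler would then decide whether to wait for more offers or accept co-location.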

I will think more about this, and I'm happy to discuss with you if anything 
interesting comes to mind.






[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320093#comment-15320093
 ] 

Fan Du commented on MESOS-5545:
---

[~avin...@mesosphere.io] Thanks for the comments; apparently you did your LLDP 
homework :)

The topology here refers only to the access layer, that is, the switch the 
agent is directly connected to. lldptool takes care of parsing LLDP packets in 
various ways, so to the best of my knowledge, this does not involve libprocess.
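As an illustration of how little the agent would have to do once lldptool has decoded the packets, the sketch below groups lldptool-style neighbor output into TLVs and picks a rack identifier from it. It assumes the layout printed by a query such as `lldptool -t -n -i eth0` (an unindented TLV name followed by tab-indented values); the sample text is made up rather than captured from a real switch, and treating the System Name TLV as the rack ID is a hypothetical policy.

```python
from typing import Dict, List, Optional


def parse_lldp_tlvs(output: str) -> Dict[str, List[str]]:
    """Group lldptool-style output into {TLV name: [value lines]}.

    Assumes unindented lines name a TLV and indented lines carry its values.
    """
    tlvs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if current is not None:
                tlvs[current].append(line.strip())
        else:
            current = line.strip()
            tlvs.setdefault(current, [])
    return tlvs


def rack_id_from_lldp(output: str) -> Optional[str]:
    """Hypothetical policy: use the access switch's System Name as the rack ID."""
    values = parse_lldp_tlvs(output).get("System Name TLV", [])
    return values[0] if values else None


# Illustrative sample only, not real switch output.
SAMPLE = """Chassis ID TLV
\tMAC: 00:11:22:33:44:55
Port ID TLV
\tIfname: Ethernet12
System Name TLV
\ttor-rack12
End of LLDPDU TLV
"""

print(rack_id_from_lldp(SAMPLE))  # tor-rack12
```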

You are right that LLDP is bounded by the next bridge, i.e. it only travels one 
hop. In the scenario where Open vSwitch is involved and Mesos runs inside a KVM 
guest, I can think of two ways here:
1. Only the LLDP packets sent by the OVS bridge matter, because the OVS bridge is 
now the access bridge, and the lldpad daemon will broadcast LLDP packets.
2. Since commit 
[784b58a3|https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=784b58a327ad16967ab64bbfa558df81980d31e9],
 sysfs knobs can be tweaked to forward LLDP packets.
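For option 2, assuming the knob in question is the Linux bridge's `group_fwd_mask` sysfs attribute, the value to set can be worked out from the destination MAC: bit n of the mask covers the link-local group address 01:80:C2:00:00:0n, and LLDP frames for the nearest bridge go to 01:80:C2:00:00:0E, i.e. bit 14. The sketch only computes the mask; whether a given kernel accepts that bit is assumed to be what the commit above changed.

```python
LINK_LOCAL_PREFIX = bytes.fromhex("0180c20000")  # 01:80:C2:00:00


def group_fwd_bit(mac: str) -> int:
    """Return the group_fwd_mask bit for a 01:80:C2:00:00:0X destination MAC.

    Assumes the Linux bridge convention where bit n covers 01:80:C2:00:00:0n.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6 or octets[:5] != LINK_LOCAL_PREFIX or octets[5] > 0x0F:
        raise ValueError(f"{mac} is not a 01:80:C2:00:00:0X address")
    return 1 << octets[5]


LLDP_NEAREST_BRIDGE = "01:80:C2:00:00:0E"
mask = group_fwd_bit(LLDP_NEAREST_BRIDGE)
print(hex(mask))  # 0x4000
# The value would then be written to /sys/class/net/<bridge>/bridge/group_fwd_mask
# (path assumed; requires root and a kernel that permits this bit).
```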

I don't have any comments about using labels/attributes for the time being; I 
will work out something more appealing based on them.
I will let you know my thoughts!






[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-06 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316113#comment-15316113
 ] 

Fan Du edited comment on MESOS-5545 at 6/7/16 4:52 AM:
---

The design doc elaborating on the story will be published soon for the 
community to review.
Please hold on.

Design doc:
https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing









[jira] [Updated] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-06 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-5545:
--
Attachment: RackAwarenessforMesos-Lite.pdf

Rack Awareness design doc (PDF)






[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-06 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317763#comment-15317763
 ] 

Fan Du edited comment on MESOS-5545 at 6/7/16 3:35 AM:
---

[~vinodkone] Thanks for the comments.

Rack topology information does not fall within the scope of the network 
isolator, because it is not something that can or should be isolated.

Here is why I think rack topology information can be updated:
The rack information can only transition from no rack information to valid 
rack information. In other words, tasks may use resources without rack 
information, and later on the agents report their rack ID to the master. The 
logic could then follow one or all of these design decisions: a) notify the 
corresponding frameworks of the updated rack ID for previously allocated 
resources; b) tag subsequent allocations with the agents' rack IDs; c) attach 
the rack ID to resources freed by frameworks for the next round of allocation. 
This scenario is simpler and cleaner than attribute updates. Alternatively, 
only activate agents for resource allocation once they have a valid rack ID.
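The invariant described above can be sketched as a small per-agent record: the rack ID may only move from unknown to a valid value, a `True` return marks the single transition where option a) could notify frameworks, and the stricter alternative (allocate only once a rack ID is known) becomes a flag. `AgentRackInfo` and its methods are hypothetical, not part of the Mesos master.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRackInfo:
    """Hypothetical per-agent record kept by the master."""
    agent_id: str
    rack_id: Optional[str] = None  # None means "no rack information yet"

    def report_rack(self, rack_id: str) -> bool:
        """Apply a rack report; return True only for the none -> valid transition."""
        if not rack_id:
            raise ValueError("rack_id must be non-empty")
        if self.rack_id is None:
            self.rack_id = rack_id  # the one allowed state change
            return True
        if self.rack_id != rack_id:
            # The proposal never moves an agent from one valid rack to another.
            raise ValueError(f"{self.agent_id}: rack already set to {self.rack_id!r}")
        return False  # duplicate report, nothing to do

    def allocatable(self, require_rack: bool) -> bool:
        """Gate allocation on a valid rack ID when the stricter option is chosen."""
        return self.rack_id is not None or not require_rack


agent = AgentRackInfo("agent-1")
print(agent.allocatable(require_rack=True))  # False
print(agent.report_rack("r7"))               # True
print(agent.allocatable(require_rack=True))  # True
```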

Using attributes is one way to export the rack information, but I don't think 
that's feasible in production at a scale of +1 servers: setting attributes with 
rack information from third-party logic and then starting the agents?! 
Automatically exposing the rack information could save a lot of deployment and 
maintenance effort.

Apologies, it seems I don't quite get the meaning of "first-class field". 
Influencing allocation decisions is not the intention of this ticket; I believe 
that part of the work is out of its scope, which is why I put it in the Future 
section of the design doc. The allocation strategy DOES honor DRF; the current 
implementation performs allocation on a per-agent basis, and we could 
investigate different allocation modes.

In addition, I'd prefer arranging agents on a per-rack basis, because randomly 
shuffling agents at a scale of +1 nodes in every allocation iteration is not 
good. IIRC, this number is still growing.
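A sketch of the per-rack arrangement, under the assumption that agents are grouped once (when they register or report a rack) and each allocation cycle then shuffles only the short list of racks instead of every agent. Agents from the same rack stay adjacent in the resulting order. The function names and the "unknown" bucket for agents without rack information are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_rack(agent_racks: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Build rack -> sorted agent IDs once, not on every allocation cycle.

    Agents without rack information go into a hypothetical "unknown" bucket.
    """
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in agent_racks.items():
        racks[rack or "unknown"].append(agent_id)
    return {rack: sorted(ids) for rack, ids in sorted(racks.items())}


def allocation_order(racks: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Shuffle only the list of racks, keeping each rack's agents grouped."""
    rack_names = list(racks)
    rng.shuffle(rack_names)
    return [agent for rack in rack_names for agent in racks[rack]]


racks = group_by_rack({"a1": "r1", "a2": "r2", "a3": "r1", "a4": None})
print(racks)  # {'r1': ['a1', 'a3'], 'r2': ['a2'], 'unknown': ['a4']}
print(allocation_order(racks, random.Random(0)))
```

Shuffling racks rather than agents still varies which rack is served first across cycles, while the per-cycle cost depends on the number of racks rather than the number of agents.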

All in all, IMHO, it's a good feature for Mesos; the question is how to do it 
elegantly. :)












[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-05 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316120#comment-15316120
 ] 

Fan Du commented on MESOS-5545:
---

Labels require manual configuration, or tools like Ansible or Puppet.
This ticket aims to do this automatically by probing the cluster rack topology.









[jira] [Created] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-05 Thread Fan Du (JIRA)
Fan Du created MESOS-5545:
-

 Summary: Add rack awareness support for Mesos resources
 Key: MESOS-5545
 URL: https://issues.apache.org/jira/browse/MESOS-5545
 Project: Mesos
  Issue Type: Story
  Components: hadoop, master
Reporter: Fan Du


Resources managed by the Mesos master carry no information about the cluster 
topology, for example the rack topology, whereas many data center applications 
have a rack awareness feature to provide data locality, fault tolerance and 
intelligent task placement. This ticket investigates how to add rack awareness 
to the Mesos resource topology.






[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-04-25 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257405#comment-15257405
 ] 

Fan Du commented on MESOS-4492:
---

[~bmahler] Can you please help review this ticket?
RR: https://reviews.apache.org/r/44255/

Thanks a lot!

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable users or operators to inspect statistics for 
> operations such as RESERVE, UNRESERVE, CREATE and DESTROY; the current 
> implementation only supports LAUNCH.
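A toy illustration of the requested statistics, assuming one counter per offer operation type so that RESERVE, UNRESERVE, CREATE and DESTROY are counted alongside LAUNCH. The `master/operations/<type>` key layout and the `Operation` enum are hypothetical, not taken from the Mesos metrics endpoint or the patch under review.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Operation(Enum):
    LAUNCH = "launch"
    RESERVE = "reserve"
    UNRESERVE = "unreserve"
    CREATE = "create"
    DESTROY = "destroy"


def operation_metrics(applied: Iterable[Operation]) -> Dict[str, int]:
    """Count applied operations per type, reporting zero for types never seen."""
    counts = Counter(applied)
    # Hypothetical key layout; every type is listed so dashboards see explicit zeros.
    return {f"master/operations/{op.value}": counts[op] for op in Operation}


ops = [Operation.LAUNCH, Operation.RESERVE, Operation.LAUNCH, Operation.CREATE]
print(operation_metrics(ops)["master/operations/launch"])  # 2
```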





[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-04-25 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255998#comment-15255998
 ] 

Fan Du commented on MESOS-4705:
---

[~bmahler] Ping ;)

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave complained with the error below:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> It is caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
> for kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens, as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.
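A sketch of a parser that tolerates both layouts, assuming perf's CSV output mode (`perf stat -x,`): the 4-field `value,unit,event,cgroup` form and the 6-field `value,unit,event,cgroup,running,ratio` form reported above for 3.10.0-123.el7. `PerfSample` and `parse_perf_line` are hypothetical names; this is not the code in perf.cpp or in the patch under review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfSample:
    value: int
    unit: str
    event: str
    cgroup: str
    running: Optional[int] = None   # only present in the 6-field layout
    ratio: Optional[float] = None   # percentage of time the counter ran


def parse_perf_line(line: str) -> PerfSample:
    """Parse one CSV line in either the 4-field or the 6-field layout."""
    fields = line.strip().split(",")
    if len(fields) not in (4, 6):
        raise ValueError(f"Unexpected number of fields ({len(fields)}): {line!r}")
    # perf may print '<not counted>' as the value; int() then raises ValueError.
    sample = PerfSample(int(fields[0]), fields[1], fields[2], fields[3])
    if len(fields) == 6:
        sample.running = int(fields[4])
        sample.ratio = float(fields[5])
    return sample


line = "25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00"
sample = parse_perf_line(line)
print(sample.event, sample.value, sample.ratio)  # cycles 25871993253 100.0
```

Dispatching on the field count rather than on the kernel version keeps the parser working on distribution kernels, such as this RHEL/CentOS one, whose perf output does not match what the upstream version number suggests.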





[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-04-17 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245167#comment-15245167
 ] 

Fan Du commented on MESOS-4705:
---

[~haosd...@gmail.com] [~bmahler] I have elaborated more on the comments; 
please review again:

https://reviews.apache.org/r/44379/

Thanks a lot!






[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-04-15 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243029#comment-15243029
 ] 

Fan Du commented on MESOS-4705:
---

{quote}
Which patch? This one? https://reviews.apache.org/r/44379/

It still does not contain the information related to perf stat formats that 
haosdent provided earlier in this thread. Can you add that?
{quote}

[~haosd...@gmail.com] I think I already added the format you mentioned in your first 
reply to the comments, {{value,unit,event,cgroup}}, and this format also matches 
what you describe in 
[MESOS-4655|https://issues.apache.org/jira/browse/MESOS-4655], right?

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave reported the following error:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> It's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  which does not hold on kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the output has 6 tokens per line:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.





[jira] [Commented] (MESOS-5163) LKVM Containerization

2016-04-11 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234769#comment-15234769
 ] 

Fan Du commented on MESOS-5163:
---

AFAIK, Clear Container carries additional feature enhancements and bug fixes for lkvm 
which may be absent from the upstream lkvm version. That's why I asked whether this 
ticket is intended for Clear Container.

> LKVM Containerization
> -
>
> Key: MESOS-5163
> URL: https://issues.apache.org/jira/browse/MESOS-5163
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Vaibhav Khanduja
>  Labels: container, containerizer
>
> LKVM is a lightweight, kernel-based hypervisor. Since the hypervisor is eventually 
> designed to land in kernel code, it may be a good step to consider 
> supporting it as one of the container options. LKVM offers the advantage of being a 
> lightweight container while having its own kernel footprint. Having a separate 
> kernel footprint goes a long way toward solving security issues with 
> containers.





[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-04-11 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234588#comment-15234588
 ] 

Fan Du commented on MESOS-4705:
---

[~bmahler]
I have updated the RR to use the number of tokens to determine the perf stat output 
format; please review.
BTW, I'm wondering if you could help review 
https://reviews.apache.org/r/44255/.
I emailed [~jieyu] a few weeks ago; maybe he is quite busy with something 
else.
Thanks a lot!

[~haosd...@gmail.com]
I have added you as a co-author.
Thanks for the comments and challenges.
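As a rough illustration of choosing the `perf stat` line layout by token count rather than by kernel version, here is a minimal C++17 sketch. `PerfSample`, `splitFields` and `parseSampleLine` are hypothetical names, not the code in the review request; the sketch only handles the 4-token (value,unit,event,cgroup) and 6-token (value,unit,event,cgroup,running,ratio) layouts quoted in this thread and rejects anything else.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not Mesos's perf::Sample.
struct PerfSample {
  std::string value;
  std::string event;
  std::string cgroup;
};

// Split a `perf stat -x,` line on ',' while keeping empty fields,
// because the unit column is often empty (e.g. "25871993253,,cycles,...").
std::vector<std::string> splitFields(const std::string& line) {
  std::vector<std::string> tokens;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type pos = line.find(',', start);
    if (pos == std::string::npos) {
      tokens.push_back(line.substr(start));
      return tokens;
    }
    tokens.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
}

// Pick the layout from the number of tokens instead of the kernel version.
std::optional<PerfSample> parseSampleLine(const std::string& line) {
  const std::vector<std::string> tokens = splitFields(line);
  switch (tokens.size()) {
    case 4:  // value,unit,event,cgroup
    case 6:  // value,unit,event,cgroup,running,ratio
      return PerfSample{tokens[0], tokens[2], tokens[3]};
    default:
      return std::nullopt;  // "Unexpected number of fields"
  }
}
```

With this approach the failing line from the ticket parses to value `25871993253`, event `cycles` and cgroup `mesos/5f23ffca-...`, while an unknown layout is still reported as an error instead of being misread.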


> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave reported the following error:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> It's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  which does not hold on kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the output has 6 tokens per line:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.





[jira] [Commented] (MESOS-5163) LKVM Containerization

2016-04-10 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234420#comment-15234420
 ] 

Fan Du commented on MESOS-5163:
---

[~vaibhav_khanduja]
Is this ticket intended for Intel Clear Container, which is based on lkvm?

> LKVM Containerization
> -
>
> Key: MESOS-5163
> URL: https://issues.apache.org/jira/browse/MESOS-5163
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Vaibhav Khanduja
>  Labels: container, containerizer
>
> LKVM is a lightweight, kernel-based hypervisor. Since the hypervisor is eventually 
> designed to land in kernel code, it may be a good step to consider 
> supporting it as one of the container options. LKVM offers the advantage of being a 
> lightweight container while having its own kernel footprint. Having a separate 
> kernel footprint goes a long way toward solving security issues with 
> containers.





[jira] [Comment Edited] (MESOS-5163) LKVM Containerization

2016-04-10 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234420#comment-15234420
 ] 

Fan Du edited comment on MESOS-5163 at 4/11/16 2:31 AM:


[~vaibhav_khanduja]
Is this ticket intended for Intel Clear Container, which is based on lkvm?


was (Author: fan.du):
[~vaibhav_khanduja]
Is this ticket intended for Intel Clear Container, which is based on lkvm?

> LKVM Containerization
> -
>
> Key: MESOS-5163
> URL: https://issues.apache.org/jira/browse/MESOS-5163
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Vaibhav Khanduja
>  Labels: container, containerizer
>
> LKVM is a lightweight, kernel-based hypervisor. Since the hypervisor is eventually 
> designed to land in kernel code, it may be a good step to consider 
> supporting it as one of the container options. LKVM offers the advantage of being a 
> lightweight container while having its own kernel footprint. Having a separate 
> kernel footprint goes a long way toward solving security issues with 
> containers.





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-04-06 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227977#comment-15227977
 ] 

Fan Du commented on MESOS-4981:
---

[~bmahler] You are correct about this; I totally missed it.
Please review the new RR:

https://reviews.apache.org/r/45808/

In the Linux kernel, a Suggested-by: tag indicates that the idea came from 
someone else. I didn't find this convention in Mesos, so I added a note in the commit 
message.
Thanks for reviewing.


> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Commented] (MESOS-5129) Supporting Container Images in Mesos Containerizer doesn't work

2016-04-06 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227792#comment-15227792
 ] 

Fan Du commented on MESOS-5129:
---

Then you need to install hadoop on your agent first.

> Supporting Container Images in Mesos Containerizer doesn't work
> ---
>
> Key: MESOS-5129
> URL: https://issues.apache.org/jira/browse/MESOS-5129
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.29.0
>Reporter: wangqun
>
> Hi,
> I tried to test the Supporting Container Images in Mesos 
> Containerizer feature according to 
> https://github.com/apache/mesos/blob/master/docs/container-image.md#test-it-out,
>  but it doesn't work.
> I am using Mesos version 0.29.
> These are my steps:
> 1) sudo bin/mesos-master.sh --log_dir=/var/log/mesos --ip=9.5.124.139 
> --work_dir=/tmp/mesos/master
> 2) sudo bin/mesos-slave.sh --master=9.5.124.139:5050 --ip=9.5.124.139 
> --hostname=mesos --isolation=docker/runtime,filesystem/linux  
> --work_dir=/tmp/mesos/slave --log_dir=/var/log/mesos --image_providers=docker 
> --executor_environment_variables="{}"
> 3) sudo src/mesos-execute --master=9.5.124.139:5050 --name=test 
> --docker_image=library/redis  --shell=false
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0406 03:33:05.730432  5886 scheduler.cpp:157] 
> **
> Scheduler driver bound to loopback interface! Cannot communicate with remote 
> master(s). You might want to set 'LIBPROCESS_IP' environment variable to use 
> a routable IP address.
> **
> I0406 03:33:05.730623  5886 scheduler.cpp:172] Version: 0.29.0
> Subscribed with ID '79b6ed58-46a9-4760-a589-a28061f4f1e9-
> task test submitted to agent 7184bc3a-243c-4ca7-8897-c98e81836ed6-S1
> Received status update TASK_RUNNING for task test
> 4) sudo vim lt-mesos-slave.mesos.root.log.ERROR
> Command 'hadoop version 2>&1' failed; this is the output:
> sh: 1: hadoop: not found





[jira] [Commented] (MESOS-5129) Supporting Container Images in Mesos Containerizer doesn't work

2016-04-05 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227778#comment-15227778
 ] 

Fan Du commented on MESOS-5129:
---

The log speaks for itself: this has nothing to do with Mesos; your Hadoop environment is 
probably not set up correctly.
{code}
Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
{code}

Refer:
https://mail-archives.apache.org/mod_mbox/mesos-user/201511.mbox/%3c563acaf7.1030...@intel.com%3E

> Supporting Container Images in Mesos Containerizer doesn't work
> ---
>
> Key: MESOS-5129
> URL: https://issues.apache.org/jira/browse/MESOS-5129
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.29.0
>Reporter: wangqun
>
> Hi,
> I tried to test the Supporting Container Images in Mesos 
> Containerizer feature according to 
> https://github.com/apache/mesos/blob/master/docs/container-image.md#test-it-out,
>  but it doesn't work.
> I am using Mesos version 0.29.
> These are my steps:
> 1) sudo bin/mesos-master.sh --log_dir=/var/log/mesos --ip=9.5.124.139 
> --work_dir=/tmp/mesos/master
> 2) sudo bin/mesos-slave.sh --master=9.5.124.139:5050 --ip=9.5.124.139 
> --hostname=mesos --isolation=docker/runtime,filesystem/linux  
> --work_dir=/tmp/mesos/slave --log_dir=/var/log/mesos --image_providers=docker 
> --executor_environment_variables="{}"
> 3) sudo src/mesos-execute --master=9.5.124.139:5050 --name=test 
> --docker_image=library/redis  --shell=false
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0406 03:33:05.730432  5886 scheduler.cpp:157] 
> **
> Scheduler driver bound to loopback interface! Cannot communicate with remote 
> master(s). You might want to set 'LIBPROCESS_IP' environment variable to use 
> a routable IP address.
> **
> I0406 03:33:05.730623  5886 scheduler.cpp:172] Version: 0.29.0
> Subscribed with ID '79b6ed58-46a9-4760-a589-a28061f4f1e9-
> task test submitted to agent 7184bc3a-243c-4ca7-8897-c98e81836ed6-S1
> Received status update TASK_RUNNING for task test
> 4) sudo vim lt-mesos-slave.mesos.root.log.ERROR
> Command 'hadoop version 2>&1' failed; this is the output:
> sh: 1: hadoop: not found





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-31 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221062#comment-15221062
 ] 

Fan Du commented on MESOS-4981:
---

[~bmahler] & [~vinodkone]

How about not distinguishing {{messages_register_framework}} from 
{{messages_reregister_framework}} in such a strict manner?
The {{subscribe}} flow would be updated as follows:
{code}
  1. Bump messages_register_framework
  2. Various sanity checks
  3. Newborn framework?
 3a. Add new framework
 3b. Return
  4. Bump messages_reregister_framework
  5. Otherwise framework is reregistering
 5a. Updating the framework
 5b. Return
{code}
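To make the proposed flow concrete, here is a minimal C++ sketch. `SubscribeCall`, `Metrics`, `MasterSketch` and the placeholder sanity check are hypothetical simplifications, not the Mesos master code; in particular, treating an empty framework ID as "newborn" is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for illustration; not Mesos's Master or metrics types.
struct SubscribeCall {
  std::string name;         // placeholder for the FrameworkInfo being sent
  std::string frameworkId;  // empty means a newborn framework (assumption)
};

struct Metrics {
  uint64_t messages_register_framework = 0;
  uint64_t messages_reregister_framework = 0;
};

struct MasterSketch {
  Metrics metrics;

  // Returns false if the placeholder sanity check rejects the call.
  bool subscribe(const SubscribeCall& call) {
    // 1. Every SUBSCRIBE counts as a register attempt, even one that fails.
    ++metrics.messages_register_framework;

    // 2. Sanity checks (placeholder: require a framework name).
    if (call.name.empty()) {
      return false;
    }

    // 3. Newborn framework: add it and return.
    if (call.frameworkId.empty()) {
      return true;
    }

    // 4. Not newborn, so additionally count a re-registration.
    ++metrics.messages_reregister_framework;

    // 5. Update the existing framework (omitted in this sketch).
    return true;
  }
};
```

Under this scheme messages_register_framework counts every SUBSCRIBE and messages_reregister_framework counts the subset that re-register, so neither counter ever has to be rolled back.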

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-31 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221056#comment-15221056
 ] 

Fan Du commented on MESOS-4492:
---

[~jieyu] I'm wondering if you have any cycles for a final review?
Thanks!

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable users or operators to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY; the current implementation only 
> supports LAUNCH.
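Per-operation counters of the kind this ticket asks for could be organized roughly as below. This is only a sketch: the `OperationType` enum, the `OperationCounters` class and the `operations/...` metric keys are hypothetical and do not reflect the metric names or types Mesos actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum class OperationType { LAUNCH, RESERVE, UNRESERVE, CREATE, DESTROY };

// Hypothetical metric keys, for illustration only.
std::string metricName(OperationType type) {
  switch (type) {
    case OperationType::LAUNCH:    return "operations/launch";
    case OperationType::RESERVE:   return "operations/reserve";
    case OperationType::UNRESERVE: return "operations/unreserve";
    case OperationType::CREATE:    return "operations/create";
    case OperationType::DESTROY:   return "operations/destroy";
  }
  return "operations/unknown";
}

class OperationCounters {
 public:
  // Bump the counter for one offer operation of the given type.
  void record(OperationType type) { ++counts_[metricName(type)]; }

  // Read a counter; operations that were never recorded read as 0.
  uint64_t get(OperationType type) const {
    const auto it = counts_.find(metricName(type));
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, uint64_t> counts_;
};
```

Keeping one counter per operation type means LAUNCH needs no special handling: it becomes one more case of the same mechanism.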





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209672#comment-15209672
 ] 

Fan Du commented on MESOS-4981:
---

Hmm, here is the scenario; let me explain :)
When a framework calls SUBSCRIBE, it could be registering a newborn framework, or it 
could be updating (reregistering) an existing framework.
For {{subscribe}} the flow is:
{code}
  1. Bump messages_register_framework
  2. Various sanity checks
  3. Newborn framework?
 3a. Add new framework
 3b. Return
  4. Roll back messages_register_framework, and bump 
messages_reregister_framework
  5. Otherwise the framework is reregistering
 5a. Update the framework
 5b. Return
{code}


That's why I asked the two questions above:
q1. Do metrics have to count failure cases such as failed sanity checks? If not, we can 
simply bump the metrics once we are sure it's a good, clean operation,
in 3a and 5a. But judging from how other metrics are counted, 
metrics include all failure cases such as sanity checks.
q2. Is it OK to update messages_register_framework even though we already 
know the operation should bump messages_reregister_framework?
That is to say, we would not need to roll back messages_register_framework 
at all?



> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209584#comment-15209584
 ] 

Fan Du commented on MESOS-4981:
---

[~bmahler] Could you share your comments here? Then I can move forward on this 
ticket.

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209572#comment-15209572
 ] 

Fan Du commented on MESOS-4492:
---

Done! Thanks to [~greggomann] and [~jieyu] for spending time on the review.

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable users or operators to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY; the current implementation only 
> supports LAUNCH.





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-22 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205960#comment-15205960
 ] 

Fan Du commented on MESOS-4981:
---

[~bbannier] Thanks for the quick review! :)
[~bmahler] Actually, I have two questions first:
1. Do we need to bump the metrics for failed operations, e.g. failed 
parameter sanity checks or authentication/authorization failures?
2. For the case in this ticket, we handle {{registerFramework}} and 
{{reregisterFramework}} together in 
{{[subscribe|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/master/master.cpp;h=e6290ea686ccf17813d6faeaf2f2012f79cf3b7f;hb=HEAD#l2256]}},
 so do we need to strictly differentiate the metrics of {{registerFramework}} and 
{{reregisterFramework}}?

If the answer to both questions is "yes", then IMO we DO need the Counter to be 
decremented in the case above, to accommodate the implementation :)
I didn't know that [~wangcong] had submitted [r44473 | 
https://reviews.apache.org/r/44473/]; I think it could be beneficial, at least 
for my case here.
Here is my understanding of Counter and Gauge, though we don't 
differentiate them in the Linux kernel: use a Counter for events or messages, and use a 
Gauge to get a snapshot of a resource by its name and meaning. Swapping them would lose 
those semantics.
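The Counter/Gauge distinction above can be sketched with two hypothetical minimal types (not libprocess's `process::metrics` classes): the counter only moves forward as events happen, while the gauge computes its value from current state every time it is read. Leaving a decrement out of `Counter` reflects the strict reading described above; whether Mesos should allow one is exactly the open question in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical counter: records how many events have happened.
class Counter {
 public:
  Counter& operator++() {
    ++value_;
    return *this;
  }
  uint64_t value() const { return value_; }
  // Deliberately no operator--: an event that happened cannot "un-happen".

 private:
  uint64_t value_ = 0;
};

// Hypothetical gauge: a snapshot of some current quantity, computed on read.
class Gauge {
 public:
  explicit Gauge(std::function<double()> read) : read_(std::move(read)) {}
  double value() const { return read_(); }

 private:
  std::function<double()> read_;
};
```

For example, a count of register messages fits `Counter`, while something like the number of currently available CPUs fits `Gauge`, because it can rise and fall without any event being "undone".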


> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-22 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205908#comment-15205908
 ] 

Fan Du commented on MESOS-4492:
---

[~greggomann] Any further comments about the review? :)

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable users or operators to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY; the current implementation only 
> supports LAUNCH.





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-22 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205905#comment-15205905
 ] 

Fan Du commented on MESOS-4981:
---

[~anandmazumdar] Thanks, I have added [~vinodkone] as reviewer.

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should increment these counters correctly for PID-based 
> frameworks, as was previously the case.





[jira] [Comment Edited] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-21 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887
 ] 

Fan Du edited comment on MESOS-4981 at 3/21/16 8:34 AM:


[~anandmazumdar] I happened to take a deep look at this; here is a fix that works 
in my environment.
Please review:
https://reviews.apache.org/r/45096
https://reviews.apache.org/r/45097


was (Author: fan.du):
[~anandmazumdar] I happened to take a deep look at this; here is a fix that works 
in my environment.
Please review:
https://reviews.apache.org/r/45094/ 

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should either add new counter(s) for 
> {{Subscribe}} calls to the master for both PID and HTTP frameworks, or modify the 
> existing code to correctly increment the counters.





[jira] [Comment Edited] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-21 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887
 ] 

Fan Du edited comment on MESOS-4981 at 3/21/16 8:19 AM:


[~anandmazumdar] I happened to take a deep look at this; here is a fix that works 
in my environment.
Please review:
https://reviews.apache.org/r/45094/ 


was (Author: fan.du):
[~anandmazumdar] I happened to take a deep look at this; here is a fix that works 
in my environment.
Please review:
https://reviews.apache.org/r/45094/ 

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should either add new counter(s) for 
> {{Subscribe}} calls to the master for both PID and HTTP frameworks, or modify the 
> existing code to correctly increment the counters.





[jira] [Commented] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-21 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203887#comment-15203887
 ] 

Fan Du commented on MESOS-4981:
---

[~anandmazumdar] I happened to take a deep look at this; here is a fix that works 
in my environment.
Please review:
https://reviews.apache.org/r/45094/ 

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should either add new counter(s) for 
> {{Subscribe}} calls to the master for both PID and HTTP frameworks, or modify the 
> existing code to correctly increment the counters.





[jira] [Assigned] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-21 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du reassigned MESOS-4981:
-

Assignee: Fan Du

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>Assignee: Fan Du
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. We should either add new counter(s) for 
> {{Subscribe}} calls to the master for both PID and HTTP frameworks, or modify the 
> existing code to correctly increment the counters.





[jira] [Commented] (MESOS-4955) Generize perf event parsing to match PerfStatistics filed name for "perf stat"

2016-03-19 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199148#comment-15199148
 ] 

Fan Du commented on MESOS-4955:
---

Really sweet, this is exactly what I need.
Thanks for the pointer.

> Generize perf event parsing to match PerfStatistics filed name for "perf stat"
> --
>
> Key: MESOS-4955
> URL: https://issues.apache.org/jira/browse/MESOS-4955
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Fan Du
>Assignee: Fan Du
>
> Current 
> [design|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/mesos.proto;h=deb9c0910a27afd67276f54b3f666a878212727b;hb=HEAD#l981]
>  does not support events like:
> {{SUBSYS/EVENT  <- most notably intel_cqm/llc_occupancy/}}
> {{SUBSYS:EVENT  <- all tracepoint events}}
> This gap could be closed with a small change: match EVENT against the 
> PerfStatistics proto message field names.





[jira] [Commented] (MESOS-4955) Generize perf event parsing to match PerfStatistics filed name for "perf stat"

2016-03-15 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196627#comment-15196627
 ] 

Fan Du commented on MESOS-4955:
---

Here is the RFC review request, posted to evaluate whether this ticket is 
worth pursuing further:
https://reviews.apache.org/r/44881/

BTW, currently I use {{intel_cqm/llc_occupancy/}} and 
{{sched:intel_cqm/llc_occupancy/}} only as examples; support for other events could 
easily be extended later on.
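One possible shape for the matching step is a function that turns a perf event name into a protobuf-style field name, which could then be looked up on the PerfStatistics message (for example via protobuf's `Descriptor::FindFieldByName`). The function name `eventToFieldName` and its mapping rules (separators become underscores, letters are lowercased, trailing separators are dropped) are assumptions for illustration and may not match what the review request does.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: map a perf event name to a protobuf-style field name.
std::string eventToFieldName(const std::string& event) {
  std::string field;
  field.reserve(event.size());
  for (char c : event) {
    if (c == '/' || c == ':' || c == '-') {
      // SUBSYS/EVENT, SUBSYS:EVENT and dashed names all become underscores.
      field.push_back('_');
    } else {
      field.push_back(
          static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
    }
  }
  // Drop trailing separators, e.g. the '/' closing "intel_cqm/llc_occupancy/".
  while (!field.empty() && field.back() == '_') {
    field.pop_back();
  }
  return field;
}
```

With these rules `intel_cqm/llc_occupancy/` maps to `intel_cqm_llc_occupancy` and a tracepoint such as `sched:sched_switch` maps to `sched_sched_switch`; an event whose mapped name has no matching field would simply be skipped or reported.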

> Generize perf event parsing to match PerfStatistics filed name for "perf stat"
> --
>
> Key: MESOS-4955
> URL: https://issues.apache.org/jira/browse/MESOS-4955
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Fan Du
>Assignee: Fan Du
>
> Current 
> [design|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/mesos.proto;h=deb9c0910a27afd67276f54b3f666a878212727b;hb=HEAD#l981]
>  does not support events like:
> {{SUBSYS/EVENT  <- most notably intel_cqm/llc_occupancy/}}
> {{SUBSYS:EVENT  <- all tracepoint events}}
> This gap could be closed with a small change: match EVENT against the 
> PerfStatistics proto message field names.





[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-03-13 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192782#comment-15192782
 ] 

Fan Du commented on MESOS-4705:
---

Hi Benjamin, could you please review the updated RR? Thanks for your time!

https://reviews.apache.org/r/44379/

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling container with perf event on Centos7 with kernel 
> 3.10.0-123.el7.x86_64, slave complained with below error spew:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> it's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  with kernel version below 3.12 
> On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed, please review this 
> ticket.



[jira] [Updated] (MESOS-4753) Add executor state when reporting resource usage

2016-03-09 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-4753:
--
Component/s: (was: slave)
 oversubscription

> Add executor state when reporting resource usage
> 
>
> Key: MESOS-4753
> URL: https://issues.apache.org/jira/browse/MESOS-4753
> Project: Mesos
>  Issue Type: Improvement
>  Components: oversubscription, statistics
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> Slave reports resource usage of each executor for resource estimator to feed 
> master with revocable resource,  it's better to append executor state as well 
> when reporting usage, which in turn resource estimator would easily focus on 
> the *RUNNING* executor only.
> it's possible to call {{Slave:: getExecutor}} in estimator, but it's possible 
> not sync up with the resource usage. 



[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-03-09 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186741#comment-15186741
 ] 

Fan Du commented on MESOS-4705:
---

I have another thought. Looking at the perf stat format across different kernel 
versions, it can be any of the following:
1. value,event,cgroup
2. value,unit,event,cgroup
3. value,unit,event,cgroup,running,ratio

For old kernel versions maintained by OS vendors, the perf stat output never 
reorders existing elements; it only appends new ones at the end. So why not drop 
the kernel version check, which is meaningless here, and just take the needed 
elements by token count:
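{code}
if tokens.size == 3:                        # value,event,cgroup
  return tokens[0], tokens[1], tokens[2]

if tokens.size == 4 or tokens.size == 6:    # value,unit,event,cgroup[,running,ratio]
  return tokens[0], tokens[2], tokens[3]

error "Unexpected number of fields"
{code}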
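The token-count rule proposed here can be sketched as a self-contained C++ function, assuming the line has already been split on ',' with empty fields kept, and that only the three layouts listed above occur. {{PerfSample}} and {{selectFields}} are hypothetical names, not the code in perf.cpp.

```cpp
#include <optional>
#include <string>
#include <vector>

// The three fields the isolator needs from one "perf stat" line.
struct PerfSample {
  std::string value;
  std::string event;
  std::string cgroup;
};

// Select fields by token count instead of by kernel version. Assumes the
// layouts listed above, where newer kernels only append columns:
//   3 tokens: value,event,cgroup
//   4 tokens: value,unit,event,cgroup
//   6 tokens: value,unit,event,cgroup,running,ratio
std::optional<PerfSample> selectFields(const std::vector<std::string>& tokens) {
  switch (tokens.size()) {
    case 3:
      return PerfSample{tokens[0], tokens[1], tokens[2]};
    case 4:
    case 6:
      return PerfSample{tokens[0], tokens[2], tokens[3]};
    default:
      // Unknown layout: let the caller report "Unexpected number of fields".
      return std::nullopt;
  }
}
```

Returning an empty optional for any other token count lets the caller keep reporting unknown layouts as an error instead of silently picking the wrong columns.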

[~bmahler] and [~haosdent], any comments?

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling container with perf event on Centos7 with kernel 
> 3.10.0-123.el7.x86_64, slave complained with below error spew:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> it's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  with kernel version below 3.12 
> On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed, please review this 
> ticket.



[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-06 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182695#comment-15182695
 ] 

Fan Du commented on MESOS-4492:
---

[~greggomann] I see this ticket has not been accepted by a committer yet. Could you 
please help with that, so that I can update the JIRA workflow? One more question: 
since you have given it a "Ship It", what do I need to do before [~jieyu] merges 
the patch?

Thanks a lot for reviewing :)

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-03-03 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179476#comment-15179476
 ] 

Fan Du commented on MESOS-4705:
---

Hi [~bmahler], this is a follow-up bug fix for 
[MESOS-2834|https://issues.apache.org/jira/browse/MESOS-2834]. I am wondering 
if you could shepherd this issue; the easy fix is posted above. :)

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling container with perf event on Centos7 with kernel 
> 3.10.0-123.el7.x86_64, slave complained with below error spew:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> it's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  with kernel version below 3.12 
> On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed, please review this 
> ticket.



[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-03-03 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179431#comment-15179431
 ] 

Fan Du commented on MESOS-4705:
---

Here is the RR that fixes this:
https://reviews.apache.org/r/44379/

I'm looking for a shepherd to review it.

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling container with perf event on Centos7 with kernel 
> 3.10.0-123.el7.x86_64, slave complained with below error spew:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> it's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  with kernel version below 3.12 
> On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed, please review this 
> ticket.



[jira] [Created] (MESOS-4846) Add Memory Bandwidth Monitoring (MBM) perf support

2016-03-02 Thread Fan Du (JIRA)
Fan Du created MESOS-4846:
-

 Summary: Add Memory Bandwidth Monitoring (MBM) perf support
 Key: MESOS-4846
 URL: https://issues.apache.org/jira/browse/MESOS-4846
 Project: Mesos
  Issue Type: Improvement
  Components: oversubscription, statistics
Reporter: Fan Du
Assignee: Fan Du


This ticket tracks support for Intel Memory Bandwidth Monitoring (MBM) in the 
current PerfStatistics. The per-task memory bandwidth usage will be analyzed by 
the QoS controller to make better correction decisions.
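A minimal sketch of how per-task bandwidth could be derived, assuming MBM is read per task as a cumulative byte counter (for example the {{intel_cqm/total_bytes/}} perf event that some kernels expose; verify the event name on the target kernel) and sampled twice. {{MbmSample}} and {{memoryBandwidth}} are hypothetical names, and counter wrap-around is not handled.

```cpp
#include <cstdint>
#include <stdexcept>

// One reading of a cumulative MBM byte counter taken at a known time.
struct MbmSample {
  std::uint64_t bytes;  // cumulative bytes transferred so far
  double timestamp;     // seconds
};

// Average memory bandwidth in bytes/second between two samples.
double memoryBandwidth(const MbmSample& earlier, const MbmSample& later) {
  const double elapsed = later.timestamp - earlier.timestamp;
  // Reject out-of-order samples and decreasing counters (e.g. a wrap).
  if (elapsed <= 0.0 || later.bytes < earlier.bytes) {
    throw std::invalid_argument("samples must increase in time and bytes");
  }
  return static_cast<double>(later.bytes - earlier.bytes) / elapsed;
}
```

A QoS controller could then compare this rate across tasks on the same host, though how it would act on the value is outside the scope of this sketch.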



[jira] [Comment Edited] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-01 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168583#comment-15168583
 ] 

Fan Du edited comment on MESOS-4492 at 3/2/16 5:16 AM:
---

Here is the RR (discarded):
https://reviews.apache.org/r/44058/

Updated RR with documentation fixes and added tests:
https://reviews.apache.org/r/44255/



was (Author: fan.du):
Here goes the RR:
https://reviews.apache.org/r/44058/

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-25 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168587#comment-15168587
 ] 

Fan Du commented on MESOS-4492:
---

Thanks for the kind notice :)

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Updated] (MESOS-4753) Add executor state when reporting resource usage

2016-02-24 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-4753:
--
Description: 
The slave reports the resource usage of each executor to the resource estimator, 
which feeds the master with revocable resources. It would be better to also 
include the executor state when reporting usage, so that the resource estimator 
can easily focus on *RUNNING* executors only.

It is possible to call {{Slave::getExecutor}} from the estimator, but the result 
may not be in sync with the resource usage.

  was:
Slave reports resource usage of each executor for resource estimator to feed 
master with revocable resource,  it's better to append executor state as well 
when reporting usage, which in turn resource estimator would easily focus on 
the *RUNNING* executor only.

it's possible to call {code} Slave:: getExecutor {code} in estimator, but it's 
possible not sync up with the resource usage. 


> Add executor state when reporting resource usage
> 
>
> Key: MESOS-4753
> URL: https://issues.apache.org/jira/browse/MESOS-4753
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave, statistics
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> Slave reports resource usage of each executor for resource estimator to feed 
> master with revocable resource,  it's better to append executor state as well 
> when reporting usage, which in turn resource estimator would easily focus on 
> the *RUNNING* executor only.
> it's possible to call {{Slave:: getExecutor}} in estimator, but it's possible 
> not sync up with the resource usage. 



[jira] [Commented] (MESOS-4753) Add executor state when reporting resource usage

2016-02-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160334#comment-15160334
 ] 

Fan Du commented on MESOS-4753:
---

[~nnielsen] IMHO, the resource estimator and QoS controller in Serenity need to 
count the resource usage of RUNNING executors only. I'm considering this change, 
and would then enhance the Serenity age filter. Could I get some comments from you? :)

> Add executor state when reporting resource usage
> 
>
> Key: MESOS-4753
> URL: https://issues.apache.org/jira/browse/MESOS-4753
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave, statistics
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> Slave reports resource usage of each executor for resource estimator to feed 
> master with revocable resource,  it's better to append executor state as well 
> when reporting usage, which in turn resource estimator would easily focus on 
> the *RUNNING* executor only.
> it's possible to call {code} Slave:: getExecutor {code} in estimator, but 
> it's possible not sync up with the resource usage. 



[jira] [Comment Edited] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160322#comment-15160322
 ] 

Fan Du edited comment on MESOS-4492 at 2/24/16 7:44 AM:


[~jieyu] After reviewing the [code|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/master/master.cpp;h=8d6d3c6468c6b85fe09c33cf9747cc3d1f515ab9;hb=HEAD#l3027]
 here, I would like to fill the gap. Could you review the ticket?
Thanks!


was (Author: fan.du):
[~jieyu]  after reviewing the code here, I would like to fill the gap, and I'm 
wondering if you could review the ticket?
thanks

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-02-23 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160322#comment-15160322
 ] 

Fan Du commented on MESOS-4492:
---

[~jieyu] After reviewing the code here, I would like to fill the gap. Could you 
review the ticket?
Thanks!

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Created] (MESOS-4753) Add executor state when reporting resource usage

2016-02-23 Thread Fan Du (JIRA)
Fan Du created MESOS-4753:
-

 Summary: Add executor state when reporting resource usage
 Key: MESOS-4753
 URL: https://issues.apache.org/jira/browse/MESOS-4753
 Project: Mesos
  Issue Type: Improvement
  Components: slave, statistics
Reporter: Fan Du
Assignee: Fan Du
Priority: Minor


The slave reports the resource usage of each executor to the resource estimator, 
which feeds the master with revocable resources. It would be better to also 
include the executor state when reporting usage, so that the resource estimator 
can easily focus on *RUNNING* executors only.

It is possible to call {{Slave::getExecutor}} from the estimator, but the result 
may not be in sync with the resource usage.
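A minimal sketch of what the estimator could do once a state is reported together with each executor's usage. {{ExecutorState}}, {{ExecutorUsage}} and {{runningCpus}} are hypothetical types and names; the state values are illustrative and this is not a Mesos API.

```cpp
#include <string>
#include <vector>

// Hypothetical executor states reported alongside usage (illustrative).
enum class ExecutorState { REGISTERING, RUNNING, TERMINATING, TERMINATED };

// Hypothetical per-executor usage record carrying the proposed state field.
struct ExecutorUsage {
  std::string executorId;
  ExecutorState state;
  double cpusUsed;
};

// Sum CPU usage over RUNNING executors only, ignoring executors that are
// still registering or already shutting down.
double runningCpus(const std::vector<ExecutorUsage>& usages) {
  double total = 0.0;
  for (const ExecutorUsage& usage : usages) {
    if (usage.state == ExecutorState::RUNNING) {
      total += usage.cpusUsed;
    }
  }
  return total;
}
```

Because the state travels in the same record as the usage, the filter sees a consistent snapshot, which is the advantage over querying the slave separately.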



[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-02-18 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153540#comment-15153540
 ] 

Fan Du commented on MESOS-4705:
---

Lots of local cloud service providers in China still use the 2.6.32 kernel, which 
we support.
It's easy to catch any exception in the last step anyway.


> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling container with perf event on Centos7 with kernel 
> 3.10.0-123.el7.x86_64, slave complained with below error spew:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> it's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  with kernel version below 3.12 
> On 3.10.0-123.el7.x86_64 kernel, the format is with 6 tokens as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed, please review this 
> ticket.



[jira] [Created] (MESOS-4705) Slave failed to sample container with perf event

2016-02-18 Thread Fan Du (JIRA)
Fan Du created MESOS-4705:
-

 Summary: Slave failed to sample container with perf event
 Key: MESOS-4705
 URL: https://issues.apache.org/jira/browse/MESOS-4705
 Project: Mesos
  Issue Type: Bug
  Components: cgroups, isolation
Affects Versions: 0.27.1
Reporter: Fan Du
Assignee: Fan Du


When sampling a container with perf events on CentOS 7 with kernel 
3.10.0-123.el7.x86_64, the slave logged the following error:

{code}
E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line 
'25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
 Unexpected number of fields
{code}

It is caused by the current perf output format 
[assumption|https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
 for kernel versions below 3.12.

On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens:
value,unit,event,cgroup,running,ratio

A local modification fixed this error on my test bed; please review this ticket.
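A minimal sketch of splitting the sample line above into fields, assuming a plain ',' separator. It keeps empty fields, which matters here: the unit column in {{25871993253,,cycles,...}} is empty, so a splitter that collapses consecutive commas would see 5 fields instead of 6. {{splitPerfLine}} is a hypothetical helper, not the function used in perf.cpp.

```cpp
#include <string>
#include <vector>

// Split one "perf stat -x," line on ',' while keeping empty fields, so
// "a,,b" yields {"a", "", "b"} and "a,b," yields {"a", "b", ""}.
std::vector<std::string> splitPerfLine(const std::string& line) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type comma = line.find(',', start);
    if (comma == std::string::npos) {
      // Last field: everything after the final comma (possibly empty).
      fields.push_back(line.substr(start));
      return fields;
    }
    fields.push_back(line.substr(start, comma - start));
    start = comma + 1;
  }
}
```

On the sample line from the log, this returns 6 fields with an empty second (unit) field.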



[jira] [Created] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} offer operation

2016-01-24 Thread Fan Du (JIRA)
Fan Du created MESOS-4492:
-

 Summary: Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} 
offer operation
 Key: MESOS-4492
 URL: https://issues.apache.org/jira/browse/MESOS-4492
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Fan Du
Assignee: Fan Du
Priority: Minor


This ticket aims to enable users and operators to inspect statistics for 
operations such as RESERVE, UNRESERVE, CREATE and DESTROY; the current 
implementation only supports LAUNCH.
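A minimal sketch of the bookkeeping this ticket asks for, assuming one counter per offer operation type that the master would increment when it applies an operation. {{OperationType}} and {{OperationMetrics}} are hypothetical names; an actual patch would more likely use the master's existing metrics facilities than a plain map.

```cpp
#include <map>

// Offer operations named in the ticket; LAUNCH is the one already counted.
enum class OperationType { LAUNCH, RESERVE, UNRESERVE, CREATE, DESTROY };

// Hypothetical per-operation counters, kept as plain integers here.
class OperationMetrics {
 public:
  // Would be called once for each operation the master applies.
  void record(OperationType type) { ++counts_[type]; }

  // Number of operations of the given type recorded so far.
  long count(OperationType type) const {
    const auto it = counts_.find(type);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::map<OperationType, long> counts_;
};
```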



[jira] [Updated] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-01-24 Thread Fan Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Du updated MESOS-4492:
--
Summary: Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer 
operation  (was: Add metrics for {RESERVE, UNRESERVE} and {CREATE DESTROY} 
offer operation)

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-19 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108147#comment-15108147
 ] 

Fan Du commented on MESOS-4389:
---

Based on the code, this is by design; it doesn't matter much for using it, though.
Just a random puzzle :)

> Master "roles" endpoint only shows active role
> --
>
> Key: MESOS-4389
> URL: https://issues.apache.org/jira/browse/MESOS-4389
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, master
>Reporter: Fan Du
>
> Register two slaves to master with role "busybox" and "ubuntu" respectively, 
> then running marthon with role "busybox", after this check master "roles" 
> endpoints, it can only get default and active role, could this be improved to 
> show all available roles for easily checking?
> {code}
> {
> "roles": [
> {
> "frameworks": [],
> "name": "*",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> },
> {
> "frameworks": [
> "2caebb14-161f-4941-b8ab-8990cef01ac0-"
> ],
> "name": "busybox",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> }
> ]
> }
> {code}



[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-19 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108142#comment-15108142
 ] 

Fan Du commented on MESOS-4339:
---

[~adam-mesos] and [~bbannier]
Based on the proposal document from MESOS-4284, it is well justified to enable a 
weighted DRF framework sorter in a multi-role scenario, to keep allocation 
decisions fair across roles and frameworks. Although supporting a weighted DRF 
framework sorter is, in its design logic, independent of multi-role frameworks 
(which I had incompletely assumed before), in implementation it clearly needs to 
be done *AFTER* multi-role frameworks.

So, if you don't mind, I would still like to contribute this ticket to the 
multi-role frameworks effort.
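A minimal sketch of the weighted DRF ordering discussed here, assuming each framework in a role carries a weight (default 1.0) and the sorter orders frameworks by dominant share divided by weight, the usual weighted-DRF rule. {{Framework}} and {{weightedDrfOrder}} are hypothetical; this is not the Mesos sorter API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical framework entry inside one role's sorter.
struct Framework {
  std::string id;
  double dominantShare;  // max over resources of allocated / total
  double weight;         // proposed per-framework weight (default 1.0)
};

// Weighted DRF: lowest dominantShare / weight is offered resources first,
// so a framework with weight 2 is treated as if its share were halved.
std::vector<std::string> weightedDrfOrder(std::vector<Framework> frameworks) {
  std::stable_sort(frameworks.begin(), frameworks.end(),
                   [](const Framework& a, const Framework& b) {
                     return a.dominantShare / a.weight <
                            b.dominantShare / b.weight;
                   });
  std::vector<std::string> order;
  for (const Framework& framework : frameworks) {
    order.push_back(framework.id);
  }
  return order;
}
```

For example, dominant shares 0.4, 0.6 and 0.2 with weights 1, 2 and 1 give weighted shares 0.4, 0.3 and 0.2, so the third framework comes first and the first comes last. A stable sort keeps registration order among frameworks with equal weighted shares.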

> Add weight support for framework sorter
> ---
>
> Key: MESOS-4339
> URL: https://issues.apache.org/jira/browse/MESOS-4339
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Fan Du
>Assignee: Fan Du
>
> Current framework sorter doesn't take into account of weights when sorting 
> framework belonging to a particular role, i.e., all frameworks has equal 
> weights as 1. Considering the role weight is controlled by the operator, 
> enable the framework weight does not impact the role level allocation 
> decision from any greedy frameworks, but it will be beneficial to some 
> framework who could get more resources within a specific role.
> The framework weight will come from message FrameworkInfo when it got 
> registered, and FrameworkSorters will "add" framework with weight,
> this will eventually result a weighted framework sorting flow when master 
> make the finally allocation decision.
> Please review this ticket which I will work on if it's considered acceptable.
> Thanks a lot.



[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-15 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101514#comment-15101514
 ] 

Fan Du commented on MESOS-4389:
---

Thanks for pointing out implicit roles; I will give it a try.
The two slaves are configured with default roles (busybox and ubuntu, 
respectively), and the master does not set any {{roles}} on the command line. I 
realized that if it did, those roles would be on the whitelist,
which means they would show up when querying the roles endpoint.

> Master "roles" endpoint only shows active role
> --
>
> Key: MESOS-4389
> URL: https://issues.apache.org/jira/browse/MESOS-4389
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, master
>Reporter: Fan Du
>
> Register two slaves to master with role "busybox" and "ubuntu" respectively, 
> then running marthon with role "busybox", after this check master "roles" 
> endpoints, it can only get default and active role, could this be improved to 
> show all available roles for easily checking?
> {code}
> {
> "roles": [
> {
> "frameworks": [],
> "name": "*",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> },
> {
> "frameworks": [
> "2caebb14-161f-4941-b8ab-8990cef01ac0-"
> ],
> "name": "busybox",
> "resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
> },
> "weight": 1.0
> }
> ]
> }
> {code}



[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-14 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101267#comment-15101267
 ] 

Fan Du commented on MESOS-4339:
---

Thanks for your kind reminder, I got it :)

> Add weight support for framework sorter
> ---
>
> Key: MESOS-4339
> URL: https://issues.apache.org/jira/browse/MESOS-4339
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Fan Du
>Assignee: Fan Du
>
> Current framework sorter doesn't take into account of weights when sorting 
> framework belonging to a particular role, i.e., all frameworks has equal 
> weights as 1. Considering the role weight is controlled by the operator, 
> enable the framework weight does not impact the role level allocation 
> decision from any greedy frameworks, but it will be beneficial to some 
> framework who could get more resources within a specific role.
> The framework weight will come from message FrameworkInfo when it got 
> registered, and FrameworkSorters will "add" framework with weight,
> this will eventually result a weighted framework sorting flow when master 
> make the finally allocation decision.
> Please review this ticket which I will work on if it's considered acceptable.
> Thanks a lot.



[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-14 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101251#comment-15101251
 ] 

Fan Du commented on MESOS-4339:
---

You understood my intention clearly; thanks for the in-depth comments on the 
background.
As for why: the user scenario of letting frameworks be prioritized relative to 
each other within a role should be supported, following the same rationale as 
weighted roles. The veto is based on the deployment assumption that a role has 
exactly one framework attached to it; I'm not sure how this will change after 
MESOS-4284. Anyway, please add more comments.

Here are my early thoughts on what a weighted framework sorter should 
support/respect:
* Respect framework re-registration for weight updates
* An operator endpoint for dynamic reweighting (I didn't mention this in the 
  ticket's description, though I already had it in mind)
* In the presence of multi-role frameworks, a per-role weight makes more sense

[MESOS-4284|https://issues.apache.org/jira/browse/MESOS-4284] had a design 
proposal published yesterday, which I need to dive into first to understand 
[~bbannier]'s possible concerns.



> Add weight support for framework sorter
> ---
>
> Key: MESOS-4339
> URL: https://issues.apache.org/jira/browse/MESOS-4339
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Fan Du
>Assignee: Fan Du
>
> Current framework sorter doesn't take into account of weights when sorting 
> framework belonging to a particular role, i.e., all frameworks has equal 
> weights as 1. Considering the role weight is controlled by the operator, 
> enable the framework weight does not impact the role level allocation 
> decision from any greedy frameworks, but it will be beneficial to some 
> framework who could get more resources within a specific role.
> The framework weight will come from message FrameworkInfo when it got 
> registered, and FrameworkSorters will "add" framework with weight,
> this will eventually result a weighted framework sorting flow when master 
> make the finally allocation decision.
> Please review this ticket which I will work on if it's considered acceptable.
> Thanks a lot.



[jira] [Comment Edited] (MESOS-4339) Add weight support for framework sorter

2016-01-14 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101267#comment-15101267
 ] 

Fan Du edited comment on MESOS-4339 at 1/15/16 6:12 AM:


Thanks for your kind reminder, I got it :)
It seems I can't switch it back to OPEN...


was (Author: fan.du):
Thanks for your kind reminder, I got it :)



[jira] [Created] (MESOS-4389) Master "roles" endpoint only shows active role

2016-01-14 Thread Fan Du (JIRA)
Fan Du created MESOS-4389:
-

 Summary: Master "roles" endpoint only shows active role
 Key: MESOS-4389
 URL: https://issues.apache.org/jira/browse/MESOS-4389
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API, master
Reporter: Fan Du


Register two slaves with the master, with roles "busybox" and "ubuntu" 
respectively, then run Marathon with role "busybox". Checking the master 
"roles" endpoint afterwards shows only the default role and the active role. 
Could this be improved to show all available roles for easier checking?

{code}
{
  "roles": [
    {
      "frameworks": [],
      "name": "*",
      "resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "weight": 1.0
    },
    {
      "frameworks": [
        "2caebb14-161f-4941-b8ab-8990cef01ac0-"
      ],
      "name": "busybox",
      "resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "weight": 1.0
    }
  ]
}
{code}
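One way to get the behavior asked for here is for the endpoint to report the union of the default role, roles with active frameworks, and roles that agents' resources are reserved for. The sketch below assumes simplified, hypothetical inputs (an {{Agent}} struct and a per-role framework count); it is not how the master actually tracks roles.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical agent description; not a Mesos type.
struct Agent {
  std::string hostname;
  std::vector<std::string> roles;  // roles the agent's resources are reserved for
};

// Every known role: the default role "*", roles that currently have
// registered frameworks, and roles that appear only on agents.
std::set<std::string> allKnownRoles(
    const std::vector<Agent>& agents,
    const std::map<std::string, int>& activeRoles) {
  std::set<std::string> roles{"*"};
  for (const auto& [role, frameworks] : activeRoles) {
    assert(frameworks >= 0);
    if (frameworks > 0) roles.insert(role);
  }
  for (const Agent& agent : agents) {
    roles.insert(agent.roles.begin(), agent.roles.end());
  }
  return roles;
}
```

With the two agents from this report ("busybox" and "ubuntu") and Marathon registered as "busybox", it yields "*", "busybox" and "ubuntu".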





[jira] [Created] (MESOS-4339) Add weight support for framework sorter

2016-01-12 Thread Fan Du (JIRA)
Fan Du created MESOS-4339:
-

 Summary: Add weight support for framework sorter
 Key: MESOS-4339
 URL: https://issues.apache.org/jira/browse/MESOS-4339
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Fan Du


The current framework sorter doesn't take weights into account when sorting 
frameworks belonging to a particular role, i.e., all frameworks have an equal 
weight of 1. Since the role weight is controlled by the operator, enabling 
framework weights does not let greedy frameworks affect role-level allocation 
decisions, but it will benefit frameworks that should get more resources 
within a specific role.

The framework weight will come from the FrameworkInfo message when the 
framework registers, and the framework sorters will "add" each framework with 
its weight; this will eventually result in a weighted framework sorting flow 
when the master makes the final allocation decision.

Please review this ticket; I will work on it if it's considered acceptable.
Thanks a lot.





[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-12 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093592#comment-15093592
 ] 

Fan Du commented on MESOS-4339:
---

The role sorter uses weighted DRF; the framework sorter uses DRF without 
weights.

When a new framework with a role is added, both the role sorter and the 
framework sorter come into play
(I am not sure whether Mesos community courtesy allows pasting code snippets):

{code}
void HierarchicalAllocatorProcess::addFramework(
    const FrameworkID& frameworkId,
    const FrameworkInfo& frameworkInfo,
    const hashmap<SlaveID, Resources>& used)
{
  CHECK(initialized);

  const string& role = frameworkInfo.role();

  // If this is the first framework to register as this role,
  // initialize state as necessary.
  if (!activeRoles.contains(role)) {
    activeRoles[role] = 1;
    roleSorter->add(role, roleWeight(role));
    frameworkSorters[role] = frameworkSorterFactory();
  } else {
    activeRoles[role]++;
  }

  CHECK(!frameworkSorters[role]->contains(frameworkId.value()));
  frameworkSorters[role]->add(frameworkId.value());
  // ...
}
{code}
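A self-contained mock of the proposed change, for illustration only: every type here is a simplified stand-in, and the {{weight}} field on {{FrameworkInfo}} is hypothetical (it is what this ticket proposes). The flow mirrors the excerpt above, except that the framework is added to its role's sorter with its own weight rather than the implicit weight of 1.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Simplified stand-ins; NOT the real Mesos types.
struct FrameworkInfo {
  std::string role;
  double weight = 1.0;  // hypothetical field; 1.0 keeps today's behavior
};

struct Sorter {
  std::map<std::string, double> weights;
  void add(const std::string& name, double weight = 1.0) {
    assert(weight > 0.0);
    weights[name] = weight;
  }
  bool contains(const std::string& name) const {
    return weights.count(name) > 0;
  }
};

struct Allocator {
  std::map<std::string, int> activeRoles;     // frameworks per role
  std::map<std::string, double> roleWeights;  // operator-configured
  Sorter roleSorter;
  std::map<std::string, std::unique_ptr<Sorter>> frameworkSorters;

  double roleWeight(const std::string& role) const {
    auto it = roleWeights.find(role);
    return it == roleWeights.end() ? 1.0 : it->second;
  }

  void addFramework(const std::string& frameworkId,
                    const FrameworkInfo& info) {
    const std::string& role = info.role;
    // First framework in this role: register the role and create its sorter.
    if (activeRoles.count(role) == 0) {
      activeRoles[role] = 1;
      roleSorter.add(role, roleWeight(role));
      frameworkSorters[role] = std::make_unique<Sorter>();
    } else {
      activeRoles[role]++;
    }
    assert(!frameworkSorters[role]->contains(frameworkId));
    // Proposed change: pass the framework's own weight to its role's sorter.
    frameworkSorters[role]->add(frameworkId, info.weight);
  }
};
```

Role-level decisions are untouched in this sketch: {{roleSorter}} still only sees operator-configured role weights.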








[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-12 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093812#comment-15093812
 ] 

Fan Du commented on MESOS-4339:
---

bq. since all the weights inside a role are identical, right?

For the current implementation, yes.
Frameworks would behave just like weighted roles if we passed a weight when 
adding a new framework.

My understanding of the current allocation behavior is a triple iteration, 
as follows:

[HierarchicalAllocatorProcess::allocate|https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1254]
* For each slave in the slaves vector
** For each role, sorted by the role sorter using role weights
*** For each framework, sorted by the framework sorter with identical weights within the same role
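The triple iteration above can be sketched as three nested loops. This is a simplified, hypothetical model: dominant shares are taken as precomputed inputs, and the code only records the order in which frameworks on each slave would be visited, not resource accounting, filters or revocable offers. Giving a framework a weight other than 1.0 changes only the innermost ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A role or framework with a precomputed dominant share and a weight
// (hypothetical model; all frameworks have weight 1.0 today).
struct Client {
  std::string name;
  double share;
  double weight;
};

// Weighted DRF order: lowest share / weight first.
std::vector<Client> sortByWeightedShare(std::vector<Client> clients) {
  for (const Client& c : clients) assert(c.weight > 0.0);
  std::stable_sort(clients.begin(), clients.end(),
                   [](const Client& a, const Client& b) {
                     return a.share / a.weight < b.share / b.weight;
                   });
  return clients;
}

// Returns the (slave, framework) visiting order produced by the loops.
std::vector<std::pair<std::string, std::string>> visitOrder(
    const std::vector<std::string>& slaves,
    const std::vector<Client>& roles,
    const std::map<std::string, std::vector<Client>>& frameworksByRole) {
  std::vector<std::pair<std::string, std::string>> order;
  for (const std::string& slave : slaves) {                  // * each slave
    for (const Client& role : sortByWeightedShare(roles)) {  // ** each role
      for (const Client& framework :                         // *** each framework
           sortByWeightedShare(frameworksByRole.at(role.name))) {
        order.emplace_back(slave, framework.name);
      }
    }
  }
  return order;
}
```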

The intention of this ticket is to enable sorting frameworks by weight, i.e., 
in the last iteration. I think this is where we see things differently.
Please correct me if I missed something.

bq. Also, currently frameworks can only have a single role.

Yes, but only temporarily.
It will be changed by 
[MESOS-1763|https://issues.apache.org/jira/browse/MESOS-1763]







[jira] [Commented] (MESOS-4339) Add weight support for framework sorter

2016-01-12 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095460#comment-15095460
 ] 

Fan Du commented on MESOS-4339:
---

Of course, sorting by weight has been supported ever since the DRF sorter was 
created, but the framework sorter instances *NEVER* use it.

In addition, this ticket involves a minimal, clean change to the current 
design, whereas the modification in MESOS-4284 is quite invasive.

I don't see any obvious reason why this ticket should be postponed until 
MESOS-4284; the two are unrelated in high-level design and functionality. 
Could you elaborate on the reasoning behind your point of view?





[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)

2016-01-11 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093477#comment-15093477
 ] 

Fan Du commented on MESOS-3765:
---

Sure, will do.


> Make offer size adjustable (granularity)
> 
>
> Key: MESOS-3765
> URL: https://issues.apache.org/jira/browse/MESOS-3765
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it 
> always allocates the entire remaining agent resources to a single framework. 
> This may heavily impact allocation fairness in some cases, for example in the 
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators to explicitly specify 
> granularity via allocator flags. While this can be tricky for non-standard 
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.
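A minimal sketch of a per-resource granularity cap for {{cpus}} and {{mem}}, assuming a hypothetical operator setting where 0 means "no limit" so that the default stays coarse-grained. The names are illustrative and no existing allocator flag is implied.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical resource vector covering only cpus and mem.
struct Resources {
  double cpus = 0.0;
  double mem = 0.0;  // MB
};

// Caps each resource in an offer at the configured granularity.
// A granularity of 0 means "no limit", i.e. today's coarse-grained
// behavior of offering everything that remains on the agent.
Resources offerSize(const Resources& remaining, const Resources& granularity) {
  assert(remaining.cpus >= 0.0 && remaining.mem >= 0.0);
  Resources offer;
  offer.cpus = granularity.cpus > 0.0
                   ? std::min(remaining.cpus, granularity.cpus)
                   : remaining.cpus;
  offer.mem = granularity.mem > 0.0
                  ? std::min(remaining.mem, granularity.mem)
                  : remaining.mem;
  return offer;
}
```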





[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)

2016-01-11 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093455#comment-15093455
 ] 

Fan Du commented on MESOS-3765:
---

[~gyliu] The proposal document states "DRF will be disabled with Fine-Grained 
Resource Offers." I am wondering why fine-grained offers should bypass WDRF in 
practice.

In my understanding, implementing fine-grained offers fits well within the 
current WDRF logic, because of the current allocation behavior:
{code}
Foreach Slave
  Foreach Role
    Foreach Framework within the role
      compute agent resources of revocable case OR
      compute agent resources of non-revocable case  <- (*A)
      offer the agent resources to current framework <- (*B)
{code}

Each slave grants at most one allocation offer, to the first framework within 
a role, if there are no revocable frameworks;
each slave grants at most two allocation offers, for one non-revocable and one 
revocable framework.

If we apply granularity between (*A) and (*B), we could loop over the remaining 
frameworks, and the goal of spreading agent resources between frameworks would 
be achieved.
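A sketch of applying granularity between (*A) and (*B), under simplifying assumptions: only cpus are modeled, frameworks are already in sorter order, and each framework receives at most one chunk per pass. {{spreadAgent}} is a hypothetical helper, not part of the allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Instead of offering all remaining agent cpus to the first framework,
// give each framework (in sorter order) at most `granularity` cpus until
// the agent has nothing left.
std::vector<std::pair<std::string, double>> spreadAgent(
    double agentCpus, double granularity,
    const std::vector<std::string>& sortedFrameworks) {
  assert(granularity > 0.0);
  std::vector<std::pair<std::string, double>> offers;
  double remaining = agentCpus;  // what (*A) computed for this agent
  for (const std::string& framework : sortedFrameworks) {
    if (remaining <= 0.0) break;
    double chunk = std::min(remaining, granularity);
    offers.emplace_back(framework, chunk);  // (*B), one chunk per framework
    remaining -= chunk;
  }
  return offers;
}
```

For example, a 10-cpu agent with a granularity of 4 across four frameworks gives 4, 4 and 2 cpus to the first three, and the fourth gets nothing in this pass. If the frameworks run out before the agent does, the remainder stays unoffered in this pass; whether to loop again is left open.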






