[jira] [Commented] (MESOS-5700) Benchmark for Resource class (protobuf vs. C++)

2016-06-29 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356374#comment-15356374
 ] 

Klaus Ma commented on MESOS-5700:
-

Will add more benchmarks for {{operator+(...)}}; according to the output of 
{{callgrind}}, CopyFrom is the heavy operation.

> Benchmark for Resource class (protobuf vs. C++)
> ---
>
> Key: MESOS-5700
> URL: https://issues.apache.org/jira/browse/MESOS-5700
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> Add benchmark of Resource class for Allocation Performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4770) Investigate performance improvements for 'Resources' class.

2016-06-29 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356373#comment-15356373
 ] 

Klaus Ma commented on MESOS-4770:
-

Add some benchmarks for Resources operators; it'll help to evaluate the 
improvement.

> Investigate performance improvements for 'Resources' class.
> ---
>
> Key: MESOS-4770
> URL: https://issues.apache.org/jira/browse/MESOS-4770
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> Currently we have some performance issues when we have heavy usage of the 
> {{Resources}} class. Currently, we tend to work around these issues (e.g. 
> reduce the amount of Resources arithmetic operations in the caller code).
> The implementation of {{Resources}} currently consists of wrapping underlying 
> {{Resource}} protobuf objects and manipulating them. This is fairly expensive 
> compared to doing things more directly with C++ objects.
> This ticket is to explore the performance improvements of using C++ objects 
> more directly instead of working off of {{Resource}} objects.





[jira] [Created] (MESOS-5750) Implement GET_EXECUTORS Call in v1 master API.

2016-06-29 Thread haosdent (JIRA)
haosdent created MESOS-5750:
---

 Summary: Implement GET_EXECUTORS Call in v1 master API.
 Key: MESOS-5750
 URL: https://issues.apache.org/jira/browse/MESOS-5750
 Project: Mesos
  Issue Type: Task
Reporter: haosdent
Assignee: Abhishek Dasgupta
 Fix For: 1.0.0








[jira] [Assigned] (MESOS-5750) Implement GET_EXECUTORS Call in v1 master API.

2016-06-29 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-5750:
---

Assignee: haosdent  (was: Abhishek Dasgupta)

> Implement GET_EXECUTORS Call in v1 master API.
> --
>
> Key: MESOS-5750
> URL: https://issues.apache.org/jira/browse/MESOS-5750
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: haosdent
> Fix For: 1.0.0
>
>






[jira] [Commented] (MESOS-5749) Have maven run in batch mode

2016-06-29 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356339#comment-15356339
 ] 

Charles Allen commented on MESOS-5749:
--

patch incoming through reviewboard shortly

> Have maven run in batch mode
> 
>
> Key: MESOS-5749
> URL: https://issues.apache.org/jira/browse/MESOS-5749
> Project: Mesos
>  Issue Type: Improvement
>  Components: java api
>Reporter: Charles Allen
>Priority: Minor
>
> Currently when the Makefile invokes maven, it does not use the -B flag. This 
> ask is to have maven use the -B flag to make it friendly for automated build 
> scripts.





[jira] [Commented] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-29 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356334#comment-15356334
 ] 

Till Toenshoff commented on MESOS-5724:
---

Sorry, forgot to set it "in progress". 

> SSL certificate validation should allow IP only verification.
> -
>
> Key: MESOS-5724
> URL: https://issues.apache.org/jira/browse/MESOS-5724
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.23.0, 0.23.1, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 
> 0.26.0, 0.26.1, 0.27.0, 0.27.1, 0.27.2, 0.27.3, 0.28.0, 0.28.1, 0.28.2, 1.0.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Blocker
>  Labels: libprocess, mesosphere, security, ssl
> Fix For: 1.0.0
>
>
> Our SSL certificate validation currently assumes that the host (on connect 
> and on accept) does have a valid hostname. This however is not true for all  
> environments.
> {{process::network::openssl::verify}} currently only allows the validation of 
> a certificate against a hostname. 
> See 
> https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546
> RFC2818 however says that it should be perfectly valid to validate a 
> certificate  based on the IP address.
> See https://tools.ietf.org/html/rfc2818
> {noformat}
> In some cases, the URI is specified as an IP address rather than a
> hostname. In this case, the iPAddress subjectAltName must be present
> in the certificate and must exactly match the IP in the URI.
> {noformat}
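The RFC 2818 rule quoted above can be sketched as follows. This is a minimal illustration in Python, not the libprocess code. The function name `ip_matches_san` is hypothetical, and the `(kind, value)` tuple layout is an assumption borrowed from what Python's `ssl.SSLSocket.getpeercert()` returns under `subjectAltName`.

```python
import ipaddress


def ip_matches_san(peer_ip, subject_alt_names):
    """Return True if peer_ip exactly matches an iPAddress subjectAltName.

    subject_alt_names is an iterable of (kind, value) pairs, e.g.
    (("DNS", "host.example.org"), ("IP Address", "10.0.0.7")).
    """
    target = ipaddress.ip_address(peer_ip)
    for kind, value in subject_alt_names:
        if kind != "IP Address":
            continue  # only iPAddress entries count for IP-based checks
        try:
            if ipaddress.ip_address(value) == target:
                return True
        except ValueError:
            continue  # skip malformed entries instead of failing
    return False


# Made-up certificate data for illustration.
sans = (("DNS", "agent.example.org"), ("IP Address", "10.0.0.7"))
print(ip_matches_san("10.0.0.7", sans))  # True
print(ip_matches_san("10.0.0.8", sans))  # False
```

Parsing both sides with `ipaddress` compares addresses by value, so different textual spellings of the same IPv6 address still match.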





[jira] [Commented] (MESOS-5735) Update WebUI to use v1 operator API

2016-06-29 Thread zhou xing (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356290#comment-15356290
 ] 

zhou xing commented on MESOS-5735:
--

[~vinodkone]
• WebUI may connect to a follower master and get /state from that master. In 
the code, it uses {{state.leader == state.pid}} to judge whether it is 
connected to the leading master; if not, it redirects. 
• For {{log_dir}} and {{external_log_dir}}, I think we can also get them from 
GetFlags?  
• If we want to include only the mutable stuff in GetState, I think clients 
like WebUI can be changed to call GetVersion/GetFlags/GetMaster respectively. 
Considering that WebUI periodically calls the /state endpoint, I tend to think 
that keeping GetState simple would be better.
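The leader check from the first bullet can be sketched as below. This is a Python illustration rather than the WebUI's JavaScript. The helper name `is_leader` and the sample pids are made up; only the `leader` and `pid` field names come from the comment.

```python
def is_leader(state):
    """Return True if the master that served this /state reply is the leader."""
    leader = state.get("leader")
    # No known leader (e.g. during an election) means we cannot be on it.
    return leader is not None and leader == state.get("pid")


# Hypothetical /state excerpts.
on_leader = {"pid": "master@10.0.0.1:5050", "leader": "master@10.0.0.1:5050"}
on_follower = {"pid": "master@10.0.0.2:5050", "leader": "master@10.0.0.1:5050"}

print(is_leader(on_leader))    # True
print(is_leader(on_follower))  # False
```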

> Update WebUI to use v1 operator API
> ---
>
> Key: MESOS-5735
> URL: https://issues.apache.org/jira/browse/MESOS-5735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: zhou xing
>
> Having the WebUI use the v1 API would be a good validation of its usefulness 
> and correctness.





[jira] [Comment Edited] (MESOS-5748) Potential segfault in `link` and `send` when linking to a remote process

2016-06-29 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356216#comment-15356216
 ] 

Joseph Wu edited comment on MESOS-5748 at 6/30/16 1:11 AM:
---

Backport-able fix: https://reviews.apache.org/r/49416/
Complete fix + cleanup: https://reviews.apache.org/r/49404/


was (Author: kaysoky):
A fix: https://reviews.apache.org/r/49404/

> Potential segfault in `link` and `send` when linking to a remote process
> 
>
> Key: MESOS-5748
> URL: https://issues.apache.org/jira/browse/MESOS-5748
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere
> Fix For: 1.0.0
>
>
> There is a race in the SocketManager, between a remote {{link}} and 
> disconnection of the underlying socket.
> We potentially segfault here: 
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512
> {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
> object.  However, the code above this line actually has ownership of the 
> pointer:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499
> If the socket dies during the link, the {{ignore_recv_data}} may delete the 
> Socket underneath {{link}}:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
> 
> The same race exists for {{send}}.
> This race was discovered while running a new test in repetition:
> https://reviews.apache.org/r/49175/
> On OSX, I hit the race consistently every 500-800 repetitions:
> {code}
> 3rdparty/libprocess/libprocess-tests 
> --gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
> --gtest_repeat=1000
> {code}
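The shape of this race can be illustrated with a small Python analogue. It is a sketch only: `FakeSocket` and this `SocketManager` are hypothetical stand-ins, not the libprocess classes, and the locking shown is not claimed to be what the linked reviews implement. The point is that the lookup and the use of the socket happen under one lock, so a concurrent close cannot invalidate the socket in between.

```python
import threading


class FakeSocket:
    """Stand-in for a socket; using it after close is an error."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def send(self, data):
        if self.closed:
            raise RuntimeError("socket used after close")
        self.sent.append(data)


class SocketManager:
    """Toy manager: every access to the socket table holds one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sockets = {}

    def add(self, key, sock):
        with self._lock:
            self._sockets[key] = sock

    def close(self, key):
        # Removal and invalidation happen atomically w.r.t. send().
        with self._lock:
            sock = self._sockets.pop(key, None)
            if sock is not None:
                sock.closed = True

    def send(self, key, data):
        # Look up AND use the socket under the same lock, so close()
        # cannot run between the lookup and the send.
        with self._lock:
            sock = self._sockets.get(key)
            if sock is None:
                return False
            sock.send(data)
            return True
```

A caller that gets `False` back knows the peer is already gone and can treat the link as exited instead of touching a dead socket.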





[jira] [Commented] (MESOS-5737) Expose Executor PID in containers endpoint

2016-06-29 Thread Haris Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356278#comment-15356278
 ] 

Haris Choudhary commented on MESOS-5737:


https://reviews.apache.org/r/49414/

> Expose Executor PID in containers endpoint
> --
>
> Key: MESOS-5737
> URL: https://issues.apache.org/jira/browse/MESOS-5737
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Haris Choudhary
>Assignee: Haris Choudhary
>Priority: Minor
>
> In order to greatly simplify the implementation for the redesigned Mesos 
> CLI's container plugin, we need the executor PID (Process ID) to be exposed 
> in the /containers endpoint. [Mesos CLI Epic | 
> https://issues.apache.org/jira/browse/MESOS-5676]
> This change will introduce the pid for an executor if it was launched by the 
> mesos containerizer.
> We need this PID for setns() calls to enter the container namespace for 
> commands such as container execute





[jira] [Comment Edited] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356273#comment-15356273
 ] 

Sunzhe edited comment on MESOS-5742 at 6/30/16 1:07 AM:


Yes, I see. Because I noticed many people have implemented GPU resources 
without what you have done. They did it only with {{--resources=}} and other 
flags when starting agents. I wonder whether it can influence the existing 
mechanism.

As you said, the existing mechanism needs the 
{{\-\-isolation=gpu/nvidia}} flag to be set, and then {{--resources=}} cannot 
be set to a fractional value. Yes, you are right.

Thank you for your reply.


was (Author: sunzhe):
Yes, I see. Because I noticed many people have implemented GPU resources 
without what you have done. They did it only with {{--resources=}} and others 
flags when starting agents. I wonder whether it can influence the existing 
mechanism.

As you said, the existing mechanism need to set the 
{{\-\-isolation=gpu/nvidia}} flag, so the {{--resources=}} is limited to set 
fraction. Yes. you are right.

Thank you for your reply.

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Comment Edited] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356273#comment-15356273
 ] 

Sunzhe edited comment on MESOS-5742 at 6/30/16 1:06 AM:


Yes, I see. Because I noticed many people have implemented GPU resources 
without what you have done. They did it only with {{--resources=}} and others 
flags when starting agents. I wonder whether it can influence the existing 
mechanism.

As you said, the existing mechanism need to set the 
{{\-\-isolation=gpu/nvidia}} flag, so the {{--resources=}} is limited to set 
fraction. Yes. you are right.

Thank you for your reply.


was (Author: sunzhe):
Yes, I see. Because I noticed many people have implemented GPU resources 
without what you have done. They did it only with {{--resources=}} and others 
flags when starting agents. I wonder whether it can influence the existing 
mechanism.

As you said, the existing mechanism need to set the {{--isolation=gpu/nvidia}} 
flag, so the {{--resources=}} is limited to set fraction. Yes. you are right.

Thank you for your reply.

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Commented] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356273#comment-15356273
 ] 

Sunzhe commented on MESOS-5742:
---

Yes, I see. Because I noticed many people have implemented GPU resources 
without what you have done. They did it only with {{--resources=}} and others 
flags when starting agents. I wonder whether it can influence the existing 
mechanism.

As you said, the existing mechanism need to set the {{--isolation=gpu/nvidia}} 
flag, so the {{--resources=}} is limited to set fraction. Yes. you are right.

Thank you for your reply.

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Commented] (MESOS-5735) Update WebUI to use v1 operator API

2016-06-29 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356243#comment-15356243
 ] 

Vinod Kone commented on MESOS-5735:
---

Thanks for the investigation [~dongdong]!

1. I think state.leader and state.pid are not relevant because /state sends a 
redirect if it's not the leader. `start_time` and `elected_time` can be gotten 
from GetMaster. `version` and `build_time` can be gotten from GetVersion. 
`clustername` can be gotten from GetFlags. Another option would be to include 
GetMaster, GetVersion and GetFlags in GetState.

2. Yes, I'll send a review for GetState on agent as well.

5. /containers or GetContainers should have all the information included in 
/monitor/statistics.
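The field-to-call mapping from point 1 can be written down as a small lookup table. This is a Python sketch for illustration only; `FIELD_SOURCES` and `calls_needed` are hypothetical names, and the table contains only the fields and calls named in the comment.

```python
# Which v1 operator call supplies each /state field, per the comment above.
FIELD_SOURCES = {
    "start_time": "GetMaster",
    "elected_time": "GetMaster",
    "version": "GetVersion",
    "build_time": "GetVersion",
    "clustername": "GetFlags",
}


def calls_needed(fields):
    """Return the sorted, de-duplicated calls required for the given fields."""
    return sorted({FIELD_SOURCES[field] for field in fields})


print(calls_needed(["version", "start_time", "build_time"]))
# ['GetMaster', 'GetVersion']
```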

> Update WebUI to use v1 operator API
> ---
>
> Key: MESOS-5735
> URL: https://issues.apache.org/jira/browse/MESOS-5735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: zhou xing
>
> Having the WebUI use the v1 API would be a good validation of its usefulness 
> and correctness.





[jira] [Comment Edited] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356238#comment-15356238
 ] 

Kevin Klues edited comment on MESOS-5742 at 6/30/16 12:20 AM:
--

If you don't also set the {{\-\-isolation=gpu/nvidia}} flag, then you can set 
{{\-\-resources="gpus:   "}} to whatever you want. The idea is that you only 
want the semantics enforced by the {{gpu/nvidia}} isolator (i.e. that only 
*non-fractional* GPUs can be specified) when the {{gpu/nvidia}} isolator is 
enabled.  Some other isolator might be built in the future that allows 
fractional GPUs.


was (Author: klueska):
If you don't also set the {{--isolators=gpu/nvidia}} flag, then you can set 
{{--resources="gpus:   "}} to whatever you want. The idea being that you only 
want the semantics enforced by the {{gpu/nvidia}} isolator (i.e. that only 
*non-fractional* GPUs can be specified) when the {{gpu/nvidia}} isolator is 
enabled.  Some other isolator might be built in the future that allows 
fractional GPUs.

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Commented] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356238#comment-15356238
 ] 

Kevin Klues commented on MESOS-5742:


If you don't also set the {{--isolators=gpu/nvidia}} flag, then you can set 
{{--resources="gpus:   "}} to whatever you want. The idea being that you only 
want the semantics enforced by the {{gpu/nvidia}} isolator (i.e. that only 
*non-fractional* GPUs can be specified) when the {{gpu/nvidia}} isolator is 
enabled.  Some other isolator might be built in the future that allows 
fractional GPUs.
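The rule described in this comment can be sketched as a small check. This is a Python illustration under stated assumptions, not the Mesos agent's validation code: `validate_gpus` is a hypothetical helper, and the isolator list is modeled as a plain set of names.

```python
def validate_gpus(gpus, isolators):
    """Reject fractional GPU amounts only when gpu/nvidia is enabled.

    gpus: the scalar amount requested for the GPU resource.
    isolators: set of enabled isolator names.
    """
    amount = float(gpus)
    if "gpu/nvidia" in isolators and not amount.is_integer():
        raise ValueError(
            "gpu/nvidia isolator requires a whole number of GPUs, got %r" % gpus
        )
    return amount


# Without the isolator, any value is accepted as-is.
print(validate_gpus(1.2, set()))           # 1.2
# With the isolator, whole numbers pass and fractions raise ValueError.
print(validate_gpus(2, {"gpu/nvidia"}))    # 2.0
```

Keeping the check conditional on the isolator leaves room for a future isolator that does support fractional GPUs, as the comment suggests.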

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Commented] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356217#comment-15356217
 ] 

Sunzhe commented on MESOS-5742:
---

cc [~klueska] [~bmahler]
Is that right with {{--resources='gpu:1.2'}}?

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Assigned] (MESOS-5748) Potential segfault in `link` and `send` when linking to a remote process

2016-06-29 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-5748:


Assignee: Joseph Wu

> Potential segfault in `link` and `send` when linking to a remote process
> 
>
> Key: MESOS-5748
> URL: https://issues.apache.org/jira/browse/MESOS-5748
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere
> Fix For: 1.0.0
>
>
> There is a race in the SocketManager, between a remote {{link}} and 
> disconnection of the underlying socket.
> We potentially segfault here: 
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512
> {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
> object.  However, the code above this line actually has ownership of the 
> pointer:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499
> If the socket dies during the link, the {{ignore_recv_data}} may delete the 
> Socket underneath {{link}}:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
> 
> The same race exists for {{send}}.
> This race was discovered while running a new test in repetition:
> https://reviews.apache.org/r/49175/
> On OSX, I hit the race consistently every 500-800 repetitions:
> {code}
> 3rdparty/libprocess/libprocess-tests 
> --gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
> --gtest_repeat=1000
> {code}





[jira] [Updated] (MESOS-5748) Potential segfault in `link` and `send` when linking to a remote process

2016-06-29 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5748:
-
Description: 
There is a race in the SocketManager, between a remote {{link}} and 
disconnection of the underlying socket.

We potentially segfault here: 
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
object.  However, the code above this line actually has ownership of the 
pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, the {{ignore_recv_data}} may delete the 
Socket underneath {{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411


The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests 
--gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
--gtest_repeat=1000
{code}

  was:
There is a race the SocketManager, between a remote {{link}} and disconnection 
of the underlying socket.

We potentially segfault here: 
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
object.  However, the code above this line actually has ownership of the 
pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, the {{ignore_recv_data}} may delete the 
Socket underneath {{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411


The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests 
--gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
--gtest_repeat=1000
{code}


> Potential segfault in `link` and `send` when linking to a remote process
> 
>
> Key: MESOS-5748
> URL: https://issues.apache.org/jira/browse/MESOS-5748
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Joseph Wu
>  Labels: libprocess, mesosphere
> Fix For: 1.0.0
>
>
> There is a race in the SocketManager, between a remote {{link}} and 
> disconnection of the underlying socket.
> We potentially segfault here: 
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512
> {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
> object.  However, the code above this line actually has ownership of the 
> pointer:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499
> If the socket dies during the link, the {{ignore_recv_data}} may delete the 
> Socket underneath {{link}}:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
> 
> The same race exists for {{send}}.
> This race was discovered while running a new test in repetition:
> https://reviews.apache.org/r/49175/
> On OSX, I hit the race consistently every 500-800 repetitions:
> {code}
> 3rdparty/libprocess/libprocess-tests 
> --gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
> --gtest_repeat=1000
> {code}





[jira] [Created] (MESOS-5748) Potential segfault in `link` and `send` when linking to a remote process

2016-06-29 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-5748:


 Summary: Potential segfault in `link` and `send` when linking to a 
remote process
 Key: MESOS-5748
 URL: https://issues.apache.org/jira/browse/MESOS-5748
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0, 0.23.0, 0.22.0
Reporter: Joseph Wu
 Fix For: 1.0.0


There is a race in the SocketManager, between a remote {{link}} and 
disconnection of the underlying socket.

We potentially segfault here: 
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
object.  However, the code above this line actually has ownership of the 
pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, the {{ignore_recv_data}} may delete the 
Socket underneath {{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411


The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests 
--gtest_filter="ProcessRemoteLinkTest.RemoteLink"  --gtest_break_on_failure 
--gtest_repeat=1000
{code}





[jira] [Commented] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-29 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356191#comment-15356191
 ] 

Vinod Kone commented on MESOS-5724:
---

Is this in progress? Marked as blocker for 1.0 but I don't see updates.

> SSL certificate validation should allow IP only verification.
> -
>
> Key: MESOS-5724
> URL: https://issues.apache.org/jira/browse/MESOS-5724
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.23.0, 0.23.1, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 
> 0.26.0, 0.26.1, 0.27.0, 0.27.1, 0.27.2, 0.27.3, 0.28.0, 0.28.1, 0.28.2, 1.0.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Blocker
>  Labels: libprocess, mesosphere, security, ssl
> Fix For: 1.0.0
>
>
> Our SSL certificate validation currently assumes that the host (on connect 
> and on accept) does have a valid hostname. This however is not true for all  
> environments.
> {{process::network::openssl::verify}} currently only allows the validation of 
> a certificate against a hostname. 
> See 
> https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546
> RFC2818 however says that it should be perfectly valid to validate a 
> certificate  based on the IP address.
> See https://tools.ietf.org/html/rfc2818
> {noformat}
> In some cases, the URI is specified as an IP address rather than a
> hostname. In this case, the iPAddress subjectAltName must be present
> in the certificate and must exactly match the IP in the URI.
> {noformat}





[jira] [Created] (MESOS-5747) DiskResource/PersistentVolumeTest.SlaveRecovery/0 is flaky

2016-06-29 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-5747:
-

 Summary: DiskResource/PersistentVolumeTest.SlaveRecovery/0 is flaky
 Key: MESOS-5747
 URL: https://issues.apache.org/jira/browse/MESOS-5747
 Project: Mesos
  Issue Type: Bug
 Environment: Ubuntu 16
Reporter: Anand Mazumdar


Showed up on Internal CI:
{code}
[17:07:42] : [Step 10/10] [ RUN  ] 
DiskResource/PersistentVolumeTest.SlaveRecovery/0
[17:07:42]W: [Step 10/10] I0629 17:07:42.826249  3720 cluster.cpp:155] 
Creating default 'local' authorizer
[17:07:42]W: [Step 10/10] I0629 17:07:42.833847  3720 leveldb.cpp:174] 
Opened db in 7.482452ms
[17:07:42]W: [Step 10/10] I0629 17:07:42.834661  3720 leveldb.cpp:181] 
Compacted db in 794368ns
[17:07:42]W: [Step 10/10] I0629 17:07:42.834678  3720 leveldb.cpp:196] 
Created db iterator in 3611ns
[17:07:42]W: [Step 10/10] I0629 17:07:42.834684  3720 leveldb.cpp:202] 
Seeked to beginning of db in 556ns
[17:07:42]W: [Step 10/10] I0629 17:07:42.834689  3720 leveldb.cpp:271] 
Iterated through 0 keys in the db in 345ns
[17:07:42]W: [Step 10/10] I0629 17:07:42.834702  3720 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[17:07:42]W: [Step 10/10] I0629 17:07:42.834889  3739 recover.cpp:451] 
Starting replica recovery
[17:07:42]W: [Step 10/10] I0629 17:07:42.835039  3739 recover.cpp:477] 
Replica is in EMPTY status
[17:07:42]W: [Step 10/10] I0629 17:07:42.835422  3738 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(22871)@172.30.2.145:46175
[17:07:42]W: [Step 10/10] I0629 17:07:42.835624  3735 recover.cpp:197] 
Received a recover response from a replica in EMPTY status
[17:07:42]W: [Step 10/10] I0629 17:07:42.835758  3740 recover.cpp:568] 
Updating replica status to STARTING
[17:07:42]W: [Step 10/10] I0629 17:07:42.835814  3740 master.cpp:382] 
Master 59345281-cdcf-49ad-a546-9e2584432372 (ip-172-30-2-145.mesosphere.io) 
started on 172.30.2.145:46175
[17:07:42]W: [Step 10/10] I0629 17:07:42.835829  3740 master.cpp:384] Flags 
at startup: --acls="" --agent_ping_timeout="15secs" 
--agent_reregister_timeout="10mins" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate_agents="true" 
--authenticate_frameworks="true" --authenticate_http="true" 
--authenticate_http_frameworks="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/nsSI5J/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/nsSI5J/master" 
--zk_session_timeout="10secs"
[17:07:42]W: [Step 10/10] I0629 17:07:42.835991  3740 master.cpp:434] 
Master only allowing authenticated frameworks to register
[17:07:42]W: [Step 10/10] I0629 17:07:42.835999  3740 master.cpp:448] 
Master only allowing authenticated agents to register
[17:07:42]W: [Step 10/10] I0629 17:07:42.836004  3740 master.cpp:461] 
Master only allowing authenticated HTTP frameworks to register
[17:07:42]W: [Step 10/10] I0629 17:07:42.836007  3740 credentials.hpp:37] 
Loading credentials for authentication from '/tmp/nsSI5J/credentials'
[17:07:42]W: [Step 10/10] I0629 17:07:42.836081  3740 master.cpp:506] Using 
default 'crammd5' authenticator
[17:07:42]W: [Step 10/10] I0629 17:07:42.836120  3740 master.cpp:578] Using 
default 'basic' HTTP authenticator
[17:07:42]W: [Step 10/10] I0629 17:07:42.836174  3740 master.cpp:658] Using 
default 'basic' HTTP framework authenticator
[17:07:42]W: [Step 10/10] I0629 17:07:42.836231  3740 master.cpp:705] 
Authorization enabled
[17:07:42]W: [Step 10/10] I0629 17:07:42.836277  3739 
whitelist_watcher.cpp:77] No whitelist given
[17:07:42]W: [Step 10/10] I0629 17:07:42.836329  3736 hierarchical.cpp:142] 
Initialized hierarchical allocator process
[17:07:42]W: [Step 10/10] I0629 17:07:42.836783  3740 master.cpp:1971] The 
newly elected leader is master@172.30.2.145:46175 with id 
59345281-cdcf-49ad-a546-9e2584432372
[17:07:42]W: [Step 10/10] I0629 17:07:42.836797  3740 master.cpp:1984] 
Elected as the leading master!
[17:07:42]W: [Step 10/10] I0629 17:07:42.836803  3740 master.cpp:1671] 
Recovering from registrar
[17:07:42]W: [Step 10/10] I0629 17:07:42.836865  3741 registrar.cpp:332] 
Recovering registrar
[17:07:42]W: 

[jira] [Commented] (MESOS-5746) Sandbox links are broken in authorized cluster

2016-06-29 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355984#comment-15355984
 ] 

Greg Mann commented on MESOS-5746:
--

My apologies - this was just due to an incorrect ACL configuration, as you can 
see from the scripts above :-)
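
The comment does not say which part of the ACL configuration was incorrect. As a generic illustration of how a mistyped ACL key could be caught before starting a process, here is a minimal sketch. It assumes that the key names used in the master script quoted below are the intended ones; `unknownKeys` and `kMasterScriptKeys` are hypothetical names, not part of Mesos.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the keys that are not in `known`, preserving input order.
std::vector<std::string> unknownKeys(
    const std::vector<std::string>& keys,
    const std::set<std::string>& known) {
  std::vector<std::string> result;
  for (const std::string& key : keys) {
    if (known.count(key) == 0) {
      result.push_back(key);
    }
  }
  return result;
}

// Key names copied from the master script quoted in this ticket. Treating
// them as the complete set of valid names is an assumption of this sketch.
const std::set<std::string> kMasterScriptKeys = {
  "permissive", "access_mesos_logs", "register_frameworks", "run_tasks",
  "get_endpoints", "view_frameworks", "view_tasks", "view_executors",
  "access_sandboxes", "get_quotas"};
```

Passing the top-level keys of the agent's ACL file (`permissive`, `access_mesos_log`) to `unknownKeys` together with `kMasterScriptKeys` would report `access_mesos_log` as not matching any key used by the master.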

> Sandbox links are broken in authorized cluster
> --
>
> Key: MESOS-5746
> URL: https://issues.apache.org/jira/browse/MESOS-5746
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Greg Mann
>  Labels: authorization, mesosphere, security
> Attachments: Screen Shot 2016-06-29 at 12.28.49 PM.png
>
>
> I ran Mesos master with this script:
> {code}
> #! /usr/bin/env bash
> rm -rf /tmp/mesos/*
> cat <<EOF > /tmp/credentials.txt
> foo bar
> baz bar
> EOF
> cat <<EOF > /tmp/acls.json
> {
>   "permissive": false,
>   "access_mesos_logs" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ],
>   "register_frameworks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "roles" : { "type" : "ANY" }
> }
>   ],
>   "run_tasks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "get_endpoints" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "paths" : { "type" : "ANY" }
> }
>   ],
>   "view_frameworks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "view_tasks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "view_executors" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "access_sandboxes" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "access_mesos_logs" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ],
>   "get_quotas" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "roles" : { "type" : "ANY" }
> }
>   ]
> }
> EOF
> export GLOG_v=2
> export MESOS_VERBOSE=1
> ./bin/mesos-master.sh --work_dir=/tmp/mesos/master \
>   --authenticate_http \
>   --credentials=file:///tmp/credentials.txt \
>   --acls=file:///tmp/acls.json \
>   --log_dir=/tmp/mesos/logs/master
> {code}
> and ran the agent with this script:
> {code}
> #! /usr/bin/env bash
> cat <<EOF > /tmp/credentials.txt
> foo bar
> baz bar
> EOF
> cat <<EOF > /tmp/acls.json
> {
>   "permissive": false,
>   "access_mesos_log" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ]
> }
> EOF
> export GLOG_v=2
> export MESOS_VERBOSE=1
> ./bin/mesos-slave.sh --work_dir=/tmp/mesos/agent \
>  --master=127.0.0.1:5050 \
>  --authenticate_http \
>  --http_credentials=file:///tmp/credentials.txt \
>  --acls=file:///tmp/acls.json \
>  --log_dir=/tmp/mesos/logs/agent
> {code}
> And then ran the long-lived framework with {{src/long-lived-framework 
> --master=127.0.0.1:5050 --principal=foo --secret=bar}}. When attempting to 
> click on "Sandbox" links in the Mesos web UI, I see the error {{Framework 
> with ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-' does not exist on agent 
> with ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-S0'.}} (screenshot attached). 
> Looking at Chrome devtools, I don't see any non-200 return codes in HTTP 
> responses. Each click on "Sandbox" produces a single request to the agent's 
> {{/state}} endpoint, which returns 200 OK.
> I verified that the sandbox links work as expected when authorization is not 
> enabled.





[jira] [Updated] (MESOS-5746) Sandbox links are broken in authorized cluster

2016-06-29 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5746:
-
Attachment: Screen Shot 2016-06-29 at 12.28.49 PM.png

> Sandbox links are broken in authorized cluster
> --
>
> Key: MESOS-5746
> URL: https://issues.apache.org/jira/browse/MESOS-5746
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Greg Mann
>  Labels: authorization, mesosphere, security
> Attachments: Screen Shot 2016-06-29 at 12.28.49 PM.png
>
>
> I ran Mesos master with this script:
> {code}
> #! /usr/bin/env bash
> rm -rf /tmp/mesos/*
> cat <<EOF > /tmp/credentials.txt
> foo bar
> baz bar
> EOF
> cat <<EOF > /tmp/acls.json
> {
>   "permissive": false,
>   "access_mesos_logs" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ],
>   "register_frameworks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "roles" : { "type" : "ANY" }
> }
>   ],
>   "run_tasks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "get_endpoints" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "paths" : { "type" : "ANY" }
> }
>   ],
>   "view_frameworks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "view_tasks" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "view_executors" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "access_sandboxes" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "users" : { "type" : "ANY" }
> }
>   ],
>   "access_mesos_logs" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ],
>   "get_quotas" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "roles" : { "type" : "ANY" }
> }
>   ]
> }
> EOF
> export GLOG_v=2
> export MESOS_VERBOSE=1
> ./bin/mesos-master.sh --work_dir=/tmp/mesos/master \
>   --authenticate_http \
>   --credentials=file:///tmp/credentials.txt \
>   --acls=file:///tmp/acls.json \
>   --log_dir=/tmp/mesos/logs/master
> {code}
> and ran the agent with this script:
> {code}
> #! /usr/bin/env bash
> cat <<EOF > /tmp/credentials.txt
> foo bar
> baz bar
> EOF
> cat <<EOF > /tmp/acls.json
> {
>   "permissive": false,
>   "access_mesos_log" : [
> {
>   "principals" : { "values" : ["foo"] },
>   "logs" : { "type" : "ANY" }
> }
>   ]
> }
> EOF
> export GLOG_v=2
> export MESOS_VERBOSE=1
> ./bin/mesos-slave.sh --work_dir=/tmp/mesos/agent \
>  --master=127.0.0.1:5050 \
>  --authenticate_http \
>  --http_credentials=file:///tmp/credentials.txt \
>  --acls=file:///tmp/acls.json \
>  --log_dir=/tmp/mesos/logs/agent
> {code}
> And then ran the long-lived framework with {{src/long-lived-framework 
> --master=127.0.0.1:5050 --principal=foo --secret=bar}}. When attempting to 
> click on "Sandbox" links in the Mesos web UI, I see the error {{Framework 
> with ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-' does not exist on agent 
> with ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-S0'.}} (screenshot attached). 
> Looking at Chrome devtools, I don't see any non-200 return codes in HTTP 
> responses. Each click on "Sandbox" produces a single request to the agent's 
> {{/state}} endpoint, which returns 200 OK.
> I verified that the sandbox links work as expected when authorization is not 
> enabled.





[jira] [Updated] (MESOS-5746) Sandbox links are broken in authorized cluster

2016-06-29 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5746:
-
Description: 
I ran Mesos master with this script:
{code}
#! /usr/bin/env bash

rm -rf /tmp/mesos/*

cat <<EOF > /tmp/credentials.txt
foo bar
baz bar
EOF

cat <<EOF > /tmp/acls.json
{
  "permissive": false,
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "register_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ],
  "run_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "get_endpoints" : [
{
  "principals" : { "values" : ["foo"] },
  "paths" : { "type" : "ANY" }
}
  ],
  "view_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_executors" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_sandboxes" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "get_quotas" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ]
}
EOF

export GLOG_v=2
export MESOS_VERBOSE=1
./bin/mesos-master.sh --work_dir=/tmp/mesos/master \
  --authenticate_http \
  --credentials=file:///tmp/credentials.txt \
  --acls=file:///tmp/acls.json \
  --log_dir=/tmp/mesos/logs/master
{code}
and ran the agent with this script:
{code}
#! /usr/bin/env bash

cat <<EOF > /tmp/credentials.txt
foo bar
baz bar
EOF

cat <<EOF > /tmp/acls.json
{
  "permissive": false,
  "access_mesos_log" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ]
}
EOF

export GLOG_v=2
export MESOS_VERBOSE=1
./bin/mesos-slave.sh --work_dir=/tmp/mesos/agent \
 --master=127.0.0.1:5050 \
 --authenticate_http \
 --http_credentials=file:///tmp/credentials.txt \
 --acls=file:///tmp/acls.json \
 --log_dir=/tmp/mesos/logs/agent
{code}

And then ran the long-lived framework with {{src/long-lived-framework 
--master=127.0.0.1:5050 --principal=foo --secret=bar}}. When attempting to 
click on "Sandbox" links in the Mesos web UI, I see the error {{Framework with 
ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-' does not exist on agent with ID 
'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-S0'.}} (screenshot attached). Looking at 
Chrome devtools, I don't see any non-200 return codes in HTTP responses. Each 
click on "Sandbox" produces a single request to the agent's {{/state}} 
endpoint, which returns 200 OK.

I verified that the sandbox links work as expected when authorization is not 
enabled.

  was:
I ran Mesos master with this script:
{code}
#! /usr/bin/env bash

rm -rf /tmp/mesos/*

cat <<EOF > /tmp/credentials.txt
foo bar
baz bar
EOF

cat <<EOF > /tmp/acls.json
{
  "permissive": false,
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "register_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ],
  "run_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "get_endpoints" : [
{
  "principals" : { "values" : ["foo"] },
  "paths" : { "type" : "ANY" }
}
  ],
  "view_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_executors" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_sandboxes" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "get_quotas" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ]
}
EOF

export GLOG_v=2
export MESOS_VERBOSE=1
./bin/mesos-master.sh --work_dir=/tmp/mesos/master \
  --authenticate_http \
  --credentials=file:///tmp/credentials.txt \
  --acls=file:///tmp/acls.json \
  --log_dir=/tmp/mesos/logs/master
{code}
and ran the agent with this script:
{code}
#! /usr/bin/env bash

cat < 

[jira] [Created] (MESOS-5746) Sandbox links are broken in authorized cluster

2016-06-29 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5746:


 Summary: Sandbox links are broken in authorized cluster
 Key: MESOS-5746
 URL: https://issues.apache.org/jira/browse/MESOS-5746
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Greg Mann


I ran Mesos master with this script:
{code}
#! /usr/bin/env bash

rm -rf /tmp/mesos/*

cat <<EOF > /tmp/credentials.txt
foo bar
baz bar
EOF

cat <<EOF > /tmp/acls.json
{
  "permissive": false,
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "register_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ],
  "run_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "get_endpoints" : [
{
  "principals" : { "values" : ["foo"] },
  "paths" : { "type" : "ANY" }
}
  ],
  "view_frameworks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_tasks" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "view_executors" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_sandboxes" : [
{
  "principals" : { "values" : ["foo"] },
  "users" : { "type" : "ANY" }
}
  ],
  "access_mesos_logs" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ],
  "get_quotas" : [
{
  "principals" : { "values" : ["foo"] },
  "roles" : { "type" : "ANY" }
}
  ]
}
EOF

export GLOG_v=2
export MESOS_VERBOSE=1
./bin/mesos-master.sh --work_dir=/tmp/mesos/master \
  --authenticate_http \
  --credentials=file:///tmp/credentials.txt \
  --acls=file:///tmp/acls.json \
  --log_dir=/tmp/mesos/logs/master
{code}
and ran the agent with this script:
{code}
#! /usr/bin/env bash

cat <<EOF > /tmp/credentials.txt
foo bar
baz bar
EOF

cat <<EOF > /tmp/acls.json
{
  "permissive": false,
  "access_mesos_log" : [
{
  "principals" : { "values" : ["foo"] },
  "logs" : { "type" : "ANY" }
}
  ]
}
EOF

export GLOG_v=2
export MESOS_VERBOSE=1
./bin/mesos-slave.sh --work_dir=/tmp/mesos/agent \
 --master=127.0.0.1:5050 \
 --authenticate_http \
 --http_credentials=file:///tmp/credentials.txt \
 --acls=file:///tmp/acls.json \
 --log_dir=/tmp/mesos/logs/agent
{code}

And then ran the long-lived framework with {{src/long-lived-framework 
--master=127.0.0.1:5050 --principal=foo --secret=bar}}. When attempting to 
click on "Sandbox" links in the Mesos web UI, I see the error {{Framework with 
ID 'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-' does not exist on agent with ID 
'd2735ff3-52ac-467a-b8eb-6bd7a119ee32-S0'.
}} (screenshot attached). Looking at Chrome devtools, I don't see any non-200 
return codes in HTTP responses. Each click on "Sandbox" produces a single 
request to the agent's {{/state}} endpoint, which returns 200 OK.

I verified that the sandbox links work as expected when authorization is not 
enabled.





[jira] [Commented] (MESOS-5598) pailer dies and no longer spools logs from docker container

2016-06-29 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355689#comment-15355689
 ] 

Greg Mann commented on MESOS-5598:
--

Hi [~john.came...@gmail.com], thanks for the ticket! Have you noticed any 
pattern in the cases where the pailer fails? How large are the log files in 
these cases?

One thing that would be very helpful is information about specific HTTP 
requests that are failing, and the return codes of the responses. If you 
encounter this behavior again, could you open the dev tools in your browser and 
grab any information associated with failed HTTP requests that you see?

> pailer dies and no longer spools logs from docker container
> ---
>
> Key: MESOS-5598
> URL: https://issues.apache.org/jira/browse/MESOS-5598
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.2
>Reporter: John Camelon
>
> There are numerous instances where we see pailer choke on logs and stop 
> updating.  
> When I ssh into the host where the container is running, running "docker 
> logs" will yield much more output past the point where pailer stopped 
> working.   
> I am not sure what logs I am supposed to gather to diagnose this, please let 
> me know.   





[jira] [Commented] (MESOS-5735) Update WebUI to use v1 operator API

2016-06-29 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355498#comment-15355498
 ] 

haosdent commented on MESOS-5735:
-

[~dongdong] {{/monitor/statistics}} is still used in {{AgentCtrl}}. When you 
open {{http://${MASTER_HOST}/#/slaves/${SLAVE_ID}}}, you should see the 
{{/monitor/statistics}} request.

> Update WebUI to use v1 operator API
> ---
>
> Key: MESOS-5735
> URL: https://issues.apache.org/jira/browse/MESOS-5735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: zhou xing
>
> Having the WebUI use the v1 API would be a good validation of its usefulness 
> and correctness.





[jira] [Comment Edited] (MESOS-5735) Update WebUI to use v1 operator API

2016-06-29 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355498#comment-15355498
 ] 

haosdent edited comment on MESOS-5735 at 6/29/16 5:14 PM:
--

[~dongdong] {{/monitor/statistics}} is still used in {{AgentCtrl}}. When you 
open http://MASTER_HOST/#/slaves/SLAVE_ID, you should see the 
{{/monitor/statistics}} request.


was (Author: haosd...@gmail.com):
[~dongdong] {{/monitor/statistics}} is still used in {{AgentCtrl}}. When you 
open {{http://${MASTER_HOST}/#/slaves/${SLAVE_ID}}}, you should see the 
{{/monitor/statistics}} request.

> Update WebUI to use v1 operator API
> ---
>
> Key: MESOS-5735
> URL: https://issues.apache.org/jira/browse/MESOS-5735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: zhou xing
>
> Having the WebUI use the v1 API would be a good validation of its usefulness 
> and correctness.





[jira] [Commented] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355357#comment-15355357
 ] 

Michael Park commented on MESOS-5745:
-

Let's keep both for now, since this one captures a different error message and 
fails a different test. We can close this at the same time as MESOS-3335 if it 
goes away.

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Issue Comment Deleted] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-5745:

Comment: was deleted

(was: Let's keep both for now, since this one captures a different error 
message and fails a different test. We can close this at the same time as 
MESOS-3335 if it goes away.)

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Commented] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355355#comment-15355355
 ] 

Michael Park commented on MESOS-5745:
-

Let's keep both for now, since this one captures a different error message and 
fails a different test. We can close this at the same time as MESOS-3335 if it 
goes away.

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Comment Edited] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355337#comment-15355337
 ] 

Benjamin Bannier edited comment on MESOS-5745 at 6/29/16 2:39 PM:
--

This seems to be related to MESOS-3335; at least I saw this exact problem with 
optimizing builds with some clang-3.8+ and it went away after -making 
{{FlagsBase}} properly copyable- removing the slicing of {{FlagsBase}} via 
{{Option}}. Shall we close this as a dup [~mcypark]?
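
For readers unfamiliar with the term, object slicing in C++ can be illustrated as follows. This is a generic sketch with hypothetical `BaseFlags` and `AgentFlags` types; it is not the actual `FlagsBase` code and does not reproduce the change made for MESOS-3335.

```cpp
#include <string>

// Hypothetical stand-ins; these are not the stout or Mesos types.
struct BaseFlags {
  virtual ~BaseFlags() = default;
  virtual std::string describe() const { return "base"; }
};

struct AgentFlags : BaseFlags {
  std::string workDir = "/tmp/agent";
  std::string describe() const override { return "agent:" + workDir; }
};

// Taking the argument by value copies only the BaseFlags subobject, so the
// derived part (workDir and the override) is sliced away.
std::string describeByValue(BaseFlags flags) { return flags.describe(); }

// Taking a reference keeps the dynamic type intact.
std::string describeByRef(const BaseFlags& flags) { return flags.describe(); }
```

Given an `AgentFlags` object, `describeByValue` sees only a `BaseFlags` copy, while `describeByRef` still dispatches to the `AgentFlags` override. Storing a derived object in a wrapper that holds the base type by value has the same effect.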


was (Author: bbannier):
This seems to be related to MESOS-3335; at least I saw this exact problem with 
optimizing builds with some clang-3.8+ and it went away after making 
{{FlagsBase}} properly copyable. Shall we close this as a dup [~mcypark]?

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Commented] (MESOS-5700) Benchmark for Resource class (protobuf vs. C++)

2016-06-29 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355336#comment-15355336
 ] 

Klaus Ma commented on MESOS-5700:
-

Adding the same resource 100 times:

{code}
[ RUN  ] ResourcesOperatorCount/Resources_BENCHMARK_Test.Operator_Add/17
Added 100 resources (cpus:1;ports:[1-100]) in 1.895914secs
[   OK ] ResourcesOperatorCount/Resources_BENCHMARK_Test.Operator_Add/17 
(1896 ms)
{code}
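
The shape of such a measurement can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyResources`, `AddTiming`, and `timeRepeatedAdd` are hypothetical names, and the toy type simply appends port ranges, so it does not reflect the cost or the semantics of the protobuf-backed `Resources` class.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource bundle such as cpus:1;ports:[1-100].
struct ToyResources {
  double cpus = 0.0;
  std::vector<std::pair<int, int>> ports;  // inclusive [begin, end] ranges

  // Sums the scalar and appends the ranges without merging them.
  ToyResources& operator+=(const ToyResources& that) {
    cpus += that.cpus;
    ports.insert(ports.end(), that.ports.begin(), that.ports.end());
    return *this;
  }
};

// Result of a timed run: the accumulated total and the elapsed wall time.
struct AddTiming {
  ToyResources total;
  std::chrono::steady_clock::duration elapsed;
};

// Adds `item` to an initially empty total `count` times and times the loop.
AddTiming timeRepeatedAdd(const ToyResources& item, std::size_t count) {
  ToyResources total;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < count; ++i) {
    total += item;
  }
  return {total, std::chrono::steady_clock::now() - start};
}
```

Calling `timeRepeatedAdd(item, 100)` with an `item` holding one CPU and the port range 1 to 100 mirrors the loop structure of the run above; the elapsed time can be converted with `std::chrono::duration_cast` for printing.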

> Benchmark for Resource class (protobuf vs. C++)
> ---
>
> Key: MESOS-5700
> URL: https://issues.apache.org/jira/browse/MESOS-5700
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> Add benchmark of Resource class for Allocation Performance.





[jira] [Comment Edited] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355337#comment-15355337
 ] 

Benjamin Bannier edited comment on MESOS-5745 at 6/29/16 2:40 PM:
--

This seems to be related to MESOS-3335; at least I saw this exact problem with 
optimizing builds with some clang-3.8+ and it went away after -making 
{{FlagsBase}} properly copyable- removing the slicing of {{FlagsBase}} via 
{{Option}}.


was (Author: bbannier):
This seems to be related to MESOS-3335; at least I saw this exact problem with 
optimizing builds with some clang-3.8+ and it went away after -making 
{{FlagsBase}} properly copyable- removing the slicing of {{FlagsBase}} via 
{{Option}}. Shall we close this as a dup [~mcypark]?

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Commented] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355337#comment-15355337
 ] 

Benjamin Bannier commented on MESOS-5745:
-

This seems to be related to MESOS-3335; at least I saw this problem with 
optimizing builds with some clang-3.8+ and it went away after making 
{{FlagsBase}} properly copyable. Shall we close this as a dup [~mcypark]?

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Comment Edited] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355337#comment-15355337
 ] 

Benjamin Bannier edited comment on MESOS-5745 at 6/29/16 2:38 PM:
--

This seems to be related to MESOS-3335; at least I saw this exact problem with 
optimizing builds with some clang-3.8+ and it went away after making 
{{FlagsBase}} properly copyable. Shall we close this as a dup [~mcypark]?


was (Author: bbannier):
This seems to related to MESOS-3335; at least I saw this problem with 
optimizing builds with some clang-3.8+ and it went away after making 
{{FlagsBase}} properly copyable. Shall we close this as a dup [~mcypark]?

> AuthenticationTest.UnauthenticatedSlave fails with clang++3.8
> -
>
> Key: MESOS-5745
> URL: https://issues.apache.org/jira/browse/MESOS-5745
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Park
>  Labels: mesosphere
>
> With {{clang++-3.8}}, {{make check}} fails with the following message:
> {noformat}
> [ RUN  ] AuthenticationTest.UnauthenticatedSlave
> *** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
> using GNU date ***
> PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
> *** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
> @ 0x7fff8af4252a _sigtramp
> @0x110216a00 (unknown)
> @0x10b7f5881 mesos::internal::logging::Flags::operator=()
> @0x10b7f3076 mesos::internal::slave::Flags::operator=()
> @0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
> @0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
> @0x10b7511b9 
> mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
> @0x10c703caa 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c703b0a testing::Test::Run()
> @0x10c704b02 testing::TestInfo::Run()
> @0x10c7053c3 testing::TestCase::Run()
> @0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
> @0x10c70ca43 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10c70c95e testing::UnitTest::Run()
> @0x10bbe44f3 main
> @ 0x7fff9071a5ad start
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Commented] (MESOS-5735) Update WebUI to use v1 operator API

2016-06-29 Thread zhou xing (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355322#comment-15355322
 ] 

zhou xing commented on MESOS-5735:
--

[~vinodkone], I did a preliminary investigation of the WebUI today. Here are 
the WebUI-related HTTP API calls:

1. call to */master/state*:
Most of the pages in the WebUI rely on this request to get FRAMEWORK, AGENT, 
OFFER, TASK and EXECUTOR objects. Our newly proposed GetState message can 
satisfy almost all the requirements of these pages, except for the following 
fields:
• {{state.leader}} and {{state.pid}}: these fields are used to decide 
whether the current master is the leading master; if not, the UI will redirect.
• {{clustername}}, {{version}}, {{build_time}}, {{start_time}}, 
{{elected_time}}: these fields are shown in the WebUI.
• {{log_dir}} and {{external_log_dir}}: these fields are used when the user 
clicks the LOG link on the home page.

2. call to */slave/state*:
When browsing the information of a single agent in the WebUI (by opening one 
agent on the Agents page), the WebUI calls the */slave/state* endpoint to get 
the detailed information of that slave. I think we still need to work out what 
the slave's {{GetState}} message should look like. There should be differences 
between the master's {{GetState}} message and the slave's {{GetState}} 
message, e.g. the executor information returned by the slave's */state* 
endpoint currently contains three more fields: {{queued_tasks}}, 
{{active_tasks}} and {{completed_tasks}}. These three fields are not in the 
master's executor message.

3. call to */files/browse* and */files/read*
The two new operator v1 files APIs should have the same functionality as the 
previous HTTP API.

4. call to */metrics/snapshot* for both master and agent
The metrics API should expose the same set of metrics.

5. call to slave's */monitor/statistics* endpoint
I happened to see this endpoint in the current WebUI code, but when I visited 
the WebUI after launching master, agent and long-lived-framework, I could not 
see a call to this endpoint in the HTTP call history. Also, the operator v1 API 
epic does not contain a ticket for this endpoint. Are we still using this API?

> Update WebUI to use v1 operator API
> ---
>
> Key: MESOS-5735
> URL: https://issues.apache.org/jira/browse/MESOS-5735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: zhou xing
>
> Having the WebUI use the v1 API would be a good validation of its usefulness 
> and correctness.





[jira] [Created] (MESOS-5745) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Michael Park (JIRA)
Michael Park created MESOS-5745:
---

 Summary: AuthenticationTest.UnauthenticatedSlave fails with 
clang++3.8
 Key: MESOS-5745
 URL: https://issues.apache.org/jira/browse/MESOS-5745
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Michael Park


With {{clang++-3.8}}, {{make check}} fails with the following message:

{noformat}
[ RUN  ] AuthenticationTest.UnauthenticatedSlave
*** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
using GNU date ***
PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
*** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
@ 0x7fff8af4252a _sigtramp
@0x110216a00 (unknown)
@0x10b7f5881 mesos::internal::logging::Flags::operator=()
@0x10b7f3076 mesos::internal::slave::Flags::operator=()
@0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
@0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
@0x10b7511b9 
mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
@0x10c703caa 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10c703b0a testing::Test::Run()
@0x10c704b02 testing::TestInfo::Run()
@0x10c7053c3 testing::TestCase::Run()
@0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
@0x10c70ca43 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10c70c95e testing::UnitTest::Run()
@0x10bbe44f3 main
@ 0x7fff9071a5ad start
make[3]: *** [check-local] Segmentation fault: 11
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}





[jira] [Created] (MESOS-5744) AuthenticationTest.UnauthenticatedSlave fails with clang++3.8

2016-06-29 Thread Michael Park (JIRA)
Michael Park created MESOS-5744:
---

 Summary: AuthenticationTest.UnauthenticatedSlave fails with 
clang++3.8
 Key: MESOS-5744
 URL: https://issues.apache.org/jira/browse/MESOS-5744
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Michael Park


With {{clang++-3.8}}, {{make check}} fails with the following message:

{noformat}
[ RUN  ] AuthenticationTest.UnauthenticatedSlave
*** Aborted at 1467208613 (unix time) try "date -d @1467208613" if you are 
using GNU date ***
PC: @0x10b7f5a8b std::__1::__tree<>::__assign_multi<>()
*** SIGSEGV (@0x0) received by PID 40053 (TID 0x7fff73aaf000) stack trace: ***
@ 0x7fff8af4252a _sigtramp
@0x110216a00 (unknown)
@0x10b7f5881 mesos::internal::logging::Flags::operator=()
@0x10b7f3076 mesos::internal::slave::Flags::operator=()
@0x10b7f1cbf mesos::internal::tests::cluster::Slave::start()
@0x10bf1a2d1 mesos::internal::tests::MesosTest::StartSlave()
@0x10b7511b9 
mesos::internal::tests::AuthenticationTest_UnauthenticatedSlave_Test::TestBody()
@0x10c703caa 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10c703b0a testing::Test::Run()
@0x10c704b02 testing::TestInfo::Run()
@0x10c7053c3 testing::TestCase::Run()
@0x10c70cefb testing::internal::UnitTestImpl::RunAllTests()
@0x10c70ca43 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10c70c95e testing::UnitTest::Run()
@0x10bbe44f3 main
@ 0x7fff9071a5ad start
make[3]: *** [check-local] Segmentation fault: 11
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}





[jira] [Created] (MESOS-5743) Added a flag parser for hashset.

2016-06-29 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5743:
--

 Summary: Added a flag parser for hashset.
 Key: MESOS-5743
 URL: https://issues.apache.org/jira/browse/MESOS-5743
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


We are introducing a new flag in the master to exclude multiple resource names 
from the sorter. It would be better to add a flag parser for hashset to parse 
the flag value containing the excluded resource names.
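
To make the idea concrete, a minimal sketch of such a parser is shown below. 
It uses {{std::unordered_set}} in place of the stout {{hashset}}, and both the 
function name {{parseNameSet}} and the comma delimiter are assumptions, since 
the ticket does not specify the flag's format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>

// Splits a comma-separated flag value such as "gpus,ports" into a set of
// names. Empty tokens are skipped and duplicates collapse.
// The function name and the comma delimiter are assumptions.
std::unordered_set<std::string> parseNameSet(const std::string& value) {
  std::unordered_set<std::string> names;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (!token.empty()) {
      names.insert(token);
    }
  }
  return names;
}
```

With this sketch, a value of "gpus,ports,,gpus" parses to a set of two names, 
and an empty value parses to an empty set.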





[jira] [Commented] (MESOS-5709) Authorization for /roles

2016-06-29 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355062#comment-15355062
 ] 

Joerg Schad commented on MESOS-5709:


Introduced authorization based filtering for /roles.
https://reviews.apache.org/r/49369

Updated documentation for roles endpoint filtering
https://reviews.apache.org/r/49370/

> Authorization for /roles
> 
>
> Key: MESOS-5709
> URL: https://issues.apache.org/jira/browse/MESOS-5709
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Adam B
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> The /roles endpoint exposes the list of all roles and their weights, as well 
> as the list of all frameworkIds registered with each role. This is a superset 
> of the information exposed on GET /weights, which we already protect. We 
> should protect the data in /roles the same way.
> - Should we reuse VIEW_FRAMEWORK with role (from /state)?
> - Should we add a new VIEW_ROLE and adapt GET_WEIGHTS to use it?





[jira] [Assigned] (MESOS-5708) Add authz to /files/debug

2016-06-29 Thread Abhishek Dasgupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Dasgupta reassigned MESOS-5708:


Assignee: Abhishek Dasgupta

> Add authz to /files/debug
> -
>
> Key: MESOS-5708
> URL: https://issues.apache.org/jira/browse/MESOS-5708
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Adam B
>Assignee: Abhishek Dasgupta
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> The /files/debug endpoint exposes the attached master/agent log paths and 
> every attached sandbox path, which includes the frameworkId and executorId. 
> Even if sandboxes are protected, we still don't want to expose this 
> information to unauthorized users.





[jira] [Assigned] (MESOS-5732) MasterAPITest.UnreserveResources is slow

2016-06-29 Thread Abhishek Dasgupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Dasgupta reassigned MESOS-5732:


Assignee: Abhishek Dasgupta

> MasterAPITest.UnreserveResources is slow
> 
>
> Key: MESOS-5732
> URL: https://issues.apache.org/jira/browse/MESOS-5732
> Project: Mesos
>  Issue Type: Improvement
>  Components: tests
>Reporter: Neil Conway
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] ContentType/MasterAPITest.UnreserveResources/0
> [   OK ] ContentType/MasterAPITest.UnreserveResources/0 (6033 ms)
> [ RUN  ] ContentType/MasterAPITest.UnreserveResources/1
> [   OK ] ContentType/MasterAPITest.UnreserveResources/1 (6041 ms)
> {noformat}





[jira] [Updated] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunzhe updated MESOS-5742:
--
Story Points: 1  (was: 2)

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Updated] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunzhe updated MESOS-5742:
--
Description: So far, the GPU resource is not fractional, only integer 
values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, 
it can also work without any warning or error. And in the webui the GPU 
resource is `1.2`.  (was: So far, the GPU resource is not fractional, only 
integer values are allowed. But when starting agents with 
{{\-\-resources='gpu:1.2'}}, it can also work with any warning or error. And in 
the webui the GPU resource is `1.2`.)

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> without any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Updated] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunzhe updated MESOS-5742:
--
Description: So far, the GPU resource is not fractional, only integer 
values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, 
it can also work with any warning or error. And in the webui the GPU resource 
is `1.2`.  (was: So far, the GPU resource is not fractional, only integer 
values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, 
it also can work with any warning or error. And in the webui the GPU resource 
is {{1.2}})

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work 
> with any warning or error. And in the webui the GPU resource is `1.2`.





[jira] [Updated] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunzhe updated MESOS-5742:
--
Description: So far, the GPU resource is not fractional, only integer 
values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, 
it also can work with any warning or error. And in the webui the GPU resource 
is {{1.2}}  (was: So far, the GPU resource is not fractional, only integer 
values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, 
it also can work with any warning or error.)

> When start an agent with `--resources`, the GPU resource can be fractional
> --
>
> Key: MESOS-5742
> URL: https://issues.apache.org/jira/browse/MESOS-5742
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>Assignee: Sunzhe
>  Labels: gpu
>
> So far, the GPU resource is not fractional, only integer values are allowed. 
> But when starting agents with {{\-\-resources='gpu:1.2'}}, it also can work 
> with any warning or error. And in the webui the GPU resource is {{1.2}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5742) When start an agent with `--resources`, the GPU resource can be fractional

2016-06-29 Thread Sunzhe (JIRA)
Sunzhe created MESOS-5742:
-

 Summary: When start an agent with `--resources`, the GPU resource 
can be fractional
 Key: MESOS-5742
 URL: https://issues.apache.org/jira/browse/MESOS-5742
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sunzhe
Assignee: Sunzhe


So far, the GPU resource is not fractional, only integer values are allowed. 
But when starting agents with {{\-\-resources='gpu:1.2'}}, it also can work 
with any warning or error.
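
The kind of check that appears to be missing could look roughly like the 
sketch below. The function name {{isWholeQuantity}} is hypothetical, and where 
such a check would live in the agent's resource parsing is not something this 
ticket states.

```cpp
#include <cassert>
#include <cmath>

// Returns true when a scalar resource quantity is a whole, non-negative
// number, e.g. 1 or 2 but not 1.2. The function name is hypothetical.
bool isWholeQuantity(double quantity) {
  return std::isfinite(quantity) && quantity >= 0.0 &&
         std::floor(quantity) == quantity;
}
```

A value of 1.2 fails this check while 1 passes, so an agent applying it to GPU 
quantities could reject or warn about the fractional value.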





[jira] [Created] (MESOS-5741) Quota + reserved resources is overly conservative

2016-06-29 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5741:
--

 Summary: Quota + reserved resources is overly conservative
 Key: MESOS-5741
 URL: https://issues.apache.org/jira/browse/MESOS-5741
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


Consider this scenario:

* Cluster has 10 CPUs total. 8 CPUs are reserved for role X, 2 CPUs are 
unreserved.
* Role X has a quota for 4 CPUs, but has only been allocated 2 CPUs (e.g., 
because it has declined an offer for the other 2 CPUs). The CPUs it has been 
allocated come from the reserved resources, so there are 6 reserved CPUs and 2 
unreserved CPUs available.
* That means 6 CPUs should be offered as non-quota resources. However, which 6 
CPUs should be offered -- the 6 reserved CPUs, or 4 reserved CPUs and 2 
unreserved CPUs?

The current quota allocation logic appears to always offer the 6 reserved CPUs. 
This is unfortunate, because frameworks in other roles won't be able to use 
those resources. The reason for this behavior is:

{code}
  Resources remainingClusterResources = roleSorter->totalScalarQuantities();
  foreachkey (const string& role, activeRoles) {
    remainingClusterResources -= roleSorter->allocationScalarQuantities(role);
  }
{code}

{{remainingClusterResources}} may have a {{role}} set (although dynamically 
reserved resources will have been converted into effectively static 
reservations).

{code}
  Resources unallocatedQuotaResources;
  foreachpair (const string& name, const Quota& quota, quotas) {
    // Compute the amount of quota that the role does not have allocated.
    //
    // NOTE: Revocable resources are excluded in `quotaRoleSorter`.
    // NOTE: Only scalars are considered for quota.
    Resources allocated = getQuotaRoleAllocatedResources(name);
    const Resources required = quota.info.guarantee();
    unallocatedQuotaResources += (required - allocated);
  }
{code}

{{unallocatedQuotaResources}} will *not* have {{role}} set, per the 
implementation of {{getQuotaRoleAllocatedResources}}.

{code}
remainingClusterResources -= unallocatedQuotaResources;
{code}

This means that *only* unreserved resources will be subtracted from 
{{remainingClusterResources}}. At best this is sub-optimal, because it seems 
better to lay-away resources for role X that are already reserved for X. I 
don't *think* it will result in violating quota guarantees, though.
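
The arithmetic of the scenario can be worked through with a toy model. The 
sketch below is only an illustration: {{CpuPool}} and both functions are 
hypothetical names, plain integers stand in for {{Resources}}, and the 
"reserved first" variant is one possible reading of the improvement suggested 
above, not a proposed patch.

```cpp
#include <cassert>

// CPU counts split by reservation. A toy model of the scenario above;
// it is not the allocator's `Resources` type.
struct CpuPool {
  int reserved;    // CPUs reserved for role X.
  int unreserved;  // CPUs not reserved for any role.
};

// Mirrors the behavior described above: unallocated quota is subtracted
// from the unreserved pool only, because it carries no role.
CpuPool layAwayFromUnreserved(CpuPool available, int unallocatedQuota) {
  assert(unallocatedQuota >= 0 && unallocatedQuota <= available.unreserved);
  available.unreserved -= unallocatedQuota;
  return available;
}

// Alternative: prefer CPUs already reserved for the role, and fall back
// to unreserved CPUs for any remainder.
CpuPool layAwayFromReservedFirst(CpuPool available, int unallocatedQuota) {
  assert(unallocatedQuota >= 0 &&
         unallocatedQuota <= available.reserved + available.unreserved);
  const int fromReserved = unallocatedQuota < available.reserved
      ? unallocatedQuota
      : available.reserved;
  available.reserved -= fromReserved;
  available.unreserved -= unallocatedQuota - fromReserved;
  return available;
}
```

With 6 reserved and 2 unreserved CPUs available and 4 - 2 = 2 CPUs of 
unallocated quota, the first function leaves 6 reserved and 0 unreserved CPUs 
to offer as non-quota resources, while the second leaves 4 reserved and 2 
unreserved CPUs.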





[jira] [Updated] (MESOS-4611) Passing a lambda to dispatch() always matches the template returning void

2016-06-29 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4611:

Sprint: Mesosphere Sprint 38

> Passing a lambda to dispatch() always matches the template returning void
> -
>
> Key: MESOS-4611
> URL: https://issues.apache.org/jira/browse/MESOS-4611
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Kevin Klues
>Assignee: haosdent
>  Labels: dispatch, libprocess, mesosphere
>
> The following idiom does not currently compile:
> {code}
>   Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
>     return Nothing();
>   });
> {code}
> This seems non-intuitive because the following template exists for dispatch:
> {code}
> template <typename R>
> Future<R> dispatch(const UPID& pid, const std::function<R()>& f)
> {
>   std::shared_ptr<Promise<R>> promise(new Promise<R>());
>
>   std::shared_ptr<std::function<void(ProcessBase*)>> f_(
>       new std::function<void(ProcessBase*)>(
>           [=](ProcessBase*) {
>             promise->set(f());
>           }));
>   internal::dispatch(pid, f_);
>
>   return promise->future();
> }
> {code}
> However, lambdas cannot be implicitly cast to a corresponding 
> std::function type.
> To make this work, you have to explicitly type the lambda before passing it 
> to dispatch.
> {code}
>   std::function<Nothing()> f = []() { return Nothing(); };
>   Future<Nothing> initialized = dispatch(pid, f);
> {code}
> We should add template support to allow lambdas to be passed to dispatch() 
> without explicit typing. 
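
The deduction problem can be reproduced without libprocess. In the sketch 
below, {{callTyped}} and {{callAny}} are hypothetical names that only mimic 
the shape of the overloads; they call the function directly and involve no 
{{UPID}}, {{Promise}} or {{Future}}. The second template shows one possible 
way to accept lambdas, under the assumption that deducing the callable type 
itself is acceptable.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy analogue of the existing overload: `R` must be deduced from a
// `std::function<R()>` argument. A lambda is not a `std::function`, so
// passing one directly does not deduce `R`.
template <typename R>
R callTyped(const std::function<R()>& f) {
  return f();
}

// One possible shape for lambda support: accept any callable `F` and
// derive the result type from the call expression itself.
template <typename F>
auto callAny(F&& f) -> decltype(std::forward<F>(f)()) {
  return std::forward<F>(f)();
}
```

{{callTyped}} works once the lambda has been stored in a {{std::function}} 
variable, which matches the workaround described in the ticket, while 
{{callAny}} accepts the lambda as written.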


