[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325712#comment-15325712
 ] 

Klaus Ma commented on MESOS-5600:
-

Thinking more about this, the current behaviour is right: if the total is 
updated, it re-calculates everything; otherwise, only the related client is 
updated. When updating the weight, we should do the same thing.
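
To make the distinction concrete, here is a minimal sketch of that pattern; 
{{MiniSorter}} and all of its members are hypothetical, simplified names for 
illustration only, not the actual {{DRFSorter}} code:

{code}
#include <map>
#include <string>

// Simplified sketch: a change to the total invalidates every cached share,
// while a change to one client's allocation only touches that client.
struct MiniSorter {
  std::map<std::string, double> allocation; // client -> allocated amount
  std::map<std::string, double> share;      // client -> cached share
  double total = 0.0;
  bool dirty = false;                       // true => recalculate all shares

  // The cluster total changed: every cached share is now stale.
  void add(double resources) {
    total += resources;
    dirty = true;
  }

  // Only one client's allocation changed: refresh just that client's share.
  void allocated(const std::string& client, double amount) {
    allocation[client] += amount;
    if (total > 0.0) {
      share[client] = allocation[client] / total;
    }
  }
};
{code}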

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325693#comment-15325693
 ] 

Klaus Ma commented on MESOS-5600:
-

OK, thanks; one more comment :).

In {{DRFSorter::allocated()}}, the {{allocations}} of the client is updated, so 
{{dirty}} should be set to true to trigger {{sort()}}; {{update(name)}} does not 
seem necessary, as the {{share}} will be re-calculated in {{sort()}}.

Furthermore, I'm wondering how much of a performance improvement {{dirty}} 
actually provides: in each allocator loop the allocation may change, so the 
sorter has to re-calculate the order anyway.

To improve performance, I think we can update only the single client in 
{{allocated}} & {{unallocated}}, and have {{sort()}} simply return the client 
list (clients are kept sorted as they are inserted in {{allocated}} & 
{{unallocated}}).
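
A minimal sketch of that alternative design follows; {{OrderedClients}}, 
{{reposition}}, and {{sorted}} are hypothetical names used only to illustrate 
the idea, not actual Mesos code:

{code}
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Keep clients ordered by share as they are (re)inserted, so "sorting" is
// just a traversal of an already-ordered container.
class OrderedClients {
public:
  // Called from allocated()/unallocated() with the client's new share.
  void reposition(const std::string& client, double newShare) {
    auto it = currentShare.find(client);
    if (it != currentShare.end()) {
      ordered.erase({it->second, client}); // drop the old position
    }
    ordered.insert({newShare, client});    // insert at the right position
    currentShare[client] = newShare;
  }

  // sort() reduces to returning the list in its existing order.
  std::vector<std::string> sorted() const {
    std::vector<std::string> result;
    for (const auto& entry : ordered) {
      result.push_back(entry.second);
    }
    return result;
  }

private:
  std::set<std::pair<double, std::string>> ordered; // (share, client)
  std::map<std::string, double> currentShare;       // client -> last share
};
{code}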

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325675#comment-15325675
 ] 

Guangya Liu commented on MESOS-5600:


I filed another JIRA to call update() when updating the weight for a client:
https://issues.apache.org/jira/browse/MESOS-5601

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5601) The sorter should re-calculate share if weight was updated

2016-06-10 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5601:
--

 Summary: The sorter should re-calculate share if weight was updated
 Key: MESOS-5601
 URL: https://issues.apache.org/jira/browse/MESOS-5601
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Guangya Liu
Assignee: Guangya Liu


When updating the weight for a client, if dirty is false, the sorter should 
re-calculate the share.

https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L64
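
A minimal sketch of the proposed behaviour follows; {{WeightedSorter}} and 
{{updateWeight}} are hypothetical names for illustration, not the actual patch:

{code}
#include <map>
#include <string>

// When a client's weight changes and no full recalculation is pending
// (dirty == false), refresh that client's share immediately.
struct WeightedSorter {
  std::map<std::string, double> allocations;
  std::map<std::string, double> weights;
  std::map<std::string, double> shares;
  double total = 1.0;
  bool dirty = false;

  void updateWeight(const std::string& client, double newWeight) {
    weights[client] = newWeight;
    if (!dirty) {
      // With dirty == true this would happen on the next sort() anyway.
      shares[client] = (allocations[client] / total) / newWeight;
    }
  }
};
{code}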



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325674#comment-15325674
 ] 

Klaus Ma commented on MESOS-5600:
-

It's not only about {{total}}; {{sort()}} updates the order of clients based on 
their shares. In {{DRFSorter::update}}, the weight is updated, which affects the 
result of {{DRFSorter::calculateShare}}; in {{DRFSorter::remove}}, the client is 
removed, so {{sort()}} should not return removed clients.
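
For illustration, a simplified sketch of those two points; {{ShareState}} and 
its members are hypothetical names, not the actual {{DRFSorter}} code:

{code}
#include <map>
#include <string>

// The weight scales the computed share, and remove() erases all state for a
// client so it can never appear in the sort() output again.
struct ShareState {
  std::map<std::string, double> allocations;
  std::map<std::string, double> weights;   // defaults to 1.0 if absent
  double total = 1.0;

  double calculateShare(const std::string& client) const {
    auto w = weights.find(client);
    double weight = (w != weights.end()) ? w->second : 1.0;
    auto a = allocations.find(client);
    double allocation = (a != allocations.end()) ? a->second : 0.0;
    return (allocation / total) / weight;  // a larger weight lowers the share
  }

  void remove(const std::string& client) {
    allocations.erase(client);             // the ordering is built from this
    weights.erase(client);                 // state, so an erased client is
  }                                        // never returned
};
{code}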

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325673#comment-15325673
 ] 

Guangya Liu commented on MESOS-5600:


Why? I see that the total resources are not updated in that code block. Comments?

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325668#comment-15325668
 ] 

Klaus Ma edited comment on MESOS-5600 at 6/11/16 2:49 AM:
--

[~gyliu]/[~bmahler], please also check the following code; I think {{dirty}} 
should also be updated accordingly:

https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L48-L85


was (Author: klaus1982):
[~gyliu]/[~bmahler], please also check the following code, I think the `dirty` 
should be also updated accordingly:

https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L48-L85

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325668#comment-15325668
 ] 

Klaus Ma commented on MESOS-5600:
-

[~gyliu]/[~bmahler], please also check the following code; I think the `dirty` 
should also be updated accordingly:

https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L48-L85

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325648#comment-15325648
 ] 

Guangya Liu commented on MESOS-5600:


[~bmahler] Compared with moving the false initialization into the constructor 
initializer list, I prefer keeping the current logic of initializing {{dirty}} 
in sorter.hpp.

My thinking is that if we move the false initialization into the initializer 
list, I would need to add a new constructor for the sorter to initialize dirty 
and update all of the initialization code wherever the sorters are created; that 
would involve many code changes, while the current initialization logic is 
simple and easy to understand. Comments?
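
For readers unfamiliar with the two options being weighed, a generic C++ 
illustration (the {{SorterA}}/{{SorterB}} names are hypothetical and this is 
not the actual sorter code):

{code}
// Option kept in the comment above: an in-class default member initializer
// in the header, so no constructor changes are needed.
class SorterA {
  bool dirty = false;  // initialized where it is declared (e.g. sorter.hpp)
};

// Alternative being discussed: initialize via a constructor initializer
// list, which requires declaring/updating a constructor.
class SorterB {
public:
  SorterB() : dirty(false) {}
private:
  bool dirty;
};
{code}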

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-5600:
---
Component/s: allocation

> "dirty" was never set back as false in sorter
> -
>
> Key: MESOS-5600
> URL: https://issues.apache.org/jira/browse/MESOS-5600
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> dirty was set to true when the total resources were updated in the cluster, but 
> it was never set back to false. dirty should be set back to false in 
> DRFSorter::sort: 
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334
> The reason we did not detect this is that once an agent is added to the 
> cluster, dirty is set to true and the sorter always calls sort() to calculate 
> the share for each framework, which impacts performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5600) "dirty" was never set back as false in sorter

2016-06-10 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5600:
--

 Summary: "dirty" was never set back as false in sorter
 Key: MESOS-5600
 URL: https://issues.apache.org/jira/browse/MESOS-5600
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


dirty was set to true when the total resources were updated in the cluster, but 
it was never set back to false. dirty should be set back to false in 
DRFSorter::sort: 
https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L320-L334

The reason we did not detect this is that once an agent is added to the 
cluster, dirty is set to true and the sorter always calls sort() to calculate 
the share for each framework, which impacts performance.
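
A minimal sketch of the proposed fix; {{DirtySorter}} and 
{{recalculateAndReorder}} are hypothetical, simplified names (the real change 
would go into {{DRFSorter::sort}} at the link above):

{code}
#include <string>
#include <vector>

// Recalculate shares only when something invalidated them, then clear the
// flag so later calls can skip the work.
struct DirtySorter {
  bool dirty = false;
  std::vector<std::string> clients;  // kept ordered by share

  void recalculateAndReorder() { /* recompute every share, reorder clients */ }

  std::vector<std::string> sort() {
    if (dirty) {
      recalculateAndReorder();
      dirty = false;  // the missing reset described in this ticket
    }
    return clients;
  }
};
{code}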



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5524) Expose resource allocation constraints (quota, shares) to schedulers.

2016-06-10 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5524:
---
Summary: Expose resource allocation constraints (quota, shares) to 
schedulers.  (was: Expose resource consumption constraints (quota, shares) to 
schedulers.)

> Expose resource allocation constraints (quota, shares) to schedulers.
> -
>
> Key: MESOS-5524
> URL: https://issues.apache.org/jira/browse/MESOS-5524
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, scheduler api
>Reporter: Benjamin Mahler
>
> Currently, schedulers do not have visibility into their quota or shares of 
> the cluster. By providing this information, we give the scheduler the ability 
> to make better decisions. As we start to allow schedulers to decide how 
> they'd like to use a particular resource (e.g. as non-revocable or 
> revocable), schedulers need visibility into their quota and shares to make an 
> effective decision (otherwise they may accidentally exceed their quota and 
> will not find out until mesos replies with TASK_LOST REASON_QUOTA_EXCEEDED).
> We would start by exposing the following information:
> * quota: e.g. cpus:10, mem:20, disk:40
> * shares: e.g. cpus:20, mem:40, disk:80
> Currently, quota is used for non-revocable resources and the idea is to use 
> shares only for consuming revocable resources since the number of shares 
> available to a role changes dynamically as resources come and go, frameworks 
> come and go, or the operator manipulates the amount of resources sectioned 
> off for quota.
> By exposing quota and shares, the framework knows when it can consume 
> additional non-revocable resources (i.e. when it has fewer non-revocable 
> resources allocated to it than its quota) or when it can consume revocable 
> resources (always! but in the future, it cannot revoke another user's 
> revocable resources if the framework is above its fair share).
> This also allows schedulers to determine whether they have sufficient quota 
> assigned to them, and to alert the operator if they need more to run safely. 
> Also, by viewing their fair share, the framework can expose monitoring 
> information that shows the discrepancy between how much it would like and its 
> fair share (note that the framework can actually exceed its fair share but in 
> the future this will mean increased potential for revocation).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5578) Support static address allocation in CNI

2016-06-10 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325545#comment-15325545
 ] 

Qian Zhang commented on MESOS-5578:
---

Agreed, we definitely need to avoid plugin-specific implementations in our CNI 
isolator.

> Support static address allocation in CNI
> 
>
> Key: MESOS-5578
> URL: https://issues.apache.org/jira/browse/MESOS-5578
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently a framework can't specify a static IP address for the container 
> when using the network/cni isolator.
> The `ipaddress` field in the `NetworkInfo` protobuf was designed for this 
> specific purpose but since the CNI spec does not specify a means to allocate 
> an IP address to the container the `network/cni` isolator cannot honor this 
> field even when it is filled in by the framework.
> Creating this ticket to act as a place holder to track this limitation. As 
> and when the CNI spec allows us to specify a static IP address for the 
> container, we can resolve this ticket. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5593:
--
Fix Version/s: 1.0.0

> Devolve v1 operator protos before using them in Master/Agent.
> -
>
> Key: MESOS-5593
> URL: https://issues.apache.org/jira/browse/MESOS-5593
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: haosdent
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> We had adopted the following workflow for the Scheduler/Executor endpoints on 
> the Master/Agent.
> - The user makes a call to the versioned endpoint with a versioned protobuf. 
> e.g., {{v1::mesos::Call}}
> - We {{devolve}} the versioned protobuf into an unversioned protobuf before 
> using it internally.
> {code}
> scheduler::Call call = devolve(v1Call);
> {code}
> The above approach has the advantage that the internal Mesos code only has to 
> deal with unversioned protobufs. It looks like we have not been following 
> this idiom for the Operator API. We should create an unversioned protobuf file 
> similar to what we did for the Scheduler/Executor API and then {{devolve}} the 
> versioned protobufs (e.g., mesos/master/master.proto).
> The signatures of some of the operator endpoints would then change to deal 
> only with unversioned protobufs:
> {code}
> Future<Response> Master::Http::getHealth(
> const master::Call& call,
> const Option<string>& principal,
> const ContentType& contentType) const
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5586) Move design docs from wiki to web page

2016-06-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325310#comment-15325310
 ] 

Vinod Kone commented on MESOS-5586:
---

Thanks for breaking down the tickets. Looks like you missed creating the *Epic* 
 type ticket?

> Move design docs from wiki to web page
> --
>
> Key: MESOS-5586
> URL: https://issues.apache.org/jira/browse/MESOS-5586
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Tomasz Janiszewski
>Assignee: Tomasz Janiszewski
>Priority: Minor
>
> {quote}
> Hi folks,
> I am proposing moving our content in Wiki (e.g., working groups, release
> tracking, etc.) to our docs in the code repo. I personally found that wiki
> is hard to use and there's no reviewing process for changes in the Wiki.
> The content in Wiki historically received less attention than that in the
> docs.
> What do you think?
> - Jie
> {quote}
> http://www.mail-archive.com/dev@mesos.apache.org/msg35506.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-10 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
  Sprint: Mesosphere Sprint 37
Story Points: 5

> Masters may drop the first message they send between masters after a network 
> partition
> --
>
> Key: MESOS-5576
> URL: https://issues.apache.org/jira/browse/MESOS-5576
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master, replicated log
>Affects Versions: 0.28.2
> Environment: Observed in an OpenStack environment where each master 
> lives on a separate VM.
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We observed the following situation in a cluster of five masters:
> || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
> | 0 | Follower | Follower | Follower | Follower | Leader |
> | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster 
> by downing this VM's network ||
> | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost 
> leadership |
> | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to 
> leader | Still down |
> | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | 
> Still down |
> | 5 | Leader | Follower | Follower | Follower | Still down |
> | 6 | Leader | Follower | Follower | Follower | Comes back up |
> | 7 | Leader | Follower | Follower | Follower | Follower |
> | 8 || Partitioned in the same way as Master 5 | Follower | Follower | 
> Follower | Follower |
> | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | 
> Follower | Follower |
> | 10 | Still down | Performs consensus | Replies to leader | Replies to 
> leader || Doesn't get the message! ||
> | 11 | Still down | Performs writing | Acks to leader | Acks to leader || 
> Acks to leader ||
> | 12 | Still down | Leader | Follower | Follower | Follower |
> Master 2 sends a series of messages to the recently-restarted Master 5.  The 
> first message is dropped, but subsequent messages are not dropped.
> This appears to be due to a stale link between the masters.  Before leader 
> election, the replicated log actors create a network watcher, which adds 
> links to masters that join the ZK group:
> https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
> This link does not appear to break (Master 2 -> 5) when Master 5 goes down, 
> perhaps due to how the network partition was induced (in the hypervisor 
> layer, rather than in the VM itself).
> When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
> observe the [expected log 
> message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494].
> Instead, we see a log line in Master 2:
> {code}
> process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is 
> not connected
> {code}
> The broken link is removed by the libprocess {{socket_manager}} and the 
> following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new 
> socket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5587) FullFrameworkWriter makes master segmentation fault.

2016-06-10 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325273#comment-15325273
 ] 

Joerg Schad commented on MESOS-5587:


Potentially the sandbox authorization has the same issue:
see https://reviews.apache.org/r/48566/

> FullFrameworkWriter makes master segmentation fault.
> 
>
> Key: MESOS-5587
> URL: https://issues.apache.org/jira/browse/MESOS-5587
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gilbert Song
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: authentication, mesosphere
> Fix For: 1.0.0
>
>
> FullFrameworkWriter::operator() may take down the master. Here is the log:
> {noformat}
> Jun 09 02:28:42 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:42.147253 
> 18633 master.cpp:5772] Sending 1 offers to framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-0001 (chronos) at 
> scheduler-c9cb7c2c-ae6b-4a34-8663-6a52980161c1@10.10.0.20:39285
> Jun 09 02:28:42 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:42.148890 
> 18637 master.cpp:4066] Processing DECLINE call for offers: [ 
> 7567c338-3ae5-4a84-bf5b-6a75a8a49341-O992 ] for framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-0001 (chronos) at 
> scheduler-c9cb7c2c-ae6b-4a34-8663-6a52980161c1@10.10.0.20:39285
> Jun 09 02:28:42 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:42.639813 
> 18632 http.cpp:483] HTTP GET for /master/state-summary from 10.10.0.180:45790 
> with User-Agent='python-requests/2.6.0 CPython/3.4.2 
> Linux/3.10.0-327.10.1.el7.x86_64'
> Jun 09 02:28:42 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:42.890702 
> 18632 http.cpp:483] HTTP GET for /master/state from 10.10.0.181:33830 with 
> User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) 
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36'
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.139240 
> 18639 http.cpp:483] HTTP GET for /master/state-summary from 10.10.0.181:33831 
> with User-Agent='python-requests/2.6.0 CPython/3.4.2 
> Linux/3.10.0-327.18.2.el7.x86_64'
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.148582 
> 18633 master.cpp:5772] Sending 1 offers to framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 (Long Lived Framework (C++))
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.150388 
> 18635 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 
> 10.10.0.178:51645
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.150645 
> 18635 master.cpp:3457] Processing ACCEPT call for offers: [ 
> 7567c338-3ae5-4a84-bf5b-6a75a8a49341-O993 ] on agent 
> 091e9c3f-8a01-4890-8790-48b75fd81b40-S0 at slave(1)@10.10.0.20:5051 
> (10.10.0.20) for framework 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 (Long 
> Lived Framework (C++))
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.151268 
> 18635 master.hpp:178] Adding task 5699 with resources cpus(*):0.001; mem(*):1 
> on agent 091e9c3f-8a01-4890-8790-48b75fd81b40-S0 (10.10.0.20)
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.151322 
> 18635 master.cpp:3946] Launching task 5699 of framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 (Long Lived Framework (C++)) with 
> resources cpus(*):0.001; mem(*):1 on agent 
> 091e9c3f-8a01-4890-8790-48b75fd81b40-S0 at slave(1)@10.10.0.20:5051 
> (10.10.0.20)
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.160475 
> 18635 master.cpp:5211] Status update TASK_RUNNING (UUID: 
> 3f651ba8-7c80-4ac0-ae18-579371ec82d5) for task 5699 of framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 from agent 
> 091e9c3f-8a01-4890-8790-48b75fd81b40-S0 at slave(1)@10.10.0.20:5051 
> (10.10.0.20)
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.160516 
> 18635 master.cpp:5259] Forwarding status update TASK_RUNNING (UUID: 
> 3f651ba8-7c80-4ac0-ae18-579371ec82d5) for task 5699 of framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.160645 
> 18635 master.cpp:6871] Updating the state of task 5699 of framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.161842 
> 18639 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 
> 10.10.0.178:51645
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.161912 
> 18639 master.cpp:4365] Processing ACKNOWLEDGE call 
> 3f651ba8-7c80-4ac0-ae18-579371ec82d5 for task 5699 of framework 
> 4c6031e7-4cfd-4219-89b2-d19c7101e045-0001 (Long Lived Framework (C++)) on 
> agent 091e9c3f-8a01-4890-8790-48b75fd81b40-S0
> Jun 09 02:28:43 ip-10-10-0-180 mesos-master[18627]: I0609 02:28:43.556354 
> 

[jira] [Commented] (MESOS-2105) Reliably report OOM even if the executor exits normally

2016-06-10 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325265#comment-15325265
 ] 

Greg Mann commented on MESOS-2105:
--

We recently observed this on an internal test cluster. An executor was 
OOM-killed before the cgroup mem isolator was able to destroy the offending 
container. Here are the kernel logs from the agent machine interleaved with the 
Mesos agent logs:
{code}
Jun 10 16:14:47 ip-10-10-0-87 mesos-slave[3038]: I0610 16:14:47.434166  3044 
mem.cpp:644] OOM detected for container d9d84892-1165-43a2-9675-10b88be141f4
Jun 10 16:14:47 ip-10-10-0-87 kernel: docker0: port 1(vethb30b136) entered 
forwarding state
Jun 10 16:14:47 ip-10-10-0-87 kernel: balloon-executo invoked oom-killer: 
gfp_mask=0xd0, order=0, oom_score_adj=0
Jun 10 16:14:47 ip-10-10-0-87 kernel: balloon-executo cpuset=/ mems_allowed=0
Jun 10 16:14:47 ip-10-10-0-87 kernel: CPU: 2 PID: 23924 Comm: balloon-executo 
Tainted: G    T 3.10.0-327.10.1.el7.x86_64 #1
Jun 10 16:14:47 ip-10-10-0-87 kernel: Hardware name: Xen HVM domU, BIOS 
4.2.amazon 05/12/2016
Jun 10 16:14:47 ip-10-10-0-87 kernel:  8803a6463980 9a29939c 
88025f85bcd0 816352cc
Jun 10 16:14:47 ip-10-10-0-87 kernel:  88025f85bd60 8163026c 
8802ec7265b8 0001
Jun 10 16:14:47 ip-10-10-0-87 kernel:  0003 fffeefff 
0001 8803a6467803
Jun 10 16:14:47 ip-10-10-0-87 kernel: Call Trace:
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] dump_stack+0x19/0x1b
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
dump_header+0x8e/0x214
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
oom_kill_process+0x24e/0x3b0
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] ? 
has_capability_noaudit+0x1e/0x30
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
mem_cgroup_oom_synchronize+0x555/0x580
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] ? 
mem_cgroup_charge_common+0xc0/0xc0
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
pagefault_out_of_memory+0x14/0x90
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
mm_fault_error+0x68/0x12b
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
__do_page_fault+0x3e2/0x450
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] 
do_page_fault+0x23/0x80
Jun 10 16:14:47 ip-10-10-0-87 kernel:  [] page_fault+0x28/0x30
Jun 10 16:14:47 ip-10-10-0-87 kernel: Task in 
/mesos/d9d84892-1165-43a2-9675-10b88be141f4 killed as a result of limit of 
/mesos/d9d84892-1165-43a2-9675-10b88be141f4
Jun 10 16:14:47 ip-10-10-0-87 kernel: memory: usage 196608kB, limit 196608kB, 
failcnt 50
Jun 10 16:14:47 ip-10-10-0-87 kernel: memory+swap: usage 196608kB, limit 
9007199254740991kB, failcnt 0
Jun 10 16:14:47 ip-10-10-0-87 kernel: kmem: usage 0kB, limit 
9007199254740991kB, failcnt 0
Jun 10 16:14:47 ip-10-10-0-87 kernel: Memory cgroup stats for 
/mesos/d9d84892-1165-43a2-9675-10b88be141f4: cache:0KB rss:196608KB 
rss_huge:188416KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:
Jun 10 16:14:47 ip-10-10-0-87 kernel: [ pid ]   uid  tgid total_vm  rss 
nr_ptes swapents oom_score_adj name
Jun 10 16:14:47 ip-10-10-0-87 kernel: [23886] 0 23886 2378  288 
 110 0 sh
Jun 10 16:14:47 ip-10-10-0-87 kernel: [23914] 0 23914   22324052827 
1590 0 balloon-executo
Jun 10 16:14:47 ip-10-10-0-87 kernel: Memory cgroup out of memory: Kill process 
23924 (balloon-executo) score 1045 or sacrifice child
Jun 10 16:14:47 ip-10-10-0-87 kernel: Killed process 23914 (balloon-executo) 
total-vm:892960kB, anon-rss:196168kB, file-rss:15140kB
Jun 10 16:14:47 ip-10-10-0-87 mesos-slave[3038]: I0610 16:14:47.600641  3043 
slave.cpp:3788] executor(1)@10.10.0.87:37878 exited
{code}

> Reliably report OOM even if the executor exits normally
> ---
>
> Key: MESOS-2105
> URL: https://issues.apache.org/jira/browse/MESOS-2105
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Affects Versions: 0.20.0
>Reporter: Ian Downes
>
> Container OOMs are asynchronously reported by the kernel and the following 
> sequence can occur:
> 1) Container OOMs
> 2) Kernel chooses to kill the task
> 3) Executor notices, reports TASK_FAILED, then exits
> 4) MesosContainerizer sees executor exit, *doesn't check for an OOM*, and 
> destroys the container
> 5) Memory isolator may or may not have seen the OOM event but the container 
> is destroyed anyway.
> The task is reported to have failed but without including the cause.
> Suggest always checking if an OOM has occurred, even if the executor exits 
> normally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5536) Completed executors presented as alive

2016-06-10 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325264#comment-15325264
 ] 

Anand Mazumdar commented on MESOS-5536:
---

Thanks for reporting this issue [~janisz]. From the logs, this looks related to 
MESOS-5380, which we recently fixed. Can you try this with Mesos 0.28.2?

> Completed executors presented as alive
> --
>
> Key: MESOS-5536
> URL: https://issues.apache.org/jira/browse/MESOS-5536
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
> Environment: Ubuntu 14.04.3 LTS
>Reporter: Tomasz Janiszewski
>
> I'm running Mesos 0.28.0. The Mesos {{slave(1)/state}} endpoint returns some 
> completed executors not in frameworks.completed_executors but in 
> frameworks.executors. These executors also appear in {{monitor/statistics}}.
> {code:JavaScript:title=slave(1)/state}
> {
> "attributes": {...},
> "completed_frameworks": [],
> "flags": {...},
> "frameworks": [
> {
> "checkpoint": true,
> "completed_executors": [...],
> "executors": [
>   {
>   "queued_tasks": [],
>   "tasks": [],
>   "completed_tasks": [
>   {
>   "discovery": {...},
>   "executor_id": "",
>   "framework_id": 
> "f65b163c-0faf-441f-ac14-91739fa4394c-",
>   "id": 
> "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
>   "labels": [...],
>   "name": "service",
>   "resources": {...},
>   "slave_id": 
> "ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13",
>   "state": "TASK_KILLED",
>   "statuses": []
>   }
>   ],
>   "container": "ead42e63-ac92-4ad0-a99c-4af9c3fa5e31",
>   "directory": "...",
>   "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
>   "name": "Command Executor (Task: 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e) (Command: sh -c 'cd 
> service...')",  
>   "resources": {...},
>   "source": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e"
>   
>   },
>   ...
> ],
> }
> ],
> "git_sha": "961edbd82e691a619a4c171a7aadc9c32957fa73",
> "git_tag": "0.28.0",
> "version": "0.28.0",
> ...
> }
> {code}
> {code:title="var/log/mesos/mesos-slave.INFO"}
> 13:33:19.479182  [slave.cpp:1361] Got assigned task 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework 
> f65b163c-0faf-441f-ac14-91739fa4394c-
> 13:33:19.482566  [slave.cpp:1480] Launching task 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework 
> f65b163c-0faf-441f-ac14-91739fa4394c-
> 13:33:19.483921  [paths.cpp:528] Trying to chown 
> '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31'
>  to user 'mesosuser'
> 13:33:19.504173  [slave.cpp:5367] Launching executor 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework 
> f65b163c-0faf-441f-ac14-91739fa4394c- with resources cpus(*):0.1; 
> mem(*):32 in work directory 
> '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31'
> 13:33:19.505537  [containerizer.cpp:666] Starting container 
> 'ead42e63-ac92-4ad0-a99c-4af9c3fa5e31' for executor 
> 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework 
> 'f65b163c-0faf-441f-ac14-91739fa4394c-'
> 13:33:19.505734  [slave.cpp:1698] Queuing task 
> 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' for executor 
> 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework 
> f65b163c-0faf-441f-ac14-91739fa4394c-
> ...
> 13:33:19.977483  [containerizer.cpp:1118] Checkpointing executor's forked pid 
> 25576 to 
> '/tmp/mesos/meta/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31/pids/forked.pid'
> 13:33:35.775195  [slave.cpp:1891] Asked to kill task 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework 
> f65b163c-0faf-441f-ac14-91739fa4394c-
> 13:33:35.775645  [slave.cpp:3002] Handling status update TASK_KILLED (UUID: 
> eba64915-7df2-483d-8982-a9a46a48a81b) for task 
> service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework 
> 

[jira] [Created] (MESOS-5599) oom

2016-06-10 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5599:


 Summary: oom
 Key: MESOS-5599
 URL: https://issues.apache.org/jira/browse/MESOS-5599
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4749) Move HTB out of containers

2016-06-10 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4749:
--
Component/s: network

> Move HTB out of containers
> --
>
> Key: MESOS-4749
> URL: https://issues.apache.org/jira/browse/MESOS-4749
> Project: Mesos
>  Issue Type: Task
>  Components: network
>Reporter: Cong Wang
>Assignee: Cong Wang
>Priority: Minor
>
> Currently we set a fixed HTB bandwidth in each of the containers, which makes 
> it impossible to share the link when it is idle. As the first step, we should 
> move it out of the containers, into the qdisc hierarchy of the physical interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4749) Move HTB out of containers

2016-06-10 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4749:
--
Shepherd: Jie Yu
  Sprint: Mesosphere Sprint 36
Story Points: 3

> Move HTB out of containers
> --
>
> Key: MESOS-4749
> URL: https://issues.apache.org/jira/browse/MESOS-4749
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>Priority: Minor
>
> Currently we set a fixed HTB bandwidth in each of the containers, which makes 
> it impossible to share the link when it is idle. As the first step, we should 
> move it out of the containers, into the qdisc hierarchy of the physical interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5598) pailer dies and no longer spools logs from docker container

2016-06-10 Thread John Camelon (JIRA)
John Camelon created MESOS-5598:
---

 Summary: pailer dies and no longer spools logs from docker 
container
 Key: MESOS-5598
 URL: https://issues.apache.org/jira/browse/MESOS-5598
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.2
Reporter: John Camelon


There are numerous instances where we see the pailer choke on logs and stop 
updating.

When I ssh into the host where the container is running, running "docker logs" 
yields much more output past the point where the pailer stopped working.

I am not sure which logs I am supposed to gather to diagnose this; please let me 
know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5597) Document Mesos "health check" feature

2016-06-10 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5597:
--

 Summary: Document Mesos "health check" feature
 Key: MESOS-5597
 URL: https://issues.apache.org/jira/browse/MESOS-5597
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Neil Conway


We don't talk about this feature at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5596) Document agent SIGTERM behavior

2016-06-10 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5596:
--

 Summary: Document agent SIGTERM behavior
 Key: MESOS-5596
 URL: https://issues.apache.org/jira/browse/MESOS-5596
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Neil Conway
Priority: Minor


Sending a SIGTERM to agents can be useful; we should document how 
agents/masters handle this situation, versus a spontaneous agent disconnection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5595) GMock warning in FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout

2016-06-10 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5595:
---
Shepherd: Anand Mazumdar

> GMock warning in FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout
> ---
>
> Key: MESOS-5595
> URL: https://issues.apache.org/jira/browse/MESOS-5595
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Trivial
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: error(0x7fff573067c0, @0x7fbccb44a920 "Framework 
> disconnected")
> Stack trace:
> [   OK ] FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout (181 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5595) GMock warning in FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout

2016-06-10 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5595:
--

 Summary: GMock warning in 
FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout
 Key: MESOS-5595
 URL: https://issues.apache.org/jira/browse/MESOS-5595
 Project: Mesos
  Issue Type: Bug
  Components: tests
Reporter: Neil Conway
Assignee: Neil Conway
Priority: Trivial


{noformat}
[ RUN  ] FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout

GMOCK WARNING:
Uninteresting mock function call - returning directly.
Function call: error(0x7fff573067c0, @0x7fbccb44a920 "Framework 
disconnected")
Stack trace:
[   OK ] FaultToleranceTest.SchedulerReregisterAfterFailoverTimeout (181 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-10 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324043#comment-15324043
 ] 

Joerg Schad commented on MESOS-5588:


The problem arises from the fact that the protobuf parser ignores unknown 
fields:
https://github.com/apache/mesos/blob/master/3rdparty/stout/include/stout/protobuf.hpp#L599
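
As a toy illustration of the failure mode (generic C++, not the stout parser 
linked above; {{ViewFrameworksAcl}} and {{parse}} are hypothetical names): a 
parser that silently skips keys it does not recognize turns a misspelled 
restriction into no restriction at all.

{code}
#include <iostream>
#include <map>
#include <string>

// Toy "ACL" with the two fields the real entry is supposed to carry.
struct ViewFrameworksAcl {
  std::string principals = "ANY";  // default used when a key is missing
  std::string users = "ANY";
};

// A lenient parser in the spirit of the one linked above: unknown keys
// (like the misspelled "usr") are ignored instead of reported.
ViewFrameworksAcl parse(const std::map<std::string, std::string>& json) {
  ViewFrameworksAcl acl;
  for (const auto& [key, value] : json) {
    if (key == "principals") {
      acl.principals = value;
    } else if (key == "users") {
      acl.users = value;
    }
    // else: silently dropped -- this is the dangerous part.
  }
  return acl;
}

int main() {
  // "usr" should have been "users"; the NONE restriction is lost.
  ViewFrameworksAcl acl = parse({{"principals", "ANY"}, {"usr", "NONE"}});
  std::cout << acl.principals << " can view frameworks of users: "
            << acl.users << std::endl;  // prints ANY ... ANY
  return 0;
}
{code}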

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> During parsing of the authorizer ACLs, errors are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework.
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)