[jira] [Updated] (MESOS-8194) Make agent support status updates for operations affecting default resources.

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8194:
-
Description: 
The agent should apply the operations and send {{OperationStatusUpdates}} to 
the master.
The agent must hold a status update manager to send these updates.
The agent's acknowledgement handler should also be updated to acknowledge 
updates.

  was:The operations should be applied and it should send 
{{OperationStatusUpdates}} to the master.


> Make agent support status updates for operations affecting default resources.
> -
>
> Key: MESOS-8194
> URL: https://issues.apache.org/jira/browse/MESOS-8194
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The agent should apply the operations and send {{OperationStatusUpdates}} to 
> the master.
> The agent must hold a status update manager to send these updates.
> The agent's acknowledgement handler should also be updated to acknowledge 
> updates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8194) Make agent support status updates for operations affecting default resources.

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8194:
-
Summary: Make agent support status updates for operations affecting default 
resources.  (was: Make agent’s ApplyOfferOperationMessage handler support 
operations affecting default resources.)

> Make agent support status updates for operations affecting default resources.
> -
>
> Key: MESOS-8194
> URL: https://issues.apache.org/jira/browse/MESOS-8194
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The operations should be applied and it should send 
> {{OperationStatusUpdates}} to the master.





[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8186:
-
Description: 
The handler should forward acks to the RP manager for resource provider 
operations.
Handling of operations on agent default resources will be taken care of as part 
of MESOS-8194

  was:
The handler should forward acks to the RP manager for resource provider 
operations.
Handling of operations on agent default resources will be taken care of as part 
of 


> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> The handler should forward acks to the RP manager for resource provider 
> operations.
> Handling of operations on agent default resources will be taken care of as 
> part of MESOS-8194
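
The routing described above can be sketched roughly as follows. This is an illustration only, not the Mesos implementation: `Ack`, `ResourceProviderManager`, `Route`, and `routeAcknowledgement` are hypothetical stand-ins, and an empty resource provider ID is assumed to mean "agent default resources".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an offer operation update acknowledgement.
struct Ack {
  std::string statusUuid;          // Identifies the acknowledged update.
  std::string resourceProviderId;  // Assumed empty for agent default resources.
};

// Hypothetical stand-in for the resource provider (RP) manager.
struct ResourceProviderManager {
  std::vector<Ack> forwarded;
  void acknowledge(const Ack& ack) { forwarded.push_back(ack); }
};

enum class Route { RESOURCE_PROVIDER, AGENT_DEFAULT };

// Forwards the ack to the RP manager when it targets a resource provider.
// Otherwise it only reports that the agent's own path (MESOS-8194) applies.
Route routeAcknowledgement(const Ack& ack, ResourceProviderManager& manager) {
  assert(!ack.statusUuid.empty());
  if (!ack.resourceProviderId.empty()) {
    manager.acknowledge(ack);
    return Route::RESOURCE_PROVIDER;
  }
  return Route::AGENT_DEFAULT;
}
```

Returning the chosen route instead of handling default resources inline keeps the sketch aligned with the split between this ticket and MESOS-8194.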





[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8186:
-
Description: 
The handler should forward acks to the RP manager for resource provider 
operations.
Handling of operations on agent default resources will be taken care of as part 
of 

  was:The handler should handle acks for operations handled by the agent, and 
forward the ack to the RP manager for all other operations.


> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> The handler should forward acks to the RP manager for resource provider 
> operations.
> Handling of operations on agent default resources will be taken care of as 
> part of 





[jira] [Commented] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283060#comment-16283060
 ] 

Greg Mann commented on MESOS-8186:
--

{code}
commit 761f47fe8e51ca3065a156308220e2a1ab61a628
Author: Greg Mann 
Date:   Thu Dec 7 11:36:20 2017 -0800

Added status update acknowledgement to resource provider manager.

When the agent receives an offer operation update acknowledgement
from the master, it is forwarded to the relevant resource
provider via the resource provider manager.

Review: https://reviews.apache.org/r/64145/
{code}
{code}
commit cfb634b6742d3b6d74bef079473663d915014cfc
Author: Greg Mann 
Date:   Thu Dec 7 11:36:17 2017 -0800

Added offer operation update acknowledgement to the agent.

The agent's 'offerOperationUpdateAcknowlegement' handler is
updated to pass acknowledgements to the resource provider
manager.

The agent's resource provider message handler is also
updated to avoid removing offer operations, since this
should actually be done upon acknowledgement of the update.

Review: https://reviews.apache.org/r/64146/
{code}
{code}
commit ecef069b8349cdb9d98b6267636d4cb66948da20
Author: Greg Mann 
Date:   Thu Dec 7 11:36:14 2017 -0800

Added ACKNOWLEDGE event to the resource provider API.

The new event is used to send offer operation status update
acknowledgements to resource providers.

Review: https://reviews.apache.org/r/64143/
{code}

> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> The handler should handle acks for operations handled by the agent, and 
> forward the ack to the RP manager for all other operations.





[jira] [Commented] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283056#comment-16283056
 ] 

Greg Mann commented on MESOS-8193:
--

{code}
commit 328c1c11690dc112c4af4a5cc79ca0d688ebfb19
Author: Greg Mann 
Date:   Thu Dec 7 11:35:55 2017 -0800

Updated master ACCEPT handler to disallow offer operation feedback.

This patch updates the master's ACCEPT call code path to fail
offer operations when their `id` field is set. Since protobufs
have already been updated for offer operation feedback, but the
feature is not fully implemented, we will disallow the setting
of this field for now.

Review: https://reviews.apache.org/r/64142/
{code}

> Update master’s OfferOperationStatusUpdate handler to acknowledge updates to 
> the agent if OfferOperationID is not set.
> --
>
> Key: MESOS-8193
> URL: https://issues.apache.org/jira/browse/MESOS-8193
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 1.5.0
>
>






[jira] [Comment Edited] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283056#comment-16283056
 ] 

Greg Mann edited comment on MESOS-8193 at 12/8/17 4:53 AM:
---

{code}
commit 641b716dca7fdb1da7402c4e5728ff33f85d5d74
Author: Greg Mann 
Date:   Thu Dec 7 11:36:10 2017 -0800

Made master acknowledge offer operation updates when 'id' isn't set.

When a framework does not request feedback about an operation,
the master should acknowledge offer operation status updates
to the agent so that the updates are not retried.

Review: https://reviews.apache.org/r/64144/
{code}
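
A minimal sketch of the decision described in the commit message above, assuming a simplified update type. `OperationUpdate`, `MasterAction`, and `handleUpdate` are hypothetical names. The branch for updates that do carry an `id` is an assumption about how feedback would eventually work, since the commits in this thread still disallow setting the field.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified offer operation status update.
struct OperationUpdate {
  std::string statusUuid;   // Identifies this update for acknowledgement.
  std::string operationId;  // Empty when the framework did not set an 'id'.
};

enum class MasterAction { ACKNOWLEDGE_TO_AGENT, FORWARD_TO_FRAMEWORK };

// Decides what the master does with an incoming update.
MasterAction handleUpdate(const OperationUpdate& update) {
  assert(!update.statusUuid.empty());
  if (update.operationId.empty()) {
    // No feedback was requested, so the master acknowledges the update
    // itself and the agent stops retrying it.
    return MasterAction::ACKNOWLEDGE_TO_AGENT;
  }
  // Assumption: with feedback requested, the framework would acknowledge.
  return MasterAction::FORWARD_TO_FRAMEWORK;
}
```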


was (Author: greggomann):
{code}
commit 328c1c11690dc112c4af4a5cc79ca0d688ebfb19
Author: Greg Mann 
Date:   Thu Dec 7 11:35:55 2017 -0800

Updated master ACCEPT handler to disallow offer operation feedback.

This patch updates the master's ACCEPT call code path to fail
offer operations when their `id` field is set. Since protobufs
have already been updated for offer operation feedback, but the
feature is not fully implemented, we will disallow the setting
of this field for now.

Review: https://reviews.apache.org/r/64142/
{code}

> Update master’s OfferOperationStatusUpdate handler to acknowledge updates to 
> the agent if OfferOperationID is not set.
> --
>
> Key: MESOS-8193
> URL: https://issues.apache.org/jira/browse/MESOS-8193
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 1.5.0
>
>






[jira] [Commented] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283055#comment-16283055
 ] 

Greg Mann commented on MESOS-8190:
--

We'll pause this for a bit, landing other storage-related stuff first.

> Update the master to accept OfferOperationIDs from frameworks.
> --
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Master’s {{ACCEPT}} handler should send failed operation updates when a 
> framework sets the {{OfferOperationID}} on an operation destined for an agent 
> without the {{RESOURCE_PROVIDER}} capability.
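
The check in the description above might look roughly like this. It is a sketch under stated assumptions: `Operation`, `Agent`, and `operationsToFail` are hypothetical, capabilities are modelled as plain strings, and an empty `id` stands for "not set".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified operation: 'id' is empty when not set.
struct Operation {
  std::string id;
};

// Hypothetical agent record holding its advertised capabilities.
struct Agent {
  std::set<std::string> capabilities;
};

// Returns the IDs of the operations for which the master would send a
// failed operation update: those with an ID set, destined for an agent
// that lacks the RESOURCE_PROVIDER capability.
std::vector<std::string> operationsToFail(
    const Agent& agent,
    const std::vector<Operation>& operations) {
  std::vector<std::string> failed;
  if (agent.capabilities.count("RESOURCE_PROVIDER") > 0) {
    return failed;
  }
  for (const Operation& operation : operations) {
    if (!operation.id.empty()) {
      failed.push_back(operation.id);
    }
  }
  return failed;
}
```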





[jira] [Commented] (MESOS-8197) Implement a library to send offer operation status updates

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283050#comment-16283050
 ] 

Greg Mann commented on MESOS-8197:
--

{code}
commit 8844591685c7395552bc740954bdf83ff0fc67d0
Author: Gaston Kleiman 
Date:   Thu Dec 7 19:59:51 2017 -0800

Implemented the `OfferOperationStatusUpdateManager`.

This class will handle the offer operation status updates generated by
the agent and by resource providers.

Review: https://reviews.apache.org/r/64096/
{code}
{code}
commit 7ce169cf0895cdf512be4cc756a0ae67a402cebe
Author: Gaston Kleiman 
Date:   Thu Dec 7 19:59:49 2017 -0800

Added a generic actor to be used by status update managers.

This actor handles the checkpointing, recovery, and retry of status
updates.

It will initially be used by the offer operation status update
manager, but it was designed and implemented so that it can replace
the current implementation of the task status update manager.

Review: https://reviews.apache.org/r/64095/
{code}
{code}
commit 680ccee60c29747a1e929e4b7bb8dbed5216
Author: Gaston Kleiman 
Date:   Thu Dec 7 19:59:48 2017 -0800

Added the `OfferOperationStatusUpdateRecord` protobuf message.

This protobuf message is used to checkpoint offer operation status
updates and acknowledgments.

Review: https://reviews.apache.org/r/64094/
{code}
{code}
commit 06209e614c6f4a0593d017ab293bd1819acbc02b
Author: Gaston Kleiman 
Date:   Thu Dec 7 19:59:46 2017 -0800

Added operators for offer operation update protobuf classes.

Review: https://reviews.apache.org/r/64093/
{code}
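
To illustrate the checkpointing, recovery, and retry behaviour attributed to the generic actor in the commits above, here is a deliberately small sketch. It is not the Mesos class: `Record`, `UpdateStream`, and their methods are hypothetical, the "checkpoint" is an in-memory log rather than a file, and retry is modelled by `next()` returning the same update until it is acknowledged.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// One checkpointed record: either an update or the ack of an update.
struct Record {
  enum Type { UPDATE, ACK } type;
  std::string uuid;
};

// Hypothetical single-stream manager. Updates are checkpointed, sent in
// order, and the front update is offered again until it is acknowledged.
class UpdateStream {
public:
  // Checkpoints and enqueues a new update.
  void update(const std::string& uuid) {
    assert(!uuid.empty());
    log.push_back(Record{Record::UPDATE, uuid});
    pending.push_back(uuid);
  }

  // Returns the update to (re)send, or an empty string if none is pending.
  std::string next() const {
    return pending.empty() ? std::string() : pending.front();
  }

  // Checkpoints the ack and drops the update. Acks that do not match the
  // front update are rejected.
  bool acknowledge(const std::string& uuid) {
    if (pending.empty() || pending.front() != uuid) {
      return false;
    }
    log.push_back(Record{Record::ACK, uuid});
    pending.pop_front();
    return true;
  }

  // Rebuilds a stream by replaying a checkpointed log.
  static UpdateStream recover(const std::vector<Record>& records) {
    UpdateStream stream;
    for (const Record& record : records) {
      if (record.type == Record::UPDATE) {
        stream.update(record.uuid);
      } else {
        stream.acknowledge(record.uuid);
      }
    }
    return stream;
  }

  const std::vector<Record>& checkpoint() const { return log; }

private:
  std::vector<Record> log;
  std::deque<std::string> pending;
};
```

Replaying the log on recovery means an update that was checkpointed but never acknowledged is pending again after a restart, which is the property a retrying status update manager needs.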

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
> Fix For: 1.5.0
>
>






[jira] [Commented] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-12-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283052#comment-16283052
 ] 

Greg Mann commented on MESOS-8199:
--

{code}
commit 9cb85e9551254c359f9d3989701ac5bd9e9adf8f
Author: Greg Mann 
Date:   Thu Dec 7 11:36:22 2017 -0800

Added plumbing for master to reconcile offer operations with agent.

This patch adds the RECONCILE_OFFER_OPERATIONS event to the resource
provider API, along with the internal message
'ReconcileOfferOperationsMessage' used for explicit operation
reconciliation between master and agent. Handlers for these are
added to the agent and resource provider manager as well.

This explicit reconciliation is useful in cases where an agent's
'UpdateSlaveMessage' races with an incoming task launch so that
the master's view of the agent's state is not consistent with the
agent's actual state when the 'UpdateSlaveMessage' was sent.

Review: https://reviews.apache.org/r/63804/
{code}

> Add plumbing for explicit offer operation reconciliation between master, 
> agent, and RPs.
> 
>
> Key: MESOS-8199
> URL: https://issues.apache.org/jira/browse/MESOS-8199
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>






[jira] [Assigned] (MESOS-8312) Pass resource provider information to master as part of UpdateSlaveMessage

2017-12-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-8312:
-

Assignee: Benjamin Bannier

> Pass resource provider information to master as part of UpdateSlaveMessage
> --
>
> Key: MESOS-8312
> URL: https://issues.apache.org/jira/browse/MESOS-8312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> We extended {{UpdateSlaveMessage}} so updates to an agent's total resources 
> from resource providers are possible. We realized that we will need to 
> explicitly pass resource provider details (here for now: 
> {{ResourceProviderInfo}}) to the master so it can be queried for the 
> providers present on certain agents. This should happen as part of 
> {{UpdateSlaveMessage}} so a single synchronization channel is used for this 
> kind of information.
> We need to adjust {{UpdateSlaveMessage}} for these requirements. This should 
> happen before 1.5.0 gets released so we do not need to deprecate a never 
> really used message format.





[jira] [Comment Edited] (MESOS-7663) Update the documentation to reflect the addition of reservation refinement.

2017-12-07 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282825#comment-16282825
 ] 

Michael Park edited comment on MESOS-7663 at 12/8/17 12:50 AM:
---

{noformat}
commit 02758c4e75483a0cd135fa465d1704d793bd4e48
Author: Michael Park 
Date:   Thu Dec 7 10:20:18 2017 -0800

Added reservation refinement documentation.

Review: https://reviews.apache.org/r/64312/
{noformat}


was (Author: mcypark):
{noformat}
commit 02758c4e75483a0cd135fa465d1704d793bd4e48 (HEAD -> master, 
upstream/master, reservation-refinement-doc)
Author: Michael Park 
Date:   Thu Dec 7 10:20:18 2017 -0800

Added reservation refinement documentation.

Review: https://reviews.apache.org/r/64312/
{noformat}

> Update the documentation to reflect the addition of reservation refinement.
> ---
>
> Key: MESOS-7663
> URL: https://issues.apache.org/jira/browse/MESOS-7663
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Blocker
> Fix For: 1.5.0
>
>
> There are a few things we need to be sure to document:
> * What reservation refinement is.
> * The new "format" for Resource, when using the RESERVATION_REFINEMENT 
> capability.
> * The filtering of resources if a framework is not RESERVATION_REFINEMENT 
> capable.
> * The current limitation that only a single reservation can be pushed / 
> popped within a single RESERVE / UNRESERVE operation.
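
The single push / pop limitation in the last bullet can be illustrated with a small model. This is a sketch, not the Mesos validation code: it assumes a resource's reservations can be modelled as a stack of role names with the most refined reservation last, and `ReservationStack`, `isValidReserve`, and `isValidUnreserve` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: reservations as a stack of role names, with the
// most refined reservation at the back.
typedef std::vector<std::string> ReservationStack;

// True if 'after' equals 'before' with exactly one reservation pushed.
bool isValidReserve(
    const ReservationStack& before,
    const ReservationStack& after) {
  if (after.size() != before.size() + 1) {
    return false;
  }
  return ReservationStack(after.begin(), after.end() - 1) == before;
}

// True if 'after' equals 'before' with exactly one reservation popped.
bool isValidUnreserve(
    const ReservationStack& before,
    const ReservationStack& after) {
  return isValidReserve(after, before);
}
```

Under this model, going from one reservation to three refinements takes two separate RESERVE operations.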





[jira] [Updated] (MESOS-8302) Improve master failover performance.

2017-12-07 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-8302:
---
Attachment: 1.3-1.5_no_history.png
1.3-1.5_history.png

Attached graphs of the improvement from 1.3 to 1.5. Will resolve this for now 
since significant improvements were made. Will be publishing a blog post that 
outlines the work!

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
> Fix For: 1.5.0
>
> Attachments: 1.3-1.5_history.png, 1.3-1.5_no_history.png
>
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Updated] (MESOS-8195) Implement explicit offer operation reconciliation between the master, agent and RPs.

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8195:
-
Target Version/s: 1.5.0

> Implement explicit offer operation reconciliation between the master, agent 
> and RPs.
> 
>
> Key: MESOS-8195
> URL: https://issues.apache.org/jira/browse/MESOS-8195
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Upon receiving an {{UpdateSlave}} message the master should compare its list 
> of pending operations for the agent/LRPs to the list of pending operations 
> contained in the message. It should then build a {{ReconcileOfferOperations}} 
> message with all the operations missing in the {{UpdateSlave}} message and 
> send it to the agent.
> The agent will receive these messages and should handle them by itself if the 
> operations affect the default resources, or forward them to the RP manager 
> otherwise.
> The agent/RP handler should check if the operations are pending. If an 
> operation is not pending, then an {{ApplyOfferOperation}} message got 
> dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status 
> update to the master.
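
The two halves of the reconciliation described above can be sketched as set operations. This is an illustration under assumptions, not the Mesos code: operations are reduced to string IDs, and `OperationIds`, `operationsToReconcile`, and `droppedOperations` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> OperationIds;

// Master side: the operations the master considers pending that are
// missing from the agent's UpdateSlave message. These would go into the
// ReconcileOfferOperations message.
OperationIds operationsToReconcile(
    const OperationIds& masterPending,
    const OperationIds& reportedByAgent) {
  OperationIds missing;
  std::set_difference(
      masterPending.begin(), masterPending.end(),
      reportedByAgent.begin(), reportedByAgent.end(),
      std::inserter(missing, missing.begin()));
  return missing;
}

// Agent/RP side: a reconciled operation that is not pending locally means
// the ApplyOfferOperation message was dropped, so it is reported as such.
std::vector<std::string> droppedOperations(
    const OperationIds& reconcile,
    const OperationIds& locallyPending) {
  std::vector<std::string> dropped;
  for (const std::string& id : reconcile) {
    if (locallyPending.count(id) == 0) {
      dropped.push_back(id);
    }
  }
  return dropped;
}
```

An operation that arrived at the agent after the `UpdateSlave` message was sent appears in the reconcile set but is pending locally, so it is not reported as dropped.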





[jira] [Updated] (MESOS-3437) Port flags_tests

2017-12-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3437:

Priority: Minor  (was: Major)

> Port flags_tests
> 
>
> Key: MESOS-3437
> URL: https://issues.apache.org/jira/browse/MESOS-3437
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, stout
>
> Straightforward tests that happen to depend on os.hpp. If we can get os.hpp 
> ported, this will probably follow as a consequence.





[jira] [Assigned] (MESOS-3437) Port flags_tests

2017-12-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3437:
---

Assignee: Andrew Schwartzmeyer  (was: Raluca Miclea)

> Port flags_tests
> 
>
> Key: MESOS-3437
> URL: https://issues.apache.org/jira/browse/MESOS-3437
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, stout
>
> Straightforward tests that happen to depend on os.hpp. If we can get os.hpp 
> ported, this will probably follow as a consequence.





[jira] [Comment Edited] (MESOS-3437) Port flags_tests

2017-12-07 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217742#comment-16217742
 ] 

Andrew Schwartzmeyer edited comment on MESOS-3437 at 12/8/17 12:27 AM:
---

-Review here: https://reviews.apache.org/r/63239/-


was (Author: andschwa):
Review here: https://reviews.apache.org/r/63239/

> Port flags_tests
> 
>
> Key: MESOS-3437
> URL: https://issues.apache.org/jira/browse/MESOS-3437
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, stout
>
> Straightforward tests that happen to depend on os.hpp. If we can get os.hpp 
> ported, this will probably follow as a consequence.





[jira] [Comment Edited] (MESOS-8302) Improve master failover performance.

2017-12-07 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282768#comment-16282768
 ] 

Benjamin Mahler edited comment on MESOS-8302 at 12/8/17 12:05 AM:
--

{noformat}
commit 108eb0631ec84251812981182f24979216c3a1c0
Author: Benjamin Mahler 
Date:   Thu Dec 7 11:59:46 2017 -0800

Added a RepeatedPtrField to vector conversion overload for rvalues.

This enables moving the individual entries out into the output vector.

Review: https://reviews.apache.org/r/64427
{noformat}
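
The idea behind an rvalue conversion overload can be shown without protobuf. In the sketch below `std::deque<T>` is only a stand-in for `RepeatedPtrField<T>`, and `toVector` is a hypothetical name, not the helper touched by the commit above. The point is the overload pair: lvalue sources are copied, expiring sources have their entries moved out.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Copying overload: the source is still in use, so each entry is copied.
template <typename T>
std::vector<T> toVector(const std::deque<T>& items) {
  return std::vector<T>(items.begin(), items.end());
}

// Rvalue overload: the source is expiring, so each entry is moved into
// the output vector instead of being copied.
template <typename T>
std::vector<T> toVector(std::deque<T>&& items) {
  return std::vector<T>(
      std::make_move_iterator(items.begin()),
      std::make_move_iterator(items.end()));
}
```

A caller opts into the cheaper path with `toVector(std::move(items))`. After that call the entries left in `items` are in a valid but unspecified state.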

{noformat}
commit 4fe3bdb8ed5e8d4ddc894ff7fd5cbcd3183526be
Author: Benjamin Mahler 
Date:   Thu Dec 7 12:03:58 2017 -0800

Eliminated some copying of tasks / executors in agent re-registration.

Review: https://reviews.apache.org/r/64428
{noformat}


was (Author: bmahler):
{noformat}
commit 4fe3bdb8ed5e8d4ddc894ff7fd5cbcd3183526be
Author: Benjamin Mahler 
Date:   Thu Dec 7 12:03:58 2017 -0800

Eliminated some copying of tasks / executors in agent re-registration.

Review: https://reviews.apache.org/r/64428
{noformat}

{noformat}
commit 54e03f3ceae87c69e5a01a585a23ec87f2dd8206
Author: Benjamin Mahler 
Date:   Thu Dec 7 14:29:03 2017 -0800

Fixed skipping of completed frameworks in the master failover benchmark.

This benchmark was previously accidentally skipping the completed
frameworks, due to an incorrect loop.

Review: https://reviews.apache.org/r/64429
{noformat}

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Commented] (MESOS-8302) Improve master failover performance.

2017-12-07 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282768#comment-16282768
 ] 

Benjamin Mahler commented on MESOS-8302:


{noformat}
commit 4fe3bdb8ed5e8d4ddc894ff7fd5cbcd3183526be
Author: Benjamin Mahler 
Date:   Thu Dec 7 12:03:58 2017 -0800

Eliminated some copying of tasks / executors in agent re-registration.

Review: https://reviews.apache.org/r/64428
{noformat}

{noformat}
commit 54e03f3ceae87c69e5a01a585a23ec87f2dd8206
Author: Benjamin Mahler 
Date:   Thu Dec 7 14:29:03 2017 -0800

Fixed skipping of completed frameworks in the master failover benchmark.

This benchmark was previously accidentally skipping the completed
frameworks, due to an incorrect loop.

Review: https://reviews.apache.org/r/64429
{noformat}

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Commented] (MESOS-8313) Provide a host namespace container supervisor.

2017-12-07 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282762#comment-16282762
 ] 

Jie Yu commented on MESOS-8313:
---

Yeah, agreed. This also allows us to support systemd running inside Mesos 
container.

> Provide a host namespace container supervisor.
> --
>
> Key: MESOS-8313
> URL: https://issues.apache.org/jira/browse/MESOS-8313
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
> Attachments: IMG_2629.JPG
>
>
> After more investigation on user namespaces, the current implementation of 
> creating the container namespaces needs some adjustment before we can 
> implement user namespaces in a useable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace 
> to mount {{procfs}}. Currently, this prevents containers joining the host PID 
> namespace. The workaround is to always create a new container PID namespace 
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network 
> namespace to mount {{sysfs}}. There's no general workaround for this since we 
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the 
> {{chroot}}. This restriction makes the existing order of containerizer 
> operations impossible to remain in the case where we want the executor to be 
> in a new user namespace that has no children (i.e. to protect the container 
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all 
> of these issues by creating a new containerizer supervisor that runs fully 
> outside the container and is responsible for constructing the rootfs mount 
> namespace, launching the containerizer to enter the rest of the container, 
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it 
> will be able to construct the container rootfs in a new mount namespace 
> without user namespace restrictions. We can then clone a child to fully 
> create and enter container namespaces along with the prefabricated rootfs 
> mount namespace.
> The only drawback to this approach is that the container's mount namespace 
> will be owned by the root user namespace rather than the container user 
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}} 
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
> will be its parent process. This new subcommand will be used for the default 
> executor and custom executor code paths.
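
The parent/child relationship in the plan above can be sketched with plain POSIX calls. This shows only the "fork, then wait on the entered process" shape. It is not the proposed `mesos-containerizer supervise` implementation, it omits all namespace and rootfs setup, and `supervise` and its `launch` callback are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor: forks a child that stands in for the launch
// step, then waits for it from outside.
// Returns the child's exit code (0-255), or -1 if forking or waiting
// failed or the child terminated abnormally.
int supervise(int (*launch)()) {
  assert(launch != nullptr);

  const pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    // Child: in the proposal, this is where the remaining container
    // namespaces would be entered before running the real workload.
    _exit(launch());
  }

  // Parent: stays outside the container and waits on the child,
  // retrying if the wait is interrupted by a signal.
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```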





[jira] [Commented] (MESOS-8313) Provide a host namespace container supervisor.

2017-12-07 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282757#comment-16282757
 ] 

James Peach commented on MESOS-8313:


{quote}
The other drawback is that we created another nanny process in addition to the 
one that'll perform pid 1 reaping.
{quote}

Right. Currently, the supervisor is optional and inside the container. In this 
proposal, there would always be a supervisor outside the container, though I 
think that the one inside the container would remain optional.

> Provide a host namespace container supervisor.
> --
>
> Key: MESOS-8313
> URL: https://issues.apache.org/jira/browse/MESOS-8313
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
> Attachments: IMG_2629.JPG
>
>
> After more investigation on user namespaces, the current implementation of 
> creating the container namespaces needs some adjustment before we can 
> implement user namespaces in a useable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace 
> to mount {{procfs}}. Currently, this prevents containers joining the host PID 
> namespace. The workaround is to always create a new container PID namespace 
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network 
> namespace to mount {{sysfs}}. There's no general workaround for this since we 
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the 
> {{chroot}}. This restriction makes the existing order of containerizer 
> operations impossible to remain in the case where we want the executor to be 
> in a new user namespace that has no children (i.e. to protect the container 
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all 
> of these issues by creating a new containerizer supervisor that runs fully 
> outside the container and is responsible for constructing the rootfs mount 
> namespace, launching the containerizer to enter the rest of the container, 
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it 
> will be able to construct the container rootfs in a new mount namespace 
> without user namespace restrictions. We can then clone a child to fully 
> create and enter container namespaces along with the prefabricated rootfs 
> mount namespace.
> The only drawback to this approach is that the container's mount namespace 
> will be owned by the root user namespace rather than the container user 
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}} 
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
> will be its parent process. This new subcommand will be used for the default 
> executor and custom executor code paths.





[jira] [Updated] (MESOS-8142) Improve container security with user namespaces.

2017-12-07 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-8142:
---
Summary: Improve container security with user namespaces.  (was: Improve 
container security with user namespaces)

> Improve container security with user namespaces.
> 
>
> Key: MESOS-8142
> URL: https://issues.apache.org/jira/browse/MESOS-8142
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, security
>Reporter: James Peach
>Assignee: James Peach
>
> As a first pass at supporting user namespaces, figure out how we can use them 
> to improve container security when running untrusted tasks.
> This ticket is specifically targeting how to build a user namespace hierarchy 
> and excluding any sort of ID mapping for the container images.





[jira] [Updated] (MESOS-8313) Provide a host namespace container supervisor.

2017-12-07 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-8313:
---
Attachment: IMG_2629.JPG

> Provide a host namespace container supervisor.
> --
>
> Key: MESOS-8313
> URL: https://issues.apache.org/jira/browse/MESOS-8313
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
> Attachments: IMG_2629.JPG
>
>
> After more investigation on user namespaces, the current implementation of 
> creating the container namespaces needs some adjustment before we can 
> implement user namespaces in a useable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace 
> to mount {{procfs}}. Currently, this prevents containers joining the host PID 
> namespace. The workaround is to always create a new container PID namespace 
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network 
> namespace to mount {{sysfs}}. There's no general workaround for this since we 
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the 
> {{chroot}}. This restriction makes it impossible to keep the existing order 
> of containerizer operations in the case where we want the executor to be 
> in a new user namespace that has no children (i.e. to protect the container 
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all 
> of these issues by creating a new containerizer supervisor that runs fully 
> outside the container and is responsible for constructing the rootfs mount 
> namespace, launching the containerizer to enter the rest of the container, 
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it 
> will be able to construct the container rootfs in a new mount namespace 
> without user namespace restrictions. We can then clone a child to fully 
> create and enter container namespaces along with the prefabricated rootfs 
> mount namespace.
> The only drawback to this approach is that the container's mount namespace 
> will be owned by the root user namespace rather than the container user 
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}} 
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
> will be its parent process. This new subcommand will be used for the default 
> executor and custom executor code paths.





[jira] [Commented] (MESOS-8313) Provide a host namespace container supervisor.

2017-12-07 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282694#comment-16282694
 ] 

Jie Yu commented on MESOS-8313:
---

The other drawback is that we create another nanny process in addition to the 
one that'll perform pid 1 reaping.

> Provide a host namespace container supervisor.
> --
>
> Key: MESOS-8313
> URL: https://issues.apache.org/jira/browse/MESOS-8313
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
> Attachments: IMG_2629.JPG
>
>
> After more investigation on user namespaces, the current implementation of 
> creating the container namespaces needs some adjustment before we can 
> implement user namespaces in a useable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace 
> to mount {{procfs}}. Currently, this prevents containers joining the host PID 
> namespace. The workaround is to always create a new container PID namespace 
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network 
> namespace to mount {{sysfs}}. There's no general workaround for this since we 
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the 
> {{chroot}}. This restriction makes it impossible to keep the existing order 
> of containerizer operations in the case where we want the executor to be 
> in a new user namespace that has no children (i.e. to protect the container 
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all 
> of these issues by creating a new containerizer supervisor that runs fully 
> outside the container and is responsible for constructing the rootfs mount 
> namespace, launching the containerizer to enter the rest of the container, 
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it 
> will be able to construct the container rootfs in a new mount namespace 
> without user namespace restrictions. We can then clone a child to fully 
> create and enter container namespaces along with the prefabricated rootfs 
> mount namespace.
> The only drawback to this approach is that the container's mount namespace 
> will be owned by the root user namespace rather than the container user 
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}} 
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
> will be its parent process. This new subcommand will be used for the default 
> executor and custom executor code paths.





[jira] [Created] (MESOS-8313) Provide a host namespace container supervisor.

2017-12-07 Thread James Peach (JIRA)
James Peach created MESOS-8313:
--

 Summary: Provide a host namespace container supervisor.
 Key: MESOS-8313
 URL: https://issues.apache.org/jira/browse/MESOS-8313
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: James Peach
Assignee: James Peach


After more investigation on user namespaces, the current implementation of 
creating the container namespaces needs some adjustment before we can implement 
user namespaces in a useable fashion.

The problems we need to address are:

1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace to 
mount {{procfs}}. Currently, this prevents containers joining the host PID 
namespace. The workaround is to always create a new container PID namespace (as 
a child of the user namespace) with the {{namespaces/pid}} isolator.

2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network namespace 
to mount {{sysfs}}. There's no general workaround for this since we can't 
generally require containers to not join the host network namespace.

3. The containerizer can't enter a user namespace after entering the 
{{chroot}}. This restriction makes it impossible to keep the existing order of 
containerizer operations in the case where we want the executor to be in 
a new user namespace that has no children (i.e. to protect the container from a 
privileged task).

After some discussion with [~jieyu], we believe that we can solve most or all of 
these issues by creating a new containerizer supervisor that runs fully outside 
the container and is responsible for constructing the rootfs mount namespace, 
launching the containerizer to enter the rest of the container, and waiting on 
the entered process.

Since this new supervisor process is not running in the user namespace, it will 
be able to construct the container rootfs in a new mount namespace without user 
namespace restrictions. We can then clone a child to fully create and enter 
container namespaces along with the prefabricated rootfs mount namespace.

The only drawback to this approach is that the container's mount namespace will 
be owned by the root user namespace rather than the container user namespace. 
We are OK with this for now.

The plan here is to retain the existing {{mesos-containerizer launch}} 
subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
will be its parent process. This new subcommand will be used for the default 
executor and custom executor code paths.
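
As a rough illustration of the parent/child relationship described above, here 
is a minimal Python sketch of a supervisor that launches a child command and 
waits on it. It is only a sketch under assumptions: the {{supervise}} function 
name is hypothetical, the mount namespace and rootfs construction that the 
proposed supervisor would perform is left as a comment, and the child command is 
a stand-in rather than a real {{mesos-containerizer launch}} invocation.

```python
import subprocess
import sys


def supervise(child_argv):
    """Launch a child process, wait for it, and return its exit code.

    Hypothetical stand-in for the proposed supervisor: the real one would
    first construct the container rootfs in a new mount namespace (omitted
    here) and only then start the launch helper as its child.
    """
    # Placeholder: mount namespace and rootfs construction would happen here,
    # outside of any container user namespace.
    child = subprocess.Popen(child_argv)

    # The supervisor stays outside the container and waits on the child.
    return child.wait()


if __name__ == "__main__":
    # Stand-in child that exits with status 3.
    code = supervise([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exited with", code)
```

Keeping the supervisor as a plain parent process means the child's exit status 
is available through an ordinary wait, regardless of which namespaces the child 
entered.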





[jira] [Comment Edited] (MESOS-8197) Implement a library to send offer operation status updates

2017-12-07 Thread Gastón Kleiman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267340#comment-16267340
 ] 

Gastón Kleiman edited comment on MESOS-8197 at 12/7/17 9:21 PM:


https://reviews.apache.org/r/63852/
https://reviews.apache.org/r/63853/
https://reviews.apache.org/r/64093/
https://reviews.apache.org/r/64094/
https://reviews.apache.org/r/64095/
https://reviews.apache.org/r/64096/


was (Author: gkleiman):
https://reviews.apache.org/r/63852/
https://reviews.apache.org/r/63853/

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>






[jira] [Created] (MESOS-8311) DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP/0 is flaky.

2017-12-07 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8311:
--

 Summary: 
DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP/0 
is flaky.
 Key: MESOS-8311
 URL: https://issues.apache.org/jira/browse/MESOS-8311
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: Fedora 23
Reporter: Alexander Rukletsov
Assignee: Qian Zhang
 Attachments: ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP-badrun.txt

{noformat}
../../src/tests/health_check_tests.cpp:2364
Failed to wait 15secs for statusHealthy
{noformat}





[jira] [Updated] (MESOS-4527) Roles can exceed limit allocation via reservations.

2017-12-07 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-4527:

Story Points: 5  (was: 1)

> Roles can exceed limit allocation via reservations.
> ---
>
> Key: MESOS-4527
> URL: https://issues.apache.org/jira/browse/MESOS-4527
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Michael Park
>Assignee: Meng Zhu
>  Labels: mesosphere, multitenancy
>
> Since unallocated reservations are not counted towards the guarantee (which 
> today is also a limit), we might exceed the limit.
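
The arithmetic behind this can be sketched with a deliberately simplified 
model. The function names and numbers below are hypothetical and are not taken 
from the allocator's code; the only assumption carried over from the 
description is that unallocated reservations are ignored when headroom under 
the limit is computed.

```python
def headroom(limit, allocated):
    """Quota headroom as seen when unallocated reservations are ignored."""
    return max(limit - allocated, 0)


def worst_case_allocation(limit, allocated, unallocated_reserved):
    """Upper bound on what the role can end up holding.

    The role may be allocated up to its limit from other resources, while its
    own unallocated reservations, which were never counted, can still be
    allocated to it later.
    """
    return allocated + headroom(limit, allocated) + unallocated_reserved


if __name__ == "__main__":
    # Hypothetical role: limit of 10 CPUs, 6 allocated, 4 reserved but idle.
    total = worst_case_allocation(limit=10, allocated=6, unallocated_reserved=4)
    print("worst case:", total, "limit:", 10)
```

With these example numbers the role can reach 14 CPUs against a limit of 10, 
which is the overshoot the ticket describes.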





[jira] [Created] (MESOS-8312) Pass resource provider information to master as part of UpdateSlaveMessage

2017-12-07 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8312:
---

 Summary: Pass resource provider information to master as part of 
UpdateSlaveMessage
 Key: MESOS-8312
 URL: https://issues.apache.org/jira/browse/MESOS-8312
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Bannier


We extended {{UpdateSlaveMessage}} so updates to an agent's total resources 
from resource providers are possible. We realized that we will need to explicitly 
pass resource provider details (here for now: {{ResourceProviderInfo}}) to the 
master so it can be queried for the providers present on certain agents. This 
should happen as part of {{UpdateSlaveMessage}} so a single synchronization 
channel is used for this kind of information.

We need to adjust {{UpdateSlaveMessage}} for these requirements. This should 
happen before 1.5.0 gets released so we do not need to deprecate a never really 
used message format.





[jira] [Updated] (MESOS-4527) Roles can exceed limit allocation via reservations.

2017-12-07 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-4527:

Story Points: 3  (was: 5)

> Roles can exceed limit allocation via reservations.
> ---
>
> Key: MESOS-4527
> URL: https://issues.apache.org/jira/browse/MESOS-4527
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Michael Park
>Assignee: Meng Zhu
>  Labels: mesosphere, multitenancy
>
> Since unallocated reservations are not counted towards the guarantee (which 
> today is also a limit), we might exceed the limit.





[jira] [Updated] (MESOS-8311) DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP/0 is flaky.

2017-12-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8311:
---
Attachment: ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP-badrun.txt

> DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP/0
>  is flaky.
> -
>
> Key: MESOS-8311
> URL: https://issues.apache.org/jira/browse/MESOS-8311
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Fedora 23
>Reporter: Alexander Rukletsov
>Assignee: Qian Zhang
>  Labels: flaky-test
> Attachments: ROOT_DOCKER_USERNETWORK_HealthyTaskViaHTTP-badrun.txt
>
>
> {noformat}
> ../../src/tests/health_check_tests.cpp:2364
> Failed to wait 15secs for statusHealthy
> {noformat}





[jira] [Updated] (MESOS-7944) Implement jemalloc support for Mesos

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7944:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68  (was: Mesosphere Sprint 63, 
Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere 
Sprint 68, Mesosphere Sprint 69)

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>  Labels: mesosphere
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.





[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-12-07 Thread Dmitrii Rozhkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitrii Rozhkov updated MESOS-8078:
---
Description: 
Hi friends, 

These fields are available via the state.json but went missing in the v1 of the 
API:
-leader_info- -> available via GET_MASTER which should always return leading 
master info
start_time
elected_time

As we're showing them on the Overview page of the DC/OS UI, yet would like to 
avoid using state.json, it would be great to have them somewhere in V1.

  was:
Hi friends, 

These fields are available via the state.json but went missing in the v1 of the 
API:
leader_info
start_time
elected_time

As we're showing them on the Overview page of the DC/OS UI, yet would like to 
avoid using state.json, it would be great to have them somewhere in V1.


> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via the state.json but went missing in the v1 of 
> the API:
> -leader_info- -> available via GET_MASTER which should always return leading 
> master info
> start_time
> elected_time
> As we're showing them on the Overview page of the DC/OS UI, yet would like 
> to avoid using state.json, it would be great to have them somewhere in V1.
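
For readers unfamiliar with the {{GET_MASTER}} call mentioned above, the 
following Python sketch builds such a request against the v1 operator API 
endpoint ({{/api/v1}}) without sending it. The helper name and the master 
address are hypothetical, and the sketch makes no claim about which fields the 
response contains.

```python
import json
import urllib.request


def build_get_master_request(master_url):
    """Build (but do not send) a v1 operator API GET_MASTER call."""
    body = json.dumps({"type": "GET_MASTER"}).encode("utf-8")
    return urllib.request.Request(
        url=master_url.rstrip("/") + "/api/v1",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical master address; replace with a real one before sending.
    request = build_get_master_request("http://mesos-master.example.com:5050")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```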





[jira] [Updated] (MESOS-8293) Reservation may not be allocated when the role has no quota.

2017-12-07 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-8293:

  Sprint: Mesosphere Sprint 70  (was: Mesosphere Sprint 69)
Story Points: 3

> Reservation may not be allocated when the role has no quota.
> 
>
> Key: MESOS-8293
> URL: https://issues.apache.org/jira/browse/MESOS-8293
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Meng Zhu
>Assignee: Meng Zhu
>Priority: Critical
>  Labels: multitenancy
>
> Reservations that belong to a role that has no quota may not be allocated 
> even when the reserved resources are allocatable to the role.
> This is because in the current implementation the reserved resources may be 
> counted towards the headroom left for unallocated quota limit in the second 
> stage allocation.
> https://github.com/apache/mesos/blob/c844db9ac7c0cef59be87438c6781bfb71adcc42/src/master/allocator/mesos/hierarchical.cpp#L1764-L1767
> Roles with quota do not have this issue because currently their reservations 
> are taken care of in the first stage.





[jira] [Updated] (MESOS-8291) Add documentation about fault domains

2017-12-07 Thread Benno Evers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benno Evers updated MESOS-8291:
---
Sprint: Mesosphere Sprint 70

> Add documentation about fault domains
> -
>
> Key: MESOS-8291
> URL: https://issues.apache.org/jira/browse/MESOS-8291
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> We need some user docs for fault domains.





[jira] [Updated] (MESOS-8303) Add user doc for agent reconfiguration

2017-12-07 Thread Benno Evers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benno Evers updated MESOS-8303:
---
Sprint: Mesosphere Sprint 70

> Add user doc for agent reconfiguration
> --
>
> Key: MESOS-8303
> URL: https://issues.apache.org/jira/browse/MESOS-8303
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>






[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8078:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70)

> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via the state.json but went missing in the v1 of 
> the API:
> leader_info
> start_time
> elected_time
> As we're showing them on the Overview page of the DC/OS UI, yet would like 
> to avoid using state.json, it would be great to have them somewhere in V1.





[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69)

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: reliability
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}





[jira] [Updated] (MESOS-8185) Tasks can be known to the agent but unknown to the master.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8185:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 70  (was: Mesosphere Sprint 
68)

> Tasks can be known to the agent but unknown to the master.
> --
>
> Key: MESOS-8185
> URL: https://issues.apache.org/jira/browse/MESOS-8185
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>  Labels: reliability
>
> Currently, when a master re-registers an agent that was marked unreachable, 
> it shuts down all non-partition-aware frameworks on that agent. When a master 
> re-registers an agent that is already registered, it doesn't check that all 
> tasks from the slave's re-registration message are known to it.
> It is possible that due to a transient loss of connectivity an agent may miss 
> {{SlaveReregisteredMessage}} along with {{ShutdownFrameworkMessage}} and thus 
> will not kill non-partition-aware tasks. But the master will mark the agent 
> as registered and will not re-add tasks that it thought would be killed. The 
> agent may re-register again, this time successfully, before being marked 
> unreachable, while never having terminated tasks of non-partition-aware 
> frameworks. The master will simply forget those tasks ever existed, because 
> it has "removed" them during the previous re-registration.
> Example scenario:
> # Connection from the master to the agent stops working
> # Agent doesn't see pings from the master and attempts to re-register
> # Master sends {{SlaveRegisteredMessage}} and {{ShutdownSlaveMessage}}, which 
> don't get to the agent because of the connection failure. Agent is marked 
> registered.
> # Network issue resolves, connection breaks. Agent retries re-registration.
> # Master thinks that the agent was registered since step (3) and just 
> re-sends {{SlaveRegisteredMessage}}. Tasks remain running on the agent.
> One of the possible solutions would be to compare the list of tasks the 
> already registered agent reports in {{ReregisterSlaveMessage}} and the list 
> of tasks the master has. In this case anything that the master doesn't know 
> about should not exist on the agent.
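
The comparison proposed in the last paragraph amounts to a set difference. A 
minimal Python sketch follows, assuming tasks are identified by plain string 
IDs; the function name and the example IDs are hypothetical and do not come 
from the Mesos code base.

```python
def tasks_unknown_to_master(agent_task_ids, master_task_ids):
    """Return task IDs the agent reports that the master does not know about."""
    return sorted(set(agent_task_ids) - set(master_task_ids))


if __name__ == "__main__":
    # Hypothetical IDs: the agent still runs t2 and t3, which the master
    # "removed" during the earlier re-registration.
    reported_by_agent = ["t1", "t2", "t3"]
    known_to_master = ["t1"]
    for task_id in tasks_unknown_to_master(reported_by_agent, known_to_master):
        print("task should not exist on the agent:", task_id)
```

Under the proposal, every ID returned here would be a task the master expects 
to be gone from the agent.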





[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8078:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70  (was: Mesosphere Sprint 66, Mesosphere Sprint 67, 
Mesosphere Sprint 68)

> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via the state.json but went missing in the v1 of 
> the API:
> leader_info
> start_time
> elected_time
> As we're showing them on the Overview page of the DC/OS UI, yet would like 
> to avoid using state.json, it would be great to have them somewhere in V1.





[jira] [Updated] (MESOS-7944) Implement jemalloc support for Mesos

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7944:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68  (was: Mesosphere Sprint 63, 
Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere 
Sprint 68, Mesosphere Sprint 70)

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>  Labels: mesosphere
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.





[jira] [Updated] (MESOS-8185) Tasks can be known to the agent but unknown to the master.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8185:
--
Sprint: Mesosphere Sprint 68  (was: Mesosphere Sprint 68, Mesosphere Sprint 
69)

> Tasks can be known to the agent but unknown to the master.
> --
>
> Key: MESOS-8185
> URL: https://issues.apache.org/jira/browse/MESOS-8185
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>  Labels: reliability
>
> Currently, when a master re-registers an agent that was marked unreachable, 
> it shuts down all non-partition-aware frameworks on that agent. When a master 
> re-registers an agent that is already registered, it doesn't check that all 
> tasks from the slave's re-registration message are known to it.
> It is possible that due to a transient loss of connectivity an agent may miss 
> {{SlaveReregisteredMessage}} along with {{ShutdownFrameworkMessage}} and thus 
> will not kill non-partition-aware tasks. But the master will mark the agent 
> as registered and will not re-add tasks that it thought would be killed. The 
> agent may re-register again, this time successfully, before being marked 
> unreachable, while never having terminated tasks of non-partition-aware 
> frameworks. The master will simply forget those tasks ever existed, because 
> it has "removed" them during the previous re-registration.
> Example scenario:
> # Connection from the master to the agent stops working
> # Agent doesn't see pings from the master and attempts to re-register
> # Master sends {{SlaveRegisteredMessage}} and {{ShutdownSlaveMessage}}, which 
> don't get to the agent because of the connection failure. Agent is marked 
> registered.
> # Network issue resolves, connection breaks. Agent retries re-registration.
> # Master thinks that the agent was registered since step (3) and just 
> re-sends {{SlaveRegisteredMessage}}. Tasks remain running on the agent.
> One of the possible solutions would be to compare the list of tasks the the 
> already registered agent reports in {{ReregisterSlaveMessage}} and the list 
> of tasks the master has. In this case anything that the master doesn't know 
> about should not exist on the agent.
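
The comparison proposed above can be sketched in C++ as a set difference. This is a minimal illustration and not Mesos code: TaskID is modeled as a plain string, and findUnknownTasks is a hypothetical helper. What the master would then do with the result (for example, ask the agent to kill those tasks) is left open here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a Mesos task ID; the real type is a protobuf.
using TaskID = std::string;

// Returns the tasks that the agent reported in its re-registration message
// but that the master does not know about. Per the proposal, such tasks
// should not exist on the agent.
std::vector<TaskID> findUnknownTasks(
    const std::set<TaskID>& knownToMaster,
    const std::vector<TaskID>& reportedByAgent)
{
  std::vector<TaskID> unknown;
  for (const TaskID& taskId : reportedByAgent) {
    if (knownToMaster.count(taskId) == 0) {
      unknown.push_back(taskId);
    }
  }
  return unknown;
}
```

The result preserves the order in which the agent reported the tasks, so it can be logged or acted on deterministically.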





[jira] [Updated] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7699:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70  (was: Mesosphere Sprint 66, Mesosphere Sprint 67, 
Mesosphere Sprint 68)

> "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable 
> freshly released)
> ---
>
> Key: MESOS-7699
> URL: https://issues.apache.org/jira/browse/MESOS-7699
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.2.0
>Reporter: Adam Cecile
>Assignee: Benno Evers
>  Labels: autotools
>
> Hi,
> It seems the issue comes from a workaround added a while ago:
> https://reviews.apache.org/r/40326/
> https://reviews.apache.org/r/40327/
> When building with external libraries, the build ends up generating command 
> lines with -isystem /usr/include, which the GCC developers clearly state is 
> wrong:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129
> I'll do some testing by reverting all -isystem to -I and I'll report back on 
> whether it builds.
> Regards, Adam.
> {noformat}
> configure:21642: result: no
> configure:21642: checking glog/logging.h presence
> configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include 
> -I/usr/include conftest.cpp
> In file included from /usr/include/c++/6/ext/string_conversions.h:41:0,
>  from /usr/include/c++/6/bits/basic_string.h:5417,
>  from /usr/include/c++/6/string:52,
>  from /usr/include/c++/6/bits/locale_classes.h:40,
>  from /usr/include/c++/6/bits/ios_base.h:41,
>  from /usr/include/c++/6/ios:42,
>  from /usr/include/c++/6/ostream:38,
>  from /usr/include/glog/logging.h:43,
>  from conftest.cpp:32:
> /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or 
> directory
>  #include_next <stdlib.h>
>  ^
> compilation terminated.
> configure:21642: $? = 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "mesos"
> | #define PACKAGE_TARNAME "mesos"
> | #define PACKAGE_VERSION "1.2.0"
> | #define PACKAGE_STRING "mesos 1.2.0"
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define PACKAGE "mesos"
> | #define VERSION "1.2.0"
> | #define STDC_HEADERS 1
> | #define HAVE_SYS_TYPES_H 1
> | #define HAVE_SYS_STAT_H 1
> | #define HAVE_STDLIB_H 1
> | #define HAVE_STRING_H 1
> | #define HAVE_MEMORY_H 1
> | #define HAVE_STRINGS_H 1
> | #define HAVE_INTTYPES_H 1
> | #define HAVE_STDINT_H 1
> | #define HAVE_UNISTD_H 1
> | #define HAVE_DLFCN_H 1
> | #define LT_OBJDIR ".libs/"
> | #define HAVE_CXX11 1
> | #define HAVE_PTHREAD_PRIO_INHERIT 1
> | #define HAVE_PTHREAD 1
> | #define HAVE_LIBZ 1
> | #define HAVE_FTS_H 1
> | #define HAVE_APR_POOLS_H 1
> | #define HAVE_LIBAPR_1 1
> | #define HAVE_BOOST_VERSION_HPP 1
> | #define HAVE_LIBCURL 1
> | /* end confdefs.h.  */
> | #include <glog/logging.h>
> configure:21642: result: no
> configure:21642: checking for glog/logging.h
> configure:21642: result: no
> configure:21674: error: cannot find glog
> ---
> You have requested the use of a non-bundled glog but no suitable
> glog could be found.
> You may want specify the location of glog by providing a prefix
> path via --with-glog=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {noformat}
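
The experiment described in the report, turning every -isystem back into -I, amounts to a small rewrite of the compiler argument list. The following is a sketch in C++ under the assumption that the flags are available as a vector of tokens. isystemToI is a hypothetical helper, not part of the Mesos build system, and whether the rewrite actually resolves the failure is what the reporter set out to test.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rewrites "-isystem DIR" (and the joined form "-isystemDIR") to "-IDIR".
// All other arguments are passed through unchanged.
std::vector<std::string> isystemToI(const std::vector<std::string>& args)
{
  const std::string flag = "-isystem";
  std::vector<std::string> result;

  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];

    if (arg == flag && i + 1 < args.size()) {
      // Separate form: the directory is the next token.
      result.push_back("-I" + args[++i]);
    } else if (arg.size() > flag.size() &&
               arg.compare(0, flag.size(), flag) == 0) {
      // Joined form: the directory follows the flag directly.
      result.push_back("-I" + arg.substr(flag.size()));
    } else {
      result.push_back(arg);
    }
  }

  return result;
}
```

Applied to the failing command above, the tokens "-isystem" and "/usr/include" would collapse into a single "-I/usr/include". A trailing "-isystem" with no directory is left untouched.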





[jira] [Updated] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7699:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69)

> "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable 
> freshly released)
> ---
>
> Key: MESOS-7699
> URL: https://issues.apache.org/jira/browse/MESOS-7699
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.2.0
>Reporter: Adam Cecile
>Assignee: Benno Evers
>  Labels: autotools
>
> Hi,
> It seems the issue comes from a workaround added a while ago:
> https://reviews.apache.org/r/40326/
> https://reviews.apache.org/r/40327/
> When building with external libraries, the build ends up generating command 
> lines with -isystem /usr/include, which the GCC developers clearly state is 
> wrong:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129
> I'll do some testing by reverting all -isystem to -I and I'll report back on 
> whether it builds.
> Regards, Adam.
> {noformat}
> configure:21642: result: no
> configure:21642: checking glog/logging.h presence
> configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include 
> -I/usr/include conftest.cpp
> In file included from /usr/include/c++/6/ext/string_conversions.h:41:0,
>  from /usr/include/c++/6/bits/basic_string.h:5417,
>  from /usr/include/c++/6/string:52,
>  from /usr/include/c++/6/bits/locale_classes.h:40,
>  from /usr/include/c++/6/bits/ios_base.h:41,
>  from /usr/include/c++/6/ios:42,
>  from /usr/include/c++/6/ostream:38,
>  from /usr/include/glog/logging.h:43,
>  from conftest.cpp:32:
> /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or 
> directory
>  #include_next <stdlib.h>
>  ^
> compilation terminated.
> configure:21642: $? = 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "mesos"
> | #define PACKAGE_TARNAME "mesos"
> | #define PACKAGE_VERSION "1.2.0"
> | #define PACKAGE_STRING "mesos 1.2.0"
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define PACKAGE "mesos"
> | #define VERSION "1.2.0"
> | #define STDC_HEADERS 1
> | #define HAVE_SYS_TYPES_H 1
> | #define HAVE_SYS_STAT_H 1
> | #define HAVE_STDLIB_H 1
> | #define HAVE_STRING_H 1
> | #define HAVE_MEMORY_H 1
> | #define HAVE_STRINGS_H 1
> | #define HAVE_INTTYPES_H 1
> | #define HAVE_STDINT_H 1
> | #define HAVE_UNISTD_H 1
> | #define HAVE_DLFCN_H 1
> | #define LT_OBJDIR ".libs/"
> | #define HAVE_CXX11 1
> | #define HAVE_PTHREAD_PRIO_INHERIT 1
> | #define HAVE_PTHREAD 1
> | #define HAVE_LIBZ 1
> | #define HAVE_FTS_H 1
> | #define HAVE_APR_POOLS_H 1
> | #define HAVE_LIBAPR_1 1
> | #define HAVE_BOOST_VERSION_HPP 1
> | #define HAVE_LIBCURL 1
> | /* end confdefs.h.  */
> | #include <glog/logging.h>
> configure:21642: result: no
> configure:21642: checking for glog/logging.h
> configure:21642: result: no
> configure:21674: error: cannot find glog
> ---
> You have requested the use of a non-bundled glog but no suitable
> glog could be found.
> You may want specify the location of glog by providing a prefix
> path via --with-glog=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {noformat}





[jira] [Updated] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7699:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70)

> "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable 
> freshly released)
> ---
>
> Key: MESOS-7699
> URL: https://issues.apache.org/jira/browse/MESOS-7699
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.2.0
>Reporter: Adam Cecile
>Assignee: Benno Evers
>  Labels: autotools
>
> Hi,
> It seems the issue comes from a workaround added a while ago:
> https://reviews.apache.org/r/40326/
> https://reviews.apache.org/r/40327/
> When building with external libraries, the build ends up generating command 
> lines with -isystem /usr/include, which the GCC developers clearly state is 
> wrong:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129
> I'll do some testing by reverting all -isystem to -I and I'll report back on 
> whether it builds.
> Regards, Adam.
> {noformat}
> configure:21642: result: no
> configure:21642: checking glog/logging.h presence
> configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include 
> -I/usr/include conftest.cpp
> In file included from /usr/include/c++/6/ext/string_conversions.h:41:0,
>  from /usr/include/c++/6/bits/basic_string.h:5417,
>  from /usr/include/c++/6/string:52,
>  from /usr/include/c++/6/bits/locale_classes.h:40,
>  from /usr/include/c++/6/bits/ios_base.h:41,
>  from /usr/include/c++/6/ios:42,
>  from /usr/include/c++/6/ostream:38,
>  from /usr/include/glog/logging.h:43,
>  from conftest.cpp:32:
> /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or 
> directory
>  #include_next <stdlib.h>
>  ^
> compilation terminated.
> configure:21642: $? = 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "mesos"
> | #define PACKAGE_TARNAME "mesos"
> | #define PACKAGE_VERSION "1.2.0"
> | #define PACKAGE_STRING "mesos 1.2.0"
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define PACKAGE "mesos"
> | #define VERSION "1.2.0"
> | #define STDC_HEADERS 1
> | #define HAVE_SYS_TYPES_H 1
> | #define HAVE_SYS_STAT_H 1
> | #define HAVE_STDLIB_H 1
> | #define HAVE_STRING_H 1
> | #define HAVE_MEMORY_H 1
> | #define HAVE_STRINGS_H 1
> | #define HAVE_INTTYPES_H 1
> | #define HAVE_STDINT_H 1
> | #define HAVE_UNISTD_H 1
> | #define HAVE_DLFCN_H 1
> | #define LT_OBJDIR ".libs/"
> | #define HAVE_CXX11 1
> | #define HAVE_PTHREAD_PRIO_INHERIT 1
> | #define HAVE_PTHREAD 1
> | #define HAVE_LIBZ 1
> | #define HAVE_FTS_H 1
> | #define HAVE_APR_POOLS_H 1
> | #define HAVE_LIBAPR_1 1
> | #define HAVE_BOOST_VERSION_HPP 1
> | #define HAVE_LIBCURL 1
> | /* end confdefs.h.  */
> | #include <glog/logging.h>
> configure:21642: result: no
> configure:21642: checking for glog/logging.h
> configure:21642: result: no
> configure:21674: error: cannot find glog
> ---
> You have requested the use of a non-bundled glog but no suitable
> glog could be found.
> You may want specify the location of glog by providing a prefix
> path via --with-glog=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {noformat}





[jira] [Updated] (MESOS-8185) Tasks can be known to the agent but unknown to the master.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8185:
--
Sprint: Mesosphere Sprint 68  (was: Mesosphere Sprint 68, Mesosphere Sprint 
70)

> Tasks can be known to the agent but unknown to the master.
> --
>
> Key: MESOS-8185
> URL: https://issues.apache.org/jira/browse/MESOS-8185
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>  Labels: reliability
>
> Currently, when a master re-registers an agent that was marked unreachable, 
> it shuts down all non-partition-aware frameworks on that agent. When a master 
> re-registers an agent that is already registered, it doesn't check that all 
> tasks from the slave's re-registration message are known to it.
> It is possible that, due to a transient loss of connectivity, an agent misses 
> {{SlaveReregisteredMessage}} along with {{ShutdownFrameworkMessage}} and thus 
> does not kill the non-partition-aware tasks. The master will nevertheless 
> mark the agent as registered and will not re-add the tasks that it expected 
> to be killed. The agent may then re-register again, this time successfully, 
> before being marked unreachable, without ever having terminated the tasks of 
> non-partition-aware frameworks. The master will simply forget those tasks 
> ever existed, because it "removed" them during the previous re-registration.
> Example scenario:
> # Connection from the master to the agent stops working.
> # Agent doesn't see pings from the master and attempts to re-register.
> # Master sends {{SlaveRegisteredMessage}} and {{ShutdownSlaveMessage}}, which 
> don't get to the agent because of the connection failure. Agent is marked 
> registered.
> # Network issue resolves, connection breaks. Agent retries re-registration.
> # Master thinks that the agent has been registered since step (3) and just 
> re-sends {{SlaveRegisteredMessage}}. Tasks remain running on the agent.
> One possible solution would be to compare the list of tasks that the 
> already registered agent reports in {{ReregisterSlaveMessage}} with the list 
> of tasks the master has. In this case, anything that the master doesn't know 
> about should not exist on the agent.





[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70)

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: reliability
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}





[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8078:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68  
(was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69)

> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via state.json but are missing from v1 of 
> the API:
> leader_info
> start_time
> elected_time
> We show them on the Overview page of the DC/OS UI but would like to stop 
> using state.json, so it would be great to have them somewhere in v1.





[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 70  (was: Mesosphere Sprint 66, Mesosphere Sprint 67, 
Mesosphere Sprint 68)

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: reliability
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}





[jira] [Updated] (MESOS-7944) Implement jemalloc support for Mesos

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7944:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 70  (was: 
Mesosphere Sprint 63, Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere 
Sprint 67, Mesosphere Sprint 68)

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>  Labels: mesosphere
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.





[jira] [Updated] (MESOS-7361) Command checks via agent pollute agent logs.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7361:
--
Shepherd: Alexander Rukletsov

> Command checks via agent pollute agent logs.
> 
>
> Key: MESOS-7361
> URL: https://issues.apache.org/jira/browse/MESOS-7361
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Alexander Rukletsov
>Assignee: Armand Grillet
>  Labels: check, health-check, mesosphere
>
> Command checks via agent leverage the agent's debug container API to start 
> checks. Each such invocation triggers a batch of log lines on the agent, 
> because the API was not originally designed with periodic invocations in 
> mind. We should find a way to avoid excessive logging on the agent.





[jira] [Updated] (MESOS-8240) Add an option to build the new CLI and run unit tests.

2017-12-07 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet updated MESOS-8240:
--
Sprint: Mesosphere Sprint 70  (was: Mesosphere Sprint 69)

> Add an option to build the new CLI and run unit tests.
> --
>
> Key: MESOS-8240
> URL: https://issues.apache.org/jira/browse/MESOS-8240
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>
> An update of the discarded https://reviews.apache.org/r/52543/





[jira] [Created] (MESOS-8310) Document container image garbage collection.

2017-12-07 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8310:
---

 Summary: Document container image garbage collection.
 Key: MESOS-8310
 URL: https://issues.apache.org/jira/browse/MESOS-8310
 Project: Mesos
  Issue Type: Documentation
  Components: containerization, image-gc, provisioner
Reporter: Gilbert Song
Assignee: Zhitao Li


Document container image garbage collection.





[jira] [Updated] (MESOS-8280) Mesos Containerizer GC should set 'layers' after checkpointing layer ids in provisioner.

2017-12-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-8280:

Shepherd: Gilbert Song
  Sprint: Mesosphere Sprint 69
Story Points: 3
Target Version/s: 1.5.0
  Labels: containerizer image-gc mesosphere provisioner uber  (was: 
containerizer image-gc provisioner)

> Mesos Containerizer GC should set 'layers' after checkpointing layer ids in 
> provisioner.
> 
>
> Key: MESOS-8280
> URL: https://issues.apache.org/jira/browse/MESOS-8280
> Project: Mesos
>  Issue Type: Bug
>  Components: image-gc, provisioner
>Reporter: Gilbert Song
>Assignee: Zhitao Li
>Priority: Critical
>  Labels: containerizer, image-gc, mesosphere, provisioner, uber
> Fix For: 1.5.0
>
>
> {noformat}
> I1129 23:24:45.469543  6592 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/MVgVC7/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> '/tmp/mesos/store/docker/staging/MVgVC7/38135e3743e6dcb66bd1394b633053714333c7b7cf930bfeebfda660c06e/rootfs.overlay'
> I1129 23:24:45.473287  6592 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/MVgVC7/sha256:b56ae66c29370df48e7377c8f9baa744a3958058a766793f821dadcb144a4647
>  to rootfs 
> '/tmp/mesos/store/docker/staging/MVgVC7/b5815a31a59b66c909dbf6c670de78690d4b52649b8e283fc2bfd2594f61cca3/rootfs.overlay'
> I1129 23:24:45.582002  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> '/tmp/mesos/store/docker/staging/6Zbc17/e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6/rootfs.overlay'
> I1129 23:24:45.589404  6595 metadata_manager.cpp:167] Successfully cached 
> image 'alpine'
> I1129 23:24:45.590204  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> '/tmp/mesos/store/docker/staging/6Zbc17/be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e/rootfs.overlay'
> I1129 23:24:45.595190  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> '/tmp/mesos/store/docker/staging/6Zbc17/53b5066c5a7dff5d6f6ef0c1945572d6578c083d550d2a3d575b4cdf7460306f/rootfs.overlay'
> I1129 23:24:45.599500  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> '/tmp/mesos/store/docker/staging/6Zbc17/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721/rootfs.overlay'
> I1129 23:24:45.602047  6597 provisioner.cpp:506] Provisioning image rootfs 
> '/tmp/provisioner/containers/3bbc3fd1-0138-43a9-94ba-d017d813daac/containers/01de09c5-d8e9-412e-8825-a592d2c875e5/backends/overlay/rootfses/b5d48445-848d-4274-a4f8-e909351ebc35'
>  for container 
> 3bbc3fd1-0138-43a9-94ba-d017d813daac.01de09c5-d8e9-412e-8825-a592d2c875e5 
> using overlay backend
> I1129 23:24:45.602751  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66
>  to rootfs 
> '/tmp/mesos/store/docker/staging/6Zbc17/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16/rootfs.overlay'
> I1129 23:24:45.603054  6596 overlay.cpp:168] Created symlink 
> '/tmp/provisioner/containers/3bbc3fd1-0138-43a9-94ba-d017d813daac/containers/01de09c5-d8e9-412e-8825-a592d2c875e5/backends/overlay/scratch/b5d48445-848d-4274-a4f8-e909351ebc35/links'
>  -> '/tmp/xAWQ8y'
> I1129 23:24:45.604398  6596 overlay.cpp:196] Provisioning image rootfs with 
> overlayfs: 
> 'lowerdir=/tmp/xAWQ8y/1:/tmp/xAWQ8y/0,upperdir=/tmp/provisioner/containers/3bbc3fd1-0138-43a9-94ba-d017d813daac/containers/01de09c5-d8e9-412e-8825-a592d2c875e5/backends/overlay/scratch/b5d48445-848d-4274-a4f8-e909351ebc35/upperdir,workdir=/tmp/provisioner/containers/3bbc3fd1-0138-43a9-94ba-d017d813daac/containers/01de09c5-d8e9-412e-8825-a592d2c875e5/backends/overlay/scratch/b5d48445-848d-4274-a4f8-e909351ebc35/workdir'
> I1129 23:24:45.607802  6594 registry_puller.cpp:395] Extracting layer tar 
> ball 
> '/tmp/mesos/store/docker/staging/6Zbc17/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
>  to rootfs 
> 

[jira] [Updated] (MESOS-7663) Update the documentation to reflect the addition of reservation refinement.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7663:
--
  Sprint: Mesosphere Sprint 70
Story Points: 2

> Update the documentation to reflect the addition of reservation refinement.
> ---
>
> Key: MESOS-7663
> URL: https://issues.apache.org/jira/browse/MESOS-7663
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Blocker
>
> There are a few things we need to be sure to document:
> * What reservation refinement is.
> * The new "format" for Resource, when using the RESERVATION_REFINEMENT 
> capability.
> * The filtering of resources if a framework is not RESERVATION_REFINEMENT 
> capable.
> * The current limitation that only a single reservation can be pushed / 
> popped within a single RESERVE / UNRESERVE operation.
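
To illustrate the last bullet, the sketch below models a resource's reservations as a stack. It assumes (the ticket does not spell this out) that each refinement is appended to the end of a list and that RESERVE / UNRESERVE add or remove exactly one entry. The `Reservation` and `Resource` structs and the `reserve` / `unreserve` helpers are hypothetical names, not the Mesos API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a single reservation entry.
struct Reservation {
  std::string role;
  std::string principal;
};

// Hypothetical stand-in for a resource carrying a stack of reservations;
// the last element is the most refined reservation.
struct Resource {
  std::string name;
  std::vector<Reservation> reservations;
};

// Models RESERVE: pushes exactly one reservation on top of the stack.
void reserve(Resource& resource, const Reservation& reservation) {
  resource.reservations.push_back(reservation);
}

// Models UNRESERVE: pops exactly one reservation. Returns false if the
// resource is unreserved and there is nothing to pop.
bool unreserve(Resource& resource) {
  if (resource.reservations.empty()) {
    return false;
  }
  resource.reservations.pop_back();
  return true;
}
```

Under this model, refining a reservation twice takes two RESERVE operations, and undoing both takes two UNRESERVE operations.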





[jira] [Updated] (MESOS-8221) Use protobuf reflection to simplify downgrading of resources.

2017-12-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8221:
--
Story Points: 5

> Use protobuf reflection to simplify downgrading of resources.
> -
>
> Key: MESOS-8221
> URL: https://issues.apache.org/jira/browse/MESOS-8221
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Michael Park
>Assignee: Michael Park
>
> We currently have a {{downgradeResources}} function which is called on every
> {{repeated Resource}} field in every message that we checkpoint. We should
> leverage protobuf reflection to automatically downgrade any instances of
> {{Resource}} within any protobuf message.
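
The idea can be sketched without protobuf itself. The toy model below assumes a self-describing `Message` type (hypothetical, standing in for what protobuf descriptors and reflection provide) and walks it recursively, applying a downgrade step to every nested message whose type name is `Resource`. The `downgraded` flag is only a placeholder for whatever the real {{downgradeResources}} does, and `downgradeAll` is a name made up for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical self-describing message: a type name plus nested
// sub-messages. Real protobuf reflection would expose this structure
// through descriptors instead.
struct Message {
  std::string type;
  bool downgraded;                // Placeholder for the real downgrade effect.
  std::vector<Message> children;  // Nested (possibly repeated) sub-messages.

  explicit Message(const std::string& type_)
    : type(type_), downgraded(false) {}
};

// Recursively visits the message and all nested messages, marking each
// `Resource` as downgraded. Returns the number of resources touched.
int downgradeAll(Message& message) {
  int count = 0;
  if (message.type == "Resource") {
    message.downgraded = true;
    ++count;
  }
  for (Message& child : message.children) {
    count += downgradeAll(child);
  }
  return count;
}
```

The point of the generic walk is that a caller no longer has to know which fields of which messages contain resources; a single call on the top-level message covers all of them.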





[jira] [Assigned] (MESOS-8271) Libprocess SSL test fail in Mesos 17.10

2017-12-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-8271:


Assignee: Greg Mann

> Libprocess SSL test fail in Mesos 17.10
> ---
>
> Key: MESOS-8271
> URL: https://issues.apache.org/jira/browse/MESOS-8271
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.5.0
> Environment: Mesos Configuration:
> {noformat}
> ../configure --disable-java --disable-python --enable-libevent --enable-ssl
> {noformat}
> Ubuntu 17.10
> {noformat}
> $ uname -a
> Linux localhost 4.13.0-17-generic #20-Ubuntu SMP Mon Nov 6 10:04:08 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
> {noformat}
> {noformat}
> $ sudo apt show libevent-dev
> Package: libevent-dev
> Version: 2.1.8-stable-4
> Priority: optional
> Section: libdevel
> Source: libevent
> Origin: Ubuntu
> Maintainer: Ubuntu Developers 
> Original-Maintainer: Anibal Monsalve Salazar 
> Bugs: https://bugs.launchpad.net/ubuntu/+filebug
> Installed-Size: 1,694 kB
> Depends: libevent-2.1-6 (= 2.1.8-stable-4), libevent-core-2.1-6 (= 
> 2.1.8-stable-4), libevent-extra-2.1-6 (= 2.1.8-stable-4), 
> libevent-pthreads-2.1-6 (= 2.1.8-stable-4), libevent-openssl-2.1-6 (= 
> 2.1.8-stable-4)
> Homepage: http://libevent.org/
> Supported: 9m
> Download-Size: 262 kB
> APT-Manual-Installed: yes
> APT-Sources: http://de.archive.ubuntu.com/ubuntu artful/main amd64 Packages
> Description: Asynchronous event notification library (development files)
>  Libevent is an asynchronous event notification library that provides a
>  mechanism to execute a callback function when a specific event occurs
>  on a file descriptor or after a timeout has been reached.
>  .
>  This package includes development files for compiling against libevent.
> {noformat}
> {noformat}
> $ sudo apt show libssl-dev
> Package: libssl-dev
> Version: 1.0.2g-1ubuntu13.2
> Priority: optional
> Section: libdevel
> Source: openssl
> Origin: Ubuntu
> Maintainer: Ubuntu Developers 
> Original-Maintainer: Debian OpenSSL Team 
> 
> Bugs: https://bugs.launchpad.net/ubuntu/+filebug
> Installed-Size: 7,216 kB
> Depends: libssl1.0.0 (= 1.0.2g-1ubuntu13.2), zlib1g-dev
> Recommends: libssl-doc
> Supported: 9m
> Download-Size: 1,357 kB
> APT-Manual-Installed: yes
> APT-Sources: http://de.archive.ubuntu.com/ubuntu artful-updates/main amd64 
> Packages
> Description: Secure Sockets Layer toolkit - development files
>  This package is part of the OpenSSL project's implementation of the SSL
>  and TLS cryptographic protocols for secure communication over the
>  Internet.
>  .
>  It contains development libraries, header files, and manpages for libssl
> {noformat}
>Reporter: Alexander Rojas
>Assignee: Greg Mann
>  Labels: mesosphere
>
> The following tests consistently fail when building Mesos on Ubuntu 17.10:
> {noformat}
> SSLTest.SSLSocket
> SSLTest.NoVerifyBadCA
> SSLTest.VerifyCertificate
> SSLTest.ProtocolMismatch
> SSLTest.ECDHESupport
> SSLTest.PeerAddress
> SSLTest.HTTPSGet
> SSLTest.HTTPSPost
> SSLTest.SilentSocket
> SSLTest.ShutdownThenSend
> SSLVerifyIPAdd/SSLTest.BasicSameProcess/0, where GetParam() = "false"
> SSLVerifyIPAdd/SSLTest.BasicSameProcess/1, where GetParam() = "true"
> SSLVerifyIPAdd/SSLTest.BasicSameProcessUnix/0, where GetParam() = "false"
> SSLVerifyIPAdd/SSLTest.BasicSameProcessUnix/1, where GetParam() = "true"
> SSLVerifyIPAdd/SSLTest.RequireCertificate/0, where GetParam() = "false"
> SSLVerifyIPAdd/SSLTest.RequireCertificate/1, where GetParam() = "true"
> {noformat}
> The interesting [log 
> line|https://github.com/apache/mesos/blob/f06e6184b0d6cc4a3245a714e5b56f26eb454233/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L1188]
>  is:
> {noformat}
> I1128 10:08:48.235530 26158 libevent_ssl_socket.cpp:1188] Socket error: 
> error::lib(0):func(0):reason(0)
> {noformat}
> After investigating further, one can notice that in the libevent callback 
> the [event is flagged with 
> BEV_EVENT_ERROR|https://github.com/apache/mesos/blob/f06e6184b0d6cc4a3245a714e5b56f26eb454233/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L1176];
>  however, neither 
> {{[EVUTIL_SOCKET_ERROR()|https://github.com/apache/mesos/blob/f06e6184b0d6cc4a3245a714e5b56f26eb454233/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L1178]}}
>  nor 
> {{[bufferevent_get_openssl_error()|https://github.com/apache/mesos/blob/f06e6184b0d6cc4a3245a714e5b56f26eb454233/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L1182]}}
>  reports any failure.
> Searching for the issue turns up at least one person who hit the same problem 
> some years ago, but there are no clues as to how it was fixed.
> Full test log output:
> 

[jira] [Commented] (MESOS-7303) Support Isolator capabilities.

2017-12-07 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282117#comment-16282117
 ] 

Joseph Wu commented on MESOS-7303:
--

{{1.5.0}} (as you added) sounds good.

> Support Isolator capabilities.
> --
>
> Key: MESOS-7303
> URL: https://issues.apache.org/jira/browse/MESOS-7303
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere, storage
>
> Currently, isolators have one capability: whether they support nesting or not. 
> To support launching containers that are not tied to Mesos tasks or executors 
> (standalone containers), we need to add another capability to the Isolator 
> interface so that we can avoid invoking isolators that do not yet support 
> standalone containers when launching them.
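
One way to picture the proposal: each isolator reports its capabilities, and the launch path only invokes isolators that support the kind of container being launched. The sketch below is a simplified model, not the actual Mesos Isolator interface; `ContainerKind`, `supportsNesting`, `supportsStandalone`, and `isolatorsFor` are names assumed for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical classification of the container being launched.
enum class ContainerKind { TOP_LEVEL, NESTED, STANDALONE };

// Hypothetical, simplified isolator: a name plus two capability flags.
struct Isolator {
  std::string name;
  bool supportsNesting;
  bool supportsStandalone;
};

// Returns the names of the isolators that should be invoked for the
// given kind of container, skipping those lacking the needed capability.
std::vector<std::string> isolatorsFor(
    const std::vector<Isolator>& isolators,
    ContainerKind kind) {
  std::vector<std::string> selected;
  for (const Isolator& isolator : isolators) {
    if (kind == ContainerKind::NESTED && !isolator.supportsNesting) {
      continue;
    }
    if (kind == ContainerKind::STANDALONE && !isolator.supportsStandalone) {
      continue;
    }
    selected.push_back(isolator.name);
  }
  return selected;
}
```

Modeling the new capability as a second flag next to the nesting one keeps the filtering logic in a single place, so an isolator that has not been updated yet is simply skipped for standalone launches.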





[jira] [Updated] (MESOS-8309) Introduce a UUID message type

2017-12-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8309:
--
Target Version/s: 1.5.0

> Introduce a UUID message type
> -
>
> Key: MESOS-8309
> URL: https://issues.apache.org/jira/browse/MESOS-8309
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> Currently, when a UUID needs to be part of a protobuf message, we use a byte 
> array field for it. This has some drawbacks, especially when it comes to 
> outputting the UUID in logs: to stringify the UUID field, we first have to 
> create a stout UUID and then call {{.toString()}} on it. It would help to 
> have a UUID type in {{mesos.proto}} and provide a stringification function 
> for it in {{type_utils.hpp}}.
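
A rough sketch of what such a type and its stringification could look like, written in plain C++ rather than against generated protobuf classes. The `UUID` struct with a raw 16-byte `value` field, the `operator<<`, and the `stringify` helper below are assumptions for illustration; the ticket does not specify the message layout.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a UUID message: the raw 16 bytes.
struct UUID {
  std::string value;
};

// Writes the UUID in the canonical 8-4-4-4-12 lowercase hex form.
// Values that are not exactly 16 bytes are printed as a marker instead.
std::ostream& operator<<(std::ostream& stream, const UUID& uuid) {
  if (uuid.value.size() != 16) {
    return stream << "<invalid UUID>";
  }
  static const char digits[] = "0123456789abcdef";
  for (std::size_t i = 0; i < uuid.value.size(); ++i) {
    // Dashes separate the groups after bytes 4, 6, 8 and 10.
    if (i == 4 || i == 6 || i == 8 || i == 10) {
      stream << '-';
    }
    const unsigned char byte = static_cast<unsigned char>(uuid.value[i]);
    stream << digits[byte >> 4] << digits[byte & 0x0f];
  }
  return stream;
}

// Convenience wrapper returning the string form directly.
std::string stringify(const UUID& uuid) {
  std::ostringstream out;
  out << uuid;
  return out.str();
}
```

With a stream operator like this, a log statement can print the field directly instead of first converting the bytes into a separate UUID object.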





[jira] [Created] (MESOS-8309) Introduce a UUID message type

2017-12-07 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8309:
---

 Summary: Introduce a UUID message type
 Key: MESOS-8309
 URL: https://issues.apache.org/jira/browse/MESOS-8309
 Project: Mesos
  Issue Type: Task
Reporter: Jan Schlicht
Assignee: Jan Schlicht
 Fix For: 1.5.0


Currently, when a UUID needs to be part of a protobuf message, we use a byte array 
field for it. This has some drawbacks, especially when it comes to outputting 
the UUID in logs: to stringify the UUID field, we first have to create a stout 
UUID and then call {{.toString()}} on it. It would help to have a UUID type 
in {{mesos.proto}} and provide a stringification function for it in 
{{type_utils.hpp}}.


