[jira] [Commented] (MESOS-9619) Mesos Master Crashes with Launch Group when using Port Resources

2019-04-25 Thread Greg Mann (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826548#comment-16826548
 ] 

Greg Mann commented on MESOS-9619:
--

1.6.x branch:
{code}
commit f7e3a8ee649424ef72676c1244be6b1c68e96bcf
Author: Meng Zhu 
Date:   Tue Mar 12 15:25:25 2019 -0700

Added `>`, `<` and `>=` operators to `Value::Scalar`.

(cherry-picked from commit 393f39dd8ea49d0d70e2ca562d8e0a2ab88d5238)
{code}
{code}
commit dc790fd2533ed1ba97a476ba75463d774a9d21f1
Author: Greg Mann 
Date:   Wed Apr 24 15:37:14 2019 -0700

Refactored and augmented `class ResourceQuantities`.

This patch removed the map interface of
`class ResourceQuantities` and added a few built-in
arithmetic operations. Now, absent resource items imply
that there is none (zero) of that resource.

Also added a to-do to add `class ResourceLimits`, which
is similar but treats absent resource entries as having
an infinite amount of that resource.

Also changed affected call sites and tests accordingly.

(cherry-picked from commit aa7b75fecd336b951fa8577ac5db26d3d0e28c1b)

NOTE: This cherry-pick omits all changes made to the
sorters, as including those parts of the diff would have
required the backport of other dependent patches which
are not related to the fix for MESOS-9619, which is the
motivation for this cherry-pick.
{code}
{code}
commit 712095346ebab4ee2a9e21dfd3d0c2590d8a0710
Author: Meng Zhu 
Date:   Wed Mar 27 17:09:13 2019 -0700

Added `==` and `!=` operators to `ResourceQuantities`.

(cherry-picked from commit 7c8a9a9218b5b3a9a2acbf8c10899355773377ef)
{code}
{code}
commit 085b3826235dc07a065f8c65111e7caf0b0dd0c5
Author: Greg Mann 
Date:   Fri Apr 19 00:34:15 2019 -0700

Enabled construction of `ResourceQuantities` from `Resources`.

This patch adds a new static method which enables the
construction of `ResourceQuantities` from `Resources`.
Namely, this permits the inclusion of sets and ranges in the
input resources used to construct `ResourceQuantities`.

(cherry-picked from commit cbae57b7e790b8b46c79052975406d603e7d175a)
{code}
{code}
commit ec893078e8a9d72eb108f9abae84e1cc3bc5ac39
Author: Greg Mann 
Date:   Sat Apr 20 11:48:39 2019 -0700

Ensured that task groups do not specify overlapping ranges or sets.

This patch adds validation to the master to ensure that task
groups do not include resources with overlapping set- or
range-valued resources, as this can crash the allocator.

(cherry-picked from commit f8ffdb7bbf3ff58e1e7a411cdd66767519d9a7ad)
{code}
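
For readers following along, here is a minimal sketch of the kind of check the last patch describes: rejecting a task group whose tasks claim overlapping port ranges. It is written in Python for brevity and is not the master's actual validation code; the `find_overlap` name and the dictionary-of-ranges input are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: an inclusive (begin, end) port range.
Range = Tuple[int, int]


def find_overlap(
    task_ranges: Dict[str, List[Range]]
) -> Optional[Tuple[str, str, Range]]:
    """Return (task_a, task_b, shared_range) for the first overlap, else None.

    Any two ranges in the group are compared, including two ranges that
    belong to the same task.
    """
    # Flatten to (begin, end, task) and sort by begin. Because we return at
    # the first overlap, every range seen so far is disjoint, so comparing
    # each range with its predecessor is enough.
    intervals = sorted(
        (begin, end, task)
        for task, ranges in task_ranges.items()
        for begin, end in ranges
    )
    for prev, cur in zip(intervals, intervals[1:]):
        if cur[0] <= prev[1]:
            shared = (cur[0], min(cur[1], prev[1]))
            return (prev[2], cur[2], shared)
    return None
```

A validator built on this would reject the whole group with an error naming the two tasks, rather than letting the overlapping resources reach the allocator.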

> Mesos Master Crashes with Launch Group when using Port Resources
> 
>
> Key: MESOS-9619
> URL: https://issues.apache.org/jira/browse/MESOS-9619
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Affects Versions: 1.4.3, 1.7.1
> Environment: Testing in both Mesos 1.4.3 and Mesos 1.7.1
> Reporter: Nimi Wariboko Jr.
> Assignee: Greg Mann
> Priority: Critical
> Labels: foundations, master, mesosphere
> Attachments: mesos-master.log, mesos-master.snippet.log
>
>
> Original Issue: 
> [https://lists.apache.org/thread.html/979c8799d128ad0c436b53f2788568212f97ccf324933524f1b4d189@%3Cuser.mesos.apache.org%3E]
> When the ports resources are removed, Mesos functions normally (I'm able to 
> launch the task as many times as possible, although it always fails).
> Attached is a snippet of the Mesos master log from OFFER to crash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-9619) Mesos Master Crashes with Launch Group when using Port Resources

2019-04-25 Thread Greg Mann (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826532#comment-16826532
 ] 

Greg Mann commented on MESOS-9619:
--

1.7.x branch:
{code}
commit 206e44006540a4d9487da1cb60b64e58b2bf2a51
Author: Meng Zhu 
Date:   Tue Feb 26 15:06:07 2019 -0800

Added method `fromScalarResources` in `ResourceQuantities`.

This method takes all the scalar-type `Resource` entries in
the `Resources` and combines them into a `ResourceQuantities`.
Only the resource name and its scalar value are used and the
rest of the metadata is ignored. Non-scalar resource entries
are silently dropped. Since internal resources are always
validated to have positive scalar values, this conversion will
always succeed.

Also added a dedicated test.

(cherry-picked from commit da1f95714f29165ea603ad109efae4e822c7408a)
{code}
{code}
commit a0eb394d44ecc7608a4c40aca8ee6beee943e41e
Author: Meng Zhu 
Date:   Tue Mar 12 15:25:25 2019 -0700

Added `>`, `<` and `>=` operators to `Value::Scalar`.

(cherry-picked from commit 393f39dd8ea49d0d70e2ca562d8e0a2ab88d5238)
{code}
{code}
commit 6aec558976df2d6f29a4a99dc3715acae6ec8992
Author: Meng Zhu 
Date:   Tue Feb 26 15:45:08 2019 -0800

Refactored and augmented `class ResourceQuantities`.

This patch removed the map interface of
`class ResourceQuantities` and added a few built-in
arithmetic operations. Now, absent resource items imply
that there is none (zero) of that resource.

Also added a to-do to add `class ResourceLimits`, which
is similar but treats absent resource entries as having
an infinite amount of that resource.

Also changed affected call sites and tests accordingly.

(cherry-picked from commit aa7b75fecd336b951fa8577ac5db26d3d0e28c1b)
{code}
{code}
commit 000aa6ea5e022435c3b55f6c4ca425601dc8e01f
Author: Meng Zhu 
Date:   Wed Mar 27 17:09:13 2019 -0700

Added `==` and `!=` operators to `ResourceQuantities`.

(cherry-picked from commit 7c8a9a9218b5b3a9a2acbf8c10899355773377ef)
{code}
{code}
commit 975066ad8438968df6edc325e90d6c9e2610f60e
Author: Greg Mann 
Date:   Fri Apr 19 00:34:15 2019 -0700

Enabled construction of `ResourceQuantities` from `Resources`.

This patch adds a new static method which enables the
construction of `ResourceQuantities` from `Resources`.
Namely, this permits the inclusion of sets and ranges in the
input resources used to construct `ResourceQuantities`.

(cherry-picked from commit cbae57b7e790b8b46c79052975406d603e7d175a)
{code}
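
To illustrate the semantics these `ResourceQuantities` patches describe (absent entries mean zero, and sets and ranges are accepted as input), here is a toy model in Python. It is only a sketch: the `Quantities` class, its method names, and the rule that a range or set contributes its item count are assumptions made for this example, not the Mesos C++ API.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple, Union

# Hypothetical encodings for this sketch only.
Ranges = List[Tuple[int, int]]            # inclusive (begin, end) pairs
Value = Union[int, float, Ranges, Set[str]]


class Quantities:
    """Name -> amount mapping in which a missing name means zero."""

    def __init__(self, amounts: Optional[Dict[str, float]] = None) -> None:
        # Drop zero entries so that "absent" and "zero" compare equal.
        self._amounts = {k: float(v) for k, v in (amounts or {}).items() if v != 0}

    @staticmethod
    def from_resources(resources: Iterable[Tuple[str, Value]]) -> "Quantities":
        totals: Dict[str, float] = {}
        for name, value in resources:
            if isinstance(value, (int, float)):
                amount = float(value)
            elif isinstance(value, (set, frozenset)):
                # Assumption: a set counts as its number of items.
                amount = float(len(value))
            else:
                # Assumption: ranges count as the number of values covered.
                amount = float(sum(end - begin + 1 for begin, end in value))
            totals[name] = totals.get(name, 0.0) + amount
        return Quantities(totals)

    def get(self, name: str) -> float:
        return self._amounts.get(name, 0.0)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Quantities) and self._amounts == other._amounts

    def __add__(self, other: "Quantities") -> "Quantities":
        names = set(self._amounts) | set(other._amounts)
        return Quantities({n: self.get(n) + other.get(n) for n in names})

    def contains(self, other: "Quantities") -> bool:
        # Names missing on our side are treated as zero.
        return all(self.get(n) >= amount for n, amount in other._amounts.items())
```

A `ResourceLimits` counterpart, as the to-do above suggests, would flip the default so that a missing name means "unlimited" rather than zero.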
{code}
commit 55ae37ee5a04a17c6f8c996d1c31681124782894
Author: Greg Mann 
Date:   Sat Apr 20 11:48:39 2019 -0700

Ensured that task groups do not specify overlapping ranges or sets.

This patch adds validation to the master to ensure that task
groups do not include resources with overlapping set- or
range-valued resources, as this can crash the allocator.

(cherry-picked from commit f8ffdb7bbf3ff58e1e7a411cdd66767519d9a7ad)
{code}






[jira] [Commented] (MESOS-9724) Flatten the weighted shuffling in the random sorter.

2019-04-25 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826492#comment-16826492
 ] 

Meng Zhu commented on MESOS-9724:
-

{noformat}
commit 5108f076e6a5c275cae6b124bbcb110bc6785f94
Author: Meng Zhu 
Date:   Wed Apr 24 11:32:38 2019 -0700

Avoided some recalculation in the random sorter.

This patch keeps the sorting-related information in memory,
accompanied by a dirty bit. This helps to avoid
unnecessary recalculation of this info in `sort()`.

Review: https://reviews.apache.org/r/70430

commit 5a756402ad15cedbc6ccb8fa5de096745967f36f
Author: Meng Zhu 
Date:   Wed Apr 24 10:51:06 2019 -0700

Fixed a bug in the random sorter.

Currently, in the presence of hierarchical roles, the
random sorter shuffles roles level by level and then picks
the active leaf nodes using DFS. This could generate a
non-uniform random result since active leaves in a subtree
are always picked together.

This patch fixes the issue by first calculating the relative
weights of each active leaf node and then shuffling all of them
only once.

Review: https://reviews.apache.org/r/70429

commit 5e52c686c29819113f42c6bde7d90324673b42dc
Author: Meng Zhu 
Date:   Tue Apr 23 18:44:33 2019 -0700

Added a random sorter helper to find active internal nodes.

Active internal nodes are defined as internal nodes that have
at least one active leaf node.

Review: https://reviews.apache.org/r/70542
{noformat}
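
As a rough illustration of the flattening described in these commits, the sketch below computes each active leaf's relative weight and then shuffles all leaves in a single pass. It is Python rather than the sorter's C++, and the `Node` type and function names are hypothetical. The shuffle uses Efraimidis-Spirakis keys, which is one standard way to do a weighted shuffle and is swapped in here for illustration; it is not necessarily what the random sorter uses.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    weight: float = 1.0                  # assumed to be positive
    active: bool = False                 # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def has_active_leaf(node: Node) -> bool:
    if not node.children:
        return node.active
    return any(has_active_leaf(child) for child in node.children)


def relative_weights(root: Node) -> Dict[str, float]:
    """Map each active leaf to its share of the whole tree (shares sum to 1)."""
    result: Dict[str, float] = {}

    def visit(node: Node, share: float) -> None:
        if not node.children:
            result[node.name] = share
            return
        # Only subtrees containing an active leaf compete for this node's share.
        live = [child for child in node.children if has_active_leaf(child)]
        total = sum(child.weight for child in live)
        for child in live:
            visit(child, share * child.weight / total)

    if has_active_leaf(root):
        visit(root, 1.0)
    return result


def weighted_shuffle(weights: Dict[str, float], rng: random.Random) -> List[str]:
    """Weighted random order: sort by the key u ** (1 / w), largest first."""
    keyed = [(rng.random() ** (1.0 / w), name) for name, w in weights.items()]
    keyed.sort(reverse=True)
    return [name for _, name in keyed]
```

With the weights flattened like this, leaves from different subtrees can interleave in the result, which a level-by-level shuffle followed by DFS cannot produce.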

> Flatten the weighted shuffling in the random sorter.
> 
>
> Key: MESOS-9724
> URL: https://issues.apache.org/jira/browse/MESOS-9724
> Project: Mesos
> Issue Type: Improvement
> Reporter: Meng Zhu
> Assignee: Meng Zhu
> Priority: Major
> Labels: performance, resource-management
>
> Due to the presence of hierarchical weights, the random sorter currently 
> shuffles level-by-level. We should be able to shuffle all the active leaves 
> only once by calculating (and caching) active leaves' relative weights. This 
> should improve the performance in the presence of hierarchical roles. 
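
The "calculating (and caching)" part can be pictured as a cached value guarded by a dirty bit. The following Python sketch assumes a hypothetical `CachedSortInfo` wrapper and a caller that marks the cache dirty whenever roles or weights change; it shows the pattern only and does not mirror the sorter's actual data structures.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedSortInfo(Generic[T]):
    """Recompute an expensive value only after it has been marked dirty."""

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute = compute
        self._cached: Optional[T] = None
        self._dirty = True            # nothing has been computed yet
        self.recomputations = 0       # exposed only to make the effect visible

    def mark_dirty(self) -> None:
        # Call this whenever the inputs change (role added, weight updated, ...).
        self._dirty = True

    def get(self) -> T:
        if self._dirty:
            self._cached = self._compute()
            self._dirty = False
            self.recomputations += 1
        return self._cached
```

Repeated calls to `get()` between changes then cost a flag check instead of a full tree walk.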





[jira] [Commented] (MESOS-9739) When recovered agent marked gone, retain agent ID

2019-04-25 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826320#comment-16826320
 ] 

Vinod Kone commented on MESOS-9739:
---

The marked agent is already retained in the registry, right? Right now, if a gone 
agent attempts to reregister, the master refuses it and shuts it down. Any 
reconciliation requests should already have been answered with 
TASK_GONE_BY_OPERATOR. So I'm not sure if there is more to do?

> When recovered agent marked gone, retain agent ID
> -
>
> Key: MESOS-9739
> URL: https://issues.apache.org/jira/browse/MESOS-9739
> Project: Mesos
> Issue Type: Improvement
> Reporter: Greg Mann
> Priority: Major
> Labels: foundations, mesosphere
>
> When a recovered agent is marked gone, we could retain its agent ID so that 
> if it attempts to reregister, we could send task status updates for its tasks.





[jira] [Commented] (MESOS-9740) Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents from reregistering with 1.8+ masters

2019-04-25 Thread Benno Evers (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826149#comment-16826149
 ] 

Benno Evers commented on MESOS-9740:


Preliminary review: https://reviews.apache.org/r/70538/

> Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents 
> from reregistering with 1.8+ masters
> ---
>
> Key: MESOS-9740
> URL: https://issues.apache.org/jira/browse/MESOS-9740
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Joseph Wu
> Assignee: Benno Evers
> Priority: Blocker
> Labels: foundations, mesosphere
>
> As part of MESOS-6874, the master now validates protobuf unions passed as 
> part of an {{ExecutorInfo::ContainerInfo}}.  This prevents a task from 
> specifying, for example, a {{ContainerInfo::MESOS}}, but filling out the 
> {{docker}} field (which is then ignored by the agent).
> However, if a task was already launched with an invalid protobuf union, the 
> same validation will happen when the agent tries to reregister with the 
> master.  In this case, if the master is upgraded to validate protobuf unions, 
> the agent reregistration will be rejected.
> {code}
> master.cpp:7201] Dropping re-registration of agent at 
> slave(1)@172.31.47.126:5051 because it sent an invalid re-registration: 
> Protobuf union `mesos.ContainerInfo` with `Type == MESOS` should not have the 
> field `docker` set.
> {code}
> This bug was found when upgrading a 1.7.x test cluster to 1.8.0.  When 
> MESOS-6874 was committed, I had assumed the invalid protobufs would be rare.  
> However, on the test cluster, 13/17 agents had at least one invalid 
> ContainerInfo when reregistering.
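
For context, the union check described above amounts to something like the following Python sketch. The dictionary representation of `ContainerInfo`, the `UNION_FIELDS` table, and the function name are assumptions made for illustration; the real validation operates on the protobuf message in the master's C++ code.

```python
from typing import Any, Dict, Optional

# Assumed mapping from each union type to the single field it may populate.
UNION_FIELDS = {"MESOS": "mesos", "DOCKER": "docker"}


def validate_container_info(info: Dict[str, Any]) -> Optional[str]:
    """Return an error string if a field belonging to another type is set."""
    kind = info.get("type")
    for other_type, field_name in UNION_FIELDS.items():
        if other_type != kind and info.get(field_name) is not None:
            return (
                f"Protobuf union `mesos.ContainerInfo` with `Type == {kind}` "
                f"should not have the field `{field_name}` set."
            )
    return None
```

Applied at launch time, such a check only rejects new tasks. Applied at reregistration, it also rejects agents whose already-running executors carry an invalid `ContainerInfo`, which is the failure reported here.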





[jira] [Commented] (MESOS-9743) Argument forwarding in CMake build results in glog 0.4.0 being built as a shared library

2019-04-25 Thread Jan Schlicht (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825847#comment-16825847
 ] 

Jan Schlicht commented on MESOS-9743:
-

cc [~asekretenko]
Looks like this change was intended? In https://reviews.apache.org/r/70387/ the 
imported location is changed from {{glog}} to {{libglog}}, i.e. from a static 
to a dynamic library. In that case, it's probably related to the Ninja build 
system and a byproduct isn't copied. But then, building with 
{{BUILD_SHARED_LIBS=ON}} will cause problems, because GLog would be built as a 
static lib and we expect a dynamic library now.

> Argument forwarding in CMake build results in glog 0.4.0 being built as a shared library
> --
>
> Key: MESOS-9743
> URL: https://issues.apache.org/jira/browse/MESOS-9743
> Project: Mesos
> Issue Type: Bug
> Components: cmake
> Affects Versions: 1.8.0
> Environment: macOS 10.14.4, clang 8.0.0, Ninja build system
> Reporter: Jan Schlicht
> Assignee: Jan Schlicht
> Priority: Major
> Labels: build, easyfix, mesosphere, triaged
>
> GLog versions >= 0.3.5 introduce a {{BUILD_SHARED_LIBS}} CMake option. The 
> CMake configuration of Mesos also has such an option. Because these options 
> are forwarded to third-party packages, GLog will be built as a shared library 
> if Mesos is built with {{BUILD_SHARED_LIBS=OFF}}. This is not intended, as in 
> that case the GLog shared library is not copied over, resulting in Mesos 
> binaries failing to start.





[jira] [Created] (MESOS-9743) Argument forwarding in CMake build results in glog 0.4.0 being built as a shared library

2019-04-25 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-9743:
---

 Summary: Argument forwarding in CMake build results in glog 0.4.0 being built as a shared library
 Key: MESOS-9743
 URL: https://issues.apache.org/jira/browse/MESOS-9743
 Project: Mesos
 Issue Type: Bug
 Components: cmake
 Affects Versions: 1.8.0
 Environment: macOS 10.14.4, clang 8.0.0
 Reporter: Jan Schlicht
 Assignee: Jan Schlicht


GLog versions >= 0.3.5 introduce a {{BUILD_SHARED_LIBS}} CMake option. The 
CMake configuration of Mesos also has such an option. Because these options are 
forwarded to third-party packages, GLog will be built as a shared library if 
Mesos is built with {{BUILD_SHARED_LIBS=OFF}}. This is not intended, as in that 
case the GLog shared library is not copied over, resulting in Mesos binaries 
failing to start.


