[jira] [Commented] (MESOS-7639) Oversubscription could crash the master due to CHECK failure in the allocator

2017-07-21 Thread Dmitriy Shirchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097047#comment-16097047
 ] 

Dmitriy Shirchenko commented on MESOS-7639:
---

A small update: we saw another instance of this crash. Since we are running a 
patched version, I have included the relevant code below along with the logs.

{code}
F0721 21:43:29.141577  7454 master.cpp:9218] CHECK_SOME(resources): Invalid 
RESERVE Operation: cpus(*):24; mem(*):122880; ports(*):[31000-32000]; 
disk(*):849596; cpus(*)(allocated: aurora){REV}:12 does not contain 
ports(aurora, aurora, {instance_key: foo/foo/foo.foo/0})(allocated: 
aurora):[31139-31139, 31773-31773, 31827-31827]
{code}

The crash happened on the {{CHECK_SOME}} line:

{code}
void Slave::apply(const Offer::Operation& operation)
{
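  // Returns an Error if `operation` consumes resources that are not
  // contained in `totalResources`; CHECK_SOME then aborts the master.
  Try<Resources> resources = totalResources.apply(operation);
  CHECK_SOME(resources);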

  totalResources = resources.get();
  checkpointedResources = totalResources.filter(needCheckpointing);
}
{code}

For context, a large job was being updated using RESERVE operations. 
[~bmahler], please let me know what else I can provide. Sorry, this may not be 
much to go on.

> Oversubscription could crash the master due to CHECK failure in the allocator
> -
>
> Key: MESOS-7639
> URL: https://issues.apache.org/jira/browse/MESOS-7639
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yan Xu
>
> As I described in MESOS-7566, the following scenario is possible when the 
> agent sends updated oversubscribed resources to the master:
> - The agent's {{UpdateSlaveMessage}} reduces the oversubscribed resources.
> - {{Master::updateSlave}} upon receiving the update would first call 
> {{HierarchicalAllocatorProcess::updateSlave}}, followed by 
> {{allocator->recoverResources}}.
> - {{HierarchicalAllocatorProcess::updateSlave}} would update 
> {{roleSorter.total_}} to reduce the total, so the total could drop below the 
> allocation.
> - In the subsequent {{allocator->recoverResources}} call, the attempt to 
> remove the outstanding allocation may fail to bring it below the total, 
> because some of the allocation may not be in outstanding offers: it could be 
> in offered resources pending between {{Master::accept}} and {{Master::_accept}}. 
> So the end result could still be {{total < allocation}}.
> - When {{Master::_accept}} is later executed, it calls 
> {{allocator->updateAllocation}}, where the {{total < allocation}} 
> condition can trigger such a crash.
> The gist is that there are resources that are neither in the master's {{offers}} 
> nor tracked in the allocator when {{Master::updateSlave}} is called.
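The sequence above can be reduced to a toy model with a single scalar resource. The following is a hypothetical sketch: `AgentModel` and its members are invented for illustration and mirror only the order of calls described, not real allocator code (which compares `Resources` objects, not integers). Because the total is lowered before recovery, the part of the allocation still pending between `Master::accept` and `Master::_accept` cannot be recovered, which leaves `total < allocation`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-resource model of one agent as seen by the allocator.
struct AgentModel
{
  int64_t total = 0;    // What the allocator believes the agent has.
  int64_t offered = 0;  // Allocated and sitting in outstanding offers.
  int64_t pending = 0;  // Allocated but between Master::accept and _accept.

  int64_t allocated() const { return offered + pending; }

  // Step 1: the agent reports fewer oversubscribed resources and the
  // allocator shrinks the total first (as in updateSlave).
  void updateSlave(int64_t newTotal) { total = newTotal; }

  // Step 2: outstanding offers are recovered (as in recoverResources).
  // Pending resources are not in the master's offers, so they stay allocated.
  void recoverOutstandingOffers() { offered = 0; }

  // The condition that, per the description, must hold when
  // updateAllocation runs later.
  bool invariantHolds() const { return allocated() <= total; }
};
```

For example, with a total of 10, 4 units in outstanding offers and 6 pending acceptance, lowering the total to 4 and recovering the offers still leaves 6 allocated against a total of 4.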



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7402) Allocated quota of a child role should be also charged on all ancestors of that role

2017-07-21 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7402:
---
Priority: Blocker  (was: Major)

> Allocated quota of a child role should be also charged on all ancestors of 
> that role
> 
>
> Key: MESOS-7402
> URL: https://issues.apache.org/jira/browse/MESOS-7402
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Blocker
>  Labels: multitenancy
> Attachments: hrole_quota_test.patch
>
>
> Consider the following case: role {{a}} has a quota of 100, and role 
> {{a/b}} has a quota of 40. In the current implementation, the quota of a 
> parent role is effectively the aggregate quota of its whole subtree, including 
> itself. Therefore, the internal node {{a}} itself is effectively quota'd with 60 
> instead of 100. In other words, allocations made for the quota of {{a/b}} should 
> also be charged against the quota of its parent.
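A minimal sketch of the proposed accounting, assuming roles are '/'-separated paths and quota is a single scalar. `selfAndAncestors` and `QuotaLedger` are hypothetical names, not allocator APIs. Charging an allocation to a role and to every ancestor means the 40 allocated under `a/b` also counts against `a`, leaving 60 of `a`'s 100 for `a` itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the role followed by all of its ancestors,
// e.g. "a/b/c" -> {"a/b/c", "a/b", "a"}.
std::vector<std::string> selfAndAncestors(const std::string& role)
{
  std::vector<std::string> result;
  std::string current = role;
  while (true) {
    result.push_back(current);
    std::string::size_type slash = current.rfind('/');
    if (slash == std::string::npos) {
      break;
    }
    current = current.substr(0, slash);
  }
  return result;
}

// Hypothetical ledger tracking quota consumption per role.
struct QuotaLedger
{
  std::map<std::string, double> guarantee;  // Configured quota per role.
  std::map<std::string, double> consumed;   // Allocations charged per role.

  // Charge an allocation made to `role` against the role and its ancestors.
  void charge(const std::string& role, double amount)
  {
    for (const std::string& r : selfAndAncestors(role)) {
      consumed[r] += amount;
    }
  }

  // Quota still available to `role` and its subtree.
  double headroom(const std::string& role) const
  {
    auto g = guarantee.find(role);
    auto c = consumed.find(role);
    double quota = g == guarantee.end() ? 0.0 : g->second;
    double used = c == consumed.end() ? 0.0 : c->second;
    return quota - used;
  }
};
```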





[jira] [Updated] (MESOS-6101) Add Framework events to master's operator API

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6101:
--
Story Points: 5

> Add Framework events to master's operator API
> 
>
> Key: MESOS-6101
> URL: https://issues.apache.org/jira/browse/MESOS-6101
> Project: Mesos
>  Issue Type: Task
>Reporter: Zhitao Li
>Assignee: Quinn
>
> Consider the following case:
> 1) a subscriber connects to the master;
> 2) a new scheduler registers as a new framework;
> 3) a task is launched by this framework.
> In this sequence, the subscriber has no way to learn the FrameworkInfo 
> belonging to the FrameworkId.
> We should support an event for this (e.g., when the framework info in the 
> master is added or changed).
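To illustrate why such an event matters, here is a hypothetical subscriber-side cache: if the stream delivered a framework event before the first task event for that framework, the subscriber could resolve a `FrameworkId` to its `FrameworkInfo` locally. The structs are simplified stand-ins, not the real operator API protobufs, and the event handlers are assumptions.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Simplified stand-ins for operator API messages (not the real protobufs).
struct FrameworkInfo
{
  std::string id;
  std::string name;
};

struct TaskInfo
{
  std::string taskId;
  std::string frameworkId;
};

// Hypothetical subscriber state built from the event stream.
class SubscriberCache
{
public:
  // Handles a framework added/updated event; an update overwrites.
  void onFrameworkEvent(const FrameworkInfo& info) { frameworks_[info.id] = info; }

  // Handles a task added event: returns the owning framework, if known.
  std::optional<FrameworkInfo> onTaskAdded(const TaskInfo& task) const
  {
    auto it = frameworks_.find(task.frameworkId);
    if (it == frameworks_.end()) {
      return std::nullopt;  // The gap described in this issue.
    }
    return it->second;
  }

private:
  std::map<std::string, FrameworkInfo> frameworks_;
};
```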





[jira] [Updated] (MESOS-7491) Build a CSI client to talk to a CSI plugin.

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7491:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Build a CSI client to talk to a CSI plugin.
> ---
>
> Key: MESOS-7491
> URL: https://issues.apache.org/jira/browse/MESOS-7491
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The abstraction should be something like the following:
> {code}
> namespace csi {
> class Client
> {
> public:
>   Future<CreateVolumeResponse> CreateVolume(
>       const CreateVolumeRequest& request);
>   Future<DeleteVolumeResponse> DeleteVolume(
>       const DeleteVolumeRequest& request);
>   ...
> };
> } // namespace csi {
> {code}





[jira] [Updated] (MESOS-7747) Improve metrics around active subscribers.

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7747:
--
Sprint: Mesosphere Sprint 58, Mesosphere Sprint 59, Mesosphere Sprint 60  
(was: Mesosphere Sprint 58, Mesosphere Sprint 59)

> Improve metrics around active subscribers.
> --
>
> Key: MESOS-7747
> URL: https://issues.apache.org/jira/browse/MESOS-7747
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere, metrics, reliability
>
> Active subscribers to, e.g., the Mesos streaming API may affect Mesos master 
> performance. To improve triage and better understand the master workload, we 
> should add metrics that track active subscribers, send queue sizes, and so on.





[jira] [Updated] (MESOS-7806) Add copy assignment operator to `net::IP::Network`

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7806:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Add copy assignment operator to `net::IP::Network`
> --
>
> Key: MESOS-7806
> URL: https://issues.apache.org/jira/browse/MESOS-7806
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>
> Currently, we can't extend the class `net::IP::Network` without adding a 
> copy assignment operator in the derived class, due to the use of 
> `std::unique_ptr` in the base class. Hence, we need to introduce a copy 
> assignment operator into the base class.
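The language rule behind this: `std::unique_ptr` is move-only, so a class holding one gets an implicitly deleted copy constructor and copy assignment operator, and so does any derived class unless the base defines them. A minimal sketch with a simplified, hypothetical `Network` (the real stout class has different members) showing a base-class copy assignment that deep-copies, which lets a derived class rely on its implicitly generated one:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for net::IP::Network.
class Network
{
public:
  Network(const std::string& address, int prefix)
    : address_(new std::string(address)), prefix_(prefix) {}

  // The implicit copy operations are deleted because of the unique_ptr
  // member, so define them and deep-copy the owned object.
  Network(const Network& that)
    : address_(new std::string(*that.address_)), prefix_(that.prefix_) {}

  Network& operator=(const Network& that)
  {
    if (this != &that) {
      address_.reset(new std::string(*that.address_));
      prefix_ = that.prefix_;
    }
    return *this;
  }

  virtual ~Network() = default;

  const std::string& address() const { return *address_; }
  int prefix() const { return prefix_; }

private:
  std::unique_ptr<std::string> address_;
  int prefix_;
};

// No copy assignment operator needed here: the implicit one calls
// Network::operator= and then copies `label_`.
class LabeledNetwork : public Network
{
public:
  LabeledNetwork(const std::string& address, int prefix, const std::string& label)
    : Network(address, prefix), label_(label) {}

  const std::string& label() const { return label_; }

private:
  std::string label_;
};
```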





[jira] [Updated] (MESOS-7660) HierarchicalAllocator uses the default filter instead of a very long one

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7660:
--
Sprint: Mesosphere Sprint 58, Mesosphere Sprint 59, Mesosphere Sprint 60  
(was: Mesosphere Sprint 58, Mesosphere Sprint 59)

> HierarchicalAllocator uses the default filter instead of a very long one
> 
>
> Key: MESOS-7660
> URL: https://issues.apache.org/jira/browse/MESOS-7660
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> If a framework accepts/refuses an offer using a very long filter, [the 
> {{HierarchicalAllocator}} will use the default {{Filter}} 
> instead|https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1046-L1052].
> This means the resources will be filtered for only 5 seconds.
> This can happen when a framework sets {{Filter::refuse_seconds}} to a number 
> of seconds [larger than what fits in 
> {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L401-L405].
> The following [tests are 
> flaky|https://issues.apache.org/jira/browse/MESOS-7514] because of this: 
> {{ReservationTest.ReserveShareWithinRole}} and 
> {{ReservationTest.PreventUnreservingAlienResources}}.
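Assuming, as the linked stout code suggests, that `Duration` holds nanoseconds in a signed 64-bit integer, the largest representable filter is about 9.22e9 seconds (roughly 292 years). A hypothetical sketch of the conversion showing both behaviours: rejecting out-of-range input (which is what leads to the default filter being used) and clamping it, one possible fix. `toNanosChecked` and `toNanosClamped` are illustrative names, not stout functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Returns std::nullopt when `seconds` does not fit into int64 nanoseconds.
std::optional<int64_t> toNanosChecked(double seconds)
{
  const double nanos = seconds * 1e9;
  const double limit = 9223372036854775808.0;  // 2^63, exact as a double.
  if (!(nanos < limit) || nanos < -limit) {     // Also rejects NaN.
    return std::nullopt;
  }
  return static_cast<int64_t>(nanos);
}

// Saturates instead of failing, so a huge refuse_seconds behaves like
// "filter for as long as possible" rather than falling back to a default.
int64_t toNanosClamped(double seconds)
{
  if (std::isnan(seconds)) {
    return 0;
  }
  if (std::optional<int64_t> nanos = toNanosChecked(seconds)) {
    return *nanos;
  }
  return seconds > 0 ? std::numeric_limits<int64_t>::max()
                     : std::numeric_limits<int64_t>::min();
}
```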





[jira] [Updated] (MESOS-7792) Add support for ECDH ciphers

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7792:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Add support for ECDH ciphers
> 
>
> Key: MESOS-7792
> URL: https://issues.apache.org/jira/browse/MESOS-7792
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Affects Versions: 1.3.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: security
>
> [Elliptic curve 
> ciphers|https://wiki.openssl.org/index.php/Elliptic_Curve_Cryptography] are a 
> family of ciphers supported by OpenSSL. They allow smaller keys, but 
> require an extra configuration parameter (the curve to use), which 
> currently cannot be set through libprocess.





[jira] [Updated] (MESOS-7601) Some container launch failures are mistakenly treated as errors.

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7601:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Some container launch failures are mistakenly treated as errors.
> 
>
> Key: MESOS-7601
> URL: https://issues.apache.org/jira/browse/MESOS-7601
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: containerizer, mesosphere, tech-debt
>
> I've observed a case where a scheduler stops (i.e., calls TEARDOWN) while some 
> of its tasks are being launched. Although this is valid behaviour, the agent 
> prints an error and increments the container launch error metrics.
> Below are log excerpts for one such framework, 
> {{6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092}}.
> *Master log*
> {noformat}
> [centos@ip-172-31-6-200 ~]$ journalctl _PID=29716 --since "2 hours ago" 
> --no-pager | grep 
> "6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092"
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226218 29724 master.cpp:6072] Updating 
> info for framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226356 29728 hierarchical.cpp:274] Added 
> framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.226405 29728 hierarchical.cpp:379] 
> Deactivated framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.228570 29728 hierarchical.cpp:343] 
> Activated framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.246068 29721 master.cpp:7105] Sending 1 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.247851 29721 master.cpp:7194] Sending 1 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:58.912937 29728 master.cpp:4806] Processing 
> DECLINE call for offers: [ 92434aef-27da-4fd1-a5c4-b286d640d5b3-O509464 ] for 
> framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:59.804184 29727 master.cpp:7105] Sending 2 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:32:59.804411 29727 master.cpp:7194] Sending 2 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.248924 29721 master.cpp:7105] Sending 2 
> offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.249289 29721 master.cpp:7194] Sending 2 
> inverse offers to framework 
> 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 
> (TeraValidate) at 
> scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal 
> mesos-master[29716]: I0601 11:33:01.249724 29721 master.cpp:3851] Processing 
> ACCEPT call for offers: [ 92434aef-27da-4fd1-a5c4-b286d640d5b3-O509469 ] on 
> agent 36a25adb-4ea2-49d3-a195-448cff1dc146-S35 at slave(1)@172.31.13.122:5051 
> (172.31.13.122) for framework 
> 

[jira] [Updated] (MESOS-7625) Create script to automate publishing website

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7625:
--
Sprint: Mesosphere Sprint 58, Mesosphere Sprint 59, Mesosphere Sprint 60  
(was: Mesosphere Sprint 58, Mesosphere Sprint 59)

> Create script to automate publishing website
> 
>
> Key: MESOS-7625
> URL: https://issues.apache.org/jira/browse/MESOS-7625
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> This script will be run via ASF CI and will be responsible for:
> 1) checking out the latest master branch;
> 2) building Mesos and generating the endpoints help;
> 3) generating the website contents;
> 4) publishing the website by doing a git commit to the `mesos-site` repo.





[jira] [Updated] (MESOS-7661) Libprocess timers with long durations trigger immediately

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7661:
--
Sprint: Mesosphere Sprint 58, Mesosphere Sprint 59, Mesosphere Sprint 60  
(was: Mesosphere Sprint 58, Mesosphere Sprint 59)

> Libprocess timers with long durations trigger immediately
> -
>
> Key: MESOS-7661
> URL: https://issues.apache.org/jira/browse/MESOS-7661
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> {{process::delay()}} will schedule a method to run immediately when called 
> with a very long {{Duration}}.
> This happens because [{{Timeout}} tries to add two long 
> durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38],
>  leading to an [integer overflow in 
> {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116].
> I'd expect libprocess to either:
>   1. Never run the method.
>   2. Schedule it in the longest possible {{Duration}}.
> {{Duration::operator+=()}} should probably also handle integer overflows 
> differently. If an addition leads to an integer overflow, it might make more 
> sense to return {{Duration::max()}} than a negative duration.
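A minimal sketch of the saturating behaviour suggested above, operating on raw `int64_t` nanosecond counts (an assumption about the representation; this is not stout's actual `Duration::operator+=`). The overflow check happens before the addition, because signed integer overflow is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two nanosecond counts, saturating at the int64 limits instead of
// wrapping around to a value with the wrong sign.
int64_t saturatingAdd(int64_t a, int64_t b)
{
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  if (b > 0 && a > kMax - b) {
    return kMax;  // Would overflow upwards: behave like Duration::max().
  }
  if (b < 0 && a < kMin - b) {
    return kMin;  // Would overflow downwards: behave like Duration::min().
  }
  return a + b;
}
```

With this, a `Timeout` computed as "now plus a very long delay" would saturate to the far future instead of becoming negative and firing at once.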





[jira] [Updated] (MESOS-7780) Add `SUBSCRIBE` call handling to the resource provider manager

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7780:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Add `SUBSCRIBE` call handling to the resource provider manager
> --
>
> Key: MESOS-7780
> URL: https://issues.apache.org/jira/browse/MESOS-7780
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: storage
>
> Resource providers will use the HTTP API to subscribe to the 
> {{ResourceProviderManager}}. Handling these calls needs to be implemented. On 
> subscription, a unique resource provider ID will be assigned to the resource 
> provider and a {{SUBSCRIBED}} event will be sent.
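A hypothetical sketch of the bookkeeping described: on each `SUBSCRIBE` call, assign a fresh ID and reply with a `SUBSCRIBED` event carrying it. The message structs, `ProviderRegistry`, and the counter-based ID scheme are assumptions for illustration; the real manager works over the HTTP API with protobuf messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-ins for the resource provider API messages.
struct SubscribeCall
{
  std::string type;  // Hypothetical provider type, e.g. "org.example.storage".
};

struct SubscribedEvent
{
  std::string providerId;
};

// Hypothetical manager state; not the real ResourceProviderManager.
class ProviderRegistry
{
public:
  SubscribedEvent subscribe(const SubscribeCall& call)
  {
    // A counter is unique within this process only; IDs that must survive
    // restarts would need to be persisted or randomly generated.
    std::string id = "rp-" + std::to_string(nextId_++);
    providers_[id] = call.type;
    return SubscribedEvent{id};
  }

  std::size_t size() const { return providers_.size(); }

private:
  uint64_t nextId_ = 1;
  std::map<std::string, std::string> providers_;
};
```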





[jira] [Updated] (MESOS-7809) Building gRPC with Autotools

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7809:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Building gRPC with Autotools
> 
>
> Key: MESOS-7809
> URL: https://issues.apache.org/jira/browse/MESOS-7809
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> gRPC does not come with autotools scripts; instead it has a hand-written 
> Makefile that assumes certain libraries are pre-installed on the system. To 
> support gRPC in our autotools build, we need to write proper rules in our 
> autotools configuration that override the default path options in gRPC's Makefile.





[jira] [Updated] (MESOS-7808) Bundling gRPC into 3rdparty with CMake under Linux

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7808:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> Bundling gRPC into 3rdparty with CMake under Linux
> --
>
> Key: MESOS-7808
> URL: https://issues.apache.org/jira/browse/MESOS-7808
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> gRPC comes with a hand-written Makefile and CMake file, but no autotools 
> configuration scripts. As a first step to support gRPC in Mesos, we could 
> integrate gRPC into our CMake build process under Linux and make it a 
> dependency of libprocess. Since gRPC also depends on protobuf, this will create 
> a triangular dependency between protobuf, gRPC and libprocess, so the 
> existing build configurations need to be adjusted as well.





[jira] [Updated] (MESOS-7810) gRPC support in libprocess

2017-07-21 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7810:
--
Sprint: Mesosphere Sprint 59, Mesosphere Sprint 60  (was: Mesosphere Sprint 
59)

> gRPC support in libprocess
> --
>
> Key: MESOS-7810
> URL: https://issues.apache.org/jira/browse/MESOS-7810
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> We would like to introduce a gRPC wrapper in libprocess. The wrapper provides 
> a clean interface for asynchronous gRPC calls that returns a {{Future}}, so 
> others can easily use actor-based programming with libprocess to support gRPC 
> communication.
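The core of such a wrapper is turning a completion callback into a future. The following is a standard-library analogue using `std::promise`/`std::future`; libprocess has its own `Promise`/`Future`, which this sketch does not use. `FakeAsyncChannel` and `asyncCall` are hypothetical stand-ins for a gRPC asynchronous call and its completion notification.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical callback-based channel standing in for an async RPC; it
// completes each call on a separate thread.
class FakeAsyncChannel
{
public:
  using Callback = std::function<void(bool ok, const std::string& response)>;

  ~FakeAsyncChannel()
  {
    for (std::thread& t : threads_) {
      t.join();
    }
  }

  void startCall(const std::string& request, Callback done)
  {
    threads_.emplace_back([request, done]() {
      if (request.empty()) {
        done(false, "");  // Simulate a failed RPC.
      } else {
        done(true, "echo: " + request);
      }
    });
  }

private:
  std::vector<std::thread> threads_;
};

// Adapts the callback interface to a future, so callers can wait on or
// compose the result instead of writing completion handlers.
std::future<std::string> asyncCall(FakeAsyncChannel& channel, const std::string& request)
{
  auto promise = std::make_shared<std::promise<std::string>>();
  std::future<std::string> future = promise->get_future();

  channel.startCall(request, [promise](bool ok, const std::string& response) {
    if (ok) {
      promise->set_value(response);
    } else {
      promise->set_exception(
          std::make_exception_ptr(std::runtime_error("RPC failed")));
    }
  });

  return future;
}
```

The promise is held in a `shared_ptr` so that it outlives `asyncCall` and is destroyed only after the callback has fulfilled it.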





[jira] [Created] (MESOS-7822) Adopt X509_check_host.

2017-07-21 Thread James Peach (JIRA)
James Peach created MESOS-7822:
--

 Summary: Adopt X509_check_host.
 Key: MESOS-7822
 URL: https://issues.apache.org/jira/browse/MESOS-7822
 Project: Mesos
  Issue Type: Bug
  Components: libprocess, security
Reporter: James Peach


{{libprocess}} is carrying custom hostname verification code, which uses 
deprecated OpenSSL API:

{noformat}
../../../3rdparty/libprocess/src/openssl.cpp: In function ‘Try 
process::network::openssl::verify(const SSL*, const 
Option&, const Option&)’:
../../../3rdparty/libprocess/src/openssl.cpp:677:42: warning: ‘unsigned char* 
ASN1_STRING_data(ASN1_STRING*)’ is deprecated [-Wdeprecated-declarations]
   current_name->d.dNSName));
  ^
In file included from /usr/include/openssl/opensslconf.h:42:0,
 from /usr/include/openssl/bn.h:31,
 from /usr/include/openssl/asn1.h:24,
 from /usr/include/openssl/objects.h:916,
 from /usr/include/openssl/evp.h:27,
 from /usr/include/openssl/x509.h:23,
 from /usr/include/openssl/ssl.h:50,
 from ../../../3rdparty/libprocess/src/openssl.hpp:16,
 from ../../../3rdparty/libprocess/src/openssl.cpp:13:
/usr/include/openssl/asn1.h:553:1: note: declared here
 DEPRECATEDIN_1_1_0(unsigned char *ASN1_STRING_data(ASN1_STRING *x))
 ^
{noformat}

We should replace this (optionally behind an OpenSSL version check) with a call to 
[X509_check_host|https://www.openssl.org/docs/man1.1.0/crypto/X509_check_host.html]
, which has been available since OpenSSL 1.0.2.





[jira] [Commented] (MESOS-6566) The Docker executor should not leak task env variables in the Docker command cmd line.

2017-07-21 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096869#comment-16096869
 ] 

Joseph Wu commented on MESOS-6566:
--

NOTE: We basically reversed this change in {{1.2.1}} because of MESOS-6951.

> The Docker executor should not leak task env variables in the Docker command 
> cmd line.
> --
>
> Key: MESOS-6566
> URL: https://issues.apache.org/jira/browse/MESOS-6566
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, security
>Reporter: Gastón Kleiman
>Assignee: Till Toenshoff
> Fix For: 1.2.0
>
>
> Task environment variables are sensitive, as they might contain secrets.
> The Docker executor starts tasks by executing a {{docker run}} command, and 
> it includes the env variables on the command line of the docker command, exposing 
> them to all users on the machine:
> {code}
> $ ./src/mesos-execute --command="sleep 200" --containerizer=docker 
> --docker_image=alpine --env='{"foo": "bar"}' --master=10.0.2.15:5050 
> --name=test
> $ ps aux | grep bar
> [...] docker -H unix:///var/run/docker.sock run [...] -e foo=bar [...] alpine 
> -c sleep 200
> $
> {code}
> The Docker executor could pass Docker the {{--env-file}} flag, pointing it to 
> a file with the environment variables.
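A hypothetical sketch of that approach in POSIX C++: write the variables to a file created by `mkstemp` (which creates it with mode 0600, so only the owner can read it) and pass its path via `--env-file` instead of one `-e` per variable. `writeEnvFile` and `dockerRunArgs` are illustrative names. Docker's env-file format is one `KEY=VALUE` per line, so the sketch rejects values containing newlines.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <map>
#include <stdexcept>
#include <stdlib.h>  // mkstemp (POSIX)
#include <string>
#include <unistd.h>  // write, close, unlink, access
#include <vector>

// Writes `env` as KEY=VALUE lines to a new private file in `directory`
// and returns its path. The caller should unlink it once docker has read it.
std::string writeEnvFile(const std::map<std::string, std::string>& env,
                         const std::string& directory)
{
  std::string contents;
  for (const auto& [key, value] : env) {
    if (value.find('\n') != std::string::npos) {
      throw std::invalid_argument("Value of '" + key + "' contains a newline");
    }
    contents += key + "=" + value + "\n";
  }

  std::string pattern = directory + "/docker_env_XXXXXX";
  std::vector<char> path(pattern.begin(), pattern.end());
  path.push_back('\0');

  int fd = ::mkstemp(path.data());  // Created with permissions 0600.
  if (fd < 0) {
    throw std::runtime_error(std::string("mkstemp: ") + std::strerror(errno));
  }

  const char* data = contents.data();
  size_t remaining = contents.size();
  while (remaining > 0) {
    ssize_t written = ::write(fd, data, remaining);
    if (written < 0) {
      if (errno == EINTR) {
        continue;  // Interrupted by a signal; retry.
      }
      const std::string error = std::strerror(errno);
      ::close(fd);
      ::unlink(path.data());
      throw std::runtime_error("write: " + error);
    }
    data += written;
    remaining -= static_cast<size_t>(written);
  }

  ::close(fd);
  return std::string(path.data());
}

// Builds a docker argv that references the env file, so the values no
// longer appear in `ps` output.
std::vector<std::string> dockerRunArgs(const std::string& envFile,
                                       const std::string& image,
                                       const std::vector<std::string>& command)
{
  std::vector<std::string> args = {"docker", "run", "--env-file", envFile, image};
  args.insert(args.end(), command.begin(), command.end());
  return args;
}
```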





[jira] [Created] (MESOS-7821) Resource refinement does downgrade task.executor.resources in LAUNCH_GROUP handler.

2017-07-21 Thread Jie Yu (JIRA)
Jie Yu created MESOS-7821:
-

 Summary: Resource refinement does downgrade 
task.executor.resources in LAUNCH_GROUP handler.
 Key: MESOS-7821
 URL: https://issues.apache.org/jira/browse/MESOS-7821
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Jie Yu


Looks like we need to downgrade task.executor.resources as well:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L4970-L4982






[jira] [Commented] (MESOS-5254) Add URI parsing function/library

2017-07-21 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096692#comment-16096692
 ] 

Adam B commented on MESOS-5254:
---

Reviews have been discarded. Progress has been paused for a long time.

> Add URI parsing function/library
> 
>
> Key: MESOS-5254
> URL: https://issues.apache.org/jira/browse/MESOS-5254
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher, libprocess
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The {{uri::Fetcher}} theoretically supports all URIs, per 
> [RFC3986|http://tools.ietf.org/html/rfc3986].  To do this, we need a 
> spec-compliant parser from string to URI.
> [uriparser|http://uriparser.sourceforge.net/] appears to fit the bill.
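For a sense of what a spec-compliant split involves: RFC 3986, Appendix B, gives a regular expression that breaks a URI reference into scheme, authority, path, query and fragment. It does not validate the characters inside each component, which is why a dedicated parser such as uriparser is still preferable. A sketch with `std::regex`; `UriParts` and `splitUri` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Components of a URI reference; std::nullopt means "not present", which
// RFC 3986 distinguishes from "present but empty".
struct UriParts
{
  std::optional<std::string> scheme;
  std::optional<std::string> authority;
  std::string path;  // Always present, possibly empty.
  std::optional<std::string> query;
  std::optional<std::string> fragment;
};

// Splits `uri` with the regular expression from RFC 3986, Appendix B.
// This only separates components; it does not validate them.
std::optional<UriParts> splitUri(const std::string& uri)
{
  static const std::regex pattern(
      R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");

  std::smatch match;
  if (!std::regex_match(uri, match, pattern)) {
    // Only possible when the fragment contains a line break, since `.`
    // does not match line terminators.
    return std::nullopt;
  }

  auto group = [&match](std::size_t i) -> std::optional<std::string> {
    if (!match[i].matched) {
      return std::nullopt;
    }
    return match[i].str();
  };

  UriParts parts;
  parts.scheme = group(2);
  parts.authority = group(4);
  parts.path = match[5].str();
  parts.query = group(7);
  parts.fragment = group(9);
  return parts;
}
```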





[jira] [Commented] (MESOS-7820) Ability to review and control Mesos Reservations/Roles from the Web UI

2017-07-21 Thread Jeffrey Zampieron (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096639#comment-16096639
 ] 

Jeffrey Zampieron commented on MESOS-7820:
--

In light of MESOS-6441, what I'm really getting at is the ability to make simple 
changes via the UI. 

For example, deleting a framework on DC/OS often leaves its resource role 
in ZK, and you have to go and clean it out, among other similar things. 

The larger problem is that this is opaque to the user (though it seems that's 
better in 1.4).

> Ability to review and control Mesos Reservations/Roles from the Web UI
> --
>
> Key: MESOS-7820
> URL: https://issues.apache.org/jira/browse/MESOS-7820
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Jeffrey Zampieron
>
> When managing Mesos frameworks on DC/OS (e.g., Kafka, Cassandra), it is 
> sometimes necessary to review and/or clean out various resource reservations 
> manually.
> It would be really helpful to be able to see/edit the various role 
> reservations (e.g., cassandra-role) in a dedicated section of the web UI.
> Controlling orphaned items is of key interest.
> This is a really handy operational feature for long-running installations. 
> Poking around the API is great for DevOps tooling, but when a quick fix to a 
> running system is required, a full-featured web UI is vital.





[jira] [Commented] (MESOS-7730) CUDA not working anymore on 1.3.0

2017-07-21 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096554#comment-16096554
 ] 

Kevin Klues commented on MESOS-7730:


Hmm. I'm not sure what would have changed to cause this error. I'll dig into it 
soon.

> CUDA not working anymore on 1.3.0
> -
>
> Key: MESOS-7730
> URL: https://issues.apache.org/jira/browse/MESOS-7730
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Adam Cecile
>Assignee: Kevin Klues
> Fix For: 1.2.1
>
>
> Hello,
> My Docker container that uses CUDA no longer detects it.
> Here is the TensorFlow output with 1.2.1:
> {noformat}
> I0628 12:39:45.505900 16309 exec.cpp:162] Version: 1.2.1
> I0628 12:39:45.508358 16301 exec.cpp:237] Executor registered on agent 
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcuda.so.1 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE3 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE4.1 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE4.2 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use AVX instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use AVX2 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use FMA instructions, but these are available on your 
> machine and could speed up CPU computations.
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with 
> properties: 
> name: GeForce GTX 1080
> major: 6 minor: 1 memoryClockRate (GHz) 1.7335
> pciBusID :82:00.0
> Total memory: 7.92GiB
> Free memory: 7.81GiB
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow 
> device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 
> :82:00.0)
> {noformat}
> And with 1.3.0
> {noformat}
> I0628 12:40:30.833947 16854 exec.cpp:162] Version: 1.3.0
> I0628 12:40:30.836612 16845 exec.cpp:237] Executor registered on agent 
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library 
> libcuda.so.1. LD_LIBRARY_PATH: 
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 
> zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported 
> version is: Not found: was unable to find libcuda.so DSO loaded into this 
> program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version 
> file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66  Mon 
> May  1 15:29:16 PDT 2017
> GCC version:  gcc version 4.9.2 (Debian 4.9.2-10) 
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported 
> version is: 375.66.0
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: 
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find 
> libcuda.so on this system: Failed precondition: could not dlopen DSO: 
> libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such 
> file or directory
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened 

[jira] [Assigned] (MESOS-7730) CUDA not working anymore on 1.3.0

2017-07-21 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-7730:
--

Assignee: Kevin Klues

> CUDA not working anymore on 1.3.0
> -
>
> Key: MESOS-7730
> URL: https://issues.apache.org/jira/browse/MESOS-7730
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Adam Cecile
>Assignee: Kevin Klues
> Fix For: 1.2.1
>
>
> Hello,
> My Docker container using CUDA does not detect it anymore.
> Here is the TensorFlow output with 1.2.1:
> {noformat}
> I0628 12:39:45.505900 16309 exec.cpp:162] Version: 1.2.1
> I0628 12:39:45.508358 16301 exec.cpp:237] Executor registered on agent 
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcuda.so.1 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE3 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE4.1 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use SSE4.2 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use AVX instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use AVX2 instructions, but these are available on your 
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library 
> wasn't compiled to use FMA instructions, but these are available on your 
> machine and could speed up CPU computations.
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with 
> properties: 
> name: GeForce GTX 1080
> major: 6 minor: 1 memoryClockRate (GHz) 1.7335
> pciBusID :82:00.0
> Total memory: 7.92GiB
> Free memory: 7.81GiB
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow 
> device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 
> :82:00.0)
> {noformat}
> And with 1.3.0
> {noformat}
> I0628 12:40:30.833947 16854 exec.cpp:162] Version: 1.3.0
> I0628 12:40:30.836612 16845 exec.cpp:237] Executor registered on agent 
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library 
> libcuda.so.1. LD_LIBRARY_PATH: 
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 
> zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported 
> version is: Not found: was unable to find libcuda.so DSO loaded into this 
> program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version 
> file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66  Mon 
> May  1 15:29:16 PDT 2017
> GCC version:  gcc version 4.9.2 (Debian 4.9.2-10) 
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported 
> version is: 375.66.0
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: 
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find 
> libcuda.so on this system: Failed precondition: could not dlopen DSO: 
> libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such 
> file or directory
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA 
> library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] 

[jira] [Commented] (MESOS-6101) Add Framwork events to master's operator API

2017-07-21 Thread Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096448#comment-16096448
 ] 

Quinn commented on MESOS-6101:
--

https://reviews.apache.org/r/60928/
https://reviews.apache.org/r/60929/
https://reviews.apache.org/r/60930/
https://reviews.apache.org/r/60931/

> Add Framwork events to master's operator API
> 
>
> Key: MESOS-6101
> URL: https://issues.apache.org/jira/browse/MESOS-6101
> Project: Mesos
>  Issue Type: Task
>Reporter: Zhitao Li
>Assignee: Quinn
>
> Consider the following case:
> 1) a subscriber connects to the master;
> 2) a new scheduler registers as a new framework;
> 3) a task is launched by this framework.
> In this sequence, the subscriber has no way to learn the FrameworkInfo 
> belonging to the task's FrameworkId.
> We should support an event for this (e.g. emitted when a framework's info in 
> the master is added or changed).
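To make the gap concrete, here is a minimal C++ sketch of a subscriber that caches FrameworkInfo by FrameworkId, assuming the master emits a framework added/updated event as proposed. The `Event` and `FrameworkInfo` structs, the event type names, and `FrameworkCache` are hypothetical simplifications for illustration, not the actual Mesos operator API protobufs. Without a framework event, a lookup for the framework that owns a newly added task can only return `nullptr`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the operator API messages; the real
// protobufs carry many more fields.
struct FrameworkInfo {
  std::string id;
  std::string name;
  std::string role;
};

struct Event {
  enum class Type { FRAMEWORK_ADDED, FRAMEWORK_UPDATED, TASK_ADDED };
  Type type;
  FrameworkInfo framework;      // Populated for FRAMEWORK_* events.
  std::string taskFrameworkId;  // Populated for TASK_ADDED events.
};

// Hypothetical subscriber-side cache mapping FrameworkId -> FrameworkInfo.
class FrameworkCache {
 public:
  void apply(const Event& event) {
    switch (event.type) {
      case Event::Type::FRAMEWORK_ADDED:
      case Event::Type::FRAMEWORK_UPDATED:
        // A framework event must identify the framework it describes.
        assert(!event.framework.id.empty());
        frameworks_[event.framework.id] = event.framework;
        break;
      case Event::Type::TASK_ADDED:
        // Nothing to cache; lookup() resolves the owning framework.
        break;
    }
  }

  // Returns the cached FrameworkInfo, or nullptr if no framework event for
  // this id has been received (the situation described in the issue).
  const FrameworkInfo* lookup(const std::string& frameworkId) const {
    auto it = frameworks_.find(frameworkId);
    return it == frameworks_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, FrameworkInfo> frameworks_;
};
```

Handling added and updated events in one branch keeps the cache a simple upsert; a removal event, if one existed, would erase the entry instead.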



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6101) Add event for Framwork added to master operator API

2017-07-21 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6101:
-
Sprint: Mesosphere Sprint 60

> Add event for Framwork added to master operator API
> ---
>
> Key: MESOS-6101
> URL: https://issues.apache.org/jira/browse/MESOS-6101
> Project: Mesos
>  Issue Type: Task
>Reporter: Zhitao Li
>Assignee: Quinn
>





[jira] [Updated] (MESOS-6101) Add Framwork events to master's operator API

2017-07-21 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6101:
-
Summary: Add Framwork events to master's operator API  (was: Add event for 
Framwork added to master operator API)

> Add Framwork events to master's operator API
> 
>
> Key: MESOS-6101
> URL: https://issues.apache.org/jira/browse/MESOS-6101
> Project: Mesos
>  Issue Type: Task
>Reporter: Zhitao Li
>Assignee: Quinn
>





[jira] [Commented] (MESOS-7820) Ability to review and control Mesos Reservations/Roles from the Web UI

2017-07-21 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096213#comment-16096213
 ] 

Benjamin Bannier commented on MESOS-7820:
-

Could you be more specific about what you see missing in the webui regarding 
reservations now that MESOS-6441 has landed for 1.4?

Note that the webui currently does not allow performing any actions; it is 
only a viewer.

> Ability to review and control Mesos Reservations/Roles from the Web UI
> --
>
> Key: MESOS-7820
> URL: https://issues.apache.org/jira/browse/MESOS-7820
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Jeffrey Zampieron
>
> When managing Mesos frameworks on DC/OS (e.g. Kafka, Cassandra), it is 
> sometimes necessary to review and/or clean up various resource reservations 
> manually.
> It would be really helpful to be able to see/edit the various role 
> reservations (e.g. cassandra-role) in a dedicated section of the web UI.
> Controlling orphaned items is of key interest.
> This is a really handy operational feature for long-running installations. 
> Poking around the API is great for DevOps tooling, but when a quick fix to a 
> running system is required, a full-featured web UI is vital.





[jira] [Created] (MESOS-7820) Ability to review and control Mesos Reservations/Roles from the Web UI

2017-07-21 Thread Jeffrey Zampieron (JIRA)
Jeffrey Zampieron created MESOS-7820:


 Summary: Ability to review and control Mesos Reservations/Roles 
from the Web UI
 Key: MESOS-7820
 URL: https://issues.apache.org/jira/browse/MESOS-7820
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Jeffrey Zampieron


When managing Mesos frameworks on DC/OS (e.g. Kafka, Cassandra), it is 
sometimes necessary to review and/or clean up various resource reservations 
manually.

It would be really helpful to be able to see/edit the various role reservations 
(e.g. cassandra-role) in a dedicated section of the web UI.

Controlling orphaned items is of key interest.

This is a really handy operational feature for long-running installations. 
Poking around the API is great for DevOps tooling, but when a quick fix to a 
running system is required, a full-featured web UI is vital.





[jira] [Updated] (MESOS-6831) Add metrics for `slave` libprocess' event queue

2017-07-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6831:
---
Labels: metrics monitoring  (was: monitoring)

> Add metrics for `slave` libprocess' event queue
> ---
>
> Key: MESOS-6831
> URL: https://issues.apache.org/jira/browse/MESOS-6831
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>  Labels: metrics, monitoring
>
> We have event queue metrics for the master and the allocator in 
> http://mesos.apache.org/documentation/latest/monitoring/, but we don't have 
> the event queue length for the most important libprocess actor in the agent, 
> `slave`.
> I propose we add similar metrics for this actor. At a minimum, this is useful 
> for debugging whether the Mesos agent is overloaded.
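As a rough illustration of what such a metric measures, here is a minimal C++ sketch of an actor mailbox that reports its queue length per event kind, mirroring the master's message/dispatch/HTTP request queue gauges. `InstrumentedQueue`, `EventKind`, and the `slave/event_queue_*` keys are hypothetical and do not reflect libprocess's actual implementation or any metric names the agent ultimately exposes.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical event kinds, mirroring the master's per-kind queue gauges.
enum class EventKind { MESSAGE, DISPATCH, HTTP_REQUEST };

// Hypothetical actor mailbox that can report its queue length per event kind.
class InstrumentedQueue {
 public:
  explicit InstrumentedQueue(std::string prefix) : prefix_(std::move(prefix)) {}

  void enqueue(EventKind kind) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(kind);
  }

  // Removes the oldest event; returns false if the queue was empty.
  bool dequeue() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      return false;
    }
    queue_.pop_front();
    return true;
  }

  // Computes gauge values on demand, as a metrics snapshot would read them.
  // Keys are illustrative, e.g. "slave/event_queue_messages".
  std::map<std::string, std::size_t> snapshot() const {
    const std::string messages = prefix_ + "/event_queue_messages";
    const std::string dispatches = prefix_ + "/event_queue_dispatches";
    const std::string http = prefix_ + "/event_queue_http_requests";

    std::map<std::string, std::size_t> gauges = {
        {messages, 0}, {dispatches, 0}, {http, 0}};

    std::lock_guard<std::mutex> lock(mutex_);
    for (EventKind kind : queue_) {
      switch (kind) {
        case EventKind::MESSAGE: ++gauges[messages]; break;
        case EventKind::DISPATCH: ++gauges[dispatches]; break;
        case EventKind::HTTP_REQUEST: ++gauges[http]; break;
      }
    }
    return gauges;
  }

 private:
  const std::string prefix_;
  mutable std::mutex mutex_;
  std::deque<EventKind> queue_;
};
```

Scanning the queue only when a snapshot is taken keeps `enqueue()`/`dequeue()` cheap at the cost of O(n) work per read; maintaining per-kind counters would make reads constant-time instead.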





[jira] [Commented] (MESOS-7812) Request to add artifacts ( binary ) creation steps to the mesos-ppc64le jenkins job

2017-07-21 Thread Amitkumar Ghatwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095835#comment-16095835
 ] 

Amitkumar Ghatwal commented on MESOS-7812:
--

[~vinodkone] - Hey Vinod, thanks for the update on binaries! We will await 
further input from the community on Mesos binaries for Power (ppc64le).


> Request to add artifacts ( binary ) creation steps to the mesos-ppc64le 
> jenkins job
> ---
>
> Key: MESOS-7812
> URL: https://issues.apache.org/jira/browse/MESOS-7812
> Project: Mesos
>  Issue Type: Wish
> Environment: OS - Ubuntu
> Platform - ppc64le
>Reporter: Amitkumar Ghatwal
>
> Hi All,
> Regarding the job re-enabled for ppc64le via this JIRA ticket - 
> https://issues.apache.org/jira/browse/INFRA-14367: I wanted to know if 
> it is possible to add steps to this Jenkins job so that it produces artifacts 
> such as binaries/installers (*.deb) for Mesos on Ubuntu ppc64le during the 
> job build.
> Job - https://builds.apache.org/job/Mesos-PPC64LE/
> A binary installer (*.deb) for Mesos on ppc64le would come in handy for 
> one-step installation on Power.
> [~vinodkone], could you comment if you have any information on adding 
> artifact creation to this Jenkins job?
> Regards,
> Amit


