[jira] [Commented] (MESOS-8248) Expose information about GPU assigned to a task

2018-12-04 Thread Chang Lan (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709443#comment-16709443
 ] 

Chang Lan commented on MESOS-8248:
--

Any update on this?

> Expose information about GPU assigned to a task
> ---
>
> Key: MESOS-8248
> URL: https://issues.apache.org/jira/browse/MESOS-8248
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, gpu
>Reporter: Karthik Anantha Padmanabhan
>Priority: Major
>  Labels: GPU
>
> As a framework author, I'd like information about the GPU that was assigned 
> to a task.
> `nvidia-smi`, for example, provides the GPU UUID, boardId, minor number, 
> etc. It would be useful to expose this information when a task is assigned 
> to a GPU instance.
> This will make it possible to monitor resource usage for a task on GPU which 
> is not possible when



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-9451) Libprocess endpoints can ignore required gzip compression

2018-12-04 Thread Ilya Pronin (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709355#comment-16709355
 ] 

Ilya Pronin edited comment on MESOS-9451 at 12/4/18 10:32 PM:
--

Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4], the 
{{Accept-Encoding}} header field is an advertisement that a particular encoding 
is supported by the requester. The server may still use the {{identity}} 
encoding (no encoding) unless the client forbids it with {{identity;q=0}}.

I think it's OK for libprocess to continue to apply the body length threshold 
as long as it checks that {{identity}}'s weight is not 0.
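
A minimal sketch of that check, assuming a simplified parser for the {{Accept-Encoding}} value. The helpers {{trimLower}}, {{weight}} and {{shouldGzip}} are hypothetical names for illustration and are not libprocess APIs; the threshold is passed in as a parameter rather than read from {{GZIP_MINIMUM_BODY_LENGTH}}.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: strips surrounding whitespace and lowercases.
std::string trimLower(const std::string& s)
{
  const auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  auto begin = std::find_if_not(s.begin(), s.end(), isSpace);
  auto end = std::find_if_not(s.rbegin(), s.rend(), isSpace).base();
  std::string out = begin < end ? std::string(begin, end) : std::string();
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

// Hypothetical helper: weight the client assigns to `coding` (lowercase).
// An explicit entry wins, then "*". If neither is listed, identity is
// acceptable (weight 1) and any other coding is not (weight 0).
double weight(const std::string& acceptEncoding, const std::string& coding)
{
  double star = -1.0; // Negative means "*" was not listed.
  std::istringstream elements(acceptEncoding);
  std::string element;
  while (std::getline(elements, element, ',')) {
    std::istringstream parts(element);
    std::string name, param;
    std::getline(parts, name, ';');
    name = trimLower(name);
    double q = 1.0; // Default weight when no "q=" parameter is given.
    while (std::getline(parts, param, ';')) {
      param = trimLower(param);
      if (param.compare(0, 2, "q=") == 0) {
        q = std::strtod(param.c_str() + 2, nullptr);
      }
    }
    if (name == coding) { return q; }
    if (name == "*") { star = q; }
  }
  if (star >= 0.0) { return star; }
  return coding == "identity" ? 1.0 : 0.0;
}

// Keeps the size threshold, but compresses small bodies anyway when the
// client has ruled out the identity encoding.
bool shouldGzip(
    const std::string& acceptEncoding,
    std::size_t bodyLength,
    std::size_t minimumBodyLength)
{
  if (weight(acceptEncoding, "gzip") <= 0.0) { return false; }
  if (bodyLength >= minimumBodyLength) { return true; }
  return weight(acceptEncoding, "identity") <= 0.0;
}
```

With this logic a short body is left uncompressed for {{Accept-Encoding: gzip}}, but is gzipped for {{Accept-Encoding: gzip, identity;q=0}}.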


was (Author: ipronin):
Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4] 
{{Accept-Encoding}} header field is an advertisement that a particular encoding 
is supported by the requestor. The server may still use {{identity}} encoding 
(no encoding) unless the client forbids it with {{identity;q=0}}.

I think it's OK for libprocess to continue to apply body length threshold as 
long as it checks that {{identity}}'s weight is not 0.

> Libprocess endpoints can ignore required gzip compression
> -
>
> Key: MESOS-9451
> URL: https://issues.apache.org/jira/browse/MESOS-9451
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Priority: Major
>  Labels: libprocess
>
> Currently, libprocess decides whether a response should be compressed by the 
> following conditional:
> {noformat}
> if (response.type == http::Response::BODY &&
>     response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
>     !headers.contains("Content-Encoding") &&
>     request.acceptsEncoding("gzip")) {
>   [...]
> {noformat}
> However, this implies that a request sent with the header "Accept-Encoding: 
> gzip" cannot rely on actually getting a gzipped response, e.g. when the 
> response size is below the threshold:
> {noformat}
> $ nc localhost 5050
> GET /tasks HTTP/1.1
> Accept-Encoding: gzip
> HTTP/1.1 200 OK
> Date: Tue, 04 Dec 2018 12:49:56 GMT
> Content-Type: application/json
> Content-Length: 12
> {"tasks":[]}
> {noformat}





[jira] [Commented] (MESOS-9451) Libprocess endpoints can ignore required gzip compression

2018-12-04 Thread Ilya Pronin (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709355#comment-16709355
 ] 

Ilya Pronin commented on MESOS-9451:


Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4], the 
{{Accept-Encoding}} header field is an advertisement that a particular encoding 
is supported by the requestor. The server may still use the {{identity}} 
encoding (no encoding) unless the client forbids it with {{identity;q=0}}.

I think it's OK for libprocess to continue to apply the body length threshold 
as long as it checks that {{identity}}'s weight is not 0.

> Libprocess endpoints can ignore required gzip compression
> -
>
> Key: MESOS-9451
> URL: https://issues.apache.org/jira/browse/MESOS-9451
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Priority: Major
>  Labels: libprocess
>
> Currently, libprocess decides whether a response should be compressed by the 
> following conditional:
> {noformat}
> if (response.type == http::Response::BODY &&
>     response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
>     !headers.contains("Content-Encoding") &&
>     request.acceptsEncoding("gzip")) {
>   [...]
> {noformat}
> However, this implies that a request sent with the header "Accept-Encoding: 
> gzip" cannot rely on actually getting a gzipped response, e.g. when the 
> response size is below the threshold:
> {noformat}
> $ nc localhost 5050
> GET /tasks HTTP/1.1
> Accept-Encoding: gzip
> HTTP/1.1 200 OK
> Date: Tue, 04 Dec 2018 12:49:56 GMT
> Content-Type: application/json
> Content-Length: 12
> {"tasks":[]}
> {noformat}





[jira] [Created] (MESOS-9452) Improve the performance of random sorter.

2018-12-04 Thread Meng Zhu (JIRA)
Meng Zhu created MESOS-9452:
---

 Summary: Improve the performance of random sorter.
 Key: MESOS-9452
 URL: https://issues.apache.org/jira/browse/MESOS-9452
 Project: Mesos
  Issue Type: Improvement
Reporter: Meng Zhu


The random sorter currently has a lot of unnecessary tracking logic inherited 
from the DRF sorter. Eliminating this logic could simplify the code and improve 
performance.





[jira] [Commented] (MESOS-9410) CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat is flaky.

2018-12-04 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709034#comment-16709034
 ] 

Till Toenshoff commented on MESOS-9410:
---

Saw it again:
{noformat}
16:39:10  [ RUN  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
16:39:10  ../../src/tests/containerizer/cgroups_tests.cpp:555: Failure
16:39:10  Expected: (result->get("rss").get()) > (0llu), actual: 0 vs 0
16:39:10  [  FAILED  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat (17 ms)
{noformat}

> CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat is flaky.
> 
>
> Key: MESOS-9410
> URL: https://issues.apache.org/jira/browse/MESOS-9410
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.8.0
> Environment: Ubuntu 16.04
>Reporter: Till Toenshoff
>Priority: Major
>  Labels: containerizer, flaky, flaky-test
>
> Just observed this on Ubuntu 16.04 in a private CI:
> {noformat}
> 17:06:28  [ RUN  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
> 17:06:28  ../../src/tests/containerizer/cgroups_tests.cpp:555: Failure
> 17:06:28  Expected: (result->get("rss").get()) > (0llu), actual: 0 vs 0
> 17:06:28  [  FAILED  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat (16 ms)
> {noformat}





[jira] [Assigned] (MESOS-8783) Transition pending operations to OPERATION_UNREACHABLE when an agent is removed.

2018-12-04 Thread Greg Mann (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-8783:


Assignee: Gastón Kleiman

> Transition pending operations to OPERATION_UNREACHABLE when an agent is 
> removed.
> 
>
> Key: MESOS-8783
> URL: https://issues.apache.org/jira/browse/MESOS-8783
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: foundations
> Fix For: 1.8.0
>
>
> Pending operations on an agent should be transitioned to 
> `OPERATION_UNREACHABLE` when an agent is removed.





[jira] [Assigned] (MESOS-8782) Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent gone.

2018-12-04 Thread Greg Mann (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-8782:


Assignee: Greg Mann

> Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent 
> gone.
> ---
>
> Key: MESOS-8782
> URL: https://issues.apache.org/jira/browse/MESOS-8782
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>Priority: Critical
>  Labels: foundations
> Fix For: 1.8.0
>
>
> The master should transition operations to the state 
> {{OPERATION_GONE_BY_OPERATOR}} when an agent is marked gone, sending an 
> operation status update to the frameworks that created them.
> We should also remove them from {{Master::frameworks}}.





[jira] [Commented] (MESOS-9247) MasterAPITest.EventAuthorizationFiltering is flaky

2018-12-04 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708852#comment-16708852
 ] 

Till Toenshoff commented on MESOS-9247:
---

Reducing priority as it failed only once in a long time. 

> MasterAPITest.EventAuthorizationFiltering is flaky
> --
>
> Key: MESOS-9247
> URL: https://issues.apache.org/jira/browse/MESOS-9247
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.7.0
>Reporter: Greg Mann
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: flaky, flaky-test, integration, mesosphere
> Attachments: MasterAPITest.EventAuthorizationFiltering.txt
>
>
> Saw this failure on a CentOS 6 SSL build in our internal CI. Build log 
> attached. For some reason, it seems that the initial {{TASK_ADDED}} event is 
> missed:
> {code}
> ../../src/tests/api_tests.cpp:2922
>   Expected: v1::master::Event::TASK_ADDED
>   Which is: TASK_ADDED
> To be equal to: event->get().type()
>   Which is: TASK_UPDATED
> {code}





[jira] [Commented] (MESOS-9448) Semantics of RECONCILE_OPERATIONS framework API call are incorrect

2018-12-04 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708723#comment-16708723
 ] 

Benjamin Bannier commented on MESOS-9448:
-

Thanks for the additional details, [~gkleiman].

With the current approach we expect the master's HTTP handler to have enough 
information to assemble a response immediately, i.e., without deferring to the 
agent or resource provider managers. If one wanted to, say, send different 
operation statuses for operations on (1) active but currently unsubscribed 
resource providers, and (2) removed resource providers, we would need to sync 
at least all resource providers ever active in the cluster to the master. 
Currently the master and agent communicate via {{UpdateSlaveMessage}}, which is 
about _active providers_ and their operations, but not about all providers 
(both present and past), which the master would need in order to distinguish a 
disconnected provider from a removed one. Explicitly communicating that 
information to the master seems wasteful (after all, a resource provider 
manager would have this information already) and potentially not scalable 
(e.g., to many agents with a lot of provider churn).

Currently the master sends {{OPERATION_UNKNOWN}} for any resource provider it 
has not yet seen, which is too coarse-grained for frameworks; see MESOS-9318. 
It seems that the current semantics impose a huge cost on improving that.

All this would seem much simpler in a world where a call to reconcile 
operations would trigger asynchronous operation status update events from all 
involved entities (i.e., agents and resource provider managers). Here the 
master would defer the work to the entity actually managing that state.

> Semantics of RECONCILE_OPERATIONS framework API call are incorrect
> --
>
> Key: MESOS-9448
> URL: https://issues.apache.org/jira/browse/MESOS-9448
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Reporter: Benjamin Bannier
>Priority: Major
>
> The typical pattern in the framework HTTP API is that frameworks send calls 
> to which the master responds with {{Accepted}} responses and which trigger 
> events. The only designed exception to this is the {{SUBSCRIBE}} call, to 
> which the master responds with an {{Ok}} response containing the assigned 
> framework ID. This is even codified in {{src/scheduler.cpp:646ff}},
> {code}
> if (response->code == process::http::Status::OK) {
>   // Only SUBSCRIBE call should get a "200 OK" response.
>   CHECK_EQ(Call::SUBSCRIBE, call.type());
> {code}
> Currently, the handling of {{RECONCILE_OPERATIONS}} calls does not follow 
> this pattern. Instead of sending events, the master immediately responds with 
> an {{Ok}} and a list of operations. This leads, e.g., to assertion failures 
> in the above hard check whenever one uses {{Scheduler::send}} instead of 
> {{Scheduler::call}}. One can reproduce this by modifying the existing tests 
> in {{src/operation_reconciliation_tests.cpp}},
> {code}
> mesos.send({createCallReconcileOperations(frameworkId, {operation})}); // ADD THIS.
> const Future result =
>   mesos.call({createCallReconcileOperations(frameworkId, {operation})});
> {code}
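
The response convention described in the issue can be sketched as follows. This is only an illustration with simplified stand-in types: {{CallType}}, {{kOk}}, {{kAccepted}}, {{expectedStatus}} and {{followsConvention}} are hypothetical names, not the Mesos scheduler types or the actual check in {{src/scheduler.cpp}}.

```cpp
#include <cassert>

// Simplified stand-ins for illustration; these are NOT the Mesos types.
enum class CallType { SUBSCRIBE, ACCEPT, RECONCILE_OPERATIONS };

constexpr int kOk = 200;       // HTTP "200 OK".
constexpr int kAccepted = 202; // HTTP "202 Accepted".

// Status a call is expected to receive under the convention in the issue:
// only SUBSCRIBE is answered with "200 OK"; every other call is answered
// with "202 Accepted" and its results arrive later as events.
int expectedStatus(CallType type)
{
  return type == CallType::SUBSCRIBE ? kOk : kAccepted;
}

// True if the observed status matches the convention for this call type.
bool followsConvention(CallType type, int status)
{
  return status == expectedStatus(type);
}
```

Under this sketch, a {{RECONCILE_OPERATIONS}} call answered with "200 OK" is the case that breaks the convention.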





[jira] [Created] (MESOS-9451) Libprocess endpoints can ignore required gzip compression

2018-12-04 Thread Benno Evers (JIRA)
Benno Evers created MESOS-9451:
--

 Summary: Libprocess endpoints can ignore required gzip compression
 Key: MESOS-9451
 URL: https://issues.apache.org/jira/browse/MESOS-9451
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers


Currently, libprocess decides whether a response should be compressed by the 
following conditional:
{noformat}
if (response.type == http::Response::BODY &&
    response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
    !headers.contains("Content-Encoding") &&
    request.acceptsEncoding("gzip")) {
  [...]
{noformat}

However, this implies that a request sent with the header "Accept-Encoding: 
gzip" cannot rely on actually getting a gzipped response, e.g. when the 
response size is below the threshold:
{noformat}
$ nc localhost 5050
GET /tasks HTTP/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Date: Tue, 04 Dec 2018 12:49:56 GMT
Content-Type: application/json
Content-Length: 12

{"tasks":[]}
{noformat}





[jira] [Commented] (MESOS-9157) cannot pull docker image from dockerhub

2018-12-04 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708647#comment-16708647
 ] 

Andrei Budnik commented on MESOS-9157:
--

[~MichaelBowie], feel free to reach out to me directly via 
[https://mesos.slack.com/] if you need any help with this ticket.

> cannot pull docker image from dockerhub
> ---
>
> Key: MESOS-9157
> URL: https://issues.apache.org/jira/browse/MESOS-9157
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.6.1
>Reporter: Michael Bowie
>Priority: Blocker
>  Labels: containerization
>
> I am not able to pull Docker images from Docker Hub through Marathon/Mesos. 
> I get one of two errors:
>  * `Aug 15 10:11:02 michael-b-dcos-agent-1 dockerd[5974]: 
> time="2018-08-15T10:11:02.770309104-04:00" level=error msg="Not continuing 
> with pull after error: context canceled"`
>  * `Failed to run docker -H ... Error: No such object: 
> mesos-d2f333a8-fef2-48fb-8b99-28c52c327790`
> However, I can manually ssh into one of the agents and successfully pull the 
> image from the command line. 
> Any pointers in the right direction?
> Thank you!
> Similar Issues:
> https://github.com/mesosphere/marathon/issues/3869


