[jira] [Commented] (MESOS-8248) Expose information about GPU assigned to a task
[ https://issues.apache.org/jira/browse/MESOS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709443#comment-16709443 ]

Chang Lan commented on MESOS-8248:
--

Any update on this?

> Expose information about GPU assigned to a task
> ---
>
> Key: MESOS-8248
> URL: https://issues.apache.org/jira/browse/MESOS-8248
> Project: Mesos
> Issue Type: Improvement
> Components: containerization, gpu
> Reporter: Karthik Anantha Padmanabhan
> Priority: Major
> Labels: GPU
>
> As a framework author, I'd like information about the GPU that was assigned to
> a task.
> `nvidia-smi`, for example, provides the following information: GPU UUID, boardId,
> minor number, etc. It would be useful to expose this information when a task is
> assigned to a GPU instance.
> This will make it possible to monitor resource usage for a task on GPU which
> is not possible when

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
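The ticket names identifiers that `nvidia-smi` reports (GPU UUID, board ID, minor number). As a hedged illustration of collecting them, the sketch below parses text in the `Key : Value` layout printed by `nvidia-smi -q`. The function names are hypothetical, not part of Mesos, and the exact field labels are an assumption that may vary across driver versions.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trims spaces, tabs, and carriage returns from both ends of `s`.
static std::string trim(const std::string& s) {
  const auto begin = s.find_first_not_of(" \t\r");
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(" \t\r");
  return s.substr(begin, end - begin + 1);
}

// Hypothetical helper (not part of Mesos): collects "Key : Value" pairs from
// text laid out like `nvidia-smi -q` output. Lines are split at the first
// colon, so values such as PCI bus IDs keep their own colons. When a key
// repeats (nested sections can reuse labels), the first occurrence wins.
std::map<std::string, std::string> parseGpuQuery(const std::string& text) {
  std::map<std::string, std::string> fields;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;  // banners, section titles
    const std::string key = trim(line.substr(0, colon));
    const std::string value = trim(line.substr(colon + 1));
    if (!key.empty() && !value.empty()) {
      fields.emplace(key, value);  // emplace never overwrites an existing key
    }
  }
  return fields;
}
```

For example, `parseGpuQuery(output).at("GPU UUID")` would return the UUID for the queried device, which an agent could attach to the task's metadata under whatever schema the ticket eventually settles on.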
[jira] [Comment Edited] (MESOS-9451) Libprocess endpoints can ignore required gzip compression
[ https://issues.apache.org/jira/browse/MESOS-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709355#comment-16709355 ]

Ilya Pronin edited comment on MESOS-9451 at 12/4/18 10:32 PM:
--

Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4], the {{Accept-Encoding}} header field is an advertisement that a particular encoding is supported by the requester. The server may still use {{identity}} encoding (no encoding) unless the client forbids it with {{identity;q=0}}. I think it's OK for libprocess to continue to apply the body length threshold as long as it checks that {{identity}}'s weight is not 0.

was (Author: ipronin):
Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4], the {{Accept-Encoding}} header field is an advertisement that a particular encoding is supported by the requestor. The server may still use {{identity}} encoding (no encoding) unless the client forbids it with {{identity;q=0}}. I think it's OK for libprocess to continue to apply the body length threshold as long as it checks that {{identity}}'s weight is not 0.

> Libprocess endpoints can ignore required gzip compression
> -
>
> Key: MESOS-9451
> URL: https://issues.apache.org/jira/browse/MESOS-9451
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Priority: Major
> Labels: libprocess
>
> Currently, libprocess decides whether a response should be compressed using the
> following conditional:
> {noformat}
> if (response.type == http::Response::BODY &&
>     response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
>     !headers.contains("Content-Encoding") &&
>     request.acceptsEncoding("gzip")) {
>   [...]
> {noformat}
> However, this implies that a request sent with the header "Accept-Encoding:
> gzip" cannot rely on actually getting a gzipped response, e.g. when the
> response size is below the threshold:
> {noformat}
> $ nc localhost 5050
> GET /tasks HTTP/1.1
> Accept-Encoding: gzip
>
> HTTP/1.1 200 OK
> Date: Tue, 04 Dec 2018 12:49:56 GMT
> Content-Type: application/json
> Content-Length: 12
>
> {"tasks":[]}
> {noformat}
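The comment proposes keeping the size threshold while honoring an explicit refusal of {{identity}}. A minimal sketch of that rule, assuming a simplified {{Accept-Encoding}} parser: only the `q` parameter is read, malformed weights are not rejected, and an empty header is treated as not advertising gzip. The names `qValue`, `shouldGzip`, and `minimumLength` are hypothetical; `minimumLength` stands in for `GZIP_MINIMUM_BODY_LENGTH`, and the response-type and `Content-Encoding` checks of the original conditional are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

// Trims surrounding whitespace and lower-cases (codings are case-insensitive).
static std::string normalize(const std::string& s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t");
  std::string out = s.substr(b, e - b + 1);
  for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return out;
}

// Returns the weight the client gave `coding`, else the weight of "*",
// else std::nullopt. An entry without a q parameter has weight 1.
std::optional<double> qValue(const std::string& acceptEncoding, const std::string& coding) {
  std::optional<double> exact, wildcard;
  std::istringstream in(acceptEncoding);
  std::string item;
  while (std::getline(in, item, ',')) {
    const auto semi = item.find(';');
    const std::string name = normalize(item.substr(0, semi));
    double q = 1.0;
    if (semi != std::string::npos) {
      const std::string param = normalize(item.substr(semi + 1));
      if (param.rfind("q=", 0) == 0) q = std::strtod(param.c_str() + 2, nullptr);
    }
    if (name == coding) exact = q;
    else if (name == "*") wildcard = q;
  }
  return exact ? exact : wildcard;
}

// Compress when gzip is acceptable and either the body reaches the threshold
// or the client refused identity (identity;q=0, or *;q=0 with no identity entry).
bool shouldGzip(const std::string& acceptEncoding, std::size_t bodyLength, std::size_t minimumLength) {
  const auto gzip = qValue(acceptEncoding, "gzip");
  if (!gzip || *gzip <= 0.0) return false;
  const auto identity = qValue(acceptEncoding, "identity");
  const bool identityAllowed = !identity || *identity > 0.0;
  return bodyLength >= minimumLength || !identityAllowed;
}
```

With this rule, the 12-byte `/tasks` response from the ticket would stay uncompressed for `Accept-Encoding: gzip` (which RFC 7231 permits) but would be gzipped for `Accept-Encoding: gzip, identity;q=0`.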
[jira] [Commented] (MESOS-9451) Libprocess endpoints can ignore required gzip compression
[ https://issues.apache.org/jira/browse/MESOS-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709355#comment-16709355 ]

Ilya Pronin commented on MESOS-9451:


Per [RFC 7231|https://tools.ietf.org/html/rfc7231#section-5.3.4], the {{Accept-Encoding}} header field is an advertisement that a particular encoding is supported by the requestor. The server may still use {{identity}} encoding (no encoding) unless the client forbids it with {{identity;q=0}}. I think it's OK for libprocess to continue to apply the body length threshold as long as it checks that {{identity}}'s weight is not 0.

> Libprocess endpoints can ignore required gzip compression
> -
>
> Key: MESOS-9451
> URL: https://issues.apache.org/jira/browse/MESOS-9451
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Priority: Major
> Labels: libprocess
>
> Currently, libprocess decides whether a response should be compressed using the
> following conditional:
> {noformat}
> if (response.type == http::Response::BODY &&
>     response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
>     !headers.contains("Content-Encoding") &&
>     request.acceptsEncoding("gzip")) {
>   [...]
> {noformat}
> However, this implies that a request sent with the header "Accept-Encoding:
> gzip" cannot rely on actually getting a gzipped response, e.g. when the
> response size is below the threshold:
> {noformat}
> $ nc localhost 5050
> GET /tasks HTTP/1.1
> Accept-Encoding: gzip
>
> HTTP/1.1 200 OK
> Date: Tue, 04 Dec 2018 12:49:56 GMT
> Content-Type: application/json
> Content-Length: 12
>
> {"tasks":[]}
> {noformat}
[jira] [Created] (MESOS-9452) Improve the performance of random sorter.
Meng Zhu created MESOS-9452:
---

Summary: Improve the performance of random sorter.
Key: MESOS-9452
URL: https://issues.apache.org/jira/browse/MESOS-9452
Project: Mesos
Issue Type: Improvement
Reporter: Meng Zhu

The random sorter currently has a lot of unnecessary tracking logic inherited from the DRF sorter. Eliminating this logic could simplify the sorter and improve its performance.
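The ticket gives no design, but the underlying point can be illustrated: a random sorter orders clients by shuffling rather than by comparing dominant shares, so in principle it only needs the set of active clients. The class below is a hypothetical sketch of that minimal state. It ignores client weights, role hierarchies, and allocation bookkeeping that the real Mesos sorters handle, and it is not the interface of the Mesos `RandomSorter`.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: keeps only the set of active clients, with no
// per-client allocation or share tracking.
class RandomSorterSketch {
 public:
  explicit RandomSorterSketch(unsigned seed) : generator_(seed) {}

  void add(const std::string& client) { clients_.insert(client); }
  void remove(const std::string& client) { clients_.erase(client); }

  // Returns the active clients in a uniformly random order.
  std::vector<std::string> sort() {
    std::vector<std::string> order(clients_.begin(), clients_.end());
    std::shuffle(order.begin(), order.end(), generator_);
    return order;
  }

 private:
  std::set<std::string> clients_;
  std::mt19937 generator_;
};
```

Seeding the generator explicitly keeps the order reproducible in tests; a production sorter would more likely seed from `std::random_device`.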
[jira] [Commented] (MESOS-9410) CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat is flaky.
[ https://issues.apache.org/jira/browse/MESOS-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709034#comment-16709034 ]

Till Toenshoff commented on MESOS-9410:
---

Saw it again:
{noformat}
16:39:10 [ RUN ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
16:39:10 ../../src/tests/containerizer/cgroups_tests.cpp:555: Failure
16:39:10 Expected: (result->get("rss").get()) > (0llu), actual: 0 vs 0
16:39:10 [ FAILED ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat (17 ms)
{noformat}

> CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat is flaky.
>
> Key: MESOS-9410
> URL: https://issues.apache.org/jira/browse/MESOS-9410
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.8.0
> Environment: Ubuntu 16.04
> Reporter: Till Toenshoff
> Priority: Major
> Labels: containerizer, flaky, flaky-test
>
> Just observed this on Ubuntu 16.04 in a private CI:
> {noformat}
> 17:06:28 [ RUN ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
> 17:06:28 ../../src/tests/containerizer/cgroups_tests.cpp:555: Failure
> 17:06:28 Expected: (result->get("rss").get()) > (0llu), actual: 0 vs 0
> 17:06:28 [ FAILED ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat (16 ms)
> {noformat}
[jira] [Assigned] (MESOS-8783) Transition pending operations to OPERATION_UNREACHABLE when an agent is removed.
[ https://issues.apache.org/jira/browse/MESOS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann reassigned MESOS-8783:

Assignee: Gastón Kleiman

> Transition pending operations to OPERATION_UNREACHABLE when an agent is
> removed.
>
> Key: MESOS-8783
> URL: https://issues.apache.org/jira/browse/MESOS-8783
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Priority: Critical
> Labels: foundations
> Fix For: 1.8.0
>
> Pending operations on an agent should be transitioned to
> `OPERATION_UNREACHABLE` when an agent is removed.
[jira] [Assigned] (MESOS-8782) Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent gone.
[ https://issues.apache.org/jira/browse/MESOS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann reassigned MESOS-8782:

Assignee: Greg Mann

> Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent
> gone.
> ---
>
> Key: MESOS-8782
> URL: https://issues.apache.org/jira/browse/MESOS-8782
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Gastón Kleiman
> Assignee: Greg Mann
> Priority: Critical
> Labels: foundations
> Fix For: 1.8.0
>
> The master should transition operations to the state
> {{OPERATION_GONE_BY_OPERATOR}} when an agent is marked gone, sending an
> operation status update to the frameworks that created them.
> We should also remove them from {{Master::frameworks}}.
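MESOS-8783 and MESOS-8782 describe two related transitions on agent loss. A hypothetical, heavily simplified model of them: when an agent is removed, its pending operations become unreachable; when it is marked gone, they become gone-by-operator and the master stops tracking them (clearing the map stands in for removing them from {{Master::frameworks}}). Each transition yields a status update for the owning framework. The types and `onAgentLost` are illustrative assumptions, not Mesos master code, and treating only PENDING as non-terminal is a simplification.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical operation states; not the Mesos protobuf enum.
enum class OperationState { PENDING, FINISHED, UNREACHABLE, GONE_BY_OPERATOR };

struct Operation {
  std::string id;
  std::string frameworkId;
  OperationState state;
};

struct StatusUpdate {
  std::string frameworkId;
  std::string operationId;
  OperationState state;
};

// Transitions the pending operations of a lost agent and returns the status
// updates to forward to the frameworks that created them.
// markedGone == false: agent removed (MESOS-8783), pending -> UNREACHABLE.
// markedGone == true:  agent marked gone (MESOS-8782), pending -> GONE_BY_OPERATOR,
//                      and all of the agent's operations are forgotten.
std::vector<StatusUpdate> onAgentLost(std::map<std::string, Operation>& operations, bool markedGone) {
  const OperationState target =
      markedGone ? OperationState::GONE_BY_OPERATOR : OperationState::UNREACHABLE;
  std::vector<StatusUpdate> updates;
  for (auto& [id, op] : operations) {
    if (op.state == OperationState::PENDING) {
      op.state = target;
      updates.push_back({op.frameworkId, id, target});
    }
  }
  if (markedGone) operations.clear();  // stop tracking the gone agent's operations
  return updates;
}
```

Updates are returned by value before the map is cleared, so they remain valid for delivery after the operations are forgotten.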
[jira] [Commented] (MESOS-9247) MasterAPITest.EventAuthorizationFiltering is flaky
[ https://issues.apache.org/jira/browse/MESOS-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708852#comment-16708852 ]

Till Toenshoff commented on MESOS-9247:
---

Reducing priority, as it has failed only once in a long time.

> MasterAPITest.EventAuthorizationFiltering is flaky
> --
>
> Key: MESOS-9247
> URL: https://issues.apache.org/jira/browse/MESOS-9247
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.7.0
> Reporter: Greg Mann
> Assignee: Till Toenshoff
> Priority: Major
> Labels: flaky, flaky-test, integration, mesosphere
> Attachments: MasterAPITest.EventAuthorizationFiltering.txt
>
> Saw this failure on a CentOS 6 SSL build in our internal CI. Build log
> attached. For some reason, it seems that the initial {{TASK_ADDED}} event is
> missed:
> {code}
> ../../src/tests/api_tests.cpp:2922
>       Expected: v1::master::Event::TASK_ADDED
>       Which is: TASK_ADDED
> To be equal to: event->get().type()
>       Which is: TASK_UPDATED
> {code}
[jira] [Commented] (MESOS-9448) Semantics of RECONCILE_OPERATIONS framework API call are incorrect
[ https://issues.apache.org/jira/browse/MESOS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708723#comment-16708723 ]

Benjamin Bannier commented on MESOS-9448:
-

Thanks for the additional details, [~gkleiman].

With the current approach we expect the master's HTTP handler to have enough information to assemble a response immediately, i.e., without deferring to the agent or resource provider managers. If one wanted to, say, send different operation statuses for operations on (1) active but currently unsubscribed resource providers and on (2) removed resource providers, we would need to sync at least all resource providers ever active in the cluster to the master. Currently the master and agent communicate via {{UpdateSlaveMessage}}, which is about _active providers_ and their operations, but not about all providers (both present and past), which the master would need in order to distinguish a disconnected provider from a removed one. Explicitly communicating that information to the master seems wasteful (after all, a resource provider manager would already have this information) and potentially not scalable (e.g., to many agents with a lot of provider churn).

Currently the master sends {{OPERATION_UNKNOWN}} for any resource provider it has not yet seen, which is too coarse-grained for frameworks; see MESOS-9318. It seems that the current semantics impose a huge cost on improving that.

All this would seem much simpler in a world where a call to reconcile operations would trigger asynchronous operation status update events from all involved entities (i.e., agents and resource provider managers). Here the master would defer the work to the entity actually managing that state.
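The event-based alternative in the last paragraph can be sketched as a fan-out: the master answers the call with 202 Accepted and asks whichever entity owns each operation to report its state as an event. Everything here is hypothetical (`OperationOwner`, `reconcileOperations`, string-typed states), and events are delivered through a synchronous callback for simplicity, whereas a real implementation would send them asynchronously.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical event carrying one operation's reconciled state.
struct OperationStatusEvent {
  std::string operationId;
  std::string state;
};

using EventSink = std::function<void(const OperationStatusEvent&)>;

// Stand-in for an agent or resource provider manager: it owns the
// authoritative state of its operations and reports it on request.
class OperationOwner {
 public:
  void track(const std::string& id, const std::string& state) { states_[id] = state; }

  void reconcile(const std::string& id, const EventSink& sink) const {
    const auto it = states_.find(id);
    sink({id, it != states_.end() ? it->second : std::string("OPERATION_UNKNOWN")});
  }

 private:
  std::map<std::string, std::string> states_;
};

// Master side: route each requested operation to its owner and return
// 202 Accepted; results arrive as events, not in the response body.
int reconcileOperations(const std::map<std::string, const OperationOwner*>& ownerOf,
                        const std::vector<std::string>& operationIds,
                        const EventSink& sink) {
  for (const auto& id : operationIds) {
    const auto it = ownerOf.find(id);
    if (it == ownerOf.end()) {
      sink({id, "OPERATION_UNKNOWN"});  // no owner known to the master
    } else {
      it->second->reconcile(id, sink);
    }
  }
  return 202;
}
```

The design choice this illustrates is that the master no longer needs a complete history of resource providers: the owner, which already has that information, decides what state to report.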
> Semantics of RECONCILE_OPERATIONS framework API call are incorrect
> --
>
> Key: MESOS-9448
> URL: https://issues.apache.org/jira/browse/MESOS-9448
> Project: Mesos
> Issue Type: Bug
> Components: framework, HTTP API, master
> Reporter: Benjamin Bannier
> Priority: Major
>
> The typical pattern in the framework HTTP API is that frameworks send calls
> to which the master responds with {{Accepted}} responses and which trigger
> events. The only designed exception is the {{SUBSCRIBE}} call, to which
> the master responds with an {{Ok}} response containing the assigned framework
> ID. This is even codified in {{src/scheduler.cpp:646ff}}:
> {code}
> if (response->code == process::http::Status::OK) {
>   // Only SUBSCRIBE call should get a "200 OK" response.
>   CHECK_EQ(Call::SUBSCRIBE, call.type());
> {code}
> Currently, the handling of {{RECONCILE_OPERATIONS}} calls does not follow
> this pattern. Instead of sending events, the master immediately responds with
> an {{Ok}} and a list of operations. This, e.g., leads to assertion failures in
> the above hard check whenever one uses {{Scheduler::send}} instead of
> {{Scheduler::call}}. One can reproduce this by modifying the existing tests
> in {{src/operation_reconciliation_tests.cpp}}:
> {code}
> mesos.send({createCallReconcileOperations(frameworkId, {operation})}); // ADD THIS.
> const Future result =
>     mesos.call({createCallReconcileOperations(frameworkId, {operation})});
> {code}
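The convention that the `CHECK_EQ` quoted in the ticket enforces can be stated as a small predicate. This is a hypothetical mirror for illustration only: the `CallType` enum lists just a few call types, the constants are plain HTTP status codes, and none of it is code from `src/scheduler.cpp`.

```cpp
#include <cassert>

// Hypothetical subset of framework API call types.
enum class CallType { SUBSCRIBE, ACCEPT, RECONCILE_OPERATIONS, TEARDOWN };

constexpr int kOk = 200;
constexpr int kAccepted = 202;

// Only SUBSCRIBE is answered with "200 OK"; every other call is expected to
// get "202 Accepted", with any results delivered later as events.
constexpr int expectedStatus(CallType type) {
  return type == CallType::SUBSCRIBE ? kOk : kAccepted;
}

// True when a response code respects that convention.
constexpr bool followsConvention(CallType type, int code) {
  return code == expectedStatus(type);
}

// The ticket's situation: RECONCILE_OPERATIONS currently answers with 200.
static_assert(!followsConvention(CallType::RECONCILE_OPERATIONS, kOk),
              "a 200 response to RECONCILE_OPERATIONS breaks the convention");
```

Because the predicates are `constexpr`, the violation described in the ticket can be checked at compile time with `static_assert`.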
[jira] [Created] (MESOS-9451) Libprocess endpoints can ignore required gzip compression
Benno Evers created MESOS-9451:
--

Summary: Libprocess endpoints can ignore required gzip compression
Key: MESOS-9451
URL: https://issues.apache.org/jira/browse/MESOS-9451
Project: Mesos
Issue Type: Bug
Reporter: Benno Evers

Currently, libprocess decides whether a response should be compressed using the following conditional:
{noformat}
if (response.type == http::Response::BODY &&
    response.body.length() >= GZIP_MINIMUM_BODY_LENGTH &&
    !headers.contains("Content-Encoding") &&
    request.acceptsEncoding("gzip")) {
  [...]
{noformat}
However, this implies that a request sent with the header "Accept-Encoding: gzip" cannot rely on actually getting a gzipped response, e.g. when the response size is below the threshold:
{noformat}
$ nc localhost 5050
GET /tasks HTTP/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Date: Tue, 04 Dec 2018 12:49:56 GMT
Content-Type: application/json
Content-Length: 12

{"tasks":[]}
{noformat}
[jira] [Commented] (MESOS-9157) cannot pull docker image from dockerhub
[ https://issues.apache.org/jira/browse/MESOS-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708647#comment-16708647 ]

Andrei Budnik commented on MESOS-9157:
--

[~MichaelBowie], feel free to reach out to me directly via [https://mesos.slack.com/] if you need any help with this ticket.

> cannot pull docker image from dockerhub
> ---
>
> Key: MESOS-9157
> URL: https://issues.apache.org/jira/browse/MESOS-9157
> Project: Mesos
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.6.1
> Reporter: Michael Bowie
> Priority: Blocker
> Labels: containerization
>
> I am not able to pull Docker images from Docker Hub through Marathon/Mesos.
> I get one of two errors:
> * `Aug 15 10:11:02 michael-b-dcos-agent-1 dockerd[5974]:
> time="2018-08-15T10:11:02.770309104-04:00" level=error msg="Not continuing
> with pull after error: context canceled"`
> * `Failed to run docker -H ... Error: No such object:
> mesos-d2f333a8-fef2-48fb-8b99-28c52c327790`
> However, I can manually ssh into one of the agents and successfully pull the
> image from the command line.
> Any pointers in the right direction?
> Thank you!
> Similar issues:
> https://github.com/mesosphere/marathon/issues/3869