[jira] [Comment Edited] (MESOS-8241) Add metrics for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765518#comment-16765518 ] Greg Mann edited comment on MESOS-8241 at 2/12/19 6:50 AM: --- This should include: * operation in each operation state ** counters for terminal states, showing total number of operations which the master has observed transition to that state *** note that for the OPERATION_ERROR case, there will be operations dropped by the master, e.g. because they are invalidated, which do not request feedback and thus do not have any OPERATION_ERROR updates associated with them. we should probably still increment the corresponding {{operation_error}} metric in these cases * gauges for non-terminal states, representing the number of operations currently in that state in the system. note that we should use the newer {{PushGauge}} for all of these. was (Author: greggomann): This should include: * operation in each operation state ** counters for terminal states, showing total number of operations which the master has observed transition to that state *** note that for the OPERATION_ERROR case, there will be operations dropped by the master, e.g. because they are invalidated, which do not request feedback and thus do not have any OPERATION_ERROR updates associated with them. we should probably still increment the corresponding {{operation_error}} metric in these cases ** gauges for non-terminal states, representing the number of operations currently in that state in the system. note that we should use the newer {{PushGauge}} for all of these. > Add metrics for offer operation feedback > > > Key: MESOS-8241 > URL: https://issues.apache.org/jira/browse/MESOS-8241 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Priority: Major > Labels: foundations, mesosphere, operation-feedback > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
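The counter/gauge split proposed in the comment can be sketched as follows. This is only an illustration in Python, not the master's metrics code: the `OperationMetrics` class, its method names, and the reduced set of states are assumptions made for the example. Gauges are updated at each transition (push-style) instead of being recomputed when polled.

```python
from collections import Counter

# Illustrative subset of operation states; the authoritative list is in Mesos.
TERMINAL = {"OPERATION_FINISHED", "OPERATION_FAILED", "OPERATION_ERROR"}
NON_TERMINAL = {"OPERATION_PENDING"}


class OperationMetrics:
    """Hypothetical bookkeeping: counters only grow, gauges track current load."""

    def __init__(self):
        self.counters = Counter()  # terminal state -> total transitions observed
        self.gauges = Counter()    # non-terminal state -> operations currently in it

    def transition(self, old_state, new_state):
        # Leaving a non-terminal state lowers its gauge.
        if old_state in NON_TERMINAL:
            self.gauges[old_state] -= 1
        if new_state in NON_TERMINAL:
            self.gauges[new_state] += 1
        elif new_state in TERMINAL:
            self.counters[new_state] += 1

    def dropped_without_feedback(self):
        # An operation invalidated by the master has no status update,
        # but is still counted as an error, as the comment suggests.
        self.counters["OPERATION_ERROR"] += 1
```

Passing `None` as `old_state` models a newly created operation.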
[jira] [Comment Edited] (MESOS-8241) Add metrics for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765518#comment-16765518 ] Greg Mann edited comment on MESOS-8241 at 2/12/19 6:50 AM: --- This should include: * operation in each operation state ** counters for terminal states, showing total number of operations which the master has observed transition to that state *** note that for the OPERATION_ERROR case, there will be operations dropped by the master, e.g. because they are invalidated, which do not request feedback and thus do not have any OPERATION_ERROR updates associated with them. we should probably still increment the corresponding {{operation_error}} metric in these cases ** gauges for non-terminal states, representing the number of operations currently in that state in the system. note that we should use the newer {{PushGauge}} for all of these. was (Author: greggomann): This should include metrics for dropped operations and operation status updates. > Add metrics for offer operation feedback > > > Key: MESOS-8241 > URL: https://issues.apache.org/jira/browse/MESOS-8241 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Priority: Major > Labels: foundations, mesosphere, operation-feedback > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (MESOS-8241) Add metrics for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765518#comment-16765518 ] Greg Mann edited comment on MESOS-8241 at 2/12/19 6:38 AM: --- This should include metrics for dropped operations and operation status updates. was (Author: greggomann): This should include metrics for dropped operations. > Add metrics for offer operation feedback > > > Key: MESOS-8241 > URL: https://issues.apache.org/jira/browse/MESOS-8241 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Priority: Major > Labels: foundations, mesosphere, operation-feedback > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9473) Add end to end tests for operations on agent default resources.
[ https://issues.apache.org/jira/browse/MESOS-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765703#comment-16765703 ] Greg Mann commented on MESOS-9473: -- {code} commit 02e4477187173fcae86ca6cb8db002f0f90fcf5f Author: Gastón Kleiman Date: Fri Feb 8 18:29:44 2019 -0800 Added tests for feedback for operations on agent default resources. Review: https://reviews.apache.org/r/69910/ {code} {code} commit fa6ea019109e74a23576c3f736fda7d4faa16bc2 Author: Gastón Kleiman Date: Fri Feb 8 18:29:46 2019 -0800 Removed `MasterAPITest.OperationFeedbackOnAgentDefaultResources`. This patch removes a `MasterAPITest`, because the new test suite `AgentOperationFeedbackTest` already covers the scenario from the original test. Review: https://reviews.apache.org/r/69920/ {code} {code} commit 156da38ef63bf1815b59abd20a719156f4b1fc6d Author: Gastón Kleiman Date: Fri Feb 8 18:29:55 2019 -0800 Added missing periods at the end of comments. Review: https://reviews.apache.org/r/69919/ {code} {code} commit 34b0adc83b5eef4a7b2bb125203e5efa56497121 Author: Gastón Kleiman Date: Fri Feb 8 18:29:56 2019 -0800 Added tests for reconciliation of operations on agent default resources. Review: https://reviews.apache.org/r/69911/ {code} > Add end to end tests for operations on agent default resources. > --- > > Key: MESOS-9473 > URL: https://issues.apache.org/jira/browse/MESOS-9473 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman >Priority: Major > Labels: foundations, mesosphere, operation-feedback > > Making note of particular cases we need to test: > * Verify that frameworks will receive OPERATION_GONE_BY_OPERATOR for > operations on agent default resources when an agent is marked gone > * Verify that frameworks will receive OPERATION_GONE_BY_OPERATOR when they > reconcile operations on agents which have been marked gone -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-9565) Unit tests for destroying persistent volumes in SLRP.
Chun-Hung Hsiao created MESOS-9565:
--
Summary: Unit tests for destroying persistent volumes in SLRP.
Key: MESOS-9565
URL: https://issues.apache.org/jira/browse/MESOS-9565
Project: Mesos
Issue Type: Task
Components: test
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao

The plan is to add/update the following unit tests to test persistent volume destroy:
* CreateDestroyDisk
* CreateDestroyDiskWithRecovery
* CreateDestroyPersistentMountVolume
* CreateDestroyPersistentMountVolumeWithRecovery
* CreateDestroyPersistentMountVolumeWithReboot
* CreateDestroyPersistentBlockVolume
* DestroyPersistentMountVolumeFailed
* DestroyUnpublishedPersistentVolume
* DestroyUnpublishedPersistentVolumeWithRecovery
* DestroyUnpublishedPersistentVolumeWithReboot
* RecoverPublishedPersistentVolumeFailed

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9562) Authorization for DESTROY and UNRESERVE is not symmetrical.
[ https://issues.apache.org/jira/browse/MESOS-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765693#comment-16765693 ] Chun-Hung Hsiao commented on MESOS-9562:

For {{UNRESERVE}}, we currently support the following two use cases:
1. If all resources the {{UNRESERVE}} operation applies to have reservation principals, there will be one authorization request for each resource.
2. If none of the resources has any principal, there will be one single authorization request to verify if the subject is authorized to perform an {{UNRESERVE}} operation.

Equivalently, if a subject is authorized to do {{UNRESERVE}} on any reservation with a principal, Mesos would implicitly assume that the subject has the right to do {{UNRESERVE}} on a reservation without a principal as well. We should either document this, or issue a request per resource, with or without a principal. Since we're deprecating the {{value}} field in favor of the {{resource}} field, it seems to me that we should issue a request for each resource, regardless of whether it is reserved by a principal.

For {{DESTROY}}, it seems to me that setting a default empty string is undocumented behavior, and having a magic string (the empty string) doesn't sound like a good idea in an API.

> Authorization for DESTROY and UNRESERVE is not symmetrical. > --- > > Key: MESOS-9562 > URL: https://issues.apache.org/jira/browse/MESOS-9562 > Project: Mesos > Issue Type: Improvement > Components: master, scheduler api >Affects Versions: 1.7.1 >Reporter: Alexander Rukletsov >Priority: Major > Labels: integration, mesosphere, tech-debt > > For [the {{UNRESERVE}} > case|https://github.com/apache/mesos/blob/5d3ed364c6d1307d88e6b950ae0eef423c426673/src/master/master.cpp#L3661-L3677], > if the principal was not set, {{.has_principal()}} will be {{false}}, hence > we will not call {{authorizations.push_back()}}, and hence we will not create > an authz request with this resource as an object. 
For [the {{DESTROY}} > case|https://github.com/apache/mesos/blob/5d3ed364c6d1307d88e6b950ae0eef423c426673/src/master/master.cpp#L3772-L3773], > if the principal was not set, a default value {{""}} for string will be used > and hence we will create an authz request with this resource as an object. > We definitely need to make the behaviour consistent. I'm not sure which > approach is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
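The asymmetry discussed in this thread can be modeled with a small sketch. It is written in Python and only mirrors the behavior as described in the issue and the comment; it is not the code in `master.cpp`. The function names and the dictionary representation of a resource are assumptions made for illustration.

```python
def unreserve_requests_current(resources):
    """Model of the described UNRESERVE behavior."""
    principals = [r["principal"] for r in resources if r.get("principal") is not None]
    if principals:
        # Only resources that carry a principal produce a request.
        return [("UNRESERVE", p) for p in principals]
    # No resource has a principal: a single request without an object.
    return [("UNRESERVE", None)]


def destroy_requests_current(resources):
    """Model of the described DESTROY behavior."""
    # An unset principal falls back to the empty string.
    return [("DESTROY", r.get("principal") or "") for r in resources]


def requests_per_resource(action, resources):
    """Suggested alternative: one request per resource, principal or not."""
    return [(action, r.get("principal")) for r in resources]
```

With one resource reserved by `alice` and one without a principal, the first function yields one request, the second yields two (one with the empty string), and the third yields two with an explicit `None` for the missing principal.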
[jira] [Commented] (MESOS-9544) SLRP does not clean up destroyed persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765696#comment-16765696 ] Chun-Hung Hsiao commented on MESOS-9544: I've paused the development of the {{RecoverPublishedVolumeFailed}} test. I have some WIP patches and plan to integrate them into MESOS-8745, so that we can use the {{GET_RESOURCE_PROVIDERS}} call to verify the resource provider recovery failure in the test. > SLRP does not clean up destroyed persistent volumes. > > > Key: MESOS-9544 > URL: https://issues.apache.org/jira/browse/MESOS-9544 > Project: Mesos > Issue Type: Bug > Components: storage >Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1 >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Blocker > Labels: mesosphere, mesosphere-dss-beta, storage > > When a persistent volume created on a {{ROOT}} disk is destroyed, the agent > will clean up its data: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/slave/slave.cpp#L4397 > However, this is not the case for PVs on SLRP disks. The agent relies on the > SLRP to do the cleanup: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/slave/slave.cpp#L4472 > But SLRP simply updates its metadata and does nothing else: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/resource_provider/storage/provider.cpp#L2805 > This would lead to data leakage if the framework does not call `CREATE_DISK` > but just unreserves it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9544) SLRP does not clean up destroyed persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765690#comment-16765690 ] Chun-Hung Hsiao commented on MESOS-9544: Two more patches for testing: https://reviews.apache.org/r/69954/ (CreateDestroyPersistentBlockVolume) https://reviews.apache.org/r/69955/ (DestroyUnpublishedPersistentVolume{,WithRecovery,WithReboot}) > SLRP does not clean up destroyed persistent volumes. > > > Key: MESOS-9544 > URL: https://issues.apache.org/jira/browse/MESOS-9544 > Project: Mesos > Issue Type: Bug > Components: storage >Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1 >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Blocker > Labels: mesosphere, mesosphere-dss-beta, storage > > When a persistent volume created on a {{ROOT}} disk is destroyed, the agent > will clean up its data: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/slave/slave.cpp#L4397 > However, this is not the case for PVs on SLRP disks. The agent relies on the > SLRP to do the cleanup: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/slave/slave.cpp#L4472 > But SLRP simply updates its metadata and does nothing else: > https://github.com/apache/mesos/blob/f44535bca811720fc272c9abad2bc78652d61fe3/src/resource_provider/storage/provider.cpp#L2805 > This would lead to data leakage if the framework does not call `CREATE_DISK` > but just unreserves it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9560) ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
[ https://issues.apache.org/jira/browse/MESOS-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765687#comment-16765687 ] Greg Mann commented on MESOS-9560: -- Observed again with a slightly different stack trace (notice {{connected()}} vs. {{disconnected()}}): {code} PC: @ 0x7f1ed2b30fe6 mesos::v1::resource_provider::Driver::send() *** SIGSEGV (@0x0) received by PID 15010 (TID 0x7f1ec6657700) from PID 0; stack trace: *** @ 0x7f1e9b9765f2 (unknown) @ 0x7f1e9b97ac19 (unknown) @ 0x7f1e9b96dd28 (unknown) @ 0x7f1ecf459390 (unknown) @ 0x7f1ed2b30fe6 mesos::v1::resource_provider::Driver::send() @ 0x563f6934d1ef mesos::internal::tests::resource_provider::MockResourceProvider<>::connectedDefault() @ 0x563f6926cbfe testing::internal::FunctionMockerBase<>::UntypedPerformDefaultAction() @ 0x563f6a788e96 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0x563f692990c4 mesos::internal::tests::resource_provider::MockResourceProvider<>::connected() @ 0x7f1ed2820110 process::AsyncExecutorProcess::execute<>() @ 0x7f1ed282fe58 _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_ @ 0x7f1ed3a636c1 process::ProcessBase::consume() @ 0x7f1ed3a85b2a process::ProcessManager::resume() @ 0x7f1ed3a89866 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv @ 0x7f1ecfc3cc80 (unknown) @ 0x7f1ecf44f6ba start_thread @ 0x7f1ecf18541d (unknown) {code} > ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky > > > Key: MESOS-9560 > URL: https://issues.apache.org/jira/browse/MESOS-9560 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Priority: Critical > Labels: flaky, flaky-test, mesosphere, storage, test > 
Attachments: consoleText.txt > > > We observed a segfault in > {{ContentType/AgentAPITest.MarkResourceProviderGone/1}} on test teardown. > {noformat} > I0131 23:55:59.378453 6798 slave.cpp:923] Agent terminating > I0131 23:55:59.378813 31143 master.cpp:1269] Agent > a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 > (ip-172-16-10-236.ec2.internal) disconnected > I0131 23:55:59.378831 31143 master.cpp:3272] Disconnecting agent > a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 > (ip-172-16-10-236.ec2.internal) > I0131 23:55:59.378846 31143 master.cpp:3291] Deactivating agent > a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 > (ip-172-16-10-236.ec2.internal) > I0131 23:55:59.378891 31143 hierarchical.cpp:793] Agent > a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 deactivated > F0131 23:55:59.378891 31149 logging.cpp:67] RAW: Pure virtual method called > @ 0x7f633aaaebdd google::LogMessage::Fail() > @ 0x7f633aab6281 google::RawLog__() > @ 0x7f6339821262 __cxa_pure_virtual > @ 0x55671cacc113 > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0x55671b532e78 > mesos::internal::tests::resource_provider::MockResourceProvider<>::disconnected() > @ 0x7f633978f6b0 process::AsyncExecutorProcess::execute<>() > @ 0x7f633979f218 > _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_ > @ 0x7f633a9f5d01 process::ProcessBase::consume() > @ 0x7f633aa1a08a process::ProcessManager::resume() > @ 0x7f633aa1db06 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv > @ 0x7f633acc9f80 execute_native_thread_routine > @ 0x7f6337142e25 start_thread > @ 0x7f6336241bad __clone > {noformat} -- This message was sent 
by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-9564) Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace
Joseph Wu created MESOS-9564:

Summary: Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace
Key: MESOS-9564
URL: https://issues.apache.org/jira/browse/MESOS-9564
Project: Mesos
Issue Type: Bug
Components: agent, modules
Reporter: Joseph Wu

The non-default {{LogrotateContainerLogger}} module allows tasks to configure sandbox log rotation (See http://mesos.apache.org/documentation/latest/logging/#Containers ). The {{logrotate_stdout_options}} and {{logrotate_stderr_options}} in particular let the task specify free-form text, which is written to a configuration file located in the task's sandbox. The module does not sanitize or check this configuration at all.

The logger itself will eventually run {{logrotate}} against the written configuration file, but the logger is not isolated in the same way as the task. For both the Mesos and Docker containerizers, the logger binary will run in the same namespace as the Mesos agent. This makes it possible to affect files outside of the task's mount namespace.

Two modes of attack are known to be problematic:

* Changing or adding entries to the configuration file. Normally, the configuration file contains a single file to rotate:
{code}
/path/to/sandbox/stdout {
}
{code}
It is trivial to add text to the {{logrotate_stdout_options}} to add a new entry:
{code}
/path/to/sandbox/stdout {
}
/path/to/other/file/on/disk {
}
{code}
* Logrotate's {{postrotate}} option allows for execution of arbitrary commands. This can again be supplied with the {{logrotate_stdout_options}} variable.
{code}
/path/to/sandbox/stdout {
  postrotate
    rm -rf /
  endscript
}
{code}

Some potential fixes to consider:

* Overwrite the .logrotate.conf files each time. This would give only milliseconds between writing and calling logrotate for a third party to modify the config files maliciously. This would not help if the task itself had postrotate options in its environment variables.
* Sanitize the free-form options field in the environment variables to remove postrotate or injection attempts like }\n/path/to/some/file\noptions{. * Refactor parts of the Mesos isolation code path so that the logger and IO switchboard binary live in the same namespaces as the container (instead of the agent). This would also be nice in that the logger's CPU usage would then be accounted for within the container's resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
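The sanitization option listed above could look roughly like the following check. This is a minimal sketch in Python, not the module's code: the function name `is_safe_logrotate_options` is hypothetical, and the directive list covers logrotate's script hooks but should not be read as exhaustive (other directives may also need to be restricted).

```python
# logrotate directives that open or close a shell script section.
SCRIPT_DIRECTIVES = {
    "postrotate", "prerotate", "firstaction",
    "lastaction", "preremove", "endscript",
}


def is_safe_logrotate_options(options: str) -> bool:
    """Return False for options that could add entries or run commands."""
    # Braces would let the text close the sandbox entry and open a new one.
    if "{" in options or "}" in options:
        return False
    for line in options.splitlines():
        words = line.split()
        # A script directive is the first word on its line.
        if words and words[0].lower() in SCRIPT_DIRECTIVES:
            return False
    return True
```

Rejecting the whole value is a deliberately conservative choice here; stripping the offending lines and keeping the rest would be an alternative.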
[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.
[ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Zhang reassigned MESOS-9507: - Assignee: Qian Zhang (was: Andrei Budnik) > Agent could not recover due to empty docker volume checkpointed files. > -- > > Key: MESOS-9507 > URL: https://issues.apache.org/jira/browse/MESOS-9507 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Qian Zhang >Priority: Critical > Labels: containerizer > > Agent could not recover due to empty docker volume checkpointed files. Please > see logs: > {noformat} > Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969 > slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect > failed: Collect failed: Failed to recover docker volumes for orphan container > e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line > 1 near: > Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows: > Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f > /var/lib/mesos/slave/meta/slaves/latest > Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover > old live executors. > Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent. > Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process > exited, code=exited, status=1/FAILURE > Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered > failed state. > Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed. > {noformat} > This is caused by agent recovery after the volume state file is created but > before checkpointing finishes. Basically the docker volume is not mounted > yet, so the docker volume isolator should skip recovering this volume. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
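The remedy suggested in the description (skip a volume whose state file is empty) can be illustrated with a short sketch. It is written in Python for brevity and is not the docker volume isolator's code; the function name and the `volumes` field are assumptions made for the example.

```python
import json


def recover_volume_state(raw: str):
    """Return the parsed checkpoint, or None when the file is empty."""
    if not raw.strip():
        # The file was created but checkpointing never finished, so the
        # volume was never mounted and there is nothing to recover.
        return None
    return json.loads(raw)
```

A caller would treat `None` as "nothing to recover for this container" and continue, while malformed non-empty content still raises an error.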
[jira] [Commented] (MESOS-8241) Add metrics for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765518#comment-16765518 ] Greg Mann commented on MESOS-8241: -- This should include metrics for dropped operations. > Add metrics for offer operation feedback > > > Key: MESOS-8241 > URL: https://issues.apache.org/jira/browse/MESOS-8241 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Priority: Major > Labels: foundations, mesosphere, operation-feedback > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (MESOS-9557) Operations are leaked in Framework struct when agents are removed
[ https://issues.apache.org/jira/browse/MESOS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu reassigned MESOS-9557: Assignee: Joseph Wu > Operations are leaked in Framework struct when agents are removed > - > > Key: MESOS-9557 > URL: https://issues.apache.org/jira/browse/MESOS-9557 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Greg Mann >Assignee: Joseph Wu >Priority: Major > Labels: foundations, mesosphere > > Currently, when agents are removed from the master, their operations are not > removed from the {{Framework}} structs. We should ensure that this occurs in > all cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (MESOS-9315) Adding support for implicit allocation of mandatory custom resources in Mesos
[ https://issues.apache.org/jira/browse/MESOS-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-9315: -- Assignee: Clément Michaud > Adding support for implicit allocation of mandatory custom resources in Mesos > - > > Key: MESOS-9315 > URL: https://issues.apache.org/jira/browse/MESOS-9315 > Project: Mesos > Issue Type: Improvement >Reporter: Clément Michaud >Assignee: Clément Michaud >Priority: Minor > Labels: resource-management > Attachments: mesos-community-email.txt > > > I sent an email (attached) a few days ago to propose the introduction of a > new hook to append resources implicitly to tasks for mandatory resources. > This would allow Mesos to support mandatory resources like network bandwidth > or disk IO for instance. > In a nutshell, we propose to add a hook with the following signature > {code:java} > Result masterLaunchTaskResourceDecorator( > const Resources& slaveResources, > TaskInfo& task) > {code} > and call it in the master in the ACCEPT message handler. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9554) Allocator might skip allocations because a single framework is incapable of receiving certain resources
[ https://issues.apache.org/jira/browse/MESOS-9554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765237#comment-16765237 ] Benjamin Mahler commented on MESOS-9554: Test: https://reviews.apache.org/r/69942/ > Allocator might skip allocations because a single framework is incapable of > receiving certain resources > --- > > Key: MESOS-9554 > URL: https://issues.apache.org/jira/browse/MESOS-9554 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Benjamin Bannier >Assignee: Benjamin Mahler >Priority: Major > > Currently in the hierarchical allocator allocation loops we compute > {{available}} resources by taking into account the capabilities of the > current framework. Further down in the loop we might then {{break}} out of > the iteration under the assumption that no other framework can receive the > resources in question. > This is only correct if all considered frameworks have identical capabilities. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
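The problem described in this issue can be reproduced with a toy model. The sketch below is in Python and is not the hierarchical allocator; the `allocate` function, its `stop_early` flag, and the tuple layout of frameworks and resources are assumptions made for illustration.

```python
def allocate(frameworks, resources, stop_early):
    """Toy allocation loop.

    frameworks: list of (name, set of capabilities)
    resources:  list of (name, required capability or None)
    stop_early: if True, break when the current framework can use nothing.
    """
    remaining = list(resources)
    offers = {}
    for name, caps in frameworks:
        # Availability is filtered by the current framework's capabilities.
        available = [(r, need) for r, need in remaining
                     if need is None or need in caps]
        if not available:
            if stop_early:
                break      # wrongly assumes no later framework can use them
            continue       # skip only this framework
        offers[name] = [r for r, _ in available]
        remaining = [item for item in remaining if item not in available]
    return offers
```

With a first framework that lacks a capability and a second one that has it, `stop_early=True` offers nothing, while `stop_early=False` lets the second framework receive the resource.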
[jira] [Created] (MESOS-9563) Improve master's 'AcknowledgeOperationStatus' validation
Greg Mann created MESOS-9563: Summary: Improve master's 'AcknowledgeOperationStatus' validation Key: MESOS-9563 URL: https://issues.apache.org/jira/browse/MESOS-9563 Project: Mesos Issue Type: Improvement Components: master, scheduler api Reporter: Greg Mann Currently, the master quickly returns a 202 ACCEPTED response to schedulers for the ACKNOWLEDGE_OPERATION_STATUS call, with most validation that depends on the master's internal state being performed afterward in the {{acknowledgeOperationStatus()}} handler. The master's HTTP code could instead perform this state-dependent validation before returning a response, improving the UX of the scheduler API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
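The improvement proposed here, validating against master state before replying, can be sketched as follows. This is a Python illustration, not the master's HTTP code: the handler name, the dictionary-based call, the `operation_id` key, and the choice of 400 as the rejection status are all assumptions.

```python
def handle_acknowledge_operation_status(call, known_operation_ids):
    """Return an HTTP status code after state-dependent validation."""
    operation_id = call.get("operation_id")
    if operation_id is None:
        return 400  # malformed call
    if operation_id not in known_operation_ids:
        # State-dependent check done up front, so the scheduler sees the error
        # in the response instead of a 202 followed by a silent drop.
        return 400
    return 202
```

The trade-off is that the response then waits for the state lookup instead of being returned immediately.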