Added documentation for resource provider and CSI plugin metrics. Review: https://reviews.apache.org/r/67303
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/db075fc6
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/db075fc6
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/db075fc6

Branch: refs/heads/master
Commit: db075fc67aceb8f75bbc204aae042a30b65c57e3
Parents: 318aca9
Author: Chun-Hung Hsiao <chhs...@mesosphere.io>
Authored: Thu May 24 18:01:26 2018 -0700
Committer: Chun-Hung Hsiao <chhs...@mesosphere.io>
Committed: Thu May 31 18:29:56 2018 -0700

----------------------------------------------------------------------
 docs/home.md       |   2 +-
 docs/monitoring.md | 184 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/db075fc6/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index 5471c70..adefc4d 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -23,7 +23,7 @@ layout: documentation
 * [Maintenance](maintenance.md) for performing maintenance on a Mesos cluster.
 * [Upgrades](upgrades.md) for upgrading a Mesos cluster.
 * [Logging](logging.md)
-* [Monitoring](monitoring.md)
+* [Monitoring / Metrics](monitoring.md)
 * [Operational Guide](operational-guide.md)
 * [Fetcher Cache Configuration](fetcher.md)
 * [Fault Domains](fault-domains.md)

http://git-wip-us.apache.org/repos/asf/mesos/blob/db075fc6/docs/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/monitoring.md b/docs/monitoring.md
index d9dc793..2985f68 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -1764,3 +1764,187 @@ the master it is registered with.
   <td>Counter</td>
 </tr>
 </table>
+
+#### Resource Providers
+
+The following metrics provide information about ongoing and completed
+[operations](operations.md) that apply to resources provided by a
+[resource provider](resource-provider.md) with the given _type_ and _name_. In
+the following metrics, the _operation_ placeholder refers to the name of a
+particular operation type, which is described in the list of
+[supported operation types](#supported-operation-types).
+
+<table class="table table-striped">
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th>
+</thead>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/pending</code>
+  </td>
+  <td>Number of ongoing <i>operation</i>s</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/finished</code>
+  </td>
+  <td>Number of finished <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/failed</code>
+  </td>
+  <td>Number of failed <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/operations/<i>&lt;operation&gt;</i>/dropped</code>
+  </td>
+  <td>Number of dropped <i>operation</i>s</td>
+  <td>Counter</td>
+</tr>
+</table>
+
+##### Supported Operation Types
+
+Since the supported operation types may vary among different resource providers,
+the following is a comprehensive list of operation types and the corresponding
+resource providers that support them. Note that the name column is for the
+_operation_ placeholder in the above metrics.
+
+<table class="table table-striped">
+<thead>
+<tr><th>Type</th><th>Name</th><th>Supported Resource Provider Types</th>
+</thead>
+<tr>
+  <td><code><a href="reservation.md">RESERVE</a></code></td>
+  <td><code>reserve</code></td>
+  <td>All</td>
+</tr>
+<tr>
+  <td><code><a href="reservation.md">UNRESERVE</a></code></td>
+  <td><code>unreserve</code></td>
+  <td>All</td>
+</tr>
+<tr>
+  <td><code><a href="persistent-volume.md#-offer-operation-create-">CREATE</a></code></td>
+  <td><code>create</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="persistent-volume.md#-offer-operation-destroy-">DESTROY</a></code></td>
+  <td><code>destroy</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-create_volume-operation">CREATE_VOLUME</a></code></td>
+  <td><code>create_volume</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-destroy_volume-operation">DESTROY_VOLUME</a></code></td>
+  <td><code>destroy_volume</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-create_block-operation">CREATE_BLOCK</a></code></td>
+  <td><code>create_block</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+<tr>
+  <td><code><a href="csi.md#-destroy_block-operation">DESTROY_BLOCK</a></code></td>
+  <td><code>destroy_block</code></td>
+  <td><code>org.apache.mesos.rp.local.storage</code></td>
+</tr>
+</table>
+
+For example, cluster operators can monitor the number of successful
+`CREATE_VOLUME` operations that are applied to the resource provider with type
+`org.apache.mesos.rp.local.storage` and name `lvm` through the
+`resource_providers/org.apache.mesos.rp.local.storage.lvm/operations/create_volume/finished`
+metric.
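
As a rough illustration of how the operation metric names above are assembled, the following Python sketch builds the key for one operation state and reads it from a metrics snapshot. It assumes the snapshot is a flat JSON object mapping metric names to numbers, as served by the agent's `/metrics/snapshot` endpoint. The helper names and the sample values are hypothetical and are not part of this commit.

```python
import json

# States documented in the resource provider metrics table.
OPERATION_STATES = ("pending", "finished", "failed", "dropped")


def operation_metric_key(rp_type, rp_name, operation, state):
    """Return the metric key for one state of one operation type."""
    if state not in OPERATION_STATES:
        raise ValueError(f"unknown operation state: {state!r}")
    return f"resource_providers/{rp_type}.{rp_name}/operations/{operation}/{state}"


def operation_counts(snapshot, rp_type, rp_name, operation):
    """Collect every documented state for an operation; absent keys count as 0."""
    return {
        state: snapshot.get(
            operation_metric_key(rp_type, rp_name, operation, state), 0)
        for state in OPERATION_STATES
    }


# Hypothetical snapshot payload; real values come from a running agent.
sample_snapshot = json.loads(
    '{"resource_providers/org.apache.mesos.rp.local.storage.lvm'
    '/operations/create_volume/finished": 3.0}'
)

counts = operation_counts(
    sample_snapshot, "org.apache.mesos.rp.local.storage", "lvm", "create_volume")
print(counts["finished"])
```

Treating a missing key as 0 is a choice made for this sketch only; a monitoring script may prefer to distinguish "not reported" from "zero".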
+
+#### CSI Plugins
+
+Storage resource providers in Mesos are backed by
+[CSI plugins](csi.md#standalone-containers-for-csi-plugins) running in
+[standalone containers](standalone-container.md). To monitor the health of these
+CSI plugins for a storage resource provider with _type_ and _name_, the
+following metrics provide information about plugin terminations and ongoing and
+completed CSI calls made to the plugin. In the following metrics, the _rpc_
+placeholder refers to the name of a particular CSI call, which is described in
+the list of [supported CSI calls](#supported-csi-calls).
+
+<table class="table table-striped">
+<thead>
+<tr><th>Metric</th><th>Description</th><th>Type</th>
+</thead>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/container_terminations</code>
+  </td>
+  <td>Number of terminated CSI plugin containers</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/pending</code>
+  </td>
+  <td>Number of ongoing <i>rpc</i> calls</td>
+  <td>Gauge</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/successes</code>
+  </td>
+  <td>Number of successful <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/errors</code>
+  </td>
+  <td>Number of erroneous <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+<tr>
+  <td>
+    <code>resource_providers/<i>&lt;type&gt;</i>.<i>&lt;name&gt;</i>/csi_plugin/rpcs/<i>&lt;rpc&gt;</i>/cancelled</code>
+  </td>
+  <td>Number of cancelled <i>rpc</i> calls</td>
+  <td>Counter</td>
+</tr>
+</table>
+
+##### Supported CSI Calls
+
+The following is a comprehensive list of CSI calls that are used in storage
+resource providers. These names are used to replace the _rpc_ placeholder in the
+above metrics.
+
+* [`csi.v0.Identity.GetPluginInfo`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getplugininfo)
+* [`csi.v0.Identity.GetPluginCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getplugincapabilities)
+* [`csi.v0.Identity.Probe`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#probe)
+* [`csi.v0.Controller.CreateVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#createvolume)
+* [`csi.v0.Controller.DeleteVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#deletevolume)
+* [`csi.v0.Controller.ControllerPublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllerpublishvolume)
+* [`csi.v0.Controller.ControllerUnpublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllerunpublishvolume)
+* [`csi.v0.Controller.ValidateVolumeCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#validatevolumecapabilities)
+* [`csi.v0.Controller.ListVolumes`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#listvolumes)
+* [`csi.v0.Controller.GetCapacity`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#getcapacity)
+* [`csi.v0.Controller.ControllerGetCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#controllergetcapabilities)
+* [`csi.v0.Node.NodeStageVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodestagevolume)
+* [`csi.v0.Node.NodeUnstageVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodeunstagevolume)
+* [`csi.v0.Node.NodePublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodepublishvolume)
+* [`csi.v0.Node.NodeUnpublishVolume`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodeunpublishvolume)
+* [`csi.v0.Node.NodeGetId`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodegetid)
+* [`csi.v0.Node.NodeGetCapabilities`](https://github.com/container-storage-interface/spec/blob/v0.2.0/spec.md#nodegetcapabilities)
+
+For example, cluster operators can monitor the number of successful
+`csi.v0.Controller.CreateVolume` calls that are made by the resource provider
+with type `org.apache.mesos.rp.local.storage` and name `lvm` through the
+`resource_providers/org.apache.mesos.rp.local.storage.lvm/csi_plugin/rpcs/csi.v0.Controller.CreateVolume/successes`
+metric.
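
To show how the CSI call metrics above might be consumed together, here is a minimal Python sketch that groups the per-RPC values of one resource provider and derives an error ratio. It assumes the snapshot is a flat mapping of metric names to numbers. The function names, the error-ratio definition, and the sample values are hypothetical illustrations, not something defined by Mesos or by this commit.

```python
from collections import defaultdict

# States documented in the CSI plugin metrics table.
RPC_STATES = ("pending", "successes", "errors", "cancelled")


def csi_rpc_stats(snapshot, rp_type, rp_name):
    """Group csi_plugin/rpcs/<rpc>/<state> values by RPC name."""
    prefix = f"resource_providers/{rp_type}.{rp_name}/csi_plugin/rpcs/"
    stats = defaultdict(dict)
    for key, value in snapshot.items():
        if not key.startswith(prefix):
            continue  # other providers, or container_terminations
        # RPC names contain dots but no slashes, so split on the last slash.
        rpc, _, state = key[len(prefix):].rpartition("/")
        if rpc and state in RPC_STATES:
            stats[rpc][state] = value
    return dict(stats)


def error_ratio(rpc_stats):
    """Share of completed calls that ended in error (0.0 if none completed)."""
    completed = sum(
        rpc_stats.get(state, 0) for state in ("successes", "errors", "cancelled"))
    return rpc_stats.get("errors", 0) / completed if completed else 0.0


# Hypothetical snapshot values for the `lvm` provider used in the example above.
base = "resource_providers/org.apache.mesos.rp.local.storage.lvm/csi_plugin/"
sample_snapshot = {
    base + "container_terminations": 1,
    base + "rpcs/csi.v0.Controller.CreateVolume/pending": 1,
    base + "rpcs/csi.v0.Controller.CreateVolume/successes": 8,
    base + "rpcs/csi.v0.Controller.CreateVolume/errors": 2,
}

stats = csi_rpc_stats(sample_snapshot, "org.apache.mesos.rp.local.storage", "lvm")
ratio = error_ratio(stats["csi.v0.Controller.CreateVolume"])
print(ratio)
```

Because `successes`, `errors`, and `cancelled` are counters, a ratio computed this way covers the whole lifetime of the process; comparing two snapshots taken some time apart would give a rate over that window instead.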