Hi Bhanu,

Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> writes:

> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing cores (PMD thread cores) by dispatching heartbeats at regular
> intervals. In case of heartbeat misses, additional health checks are
> enabled on the PMD thread to detect the failure, which is then
> reported to higher-level fault management systems/frameworks.
>
> The implementation uses OVSDB for reporting the health of the PMD threads.
> Any external monitoring application can read the status from OVSDB at
> regular intervals or subscribe to OVSDB updates so that it is
> notified when changes happen.
>
> A keepalive info struct is created and initialized for storing the
> status of the PMD threads. It is initialized by the main thread (vswitchd)
> as part of the init process and is periodically updated by the 'keepalive'
> thread. The keepalive feature can be enabled through the OVSDB settings below.
>
>     enable-keepalive=true
>       - Keepalive feature is disabled by default.
>
>     keepalive-interval="5000"
>       - Timer interval in milliseconds for monitoring the packet
>         processing cores.
>
> When KA is enabled, an 'ovs-keepalive' thread is spawned that wakes
> up at regular intervals to update the timestamp and status of PMD cores
> in the keepalive info struct. The vswitchd thread reads this information
> and writes the status into the 'keepalive' column of the Open_vSwitch
> table in OVSDB.
>
> An external monitoring framework like collectd with OVS events support
> can read or subscribe to the datapath status changes in OVSDB. When the
> state is updated, collectd is notified and eventually relays the status
> to the ceilometer service running on the controller. Below is a high-level
> overview of the deployment model.
>
>     Compute Node            Controller            Compute Node
>
>     Collectd  <----------> Ceilometer <-------->   Collectd
>
>     OvS DPDK                                       OvS DPDK
>
>     +-----+
>     | VM  |
>     +--+--+
>     \---+---/
>     |
>     +--+---+       +------------+----------+     +------+-------+
>     | OVS  |-----> |   ovsevents plugin    | --> |   collectd   |
>     +--+---+       +------------+----------+     +------+-------+
>
>     +------+-----+     +---------------+------------+     |
>     | Ceilometer | <-- | collectd ceilometer plugin |  <---
>     +------+-----+     +---------------+------------+
>
> github: The patches can be found here:
>   https://github.com/bbodired/ovs (Last master commit e7cd8c363)
>
> Performance impact:
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
>
> -------------------------------------------------
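
For reference while reading the rest of this mail, the heartbeat scheme
described in the cover letter can be sketched roughly as below. This is
only a minimal illustration, not the code from the series: the class
name, the method names and the "alive"/"missed" state labels are all
hypothetical, and the only value taken from the cover letter is the
5000 ms example interval.

```python
import time


class KeepaliveInfo:
    """Hypothetical stand-in for the 'keepalive info struct'."""

    def __init__(self, interval_ms=5000):
        # Interval after which a core without a heartbeat counts as missed.
        self.interval_s = interval_ms / 1000.0
        # Maps a core id to the timestamp of its last recorded heartbeat.
        self.last_beat = {}

    def heartbeat(self, core_id, now=None):
        """Record that the thread on core_id responded at time 'now'."""
        self.last_beat[core_id] = time.monotonic() if now is None else now

    def status(self, now=None):
        """Return a per-core state; the state names are placeholders."""
        now = time.monotonic() if now is None else now
        return {
            core: "alive" if now - ts <= self.interval_s else "missed"
            for core, ts in self.last_beat.items()
        }
```

In this sketch a monitoring thread would call status() once per interval
and publish the resulting map, which is the role the cover letter gives
to the 'keepalive' column in OVSDB.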

Quick comment before I do an in-depth review.

One thing that is missing in this series is some form of documentation
added to explain why this feature should exist (for instance, why can't
the standard posix process accounting information suffice?) and what the
high-level concepts are (you have the states being used, but I don't see
a definition that will be needed to understand when reading a keep-alive
report).

I think there could be a reason to provide this, but I think it's
important to explain why collectd will need to use the ovsdb interface,
rather than calling, e.g., times[1] or parsing /proc/<tid>/stat for the
runtime (and watching it accumulate).
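
To make the alternative concrete, here is a minimal sketch of the
/proc approach, assuming a Linux host. The function names are made up
for illustration; the field positions (utime and stime are fields 14
and 15 of the stat line) follow proc(5).

```python
import os


def parse_cpu_ticks(stat_line):
    """Return utime + stime, in clock ticks, from one stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces, so cut at the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11:13].
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def thread_cpu_seconds(pid, tid):
    """Read the accumulated CPU time of one thread in seconds (Linux only)."""
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")


def made_progress(previous_ticks, current_ticks):
    """True if the thread accumulated CPU time between two samples."""
    return current_ticks > previous_ticks
```

A monitor would sample thread_cpu_seconds() for each PMD thread at a
fixed period and compare successive readings.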

Without that, it's difficult to evaluate the relative usefulness of
this.  I know I did ask for some of this documentation back in April,
and some was added for your larger series.  However, I think it's
important to add it as you go instead of a large plunk of it at the
end.  It really does help to understand why the feature should exist.

-Aaron
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
