Hi Bhanu, This time around I just reviewed the documentation. I'll spend some time on the other patches as I can, but I think at least the documentation needs more work.
Comments inline. Bhanuprakash Bodireddy <[email protected]> writes: > Keepalive feature is aimed at achieving Fastpath Service Assurance > in OVS-DPDK deployments. It adds support for monitoring the packet > processing threads by dispatching heartbeats at regular intervals. This belongs in the document. > The implementation uses OvSDB for reporting the health of the PMD threads. s/implementation/keepalive framework/ > Any external monitoring application can query the OvSDB for status > at regular intervals (or) subscribe to OvSDB updates. ^ for > > keepalive feature can be enabled through below OVSDB settings. ^ The > > enable-keepalive=true > - Keepalive feature is disabled by default and should be enabled > at startup before ovs-vswitchd daemon is started. > > keepalive-interval="5000" > - Timer interval in milliseconds for monitoring the packet > processing cores. > > When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes > up at regular intervals to update the timestamp and status of pmd > threads in process map. This information shall be read by vswitchd thread s/shall/will/ > and written in to 'keepalive' column of Open_vSwitch table in OVSDB. > > An external monitoring framework like collectd with ovs events support s/like collectd/(such as collectd)/ > can read (or) subscribe to the datapath status changes in ovsdb. When the > state s/(or)/or/ Everything after 'When the state is updated' is specific to collectd/ceilometer. Should it go into a separate document? > is updated, the collectd shall be notified and will eventually relay the > status > to ceilometer service running in the controller. Below is the high level > overview of deployment model. > > Compute Node Controller Compute Node > > Collectd <----------> Ceilometer <--------> Collectd > > OvS DPDK OvS DPDK > > +-----+ > | VM | > +--+--+ > \---+---/ > | > +--+---+ +------------+----------+ +------+-------+ > | OVS |-----> | ovsevents plugin | --> | collectd | > +--+---+ +------------+----------+ +------+-------+ > > +------+-----+ +---------------+------------+ | > | Ceilometer | <-- | collectd ceilometer plugin | <--- > +------+-----+ +---------------+------------+ > > Performance impact: > No noticeable performance or latency impact is observed with > KA feature enabled. > > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> > --- A good chunk of this commit message should be in the documentation. Either as a duplicate, or just moved there. While commit messages are meant to describe the 'why' of a change, this commit merely adds documentation for the keepalive feature. The actual description of the feature should live on in the documentation, so that when a user is trying to figure out what it is, they can read it. I hope this feature isn't exclusively for ceilometer / collectd. > Documentation/howto/dpdk.rst | 112 > +++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 112 insertions(+) Is the dpdk documentation the correct place for this? Will this framework ever be applied to more than just PMDs? Isn't that a goal of monitoring the health of various tasks? > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst > index d123819..e7a2b27 100644 > --- a/Documentation/howto/dpdk.rst > +++ b/Documentation/howto/dpdk.rst > @@ -439,6 +439,118 @@ For certain traffic profiles with many parallel flows, > it's recommended to set > > For more information on the EMC refer to :doc:`/intro/install/dpdk` . > > +.. _dpdk_keepalive: > + > +Keepalive > +--------- > + This needs a bit more description. I read this section, and I don't know what the Keepalive(KA) feature is - is it merely a reporting framework for the last time threads 'checked in'? Does it take any actions? What is a user meant to do with this data? Are there cases where this framework gives false positives / negatives? Downsides to enabling it? Alternatives? Some of that is explained in the commit message - but it really belongs here. > +OvS Keepalive(KA) feature is disabled by default. To enable KA feature:: > + > + $ ovs-vsctl --no-wait set Open_vSwitch . > other_config:enable-keepalive=true > + > +The KA feature can't be enabled at run time and should be done at startup > +before ovs-vswitchd daemon is started. > + > +The default timer interval for monitoring packet processing threads is > 1000ms. > +To set a different timer value, run:: > + > + $ ovs-vsctl --no-wait set Open_vSwitch . \ > + other_config:keepalive-interval="5000" > + Can this be changed at run-time? Such information should be included. > +The events comprise of thread states and the last seen timestamps. The events > +are written in to process map periodically by keepalive thread. > + > +The events in the process map are retrieved by main(vswitchd) thread and > +updated in to keepalive column of Open_vSwitch table in OVSDB. Any external > +monitoring application can read the status from OVSDB at intervals or > subscribe > +to the updates so that they get notified when the changes happen on OvSDB. > + > +To monitor the datapath status using ovsdb-client, run:: Will this monitor revalidator threads? RCU related threads? Is it only ever datapath status? Maybe should read 'To monitor the keepalive status'. > + $ ovsdb-client monitor Open_vSwitch > + $ ovsdb-client monitor Open_vSwitch Open_vSwitch keepalive > + > +The datapath thread states are explained below:: s/datapath/keepalive/ > + > + KA_STATE_UNUSED - Not registered to KA framework. Maybe "KA_STATE_NONE"? Or better yet - why should we have an enum for this? After all... the thread isn't tracked? > + KA_STATE_ALIVE - Thread alive. -- > + KA_STATE_MISSING - Thread missed first heartbeat. > + KA_STATE_DEAD - Thread missed two heartbeats. > + KA_STATE_GONE - Thread missed two or more heartbeats and burried. -- Is it useful to distinguish between 'missed one', 'missed two', and 'missed two or more?' Further, what does 'burried' mean? I hope we don't have to dig a thread out from the under a vast field of permafrost. Maybe it's better to have a state for MISSED, and show the missed counter. > + KA_STATE_SLEEP - Thread is sleeping. > + > +To query the datapath status, run:: > + > + $ ovs-appctl keepalive/pmd-health-show > + The following section seems to be more about installation and configuration of collectd. Perhaps that should be split into a different document? > +`collectd <https://collectd.org/>`__ has built-in support for DPDK and > provides > +a `ovs_events` and `ovs_stats` plugin that can be enabled to relay the > datapath > +status and the PMD status to OpenStack service `Ceilometer > +<https://docs.openstack.org/developer/ceilometer/>`__. > + > +To install and configure `collectd`, run:: > + > + # Clone collectd from Git repository > + $ git clone https://github.com/collectd/collectd.git > + > + # configure and install collectd > + $ cd collectd > + $ ./build.sh > + $ ./configure --enable-syslog --enable-logfile --with-libdpdk=/usr > + $ make > + $ make install > + > +`collectd` is installed in ``/opt/collectd`` by default. Edit the > configuration > +file in ``/opt/collectd/etc/collectd.conf`` to enable logfile, dpdkevents > +and csv plugin:: > + > + LoadPlugin logfile > + <Plugin logfile> > + LogLevel debug > + File "/var/log/collectd/collectd.log" > + Timestamp true > + PrintSeverity false > + </Plugin> > + > + <Plugin syslog> > + LogLevel info > + </Plugin> > + > +Enable `ovs_events` plugin and update the plugindetails as below:: > + > + LoadPlugin ovs_events > + > + <Plugin ovs_events> > + Port "6640" > + Address "127.0.0.1" > + Socket "/usr/local/var/run/openvswitch/db.sock" > + SendNotification true > + DispatchValues false > + </Plugin> > + > +Enable `ovs_stats` plugin and update the plugindetails as below:: > + > + LoadPlugin ovs_stats > + > + <Plugin ovs_stats> > + Port "6640" > + Address "127.0.0.1" > + Socket "/usr/local/var/run/openvswitch/db.sock" > + Bridges "br0" > + </Plugin> > + > +Enable ``csv`` plugin as below:: > + > + LoadPlugin csv > + > + <Plugin csv> > + DataDir "/var/log/collectd/csv" > + StoreRates false > + </Plugin> > + > +With csv plugin enabled, *meter* or *gauge* file is created and timestamp > +and thread status gets updated which are sent to ceilometer service. > + .. _dpdk-ovs-in-guest: OVS with DPDK Inside VMs -- 2.4.11 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
