>> This patch is aimed at achieving Fastpath Service Assurance in
>> OVS-DPDK deployments. This commit adds support for monitoring the
>> packet processing cores (PMD thread cores) by dispatching heartbeats at
>> regular intervals. In case of a heartbeat miss, the failure shall be
>> detected & reported to higher level fault management
>> systems/frameworks.
>>
>> The implementation uses a POSIX shared memory object for storing the
>> events that will be read by the monitoring framework. The keep-alive
>> feature can be enabled through the OVSDB settings below.
>
Hi Aaron,

Thanks for the comments. I am also adding Ben here; it would be nice to know
his point of view on this design.

>I've been thinking about this design, and I'm concerned - shared memory is
>inflexible, and allows multiple actors to mess with the information.
>Is there a reason to use shared memory?  I am not sure of what advantage
>this form of reporting has vs. simply using a message passing interface.

This boils down to the shared memory vs. message passing programming model,
and which of them is the more elegant way to share state between two
processes. While I completely agree that sockets are good for sharing state
and passing information to any interested subscriber (within or outside the
system), they also have some shortcomings.

For example, one of the design goals of this feature is to support sub-second
detection timeouts for PMD thread stalls/locks in carrier-grade NFV
deployments. As you know, when it comes to speed, shared memory is clearly the
winner over sockets, but the speed gain of SHM disappears once locks are
introduced. In the KA design, POSIX shared memory is used because it can handle
the needed granularities, and semaphores are intentionally avoided. Most
importantly, there are only two actors here (ovs_keepalive and collectd). The
'ovs_keepalive' thread updates the shared memory periodically with the core
states and their last seen timestamps, whereas the collectd events thread reads
it and checks whether there is any change in core state.
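
To make the single-writer/single-reader idea concrete, here is a minimal
sketch of how such a region could be laid out and accessed without locks. It
is not the code from the patch: the struct, the KA_MAX_CORES limit, the state
values and all ka_* function names are hypothetical, and C11 atomics are my
assumption for how torn reads could be avoided without semaphores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define KA_MAX_CORES 128                 /* hypothetical limit */

/* Hypothetical core states; 0 matches the zero-filled new shm object. */
enum ka_core_state { KA_UNUSED = 0, KA_ALIVE = 1, KA_DEAD = 2 };

/* Hypothetical layout: one state and one timestamp slot per core. */
struct ka_shm {
    _Atomic uint32_t state[KA_MAX_CORES];
    _Atomic uint64_t last_seen_ms[KA_MAX_CORES];
};

/* Map the region. The writer creates and sizes it; the reader maps it
 * read-only, so it cannot modify the shared state. */
static struct ka_shm *ka_shm_map(const char *name, int writer)
{
    int fd = shm_open(name, writer ? (O_CREAT | O_RDWR) : O_RDONLY, 0644);
    if (fd < 0) {
        return NULL;
    }
    if (writer && ftruncate(fd, sizeof(struct ka_shm)) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct ka_shm),
                   writer ? (PROT_READ | PROT_WRITE) : PROT_READ,
                   MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : (struct ka_shm *)p;
}

/* Monotonic clock in milliseconds; comparable across processes on a host. */
static inline uint64_t ka_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Writer side: record a heartbeat for one core. */
static void ka_mark_alive(struct ka_shm *shm, unsigned core, uint64_t now_ms)
{
    assert(core < KA_MAX_CORES);
    atomic_store_explicit(&shm->last_seen_ms[core], now_ms,
                          memory_order_relaxed);
    atomic_store_explicit(&shm->state[core], KA_ALIVE, memory_order_release);
}

/* Reader side: 1 if a tracked core has not been seen within timeout_ms. */
static int ka_core_stalled(struct ka_shm *shm, unsigned core,
                           uint64_t now_ms, uint64_t timeout_ms)
{
    assert(core < KA_MAX_CORES);
    if (atomic_load_explicit(&shm->state[core], memory_order_acquire)
        == KA_UNUSED) {
        return 0;                        /* core is not being monitored */
    }
    uint64_t seen = atomic_load_explicit(&shm->last_seen_ms[core],
                                         memory_order_relaxed);
    return now_ms > seen && now_ms - seen > timeout_ms;
}
```

In this sketch the writer would call ka_mark_alive(shm, core, ka_now_ms())
on every heartbeat, and the reader would poll ka_core_stalled() with the
detection timeout it wants. Mapping the reader side read-only is one way to
keep a second actor from modifying the data.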

>With messages there is clear abstraction, and the projects / processes are
>truly separated.  Shared memory leads to a situation of inextricably coupling
>two (or more) processes.

I completely agree with this.

>
>As an example, if the constant changes, or a new statistic is desired to be
>tracked, the consumer which wants to use this data needs to be recompiled,
>and needs to have the *exact* correct version.  If the pad bits from the
>compiler change, if anything from the ovs side causes alignment to be shifted,
>if OvS wants to redefine the struct, if OvS uses any data from there as the
>rhs... the list of scenarios where this interface can fail goes on - and the
>failures are quite catastrophic.

While your concerns are genuine, the structure is well defined and carries
only the bare minimum of information: the core states and the last seen
timestamps of the respective cores. Both are needed to know the health of the
cores handling the datapath in OvS-DPDK.

We don't foresee any need in the near future to extend this structure, and we
want to keep it simple so that OvS-DPDK and collectd work without worrying
about version compatibility. On the alignment front, the structure is 2048
bytes and no padding is added, but we can use the packed attribute to be sure.
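
As an illustration of how the layout could be pinned down at build time, the
sketch below uses fixed-width types, explicit padding and compile-time checks
on both sides. The field names and the 128-entry split are hypothetical and
are chosen only so that the total comes to 2048 bytes; the actual structure in
the patch may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KA_LAYOUT_CORES 128              /* hypothetical: 128 * 16 = 2048 */

/* Hypothetical per-core record using fixed-width types only. */
struct ka_core_entry {
    uint64_t last_seen_ms;               /* last heartbeat timestamp */
    uint32_t state;                      /* core state */
    uint32_t reserved;                   /* explicit padding, kept zero */
};

struct ka_layout {
    struct ka_core_entry core[KA_LAYOUT_CORES];
};

/* Fail the build on either side if the compiler lays this out differently. */
static_assert(sizeof(struct ka_core_entry) == 16, "unexpected entry size");
static_assert(offsetof(struct ka_core_entry, state) == 8,
              "unexpected offset of state");
static_assert(sizeof(struct ka_layout) == 2048, "unexpected total size");
```

If both OvS and collectd carried the same checks, a layout mismatch would show
up as a compile error instead of a silent misread at run time. This does not
address the case where the two sides are built from different versions of the
header.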

>
>I think maybe a design doc of this interface would be good to read through, as
>it will explain why this design was chosen.  It also might allow for better
>feedback, putting a more generic solution (for example, any new threads that
>OvS spawns we might want to monitor, as well - and that would be good to
>report).  Do you agree?

This design allows only the datapath cores to be monitored, not independent
threads, at this point. OvS does have the '--monitor' option, which monitors
the application and restarts it automatically in case of a crash.

This design is aimed more at monitoring the health of the datapath cores (PMD
threads), which could impact the overall switching performance of the compute
node. I am open to all feedback and happy to answer questions in this thread.

Adding Maryam, who is handling the collectd effort, to the thread.

Appreciate all your feedback and comments!

Regards,
Bhanuprakash. 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev