There are two ways to go with the design.

1) Make it generic, so that it is not as PMD-specific as it is now.
2) If it stays PMD specific, make it stronger; right now, the health check is
limited: it only detects whether a PMD thread is making progress or not.
For something like DPDK, I don't think that will be enough in the long run,
and it can also produce false negatives.
Maybe we want to know that the ports and queues are actually getting
processed, that the PMD/port/queue mappings are as expected, the time spent
processing packets per PMD, port state changes, packet stats, queue depths,
etc. (a rough sketch of such a record follows below).
This information could then be correlated by the final receiver of the data.
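
To make that concrete, here is a purely illustrative sketch of the kind of
per-PMD record that could carry this extra information; every name in it is
hypothetical and not taken from the patch.

/* Illustrative only: a per-PMD health record going beyond a plain heartbeat. */
#include <stdint.h>

#define MAX_RXQ_PER_PMD 16

struct pmd_health_record {
    uint32_t core_id;                      /* core the PMD thread runs on */
    uint64_t last_seen_ns;                 /* timestamp of the last heartbeat */
    uint64_t busy_cycles;                  /* cycles spent processing packets */
    uint64_t idle_cycles;                  /* cycles spent polling empty rxqs */
    uint64_t rx_packets;                   /* packets handled since last report */
    uint32_t n_rxqs;                       /* rx queues currently mapped here */
    uint32_t rxq_port_id[MAX_RXQ_PER_PMD]; /* which port owns each mapped rxq */
    uint32_t rxq_depth[MAX_RXQ_PER_PMD];   /* last sampled depth of each rxq */
};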

I also agree that socket communication is preferred over shm, although I don’t 
think any shm usage
will necessarily lead to a meltdown.



On 4/28/17, 5:12 AM, "ovs-dev-boun...@openvswitch.org on behalf of Bodireddy, 
Bhanuprakash" <ovs-dev-boun...@openvswitch.org on behalf of 
bhanuprakash.bodire...@intel.com> wrote:

    >> This patch is aimed at achieving Fastpath Service Assurance in
    >> OVS-DPDK deployments. This commit adds support for monitoring the
    >> packet processing cores(pmd thread cores) by dispatching heartbeats at
    >> regular intervals. In case of a heartbeat miss, the failure shall be
    >> detected & reported to higher level fault management
    >> systems/frameworks.
    >>
    >> The implementation uses POSIX shared memory object for storing the
    >> events that will be read by the monitoring framework. The keep-alive
    >> feature can be enabled through the OVSDB settings below.
    >
    Hi Aaron,
    
    Thanks for the comments and also for adding Ben here. It would be nice to
    know his point of view on this design.
    
    >I've been thinking about this design, and I'm concerned - shared memory is
    >inflexible, and allows multiple actors to mess with the information.
    >Is there a reason to use shared memory?  I am not sure of what advantage
    >this form of reporting has vs. simply using a message passing interface.
    
    This boils down to the shared memory vs. message passing programming model
    and which of them is the more elegant approach for sharing state between
    two processes. While I completely agree that sockets are good for sharing
    state and passing the information on to any interested subscriber (within
    or outside the system), they also have some shortcomings.
    
    For example, one of the design goals of this feature is to support
    sub-second detection of PMD thread stalls/locks in carrier grade NFV
    deployments. As you know, when we look at speed, shared memory is clearly
    the winner over sockets, but that gain disappears once locks come into
    play. In the KA design POSIX shared memory is used as it can handle the
    needed granularity, and semaphores are intentionally avoided. Most
    importantly, there are only two actors here (ovs_keepalive, collectd):
    the 'ovs_keepalive' thread updates the shared memory periodically with the
    core states and their last-seen timestamps, whereas the collectd events
    thread reads it and checks whether any core state has changed.
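
    To illustrate, here is a minimal sketch of what the producer side could
    look like under that design: per-core state plus a last-seen timestamp in
    a POSIX shared memory object, with no semaphores. KA_SHM_NAME, struct
    ka_state and the helper names below are my own placeholders, not the
    actual symbols in the patch.

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    #define KA_SHM_NAME  "/ovs_keepalive"   /* placeholder object name */
    #define KA_MAX_CORES 64

    struct ka_state {
        uint8_t  core_state[KA_MAX_CORES];   /* alive / missing, per core */
        uint64_t last_seen_ns[KA_MAX_CORES]; /* monotonic time of last beat */
    };

    /* Create and map the shared memory object (producer side). */
    static struct ka_state *
    ka_shm_create(void)
    {
        int fd = shm_open(KA_SHM_NAME, O_CREAT | O_RDWR, 0644);
        if (fd < 0) {
            return NULL;
        }
        if (ftruncate(fd, sizeof(struct ka_state)) < 0) {
            close(fd);
            return NULL;
        }
        struct ka_state *ka = mmap(NULL, sizeof *ka, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
        close(fd);
        return ka == MAP_FAILED ? NULL : ka;
    }

    /* Called periodically by the keepalive thread for each PMD core;
     * the reader only compares these fields against its own clock. */
    static void
    ka_mark_alive(struct ka_state *ka, unsigned int core)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        ka->last_seen_ns[core] = now.tv_sec * UINT64_C(1000000000) + now.tv_nsec;
        ka->core_state[core] = 1;
    }

    The collectd side would mmap the same object read-only and flag any core
    whose timestamp has not advanced within the detection interval.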
    
    >With messages there is clear abstraction, and the projects / processes are
    >truly separated.  Shared memory leads to a situation of inextricably
    >coupling two (or more) processes.
    
    I completely agree with this.
    
    >
    >As an example, if the constant changes, or a new statistic is desired to be
    >tracked, the consumer which wants to use this data needs to be recompiled,
    >and needs to have the *exact* correct version.  If the pad bits from the
    >compiler change, if anything from the ovs side causes alignment to be
    >shifted, if OvS wants to redefine the struct, if OvS uses any data from
    >there as the rhs... the list of scenarios where this interface can fail
    >goes on - and the failures are quite catastrophic.
    
    While your concerns are genuine, the structure is well defined and holds
    the bare minimum of information that is absolutely necessary: the core
    states and the last-seen timestamps of the respective cores, which are
    needed to know the health of the cores handling the datapath in OvS-DPDK.
    
    We don't foresee any need in the near future to extend this structure and
    want to keep it simple so that OvS-DPDK and collectd will work without
    worrying about version compatibility. On the alignment front, I see that
    the structure is 2048 bytes and no padding is added here, but we can use
    the packed attribute to be sure (see the sketch below).
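
    For illustration only, and assuming the 2048-byte figure above, one way to
    guard against the layout drifting between OvS-DPDK and collectd would be a
    packed definition plus a compile-time size check; the member names here
    are placeholders, not the actual fields in the patch.

    #include <stdint.h>

    struct ka_shm_block {
        uint8_t  core_state[64];                 /* per-core state */
        uint64_t last_seen_ns[64];               /* per-core timestamps */
        uint8_t  reserved[2048 - 64 - 64 * 8];   /* explicit padding to 2048 B */
    } __attribute__((packed));

    /* Fails the build on either side if the layout ever changes size. */
    _Static_assert(sizeof(struct ka_shm_block) == 2048,
                   "keepalive shm block must stay 2048 bytes");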
    
    >
    >I think maybe a design doc of this interface would be good to read through,
    >as it will explain why this design was chosen.  It also might allow for
    >better feedback, putting a more generic solution (for example, any new
    >threads that OvS spawns we might want to monitor, as well - and that would
    >be good to report).  Do you agree?
    
    This design allows only the datapath cores to be monitored, not independent
    threads, at this point. OvS does have a '--monitor' option which monitors
    the process and restarts the application automatically in case of a crash.
    
    This design is aimed more at monitoring the health of the datapath cores
    (PMD threads), since they could impact the overall switching performance of
    the compute node. I am open to all the feedback and will answer the
    questions in this thread.
    
    Adding Maryam, who is handling the collectd effort, to the thread.
    
    Appreciate all your feedback and comments!
    
    Regards,
    Bhanuprakash. 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
