Re: [ovs-dev] [RFC PATCH 00/21] Add OVS DPDK keep-alive functionality

2017-06-21 Thread Aaron Conole
"Bodireddy, Bhanuprakash"  writes:

> Hi Aaron,
>
>>>
>>>I've been playing with this a little bit;  is it too late to consider
>>>tracking 'threads' instead of 'cores'?  I'm not sure what it means for
>>>a particular core ID to be 'healthy' - but I know what 'pmd24' not
>>>responding means.
>>
>>That's an interesting input. It's not late and all suggestions are
>>most welcome.
>>I will try doing this in the next series.
>
> I reworked the patches and sent out the V3 series here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334229.html
> In this series:
>   - POSIX shared memory is removed.
>   - The logic now tracks threads, as suggested in this thread. I have
>     used hash maps for this.
>   
>>
>>>
>>>Additionally, I'd suggest keeping words like 'healthy', and 'unhealthy'
>>>out of it.  I'd basically just have this keepalive report things on the
>>>thread you *know* - last time it poked your status register (and you can
>>>also track things like cpu utilization, etc, if you'd like).  Then let
>>>your higher level thing that reads ceilometer make those "healthy"
>>>determinations.  After all, sometimes 0% utilization is "healthy," and
>>>sometimes it isn't.
>>
>>This makes sense. In fact, this was the case in the beginning, when only
>>the core status was reported. Only recently I added this Datapath status
>>row with the overall status. I shall remove this and leave it to external
>>monitoring apps to parse the data and decide.
>
> I have also removed this logic and now only the thread status is
> shown. It's now the job of the monitoring framework to read the thread
> status and determine the health of the compute node.

Awesome to hear.  I'm currently traveling with family (but it has been
raining so I figured I'd do a quick check on INBOX), but will review
next week.  I like the direction it is going.

> Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH 00/21] Add OVS DPDK keep-alive functionality

2017-06-19 Thread Bodireddy, Bhanuprakash
Hi Aaron,

>>
>>I've been playing with this a little bit;  is it too late to consider
>>tracking 'threads' instead of 'cores'?  I'm not sure what it means for
>>a particular core ID to be 'healthy' - but I know what 'pmd24' not
>>responding means.
>
>That's an interesting input. It's not late and all suggestions are
>most welcome.
>I will try doing this in the next series.

I reworked the patches and sent out the V3 series here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334229.html
In this series:
  - POSIX shared memory is removed.
  - The logic now tracks threads, as suggested in this thread. I have
    used hash maps for this.
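
As a rough illustration of what thread-keyed tracking with a hash map
can look like (a Python sketch only, not code from the V3 series; the
class and method names are hypothetical, and a dict stands in for the
hash map):

```python
import time


class KeepaliveTable:
    """Hypothetical thread-keyed keepalive table (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so the sketch is testable
        self._last_seen = {}     # thread name -> time of last heartbeat reply

    def register(self, thread_name):
        # A registered thread that has not replied yet has no timestamp.
        self._last_seen.setdefault(thread_name, None)

    def mark_alive(self, thread_name):
        # Record the time at which the thread answered a heartbeat.
        self._last_seen[thread_name] = self._clock()

    def unregister(self, thread_name):
        # Removing an unknown thread is a no-op.
        self._last_seen.pop(thread_name, None)

    def snapshot(self):
        # Return a copy so readers never touch the live table.
        return dict(self._last_seen)
```

Keying on the thread name means a reader sees entries such as 'pmd24'
directly, instead of having to map a core ID back to a thread.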
  
>
>>
>>Additionally, I'd suggest keeping words like 'healthy', and 'unhealthy'
>>out of it.  I'd basically just have this keepalive report things on the
>>thread you *know* - last time it poked your status register (and you can
>>also track things like cpu utilization, etc, if you'd like).  Then let
>>your higher level thing that reads ceilometer make those "healthy"
>>determinations.  After all, sometimes 0% utilization is "healthy," and
>>sometimes it isn't.
>
>This makes sense. In fact, this was the case in the beginning, when only
>the core status was reported. Only recently I added this Datapath status
>row with the overall status. I shall remove this and leave it to external
>monitoring apps to parse the data and decide.

I have also removed this logic and now only the thread status is shown. It's
now the job of the monitoring framework to read the thread status and
determine the health of the compute node.

Bhanuprakash.


Re: [ovs-dev] [RFC PATCH 00/21] Add OVS DPDK keep-alive functionality

2017-06-14 Thread Bodireddy, Bhanuprakash
Hi Aaron,
>Hi Bhanu,
>
>I've been playing with this a little bit;  is it too late to consider
>tracking 'threads' instead of 'cores'?  I'm not sure what it means for
>a particular core ID to be 'healthy' - but I know what 'pmd24' not
>responding means.

That's an interesting input. It's not late and all suggestions are most 
welcome. 
I will try doing this in the next series. 

>
>Additionally, I'd suggest keeping words like 'healthy', and 'unhealthy'
>out of it.  I'd basically just have this keepalive report things on the
>thread you *know* - last time it poked your status register (and you can
>also track things like cpu utilization, etc, if you'd like).  Then let
>your higher level thing that reads ceilometer make those "healthy"
>determinations.  After all, sometimes 0% utilization is "healthy," and
>sometimes it isn't.

This makes sense. In fact, this was the case in the beginning, when only
the core status was reported. Only recently I added this Datapath status
row with the overall status. I shall remove this and leave it to external
monitoring apps to parse the data and decide.

- Bhanuprakash.


Re: [ovs-dev] [RFC PATCH 00/21] Add OVS DPDK keep-alive functionality

2017-06-13 Thread Aaron Conole
Bhanuprakash Bodireddy  writes:

> The keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing cores (PMD thread cores) by dispatching heartbeats at regular
> intervals. In case of heartbeat misses, additional health checks are
> enabled on the PMD thread to detect the failure, which is then
> reported to higher-level fault management systems/frameworks.
>
> The implementation uses OVSDB for reporting the datapath status and the
> health of the PMD threads. Any external monitoring application can read
> the status from OVSDB at regular intervals (or) subscribe to the updates
> in OVSDB so that they get notified when the changes happen on OVSDB.
>
> A POSIX shared memory object is created and initialized for storing the
> status of the PMD threads. It is initialized by the main thread (vswitchd)
> as part of the init process and is periodically updated by the 'keepalive'
> thread. The keepalive feature can be enabled through the OVSDB settings
> below.
>
> enable-keepalive=true
>   - Keepalive feature is disabled by default.
>
> keepalive-interval="5000"
>   - Timer interval in milliseconds for monitoring the packet
> processing cores.
>
> keepalive-shm-name="/ovs_keepalive_shm_name"
>   - Shared memory block name where the events shall be updated.
>
> When KA is enabled, an 'ovs-keepalive' thread is spawned that wakes
> up at regular intervals to update the timestamp and status of the PMD cores
> in the shared memory region. The vswitchd thread reads this information
> and writes the status into the 'keepalive' column of the Open_vSwitch
> table in OVSDB.
>
> An external monitoring framework like collectd with ovs events support
> can read (or) subscribe to the datapath status changes in OVSDB. When the
> state is updated, collectd is notified and eventually relays the status
> to the ceilometer service running on the controller. Below is a high-level
> overview of the deployment model.
>
>   Compute Node              Controller                Compute Node
>
>   Collectd  <------------>  Ceilometer  <---------->  Collectd
>
>   OvS DPDK                                            OvS DPDK
>
>    +-----+
>    | VM  |
>    +--+--+
>   \---+---/
>       |
>    +--+---+       +-------------------+     +--------------+
>    | OVS  |-----> | ovsevents plugin  | --> |   collectd   |
>    +--+---+       +-------------------+     +------------+-+
>                                                          |
>    +------------+     +----------------------------+     |
>    | Ceilometer | <-- | collectd ceilometer plugin |  <---
>    +------------+     +----------------------------+
>
> Performance impact
> ------------------
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
>
> Bhanuprakash Bodireddy (21):
>
> [10] Patches help update OVSDB with keepalive status
>
>   vswitch.xml: Add keepalive support.
>   ovsschema: Introduce 'keepalive' column in Open_vSwitch.
>   dpdk: Add helper functions for DPDK datapath keepalive.
>   process: Retrieve process status.
>   Keepalive: Add initial keepalive support.
>   bridge: Invoke keepalive framework.
>   keepalive: Add more helper functions to KA framework.
>   dpif-netdev: Register packet processing cores to KA framework.
>   dpif-netdev: Dispatch heartbeats for DPDK datapath.
>   keepalive: Retrieve PMD status periodically.
>   bridge: Update keepalive status in ovsdb
>
>   keepalive: Add support to query keepalive statistics.
>   keepalive: Add support to query keepalive status.
>   dpif-netdev: Add helper function to check false positives.
>
> [5] Following patches add additional health checks in case of heartbeat
> failure. The following can still be improved and WIP.
>
>   dpif-netdev: Add additional datapath health checks.
>   keepalive: Check the link status as part of PMD health checks.
>   keepalive: Check the packet statisitcs as part of PMD health checks.
>   keepalive: Check the PMD cycle stats as part of PMD health checks.
>   netdev-dpdk: Enable PMD health checks on heartbeat failure.
>
>   keepalive: Display extended Keepalive status.
>   Documentation: Update DPDK doc with Keepalive feature.
>

Hi Bhanu,

I've been playing with this a little bit;  is it too late to consider
tracking 'threads' instead of 'cores'?  I'm not sure what it means for a
particular core ID to be 'healthy' - but I know what 'pmd24' not
responding means.

Additionally, I'd suggest keeping words like 'healthy', and 'unhealthy'
out of it.  I'd basically just have this keepalive report things on the
thread you *know* - last time it poked your status register (and you can
also track things like cpu utilization, etc, if you'd like).  Then let
your higher level thing that reads ceilometer make those "healthy"
determinations.  After all, sometimes 0% utilization is "healthy," and
sometimes it isn't.
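
To make the split concrete, here is a rough sketch (Python for brevity,
not code from the series; the function names and the fixed age threshold
are made up for illustration).  The keepalive side only reports facts
per thread, and the "is this a problem" decision lives with the
consumer:

```python
def report(last_seen, now):
    """Raw facts only: seconds since each thread last responded.

    `last_seen` maps a thread name to the time of its last response,
    or None if it has never responded.
    """
    return {
        name: (None if ts is None else now - ts)
        for name, ts in last_seen.items()
    }


def stale_threads(facts, max_age):
    """Example policy owned by the monitoring side, not by keepalive."""
    return sorted(
        name for name, age in facts.items()
        if age is None or age > max_age
    )
```

Two consumers can then read the same report and apply different
thresholds, which is the point of keeping 'healthy' out of the
reporting layer.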

Jus