"Mooney, Sean K" <sean.k.moo...@intel.com> writes:
>> -----Original Message-----
>> From: Aaron Conole [mailto:acon...@redhat.com]
>> Sent: Monday, April 9, 2018 4:32 PM
>> To: Mooney, Sean K <sean.k.moo...@intel.com>
>> Cc: d...@openvswitch.org; Stokes, Ian <ian.sto...@intel.com>; Kevin
>> Traynor <ktray...@redhat.com>; Ilya Maximets <i.maxim...@samsung.com>;
>> Loftus, Ciara <ciara.lof...@intel.com>; Terry Wilson
>> <twil...@redhat.com>; Assaf Muller <as...@redhat.com>
>> Subject: Re: [RFC 0/2] dpdk: minor refactor of the initialization step
>> "Mooney, Sean K" <sean.k.moo...@intel.com> writes:
>> > So just from a deployment tools point of view I would like to point
>> > out that This change could break existing workflow that deploy ovs in
>> a docker container.
>> > Kolla ansible assumes that if the docker ovs_vswitchd container is
>> > still running that the is infact running in dpdk mode when we set
>> > dpdk-init=true.
>> Is there a way to test this out and see the behavior?
> [Mooney, Sean K] well you could use kolla to deploy ovs-dpdk :)
> Am when I wrote the code I relied on the existing behavior.
> when kolla ansible is deploying openstack we first deploy the ovsdb.
> Then we start the ovs-vswitchd container
> finally we configure the bridges and physical interfaces.
> the "- name: Ensuring ovsdpdk bridges are properly setup named" task
> does not use --no-wait when creating bridges and adding interfaces so
> will fail if the vswitchd is not running. This will result in ansible
> stopping to run any futher task on that node and reporting the error
> to the user. If for some reason the ensure bridge task passed the next
> task that check an ip is assigned to the ovs bride would fail.
Okay - thanks for the pointers, I'll check it out.
>> It does seem strange that for a possible configuration error we abort()
> [Mooney, Sean K] why I would expect this to be standard behavior for any
> e.g. the damon would validate it config is correct and exit if invalid.
> If we don't abort the vswitch is in an undefined state. Is is still
> using hugepages
> For example if the eal init fails after they are allocated.
I see. I think as of 18.11 (which won't come for 6 months), it will
always 'fail' in a known state. But I'll confirm.
One of the reasons I want this field is because there will be ways of
'uninitializing'. I think it's nice to have the ability for the user to
dynamically enable *and* disable dpdk. Also, just from a 'selfish'
view, I recently was fixing a bug where dpdk initialization would fail
early with a particular hardware, and it wasn't nice to watch the
respawns (racy gdb attach, and coredumps all over the drive).
>> running the vswitchd (and with --monitor set, it will continue to
>> abort() over and over - so I guess you're also not using the monitor
>> thread?). In the case that an abort does happen, does the Kolla script
>> distinguish between issues where dpdk setup failed vs. some other
>> software issue?
>> > Can I request that if you make this change you add something along
>> > lines of dpdk-init-is-fatal=true/false so that we can explicitly say
>> which behavior we want.
>> > I would not be surprised if people have built monitoring around "is
>> > the ovs-vswitchd running"
>> I think they have, but I don't know that they use it to infer such low-
>> level details (meaning a crash implies that dpdk configuration is
> [Mooney, Sean K] they don't use is to infer that dpdk configuring is wronge
> But rather that some configuration was wrong. Dpdk-init is currently
> Fatal if it fails so it was treated the same as any other error that would
> caused the vsiwtchd process to exit. I belive in the opnfv community they used
> the liveleyness of the vswitchd process and in the future dpdk keepalive
> functionality to set the datapalne status filed in neutron for the host.
> this allows openstack to be aware of dataplane outages.
One thing I'll think about more is the dpdk-init. Maybe, since it is
stored as a string in the database, we can add a few other values there.
It's like 'true' - only it will continue if there's a failure. Maybe
that works? Just spit-balling. Thanks for providing insight into the
way these behaviors are used.
>> > To infer at least at a highlevel that "everything is fine" where as
>> > the log message/db field proposed Here will invalidate that.
>> I've added Assaf Mueller from our Open Stack team as well - maybe he
>> has some additional details on those mechanisms outside of Kolla (maybe
>> it exists in some kind of director / other software too, as you point
>> > it would be ease to check that field but its work that needs to be
>> > done in multiple places.
>> I think such a knob wouldn't be useful. I believe it would either have
>> to be defaulted to 'dpdk-init-is-fatal=true' to abort on failure (which
>> most users would want to change making it an undesirable default)
> [Mooney, Sean K] I would argue against that I would never deploy with
> dpdk-init-is-fatal=false. If your datapane does not start what is the point
> of running ovs at all? It will not be able to forward packets.
> , or
>> the Kolla ansible scripts (and other detection mechanisms for dpdk
>> failure - if they exist) would need to change. Maybe there's another
>> approach, though?
>> >> -----Original Message-----
>> >> From: Aaron Conole [mailto:acon...@redhat.com]
>> >> Sent: Thursday, April 5, 2018 10:23 PM
>> >> To: d...@openvswitch.org
>> >> Cc: Stokes, Ian <ian.sto...@intel.com>; Kevin Traynor
>> >> <ktray...@redhat.com>; Ilya Maximets <i.maxim...@samsung.com>;
>> >> Loftus, Ciara <ciara.lof...@intel.com>; Mooney, Sean K
>> >> <sean.k.moo...@intel.com>; Terry Wilson <twil...@redhat.com>
>> >> Subject: [RFC 0/2] dpdk: minor refactor of the initialization step
>> >> Sometimes, DPDK initialization can fail, but ovs-vswitchd will abort
>> >> in that case. When that occurs, ovs-vswitchd will be restarted by
>> >> the monitor and immediately abort. This is rather unfriendly to
>> >> users, who would prefer to possibly correct the issue or at least,
>> >> not have lots of processes continually spawning.
>> >> This series accepts that rte_eal_init() can and does fail for real.
>> >> It reflects the initialization status in the database, as well as
>> >> adding the DPDK version (where appropriate).
>> >> Submitted as RFC to spawn discussion around the type to reflect for
>> >> the initialized information. Presented here as a boolean - however,
>> >> it might be more interesting to be a 'string' and have more
>> >> details (ex: 'failed - ovs_strerror(rte_errno)' or 'uninitialized'
>> >> 'initialized').
>> >> Aaron Conole (2):
>> >> dpdk: allow init to fail
>> >> dpdk: reflect status and version in the database
>> >> lib/dpdk-stub.c | 10 ++++++++++
>> >> lib/dpdk.c | 31 +++++++++++++++++++++++++------
>> >> lib/dpdk.h | 3 ++-
>> >> vswitchd/bridge.c | 5 +++++
>> >> vswitchd/vswitch.ovsschema | 11 ++++++++---
>> >> vswitchd/vswitch.xml | 11 +++++++++++
>> >> 6 files changed, 61 insertions(+), 10 deletions(-)
>> >> --
>> >> 2.14.3
> dev mailing list
dev mailing list