"Mooney, Sean K" <sean.k.moo...@intel.com> writes: >> -----Original Message----- >> From: Aaron Conole [mailto:acon...@redhat.com] >> Sent: Monday, April 9, 2018 4:32 PM >> To: Mooney, Sean K <sean.k.moo...@intel.com> >> Cc: d...@openvswitch.org; Stokes, Ian <ian.sto...@intel.com>; Kevin >> Traynor <ktray...@redhat.com>; Ilya Maximets <i.maxim...@samsung.com>; >> Loftus, Ciara <ciara.lof...@intel.com>; Terry Wilson >> <twil...@redhat.com>; Assaf Muller <as...@redhat.com> >> Subject: Re: [RFC 0/2] dpdk: minor refactor of the initialization step >> >> "Mooney, Sean K" <sean.k.moo...@intel.com> writes: >> >> > So just from a deployment tools point of view I would like to point >> > out that This change could break existing workflow that deploy ovs in >> a docker container. >> > Kolla ansible assumes that if the docker ovs_vswitchd container is >> > still running that the is infact running in dpdk mode when we set >> > dpdk-init=true. >> >> Is there a way to test this out and see the behavior? > [Mooney, Sean K] well you could use kolla to deploy ovs-dpdk :) > Am when I wrote the code I relied on the existing behavior. > when kolla ansible is deploying openstack we first deploy the ovsdb. > https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L2-L37 > Then we start the ovs-vswitchd container > https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L55-L73 > finally we configure the bridges and physical interfaces. > https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L75-L90 > > the "- name: Ensuring ovsdpdk bridges are properly setup named" task > does not use --no-wait when creating bridges and adding interfaces so > it > will fail if the vswitchd is not running. This will result in ansible > stopping to run any futher task on that node and reporting the error > to the user. If for some reason the ensure bridge task passed the next > task that check an ip is assigned to the ovs bride would fail.
Okay - thanks for the pointers, I'll check it out. >> >> It does seem strange that for a possible configuration error we abort() > [Mooney, Sean K] why I would expect this to be standard behavior for any > deamon. > e.g. the damon would validate it config is correct and exit if invalid. > If we don't abort the vswitch is in an undefined state. Is is still > using hugepages > For example if the eal init fails after they are allocated. I see. I think as of 18.11 (which won't come for 6 months), it will always 'fail' in a known state. But I'll confirm. One of the reasons I want this field is because there will be ways of 'uninitializing'. I think it's nice to have the ability for the user to dynamically enable *and* disable dpdk. Also, just from a 'selfish' view, I recently was fixing a bug where dpdk initialization would fail early with a particular hardware, and it wasn't nice to watch the respawns (racy gdb attach, and coredumps all over the drive). >> running the vswitchd (and with --monitor set, it will continue to >> abort() over and over - so I guess you're also not using the monitor >> thread?). In the case that an abort does happen, does the Kolla script >> distinguish between issues where dpdk setup failed vs. some other >> software issue? >> >> > Can I request that if you make this change you add something along >> the >> > lines of dpdk-init-is-fatal=true/false so that we can explicitly say >> which behavior we want. >> > I would not be surprised if people have built monitoring around "is >> > the ovs-vswitchd running" >> >> I think they have, but I don't know that they use it to infer such low- >> level details (meaning a crash implies that dpdk configuration is >> wrong). > [Mooney, Sean K] they don't use is to infer that dpdk configuring is wronge > But rather that some configuration was wrong. Dpdk-init is currently > considered > Fatal if it fails so it was treated the same as any other error that would > have > caused the vsiwtchd process to exit. I belive in the opnfv community they used > the liveleyness of the vswitchd process and in the future dpdk keepalive > functionality to set the datapalne status filed in neutron for the host. > this allows openstack to be aware of dataplane outages. One thing I'll think about more is the dpdk-init. Maybe, since it is stored as a string in the database, we can add a few other values there. For instance, dpdk-init=try It's like 'true' - only it will continue if there's a failure. Maybe that works? Just spit-balling. Thanks for providing insight into the way these behaviors are used. >> >> > To infer at least at a highlevel that "everything is fine" where as >> > the log message/db field proposed Here will invalidate that. >> >> I've added Assaf Mueller from our Open Stack team as well - maybe he >> has some additional details on those mechanisms outside of Kolla (maybe >> it exists in some kind of director / other software too, as you point >> out). >> >> > it would be ease to check that field but its work that needs to be >> > done in multiple places. >> >> I think such a knob wouldn't be useful. I believe it would either have >> to be defaulted to 'dpdk-init-is-fatal=true' to abort on failure (which >> most users would want to change making it an undesirable default) > [Mooney, Sean K] I would argue against that I would never deploy with > dpdk-init-is-fatal=false. If your datapane does not start what is the point > of running ovs at all? It will not be able to forward packets. > , or >> the Kolla ansible scripts (and other detection mechanisms for dpdk >> failure - if they exist) would need to change. Maybe there's another >> approach, though? >> >> >> -----Original Message----- >> >> From: Aaron Conole [mailto:acon...@redhat.com] >> >> Sent: Thursday, April 5, 2018 10:23 PM >> >> To: d...@openvswitch.org >> >> Cc: Stokes, Ian <ian.sto...@intel.com>; Kevin Traynor >> >> <ktray...@redhat.com>; Ilya Maximets <i.maxim...@samsung.com>; >> >> Loftus, Ciara <ciara.lof...@intel.com>; Mooney, Sean K >> >> <sean.k.moo...@intel.com>; Terry Wilson <twil...@redhat.com> >> >> Subject: [RFC 0/2] dpdk: minor refactor of the initialization step >> >> >> >> Sometimes, DPDK initialization can fail, but ovs-vswitchd will abort >> >> in that case. When that occurs, ovs-vswitchd will be restarted by >> >> the monitor and immediately abort. This is rather unfriendly to >> >> users, who would prefer to possibly correct the issue or at least, >> >> not have lots of processes continually spawning. >> >> >> >> This series accepts that rte_eal_init() can and does fail for real. >> >> It reflects the initialization status in the database, as well as >> >> adding the DPDK version (where appropriate). >> >> >> >> Submitted as RFC to spawn discussion around the type to reflect for >> >> the initialized information. Presented here as a boolean - however, >> >> it might be more interesting to be a 'string' and have more >> elaborate >> >> details (ex: 'failed - ovs_strerror(rte_errno)' or 'uninitialized' >> or >> >> 'initialized'). >> >> >> >> Aaron Conole (2): >> >> dpdk: allow init to fail >> >> dpdk: reflect status and version in the database >> >> >> >> lib/dpdk-stub.c | 10 ++++++++++ >> >> lib/dpdk.c | 31 +++++++++++++++++++++++++------ >> >> lib/dpdk.h | 3 ++- >> >> vswitchd/bridge.c | 5 +++++ >> >> vswitchd/vswitch.ovsschema | 11 ++++++++--- >> >> vswitchd/vswitch.xml | 11 +++++++++++ >> >> 6 files changed, 61 insertions(+), 10 deletions(-) >> >> >> >> -- >> >> 2.14.3 > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev