On 7/3/23 18:10, David Marchand wrote:
> On Thu, Jun 29, 2023 at 9:43 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 6/20/23 13:26, David Marchand wrote:
>>> At some point in OVS history, some virtio features were announced as
>>> supported (ECN and UFO virtio features).
>>>
>>> The userspace TSO code, which has been added later, does not support
>>> those features and tries to disable them.
>>>
>>> This breaks OVS upgrades: if an existing VM already negotiated such
>>> features, their lack on reconnection to an upgraded OVS triggers a
>>> vhost socket disconnection by Qemu.
>>> This results in an endless loop because Qemu then retries with the same
>>> set of virtio features.
>>>
>>> This patch proposes to try and detect those vhost socket disconnection
>>> and fallback restoring the old virtio features (and disabling TSO for this
>>> vhost port).
>>
>> I suppose, even if we can't fix live migration case, this patch sill
>> makes some sense, right?
>>
>> We will not be able to enable TSO by default though without some extra
>> work, i.e. adding some knobs...
>
> Sorry, I have been thinking about the live migration case and did not
> reply yet on the other thread (well, I have no solution for it).
>
> Handling a simple case of upgrading OVS could still be handled with
> this current patch.
> So I think it still makes sense to take this patch.
>
>
>>
>> Some comments inline.
>>
>>>
>>> Signed-off-by: David Marchand <[email protected]>
>>> Acked-by: Mike Pattrick <[email protected]>
>>> ---
>>> Note: resending this v4 for CI (I added Mike ack).
>>>
>>> Changelog since v3:
>>> - updated documentation now that the interface offloads status is reported
>>> in ovsdb,
>>> - fixed one coding style issue,
>>>
>>> Changelog since v2:
>>> - reported workaround presence in the ovsdb port status field and
>>> updated documentation accordingly,
>>> - tried to use "better" names, to distinguish ECN virtio feature from
>>> TSO OVS netdev feature,
>>>
>>> Changelog since v1:
>>> - added a note in the documentation,
>>> - fixed vhost unregister trigger (so that both disabling and re-enabling
>>> TSO is handled),
>>> - cleared netdev features when disabling TSO,
>>> - changed level and ratelimited log message on vhost socket disconnect,
>>>
>>> ---
>>> Documentation/topics/userspace-tso.rst | 30 ++++++++-
>>> lib/netdev-dpdk.c | 85 +++++++++++++++++++++++++-
>>> 2 files changed, 111 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/Documentation/topics/userspace-tso.rst
>>> b/Documentation/topics/userspace-tso.rst
>>> index 5a43c2e86b..d4111f1ab2 100644
>>> --- a/Documentation/topics/userspace-tso.rst
>>> +++ b/Documentation/topics/userspace-tso.rst
>>> @@ -68,7 +68,7 @@ as follows.
>>> connection is established, `TSO` is thus advertised to the guest as an
>>> available feature:
>>>
>>> -QEMU Command Line Parameter::
>>> +1. QEMU Command Line Parameter::
>>
>> If you want to fix the numbered list, we should also fix the second point.
>> It is broken in two lines incorrectly (indentation is wrond), so it's not
>> counted as part of the numbered list, while this one will.
>
> Mm, not easy to notice on the generated documentation.
> I'll fix it too.
>
>
>>
>>>
>>> $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
>>> ...
>>> @@ -83,6 +83,34 @@ used to enable same::
>>> $ ethtool -K eth0 tso on
>>> $ ethtool -k eth0
>>>
>>> +**Note:** Enabling this feature impacts the virtio features exposed by the
>>> DPDK
>>> +vHost User backend to a guest. If a guest was already connected to OvS
>>> before
>>> +enabling TSO and restarting OvS, this guest ports won't have TSO
>>> available::
>>> +
>>> + $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^
>>> ]*"'
>>> + tx_ip_csum_offload="true"
>>> + tx_sctp_csum_offload="true"
>>> + tx_tcp_csum_offload="true"
>>> + tx_tcp_seg_offload="false"
>>> + tx_udp_csum_offload="true"
>>> +
>>> +To help diagnose the issue, those ports have some additional information in
>>> +their status field in ovsdb::
>>> +
>>> + $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"'
>>> + disabled_tso="true"
>>> +
>>> +To restore TSO for this guest ports, this guest QEMU process must be
>>> stopped,
>>> +then started again. OvS will then report::
>>> +
>>> + $ ovs-vsctl get interface vhost0 status | grep -o '[^ ]*_offload=[^
>>> ]*"'
>>> + tx_ip_csum_offload="true"
>>> + tx_sctp_csum_offload="true"
>>> + tx_tcp_csum_offload="true"
>>> + tx_tcp_seg_offload="true"
>>> + tx_udp_csum_offload="true"
>>> + $ ovs-vsctl get interface vhost0 status | grep -o 'disabled_tso=[^ ]*"'
>>> +
>>> ~~~~~~~~~~~
>>> Limitations
>>> ~~~~~~~~~~~
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index c6b41e8365..2dbb8b42e3 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -473,6 +473,15 @@ struct netdev_dpdk {
>>> /* True if vHost device is 'up' and has been reconfigured at least
>>> once */
>>> bool vhost_reconfigured;
>>>
>>> + /* Set on driver start (which means after a vHost connection is
>>> + * accepted), and cleared when the vHost device gets configured. */
>>> + bool vhost_initial_config;
>>> +
>>> + /* Set on disconnection if an initial configuration did not finish.
>>> + * This triggers a workaround for Virtio features negotiation, that
>>> + * makes TSO unavailable. */
>>> + bool vhost_workaround_enable_ecn;
>>> +
>>> atomic_uint8_t vhost_tx_retries_max;
>>> /* 2 pad bytes here. */
>>
>> These bytes are consumed now.
>
> Argh, I had made a mental note to fix this.
>
>
>>
>>> );
>>> @@ -1359,6 +1368,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
>>> port_no,
>>> dev->requested_lsc_interrupt_mode = 0;
>>> ovsrcu_index_init(&dev->vid, -1);
>>> dev->vhost_reconfigured = false;
>>> + dev->vhost_initial_config = false;
>>> dev->attached = false;
>>> dev->started = false;
>>> dev->reset_needed = false;
>>> @@ -3883,6 +3893,10 @@ netdev_dpdk_vhost_user_get_status(const struct
>>> netdev *netdev,
>>> xasprintf("%d", vring.size));
>>> }
>>>
>>> + if (dev->vhost_workaround_enable_ecn) {
>>> + smap_add_format(args, "disabled_tso", "true");
>>
>> I'm not sure if it's a good name, but I'm not sure how the good one
>> should look like either. Maybe "userspace-tso": "disabled" ?
>> We can put any string as a value.
>
> Either we go with your proposal, or we could add a mention in the
> documentation to check if the virtio features ECN bit is present.
> Something like:
> # if [ $(($(ovs-vsctl get interface vhost0 status:features | sed
> 's/"//g') & (1 << 13))) != 0 ]; then
> echo KO;
> else
> echo ok;
> fi
>
> This is not user friendly, but I don't think we really want users to
> start relying on a specific string for this ugly situation.
> WDYT?
I don't think we need to expose too much here too.
But I guess it would be really hard to debug if something goes sideways
or user doesn't understand all the details about the workaround.
So, "userspace-tso: disabled" might be better than manual feature parsing.
>
>
>>
>>> + }
>>> +
>>> ovs_mutex_unlock(&dev->mutex);
>>> return 0;
>>> }
>>> @@ -4214,6 +4228,12 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
>>> free(enabled_queues);
>>> }
>>>
>>> +static bool
>>> +netdev_dpdk_vhost_tso_enabled(struct netdev_dpdk *dev)
>>> +{
>>> + return userspace_tso_enabled() && !dev->vhost_workaround_enable_ecn;
>>> +}
>>> +
>>> /*
>>> * A new virtio-net device is added to a vhost port.
>>> */
>>> @@ -4256,6 +4276,7 @@ new_device(int vid)
>>> } else {
>>> /* Reconfiguration not required. */
>>> dev->vhost_reconfigured = true;
>>> + dev->vhost_initial_config = false;
>>
>> So, we're in the new_device() call. That means the guest successfully
>> negotiated features. Shouldn't we drop vhost_initial_config here
>> unconditionally? Why we need to wait for reconfiguration? Or does
>> QEMU disconnect after the new_device() ?
>
> Indeed, QEMU does not disconnect because of features once we reach the
> new_device stage.
> So it seems ok.
>
>
>>
>>> }
>>>
>>> if (rte_vhost_get_negotiated_features(vid, &features)) {
>>> @@ -4268,7 +4289,7 @@ new_device(int vid)
>>> dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD;
>>> }
>>>
>>> - if (userspace_tso_enabled()) {
>>> + if (netdev_dpdk_vhost_tso_enabled(dev)) {
>>> if (features & (1ULL << VIRTIO_NET_F_GUEST_TSO4)
>>> && features & (1ULL << VIRTIO_NET_F_GUEST_TSO6)) {
>>>
>>> @@ -4501,6 +4522,21 @@ vring_state_changed(int vid, uint16_t queue_id, int
>>> enable)
>>> return 0;
>>> }
>>>
>>> +/* Requires that the 'path' socket had been registered before. */
>>> +static bool
>>> +dpdk_vhost_has_ecn_feature(const char *path)
>>> +{
>>> + /* In the past we had this feature enabled, report it by default. */
>>> + bool has_ecn_feature = true;
>>> + uint64_t driver_features;
>>> +
>>> + if (rte_vhost_driver_get_features(path, &driver_features) == 0) {
>>> + has_ecn_feature = driver_features & (1ULL <<
>>> VIRTIO_NET_F_HOST_ECN);
>>> + }
>>> +
>>> + return has_ecn_feature;
>>> +}
>>> +
>>> static void
>>> destroy_connection(int vid)
>>> {
>>> @@ -4515,6 +4551,7 @@ destroy_connection(int vid)
>>> ovs_mutex_lock(&dev->mutex);
>>> if (nullable_string_is_equal(ifname, dev->vhost_id)) {
>>> uint32_t qp_num = NR_QUEUE;
>>> + bool failed_without_ecn;
>>>
>>> if (netdev_dpdk_get_vid(dev) >= 0) {
>>> VLOG_ERR("Connection on socket '%s' destroyed while vhost "
>>> @@ -4528,6 +4565,26 @@ destroy_connection(int vid)
>>> dev->requested_n_txq = qp_num;
>>> netdev_request_reconfigure(&dev->up);
>>> }
>>> +
>>> + if (dev->vhost_initial_config) {
>>> + VLOG_WARN_RL(&rl, "Connection on socket '%s' closed during
>>> "
>>> + "initialization.", dev->vhost_id);
>>> + }
>>> +
>>> + failed_without_ecn = dev->vhost_initial_config
>>> + && !dpdk_vhost_has_ecn_feature(dev->vhost_id);
>>> + if (failed_without_ecn != dev->vhost_workaround_enable_ecn) {
>>> + /* Either an early disconnection was detected (and we can
>>> try
>>> + * to re-enable ECN for a next connection) or the
>>> disconnection
>>> + * does not seem incorrect (and we can try to disable ECN
>>> for a
>>> + * next connection).
>>> + *
>>> + * In both cases, this vhost socket must be reconfigured
>>> + * (see netdev_dpdk_vhost_client_reconfigure()). */
>>> + dev->vhost_workaround_enable_ecn = failed_without_ecn;
>>> + netdev_request_reconfigure(&dev->up);
>>> + }
>>
>> If we reset vhost_initial_config unconditionally in new_device(),
>> vhost_initial_config == true here will mean that the guest, likely, failed
>> feature negotiation. We can't be sure for 100%, but we can't be sure with
>> the current code either. So, we could enable workaround in this case.
>> If vhost_initial_config == false, we already had a successful connection
>> sometime in the past, so we should not try the workaround.
>>
>> Does that make sense?
>
> It seems ok but I need to retest.
>
>
>>
>>> +
>>> ovs_mutex_unlock(&dev->mutex);
>>> exists = true;
>>> break;
>>> @@ -5431,6 +5488,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>>>
>>> if (dev->vhost_reconfigured == false) {
>>> dev->vhost_reconfigured = true;
>>> + dev->vhost_initial_config = false;
>>
>> This will not be needed if we clear in the new_device.
>
> Ok.
>
>
>>
>>> /* Carrier status may need updating. */
>>> netdev_change_seq_changed(&dev->up);
>>> }
>>> @@ -5458,10 +5516,30 @@ static int
>>> netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>>> {
>>> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> + bool unregister = false;
>>> + bool enable_tso;
>>> + char *vhost_id;
>>> int err;
>>>
>>> ovs_mutex_lock(&dev->mutex);
>>>
>>> + enable_tso = netdev_dpdk_vhost_tso_enabled(dev);
>>> + if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT && dev->vhost_id
>>> + && enable_tso != !dpdk_vhost_has_ecn_feature(dev->vhost_id)) {
>>
>> In context of previous comments, the check should be;
>>
>> && dev->vhost_initial_config && dev->vhost_workaround_enable_ecn
>>
>> ?
>
> For a "registered" vhost-user port, we have two transitions to handle:
> enabling the workaround (for a "faulty" guest after a first try
> withouth ECN) and disabling it (after a "faulty" guest, which had
> successfully configured, is stopped and OVS prepares for the next
> connection).
> Checking dev->vhost_initial_config here would prevent from handling
> the second case.
Can we drop it to 'true' if it is 'false' on connection close?
We probably don't need two variables...
Maybe a 4 state enum instead?
OVS_VHOST_F_CLEAN
OVS_VHOST_F_WORKAROUND
OVS_VHOST_F_CLEAN_CONFIRMED
OVS_VHOST_F_WORKAROUND_CONFIRMED
Start with:
OVS_VHOST_F_CLEAN
event: destroy_connection()
state:
OVS_VHOST_F_CLEAN -> OVS_VHOST_F_WORKAROUND
OVS_VHOST_F_WORKAROUND -> OVS_VHOST_F_CLEAN (unrelated disconnection?, try
again)
OVS_VHOST_F_CLEAN_CONFIRMED -> OVS_VHOST_F_CLEAN_CONFIRMED (don't
downgrade clean)
OVS_VHOST_F_WORKAROUND_CONFIRMED -> OVS_VHOST_F_CLEAN (anticipate clean)
event: new_device()
state:
OVS_VHOST_F_CLEAN -> OVS_VHOST_F_CLEAN_CONFIRMED
OVS_VHOST_F_WORKAROUND -> OVS_VHOST_F_WORKAROUND_CONFIRMED
OVS_VHOST_F_CLEAN_CONFIRMED -> OVS_VHOST_F_CLEAN_CONFIRMED (guest driver
restart)
OVS_VHOST_F_WORKAROUND_CONFIRMED -> OVS_VHOST_F_WORKAROUND_CONFIRMED (guest
driver restart)
event: netdev_dpdk_vhost_client_reconfigure():
state:
OVS_VHOST_F_CLEAN -> re-register if userspace_tso_enabled()
OVS_VHOST_F_WORKAROUND -> re-register if userspace_tso_enabled()
OVS_VHOST_F_CLEAN_CONFIRMED -> don't re-register
OVS_VHOST_F_WORKAROUND_CONFIRMED -> don't re-register
Technically, two booleans provide 4 states, and above seems equal to having
2 booleans 'workarund' and 'confirmed', but it might be easier to follow
the logic with enums. What do you think?
Exact names can be different, for sure.
Also thinking about this state machine, we will re-register vhost driver
each time netdev_dpdk_vhost_client_reconfigure() called for unrelated
reason until the first successful connection. Is it the same with the
current patch (I guess so)? Not sure if that's a big problem thouhg.
>
>
>>
>> We might as well rename it to 'vhost_has_ever_connected' or something
>> and inverse the value.
>>
>>> +
>>> + dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
>>> + vhost_id = dev->vhost_id;
>>> + unregister = true;
>>> + }
>>> +
>>> + ovs_mutex_unlock(&dev->mutex);
>>> +
>>> + if (unregister) {
>>> + dpdk_vhost_driver_unregister(dev, vhost_id);
>>> + }
>>> +
>>> + ovs_mutex_lock(&dev->mutex);
>>> +
>>> /* Configure vHost client mode if requested and if the following
>>> criteria
>>> * are met:
>>> * 1. Device hasn't been registered yet.
>>> @@ -5491,7 +5569,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev
>>> *netdev)
>>> }
>>>
>>> /* Enable External Buffers if TCP Segmentation Offload is enabled.
>>> */
>>> - if (userspace_tso_enabled()) {
>>> + if (enable_tso) {
>>> vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT;
>>> }
>>>
>>> @@ -5516,7 +5594,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev
>>> *netdev)
>>> goto unlock;
>>> }
>>>
>>> - if (userspace_tso_enabled()) {
>>> + if (enable_tso) {
>>> virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN
>>> | 1ULL << VIRTIO_NET_F_HOST_UFO;
>>> VLOG_DBG("%s: TSO enabled on vhost port",
>>> @@ -5539,6 +5617,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev
>>> *netdev)
>>> goto unlock;
>>> }
>>>
>>> + dev->vhost_initial_config = true;
>>> err = rte_vhost_driver_start(dev->vhost_id);
>>> if (err) {
>>> VLOG_ERR("rte_vhost_driver_start failed for vhost user "
>>
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev