On 27/10/2020 14:03, Ilya Maximets wrote:
> On 10/20/20 1:15 PM, Kevin Traynor wrote:
>> On 16/09/2020 16:17, Gaetan Rivet wrote:
>>> In some cloud topologies, using DPDK VF representors in guest requires
>>> configuring a VF before it is assigned to the guest.
>>>
>>> A first basic option for such configuration is setting the VF MAC
>>> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
>>> table.
>>>
>>> This option can be used as such:
>>>
>>> $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
>>> options:dpdk-vf-mac=00:11:22:33:44:55
>>>
>>
>> If I got it right, it doesn't seem like it solves any issues by limiting
>> to representor, but is just an attempt to limit some of the edge cases
>> around bifurcated devices to the devices where the requested
>> functionality is really needed now.
>>
>> This limitation has some costs...
>>
>> We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
>> requested_hwaddr for the user to understand. I think it can be confusing.
>
> To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
> exist at least in the output of 'get_config()' function. Same applies
> to all existing '{configured,requested}_something' reports.
> 'get_config()' intended to report things that could be copied and passed
> directly
> to database. OTOH, 'get_status()' should report what is an actual device
> status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
> should be in 'get_status()', but it should be named differently, e.g. just
> 'dpdk-vf-mac' and the callback it placed in already says that it's actually
> 'configured'. In short:
> - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
> - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.
If an invalid mac is attempted or it fails to be set, this is what the
user will see for dpdk-vf-mac:
ovs-appctl dpctl/show -> dpdk-vf-mac=00:11:22:33:44:55
ovs-vsctl show -> dpdk-vf-mac=33:11:22:33:44:55
ovs-vsctl get Interface myport status -> dpdk-vf-mac=00:11:22:33:44:55
For completeness, i'll add the failed attempt is logged clearly.
I suppose we can tell the user to check the 'mac_in_use', but multiple
dpdk-vf-mac entries for the same device with different values is confusing.
requested_dpdk-vf-mac configured_dpdk-vf-mac from 'ovs-appctl
dpctl/show' would be explicit. At a minimum we should document which one
is requested and which one is configured.
> Of course, all this should be reported only if device is a representor.
>
>
>>
>> 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
>> document, but it's not clear if it was successful or not. (yes there is
>> a log entry, but it is a separate place).
>>
>> As it is specific for representors, if we ever want to allow setting of
>> mac on any non representor DPDK ports, we have an awkward user
>> interface. We would need to make another 'dpdk-mac' type option, or we
>> decide then to allow use mac in interface table but that it doesn't work
>> for representors, or there are two ways to set for representors etc.
>> None of this seems great.
>>
>> My feeling is that it is making things complicated for the user with
>> this many knobs, where we could just have mac and mac_in_use, live with
>> the edge cases (as Gaetan pointed out, we already do for MTU) and we
>> know as well if we ever need to extend further, the user interface will
>> stay simple. Just my 2C, maybe there's a good argument why we can't do
>> it like this.
>
> Setting mac for physical device has lots of split-brain issues and unclear
> ways how and when undo changes as we do not know what was original mac or
> what to do if there was no mac at all at the start. Moreover, some devices
> will not allow setting mac to it's original value (e.g. all zeroes).
> One more point is that vf and representor itself are different devices and
> these devices could have different mac addresses. For now we agreed that
> DPDK api works for representors in a way that it always changes parameters
> of a virtual function, not representors', but that is not a very clear thing
> for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac'
> specifically
> points to that difference.
>
That's a good point, it does distinguish and iirc some people made
the point that this could be important.
> After a long discussion we decided to make this option as specific as
> possible,
> so users will not try to use it to configure anything else beside mac of a
> virtual function. All restrictions and downsides of configuration of vf mac
> was/should be documented.
>
ok, I can see the pros of this type of design too so I won't object, but
agree the downsides like the additional fields and ability for user to
understand clearly the output from commands above should be documented.
>
>>
>> Few comments on the code as it is now below.
>>
>>
>>> Signed-off-by: Gaetan Rivet <[email protected]>
>>> Suggested-by: Ilya Maximets <[email protected]>
>>> Acked-by: Eli Britstein <[email protected]>
>>> ---
>>> Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
>>> NEWS | 2 ++
>>> lib/netdev-dpdk.c | 60 +++++++++++++++++++++++++++++++
>>> vswitchd/vswitch.xml | 13 +++++++
>>> 4 files changed, 97 insertions(+)
>>>
>>> diff --git a/Documentation/topics/dpdk/phy.rst
>>> b/Documentation/topics/dpdk/phy.rst
>>> index 38e52c8de..73dcb7c28 100644
>>> --- a/Documentation/topics/dpdk/phy.rst
>>> +++ b/Documentation/topics/dpdk/phy.rst
>>> @@ -379,6 +379,28 @@ an eth device whose mac address is
>>> ``00:11:22:33:44:55``::
>>> $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
>>> options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
>>>
>>> +Representor specific configuration
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +In some topologies, a VF must be configured before being assigned to a
>>> +guest (VM) machine. This configuration is done through VF-specific fields
>>> +in the ``options`` column of the Interface table.
>>> +
>>> +.. important::
>>> +
>>> + Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
>>> + which means that a kernel netdevice remains when Open vSwitch is
>>> stopped.
>>> +
>>> + In such case, any configuration applied to a VF would remain set on the
>>> + kernel netdevice, and be inherited from it when Open vSwitch is
>>> restarted,
>>> + even if the options described in this section are unset from Open
>>> vSwitch.
>>> +
>>> +.. _bifurcated-drivers:
>>> http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
>>> +
>>> +- Configure the VF MAC address::
>>> +
>>> + $ ovs-vsctl set Interface dpdk-rep0
>>> options:dpdk-vf-mac=00:11:22:33:44:55
>>> +
>>
>> As mentioned above, if for some reason setting the mac fails there will
>> be warning in the logs, but 'ovs-vsctl show' will still show the
>> dpdk-vf-mac that was not applied and it's quite prominent. Also, there
>> are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
>>
>> Suggest to add a small summary here for each one as it relates
>> representors and make clear to users,
>> 1. how to find the mac the user requested
>> 2. how to find the current mac
>>
>>> Jumbo Frames
>>> ------------
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 2f67d5047..5ece29474 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
>>
>> 2.15....hopefully :-)
>>
>>> to list and change log levels in DPDK components.
>>> * Vhost-user Dequeue zero-copy support is deprecated and will be
>>> removed
>>> in the next release.
>>> + * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>>> + that allows configuring the MAC address of a VF representor.
>>> - Linux datapath:
>>> * Support for kernel versions up to 5.5.x.
>>> - AF_XDP:
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 4ef7f78a6..060348369 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -523,6 +523,9 @@ struct netdev_dpdk {
>>> * otherwise interrupt mode is used. */
>>> bool requested_lsc_interrupt_mode;
>>> bool lsc_interrupt_mode;
>>> +
>>> + /* VF configuration. */
>>> + struct eth_addr requested_hwaddr;
>>> );
>>>
>>> PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> @@ -1685,6 +1688,16 @@ out:
>>> return ret;
>>> }
>>>
>>> +static bool
>>> +dpdk_port_is_representor(struct netdev_dpdk *dev)
>>> + OVS_REQUIRES(dev->mutex)
>>> +{
>>> + struct rte_eth_dev_info dev_info;
>>> +
>>> + rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> + return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
>>> +}
>>> +
>>> static int
>>> netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>> {
>>> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev,
>>> struct smap *args)
>>> }
>>> smap_add(args, "lsc_interrupt_mode",
>>> dev->lsc_interrupt_mode ? "true" : "false");
>>> +
>>> + if (dpdk_port_is_representor(dev)) {
>>> + smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
>>> + ETH_ADDR_ARGS(dev->requested_hwaddr));
>>> + smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
>>> + ETH_ADDR_ARGS(dev->hwaddr));
>>
>> '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
>> better to keep consistent
>
> Since this is 'config' it should report things in a way they could be passed
> to the datbase. So, it should be 'dpdk-vf-mac' and it should report the
> value that was requested, i.e. dev->requested_hwaddr.
>
>>
>>> + }
>>> }
>>> ovs_mutex_unlock(&dev->mutex);
>>>
>>> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>>> struct smap *args,
>>> {RTE_FC_RX_PAUSE, RTE_FC_FULL }
>>> };
>>> const char *new_devargs;
>>> + const char *vf_mac;
>>> int err = 0;
>>>
>>> ovs_mutex_lock(&dpdk_mutex);
>>> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>>> struct smap *args,
>>> goto out;
>>> }
>>>
>>> + vf_mac = smap_get(args, "dpdk-vf-mac");
>>> + if (vf_mac) {
>>> + struct eth_addr mac;
>>> +
>>> + err = EINVAL;
>>> +
>>> + if (!dpdk_port_is_representor(dev)) {
>>> + VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
>>> + "but 'options:dpdk-vf-mac' is only supported for "
>>> + "VF representors.",
>>> + netdev_get_name(netdev), vf_mac);
>>
>> Warnings here seems more in line with the other configs that cannot be
>> applied.
>
> I agree, if it's not a representor we could just issue a warning and continue.
>
>>
>>> + goto out;
>>> + }
>>> + if (!eth_addr_from_string(vf_mac, &mac)) {
>>> + VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
>>> + netdev_get_name(netdev), vf_mac);
>>
>> as above
>>
>>> + goto out;
>>> + }
>>> + if (eth_addr_is_multicast(mac)) {
>>> + VLOG_ERR_BUF(errp,
>>> + "interface '%s': cannot set VF MAC to multicast "
>>> + "address", netdev_get_name(netdev));
>>
>> as above
>>
>>> + goto out;
>>> + }
>>> + if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
>>> + dev->requested_hwaddr = mac;
>>> + netdev_request_reconfigure(netdev);
>>> + }
>>> + }
>>> +
>>> lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>> if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>>> dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>>> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>> && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>> && dev->rxq_size == dev->requested_rxq_size
>>> && dev->txq_size == dev->requested_txq_size
>>> + && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>> && dev->socket_id == dev->requested_socket_id
>>> && dev->started && !dev->reset_needed) {
>>> /* Reconfiguration is unnecessary */
>>> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>> dev->txq_size = dev->requested_txq_size;
>>>
>>> rte_free(dev->tx_q);
>>> +
>>> + if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
>>
>> Looks like if dpdk-vf-mac is not set, after initialization there will be
>> a configured hwaddr, and requested_hwaddr will be all zeros.
>>
>> If this function is called because of some other config change, then
>> hwaddr != requested_hwaddr, and it will attempt to set the mac to the
>> requested_hwaddr of all zeros. This may also fail causing the
>> reconfigure to fail for other configs items.
>>
>> (I admit, I hacked a phy port dev_info to say it was a representor, when
>> playing with this but the issue seems valid)
>>
>>
>>> + err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
>>> + if (err) {
>>> + goto out;
>>> + }
>>> + }
>>> +
>>> err = dpdk_eth_dev_init(dev);
>>> if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>>> netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 81c84927f..c0bf03c95 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>>> type=patch options:peer=p1 \
>>> descriptors will be used by default.
>>> </p>
>>> </column>
>>> +
>>> + <column name="options" key="dpdk-vf-mac">
>>> + <p>
>>> + Ethernet address to set for this VF interface. If unset then the
>>> + default MAC address is used:
>>> + </p>
>>> + <ul>
>>> + <li>For most drivers, the default MAC address assigned by
>>> + their hardware.</li><li>For bifurcated drivers, the MAC currently
>>> + used by the kernel netdevice.</li>
>>
>> nit: the formatting is a little different compared to the other
>> <li></li> usage, newline/indent etc.
>>
>> thanks,
>> Kevin.
>>
>>> + </ul>
>>> + <p>This option may only be used with dpdk VF representors.</p>
>>> + </column>
>>> </group>
>>>
>>> <group title="EMC (Exact Match Cache) Configuration">
>>>
>>
>>
>>
>>
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev