On 1/22/26 1:43 PM, Matteo Perin via dev wrote:
> Add population of a new peer_ifindex column in the interface
> status of (Linux) veth devices. This allows applications like
> OVN to programmatically discover veth pairs without requiring
> additional system calls or manual configuration.
> 
> The implementation uses the existing ethtool infrastructure:
> - Uses ETHTOOL_GDRVINFO to get n_stats from the veth driver
> - Dynamically allocates buffer based on n_stats
> - Queries ETHTOOL_GSTATS to retrieve veth statistics
> - Extracts peer_ifindex (first statistic for veth devices)
> - Stores it in the status column as "peer_ifindex"
> 
> The value is cached to avoid performance degradation with an
> increase on the number of ports.
> 
> This feature can enable automatic veth peer discovery for routing
> daemon integrations in OVN, eliminating the need for manual
> port-mapping configuration when using veth pairs for BGP/BFD
> routing protocols.
> 
> Signed-off-by: Matteo Perin <[email protected]>

Hi, Matteo.  Thanks for the patch!  See some comments below.

> ---
> This is the first part of a rework of a recent contribution I made in an 
> effort
> to simplify the OVN routing-protocol-redirect option setup on an LRP when 
> using veth pairs to
> connect routing daemons to OVN. This requires supporting the lookup of the 
> veth peers bound to
> the LSP referred to.
> 
> Since this added capability requires a call to ethtool gstats, it is better 
> suited to be called
> directly from the OVS code (using the infrastructure already in-place) 
> instead of shoehorning it
> inside the OVN code (as in the original patch). The veth peer value can be 
> cached inside the
> netdev status information (for veth devices) and then retrieved by OVN.
> 
> As pointed out on the original contribution discussion, some issues about 
> stale veth values may
> arise, though. Even if the peer ifindex does not change if the veth is moved 
> to a different
> namespace, it can happen that a veth is removed and recreated later with the 
> same name, causing
> the OVS get_status to return a stale ifindex value.
> 
> Unfortunately, I have not been able to find a satisfactory way of handling 
> such situation (at least
> not is userspace). While in kernel space we can rely on 
> RTM_NEWLINK/RTM_DELLINK to invalidate
> VALID_DRVINFO and retrigger fetching the peer ifindex, in userspace the only 
> way to achieve the same 
> seems to actively check if the ifindex has changed. Of course this is not 
> really possible since making
> a query to ethtool or get_ifindex on every call to get_status will slow down 
> the kernel when a huge
> number of ports is present.

We're actually already subscribed to RTM_NEWLINK/RTM_DELLINK events.
See netdev_linux_run()->netdev_linux_update() that parses the updates
received from the kernel.  VALID_DRVINFO already gets invalidated there
when the device is deleted.

> I do not really know how to tackle this situation in an elegant way, maybe a 
> good compromise would
> be documenting this limitation (the need to remove and recreate veths seems 
> not a very common 
> occurrence to me, after all)?
> 
> This will allow us to avoid moving too much code around for only a mere 
> quality-of-life improvement,
> but I will really appreciate if someone has any insights or suggestions on 
> how to tackle this issue.
> 
> Best Regards,
> Matteo
> 
>  lib/netdev-linux-private.h |  1 +
>  lib/netdev-linux.c         | 34 ++++++++++++++++++++++++++++++++++
>  tests/system-interface.at  | 34 ++++++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml       | 19 +++++++++++++++++++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index cf4a021d3..c86a68510 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -97,6 +97,7 @@ struct netdev_linux {
>      uint8_t current_duplex;          /* Cached from ETHTOOL_GSET. */
>  
>      struct ethtool_drvinfo drvinfo;  /* Cached from ETHTOOL_GDRVINFO. */
> +    uint64_t peer_ifindex;           /* Cached from ETHTOOL_GSTATS (if 
> veth). */

This line is a bit too long.

>      struct tc *tc;
>  
>      /* For devices of class netdev_tap_class only. */
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index f4d2685a2..dac9cfb7d 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -3728,12 +3728,41 @@ netdev_linux_get_status(const struct netdev *netdev_, 
> struct smap *smap)
>  
>          COVERAGE_INC(netdev_get_ethtool);
>          memset(&netdev->drvinfo, 0, sizeof netdev->drvinfo);
> +        netdev->peer_ifindex = 0;
>          error = netdev_linux_do_ethtool(netdev->up.name,
>                                          cmd,
>                                          ETHTOOL_GDRVINFO,
>                                          "ETHTOOL_GDRVINFO");
>          if (!error) {
>              netdev->cache_valid |= VALID_DRVINFO;

Might be good to move this to the end of the 'if' body.

> +
> +            /* For veth devices, also get the peer_ifindex
> +             * via ETHTOOL_GSTATS */

The 'via ETHTOOL_GSTATS' seems unnecessary in the comment.  It's clear
from the code.  Also, all sentences should end with a period.

> +            if (!strcmp(netdev->drvinfo.driver, "veth") &&
> +                netdev->drvinfo.n_stats > 0) {
> +                /* Dynamically allocate based on n_stats
> +                 * from ETHTOOL_GDRVINFO. */

This comment is also fairly obvious from the code.

> +                size_t n_stats = netdev->drvinfo.n_stats;

May move this out of this 'if' statement and use already in the condition.

> +                size_t req_size =
> +                    sizeof(struct ethtool_stats) + sizeof(__u64) * n_stats;
> +                
> +                struct ethtool_stats *req = xzalloc(req_size);

We should avoid sizeof(type) if possible.  In this case, if you just define
the variable and then do the assignment, we can write it like:

                   req = xzalloc(sizeof *req + n_stats * sizeof req->data[0]);

> +                req->cmd = ETHTOOL_GSTATS;
> +                req->n_stats = n_stats;
> +                

There are trailing whitespaces here and in other places in the patch.
You can run checkpatch script before sending to catch this kind of
problems.

> +                int peer_error = netdev_linux_do_ethtool(netdev->up.name,

You may define this variable at the top of the block and name it a little
shorter like 's_error', so the indentation is easier to do.

> +                                                (struct ethtool_cmd *) req,
> +                                                ETHTOOL_GSTATS,
> +                                                "ETHTOOL_GSTATS");
> +                
> +                /* peer_ifindex is the first stat for veth devices */

May put the 'peer_ifindex' into single quotes.  Also, period at the end.

At the same time, it's not guaranteed that the peer_ifindex will
alwasy be the first in the array.  We actually need to request the
string set for the names and find the correct one in the set.

We already have a function to get the string set.  See the
netdev_linux_read_definitions().  It is tailored for ETH_SS_FEATURES,
but it should be easy enough to allow it to query any type of the
string set and then use here with the ETH_SS_STATS.  Then we can look
for the peer ifindex in the names and choose the correct value.

> +                if (!peer_error &&
> +                    req->n_stats > 0 && req->data[0] > 0) {

This can be a single line.

> +                    netdev->peer_ifindex = req->data[0];
> +                }
> +                
> +                free(req);
> +            }
>          }
>      }
>  
> @@ -3741,6 +3770,11 @@ netdev_linux_get_status(const struct netdev *netdev_, 
> struct smap *smap)
>          smap_add(smap, "driver_name", netdev->drvinfo.driver);
>          smap_add(smap, "driver_version", netdev->drvinfo.version);
>          smap_add(smap, "firmware_version", netdev->drvinfo.fw_version);
> +
> +        /* For veth devices, report the cached peer_ifindex. */

Also quotes.

> +        if (netdev->peer_ifindex > 0) {
> +            smap_add_format(smap, "peer_ifindex", "%"PRIu64, 
> netdev->peer_ifindex);

This line is too long.

> +        }
>      }
>      ovs_mutex_unlock(&netdev->mutex);
>  
> diff --git a/tests/system-interface.at b/tests/system-interface.at
> index 15c4a0e2e..8db97b106 100644
> --- a/tests/system-interface.at
> +++ b/tests/system-interface.at
> @@ -221,3 +221,37 @@ half
>  
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +dnl Test that peer_ifindex is correctly reported for veth devices.
> +AT_SETUP([interface - veth peer_ifindex status])
> +AT_SKIP_IF([test $HAVE_ETHTOOL = "no"])
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +dnl Create a veth pair: ovs-veth0 <-> ovs-veth1

Period.   Same for all other comments here.

> +AT_CHECK([ip link add ovs-veth0 type veth peer name ovs-veth1])
> +on_exit 'ip link del ovs-veth0'
> +
> +dnl Add ovs-veth0 to br0
> +AT_CHECK([ovs-vsctl add-port br0 ovs-veth0])
> +
> +dnl Get the actual ifindex of ovs-veth1 (the peer) and verify it's reported
> +PEER_IFINDEX=$(ip -j link show ovs-veth1 | sed -n 
> 's/.*"ifindex":\([[0-9]]*\).*/\1/p')

I'm not sure if we can rely on the JSON output.  It's not a particularly
recent thing, but still some systems may not support it.  Better if we
get it from a standard output.

> +
> +dnl Verify OVS reports the correct peer_ifindex
> +OVS_WAIT_UNTIL([
> +    OVS_PEER1=$(ovs-vsctl get interface ovs-veth0 status:peer_ifindex 
> 2>/dev/null | tr -d '"')
> +    test "$OVS_PEER1" = "$PEER_IFINDEX"
> +])

This should be OVS_WAIT_UNTIL_EQUAL.

> +
> +dnl Clean up
> +AT_CHECK([ip link del ovs-veth0])

No need to do that, the on_exit hook will delete the port.

> +
> +OVS_TRAFFIC_VSWITCHD_STOP(["dnl
> +/could not open network device ovs-veth0/d
> +/cannot get .*STP status on nonexistent port/d
> +/ethtool command .*on network device ovs-veth0 failed/d
> +/error receiving .*ovs-veth0/d
> +/ovs-veth0: removing policing failed/d"])
> +
> +AT_CLEANUP
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 29ad9c13e..e7fe5ba86 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3770,6 +3770,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>          The version string of the network adapter's firmware, if available.
>        </column>
>  
> +      <column name="status" key="peer_ifindex">
> +        <p>
> +          For veth devices on Linux, this column contains the interface index
> +          of the peer veth device. This allows applications to

Double spaces between sentences.  Here and below.

> +          programmatically discover veth pairs without requiring additional
> +          system calls or parsing network interface information.
> +        </p>
> +        <p>
> +          This value is obtained via the <code>ETHTOOL_GSTATS</code> ioctl,
> +          which queries the veth driver statistics. The peer interface name
> +          can be determined from this index using standard system functions
> +          like <code>if_indextoname()</code>.

This paragraph seems unnecessary.

> +        </p>
> +        <p>
> +          Only present for veth devices (when <ref column="status"
> +          key="driver_name"/> is "veth"). Not available on non-Linux systems.
> +        </p>
> +      </column>
> +
>        <column name="status" key="source_ip">
>          The source IP address used for an IPv4/IPv6 tunnel end-point, such as
>          <code>gre</code>.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to