The nipa/ai-review bot found these problems:

| 
https://netdev-ai.bots.linux.dev/ai-review.html?id=e2753951-0455-4ce7-ad95-3da80456b40e

commit bab72ffd354779765bf84bc17cc185e0662d3717
Author: Matias Ezequiel Vara Larsen <[email protected]>

can: virtio: Add virtio CAN driver

This commit adds a new virtio CAN driver implementing a complete CAN bus
interface over Virtio transport, supporting both CAN Classic and CAN-FD
frames.

> diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
> index e15e320db476..ec941f3c25f6 100644
> --- a/drivers/net/can/Kconfig
> +++ b/drivers/net/can/Kconfig
> @@ -234,6 +234,18 @@ config CAN_XILINXCAN
>         Xilinx CAN driver. This driver supports both soft AXI CAN IP and
>         Zynq CANPS IP.
>
> +config CAN_VIRTIO_CAN
> +     depends on VIRTIO
> +     tristate "Virtio CAN device support"

The CAN_VIRTIO_CAN config entry is placed after CAN_XILINXCAN, breaking
the alphabetical ordering maintained by all other config entries in this
file. Since V comes before X alphabetically, should CAN_VIRTIO_CAN be
placed between CAN_TI_HECC and CAN_XILINXCAN? The Makefile entry is
correctly ordered (virtio_can.o appears before xilinx_can.o), which makes
this inconsistency more evident.

> +     default n
> +     help
> +       Say Y here if you want to support for Virtio CAN.
> +
> +       To compile this driver as a module, choose M here: the
> +       module will be called virtio-can.
> +
> +       If unsure, say N.
> +

[ ... ]

> diff --git a/drivers/net/can/virtio_can.c b/drivers/net/can/virtio_can.c
> new file mode 100644
> index 000000000000..8d60336b3ebe
> --- /dev/null
> +++ b/drivers/net/can/virtio_can.c
> @@ -0,0 +1,934 @@

[ ... ]

> +static int virtio_can_alloc_tx_idx(struct virtio_can_priv *priv)
> +{
> +     int tx_idx;
> +
> +     tx_idx = ida_alloc_max(&priv->tx_putidx_ida,
> +                      priv->can.echo_skb_max - 1, GFP_KERNEL);

Is GFP_KERNEL safe here? This function is called from
virtio_can_start_xmit(), which is the ndo_start_xmit callback. This
callback can be invoked from softirq context with BH disabled, where
sleeping is not allowed. Should this use GFP_ATOMIC to match the calling
context?

> +     if (tx_idx >= 0)
> +             atomic_inc(&priv->tx_inflight);
> +
> +     return tx_idx;
> +}

[ ... ]

> +static int virtio_can_open(struct net_device *ndev)
> +{
> +     virtio_can_start(ndev);
> +
> +     netif_start_queue(ndev);
> +
> +     return 0;
> +}

Is open_candev() missing? The CAN framework requires drivers to call
open_candev() in their ndo_open handler. open_candev() validates that
bittiming has been configured (returns -EINVAL if bitrate is 0), checks
CAN FD data bitrate requirements, and manages carrier state. Without this
call, can the device be brought up without configuring bitrate? The driver
does call close_candev() in virtio_can_close(), showing awareness of the
CAN lifecycle. Every other CAN driver in the kernel calls open_candev() in
its open function.

[ ... ]

> +static netdev_tx_t virtio_can_start_xmit(struct sk_buff *skb,
> +                                      struct net_device *dev)
> +{

[ ... ]

> +     can_tx_msg = kzalloc(sizeof(*can_tx_msg) + cf->len, GFP_ATOMIC);
> +     if (!can_tx_msg) {
> +             dev->stats.tx_dropped++;
> +             goto kick; /* No way to return NET_XMIT_DROP here */
> +     }

Does this leak the skb? When can_dev_dropped_skb() returns false (valid
frame, skb not consumed) and kzalloc() for can_tx_msg fails, the code
jumps to 'kick' without freeing the skb. The skb is still owned by the
caller at this point. Should kfree_skb(skb) be called before goto kick?

> +
> +     can_tx_msg->tx_out.msg_type = cpu_to_le16(VIRTIO_CAN_TX);
> +     can_flags = 0;

[ ... ]

> +     putidx = virtio_can_alloc_tx_idx(priv);
> +
> +     if (unlikely(putidx < 0)) {
> +             /* -ENOMEM or -ENOSPC here. -ENOSPC should not be possible as
> +              * tx_inflight >= can.echo_skb_max is checked in flow control
> +              */
> +             WARN_ON_ONCE(putidx == -ENOSPC);
> +             kfree(can_tx_msg);
> +             dev->stats.tx_dropped++;
> +             goto kick; /* No way to return NET_XMIT_DROP here */
> +     }

Does this also leak the skb? When ida_alloc_max() returns -ENOMEM,
can_tx_msg is correctly freed but the skb is not. can_put_echo_skb() has
not been called yet at this point, so the skb is still owned by
virtio_can_start_xmit. Should kfree_skb(skb) be called before goto kick?

> +
> +     can_tx_msg->putidx = (unsigned int)putidx;
> +
> +     /* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
> +     can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);

[ ... ]

> +static void virtio_can_populate_rx_vq(struct virtio_device *vdev)
> +{
> +     struct virtio_can_priv *priv = vdev->priv;
> +     struct virtqueue *vq = priv->vqs[VIRTIO_CAN_QUEUE_RX];
> +     struct virtio_can_rx *buf = priv->rpkt;
> +     unsigned int buf_size = priv->rpkt_len;
> +     int num_elements = vq->num_free;
> +     unsigned int idx;
> +     int ret;
> +
> +     for (idx = 0; idx < num_elements; idx++) {
> +             ret = virtio_can_add_inbuf(vq, buf, buf_size);
> +             if (ret < 0) {
> +                     dev_dbg(&vdev->dev, "rpkt fill: ret=%d, idx=%u, 
> size=%u\n",
> +                             ret, idx, buf_size);
> +                     break;
> +             }
> +             buf += buf_size;
> +     }

Does this cause an out-of-bounds memory access? 'buf' is declared as
'struct virtio_can_rx *' and 'buf_size' equals 'priv->rpkt_len'
(sizeof(struct virtio_can_rx) + CAN[FD]_MAX_DLEN). The expression
'buf += buf_size' advances by buf_size * sizeof(struct virtio_can_rx)
bytes instead of the intended buf_size bytes, because C pointer arithmetic
scales by the pointed-to type size.

For CAN classic (rpkt_len=24, sizeof(struct virtio_can_rx)=16), this
advances 384 bytes per iteration instead of 24. The allocation in
virtio_can_probe only provides rpkt_len bytes per element, so doesn't the
second and all subsequent loop iterations pass out-of-bounds pointers to
the virtqueue as RX buffers? When the device writes to these buffers, can
this corrupt adjacent kernel memory?

virtio_can_probe():
    priv->rpkt = devm_kzalloc(&vdev->dev, priv->rpkt_len *
                                priv->vqs[VIRTIO_CAN_QUEUE_RX]->num_free,
                                GFP_KERNEL);

Should a 'char *' or 'u8 *' pointer be used for the byte-level iteration
instead, e.g. 'u8 *p = (u8 *)priv->rpkt; ... p += buf_size;' and pass 'p'
to virtio_can_add_inbuf?

> +
> +     if (!ret)

Does this use an uninitialized variable? The variable 'ret' is declared as
'int ret;' without initialization. If vq->num_free equals 0, the for loop
body never executes, and 'ret' is read uninitialized here. Should this be
initialized to 0?

> +             virtqueue_kick(vq);
> +
> +     dev_dbg(&vdev->dev, "%u rpkt added\n", idx);
> +}

[ ... ]

> +/* Function must not be called before virtio_can_find_vqs() has been run */
> +static void virtio_can_del_vq(struct virtio_device *vdev)
> +{
> +     struct virtio_can_priv *priv = vdev->priv;
> +     int q;
> +
> +     /* Reset the device */
> +     if (vdev->config->reset)
> +             vdev->config->reset(vdev);

Should this use virtio_reset_device() instead? When
CONFIG_VIRTIO_HARDEN_NOTIFICATION is enabled, virtio_reset_device() calls
virtio_break_device() to mark all virtqueues as broken and
virtio_synchronize_cbs() to wait for in-flight callbacks to complete
before performing the actual reset. Calling config->reset directly skips
these hardening steps, meaning virtqueue callbacks may still be executing
when del_vqs destroys the queues. All other virtio network drivers
(virtio_net.c, caif_virtio.c, mac80211_hwsim.c) use virtio_reset_device().
The NULL check is also unnecessary as this op is always present in virtio
devices.

> +
> +     /* From here we have dead silence from the device side so no locks
> +      * are needed to protect against device side events.
> +      */
> +
> +     for (q = 0; q < VIRTIO_CAN_QUEUE_COUNT; q++)
> +             while (virtqueue_detach_unused_buf(priv->vqs[q]))
> +                     ;
> +
> +     if (vdev->config->del_vqs)
> +             vdev->config->del_vqs(vdev);
> +}
> +
> +static void virtio_can_remove(struct virtio_device *vdev)
> +{
> +     struct virtio_can_priv *priv = vdev->priv;
> +     struct net_device *dev = priv->dev;
> +
> +     unregister_candev(dev);
> +
> +     virtio_can_del_vq(vdev);

Are napi_disable() calls missing here? The probe function enables both NAPI
instances:

virtio_can_probe():
    napi_enable(&priv->napi);
    napi_enable(&priv->napi_tx);

And virtio_can_freeze() correctly disables them:

virtio_can_freeze():
    napi_disable(&priv->napi);
    napi_disable(&priv->napi_tx);

But virtio_can_remove() never disables NAPI. The ndo_stop callback
(virtio_can_close) intentionally leaves NAPI active per its comment.
Can the NAPI poll callbacks (virtio_can_rx_poll, virtio_can_tx_poll)
execute concurrently with virtio_can_del_vq() deleting virtqueues and
virtio_can_free_candev() freeing the net_device? Should napi_disable() be
called for both instances before unregister_candev()?

> +
> +     virtio_can_free_candev(dev);
> +}

[ ... ]
--
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Attachment: signature.asc
Description: PGP signature

Reply via email to