Re: [PATCH v2] accel/qaic: Handle DBC deactivation if the owner went away

2026-01-12 Thread Youssef Abdulrahman
On Wed, Jan 7, 2026 at 6:14 PM Lizhi Hou  wrote:
>
> > @@ -1108,6 +1108,9 @@ static void *msg_xfer(struct qaic_device *qdev, 
> > struct wrapper_list *wrappers, u
> >   mutex_lock(&qdev->cntl_mutex);
> >   if (!list_empty(&elem.list))
> >   list_del(&elem.list);
> > + /* resp_worker() processed the response but the wait was interrupted 
> > */
> > + else if (list_empty(&elem.list) && ret == -ERESTARTSYS)
>
> Rechecking list_empty(&elem.list) can be removed.
Good point, I'll apply this in the next revision.
>
> And if ret == -ERESTARTSYS, elem.buf is not NULL?
No, this check handles the case where resp_worker() is able to get to
the element inside list_for_each_*(), which will set elem->buf to the
resp message, at the same time the user interrupts the wait in
msg_xfer(). So, it will be treated as a valid response message.

- Youssef


Re: [PATCH v2] accel/qaic: Handle DBC deactivation if the owner went away

2026-01-07 Thread Lizhi Hou



On 12/24/25 06:30, Youssef Samir wrote:

When a DBC is released, the device sends a QAIC_TRANS_DEACTIVATE_FROM_DEV
transaction to the host over the QAIC_CONTROL MHI channel. QAIC handles
this by calling decode_deactivate() to release the resources allocated for
that DBC. Since that handling is done in the qaic_manage_ioctl() context,
if the user goes away before receiving and handling the deactivation, the
host will be out-of-sync with the DBCs available for use, and the DBC
resources will not be freed unless the device is removed. If another user
loads and requests to activate a network, then the device assigns the same
DBC to that network, QAIC will "indefinitely" wait for dbc->in_use = false,
leading the user process to hang.

As a solution to this, handle QAIC_TRANS_DEACTIVATE_FROM_DEV transactions
that are received after the user has gone away.

Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Youssef Samir 
---
Changes in V2:
- Add missing closing bracket in resp_worker()
- Link to V1: 
https://lore.kernel.org/all/[email protected]
---
  drivers/accel/qaic/qaic_control.c | 47 +--
  1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/accel/qaic/qaic_control.c 
b/drivers/accel/qaic/qaic_control.c
index 428d8f65bff3..53afb647ecc4 100644
--- a/drivers/accel/qaic/qaic_control.c
+++ b/drivers/accel/qaic/qaic_control.c
@@ -913,7 +913,7 @@ static int decode_deactivate(struct qaic_device *qdev, void 
*trans, u32 *msg_len
 */
return -ENODEV;
  
-	if (status) {

+   if (usr && status) {
/*
 * Releasing resources failed on the device side, which puts
 * us in a bind since they may still be in use, so enable the
@@ -1108,6 +1108,9 @@ static void *msg_xfer(struct qaic_device *qdev, struct 
wrapper_list *wrappers, u
mutex_lock(&qdev->cntl_mutex);
if (!list_empty(&elem.list))
list_del(&elem.list);
+   /* resp_worker() processed the response but the wait was interrupted */
+   else if (list_empty(&elem.list) && ret == -ERESTARTSYS)


Rechecking list_empty(&elem.list) can be removed.

And if ret == -ERESTARTSYS, elem.buf is not NULL?

Lizhi


+   ret = 0;
if (!ret && !elem.buf)
ret = -ETIMEDOUT;
else if (ret > 0 && !elem.buf)
@@ -1418,9 +1421,49 @@ static void resp_worker(struct work_struct *work)
}
mutex_unlock(&qdev->cntl_mutex);
  
-	if (!found)

+   if (!found) {
+   /*
+* The user might have gone away at this point without waiting
+* for QAIC_TRANS_DEACTIVATE_FROM_DEV transaction coming from
+* the device. If this is not handled correctly, the host will
+* not know that the DBC[n] has been freed on the device.
+* Due to this failure in synchronization between the device and
+* the host, if another user requests to activate a network, and
+* the device assigns DBC[n] again, save_dbc_buf() will hang,
+* waiting for dbc[n]->in_use to be set to false, which will not
+* happen unless the qaic_dev_reset_clean_local_state() gets
+* called by resetting the device (or re-inserting the module).
+*
+* As a solution, we look for QAIC_TRANS_DEACTIVATE_FROM_DEV
+* transactions in the message before disposing of it, then
+* handle releasing the DBC resources.
+*
+* Since the user has gone away, if the device could not
+* deactivate the network (status != 0), there is no way to
+* enable and reassign the DBC to the user. We can put trust in
+* the device that it will release all the active DBCs in
+* response to the QAIC_TRANS_TERMINATE_TO_DEV transaction,
+* otherwise, the user can issue an soc_reset to the device.
+*/
+   u32 msg_count = le32_to_cpu(msg->hdr.count);
+   u32 msg_len = le32_to_cpu(msg->hdr.len);
+   u32 len = 0;
+   int j;
+
+   for (j = 0; j < msg_count && len < msg_len; ++j) {
+   struct wire_trans_hdr *trans_hdr;
+
+   trans_hdr = (struct wire_trans_hdr *)(msg->data + len);
+   if (le32_to_cpu(trans_hdr->type) == 
QAIC_TRANS_DEACTIVATE_FROM_DEV) {
+   if (decode_deactivate(qdev, trans_hdr, &len, 
NULL))
+   len += le32_to_cpu(trans_hdr->len);
+   } else {
+   len += le32_to_cpu(trans_hdr->len);
+   }
+   }
/* request must have timed out, drop packet */
kfree(msg);
+