On Wed, Aug 20, 2025 at 09:32:44PM -0400, Brian Song wrote:
> On 8/17/25 9:45 AM, Stefan Hajnoczi wrote:
> > On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
> >> Due to kernel limitations, when the FUSE-over-io_uring option is
> >> enabled, you must create and assign nr_cpu IOThreads. For example:
> >
> > While it would be nice for the kernel to support a more flexible queue
> > mapping policy, userspace can work around this.
> >
> > I think Kevin suggested creating the number of FUSE queues required by
> > the kernel and configuring them across the user's IOThreads. That way
> > the number of IOThreads can be smaller than the number of FUSE queues.
> >
> > Stefan
> 
> If we are mapping user-specified IOThreads to nr_cpu queues Q, then
> when we register entries we need to decide how many entries in each
> Q[i] go to which IOThread, and bind the qid when submitting. Once a
> CQE comes back, the corresponding IOThread handles it. It looks like
> we don't really need round-robin for dispatching. The actual question
> is how

Round-robin is needed for qid -> IOThread mapping, not for dispatching
individual requests. The kernel currently dispatches requests based on a
1:1 CPU:Queue mapping.

> to split entries in each queue across IOThreads.
> 
> For example, if we split entries evenly:
> 
> USER: define 2 IOThreads to submit and recv ring entries
> NR_CPU: 4
> 
> Q = malloc(sizeof(entry) * 32 * nr_cpu);
> 
> IOThread-1:
> Q[0] Q[1] Q[2] Q[3]
>   16   16   16   16
> 
> IOThread-2:
> Q[0] Q[1] Q[2] Q[3]
>   16   16   16   16

There is no need to have nr_cpus queues in each IOThread. The constraint
is that the total number of queues across all IOThreads must equal
nr_cpus.

The malloc in your example implies that each FuseQueue will have 32
entries (REGISTER uring_cmds). nr_cpu is 4, so the mapping should look
like this:

IOThread-1:
Q[0] Q[2]
  32   32

IOThread-2:
Q[1] Q[3]
  32   32

Stefan
