On Wed, Aug 20, 2025 at 09:32:44PM -0400, Brian Song wrote:
> On 8/17/25 9:45 AM, Stefan Hajnoczi wrote:
> > On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
> >> Due to kernel limitations, when the FUSE-over-io_uring option is
> >> enabled, you must create and assign nr_cpu IOThreads. For example:
> >
> > While it would be nice for the kernel to support a more flexible queue
> > mapping policy, userspace can work around this.
> >
> > I think Kevin suggested creating the number of FUSE queues required by
> > the kernel and configuring them across the user's IOThreads. That way
> > the number of IOThreads can be smaller than the number of FUSE queues.
> >
> > Stefan
>
> If we are mapping user-specified IOThreads to nr_cpu queues Q, when we
> register entries, we need to think about how many entries in each Q[i]
> go to different IOThreads, and bind the qid when submitting. Once a CQE
> comes back, the corresponding IOThread handles it. It looks like we
> don't really need round-robin for dispatching. The actual question is
> how
> to split entries in each queue across IOThreads.

Round-robin is needed for the qid -> IOThread mapping, not for
dispatching individual requests. The kernel currently dispatches
requests based on a 1:1 CPU:Queue mapping.

> For example, if we split entries evenly:
>
> USER: define 2 IOThreads to submit and recv ring entries
> NR_CPU: 4
>
> Q = malloc(sizeof(entry) * 32 * nr_cpu);
>
> IOThread-1:
> Q[0] Q[1] Q[2] Q[3]
> 16   16   16   16
>
> IOThread-2:
> Q[0] Q[1] Q[2] Q[3]
> 16   16   16   16

There is no need to have nr_cpus queues in each IOThread. The constraint
is that the total number of queues across all IOThreads must equal
nr_cpus.

The malloc in your example implies that each FuseQueue will have 32
entries (REGISTER uring_cmds). nr_cpu is 4, so the mapping should look
like this:

IOThread-1:
Q[0] Q[2]
32   32

IOThread-2:
Q[1] Q[3]
32   32

Stefan