----- Original Message -----
>
>
> ----- Original Message -----
> > On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > > On 26/08/2013 10:43, Andrew Jones wrote:
> > >>
> > >> ----- Original Message -----
> > >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> > >>>>>>>>>>>>>> Is this patch still necessary? I thought that dropping the
> > >>>>>>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> > >>>>>>>>>>>>>> of the need for this library. Maybe I missed other uses?
> > >>>>>>>>>>
> > >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> > >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> > >>>>>> syscall(2).
> > >>>>>>
> > >>>>>>>>>> and in 09/12 we use max_numa_node().
> > >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> > >>>>>> discussion about setting qemu's MAX_NODES to whatever we think qemu
> > >>>>>> should support, and then just checking that we don't blow that
> > >>>>>> limit whenever reading host node info, i.e.
> > >>>>>>
> > >>>>>>     maxnode = 0;
> > >>>>>>     while (maxnode < MAX_NODES && host_nodes[maxnode])
> > >>>>>>         node_read(&info[maxnode++]);
> > >>>>>>
> > >>>>>> type of a thing.
> > >>>>>>
> > >>>>>> And, if there's a place you really need to know the current online
> > >>>>>> number of host nodes, then, like I said earlier, you should just go
> > >>>>>> to sysfs yourself. libnuma:numa_max_node() returns an int that it
> > >>>>>> only initializes at library load time, so it's not going to adapt
> > >>>>>> to onlining/offlining.
> > >>>>
> > >>>> OK, thank you.
> > >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> > >>>> directly, right?
> > >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a
> > >> more general lib.
> > >> Whether or not we want to redefine those symbols within qemu, in
> > >> order to avoid the dependency on installing numactl-devel, isn't
> > >> something I can answer. That's a better question for Anthony.
> > >> Anthony? Paolo, any opinions? Maybe we should pick up
> > >> uapi/linux/mempolicy.h with the linux-header synch script?
> > >>
> > >
> > > I think using libnuma is fine. In principle this could be used on other
> > > OSes than Linux, I think?
> >
> > But it seems that mbind(2) is a Linux-specific syscall, right?
> >
>
> You would need to avoid directly calling mbind, i.e. use libnuma for all
> numa related calls. Then, if libnuma were to support more OSes, qemu would
> automatically support them (wrt numa) as well. Your mbind() with libnuma
> would look like this
>
>     numa_set_bind_policy(strict)
>     numa_tonodemask_memory(addr, size, nodemask)
>
> The problem is that set_bind_policy only takes a bool, and thus only
> allows two of the four possible policies
>
>     MPOL_BIND       strict == 1
>     MPOL_PREFERRED  strict == 0
>
Ah, there is a way to get interleave policy

    if (policy == interleave) {
        numa_interleave_memory(addr, size, nodemask)
    } else {
        numa_set_bind_policy(strict)
        numa_tonodemask_memory(addr, size, nodemask)
    }

a bit clunky. And I still don't see a way to select MPOL_DEFAULT, nor a way
to use any additional flags, such as MPOL_F_RELATIVE_NODES.

> So, due to libnuma's policy setting limitations, and the fact it doesn't
> currently support more OSes than Linux, I prefer your current series
> version that drops libnuma. If qemu needs to support NUMA on another OS,
> then we can cross that bridge when we get there.
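
For reference, a rough sketch of what calling mbind(2) directly could look
like once the MPOL_* values are defined locally instead of coming from
numaif.h. The names qemu_mbind() and nodemask_set() are made up for
illustration and are not from the series. The numeric values are meant to
mirror uapi/linux/mempolicy.h and should be checked against that header.
The ENOSYS fallback for hosts without SYS_mbind is also just an assumption
about how a non-Linux build might be handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Local copies of the policy modes, so numaif.h (libnuma) is not needed. */
#define MPOL_DEFAULT          0
#define MPOL_PREFERRED        1
#define MPOL_BIND             2
#define MPOL_INTERLEAVE       3
/* Optional mode flag, OR'ed into the mode argument. */
#define MPOL_F_RELATIVE_NODES (1 << 14)

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Hypothetical helper: mark 'node' in a bitmask holding 'nbits' bits. */
void nodemask_set(unsigned long *mask, size_t nbits, unsigned int node)
{
    assert(node < nbits);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/* Hypothetical wrapper: invoke the mbind syscall without libnuma. */
long qemu_mbind(void *addr, unsigned long len, int mode,
                const unsigned long *nodemask, unsigned long maxnode,
                unsigned int flags)
{
#ifdef SYS_mbind
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
#else
    /* No mbind on this host; report it the way a missing syscall would. */
    errno = ENOSYS;
    return -1;
#endif
}
```

With something like this, all four policies are reachable, e.g.
qemu_mbind(addr, size, MPOL_INTERLEAVE, mask, maxnode, 0) or
qemu_mbind(addr, size, MPOL_BIND | MPOL_F_RELATIVE_NODES, mask, maxnode, 0),
which is what the libnuma calls above can't express. See mbind(2) for the
exact meaning of maxnode.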