On Mon, Dec 09, 2013 at 06:33:41PM +0100, Paolo Bonzini wrote:
> Il 06/12/2013 19:49, Marcelo Tosatti ha scritto:
> >> > You'll have with your patches (without them it's worse of course):
> >> >
> >> >    RAM offset     physical address   node 0
> >> >    0-3840M        0-3840M            host node 0
> >> >    4096M-4352M    4096M-4352M        host node 0
> >> >    4352M-8192M    4352M-8192M        host node 1
> >> >    3840M-4096M    8192M-8448M        host node 1
> >> >
> >> > So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
> >> > gigabyte pages because they are split across host nodes.
> > AFAIK the TLB caches virt->phys translations, why specifics of
> > a given phys address is a factor into TLB caching?
>
> The problem is that "-numa mem" receives memory sizes and these do not
> take into account the hole below 4G.
>
> Thus, two adjacent host-physical addresses (two adjacent ram_addr_t-s)
> map to very far guest-physical addresses, are assigned to different
> guest nodes, and from there to different host nodes.  In the above
> example this happens for 3G-5G.

The physical address, which is what the TLB uses, does not take node
information into account.

> On second thought, this is not particularly important, or at least not
> yet.  It's not really possible to control the NUMA policy for
> hugetlbfs-allocated memory, right?

It is possible. I don't know what happens if conflicting NUMA policies
are specified for different virtual address ranges that map to a single
huge page.

However the kernel resolves that, it is not relevant here, since the TLB
caches virt->phys translations and not virt->{phys, node info}
translations.

> >> > So rather than your patches, it seems simpler to just widen the PCI hole
> >> > to 1G for i440FX and 2G for q35.
> >> >
> >> > What do you think?
> >
> > Problem is it's a guest visible change. To get 1GB TLB entries with
> > "legacy guest visible machine types" (which require new machine types
> > at the host side, but invisible to guest), that won't work.
> > Windows registration invalidation etc.
>
> Yeah, that's a tradeoff to make.

Perhaps increasing the PCI hole size should be done for other reasons?
Note that dropping the 1GB alignment piix.c patch requires the hole size +
start to be 1G aligned.
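
For reference, here is a rough sketch of the layout table quoted at the
top of this message. It is not QEMU or kernel code. The segment list is
copied from that table (units: MiB); the names SEGMENTS,
classify_gib_ranges and usable_gib_ranges are made up for illustration,
and the rule "a 1G guest-physical range can use a gigabyte page only if
it is fully backed by RAM from a single host node" is a simplifying
assumption of the sketch.

```python
# Illustrative sketch only, not QEMU or kernel logic.
GIB = 1024  # MiB per GiB

# (guest-physical start, guest-physical end, host node), in MiB,
# taken from the "physical address" column of the quoted table.
SEGMENTS = [
    (0, 3840, 0),
    (4096, 4352, 0),
    (4352, 8192, 1),
    (8192, 8448, 1),
]


def classify_gib_ranges(segments):
    """Map each 1 GiB index to (MiB of RAM covering it, set of host nodes)."""
    top = max(end for _, end, _ in segments)
    result = {}
    for base in range(0, top, GIB):
        covered, nodes = 0, set()
        for start, end, node in segments:
            # Size of the overlap between this segment and [base, base + 1G).
            overlap = min(end, base + GIB) - max(start, base)
            if overlap > 0:
                covered += overlap
                nodes.add(node)
        result[base // GIB] = (covered, nodes)
    return result


def usable_gib_ranges(segments):
    """GiB indexes fully backed by RAM from exactly one host node (assumed rule)."""
    classified = classify_gib_ranges(segments)
    return [
        idx
        for idx, (covered, nodes) in sorted(classified.items())
        if covered == GIB and len(nodes) == 1
    ]


if __name__ == "__main__":
    print(usable_gib_ranges(SEGMENTS))
```

Running it prints [0, 1, 2, 5, 6, 7], i.e. 0-3G and 5-8G, which matches
the quoted conclusion. Under this model 3G-4G fails because of the hole,
4G-5G because it spans host nodes 0 and 1, and 8G-8.25G because it is
only partially backed.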