On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > +        /*
> > > +         * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > +         * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > +         * of hugepages with NPT/EPT.
> > > +         */
> > > +        new_block->host = qemu_memalign(1<< TARGET_HPAGE_BITS, size);
>
> This should not be target dependent. i.e. it should be the host page size.

Yep I noticed. I'm not aware of an official way to get that
information out of the kernel (hugepagesize in /proc/meminfo is
dependent on hugetlbfs, which in turn is not a dependency for
transparent hugepage support) but hey, I can add it myself to
/sys/kernel/mm/transparent_hugepage/hugepage_size!

> > That is a little wasteful. How about a hint to mmap() requesting proper
> > alignment (MAP_HPAGE_ALIGN)?
>
> I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart enough
> to automatically align large allocations anyway.

The kernel won't do that, and the main reason is to avoid creating more
vmas: it's more efficient to waste virtual space and have userland
allocate more than needed than to ask the kernel for alignment and
force it to create more vmas because of the holes that generates.
Virtual memory costs nothing.

Also khugepaged can later zero out the pte_none regions to create a
full segment all backed by hugepages; however, if we do that,
khugepaged will eat into the free memory space. At the moment I kept
khugepaged a zero-memory-footprint thing. But I'm currently adding an
option called collapse_unmapped to allow khugepaged to collapse
unmapped pages too, so if there are only 2/3 pages in the region
before the memalign, they can also be mapped by a large tlb to allow
qemu to run faster.

> This is probably a useful optimization regardless of KVM.

HPAGE alignment is only useful with KVM because it can only pay off
with EPT/NPT; transparent hugepage already works fine without that
(but ok, it'd be a microoptimization for the first and last few pages
in the whole vma). This is why I made it conditional on
kvm_enabled(). I can remove the kvm_enabled() check if you worry about
the first and last pages in the huge anon vma.

OTOH the madvise(MADV_HUGEPAGE) is surely a good idea for qemu too.
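To make the userland side concrete, here is a rough sketch of what I mean by over-aligning the allocation and then hinting the kernel. It is not the actual patch: posix_memalign() stands in for qemu_memalign(), the helper names are made up for illustration, and the 2M hugepage size is an assumption (x86-64) since there is no official way to query it yet. MADV_HUGEPAGE is guarded because it only exists with headers that carry transparent hugepage support.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* expose madvise() under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2M hugepages (x86-64). Not queried from the kernel. */
#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: allocate "size" bytes aligned on hpage_size and
 * hint that the range should be backed by hugepages where possible.
 * Returns NULL if the allocation fails.
 */
static void *alloc_hpage_aligned(size_t size, size_t hpage_size)
{
    void *ptr = NULL;

    /* posix_memalign() needs a power-of-two alignment. */
    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

    if (posix_memalign(&ptr, hpage_size, size) != 0)
        return NULL;

#ifdef MADV_HUGEPAGE
    /* Only a hint: a kernel without THP rejects it, which is harmless. */
    (void)madvise(ptr, size, MADV_HUGEPAGE);
#endif
    return ptr;
}

/*
 * Hypothetical helper: true if two addresses share the same offset
 * inside a hugepage, i.e. "(a ^ b) & (hpage_size - 1) == 0".
 */
static int hpage_congruent(uint64_t a, uint64_t b, uint64_t hpage_size)
{
    return ((a ^ b) & (hpage_size - 1)) == 0;
}
```

With the block start aligned this way, guest offset 0 and the host virtual address of the block agree in their low bits, which is the condition the comment in the patch is after. The cost is only the virtual space wasted in front of the aligned start.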
KVM normally runs on 64bit hosts, so it's no big deal if we waste 1M
of virtual memory here and there, but I thought that for qemu you
preferred not to have the alignment and to leave the first few and
last few pages in a vma not backed by a large tlb. Ideally we should
also align on the hpage size if sizeof(long) == 8. I'm not sure what
the recommended way to code that is though, and it'll make things a
bit more complex for little gain.
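One possible way to code the sizeof(long) == 8 idea, purely as a sketch: pick the alignment in one small helper. The helper name and its kvm_on parameter are hypothetical (kvm_on stands in for kvm_enabled()), and the 2M hugepage size is again an assumption rather than something read from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: 2M hugepages (x86-64). Not queried from the kernel. */
#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: choose the alignment for a new RAM block.
 * Align on the hugepage size when KVM is in use, or when the host is
 * 64bit and wasting some virtual space is harmless; otherwise fall
 * back to the normal host page size.
 */
static size_t ram_block_alignment(int kvm_on)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t base = page > 0 ? (size_t)page : 4096;

    /* sizeof(long) == 8 is a compile-time constant, so the dead
     * branch is dropped by the compiler on either kind of host. */
    if (kvm_on || sizeof(long) == 8)
        return ASSUMED_HPAGE_SIZE;

    return base;
}
```

On a 32bit host without KVM this keeps today's behaviour, so the extra complexity is limited to this one function.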