On Wed, Nov 25, 2020 at 09:45:19AM +0100, David Hildenbrand wrote:
> On 25.11.20 09:38, Andrew Jones wrote:
> > On Tue, Nov 24, 2020 at 08:17:35PM +0100, David Hildenbrand wrote:
> >> On 24.11.20 19:11, Jonathan Cameron wrote:
> >>> On Mon, 9 Nov 2020 20:47:09 +0100
> >>> David Hildenbrand <da...@redhat.com> wrote:
> >>>
> >>> +CC Eric based on similar query in other branch of the thread.
> >>>
> >>>> On 05.11.20 18:43, Jonathan Cameron wrote:
> >>>>> Basically a cut and paste job from the x86 support, with the
> >>>>> exception of needing a larger block size as the Memory Block Size
> >>>>> (MIN_SECTION_SIZE) on ARM64 in Linux is 1G.
> >>>>>
> >>>>> Tested:
> >>>>> * In full emulation and with KVM on an arm64 server.
> >>>>> * Cold and hotplug for the virtio-mem-pci device.
> >>>>> * Wide range of memory sizes, added at creation and later.
> >>>>> * Fairly basic memory usage of memory added. Seems to function as
> >>>>>   normal.
> >>>>> * NUMA setup with virtio-mem-pci devices on each node.
> >>>>> * Simple migration test.
> >>>>>
> >>>>> The related kernel patch just enables the Kconfig item for ARM64 as
> >>>>> an alternative to x86 in drivers/virtio/Kconfig.
> >>>>>
> >>>>> The original patches from David Hildenbrand stated that he thought
> >>>>> it should work for ARM64, but it wasn't enabled in the kernel [1].
> >>>>> It appears he was correct and everything 'just works'.
> >>>>>
> >>>>> The build system related stuff is intended to ensure virtio-mem
> >>>>> support is not built for arm32 (the build will fail due to there
> >>>>> being no defined block size). If there is a more elegant way to do
> >>>>> this, please point me in the right direction.
> >>>>
> >>>> You might be aware of https://virtio-mem.gitlab.io/developer-guide.html
> >>>> and the "issue" with 64k base pages - 512MB granularity. Similar to
> >>>> the question from Auger, have you tried running arm64 with differing
> >>>> page sizes in host/guest?
> >>>>
> >>>
> >>> Hi David,
> >>>
> >>>> With recent kernels, you can use "memhp_default_state=online_movable"
> >>>> on the kernel cmdline to make memory unplug more likely to succeed -
> >>>> especially with 64k base pages. You just have to be sure not to
> >>>> hotplug "too much memory" to a VM.
> >>>
> >>> Thanks for the pointer - that definitely simplifies testing. It was
> >>> getting a bit tedious without that.
> >>>
> >>> As ever, other stuff got in the way, so I only just got back to
> >>> looking at this.
> >>>
> >>> I've not done a particularly comprehensive set of tests yet, but
> >>> things seem to 'work' with mixed page sizes.
> >>>
> >>> With 64K pages in general, you run into a problem with the device
> >>> block_size being smaller than the subblock_size. I've just added a
> >>> check for that into the
> >>
> >> "Device block size smaller than subblock size" - that's very common,
> >> e.g., on x86-64: device_block_size is 2MiB and the subblock size is
> >> 4MiB - until we improve that in the future in Linux guests.
> >>
> >> Or did you mean something else?
> >>
> >>> virtio-mem kernel driver and have it fail to probe if that happens.
> >>> I don't think such a setup makes any sense anyway, so no loss there.
> >>> Should it make sense to drop that restriction in the future, we can
> >>> deal with that then without breaking backwards compatibility.
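
[Side note, in case it helps anyone reproducing the cold/hotplug tests
described above: I've been using an invocation roughly along these
lines. The ids and sizes are placeholders, block-size=512M just mirrors
the 64k-guest minimum being discussed, and on aarch64 it obviously
needs the patches from this series - so treat it as a sketch rather
than a known-good command line:

  qemu-system-aarch64 \
    -machine virt,accel=kvm -cpu host -nographic \
    -m 4G,maxmem=8G \
    -object memory-backend-ram,id=mem0,size=4G \
    -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G,block-size=512M

(plus the usual -kernel/-initrd/-append or -drive options for the
guest). The plugged amount can then be changed at runtime with
"qom-set vmem0 requested-size <size>" in the monitor, and the
memhp_default_state=online_movable hint quoted above goes on the guest
kernel cmdline.]
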
> >>>
> >>> So the question is whether it makes sense to bother with virtio-mem
> >>> support at all on ARM64 with 64k pages, given that currently the
> >>> minimum workable block_size is 512MiB? I guess there is an argument
> >>> for virtio-mem being a possibly more convenient interface than full
> >>> memory HP. Curious to hear what people think on this?
> >>
> >> IMHO we really want it. For example, RHEL is always 64k. This is a
> >> current guest limitation, to be improved in the future - either by
> >> moving away from 512MB huge pages with 64k, or by improving
> >> alloc_contig_range().
> >
> > Even with 64k pages you may be able to have 2MB huge pages by setting
> > default_hugepagesz=2M on the kernel command line.
> 
> Yes, but not for THP, right? Last time I checked that move was not
> performed yet - resulting in MAX_ORDER/pageblock_order in Linux
> corresponding to 512 MB.
> 
Yes, I believe you're correct. At least on the machine I've booted with
default_hugepagesz=2M, I see

  $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
  536870912

(I'm not running a latest mainline kernel though.)

Thanks,
drew
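
P.S. In case the 536870912 above looks like an odd number at first
glance: it's simply the PMD size on a 64k kernel. With 64 KiB pages and
8-byte page table entries, a page table holds 64 KiB / 8 = 8192 entries,
so one PMD entry maps 8192 * 64 KiB = 512 MiB = 536870912 bytes - which
is the 512 MB THP granularity David refers to. default_hugepagesz=2M
changes the hugetlb pool page size, but not the THP/PMD size.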