[PATCH v7 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") tried to optimize the loop in memmap_init_zone(). But there is
still some room for improvement.

Patch 1 retains memblock_next_valid_pfn() on arm and arm64
Patch 2 optimizes memblock_next_valid_pfn()
Patches 3~5 optimize early_pfn_valid()

As for the performance improvement: with this set applied, the time overhead
of memmap_init() is reduced from 41313 us to 24345 us on my armv8a server
(QDF2400 with 96G memory).

Attached is the memblock region information from my server:

[   86.956758] Zone ranges:
[   86.959452]   DMA      [mem 0x0020-0x]
[   86.966041]   Normal   [mem 0x0001-0x0017]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0020-0x0021]
[   86.987666]   node   0: [mem 0x0082-0x0307]
[   86.994348]   node   0: [mem 0x0308-0x0308]
[   87.001029]   node   0: [mem 0x0309-0x031f]
[   87.007710]   node   0: [mem 0x0320-0x033f]
[   87.014392]   node   0: [mem 0x0341-0x0563]
[   87.021073]   node   0: [mem 0x0564-0x0567]
[   87.027754]   node   0: [mem 0x0568-0x056d]
[   87.034435]   node   0: [mem 0x056e-0x086f]
[   87.041117]   node   0: [mem 0x0870-0x0871]
[   87.047798]   node   0: [mem 0x0872-0x0894]
[   87.054479]   node   0: [mem 0x0895-0x08ba]
[   87.061161]   node   0: [mem 0x08bb-0x08bc]
[   87.067842]   node   0: [mem 0x08bd-0x08c4]
[   87.074524]   node   0: [mem 0x08c5-0x08e2]
[   87.081205]   node   0: [mem 0x08e3-0x08e4]
[   87.087886]   node   0: [mem 0x08e5-0x08fc]
[   87.094568]   node   0: [mem 0x08fd-0x0910]
[   87.101249]   node   0: [mem 0x0911-0x092e]
[   87.107930]   node   0: [mem 0x092f-0x0930]
[   87.114612]   node   0: [mem 0x0931-0x0963]
[   87.121293]   node   0: [mem 0x0964-0x0e61]
[   87.127975]   node   0: [mem 0x0e62-0x0e64]
[   87.134657]   node   0: [mem 0x0e65-0x0fff]
[   87.141338]   node   0: [mem 0x1080-0x17fe]
[   87.148019]   node   0: [mem 0x1c00-0x1c00]
[   87.154701]   node   0: [mem 0x1c01-0x1c7f]
[   87.161383]   node   0: [mem 0x1c81-0x7efb]
[   87.168064]   node   0: [mem 0x7efc-0x7efd]
[   87.174746]   node   0: [mem 0x7efe-0x7efe]
[   87.181427]   node   0: [mem 0x7eff-0x7eff]
[   87.188108]   node   0: [mem 0x7f00-0x0017]
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0020-0x0017]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Changelog:
V7: - fix i386 compilation error; refine the commit descriptions
V6: - simplify the code, move arm/arm64 common code into one file
    - refine patches as suggested by Daniel Vacek and Ard Biesheuvel
V5: - further refinement as suggested by Daniel Vacek; make the arm/arm64
      code more arch-specific
V4: - refine patches as suggested by Daniel Vacek and Wei Yang
    - optimize on arm besides arm64
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to latest mmotm
    - retain memblock_next_valid_pfn on arm64
    - refine memblock_search_pfn_regions and pfn_valid_region

Jia He (5):
  mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  arm: arm64: page_alloc: reduce unnecessary binary search in
    memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm: arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm/mm/init.c
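The core idea behind patches 2 and 5 is that memmap_init_zone() walks pfns in
ascending order, so consecutive lookups almost always land in the same
memblock region as the previous one; caching the last-hit region index lets
the fast path skip the binary search entirely. The following user-space
sketch illustrates that caching scheme only — the `region` table,
`next_valid_pfn()` helper, and `cached_idx` variable here are illustrative
stand-ins, not the kernel code from the series:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sorted, non-overlapping pfn regions: [start, end). */
struct region { unsigned long start, end; };

static const struct region regions[] = {
    { 0x20, 0x22 }, { 0x82, 0x308 }, { 0x1000, 0x1800 },
};
static const size_t nr_regions = sizeof(regions) / sizeof(regions[0]);

/* Index of the region hit by the previous lookup. */
static size_t cached_idx;

/* Return the first valid pfn >= pfn, or the end of the last region. */
static unsigned long next_valid_pfn(unsigned long pfn)
{
    size_t lo = 0, hi = nr_regions;

    /* Fast path: an ascending walk usually stays in the cached region. */
    if (pfn >= regions[cached_idx].start && pfn < regions[cached_idx].end)
        return pfn;

    /* Slow path: binary search for the first region ending beyond pfn. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (regions[mid].end <= pfn)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == nr_regions)
        return regions[nr_regions - 1].end;   /* past all regions */

    cached_idx = lo;                          /* remember for next call */
    return pfn >= regions[lo].start ? pfn : regions[lo].start;
}
```

With this shape, a zone-init loop over millions of contiguous pfns pays for
one binary search per region crossing instead of one per pfn, which is where
the roughly 40% memmap_init() reduction reported above comes from.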