Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Pierre, On Mon, Mar 18, 2024 at 4:59 PM Pierre-Clément Tosi wrote: > I notice that the mem_map in these logs have overlapping ranges, which results > in unnecessary work when creating the PTs. For this reason, it would make > sense > to prune it in arch/arm/mach-imx/imx8/cpu.c before calling dcache_enable(), > IMO. > On this point, you also have contiguous ranges with identical attributes > (mem_map[0] and mem_map[6]), which could be merged into a single entry. This > could result in more efficient MMU mappings if the mem_map entries can share a > block mapping. Also note that mem_map[4].size=0 so could be dropped. > > In any case, if we assume that overlapping mem_map entries are a valid input > to > the arch code (maybe not as it leads to potentially ambiguous behavior?), then > 41e2787f5ec4 had removed support for splitting existing block mappings. > > In your case, my assumption is that the function was then treating block > mappings in the range 0x1c00-0x8000 (which get mapped, at least > partially, by mem_map[0], mem_map[1], then mem_map[2]) as table mappings and > was > issuing read/write instructions in that range (accessing them as PTEs). As > those > seem to be device memory (I see they get mapped as MT_DEVICE_NGNRNE), these > accesses might explain the SError you were experiencing. > > Would you mind testing [1] and giving it "Tested-by:" if it addresses the > issue? Your patch fixed the boot regression. Thanks for your fix, appreciated it! I have replied with my Tested-by tag. Cheers, Fabio Estevam
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Fabio, On Mon, Mar 18, 2024 at 10:31:25AM -0300, Fabio Estevam wrote: > Please find the new logs here: > > https://pastebin.com/raw/qF3GbJry I notice that the mem_map in these logs have overlapping ranges, which results in unnecessary work when creating the PTs. For this reason, it would make sense to prune it in arch/arm/mach-imx/imx8/cpu.c before calling dcache_enable(), IMO. On this point, you also have contiguous ranges with identical attributes (mem_map[0] and mem_map[6]), which could be merged into a single entry. This could result in more efficient MMU mappings if the mem_map entries can share a block mapping. Also note that mem_map[4].size=0 so could be dropped. In any case, if we assume that overlapping mem_map entries are a valid input to the arch code (maybe not as it leads to potentially ambiguous behavior?), then 41e2787f5ec4 had removed support for splitting existing block mappings. In your case, my assumption is that the function was then treating block mappings in the range 0x1c00-0x8000 (which get mapped, at least partially, by mem_map[0], mem_map[1], then mem_map[2]) as table mappings and was issuing read/write instructions in that range (accessing them as PTEs). As those seem to be device memory (I see they get mapped as MT_DEVICE_NGNRNE), these accesses might explain the SError you were experiencing. Would you mind testing [1] and giving it "Tested-by:" if it addresses the issue? Thanks, [1]: https://lore.kernel.org/u-boot/43haokus4jdxguk4arig5tsqcgq2wbezwpbj7oti6pdkvrfual@wa7vz2iypcv5/ -- Pierre
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Mon, Mar 18, 2024 at 10:31 AM Fabio Estevam wrote: > I tried dumping the page table entries, but could not notice anything > that looked suspicious. This looks suspicious: With 41e2787f5ec4 reverted: Checking if pte fits for virt=1c00 size=6400 blocksize=4000 Checking if pte fits for virt=1c00 size=6400 blocksize=20 Checking if pte fits for virt=1c20 size=63e0 blocksize=4000 Checking if pte fits for virt=1c20 size=63e0 blocksize=20 Checking if pte fits for virt=1c40 size=63c0 blocksize=4000 In U-Boot master: Checking if pte fits for virt=1c00 size=6400 blocksize=4000 Checking if pte fits for virt=1c00 size=2400 blocksize=20 Checking if pte fits for virt=1c20 size=23e0 blocksize=20 Checking if pte fits for virt=1c40 size=23c0 blocksize=20 The second entry has size=2400 instead of size=6400.
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Pierre, On Fri, Mar 15, 2024 at 12:13 PM Pierre-Clément Tosi wrote: > I had a quick look through your logs and notice that U-Boot master attempts to > map addresses in the high VA range e.g. > > Checking if pte fits for virt=e400 [...] > > while the logs that boot successfully only use the low VA range e.g. > > Checking if pte fits for virt=80193000 [...] > > Unless that has recently changed (since I last worked with U-Boot), U-Boot on > AArch64 only supports identity mappings and therefore was only taught how to > program TTBR0_ELx (i.e. is only able to map low virtual addresses). This means > that the code - with or without 41e2787f5ec4 - would be unable to map > addresses > such as 0xe400. Yes, I found it strange too. I may have done something wrong the last time I instrumented the code. I tried it again and no longer see these unexpected high virtual addresses. Please find the new logs here: https://pastebin.com/raw/qF3GbJry > Now, given that 41e2787f5ec4 only affects implementation details of add_map(), > I am surprised that reverting that commit changes the arguments received by > the > function such as virt. As a reminder, add_map() is exclusively used on > mem_map: > > for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) > add_map(_map[i]); > > > > > That's the only issue preventing colibri-imx8x from booting mainline U-Boot. > > If I read the U-Boot configs right i.e. > > - configs/colibri-imx8x_defconfig: CONFIG_ARCH_IMX8=y > - arch/arm/mach-imx/imx8/Makefile: obj-y += cpu.o > - arch/arm/mach-imx/imx8/cpu.c: struct mm_region *mem_map = imx8_mem_map; Correct, these are the relevant files for the i.MXQ8XP. > There is a possibility that your mem_map is getting modified by MACH-specific > code. In particular, enable_caches() seems to dynamically get the MMU mappings > from some RPC mechanism (see get_owned_memreg() and sc_rm_get_memreg_info()). > > Could it be that whatever services those requests might be returning > unexpected > values? Instrumenting arch/arm/mach-imx/imx8/cpu.c and dumping mem_map (and > the RPC messages) with/without the patch would help clear this up. I tried dumping the page table entries, but could not notice anything that looked suspicious. Please let me know if you have any suggestions. Thanks
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Fabio, On Fri, Mar 15, 2024 at 08:56:17AM -0300, Fabio Estevam wrote: > Hi Marc, > > On Sat, Mar 9, 2024 at 11:36 AM Fabio Estevam wrote: > > > Does the log below help? > > > > https://pastebin.com/raw/1i1VBA0a > > > > If not, please send me a debug patch and I will be glad to run it here. > > I'm sorry to bother you, but have you had a chance to look at the log > I shared with you? I had a quick look through your logs and notice that U-Boot master attempts to map addresses in the high VA range e.g. Checking if pte fits for virt=e400 [...] while the logs that boot successfully only use the low VA range e.g. Checking if pte fits for virt=80193000 [...] Unless that has recently changed (since I last worked with U-Boot), U-Boot on AArch64 only supports identity mappings and therefore was only taught how to program TTBR0_ELx (i.e. is only able to map low virtual addresses). This means that the code - with or without 41e2787f5ec4 - would be unable to map addresses such as 0xe400. Now, given that 41e2787f5ec4 only affects implementation details of add_map(), I am surprised that reverting that commit changes the arguments received by the function such as virt. As a reminder, add_map() is exclusively used on mem_map: for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) add_map(_map[i]); > > That's the only issue preventing colibri-imx8x from booting mainline U-Boot. If I read the U-Boot configs right i.e. - configs/colibri-imx8x_defconfig: CONFIG_ARCH_IMX8=y - arch/arm/mach-imx/imx8/Makefile: obj-y += cpu.o - arch/arm/mach-imx/imx8/cpu.c: struct mm_region *mem_map = imx8_mem_map; There is a possibility that your mem_map is getting modified by MACH-specific code. In particular, enable_caches() seems to dynamically get the MMU mappings from some RPC mechanism (see get_owned_memreg() and sc_rm_get_memreg_info()). Could it be that whatever services those requests might be returning unexpected values? Instrumenting arch/arm/mach-imx/imx8/cpu.c and dumping mem_map (and the RPC messages) with/without the patch would help clear this up. HTH, > > We would like to get this fixed for U-Boot 2024.04. > > Thanks for your help -- Pierre
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Marc, On Sat, Mar 9, 2024 at 11:36 AM Fabio Estevam wrote: > Does the log below help? > > https://pastebin.com/raw/1i1VBA0a > > If not, please send me a debug patch and I will be glad to run it here. I'm sorry to bother you, but have you had a chance to look at the log I shared with you? That's the only issue preventing colibri-imx8x from booting mainline U-Boot. We would like to get this fixed for U-Boot 2024.04. Thanks for your help
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Sat, Mar 9, 2024 at 9:39 AM Marc Zyngier wrote: > Can you figure out what memory access is triggering it? Even at > narrowing it to the subsystem level would be a good indication. The problem happens so early that I am not able to narrow it down at subsystem level. > You could just dump the entries as they are written. The order may not > be the same, but for a given VA you should observe the same entries > being written. Does the log below help? https://pastebin.com/raw/1i1VBA0a If not, please send me a debug patch and I will be glad to run it here. Thanks
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Sat, 09 Mar 2024 12:29:10 +, Fabio Estevam wrote: > > Hi Marc, > > On Sat, Mar 9, 2024 at 6:53 AM Marc Zyngier wrote: > > > It would be good to narrow down which access is generating this. It is > > an asynchronous error, so the code above won't help. > > > > Alternatively, and if you are sure that this is due to this change, > > dumping the page tables and comparing them before and after would > > help. > > Yes, I am sure the error is due to this change. It is 100% reproducible. Can you figure out what memory access is triggering it? Even at narrowing it to the subsystem level would be a good indication. > I am not familiar with this part of the code, so I would appreciate > it if you could tell me how to dump the page tables so I can compare > them before and after. You could just dump the entries as they are written. The order may not be the same, but for a given VA you should observe the same entries being written. My hunch is that the new code is a lot more picky about the alignment of things, and that could result in something being similarly unaligned. But without access to the platform nor an idea of what gets mapped, it's a bit hard to have a clue. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Marc, On Sat, Mar 9, 2024 at 6:53 AM Marc Zyngier wrote: > It would be good to narrow down which access is generating this. It is > an asynchronous error, so the code above won't help. > > Alternatively, and if you are sure that this is due to this change, > dumping the page tables and comparing them before and after would > help. Yes, I am sure the error is due to this change. It is 100% reproducible. I am not familiar with this part of the code, so I would appreciate it if you could tell me how to dump the page tables so I can compare them before and after. Thanks, Fabio Estevam
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Fri, 08 Mar 2024 20:22:53 +, Fabio Estevam wrote: > > Hi Paul and Tom, > > On Tue, Feb 14, 2023 at 10:38 AM Ying-Chun Liu (PaulLiu) > wrote: > > > > From: Marc Zyngier > > > > In the add_map() function, for each level it populates, it iterates from > > the root of the PT tree, making it ineficient if a mapping needs to occur > > past level 1. > > > > Instead, replace it with a recursive (and much simpler) algorithm > > that keeps the complexity as low as possible. With this, mapping > > 512GB at level 2 goes from several seconds down to not measurable > > on an A55 machine. > > > > We keep the block mappings at level 1 for now though. > > > > Signed-off-by: Marc Zyngier > > Signed-off-by: Pierre-Clément Tosi > > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > > upstream ] > > Signed-off-by: Ying-Chun Liu (PaulLiu) > > Cc: Tom Rini > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 > > I know this is an old thread, but this commit causes the following > boot regression on a colibri-imx8qxp board: > > U-Boot 2024.04-rc3-00070-gecc9298a893b (Mar 08 2024 - 17:15:31 -0300) > > CPU: NXP i.MX8QXP RevC A35 at 1200 MHz at 51C > > DRAM: 2 GiB > "Error" handler, esr 0xbf00 SError. Not good. > elr: 80020914 lr : 800209c0 (reloc) > elr: ffec4914 lr : ffec49c0 > x0 : 006070800401 x1 : 7000 > x2 : 1000 x3 : 0002 > x4 : 4000 x5 : 00600401 > x6 : 0c00 x7 : fff45140 > x8 : 0060 x9 : fff45100 > x10: 0a200023 x11: 0002 > x12: 0002 x13: 800a10e8 > x14: x15: ffec4cb8 > x16: 80056a88 x17: > x18: fd6c1d70 x19: 0f60 > x20: x21: 00600401 > x22: 70a0 x23: 0020 > x24: 4c28 x25: 001f > x26: 0003 x27: 70a0 > x28: 0002 x29: fd6bbfd0 > > Code: 1100047a a90573fb aa0103fb 2a0303fc (b5000113) > Resetting CPU ... > > resetting ... > > Reverting this commit on top of master fixes the boot regression. > > Any ideas? It would be good to narrow down which access is generating this. It is an asynchronous error, so the code above won't help. Alternatively, and if you are sure that this is due to this change, dumping the page tables and comparing them before and after would help. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
Hi Paul and Tom, On Tue, Feb 14, 2023 at 10:38 AM Ying-Chun Liu (PaulLiu) wrote: > > From: Marc Zyngier > > In the add_map() function, for each level it populates, it iterates from > the root of the PT tree, making it ineficient if a mapping needs to occur > past level 1. > > Instead, replace it with a recursive (and much simpler) algorithm > that keeps the complexity as low as possible. With this, mapping > 512GB at level 2 goes from several seconds down to not measurable > on an A55 machine. > > We keep the block mappings at level 1 for now though. > > Signed-off-by: Marc Zyngier > Signed-off-by: Pierre-Clément Tosi > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > upstream ] > Signed-off-by: Ying-Chun Liu (PaulLiu) > Cc: Tom Rini > Link: > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > Link: > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 I know this is an old thread, but this commit causes the following boot regression on a colibri-imx8qxp board: U-Boot 2024.04-rc3-00070-gecc9298a893b (Mar 08 2024 - 17:15:31 -0300) CPU: NXP i.MX8QXP RevC A35 at 1200 MHz at 51C DRAM: 2 GiB "Error" handler, esr 0xbf00 elr: 80020914 lr : 800209c0 (reloc) elr: ffec4914 lr : ffec49c0 x0 : 006070800401 x1 : 7000 x2 : 1000 x3 : 0002 x4 : 4000 x5 : 00600401 x6 : 0c00 x7 : fff45140 x8 : 0060 x9 : fff45100 x10: 0a200023 x11: 0002 x12: 0002 x13: 800a10e8 x14: x15: ffec4cb8 x16: 80056a88 x17: x18: fd6c1d70 x19: 0f60 x20: x21: 00600401 x22: 70a0 x23: 0020 x24: 4c28 x25: 001f x26: 0003 x27: 70a0 x28: 0002 x29: fd6bbfd0 Code: 1100047a a90573fb aa0103fb 2a0303fc (b5000113) Resetting CPU ... resetting ... Reverting this commit on top of master fixes the boot regression. Any ideas? Thanks
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Tue, 01 Aug 2023 09:53:52 +0100, Oliver Graute wrote: > > On 14/02/23, Ying-Chun Liu (PaulLiu) wrote: > > From: Marc Zyngier > > > > In the add_map() function, for each level it populates, it iterates from > > the root of the PT tree, making it ineficient if a mapping needs to occur > > past level 1. > > > > Instead, replace it with a recursive (and much simpler) algorithm > > that keeps the complexity as low as possible. With this, mapping > > 512GB at level 2 goes from several seconds down to not measurable > > on an A55 machine. > > > > We keep the block mappings at level 1 for now though. > > > > Signed-off-by: Marc Zyngier > > Signed-off-by: Pierre-Clément Tosi > > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > > upstream ] > > Signed-off-by: Ying-Chun Liu (PaulLiu) > > Cc: Tom Rini > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 > > Hello Marc, > > this patch somehow broke the boot of my imx8qm board. I run into this > issue: > > U-Boot 2023.07-rc6-4-g5527698481 (Aug 01 2023 - 10:10:52 +0200) > > Model: Advantech iMX8QM DMSSE20 > Board: DMS-SE20A1 8GB > Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363 > Boot: USB > DRAM: 8 GiB > "Error" handler, esr 0xbf00 SError. > elr: 80020938 lr : 800209c8 (reloc) > elr: ffecf938 lr : ffecf9c8 > x0 : 0001 x1 : 6000 > x2 : 1000 x3 : 0002 > x4 : 4000 x5 : 00600401 > x6 : 0800 x7 : fff44a60 > x8 : 00680481 x9 : 0008 > x10: 0a200023 x11: 0002 > x12: 0002 x13: 80095a00 > x14: x15: ffecfd2c > x16: 8005454c x17: > x18: fd6afd50 x19: 0fe0 > x20: x21: 00600401 > x22: 6020 x23: 0020 > x24: 4808 x25: 001f > x26: 0003 x27: 6020 > x28: 0002 x29: fd6ab110 > > Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334) > Resetting CPU ... > > resetting ... > SCI reboot > request > > After some bisecting this patch poped up: > > 41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit > arm64: Reduce add_map() complexity > > After I reverted everything on top of this patch its booting again with > v2023.07 > > commit c1da6fdb5c239b432440721772d993e63cfdeb20 > armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present > > commit 836b8d4b205d2175b57cb9ef271e638b0c116e89 > arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present > > commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213 > arm64: Use FEAT_HAFDBS to track dirty pages when available > > > Do you have any idea whats going on here? Not really. I can only tell you that your HW has generated a SError, which is usually pretty fatal. There could be all sort of reasons why this happen, such as mapping a piece of address space that you are not supposed to access. If the previous incarnation of the PT code was behaving in a better way, the way to debug this is to dump both sets of page tables and compare what they actually map, figuring out where things go wrong. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On 01/08/23, Oliver Graute wrote: > On 14/02/23, Ying-Chun Liu (PaulLiu) wrote: > > From: Marc Zyngier > > > > In the add_map() function, for each level it populates, it iterates from > > the root of the PT tree, making it ineficient if a mapping needs to occur > > past level 1. > > > > Instead, replace it with a recursive (and much simpler) algorithm > > that keeps the complexity as low as possible. With this, mapping > > 512GB at level 2 goes from several seconds down to not measurable > > on an A55 machine. > > > > We keep the block mappings at level 1 for now though. > > > > Signed-off-by: Marc Zyngier > > Signed-off-by: Pierre-Clément Tosi > > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > > upstream ] > > Signed-off-by: Ying-Chun Liu (PaulLiu) > > Cc: Tom Rini > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > > Link: > > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 > > Hello Marc, > > this patch somehow broke the boot of my imx8qm board. I run into this > issue: > > U-Boot 2023.07-rc6-4-g5527698481 (Aug 01 2023 - 10:10:52 +0200) > > Model: Advantech iMX8QM DMSSE20 > Board: DMS-SE20A1 8GB > Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363 > Boot: USB > DRAM: 8 GiB > "Error" handler, esr 0xbf00 > elr: 80020938 lr : 800209c8 (reloc) > elr: ffecf938 lr : ffecf9c8 > x0 : 0001 x1 : 6000 > x2 : 1000 x3 : 0002 > x4 : 4000 x5 : 00600401 > x6 : 0800 x7 : fff44a60 > x8 : 00680481 x9 : 0008 > x10: 0a200023 x11: 0002 > x12: 0002 x13: 80095a00 > x14: x15: ffecfd2c > x16: 8005454c x17: > x18: fd6afd50 x19: 0fe0 > x20: x21: 00600401 > x22: 6020 x23: 0020 > x24: 4808 x25: 001f > x26: 0003 x27: 6020 > x28: 0002 x29: fd6ab110 > > Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334) > Resetting CPU ... > > resetting ... > SCI reboot > request > > After some bisecting this patch poped up: > > 41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit > arm64: Reduce add_map() complexity > > After I reverted everything on top of this patch its booting again with > v2023.07 > > commit c1da6fdb5c239b432440721772d993e63cfdeb20 > armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present > > commit 836b8d4b205d2175b57cb9ef271e638b0c116e89 > arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present > > commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213 > arm64: Use FEAT_HAFDBS to track dirty pages when available > > > Do you have any idea whats going on here? Is this behavior somehow releated to the known Cache coherency issue on A53 Core on NXP imx8qm? https://lore.kernel.org/linux-arm-kernel/ZDflS%2FCnEx8iCspk@FVFF77S0Q05N/T/#mf733406e618244b0b21fd25077febd69b31b686e +Jason Liu +Peng Fan Best Regards, Oliver
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On 14/02/23, Ying-Chun Liu (PaulLiu) wrote: > From: Marc Zyngier > > In the add_map() function, for each level it populates, it iterates from > the root of the PT tree, making it ineficient if a mapping needs to occur > past level 1. > > Instead, replace it with a recursive (and much simpler) algorithm > that keeps the complexity as low as possible. With this, mapping > 512GB at level 2 goes from several seconds down to not measurable > on an A55 machine. > > We keep the block mappings at level 1 for now though. > > Signed-off-by: Marc Zyngier > Signed-off-by: Pierre-Clément Tosi > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > upstream ] > Signed-off-by: Ying-Chun Liu (PaulLiu) > Cc: Tom Rini > Link: > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > Link: > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 Hello Marc, this patch somehow broke the boot of my imx8qm board. I run into this issue: U-Boot 2023.07-rc6-4-g5527698481 (Aug 01 2023 - 10:10:52 +0200) Model: Advantech iMX8QM DMSSE20 Board: DMS-SE20A1 8GB Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363 Boot: USB DRAM: 8 GiB "Error" handler, esr 0xbf00 elr: 80020938 lr : 800209c8 (reloc) elr: ffecf938 lr : ffecf9c8 x0 : 0001 x1 : 6000 x2 : 1000 x3 : 0002 x4 : 4000 x5 : 00600401 x6 : 0800 x7 : fff44a60 x8 : 00680481 x9 : 0008 x10: 0a200023 x11: 0002 x12: 0002 x13: 80095a00 x14: x15: ffecfd2c x16: 8005454c x17: x18: fd6afd50 x19: 0fe0 x20: x21: 00600401 x22: 6020 x23: 0020 x24: 4808 x25: 001f x26: 0003 x27: 6020 x28: 0002 x29: fd6ab110 Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334) Resetting CPU ... resetting ... SCI reboot request After some bisecting this patch poped up: 41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit arm64: Reduce add_map() complexity After I reverted everything on top of this patch its booting again with v2023.07 commit c1da6fdb5c239b432440721772d993e63cfdeb20 armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present commit 836b8d4b205d2175b57cb9ef271e638b0c116e89 arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213 arm64: Use FEAT_HAFDBS to track dirty pages when available Do you have any idea whats going on here? Best Regards, Oliver
Re: [PATCH 1/2] arm64: Reduce add_map() complexity
On Tue, Feb 14, 2023 at 09:38:13PM +0800, Ying-Chun Liu (PaulLiu) wrote: > From: Marc Zyngier > > In the add_map() function, for each level it populates, it iterates from > the root of the PT tree, making it ineficient if a mapping needs to occur > past level 1. > > Instead, replace it with a recursive (and much simpler) algorithm > that keeps the complexity as low as possible. With this, mapping > 512GB at level 2 goes from several seconds down to not measurable > on an A55 machine. > > We keep the block mappings at level 1 for now though. > > Signed-off-by: Marc Zyngier > Signed-off-by: Pierre-Clément Tosi > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the > upstream ] > Signed-off-by: Ying-Chun Liu (PaulLiu) > Cc: Tom Rini > Link: > https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 > Link: > https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 Applied to u-boot/next, thanks! -- Tom signature.asc Description: PGP signature
[PATCH 1/2] arm64: Reduce add_map() complexity
From: Marc Zyngier In the add_map() function, for each level it populates, it iterates from the root of the PT tree, making it ineficient if a mapping needs to occur past level 1. Instead, replace it with a recursive (and much simpler) algorithm that keeps the complexity as low as possible. With this, mapping 512GB at level 2 goes from several seconds down to not measurable on an A55 machine. We keep the block mappings at level 1 for now though. Signed-off-by: Marc Zyngier Signed-off-by: Pierre-Clément Tosi [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the upstream ] Signed-off-by: Ying-Chun Liu (PaulLiu) Cc: Tom Rini Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6 Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059 --- arch/arm/cpu/armv8/cache_v8.c | 94 +-- 1 file changed, 46 insertions(+), 48 deletions(-) diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c index f333ad8889..876344e1b4 100644 --- a/arch/arm/cpu/armv8/cache_v8.c +++ b/arch/arm/cpu/armv8/cache_v8.c @@ -299,61 +299,59 @@ static void split_block(u64 *pte, int level) set_pte_table(pte, new_table); } -/* Add one mm_region map entry to the page tables */ -static void add_map(struct mm_region *map) +static void map_range(u64 virt, u64 phys, u64 size, int level, + u64 *table, u64 attrs) { - u64 *pte; - u64 virt = map->virt; - u64 phys = map->phys; - u64 size = map->size; - u64 attrs = map->attrs | PTE_TYPE_BLOCK | PTE_BLOCK_AF; - u64 blocksize; - int level; - u64 *new_table; + u64 map_size = BIT_ULL(level2shift(level)); + int i, idx; - while (size) { - pte = find_pte(virt, 0); - if (pte && (pte_type(pte) == PTE_TYPE_FAULT)) { - debug("Creating table for virt 0x%llx\n", virt); - new_table = create_table(); - set_pte_table(pte, new_table); - } + idx = (virt >> level2shift(level)) & (MAX_PTE_ENTRIES - 1); + for (i = idx; size; i++) { + u64 next_size, *next_table; - for (level = 1; level < 4; level++) { - pte = find_pte(virt, level); - if (!pte) - panic("pte not found\n"); - - blocksize = 1ULL << level2shift(level); - debug("Checking if pte fits for virt=%llx size=%llx blocksize=%llx\n", - virt, size, blocksize); - if (size >= blocksize && !(virt & (blocksize - 1))) { - /* Page fits, create block PTE */ - debug("Setting PTE %p to block virt=%llx\n", - pte, virt); - if (level == 3) - *pte = phys | attrs | PTE_TYPE_PAGE; - else - *pte = phys | attrs; - virt += blocksize; - phys += blocksize; - size -= blocksize; - break; - } else if (pte_type(pte) == PTE_TYPE_FAULT) { - /* Page doesn't fit, create subpages */ - debug("Creating subtable for virt 0x%llx blksize=%llx\n", - virt, blocksize); - new_table = create_table(); - set_pte_table(pte, new_table); - } else if (pte_type(pte) == PTE_TYPE_BLOCK) { - debug("Split block into subtable for virt 0x%llx blksize=0x%llx\n", - virt, blocksize); - split_block(pte, level); - } + if (level >= 1 && + size >= map_size && !(virt & (map_size - 1))) { + if (level == 3) + table[i] = phys | attrs | PTE_TYPE_PAGE; + else + table[i] = phys | attrs; + + virt += map_size; + phys += map_size; + size -= map_size; + + continue; } + + /* Going one level down */ + if (pte_type([i]) == PTE_TYPE_FAULT) + set_pte_table([i], create_table()); + + next_table = (u64 *)(table[i] & GENMASK_ULL(47, PAGE_SHIFT)); + next_size = min(map_size - (virt & (map_size - 1)), size); + + map_range(virt, phys, next_size, level + 1, next_table, attrs); +