Re: [RESEND PATCH v1] arch: riscv: jh7110: Correctly zero L2 LIM

2023-06-11 Thread Rick Chen
> From: Bo Gan 
> Sent: Monday, June 12, 2023 7:59 AM
> To: u-boot@lists.denx.de
> Cc: Bo Gan ; samin . guo ; 
> Yanhong Wang ; Rick Jian-Zhi Chen(陳建志) 
> ; Leo Yu-Chi Liang(梁育齊) ; Sean 
> Anderson ; Lukasz Majewski 
> Subject: [RESEND PATCH v1] arch: riscv: jh7110: Correctly zero L2 LIM
>
> Background information:
>  JH7110 SPL runs in L2 LIM (2M in size mapped at 0x800). It  consists of 
> 16 0x2 sized regions, each one can be used as  either L2 cache way or 
> SRAM (not both). From top to bottom, there're  ways 0-15. The way 0 is always 
> enabled, at most 0x1e can be used.
>
> In SPL, we don't enable any cache ways, thus all 15 (except way 0) ways can 
> be used. However, due to HW requirement, we must zero the LIM before use. 
> This is because ECC is applied to LIM, and if not cleared first, the ECC part 
> is invalid, which can trigger ECC errors when reading/writing data.
>
> There are several issues currently. We clear L2 LIM from __bss_end to 
> 0x81F in `harts_early_init`. This is wrong because:
>
>  a. Each hart (in the middle of a function call) overwriting its own
> stack and other harts' stacks.
> (data-race and data-corruption)
>  b. Lottery winner hart can be doing `board_init_f_init_reserve`,
> while other harts're in the middle of zeroing L2 LIM.
> (data-race)
>
> To fix this, we split the job, such that there's one and only one owner of 
> zeroing a specific region (No data-race). A new SPL config option 
> `SPL_ZERO_MEM_BEFORE_USE` is introduced. Allowing The zeroing to happen in 
> the same code path. (much easier to reason about).
>
> Another option `SPL_SYS_MALLOC_CLEAR_ON_INIT` also gets introduced.
> It allows us to also zero late malloc (dlmalloc) area, in case it gets 
> configured to be inside L2 LIM (via `CUSTOM_SYS_SPL_MALLOC_ADDR`) We by 
> default enable it to be on the safe side.
>
> `CONFIG_SPL_STACK` is adjusted to reduce the waste of L2 LIM space.
> When setting
>
> CONFIG_CUSTOM_SYS_SPL_MALLOC_ADDR=0x810
> CONFIG_SYS_SPL_MALLOC_SIZE=0xe
>
> A 0.875M late malloc arena is available in L2 LIM. It's sufficient for 
> loading JH7110 FIT Image (gzip compressed OpenSBI+UBOOT+DTB).
> The advantage of this config is allowing OpenSBI/UBOOT to be loaded at any 
> DDR memory address. By default the malloc arena is configured in DDR memory, 
> so OpenSBI/UBOOT loading address must not collide with it.
>
> CONFIG_CUSTOM_SYS_SPL_MALLOC_ADDR=0x8000 (DDR +1GB)
> CONFIG_SYS_SPL_MALLOC_SIZE=0x40
>
> JH7110 SPL memory map (based on the following defconfig):
>
>  CONFIG_NR_CPUS=8
>  CONFIG_STACK_SIZE_SHIFT=14
>  CONFIG_SPL_STACK=0x810
>  CONFIG_SPL_SYS_MALLOC_F_LEN=0x1
>  CONFIG_SPL_BSS_START_ADDR=0x804
>
>   ++ 0x81e
>   |  Free  |
>   ||
>   ++ 0x810
>   |  hart 0 stack  |   <--- cleared by each hart (start.S)
>   ++ 0x80fc000
>   |  hart 1 stack  |.
>   ++ 0x80f8000  .
>   |  ..|.
>   ++
>   |  hart N-1 stack|   <--- cleared by each hart (start.S)
>   ++ 0x80e
>   |  ..|   <--- cleared by lottery winner hart
>   |  malloc_base   |  board_init_f_init_reserve()
>   ++ 0x80d
>   |  GD|   <--- cleared by lottery winner hart
>   ++  board_init_f_init_reserve()
>   ||
>   |  hole  |
>   ||
>   ++
>   |  BSS   |   <--- cleared by lottery winner hart
>   ++ 0x804spl_clear_bss (start.S)
>   |  hole  |
>   ++
>   |  Image+DTB |   <--- Assuming cleared/loaded by ROM
>   ++ 0x800
>
> Signed-off-by: Bo Gan 
> Cc: samin . guo 
> Cc: Yanhong Wang 
> Cc: Rick Chen 
> Cc: Leo 
> Cc: Sean Anderson 
> Cc: Lukasz Majewski 
> ---
>
> v1:
> - patch is on top of 
> https://patchwork.ozlabs.org/project/uboot/patch/1684650044-313122-1-git-send-email-ganbo...@gmail.com/
> - Tested on VisionFive 2 board (4G)
> ---
>  Kconfig| 11 +++
>  arch/riscv/Kconfig |  7 +++
>  arch/riscv/cpu/jh7110/Kconfig  |  2 ++
>  arch/riscv/cpu/jh7110/spl.c| 25 -
>  arch/riscv/cpu/start.S | 12 
>  common/dlmalloc.c  |  6 +++---
>  common/init/board_init.c   |  3 +++
>  configs/starfive_visionfive2_defconfig |  3 ++-
>  8 files changed, 40 insertions(+), 29 deletions(-)

Please separate as several PATCHs, but not mix them as one PATCH

Thanks,
Rick


[RESEND PATCH v1] arch: riscv: jh7110: Correctly zero L2 LIM

2023-06-11 Thread Bo Gan
Background information:
 JH7110 SPL runs in L2 LIM (2M in size mapped at 0x800). It
 consists of 16 0x2 sized regions, each one can be used as
 either L2 cache way or SRAM (not both). From top to bottom, there're
 ways 0-15. The way 0 is always enabled, at most 0x1e can be used.

In SPL, we don't enable any cache ways, thus all 15 (except way 0)
ways can be used. However, due to HW requirement, we must zero the
LIM before use. This is because ECC is applied to LIM, and if not
cleared first, the ECC part is invalid, which can trigger ECC errors
when reading/writing data.

There are several issues currently. We clear L2 LIM from __bss_end
to 0x81F in `harts_early_init`. This is wrong because:

 a. Each hart (in the middle of a function call) overwriting its own
stack and other harts' stacks.
(data-race and data-corruption)
 b. Lottery winner hart can be doing `board_init_f_init_reserve`,
while other harts're in the middle of zeroing L2 LIM.
(data-race)

To fix this, we split the job, such that there's one and only one
owner of zeroing a specific region (No data-race). A new SPL config
option `SPL_ZERO_MEM_BEFORE_USE` is introduced. Allowing The zeroing
to happen in the same code path. (much easier to reason about).

Another option `SPL_SYS_MALLOC_CLEAR_ON_INIT` also gets introduced.
It allows us to also zero late malloc (dlmalloc) area, in case it gets
configured to be inside L2 LIM (via `CUSTOM_SYS_SPL_MALLOC_ADDR`)
We by default enable it to be on the safe side.

`CONFIG_SPL_STACK` is adjusted to reduce the waste of L2 LIM space.
When setting

CONFIG_CUSTOM_SYS_SPL_MALLOC_ADDR=0x810
CONFIG_SYS_SPL_MALLOC_SIZE=0xe

A 0.875M late malloc arena is available in L2 LIM. It's sufficient for
loading JH7110 FIT Image (gzip compressed OpenSBI+UBOOT+DTB).
The advantage of this config is allowing OpenSBI/UBOOT to be loaded at
any DDR memory address. By default the malloc arena is configured in
DDR memory, so OpenSBI/UBOOT loading address must not collide with it.

CONFIG_CUSTOM_SYS_SPL_MALLOC_ADDR=0x8000 (DDR +1GB)
CONFIG_SYS_SPL_MALLOC_SIZE=0x40

JH7110 SPL memory map (based on the following defconfig):

 CONFIG_NR_CPUS=8
 CONFIG_STACK_SIZE_SHIFT=14
 CONFIG_SPL_STACK=0x810
 CONFIG_SPL_SYS_MALLOC_F_LEN=0x1
 CONFIG_SPL_BSS_START_ADDR=0x804

  ++ 0x81e
  |  Free  |
  ||
  ++ 0x810
  |  hart 0 stack  |   <--- cleared by each hart (start.S)
  ++ 0x80fc000
  |  hart 1 stack  |.
  ++ 0x80f8000  .
  |  ..|.
  ++
  |  hart N-1 stack|   <--- cleared by each hart (start.S)
  ++ 0x80e
  |  ..|   <--- cleared by lottery winner hart
  |  malloc_base   |  board_init_f_init_reserve()
  ++ 0x80d
  |  GD|   <--- cleared by lottery winner hart
  ++  board_init_f_init_reserve()
  ||
  |  hole  |
  ||
  ++
  |  BSS   |   <--- cleared by lottery winner hart
  ++ 0x804spl_clear_bss (start.S)
  |  hole  |
  ++
  |  Image+DTB |   <--- Assuming cleared/loaded by ROM
  ++ 0x800

Signed-off-by: Bo Gan 
Cc: samin . guo 
Cc: Yanhong Wang 
Cc: Rick Chen 
Cc: Leo 
Cc: Sean Anderson 
Cc: Lukasz Majewski 
---

v1:
- patch is on top of 
https://patchwork.ozlabs.org/project/uboot/patch/1684650044-313122-1-git-send-email-ganbo...@gmail.com/
- Tested on VisionFive 2 board (4G)
---
 Kconfig| 11 +++
 arch/riscv/Kconfig |  7 +++
 arch/riscv/cpu/jh7110/Kconfig  |  2 ++
 arch/riscv/cpu/jh7110/spl.c| 25 -
 arch/riscv/cpu/start.S | 12 
 common/dlmalloc.c  |  6 +++---
 common/init/board_init.c   |  3 +++
 configs/starfive_visionfive2_defconfig |  3 ++-
 8 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/Kconfig b/Kconfig
index 70efb41..e5eec1b 100644
--- a/Kconfig
+++ b/Kconfig
@@ -372,6 +372,17 @@ if EXPERT
  When disabling this, please check if malloc calls, maybe
  should be replaced by calloc - if one expects zeroed memory.
 
+   config SPL_SYS_MALLOC_CLEAR_ON_INIT
+   bool "Init with zeros the memory reserved for malloc (slow) in SPL"
+   depends on SPL
+   default SYS_MALLOC_CLEAR_ON_INIT
+   help
+ Same as SYS_MALLOC_CLEAR_ON_INIT, but for SPL. It's possible to
+ Enable it without SYS_MALLOC_CLEAR_ON_INIT. It's useful for boards
+ that must have particular memory regions zero'ed before first use.
+ If SYS_SPL_MALLOC_START is configured to be in such region, this
+ option