Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-06 Thread Richard Henderson

On 3/5/24 21:38, maobibo wrote:

Sorry, the manual has already been updated and we had not noticed.

https://www.loongson.cn/uploads/images/2023102309132647981.%E9%BE%99%E8%8A%AF%E6%9E%B6%E6%9E%84%E5%8F%82%E8%80%83%E6%89%8B%E5%86%8C%E5%8D%B7%E4%B8%80_r1p10.pdf

The link is to the Chinese manual; the English manual has not been updated yet. Here is an 
English translation of the manual text for the instruction "lddir rd, rj, level":

If bit[14:13] of register rj is not equal to 0 and bit[6] is 1, the value of register rj 
is a page table entry already marked as a huge page. In this case, the value of register 
rj is written directly to register rd.

If bit[14:13] of register rj is equal to 0 and bit[6] is 1, the value of register rj is a 
huge page table entry. In this case, bit[14:13] of the rj value is replaced with 
level[1:0], and the result is written to register rd.

If bit[6] of register rj is 0, the value of general register rj is the physical base 
address of a page table. In this case, the LDDIR instruction accesses that page table 
according to the address currently being handled by TLB refill, retrieves the base address 
of the next-level page table, and writes it to general register rd.

We will remove the temporary lddir_ps and record the page size using bit[14:13] in the 
next version.


Excellent, thank you for that translation.


r~



Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-05 Thread maobibo

Sorry, the manual has already been updated and we had not noticed.

https://www.loongson.cn/uploads/images/2023102309132647981.%E9%BE%99%E8%8A%AF%E6%9E%B6%E6%9E%84%E5%8F%82%E8%80%83%E6%89%8B%E5%86%8C%E5%8D%B7%E4%B8%80_r1p10.pdf

The link is to the Chinese manual; the English manual has not been 
updated yet. Here is an English translation of the manual text for the 
instruction "lddir rd, rj, level":

If bit[14:13] of register rj is not equal to 0 and bit[6] is 1, the 
value of register rj is a page table entry already marked as a huge 
page. In this case, the value of register rj is written directly to 
register rd.

If bit[14:13] of register rj is equal to 0 and bit[6] is 1, the value 
of register rj is a huge page table entry. In this case, bit[14:13] of 
the rj value is replaced with level[1:0], and the result is written to 
register rd.

If bit[6] of register rj is 0, the value of general register rj is the 
physical base address of a page table. In this case, the LDDIR 
instruction accesses that page table according to the address currently 
being handled by TLB refill, retrieves the base address of the 
next-level page table, and writes it to general register rd.

We will remove the temporary lddir_ps and record the page size using 
bit[14:13] in the next version.
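
As a rough illustration of the three cases in the translation above, here 
is a minimal C sketch. It is not the QEMU implementation: lddir_sketch, 
walk_fn and the macro names are hypothetical, and the actual page table 
walk for the bit[6] == 0 case is left to a caller-supplied callback.

```c
#include <stdint.h>

#define HUGE_BIT    6                           /* bit[6]: huge page flag */
#define LEVEL_SHIFT 13                          /* bit[14:13]: level field */
#define LEVEL_MASK  (UINT64_C(3) << LEVEL_SHIFT)

/* Hypothetical callback that performs the real directory lookup. */
typedef uint64_t (*walk_fn)(uint64_t table_base, unsigned level);

static uint64_t lddir_sketch(uint64_t rj, unsigned level, walk_fn walk)
{
    if ((rj >> HUGE_BIT) & 1) {
        if (rj & LEVEL_MASK) {
            /* Case 1: already marked as a huge page entry, pass through. */
            return rj;
        }
        /* Case 2: huge page table entry, record level[1:0] in bit[14:13]. */
        return (rj & ~LEVEL_MASK) | ((uint64_t)(level & 3) << LEVEL_SHIFT);
    }
    /* Case 3: rj is a page table base, fetch the next-level base address. */
    return walk(rj, level);
}
```

With this encoding the level travels inside the entry itself, so a later 
consumer of the entry would not need any state saved outside the register.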


Regards
Bibo Mao

On 2024/3/6 12:10 PM, Richard Henderson wrote:

On 3/5/24 17:52, lixianglai wrote:
The LDDIR_PS variable is not described in detail in the manual, but is 
only an intermediate variable to assist in page size calculation 
during tcg simulation.


This is exactly why I believe adding this intermediate variable is wrong.

What happens if LDPTE is *not* preceded by LDDIR?  It's not the usual 
way a tlb fill routine works, but *something* should happen if you 
construct a valid huge page tlb entry by hand and pass it directly to 
LDPTE.


With your implementation, this will not work because lddir_ps will not 
be initialized. But I expect that on real hardware it would work.


If this does not work on real hardware, then there *is* some heretofore 
undocumented hardware state.  If so, then we need a description of this 
state from the hardware engineers -- the documentation of LDDIR and 
LDPTE needs updating.  Finally, this new hardware state needs to be added 
to the migration state.



r~





Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-05 Thread Richard Henderson

On 3/5/24 17:52, lixianglai wrote:
The LDDIR_PS variable is not described in detail in the manual, but is only an 
intermediate variable to assist in page size calculation during tcg simulation.


This is exactly why I believe adding this intermediate variable is wrong.

What happens if LDPTE is *not* preceded by LDDIR?  It's not the usual way a tlb fill 
routine works, but *something* should happen if you construct a valid huge page tlb entry 
by hand and pass it directly to LDPTE.


With your implementation, this will not work because lddir_ps will not be initialized. 
But I expect that on real hardware it would work.


If this does not work on real hardware, then there *is* some heretofore undocumented 
hardware state.  If so, then we need a description of this state from the hardware 
engineers -- the documentation of LDDIR and LDPTE needs updating.  Finally, this new 
hardware state needs to be added to the migration state.
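
To make the concern concrete, here is a toy C model of the dependency. It 
is only a sketch: toy_cpu and toy_ldpte_page_size are hypothetical names, 
not QEMU code, and the model assumes that a value of 0 means no LDDIR has 
recorded a size since the last TLB refill exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the hidden state that the patch adds to the CPU. */
struct toy_cpu {
    uint32_t lddir_ps;   /* 0: nothing recorded by a preceding LDDIR */
};

/*
 * Model of LDPTE deriving the page size of one half of a huge page.
 * Returns false when no size was recorded, which is the situation where
 * a hand-built huge page entry is passed straight to LDPTE.
 */
static bool toy_ldpte_page_size(const struct toy_cpu *cpu, uint32_t *ps_out)
{
    if (cpu->lddir_ps == 0) {
        return false;
    }
    *ps_out = cpu->lddir_ps - 1;
    return true;
}
```

In this model the outcome of LDPTE depends on history that is not visible 
in its operands, which is the kind of state that would have to be both 
documented and migrated.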



r~



Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-05 Thread lixianglai

Hi Richard:

On 3/4/24 20:21, lixianglai wrote:

Hi Richard:

On 3/4/24 17:51, Xianglai Li wrote:

When we use qemu tcg simulation, the page size of bios is 4KB.
When using the level 2 super large page (page size is 1G) to create 
the page table,
it is found that the content of the corresponding address space is 
abnormal,
resulting in the bios can not start the operating system and 
graphical interface normally.


The lddir and ldpte instruction emulation has
a problem with the use of super large page processing above level 2.
The page size is not correctly calculated,
resulting in the wrong page size of the table entry found by tlb.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h    |  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
    uint32_t cpucfg[21];
+    uint32_t lddir_ps;


This magical cpu state does not appear in the manual.


The hardware instruction manual is hosted on github at

https://github.com/loongson/LoongArch-Documentation

Are you sure that large pages above level 2 are really supported by 
LDDIR?



Yes,We have done tests on the physical cpu of loongarch64 and

it works fine with a level 2 large page on the physical cpu.




Some explanation from the hardware engineering side is required.


The description of lddir in the hardware manual is as follows:

Instruction format:

lddir rd, rj, level

The LDDIR instruction is used for accessing directory entries during 
software page table walking.

If bit [6] of the general register rj is 0, it means that the content 
of rj is the physical address of the base address of the level page 
table at this time. In this case, the LDDIR instruction will access the 
level page table according to the current TLB refill address, retrieve 
the base address of the corresponding level+1 page table, and write it 
to the general register rd.

reference:

https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

  4.2.5.1. LDDIR


Yes, I have this manual.  Please highlight the portion of this 
description that corresponds to the LDDIR_PS variable that you add.




Sorry, I don't think I quite understand what you mean.

Do you want me to point out the detailed description of LDDIR_PS in the 
manual, or are you suggesting that I add a corresponding comment to the 
LDDIR_PS variable in the patch?

I think the description I quoted is missing a key part:

If bit [6] of general register rj is 1, it means that the content in rj 
is a large page (Huge Page) page table entry. In this case, after 
executing the LDDIR instruction, the value in the general register rj 
will be written directly to the general register rd.

The LDDIR_PS variable is not described in detail in the manual; it is 
only an intermediate variable that assists the page size calculation 
during TCG emulation.

However, section 5.4.2 (TLB) explains that the TLB is divided into STLB 
and MTLB. The PS field in the MTLB has the same meaning as the LDDIR_PS 
variable we defined.

Since each TLB entry is split into an odd/even pair of pages, when the 
TLB entry is generated each page of the pair covers half of the huge 
page, that is, its page size is LDDIR_PS-1.

Note that all huge-page TLB entries are placed in the MTLB, because the 
PS field is only meaningful in the MTLB.
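
The halving described above can be sketched as follows. This is an 
illustration only, with hypothetical names (tlb_pair, split_huge_page); it 
assumes LDDIR_PS holds log2 of the huge page size, so each half of the 
odd/even pair has a page size of LDDIR_PS-1.

```c
#include <assert.h>
#include <stdint.h>

/* One TLB entry maps an even page and an odd page of equal size. */
struct tlb_pair {
    unsigned ps;        /* log2 size of each half */
    uint64_t even_pa;   /* physical address of the even half */
    uint64_t odd_pa;    /* physical address of the odd half */
};

static struct tlb_pair split_huge_page(uint64_t huge_pa, unsigned lddir_ps)
{
    struct tlb_pair p;

    assert(lddir_ps >= 1 && lddir_ps <= 64);
    p.ps = lddir_ps - 1;
    p.even_pa = huge_pa;
    /* The odd half starts one half-size above the even half. */
    p.odd_pa = huge_pa + (UINT64_C(1) << p.ps);
    return p;
}
```

For example, a 1G huge page (LDDIR_PS = 30) at physical address 0x40000000 
would be split into two 512M halves at 0x40000000 and 0x60000000.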



Thanks,

Xianglai.






r~


Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-05 Thread Richard Henderson

On 3/4/24 20:21, lixianglai wrote:

Hi Richard:

On 3/4/24 17:51, Xianglai Li wrote:

When we use qemu tcg simulation, the page size of bios is 4KB.
When using the level 2 super large page (page size is 1G) to create the page 
table,
it is found that the content of the corresponding address space is abnormal,
resulting in the bios can not start the operating system and graphical 
interface normally.

The lddir and ldpte instruction emulation has
a problem with the use of super large page processing above level 2.
The page size is not correctly calculated,
resulting in the wrong page size of the table entry found by tlb.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h    |  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
    uint32_t cpucfg[21];
+    uint32_t lddir_ps;


This magical cpu state does not appear in the manual.


The hardware instruction manual is hosted on github at

https://github.com/loongson/LoongArch-Documentation


Are you sure that large pages above level 2 are really supported by LDDIR?



Yes,We have done tests on the physical cpu of loongarch64 and

it works fine with a level 2 large page on the physical cpu.




Some explanation from the hardware engineering side is required.


The description of lddir in the hardware manual is as follows:

Instruction format:

lddir rd, rj, level

The LDDIR instruction is used for accessing directory entries during software page table 
walking.

If bit [6] of the general register rj is 0, it means that the content of rj is the 
physical address of the base address of the level page table at this time. In this case, 
the LDDIR instruction will access the level page table according to the current TLB refill 
address, retrieve the base address of the corresponding level+1 page table, and write it 
to the general register rd.

reference:

https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

  4.2.5.1. LDDIR


Yes, I have this manual.  Please highlight the portion of this description that 
corresponds to the LDDIR_PS variable that you add.



r~



Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread lixianglai

Hi Richard:

On 3/4/24 17:51, Xianglai Li wrote:

When we use qemu tcg simulation, the page size of bios is 4KB.
When using the level 2 super large page (page size is 1G) to create 
the page table,
it is found that the content of the corresponding address space is 
abnormal,
resulting in the bios can not start the operating system and 
graphical interface normally.


The lddir and ldpte instruction emulation has
a problem with the use of super large page processing above level 2.
The page size is not correctly calculated,
resulting in the wrong page size of the table entry found by tlb.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h    |  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
    uint32_t cpucfg[21];
+    uint32_t lddir_ps;


This magical cpu state does not appear in the manual.


The hardware instruction manual is hosted on github at

https://github.com/loongson/LoongArch-Documentation

Are you sure that large pages above level 2 are really supported by 
LDDIR?



Yes, we have tested this on a physical loongarch64 CPU, and a level 2 
large page works fine there.




Some explanation from the hardware engineering side is required.


The description of lddir in the hardware manual is as follows:

Instruction format:

lddir rd, rj, level

The LDDIR instruction is used for accessing directory entries during 
software page table walking.

If bit [6] of the general register rj is 0, it means that the content 
of rj is the physical address of the base address of the level page 
table at this time. In this case, the LDDIR instruction will access the 
level page table according to the current TLB refill address, retrieve 
the base address of the corresponding level+1 page table, and write it 
to the general register rd.

reference:

https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

 4.2.5.1. LDDIR

Thanks,

Xianglai.




r~


Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread Richard Henderson

On 3/4/24 17:51, Xianglai Li wrote:

When we use qemu tcg simulation, the page size of bios is 4KB.
When using the level 2 super large page (page size is 1G) to create the page 
table,
it is found that the content of the corresponding address space is abnormal,
resulting in the bios can not start the operating system and graphical 
interface normally.

The lddir and ldpte instruction emulation has
a problem with the use of super large page processing above level 2.
The page size is not correctly calculated,
resulting in the wrong page size of the table entry found by tlb.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h|  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
  
  uint32_t cpucfg[21];

+uint32_t lddir_ps;


This magical cpu state does not appear in the manual.
Are you sure that large pages above level 2 are really supported by LDDIR?

Some explanation from the hardware engineering side is required.


r~



[PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread Xianglai Li
Under QEMU TCG emulation, the BIOS uses a page size of 4KB.
When the page table is created with level 2 huge pages (page size 1G),
the content of the corresponding address space is wrong,
so the BIOS cannot start the operating system and the graphical
interface normally.

The lddir and ldpte instruction emulation mishandles huge pages
at level 2 or above.
The page size is calculated incorrectly,
so the TLB entry that is filled in has the wrong page size.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
 target/loongarch/cpu.h            |  1 +
 target/loongarch/tcg/tlb_helper.c | 21 ++++++++++++---------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
     uint32_t fcsr0_mask;
 
     uint32_t cpucfg[21];
+    uint32_t lddir_ps;
 
     uint64_t lladdr; /* LL virtual address compared against SC */
     uint64_t llval;
diff --git a/target/loongarch/tcg/tlb_helper.c b/target/loongarch/tcg/tlb_helper.c
index a08c08b05a..3594c800b3 100644
--- a/target/loongarch/tcg/tlb_helper.c
+++ b/target/loongarch/tcg/tlb_helper.c
@@ -38,6 +38,7 @@ static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
             cs->exception_index = EXCCODE_PIF;
         }
         env->CSR_TLBRERA = FIELD_DP64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR, 1);
+        env->lddir_ps = 0;
         break;
     case TLBRET_INVALID:
         /* TLB match with no valid bit */
@@ -488,13 +489,6 @@ target_ulong helper_lddir(CPULoongArchState *env, target_ulong base,
     uint64_t dir_base, dir_width;
     bool huge = (base >> LOONGARCH_PAGE_HUGE_SHIFT) & 0x1;
 
-    badvaddr = env->CSR_TLBRBADV;
-    base = base & TARGET_PHYS_MASK;
-
-    /* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
-    shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
-    shift = (shift + 1) * 3;
-
     if (huge) {
         return base;
     }
@@ -519,9 +513,18 @@ target_ulong helper_lddir(CPULoongArchState *env, target_ulong base,
         do_raise_exception(env, EXCCODE_INE, GETPC());
         return 0;
     }
+
+    /* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
+    shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
+    shift = (shift + 1) * 3;
+    badvaddr = env->CSR_TLBRBADV;
+    base = base & TARGET_PHYS_MASK;
     index = (badvaddr >> dir_base) & ((1 << dir_width) - 1);
     phys = base | index << shift;
     ret = ldq_phys(cs->as, phys) & TARGET_PHYS_MASK;
+    if (ret & BIT_ULL(LOONGARCH_PAGE_HUGE_SHIFT)) {
+        env->lddir_ps = dir_base;
+    }
     return ret;
 }
 
@@ -538,13 +541,13 @@ void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
     base = base & TARGET_PHYS_MASK;
 
     if (huge) {
-        /* Huge Page. base is paddr */
         tmp0 = base ^ (1 << LOONGARCH_PAGE_HUGE_SHIFT);
         /* Move Global bit */
         tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT))  >>
                 LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
                 (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
-        ps = ptbase + ptwidth - 1;
+
+        ps = env->lddir_ps - 1;
         if (odd) {
             tmp0 += MAKE_64BIT_MASK(ps, 1);
         }
-- 
2.39.1
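
For readers following the helper_lddir hunk, the directory lookup address 
it computes can be reproduced standalone. The sketch below is not part of 
the patch: dir_entry_addr is a hypothetical helper, and the example values 
(directory base 30, width 9, 8-byte entries) are assumptions chosen to 
match a 4KB page configuration with a 1G level 2 directory.

```c
#include <stdint.h>

/*
 * Mirror of the index/phys computation in helper_lddir:
 *   index = (badvaddr >> dir_base) & ((1 << dir_width) - 1);
 *   phys  = base | index << shift;
 * shift is log2 of the directory entry size in bytes.
 */
static uint64_t dir_entry_addr(uint64_t base, uint64_t badvaddr,
                               unsigned dir_base, unsigned dir_width,
                               unsigned shift)
{
    uint64_t mask = (UINT64_C(1) << dir_width) - 1;
    uint64_t index = (badvaddr >> dir_base) & mask;

    return base | (index << shift);
}
```

With a table at 0x10000 and a faulting address of 0xC0000000, the index is 
3 and the entry is read from 0x10018. If the entry read there has the huge 
page bit set, the patch records dir_base (30 here) in lddir_ps.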