Re: qemu riscv, thead c906, Linux boot regression

2024-01-25 Thread LIU Zhiwei



On 2024/1/24 20:49, Björn Töpel wrote:

Hi!

I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
("target/riscv/cpu.c: restrict 'marchid' value")

Reverting that commit, or the hack below solves the boot issue:

--8<--
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..e18596c8a55a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  cpu->cfg.ext_xtheadsync = true;
  
  cpu->cfg.mvendorid = THEAD_VENDOR_ID;

+cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
+(QEMU_VERSION_MINOR << 8)  |
+(QEMU_VERSION_MICRO));
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
  #endif
--8<--

I'm unsure what the correct qemu way of adding a default value is,
or if c906 should have a proper marchid.

Maybe Christoph or Zhiwei can answer?

qemu command-line:
qemu-system-riscv64 -nodefaults -nographic -machine virt,acpi=off \
-cpu thead-c906 ...


Hi  Björn,

I think it is caused by an mmu extension(named XTheadMaee) not 
implemented on QEMU which is conflicts with Svpbmt, which is the reason 
for error-ta in Linux.


I will try to fix this on QEMU and at the same time give a way to 
implement vendor custom CSR(XTheadMaee is controlled by an CSR named 
mexstatus)  on QEMU.


Thanks,
Zhiwei



Thanks,
Björn




Re: qemu riscv, thead c906, Linux boot regression

2024-01-25 Thread Björn Töpel
Daniel Henrique Barboza  writes:

> On 1/24/24 16:26, Björn Töpel wrote:
>> Daniel Henrique Barboza  writes:
>> 
>>> On 1/24/24 09:49, Björn Töpel wrote:
 Hi!

 I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
 thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
 ("target/riscv/cpu.c: restrict 'marchid' value")

 Reverting that commit, or the hack below solves the boot issue:

 --8<--
 diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
 index 8cbfc7e781ad..e18596c8a55a 100644
 --- a/target/riscv/cpu.c
 +++ b/target/riscv/cpu.c
 @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
cpu->cfg.ext_xtheadsync = true;

cpu->cfg.mvendorid = THEAD_VENDOR_ID;
 +cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
 +(QEMU_VERSION_MINOR << 8)  |
 +(QEMU_VERSION_MICRO));
#ifndef CONFIG_USER_ONLY
set_satp_mode_max_supported(cpu, VM_1_10_SV39);
#endif
 --8<--

 I'm unsure what the correct qemu way of adding a default value is,
 or if c906 should have a proper marchid.
>>>
>>> In case you need to set a 'marchid' different than zero for c906, this hack 
>>> would
>>> be a proper fix. As mentioned in the commit msg of the patch you mentioned:
>>>
>>> "Named CPUs should set 'marchid' to a meaningful value instead, and generic
>>>CPUs can set to any valid value."
>>>
>>> That means that any specific marchid value that the CPU uses must to be set
>>> in its own cpu_init() function.
>> 
>> Got it. Thanks, Daniel!
>> 
>> For completeness (since it came up on the weekly PW call); Conor pointed
>> out that zero *is* indeed the right marchid for c906, and in fact, the
>> non-zero marchid pre commit d6a427e2c0b2 was incorrect.
>> 
>> Post commit d6a427e2c0b2, the correct alternative is picked up, and
>> ERRATA_THEAD_PBMT (using non-standard memory type bits in
>> page-table-entries) kicks in. AFAIU, that's not implemented by qemu's
>> c906 support, which then traps.
>
>
> This looks like a very good reason to actually push what you called 'hack' as
> a fix. Yeah, in theory that commit did nothing wrong, but the side effect
> (missing support for non-standard memory type bits) is kind of a QEMU problem.

For me, it'd be weird to add the hack (setting marchid to non-zero).
Claiming that it's a "thead-c906 emulation" in qemu, but w/o the proper
page-bit support. That's just cpu rv64 plus some extra instructions --
not the c906.


Björn



Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Daniel Henrique Barboza




On 1/24/24 16:26, Björn Töpel wrote:

Daniel Henrique Barboza  writes:


On 1/24/24 09:49, Björn Töpel wrote:

Hi!

I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
("target/riscv/cpu.c: restrict 'marchid' value")

Reverting that commit, or the hack below solves the boot issue:

--8<--
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..e18596c8a55a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
   cpu->cfg.ext_xtheadsync = true;
   
   cpu->cfg.mvendorid = THEAD_VENDOR_ID;

+cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
+(QEMU_VERSION_MINOR << 8)  |
+(QEMU_VERSION_MICRO));
   #ifndef CONFIG_USER_ONLY
   set_satp_mode_max_supported(cpu, VM_1_10_SV39);
   #endif
--8<--

I'm unsure what the correct qemu way of adding a default value is,
or if c906 should have a proper marchid.


In case you need to set a 'marchid' different than zero for c906, this hack 
would
be a proper fix. As mentioned in the commit msg of the patch you mentioned:

"Named CPUs should set 'marchid' to a meaningful value instead, and generic
   CPUs can set to any valid value."

That means that any specific marchid value that the CPU uses must to be set
in its own cpu_init() function.


Got it. Thanks, Daniel!

For completeness (since it came up on the weekly PW call); Conor pointed
out that zero *is* indeed the right marchid for c906, and in fact, the
non-zero marchid pre commit d6a427e2c0b2 was incorrect.

Post commit d6a427e2c0b2, the correct alternative is picked up, and
ERRATA_THEAD_PBMT (using non-standard memory type bits in
page-table-entries) kicks in. AFAIU, that's not implemented by qemu's
c906 support, which then traps.



This looks like a very good reason to actually push what you called 'hack' as
a fix. Yeah, in theory that commit did nothing wrong, but the side effect
(missing support for non-standard memory type bits) is kind of a QEMU problem.

You're welcome to format that hack into a patch, explaining in the commit msg 
why
we need to set marchid for c906 to that specific value. I'd even add a TODO
tag in rv64_thead_c906_cpu_init() to remind us that this is a band-aid and
that we should remove it once we implement the needed support.





That's the theory. Maybe Christoph knows if the non-standard bits are
implemented or not?

Regardless; I removed booting Qemu T-head c906 from the CI, and the
build/boot passes nicely ;-) [1]


I vote for setting marchid in c906 cpu_init and re-enable it in the CI.


Thanks,

Daniel




Björn

[1] 
https://github.com/linux-riscv/linux-riscv/actions/runs/7641764759/job/20819801235?pr=447




Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Björn Töpel
Daniel Henrique Barboza  writes:

> On 1/24/24 09:49, Björn Töpel wrote:
>> Hi!
>> 
>> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
>> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
>> ("target/riscv/cpu.c: restrict 'marchid' value")
>> 
>> Reverting that commit, or the hack below solves the boot issue:
>> 
>> --8<--
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 8cbfc7e781ad..e18596c8a55a 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
>>   cpu->cfg.ext_xtheadsync = true;
>>   
>>   cpu->cfg.mvendorid = THEAD_VENDOR_ID;
>> +cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
>> +(QEMU_VERSION_MINOR << 8)  |
>> +(QEMU_VERSION_MICRO));
>>   #ifndef CONFIG_USER_ONLY
>>   set_satp_mode_max_supported(cpu, VM_1_10_SV39);
>>   #endif
>> --8<--
>> 
>> I'm unsure what the correct qemu way of adding a default value is,
>> or if c906 should have a proper marchid.
>
> In case you need to set a 'marchid' different than zero for c906, this hack 
> would
> be a proper fix. As mentioned in the commit msg of the patch you mentioned:
>
> "Named CPUs should set 'marchid' to a meaningful value instead, and generic
>   CPUs can set to any valid value."
>
> That means that any specific marchid value that the CPU uses must to be set
> in its own cpu_init() function.

Got it. Thanks, Daniel!

For completeness (since it came up on the weekly PW call); Conor pointed
out that zero *is* indeed the right marchid for c906, and in fact, the
non-zero marchid pre commit d6a427e2c0b2 was incorrect.

Post commit d6a427e2c0b2, the correct alternative is picked up, and
ERRATA_THEAD_PBMT (using non-standard memory type bits in
page-table-entries) kicks in. AFAIU, that's not implemented by qemu's
c906 support, which then traps.

That's the theory. Maybe Christoph knows if the non-standard bits are
implemented or not?

Regardless; I removed booting Qemu T-head c906 from the CI, and the
build/boot passes nicely ;-) [1]


Björn

[1] 
https://github.com/linux-riscv/linux-riscv/actions/runs/7641764759/job/20819801235?pr=447



Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Conor Dooley
On Wed, Jan 24, 2024 at 02:27:10PM +0100, Björn Töpel wrote:
> Conor Dooley  writes:
> 
> > On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
> >> Hi!
> >> 
> >> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
> >> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
> >> ("target/riscv/cpu.c: restrict 'marchid' value")
> >> 
> >> Reverting that commit, or the hack below solves the boot issue:
> >> 
> >> --8<--
> >> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> >> index 8cbfc7e781ad..e18596c8a55a 100644
> >> --- a/target/riscv/cpu.c
> >> +++ b/target/riscv/cpu.c
> >> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
> >>  cpu->cfg.ext_xtheadsync = true;
> >>  
> >>  cpu->cfg.mvendorid = THEAD_VENDOR_ID;
> >> +cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
> >> +(QEMU_VERSION_MINOR << 8)  |
> >> +(QEMU_VERSION_MICRO));
> >>  #ifndef CONFIG_USER_ONLY
> >>  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
> >>  #endif
> >> --8<--
> >> 
> >> I'm unsure what the correct qemu way of adding a default value is,
> >> or if c906 should have a proper marchid.
> >
> > The "correct" marchid/mimpid values for the c906 are zero.
> 
> Ok! Thanks for clearing that up for me.
> 
> > I haven't looked into the code at all, so I am "assuming" that it is
> > being zero intialised at present. Linux applies the errata fixups for
> > the c906 when archid and impid are both zero - so your patch will avoid
> > these fixups being applied.
> 
> I'm also assuming 0, -- will double-check. Hmm, that means that the
> *previous* marchid was incorrect (pre d6a427e2c0b2).
> 
> > Do you think that perhaps the emulation in QEMU does not support what
> > the kernel uses once then errata fixups are enabled?
> 
> Did a quick look at the c906 "in_asm,int" logs:
> 
> | 0x80201040:  1273  sfence.vma  zero,zero
> | 0x80201044:  18051073  csrrw   zero,satp,a0
> | 
> | riscv_cpu_do_interrupt: hart:0, async:0, cause:000c, 
> epc:0x80201048, tval:0x80201048, desc=exec_page_fault
> | riscv_cpu_do_interrupt: hart:0, async:0, cause:000c, 
> epc:0x80001048, tval:0x80001048, desc=exec_page_fault
> | ...cont forever
> 
> So it looks like we're tripping over the page tables, when we're turning
> on paging.
> 
> Hmm, maybe it's not qemu, but the c906 that has been broken for a while?

I didn't know what you mean by "not qemu, but the c906", so I went and
boot tested my d1 nezha. On today's next (6.8.0-rc1-next-20240124) it
booted into my initramfs with no problems. Obivously though my config is
unlikely to match yours, but that seems like a core thing that should be
hit regardless of config.
So perhaps this is a c906-in-QEMU problem? Lacking emulation for
something the kernel uses perhaps? I know nothing about the capabilities
of its emulation in QEMU, so I am of no help.

Cheers,
Conor.

> 
> I'll disable it temporarily from CI anyhow, and will continue digging.
> 
> 
> Thanks for the pointers/clarifications, Conor!
> Björn
> 
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


signature.asc
Description: PGP signature


Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Daniel Henrique Barboza




On 1/24/24 09:49, Björn Töpel wrote:

Hi!

I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
("target/riscv/cpu.c: restrict 'marchid' value")

Reverting that commit, or the hack below solves the boot issue:

--8<--
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..e18596c8a55a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  cpu->cfg.ext_xtheadsync = true;
  
  cpu->cfg.mvendorid = THEAD_VENDOR_ID;

+cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
+(QEMU_VERSION_MINOR << 8)  |
+(QEMU_VERSION_MICRO));
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
  #endif
--8<--

I'm unsure what the correct qemu way of adding a default value is,
or if c906 should have a proper marchid.


In case you need to set a 'marchid' different than zero for c906, this hack 
would
be a proper fix. As mentioned in the commit msg of the patch you mentioned:

"Named CPUs should set 'marchid' to a meaningful value instead, and generic
 CPUs can set to any valid value."

That means that any specific marchid value that the CPU uses must to be set
in its own cpu_init() function.


Thanks,

Daniel




Maybe Christoph or Zhiwei can answer?

qemu command-line:
qemu-system-riscv64 -nodefaults -nographic -machine virt,acpi=off \
-cpu thead-c906 ...


Thanks,
Björn




Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Björn Töpel
Conor Dooley  writes:

> On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
>> Hi!
>> 
>> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
>> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
>> ("target/riscv/cpu.c: restrict 'marchid' value")
>> 
>> Reverting that commit, or the hack below solves the boot issue:
>> 
>> --8<--
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 8cbfc7e781ad..e18596c8a55a 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
>>  cpu->cfg.ext_xtheadsync = true;
>>  
>>  cpu->cfg.mvendorid = THEAD_VENDOR_ID;
>> +cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
>> +(QEMU_VERSION_MINOR << 8)  |
>> +(QEMU_VERSION_MICRO));
>>  #ifndef CONFIG_USER_ONLY
>>  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
>>  #endif
>> --8<--
>> 
>> I'm unsure what the correct qemu way of adding a default value is,
>> or if c906 should have a proper marchid.
>
> The "correct" marchid/mimpid values for the c906 are zero.

Ok! Thanks for clearing that up for me.

> I haven't looked into the code at all, so I am "assuming" that it is
> being zero intialised at present. Linux applies the errata fixups for
> the c906 when archid and impid are both zero - so your patch will avoid
> these fixups being applied.

I'm also assuming 0, -- will double-check. Hmm, that means that the
*previous* marchid was incorrect (pre d6a427e2c0b2).

> Do you think that perhaps the emulation in QEMU does not support what
> the kernel uses once then errata fixups are enabled?

Did a quick look at the c906 "in_asm,int" logs:

| 0x80201040:  1273  sfence.vma  zero,zero
| 0x80201044:  18051073  csrrw   zero,satp,a0
| 
| riscv_cpu_do_interrupt: hart:0, async:0, cause:000c, 
epc:0x80201048, tval:0x80201048, desc=exec_page_fault
| riscv_cpu_do_interrupt: hart:0, async:0, cause:000c, 
epc:0x80001048, tval:0x80001048, desc=exec_page_fault
| ...cont forever

So it looks like we're tripping over the page tables, when we're turning
on paging.

Hmm, maybe it's not qemu, but the c906 that has been broken for a while?

I'll disable it temporarily from CI anyhow, and will continue digging.


Thanks for the pointers/clarifications, Conor!
Björn



Re: qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Conor Dooley
On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
> Hi!
> 
> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
> ("target/riscv/cpu.c: restrict 'marchid' value")
> 
> Reverting that commit, or the hack below solves the boot issue:
> 
> --8<--
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 8cbfc7e781ad..e18596c8a55a 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
>  cpu->cfg.ext_xtheadsync = true;
>  
>  cpu->cfg.mvendorid = THEAD_VENDOR_ID;
> +cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
> +(QEMU_VERSION_MINOR << 8)  |
> +(QEMU_VERSION_MICRO));
>  #ifndef CONFIG_USER_ONLY
>  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
>  #endif
> --8<--
> 
> I'm unsure what the correct qemu way of adding a default value is,
> or if c906 should have a proper marchid.

The "correct" marchid/mimpid values for the c906 are zero.

I haven't looked into the code at all, so I am "assuming" that it is
being zero intialised at present. Linux applies the errata fixups for
the c906 when archid and impid are both zero - so your patch will avoid
these fixups being applied.
Do you think that perhaps the emulation in QEMU does not support what
the kernel uses once then errata fixups are enabled?

> 
> Maybe Christoph or Zhiwei can answer?
> 
> qemu command-line:
> qemu-system-riscv64 -nodefaults -nographic -machine virt,acpi=off \
>-cpu thead-c906 ...
> 
> 
> Thanks,
> Björn
> 


signature.asc
Description: PGP signature


qemu riscv, thead c906, Linux boot regression

2024-01-24 Thread Björn Töpel
Hi!

I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
("target/riscv/cpu.c: restrict 'marchid' value")

Reverting that commit, or the hack below solves the boot issue:

--8<--
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..e18596c8a55a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 cpu->cfg.ext_xtheadsync = true;
 
 cpu->cfg.mvendorid = THEAD_VENDOR_ID;
+cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
+(QEMU_VERSION_MINOR << 8)  |
+(QEMU_VERSION_MICRO));
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_SV39);
 #endif
--8<--

I'm unsure what the correct qemu way of adding a default value is,
or if c906 should have a proper marchid.

Maybe Christoph or Zhiwei can answer?

qemu command-line:
qemu-system-riscv64 -nodefaults -nographic -machine virt,acpi=off \
   -cpu thead-c906 ...


Thanks,
Björn