Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-05 Thread S V
чт, 5 окт. 2023 г., 22:17 Mark Kettenis :

> > > Really, if those secondary CPUs don't come up, your system is beyond
> > > repair.  You need to do some low-level debugging at that point and DDB
> > > isn't going to help you.  So no, let's keep this code as simple as we
> > > can.
> > >
> >
> > This cores are starting and working properly,
> > but something messes with order
> > (maybe related to shared caches per 2 cores cluster, I'm not so
> > hardware-good ).
> > So 7th core will answer that it booted after 6th core.
> > But with 'while' loop we never got to 6th core.
>
> That suggests that the interrupt that we think we're sending to the
> 6th core is actually going to the 7th core and vice-versa.  Feels like
> a firmware bug to me.  What hardware is this on?
>

This is different Baikal-M (arm64, cortex-a57) based boards with more
or less same firmware of different versions.
I can attach dtb and decompiled dts from one of it, if it can help.



> > > > Index: sys/arch/arm64//arm64/cpu.c
> > > > ===
> > > > RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
> > > > retrieving revision 1.98
> > > > diff -u -p -r1.98 cpu.c
> > > > --- sys/arch/arm64//arm64/cpu.c   10 Aug 2023 19:29:32 -  
> > > > 1.98
> > > > +++ sys/arch/arm64//arm64/cpu.c   25 Sep 2023 13:24:39 -
> > > > @@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
> > > >  void
> > > >  cpu_boot_secondary(struct cpu_info *ci)
> > > >  {
> > > > + int i;
> > > > +
> > > >   atomic_setbits_int(>ci_flags, CPUF_GO);
> > > >   __asm volatile("dsb sy; sev" ::: "memory");
> > > >
> > > > @@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
> > > >*/
> > > >   arm_send_ipi(ci, ARM_IPI_NOP);
> > > >
> > > > - while ((ci->ci_flags & CPUF_RUNNING) == 0)
> > > > + for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
> > > >   __asm volatile("wfe");
> > > > + }
> > > > + if (! (ci->ci_flags & CPUF_RUNNING)) {
> > > > + printf("cpu %d failed to start\n", ci->ci_cpuid);
> > > > +#if defined(MPDEBUG) && defined(DDB)
> > > > + printf("dropping into debugger; continue from here to 
> > > > resume boot\n");
> > > > + db_enter();
> > > > +#endif
> > > > + }
> > > >  }
> > > >
> > > >  void
> > > >
> > > >
> >


bober.patched.dts
Description: audio/vnd.dts


bober.patched.dtb
Description: Binary data


Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-05 Thread Mark Kettenis
> From: S V 
> Date: Thu, 5 Oct 2023 12:58:51 +0300
> 
> чт, 5 окт. 2023 г., 09:59 Mark Kettenis :
> >
> > > Date: Thu,  5 Oct 2023 04:10:10 +
> > > From: Klemens Nanni 
> > >
> > > On Mon, Sep 25, 2023 at 01:33:31PM +, Klemens Nanni wrote:
> > > > On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> > > > > Hi, pinging and refreshing this patch
> > > > >
> > > > > What it does:
> > > > > allow arm64 cpus to break from the loop of waiting to start core and
> > > > > drop to DDB or OS.
> > > > >
> > > > > Patch based on same concept in amd64 cpu.c
> > > > >
> > > > > Any suggestions? Good to go?
> > > >
> > > > So instead of waiting possibly forever for secondary CPUs to come up,
> > > > you can continue debug the system and/or continue boot with less CPUs.
> > > >
> > > > Apart from the trailing empty line you introduce, the approach does
> > > > match amd64 (down to the for loop lacking a space after semicolon).
> > > >
> > > > That allows making progress on these machines and I don't see a 
> > > > downside,
> > > > so OK kn modulo the empty line.
> > > >
> > > > Any input from our arm64 hackers?
> > > >
> > > > Same diff ten days ago: 
> > > > https://marc.info/?l=openbsd-bugs=169465443200821=2
> > >
> > > Anyone?
> >
> > Really, if those secondary CPUs don't come up, your system is beyond
> > repair.  You need to do some low-level debugging at that point and DDB
> > isn't going to help you.  So no, let's keep this code as simple as we
> > can.
> >
> 
> This cores are starting and working properly,
> but something messes with order
> (maybe related to shared caches per 2 cores cluster, I'm not so
> hardware-good ).
> So 7th core will answer that it booted after 6th core.
> But with 'while' loop we never got to 6th core.

That suggests that the interrupt that we think we're sending to the
6th core is actually going to the 7th core and vice-versa.  Feels like
a firmware bug to me.  What hardware is this on?

> > > Index: sys/arch/arm64//arm64/cpu.c
> > > ===
> > > RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
> > > retrieving revision 1.98
> > > diff -u -p -r1.98 cpu.c
> > > --- sys/arch/arm64//arm64/cpu.c   10 Aug 2023 19:29:32 -  1.98
> > > +++ sys/arch/arm64//arm64/cpu.c   25 Sep 2023 13:24:39 -
> > > @@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
> > >  void
> > >  cpu_boot_secondary(struct cpu_info *ci)
> > >  {
> > > + int i;
> > > +
> > >   atomic_setbits_int(>ci_flags, CPUF_GO);
> > >   __asm volatile("dsb sy; sev" ::: "memory");
> > >
> > > @@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
> > >*/
> > >   arm_send_ipi(ci, ARM_IPI_NOP);
> > >
> > > - while ((ci->ci_flags & CPUF_RUNNING) == 0)
> > > + for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
> > >   __asm volatile("wfe");
> > > + }
> > > + if (! (ci->ci_flags & CPUF_RUNNING)) {
> > > + printf("cpu %d failed to start\n", ci->ci_cpuid);
> > > +#if defined(MPDEBUG) && defined(DDB)
> > > + printf("dropping into debugger; continue from here to 
> > > resume boot\n");
> > > + db_enter();
> > > +#endif
> > > + }
> > >  }
> > >
> > >  void
> > >
> > >
> 



Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-05 Thread S V
чт, 5 окт. 2023 г., 09:59 Mark Kettenis :
>
> > Date: Thu,  5 Oct 2023 04:10:10 +
> > From: Klemens Nanni 
> >
> > On Mon, Sep 25, 2023 at 01:33:31PM +, Klemens Nanni wrote:
> > > On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> > > > Hi, pinging and refreshing this patch
> > > >
> > > > What it does:
> > > > allow arm64 cpus to break from the loop of waiting to start core and
> > > > drop to DDB or OS.
> > > >
> > > > Patch based on same concept in amd64 cpu.c
> > > >
> > > > Any suggestions? Good to go?
> > >
> > > So instead of waiting possibly forever for secondary CPUs to come up,
> > > you can continue debug the system and/or continue boot with less CPUs.
> > >
> > > Apart from the trailing empty line you introduce, the approach does
> > > match amd64 (down to the for loop lacking a space after semicolon).
> > >
> > > That allows making progress on these machines and I don't see a downside,
> > > so OK kn modulo the empty line.
> > >
> > > Any input from our arm64 hackers?
> > >
> > > Same diff ten days ago: 
> > > https://marc.info/?l=openbsd-bugs=169465443200821=2
> >
> > Anyone?
>
> Really, if those secondary CPUs don't come up, your system is beyond
> repair.  You need to do some low-level debugging at that point and DDB
> isn't going to help you.  So no, let's keep this code as simple as we
> can.
>

This cores are starting and working properly,
but something messes with order
(maybe related to shared caches per 2 cores cluster, I'm not so
hardware-good ).
So 7th core will answer that it booted after 6th core.
But with 'while' loop we never got to 6th core.

>
> > Index: sys/arch/arm64//arm64/cpu.c
> > ===
> > RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
> > retrieving revision 1.98
> > diff -u -p -r1.98 cpu.c
> > --- sys/arch/arm64//arm64/cpu.c   10 Aug 2023 19:29:32 -  1.98
> > +++ sys/arch/arm64//arm64/cpu.c   25 Sep 2023 13:24:39 -
> > @@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
> >  void
> >  cpu_boot_secondary(struct cpu_info *ci)
> >  {
> > + int i;
> > +
> >   atomic_setbits_int(>ci_flags, CPUF_GO);
> >   __asm volatile("dsb sy; sev" ::: "memory");
> >
> > @@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
> >*/
> >   arm_send_ipi(ci, ARM_IPI_NOP);
> >
> > - while ((ci->ci_flags & CPUF_RUNNING) == 0)
> > + for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
> >   __asm volatile("wfe");
> > + }
> > + if (! (ci->ci_flags & CPUF_RUNNING)) {
> > + printf("cpu %d failed to start\n", ci->ci_cpuid);
> > +#if defined(MPDEBUG) && defined(DDB)
> > + printf("dropping into debugger; continue from here to resume 
> > boot\n");
> > + db_enter();
> > +#endif
> > + }
> >  }
> >
> >  void
> >
> >



Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-05 Thread Mark Kettenis
> Date: Thu,  5 Oct 2023 04:10:10 +
> From: Klemens Nanni 
> 
> On Mon, Sep 25, 2023 at 01:33:31PM +, Klemens Nanni wrote:
> > On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> > > Hi, pinging and refreshing this patch
> > > 
> > > What it does:
> > > allow arm64 cpus to break from the loop of waiting to start core and
> > > drop to DDB or OS.
> > > 
> > > Patch based on same concept in amd64 cpu.c
> > > 
> > > Any suggestions? Good to go?
> > 
> > So instead of waiting possibly forever for secondary CPUs to come up,
> > you can continue debug the system and/or continue boot with less CPUs.
> > 
> > Apart from the trailing empty line you introduce, the approach does
> > match amd64 (down to the for loop lacking a space after semicolon).
> > 
> > That allows making progress on these machines and I don't see a downside,
> > so OK kn modulo the empty line.
> > 
> > Any input from our arm64 hackers?
> > 
> > Same diff ten days ago: 
> > https://marc.info/?l=openbsd-bugs=169465443200821=2
> 
> Anyone?

Really, if those secondary CPUs don't come up, your system is beyond
repair.  You need to do some low-level debugging at that point and DDB
isn't going to help you.  So no, let's keep this code as simple as we
can.


> Index: sys/arch/arm64//arm64/cpu.c
> ===
> RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 cpu.c
> --- sys/arch/arm64//arm64/cpu.c   10 Aug 2023 19:29:32 -  1.98
> +++ sys/arch/arm64//arm64/cpu.c   25 Sep 2023 13:24:39 -
> @@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
>  void
>  cpu_boot_secondary(struct cpu_info *ci)
>  {
> + int i;
> +
>   atomic_setbits_int(>ci_flags, CPUF_GO);
>   __asm volatile("dsb sy; sev" ::: "memory");
>  
> @@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
>*/
>   arm_send_ipi(ci, ARM_IPI_NOP);
>  
> - while ((ci->ci_flags & CPUF_RUNNING) == 0)
> + for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
>   __asm volatile("wfe");
> + }
> + if (! (ci->ci_flags & CPUF_RUNNING)) {
> + printf("cpu %d failed to start\n", ci->ci_cpuid);
> +#if defined(MPDEBUG) && defined(DDB)
> + printf("dropping into debugger; continue from here to resume 
> boot\n");
> + db_enter();
> +#endif
> + }
>  }
>  
>  void
> 
> 



Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-04 Thread Klemens Nanni
On Mon, Sep 25, 2023 at 01:33:31PM +, Klemens Nanni wrote:
> On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> > Hi, pinging and refreshing this patch
> > 
> > What it does:
> > allow arm64 cpus to break from the loop of waiting to start core and
> > drop to DDB or OS.
> > 
> > Patch based on same concept in amd64 cpu.c
> > 
> > Any suggestions? Good to go?
> 
> So instead of waiting possibly forever for secondary CPUs to come up,
> you can continue debug the system and/or continue boot with less CPUs.
> 
> Apart from the trailing empty line you introduce, the approach does
> match amd64 (down to the for loop lacking a space after semicolon).
> 
> That allows making progress on these machines and I don't see a downside,
> so OK kn modulo the empty line.
> 
> Any input from our arm64 hackers?
> 
> Same diff ten days ago: 
> https://marc.info/?l=openbsd-bugs=169465443200821=2

Anyone?


Index: sys/arch/arm64//arm64/cpu.c
===
RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
retrieving revision 1.98
diff -u -p -r1.98 cpu.c
--- sys/arch/arm64//arm64/cpu.c 10 Aug 2023 19:29:32 -  1.98
+++ sys/arch/arm64//arm64/cpu.c 25 Sep 2023 13:24:39 -
@@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
 void
 cpu_boot_secondary(struct cpu_info *ci)
 {
+   int i;
+
atomic_setbits_int(>ci_flags, CPUF_GO);
__asm volatile("dsb sy; sev" ::: "memory");
 
@@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
 */
arm_send_ipi(ci, ARM_IPI_NOP);
 
-   while ((ci->ci_flags & CPUF_RUNNING) == 0)
+   for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
__asm volatile("wfe");
+   }
+   if (! (ci->ci_flags & CPUF_RUNNING)) {
+   printf("cpu %d failed to start\n", ci->ci_cpuid);
+#if defined(MPDEBUG) && defined(DDB)
+   printf("dropping into debugger; continue from here to resume 
boot\n");
+   db_enter();
+#endif
+   }
 }
 
 void



Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-09-25 Thread Klemens Nanni
On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> Hi, pinging and refreshing this patch
> 
> What it does:
> allow arm64 cpus to break from the loop of waiting to start core and
> drop to DDB or OS.
> 
> Patch based on same concept in amd64 cpu.c
> 
> Any suggestions? Good to go?

So instead of waiting possibly forever for secondary CPUs to come up,
you can continue debug the system and/or continue boot with less CPUs.

Apart from the trailing empty line you introduce, the approach does
match amd64 (down to the for loop lacking a space after semicolon).

That allows making progress on these machines and I don't see a downside,
so OK kn modulo the empty line.

Any input from our arm64 hackers?

Same diff ten days ago: https://marc.info/?l=openbsd-bugs=169465443200821=2



Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-07-25 Thread Slava Voronzoff
Hi, pinging and refreshing this patch

What it does:
allow arm64 cpus to break from the loop of waiting to start core and
drop to DDB or OS.

Patch based on same concept in amd64 cpu.c

Any suggestions? Good to go?

ср, 19 апр. 2023 г. в 02:32, S V :
>
> Hello, tech!
>
> I'm attaching simple patch that adds two "features" to mp kernel for arm64 
> arch:
> 1. It allows to enable debug flag to drop to DDB to debug mp init problems
> 2. It allows some buggy arm64 processors to break from loop and dirty
> (or not so dirty) boot to multicore instead of eternal hang in while.
>
> This based on same idea as amd64 cpu.c function - instead of waiting for 
> while()
> to boot secondary core either continue to try boot next one or drop to DDB.
>
>
>
> --
> Nerfur Dragon
> -==(UDIC)==-
Index: sys/arch/arm64/arm64/cpu.c
===
RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
retrieving revision 1.97
diff -u -p -r1.97 cpu.c
--- sys/arch/arm64/arm64/cpu.c	16 Jul 2023 16:13:46 -	1.97
+++ sys/arch/arm64/arm64/cpu.c	25 Jul 2023 10:24:17 -
@@ -1087,6 +1087,8 @@ cpu_start_secondary(struct cpu_info *ci,
 void
 cpu_boot_secondary(struct cpu_info *ci)
 {
+	int i;
+
 	atomic_setbits_int(>ci_flags, CPUF_GO);
 	__asm volatile("dsb sy; sev" ::: "memory");
 
@@ -1096,8 +1098,17 @@ cpu_boot_secondary(struct cpu_info *ci)
 	 */
 	arm_send_ipi(ci, ARM_IPI_NOP);
 
-	while ((ci->ci_flags & CPUF_RUNNING) == 0)
+	for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
 		__asm volatile("wfe");
+	}
+	if (! (ci->ci_flags & CPUF_RUNNING)) {
+		printf("cpu %d failed to start\n", ci->ci_cpuid);
+#if defined(MPDEBUG) && defined(DDB)
+		printf("dropping into debugger; continue from here to resume boot\n");
+		db_enter();
+#endif
+	}
+
 }
 
 void


Fwd: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-04-18 Thread S V
Hello, tech!

I'm attaching simple patch that adds two "features" to mp kernel for arm64 arch:
1. It allows to enable debug flag to drop to DDB to debug mp init problems
2. It allows some buggy arm64 processors to break from loop and dirty
(or not so dirty) boot to multicore instead of eternal hang in while.

This based on same idea as amd64 cpu.c function - instead of waiting for while()
to boot secondary core either continue to try boot next one or drop to DDB.



--
Nerfur Dragon
-==(UDIC)==-


arm64_cpu.c.diff
Description: Binary data