Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-14 Thread Christophe Leroy




On 15/02/2020 at 03:42, Larry Finger wrote:

Christophe,

On 2/14/20 1:35 PM, Christophe Leroy wrote:

--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -270,6 +270,9 @@ __secondary_hold_acknowledge:
   * pointer when we take an exception from supervisor mode.)
   *    -- paulus.
   */
+#ifdef CONFIG_PPC_CHRP
+1:    b    machine_check_in_rtas
+#endif
  . = 0x200
  DO_KVM  0x200
  MachineCheck:
@@ -290,12 +293,9 @@ MachineCheck:
  7:    EXCEPTION_PROLOG_2
  addi    r3,r1,STACK_FRAME_OVERHEAD
  #ifdef CONFIG_PPC_CHRP
-    bne    cr1,1f
+    bne    cr1,1b
  #endif
  EXC_XFER_STD(0x200, machine_check_exception)
-#ifdef CONFIG_PPC_CHRP
-1:    b    machine_check_in_rtas
-#endif


I'll need to make it a bit different because it shoehorns into your 
config but won't fit if CONFIG_KVM_BOOK3S_32 is added.




  /* Data access exception. */
  . = 0x300


With the above changes and all the other patches applied, the machine 
finally boots. It is so bloody slow that it takes a long time to do 
anything, but you finally got all the places that needed patches. I 
really lost track of how many bugs were fixed in the process, but I can 
now put that old box aside until time for v5.7.0-rc1. As you can tell, 
it only gets used to verify that PPC32 is working on real G4 hardware. 
It has no real value for any other function.


Yes, I don't have a G4 myself, but this code is so closely intertwined with 
the powerpc 83xx code that we can't avoid the changes impacting the 
G4 and other hash-MMU based PPC32 platforms, although the changes I'm making 
were not originally targeted at those platforms. And as the 83xx is a 603 core, 
it is non-hash, so none of the hash-related code can be verified on it. The same 
goes for the smaller, more platform-specific parts such as power saving, RTAS, etc.

Checking every possible combination of options is not easy either.

VMAP_STACK was a really challenging piece of functionality; I'm happy it made 
its way to mainline though.




Thanks for the help,


Thanks to you for testing and for your patience.

Christophe


Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-14 Thread Larry Finger

Christophe,

On 2/14/20 1:35 PM, Christophe Leroy wrote:

--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -270,6 +270,9 @@ __secondary_hold_acknowledge:
   * pointer when we take an exception from supervisor mode.)
   *    -- paulus.
   */
+#ifdef CONFIG_PPC_CHRP
+1:    b    machine_check_in_rtas
+#endif
  . = 0x200
  DO_KVM  0x200
  MachineCheck:
@@ -290,12 +293,9 @@ MachineCheck:
  7:    EXCEPTION_PROLOG_2
  addi    r3,r1,STACK_FRAME_OVERHEAD
  #ifdef CONFIG_PPC_CHRP
-    bne    cr1,1f
+    bne    cr1,1b
  #endif
  EXC_XFER_STD(0x200, machine_check_exception)
-#ifdef CONFIG_PPC_CHRP
-1:    b    machine_check_in_rtas
-#endif

  /* Data access exception. */
  . = 0x300


With the above changes and all the other patches applied, the machine finally 
boots. It is so bloody slow that it takes a long time to do anything, but you 
finally got all the places that needed patches. I really lost track of how many 
bugs were fixed in the process, but I can now put that old box aside until time 
for v5.7.0-rc1. As you can tell, it only gets used to verify that PPC32 is 
working on real G4 hardware. It has no real value for any other function.


Thanks for the help,

Larry


Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-14 Thread Christophe Leroy




On 02/14/2020 06:24 PM, Larry Finger wrote:

On 2/14/20 12:24 AM, Christophe Leroy wrote:


Did you try with the patch at 
https://patchwork.ozlabs.org/patch/1237387/ ?


Christophe,

When I apply that patch, there is an error at

--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -301,6 +301,39 @@  MachineCheck:
  . = 0x300
  DO_KVM  0x300
  DataAccess:

It complains about "an attempt to move .org backwards".



Argh !


When I change the 0x300 to 0x310 in two places, it builds OK. Is that OK?


No, you can't do that.

The following should solve it for your case.

---
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 32875afb3319..f9941b766f63 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -270,6 +270,9 @@ __secondary_hold_acknowledge:
  * pointer when we take an exception from supervisor mode.)
  * -- paulus.
  */
+#ifdef CONFIG_PPC_CHRP
+1: b   machine_check_in_rtas
+#endif
. = 0x200
DO_KVM  0x200
 MachineCheck:
@@ -290,12 +293,9 @@ MachineCheck:
 7: EXCEPTION_PROLOG_2
addi    r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_CHRP
-   bne cr1,1f
+   bne cr1,1b
 #endif
EXC_XFER_STD(0x200, machine_check_exception)
-#ifdef CONFIG_PPC_CHRP
-1: b   machine_check_in_rtas
-#endif

 /* Data access exception. */
. = 0x300
---
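The ".org backwards" error and the fix above can be read as a space budget. The exception vectors sit at fixed offsets (0x200 for machine check, 0x300 for data access), so everything placed after `. = 0x200` has to fit in the 0x100 bytes before `. = 0x300`. Changing 0x300 to 0x310 would move the handler away from the address the CPU actually jumps to. The sketch below is only an illustration of that arithmetic: the function name and the instruction counts are hypothetical and are not measured from head_32.S.

```c
#include <assert.h>
#include <stdint.h>

/* Every PowerPC instruction in this code path is 4 bytes long. */
#define INSN_BYTES 4u

/*
 * Hypothetical helper: bytes left in a vector slot after n_insns
 * instructions have been emitted. A negative result is the situation
 * the assembler reports as "attempt to move .org backwards".
 */
int32_t slot_bytes_left(uint32_t slot_start, uint32_t next_slot,
			uint32_t n_insns)
{
	uint32_t slot_size = next_slot - slot_start;

	return (int32_t)slot_size - (int32_t)(n_insns * INSN_BYTES);
}
```

With these assumptions, the slot between 0x200 and 0x300 holds exactly 64 instructions. If the handler plus the trailing `1: b machine_check_in_rtas` stub came to 65, the slot would overflow by 4 bytes, and moving the stub in front of `. = 0x200` would bring it back to 64.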

Christophe


Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-14 Thread Larry Finger

On 2/14/20 12:24 AM, Christophe Leroy wrote:


Did you try with the patch at https://patchwork.ozlabs.org/patch/1237387/ ?


Christophe,

When I apply that patch, there is an error at

--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -301,6 +301,39 @@  MachineCheck:
. = 0x300
DO_KVM  0x300
 DataAccess:

It complains about "an attempt to move .org backwards".

When I change the 0x300 to 0x310 in two places, it builds OK. Is that OK?

Larry



Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-14 Thread Christophe Leroy




On 14/02/2020 at 07:24, Christophe Leroy wrote:

Larry,

On 14/02/2020 at 00:09, Larry Finger wrote:

Christophe,

With this patch, it gets further. Sometime after the boot process 
tries to start process init, it crashes with an "unable to read data" 
error at 0x000157a0, with a faulting address of 0xc001683c. The screenshot is 
attached and the gzipped vmlinux is at 
http://www.lwfinger.com/download/vmlinux2.gz. The patches that were 
applied for this kernel are also attached.





Did you try with the patch at https://patchwork.ozlabs.org/patch/1237387/ ?

I see the problem happens in kprobe_handler(). Can you try without 
CONFIG_KPROBES?




In fact, you hit two bugs. The first one is due to CONFIG_VMAP_STACK. 
The second one has always existed (at least since the kernel source tree has 
been in git).


The first bug is in the function enter_rtas(), which tries to read data on the 
stack using the linear physical address translation. That cannot work 
with a VM stack: the function must re-enable data MMU translation to access 
data on the stack.


The second bug is in the kprobe_handler() function, which does:

if (*addr != BREAKPOINT_INSTRUCTION)

addr is the address where the 'trap' happened. When a trap happens with 
the MMU disabled, addr contains the physical address of the trap. 
kprobe_handler() tries to read the instruction through that physical address 
while the MMU is enabled, so you get a bad access, either because the 
address is not mapped or because access to userspace is not allowed.
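One way to picture the second bug is as a missing check on the MSR saved at trap time. The following is a minimal user-space sketch, not the kernel's code and not necessarily how the bug was eventually fixed: `struct trap_regs` and `trap_addr_is_virtual()` are hypothetical stand-ins, and only the MSR_IR and MSR_DR bit values are taken from the PowerPC architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR 0x20u	/* instruction address translation enabled */
#define MSR_DR 0x10u	/* data address translation enabled */

/* Hypothetical stand-in for the registers saved when the trap is taken. */
struct trap_regs {
	uint32_t nip;	/* address of the trapping instruction */
	uint32_t msr;	/* MSR value at the time of the trap */
};

/*
 * If the trap happened with translation off, nip is a physical address
 * and must not be dereferenced once the MMU is back on.
 */
bool trap_addr_is_virtual(const struct trap_regs *regs)
{
	return (regs->msr & MSR_IR) && (regs->msr & MSR_DR);
}
```

Under this sketch, a handler would compare `*addr` with BREAKPOINT_INSTRUCTION only when `trap_addr_is_virtual()` returns true, and would otherwise treat the trap as not being a kprobe.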



Due to the first bug, you get a 'machine check', and as 
current->thread.rtas_sp has not been cleared yet, the machine check 
handler jumps to 'machine_check_in_rtas'.


machine_check_in_rtas does a trap, which in turn triggers the second bug.


Once the first bug is fixed, the second one should not pop up.

Can you test patch https://patchwork.ozlabs.org/patch/1237929/ that 
fixes the first bug ?


Christophe


Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-13 Thread Christophe Leroy

Larry,

On 14/02/2020 at 00:09, Larry Finger wrote:

Christophe,

With this patch, it gets further. Sometime after the boot process tries 
to start process init, it crashes with an "unable to read data" error at 
0x000157a0, with a faulting address of 0xc001683c. The screenshot is 
attached and the gzipped vmlinux is at 
http://www.lwfinger.com/download/vmlinux2.gz. The patches that were 
applied for this kernel are also attached.





Did you try with the patch at https://patchwork.ozlabs.org/patch/1237387/ ?

I see the problem happens in kprobe_handler(). Can you try without 
CONFIG_KPROBES?


Christophe


Re: RESEND: Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-13 Thread Christophe Leroy



On 02/13/2020 02:28 PM, Larry Finger wrote:

On 2/11/20 1:23 PM, Christophe Leroy wrote:
Can you send me a picture of that "BUG: Unable to handle kernel data 
access" output, with all the register values etc., together with the 
matching vmlinux?


The first thing is to identify where we are when that happens. That means 
seeing what is at 0xc0013674. This can be done with 'ppc-linux-objdump -d 
vmlinux' (or whatever your PPC objdump is named) to get the function's 
code.


Then we need to understand how we reach that function and why it tries 
to access a physical address.



Another thing I'm thinking about, not necessarily related to that 
problem: Some buggy drivers do DMA from stack. This doesn't work 
anymore with CONFIG_VMAP_STACK. Most of them can be detected with 
CONFIG_DEBUG_VIRTUAL so you should activate it.


Christophe,

The previous send of this message failed because the attached vmlinux 
was too large.


I have gone about as far as I can in debugging the problem. Setting 
CONFIG_DEBUG_VIRTUAL made no difference.


Attached are the final screenshot, and the patches that I have applied. 
You already have the gzipped vmlinux.




This screenshot makes more sense with the vmlinux you provided: the problem 
is at 0xc00136dc.


That's in function power_save_ppc32_restore() in 
arch/powerpc/kernel/idle_6xx.S.


c00136c0 <power_save_ppc32_restore>:
c00136c0:   81 2b 00 a0 lwz r9,160(r11)
c00136c4:   91 2b 00 90 stw r9,144(r11)
c00136c8:   39 60 00 00 li  r11,0
c00136cc:   7d 30 fa a6 mfspr   r9,1008
c00136d0:   75 29 00 40 andis.  r9,r9,64
c00136d4:   41 82 00 18 beq c00136ec 
c00136d8:   3d 2b 00 7c addis   r9,r11,124
>> c00136dc:  81 29 92 5c lwz r9,-28068(r9)
c00136e0:   7d 36 fb a6 mtspr   1014,r9
c00136e4:   7c 00 04 ac hwsync
c00136e8:   4c 00 01 2c isync
c00136ec:   3d 2b 00 7c addis   r9,r11,124
c00136f0:   81 29 92 60 lwz r9,-28064(r9)
c00136f4:   7d 31 fb a6 mtspr   1009,r9
c00136f8:   48 00 19 c4 b   c00150bc 
c00136fc:   00 00 00 00 .long 0x0

Can you try the change below? (It won't work anymore without 
CONFIG_VMAP_STACK; I will fix it properly later once you confirm it is OK.)


diff --git a/arch/powerpc/kernel/idle_6xx.S b/arch/powerpc/kernel/idle_6xx.S
index 0ffdd18b9f26..7be8a0f3fac8 100644
--- a/arch/powerpc/kernel/idle_6xx.S
+++ b/arch/powerpc/kernel/idle_6xx.S
@@ -166,7 +166,7 @@ BEGIN_FTR_SECTION
mfspr   r9,SPRN_HID0
andis.  r9,r9,HID0_NAP@h
beq 1f
-   addis   r9,r11,(nap_save_msscr0-KERNELBASE)@ha
+   addis   r9,r11,nap_save_msscr0@ha
lwz r9,nap_save_msscr0@l(r9)
mtspr   SPRN_MSSCR0, r9
sync
@@ -174,7 +174,7 @@ BEGIN_FTR_SECTION
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_NAP_DISABLE_L2_PR)
 BEGIN_FTR_SECTION
-   addis   r9,r11,(nap_save_hid1-KERNELBASE)@ha
+   addis   r9,r11,nap_save_hid1@ha
lwz r9,nap_save_hid1@l(r9)
mtspr   SPRN_HID1, r9
 END_FTR_SECTION_IFSET(CPU_FTR_DUAL_PLL_750FX)
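The effect of dropping `-KERNELBASE` in the diff can be checked against the disassembly above with a little arithmetic. The sketch below models the `@ha` and `@l` operators and the `addis`/`lwz` pair; the helper names are hypothetical, KERNELBASE is assumed to be 0xc0000000 (consistent with the addresses in this thread), and r11 is taken as 0 as set by `li r11,0` at c00136c8.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: KERNELBASE is 0xc0000000, matching the addresses in this thread. */
#define KERNELBASE 0xc0000000u

/* @l: low 16 bits, sign-extended the way lwz treats its displacement. */
int32_t lo16(uint32_t addr)
{
	int32_t low = (int32_t)(addr & 0xffffu);

	return (low & 0x8000) ? low - 0x10000 : low;
}

/* @ha: high 16 bits, plus one when the low half sign-extends to a negative value. */
uint32_t ha16(uint32_t addr)
{
	return ((addr + 0x8000u) >> 16) & 0xffffu;
}

/* Effective address computed by: addis r9,base,ha ; lwz r9,lo(r9) */
uint32_t addis_lwz_ea(uint32_t base, uint32_t ha, int32_t lo)
{
	return base + (ha << 16) + (uint32_t)lo;
}
```

With these helpers, `addis r9,r11,124` followed by `lwz r9,-28068(r9)` resolves to 0x007b925c, which is the physical address `nap_save_msscr0 - KERNELBASE`. Applying `@ha` and `@l` to the virtual address 0xc07b925c instead gives 0xc07c and the same -28068 displacement, so the patched load goes through the virtual address.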


Thanks
Christophe


Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-13 Thread Christophe Leroy




On 02/12/2020 11:02 PM, Larry Finger wrote:

On 2/11/20 1:23 PM, Christophe Leroy wrote:


Can you send me a picture of that "BUG: Unable to handle kernel data 
access" output, with all the register values etc., together with the 
matching vmlinux?


The vmlinux file was too big for your mailbox. You can download it from 
http://www.lwfinger.com/download/vmlinux.gz




Hi,

Is that the vmlinux that corresponds to:

BUG Unable to handle kernel data access at 0x007a84fc.
The faulting instruction address was 0x00013674

Nevertheless, do you have a picture of the said BUG/Oops to see all 
registers ?


Because here, the instruction at address 0x13674 is not a data access:

c00135ec :
c00135ec:   3c 60 00 00 lis r3,0
c00135f0:   3c 60 00 80 lis r3,128
c00135f4:   3c 80 c0 83 lis r4,-16253
c00135f8:   80 84 f2 80 lwz r4,-3456(r4)
c00135fc:   80 84 00 0c lwz r4,12(r4)
c0013600:   70 80 00 08 andi.   r0,r4,8
c0013604:   41 82 00 18 beq c001361c 
c0013608:   3c 80 c0 84 lis r4,-16252
c001360c:   80 84 30 34 lwz r4,12340(r4)
c0013610:   2c 04 00 00 cmpwi   r4,0
c0013614:   41 82 00 08 beq c001361c 
c0013618:   3c 60 00 40 lis r3,64
c001361c:   2c 03 00 00 cmpwi   r3,0
c0013620:   4d 82 00 20 beqlr
c0013624:   74 60 00 40 andis.  r0,r3,64
c0013628:   41 82 00 30 beq c0013658 
c001362c:   7c 96 fa a6 mfspr   r4,1014
c0013630:   54 84 00 3a rlwinm  r4,r4,0,0,29
c0013634:   7c 00 04 ac hwsync
c0013638:   7c 96 fb a6 mtspr   1014,r4
c001363c:   7c 00 04 ac hwsync
c0013640:   4c 00 01 2c isync
c0013644:   3c 80 c0 00 lis r4,-16384
c0013648:   7c 00 20 ac dcbf 0,r4
c001364c:   7c 00 20 ac dcbf 0,r4
c0013650:   7c 00 20 ac dcbf 0,r4
c0013654:   7c 00 20 ac dcbf 0,r4
c0013658:   3c 80 c0 7c lis r4,-16260
c001365c:   80 84 92 64 lwz r4,-28060(r4)
c0013660:   2c 04 00 00 cmpwi   r4,0
c0013664:   41 82 00 10 beq c0013674 
c0013668:   7c 91 fa a6 mfspr   r4,1009
c001366c:   64 84 00 01 oris r4,r4,1
c0013670:   7c 91 fb a6 mtspr   1009,r4
>> c0013674:  7c 90 fa a6 mfspr   r4,1008
c0013678:   3c a0 00 60 lis r5,96
c001367c:   64 a5 00 80 oris r5,r5,128
c0013680:   7c 84 28 78 andc r4,r4,r5
c0013684:   7c 84 1b 78 or  r4,r4,r3
c0013688:   64 84 00 10 oris r4,r4,16
c001368c:   7c 90 fb a6 mtspr   1008,r4
c0013690:   7e 00 06 6c dssall
c0013694:   7c 00 04 ac hwsync
c0013698:   81 02 00 04 lwz r8,4(r2)
c001369c:   61 08 00 01 ori r8,r8,1
c00136a0:   91 02 00 04 stw r8,4(r2)
c00136a4:   7c e0 00 a6 mfmsr   r7
c00136a8:   60 e7 80 00 ori r7,r7,32768
c00136ac:   64 e7 00 04 oris r7,r7,4
c00136b0:   7c 00 04 ac hwsync
c00136b4:   7c e0 01 24 mtmsr   r7
c00136b8:   4c 00 01 2c isync
c00136bc:   4b ff ff f4 b   c00136b0 

Thanks
Christophe


Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-11 Thread Christophe Leroy




On 11/02/2020 at 17:06, Larry Finger wrote:

On 2/11/20 12:55 AM, Christophe Leroy wrote:



On 10/02/2020 at 13:55, Larry Finger wrote:

On 2/9/20 12:19 PM, Christophe Leroy wrote:

Do you have CONFIG_TRACE_IRQFLAGS in your config ?
If so, can you try the patch below ?

https://patchwork.ozlabs.org/patch/1235081/

Otherwise, can you send me your .config and tell me exactly where it 
stops during the boot.


Christophe,

That patch did not work. My .config is attached.

It does boot if CONFIG_VMAP_STACK is not set.

The console display ends with the "DMA ranges" output. A screen shot 
is also appended.


Larry



Hi,

I tried your config under QEMU, it works.

In fact your console display is looping on itself, it ends at "printk: 
bootconsole [udbg0] disabled".


Looks like you get stuck at the time of switching to graphic mode. 
Need to understand why.


I'm not surprised that a real G4 differs from QEMU. For one thing, the 
real hardware uses i2c to connect to the graphics hardware.


I realized that the screen was not scrolling and output was missing. To 
see what was missed, I added a call to btext_clearscreen(). As you 
noted, it ends at the bootconsole disabled statement.


As I could not find any console output after that point, I then turned 
off the bootconsole disable. I realize this action may cause a different 
problem, but in this configuration, the computer hit a BUG Unable to 
handle kernel data access at 0x007a84fc. The faulting instruction 
address was 0x00013674. Those addresses look like physical, not virtual, 
addresses.




Can you send me a picture of that "BUG: Unable to handle kernel data 
access" output, with all the register values etc., together with the matching 
vmlinux?


The first thing is to identify where we are when that happens. That means seeing 
what is at 0xc0013674. This can be done with 'ppc-linux-objdump -d vmlinux' 
(or whatever your PPC objdump is named) to get the function's code.


Then we need to understand how we reach that function and why it tries 
to access a physical address.
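The step from the reported address 0x00013674 to the objdump address 0xc0013674 is the kernel's linear mapping offset. A minimal sketch of that conversion follows, assuming KERNELBASE is 0xc0000000 (which matches the addresses quoted in this thread); the function names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the kernel is linked at 0xc0000000 and loaded at physical 0. */
#define KERNELBASE 0xc0000000u

/* Physical address to the kernel virtual address that objdump shows. */
uint32_t phys_to_kernel_virt(uint32_t pa)
{
	return pa + KERNELBASE;
}

/* Kernel virtual address back to its physical address. */
uint32_t kernel_virt_to_phys(uint32_t va)
{
	return va - KERNELBASE;
}

/* Rough heuristic: anything below KERNELBASE cannot be a kernel virtual address. */
bool looks_physical(uint32_t addr)
{
	return addr < KERNELBASE;
}
```

Under that assumption, both reported values (0x007a84fc and 0x00013674) fall below KERNELBASE, which fits the observation that they look like physical addresses, and 0x00013674 maps to 0xc0013674 in the vmlinux disassembly.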



Another thing I'm thinking about, not necessarily related to that 
problem: Some buggy drivers do DMA from stack. This doesn't work anymore 
with CONFIG_VMAP_STACK. Most of them can be detected with 
CONFIG_DEBUG_VIRTUAL so you should activate it.


Christophe


Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-11 Thread Larry Finger

On 2/11/20 12:55 AM, Christophe Leroy wrote:



On 10/02/2020 at 13:55, Larry Finger wrote:

On 2/9/20 12:19 PM, Christophe Leroy wrote:

Do you have CONFIG_TRACE_IRQFLAGS in your config ?
If so, can you try the patch below ?

https://patchwork.ozlabs.org/patch/1235081/

Otherwise, can you send me your .config and tell me exactly where it stops 
during the boot.


Christophe,

That patch did not work. My .config is attached.

It does boot if CONFIG_VMAP_STACK is not set.

The console display ends with the "DMA ranges" output. A screen shot is also 
appended.


Larry



Hi,

I tried your config under QEMU, it works.

In fact your console display is looping on itself, it ends at "printk: 
bootconsole [udbg0] disabled".


Looks like you get stuck at the time of switching to graphic mode. Need to 
understand why.


I'm not surprised that a real G4 differs from QEMU. For one thing, the real 
hardware uses i2c to connect to the graphics hardware.


I realized that the screen was not scrolling and output was missing. To see what 
was missed, I added a call to btext_clearscreen(). As you noted, it ends at the 
bootconsole disabled statement.


As I could not find any console output after that point, I then turned off the 
bootconsole disable. I realize this action may cause a different problem, but in 
this configuration, the computer hit a BUG Unable to handle kernel data access 
at 0x007a84fc. The faulting instruction address was 0x00013674. Those addresses 
look like physical, not virtual, addresses.


I then added pr_info statements to bracket the failure. In file 
drivers/video/fbdev/core/fb_ddc.c, the code reaches line 66, which is

algo_data->setsda(algo_data->data, 1);
Both pointers seem OK with algo_data = 0xeedfb4bc, and algo_data->data = 
0xeedb25c. The code faults before returning. I then annotated that callback 
routine radeon_gpio_setsda(), and found that execution is OK to the end of the 
routine, but the fault happens on the return from this routine as though the 
stack were corrupted.


I will be busy for about 8 hours, but if you can think of any debugging I can do 
on this routine, please let me know.


Thanks,

Larry


Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-10 Thread Christophe Leroy




On 10/02/2020 at 13:55, Larry Finger wrote:

On 2/9/20 12:19 PM, Christophe Leroy wrote:

Do you have CONFIG_TRACE_IRQFLAGS in your config ?
If so, can you try the patch below ?

https://patchwork.ozlabs.org/patch/1235081/

Otherwise, can you send me your .config and tell me exactly where it 
stops during the boot.


Christophe,

That patch did not work. My .config is attached.

It does boot if CONFIG_VMAP_STACK is not set.

The console display ends with the "DMA ranges" output. A screen shot is 
also appended.


Larry



Hi,

I tried your config under QEMU, it works.

In fact your console display is looping on itself, it ends at "printk: 
bootconsole [udbg0] disabled".


Looks like you get stuck at the time of switching to graphic mode. Need 
to understand why.


Christophe