Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-11 Thread Mark Brown
On Wed, Jan 11, 2023 at 12:29:04PM +, Mark Brown wrote:

> We're seeing issues in all configs on meson-gxl-s905x-libretech-cc
> today, not just with the kselftest fragment.  The initial failure seems
> to be:

> [   17.337253] WARNING: CPU: 3 PID: 123 at drivers/gpu/drm/drm_bridge.c:1257 drm_bridge_hpd_enable+0x8c/0x94 [drm]

> full log at:

> https://storage.kernelci.org/next/master/next-20230111/arm64/defconfig/gcc-10/lab-broonie/baseline-meson-gxl-s905x-libretech-cc.txt

> and links to other logs at:

> https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230111/plan/baseline/

> Today's -next does have that fix in it so it's not fixing whatever the
> original issue was, I suspect it might even be exposing other issues.
> We are however still seeing the stack filling up, even with a GCC 10
> defconfig build.

A bisect landed on 0e4dcffd331fa7d ("drm/panel: raspberrypi-touchscreen:
Convert to i2c's .probe_new()") which is obviously not credible.  I
suspect that what's happening here is that the fix you applied is making
an issue somewhere else visible in defconfig and is as a result
confusing the bisect.  Ard mentioned an issue with non-EFI bits
introduced by EFI changes here:

https://lore.kernel.org/linux-arm-kernel/CAMj1kXGFa=zriyp_ms7bbqr0wiwikt0objokusngpjtfvlm...@mail.gmail.com/

which seems like a plausible culprit.

bisect log:

git bisect start
# bad: [c9e9cdd8bdcc3e1ea330d49ea587ec71884dd0f5] Add linux-next specific files for 20230111
git bisect bad c9e9cdd8bdcc3e1ea330d49ea587ec71884dd0f5
# good: [7dd4b804e08041ff56c88bdd8da742d14b17ed25] Merge tag 'nfsd-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good 7dd4b804e08041ff56c88bdd8da742d14b17ed25
# good: [ecf8827ab7dd5731813f90146d9936165b170f32] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect good ecf8827ab7dd5731813f90146d9936165b170f32
# bad: [64208e4940ede76709f1ff5b01d1b78efc2951cf] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect bad 64208e4940ede76709f1ff5b01d1b78efc2951cf
# bad: [1077dd31ba60b39a231560beec24b97eadf8bd8f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
git bisect bad 1077dd31ba60b39a231560beec24b97eadf8bd8f
# bad: [1577a2c2aad943fbc6a5e959ae83c4ef8bc3d4de] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 1577a2c2aad943fbc6a5e959ae83c4ef8bc3d4de
# good: [ec787deb2ddffc6cd6afe0e2fbbbd490ddc383ed] drm/amd: Use `amdgpu_ucode_*` helpers for GFX9
git bisect good ec787deb2ddffc6cd6afe0e2fbbbd490ddc383ed
# bad: [0e4dcffd331fa7d2a6ae628b51a7f418dfa90367] drm/panel: raspberrypi-touchscreen: Convert to i2c's .probe_new()
git bisect bad 0e4dcffd331fa7d2a6ae628b51a7f418dfa90367
# good: [c702545e19ebb6113d607f2a30ba2ee6cf881a3a] drm/gud: use new debugfs device-centered functions
git bisect good c702545e19ebb6113d607f2a30ba2ee6cf881a3a
# good: [977374cf481d3bea916b2775e6ecc682b9689550] drm/vc4: plane: Add 3:3:2 and 4:4:4:4 RGB/RGBX/RGBA formats
git bisect good 977374cf481d3bea916b2775e6ecc682b9689550
# good: [67d0a30128c9f644595dfe67ac0fb941a716a6c9] drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion
git bisect good 67d0a30128c9f644595dfe67ac0fb941a716a6c9
# good: [29ef7605e2fd44038a70df0f46b7821464081b22] drm/i2c/sil164: Convert to i2c's .probe_new()
git bisect good 29ef7605e2fd44038a70df0f46b7821464081b22
# good: [307259952625798fbea89b04aebbc5106ff18c68] drm/i2c/tda998x: Convert to i2c's .probe_new()
git bisect good 307259952625798fbea89b04aebbc5106ff18c68
# good: [446757576a646eba6fae085396bdfbd74245ff28] drm/panel: olimex-lcd-olinuxino: Convert to i2c's .probe_new()
git bisect good 446757576a646eba6fae085396bdfbd74245ff28
# first bad commit: [0e4dcffd331fa7d2a6ae628b51a7f418dfa90367] drm/panel: raspberrypi-touchscreen: Convert to i2c's .probe_new()
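
For illustration only, a minimal sketch in C of one way a bisect can settle on an unrelated commit: the search assumes a single good-to-bad transition, so if the failure is hidden on part of the history it reports whichever transition it happens to walk into. This is not how git bisects a merge graph; the linear history, the is_bad array and the name bisect_first_bad are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Binary search for the first failing commit in a linear history.
 * is_bad[i] is the test result for commit i (non-zero means it failed).
 * 'good' must index a passing commit and 'bad' a failing one.
 * The result is only meaningful if results flip from good to bad once.
 */
static size_t bisect_first_bad(const int *is_bad, size_t good, size_t bad)
{
	assert(good < bad);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad[mid])
			bad = mid;	/* failure is at mid or earlier */
		else
			good = mid;	/* failure is after mid */
	}
	return bad;
}
```

With results good, good, good, bad, good, good, good, bad, bad, bad (index 0 known good, index 9 known bad) the search returns 7, although index 3 already failed. When the results flip only once, at index 3, it returns 3.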




Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-11 Thread Mark Brown
On Wed, Jan 11, 2023 at 11:34:41AM +0100, Neil Armstrong wrote:

> I merged a fix that could be related: 
> https://lore.kernel.org/all/20230109220033.31202-1-m.szyprow...@samsung.com/

> This could make the driver return from probe before it is fully probed, which
> would explain such an error.

We're seeing issues in all configs on meson-gxl-s905x-libretech-cc
today, not just with the kselftest fragment.  The initial failure seems
to be:

[   17.337253] WARNING: CPU: 3 PID: 123 at drivers/gpu/drm/drm_bridge.c:1257 drm_bridge_hpd_enable+0x8c/0x94 [drm]

full log at:

   
https://storage.kernelci.org/next/master/next-20230111/arm64/defconfig/gcc-10/lab-broonie/baseline-meson-gxl-s905x-libretech-cc.txt

and links to other logs at:

  
https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230111/plan/baseline/

Today's -next does have that fix in it so it's not fixing whatever the
original issue was, I suspect it might even be exposing other issues.
We are however still seeing the stack filling up, even with a GCC 10
defconfig build.




Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-11 Thread Neil Armstrong

Hi,

On 10/01/2023 17:41, Arnd Bergmann wrote:

> On Tue, Jan 10, 2023, at 17:14, Naresh Kamboju wrote:
> > [ please ignore this email if this regression is already reported ]
> >
> > Today's Linux next tag next-20230110 boot passes with defconfig but
> > boot fails with
> > defconfig + kselftest merge config on arm64 devices and qemu-arm64.
> >
> > Reported-by: Linux Kernel Functional Testing 
> >
> > We are bisecting this problem and will get back to you shortly.
> >
> > GOOD: next-20230109  (defconfig + kselftests configs)
> > BAD: next-20230110 (defconfig + kselftests configs)
> >
> > kernel crash log [1]:
> >
> > [   15.302140] Unable to handle kernel paging request at virtual
> > address fff8
> > [   15.309906] Mem abort info:
> > [   15.312659]   ESR = 0x9604
> > [   15.316365]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [   15.321626]   SET = 0, FnV = 0
> > [   15.324644]   EA = 0, S1PTW = 0
> > [   15.327744]   FSC = 0x04: level 0 translation fault
> > [   15.332619] Data abort info:
> > [   15.335422]   ISV = 0, ISS = 0x0004
> > [   15.339226]   CM = 0, WnR = 0
> > [   15.342154] swapper pgtable: 4k pages, 48-bit VAs, pgdp=1496c000
> > [   15.348795] [fff8] pgd=, p4d=
> > [   15.355524] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [   15.361729] Modules linked in: meson_gxl dwmac_generic
> > snd_soc_meson_gx_sound_card snd_soc_meson_card_utils lima gpu_sched
> > drm_shmem_helper meson_drm drm_dma_helper crct10dif_ce meson_ir
> > rc_core meson_dw_hdmi dw_hdmi meson_canvas dwmac_meson8b
> > stmmac_platform meson_rng stmmac rng_core cec meson_gxbb_wdt
> > drm_display_helper snd_soc_meson_aiu snd_soc_meson_codec_glue pcs_xpcs
> > snd_soc_meson_t9015 amlogic_gxl_crypto crypto_engine display_connector
> > snd_soc_simple_amplifier drm_kms_helper drm nvmem_meson_efuse
> > [   15.405976] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> > 6.2.0-rc3-next-20230110 #1
> > [   15.413563] Hardware name: Libre Computer AML-S905X-CC (DT)
> > [   15.419086] Workqueue: events_unbound deferred_probe_work_func
> > [   15.424863] pstate: 0005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   15.431762] pc : of_drm_find_bridge+0x38/0x70 [drm]
> > [   15.436594] lr : of_drm_find_bridge+0x20/0x70 [drm]
>
> The line is
>
> drivers/gpu/drm/drm_bridge.c:1310:  if (bridge->of_node == np) {
>
> The list_head here is a NULL pointer, so ->of_node points
> to address negative 8, i.e. fffffffffffffff8
>
> This is linked list corruption, which typically happens as
> part of a use-after-free, and could be the result of a
> failed registration causing an object to be freed after
> it is added to the list.
>
> Unfortunately, there are no patches to this file between
> next-20230109 and next-20230110, so the bug probably is
> not actually in this file.
>
> > [   15.515426] Call trace:
> > [   15.517863] Insufficient stack space to handle exception!
> > [   15.517867] ESR: 0x9647 -- DABT (current EL)
> > [   15.517871] FAR: 0x8a047ff0
> > [   15.517873] Task stack: [0x8a048000..0x8a04c000]
> > [   15.517877] IRQ stack:  [0x88008000..0x8800c000]
> > [   15.517880] Overflow stack: [0x7d9c1320..0x7d9c2320]
> > [   15.517884] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> > 6.2.0-rc3-next-20230110 #1
> > [   15.517890] Hardware name: Libre Computer AML-S905X-CC (DT)
> > [   15.517895] Workqueue: events_unbound deferred_probe_work_func
> > [   15.517915] pstate: 83c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   15.517923] pc : el1_abort+0x4/0x5c
> > [   15.517932] lr : el1h_64_sync_handler+0x60/0xac
> > [   15.517939] sp : 8a048020
>
> Not sure about the missing stack trace: I can see that the stack
> pointer is on a task stack, which is reported as having overflown,
> but I don't see why it's unable to print the stack while running
> from the overflow stack.
>
> A stack overflow is often caused by unbounded recursion, which
> can happen when a device driver binds itself to a device that it
> has just created. The log does look a bit suspicious here,
> with multiple registrations for c883a000.hdmi-tx:
>
>    986 08:02:56.487871  [   15.141218] meson-drm d010.vpu: Queued 2 outputs on vpu
>    987 08:02:56.493572  [   15.141615] meson8b-dwmac c941.ethernet: Ring mode enabled
>    988 08:02:56.504769  [   15.150744] meson-drm d010.vpu: bound c883a000.hdmi-tx (ops meson_dw_hdmi_ops [meson_dw_hdmi])
>    989 08:02:56.515743  [   15.154970] meson8b-dwmac c941.ethernet: Enable RX Mitigation via HW Watchdog Timer
>    990 08:02:56.521531  [   15.159175] lima d00c.gpu: pp0 - mali450 version major 0 minor 0
>    991 08:02:56.526718  [   15.161436] meson-drm d010.vpu: Failed to find HDMI transceiver bridge
>    992 08:02:56.532417  [   15.168933] lima d00c.gpu: pp1 - mali450 version major 0 minor 0
>    993 08:02:56.537747  [   15.206102] meson-drm d010.vpu: Queued 2 outputs on vpu
>    994 08:02:56.543435  [   15.209608] lima d00c.gpu: pp2 - mali450 version major 0 minor 0
>    995 08:02:56.554307  [   15.217027] meson-drm d010.vpu: bound c883a000.hdmi-tx (ops meson_dw_hdmi_ops 

Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-10 Thread Mark Brown
On Tue, Jan 10, 2023 at 04:32:59PM +, Will Deacon wrote:
> On Tue, Jan 10, 2023 at 09:44:40PM +0530, Naresh Kamboju wrote:

> > GOOD: next-20230109  (defconfig + kselftests configs)
> > BAD: next-20230110 (defconfig + kselftests configs)

> I couldn't find a kselftests .config in the tree (assumedly I'm just not
> looking hard enough), but does that happen to enable CONFIG_STACK_TRACER=y?

It's adding on all the config fragments in

   tools/testing/selftests/*/config

which includes ftrace, which does set STACK_TRACER.

> If so, since you're using clang, I wonder if this is an issue with
> 68a63a412d18 ("arm64: Fix build with CC=clang, CONFIG_FTRACE=y and
> CONFIG_STACK_TRACER=y")?

ftrace also enables FTRACE.

> Please let us know how the bisection goes...

Not sure that Naresh has a bisection going, I don't think he's got
direct access to such a board.




Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-10 Thread Arnd Bergmann
On Tue, Jan 10, 2023, at 17:14, Naresh Kamboju wrote:
> [ please ignore this email if this regression is already reported ]
>
> Today's Linux next tag next-20230110 boot passes with defconfig but
> boot fails with
> defconfig + kselftest merge config on arm64 devices and qemu-arm64.
>
> Reported-by: Linux Kernel Functional Testing 
>
> We are bisecting this problem and will get back to you shortly.
>
> GOOD: next-20230109  (defconfig + kselftests configs)
> BAD: next-20230110 (defconfig + kselftests configs)
>
> kernel crash log [1]:
>
> [   15.302140] Unable to handle kernel paging request at virtual
> address fff8
> [   15.309906] Mem abort info:
> [   15.312659]   ESR = 0x9604
> [   15.316365]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   15.321626]   SET = 0, FnV = 0
> [   15.324644]   EA = 0, S1PTW = 0
> [   15.327744]   FSC = 0x04: level 0 translation fault
> [   15.332619] Data abort info:
> [   15.335422]   ISV = 0, ISS = 0x0004
> [   15.339226]   CM = 0, WnR = 0
> [   15.342154] swapper pgtable: 4k pages, 48-bit VAs, pgdp=1496c000
> [   15.348795] [fff8] pgd=, p4d=
> [   15.355524] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   15.361729] Modules linked in: meson_gxl dwmac_generic
> snd_soc_meson_gx_sound_card snd_soc_meson_card_utils lima gpu_sched
> drm_shmem_helper meson_drm drm_dma_helper crct10dif_ce meson_ir
> rc_core meson_dw_hdmi dw_hdmi meson_canvas dwmac_meson8b
> stmmac_platform meson_rng stmmac rng_core cec meson_gxbb_wdt
> drm_display_helper snd_soc_meson_aiu snd_soc_meson_codec_glue pcs_xpcs
> snd_soc_meson_t9015 amlogic_gxl_crypto crypto_engine display_connector
> snd_soc_simple_amplifier drm_kms_helper drm nvmem_meson_efuse
> [   15.405976] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> 6.2.0-rc3-next-20230110 #1
> [   15.413563] Hardware name: Libre Computer AML-S905X-CC (DT)
> [   15.419086] Workqueue: events_unbound deferred_probe_work_func
> [   15.424863] pstate: 0005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   15.431762] pc : of_drm_find_bridge+0x38/0x70 [drm]
> [   15.436594] lr : of_drm_find_bridge+0x20/0x70 [drm]

The line is 

drivers/gpu/drm/drm_bridge.c:1310:  if (bridge->of_node == np) {

The list_head here is a NULL pointer, so ->of_node points
to address negative 8, i.e. fffffffffffffff8
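
The "negative 8" arithmetic can be reproduced with a small sketch in C. The struct below is a hypothetical, trimmed-down stand-in rather than the real struct drm_bridge: it assumes that of_node sits exactly one pointer before the list member, which is what the faulting address suggests. The names bridge_sketch and of_node_address are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical stand-in for struct drm_bridge: only the two members
 * that matter here, with of_node assumed to sit directly before list.
 */
struct bridge_sketch {
	void *of_node;
	struct list_head list;
};

/* The layout assumption the arithmetic below relies on. */
static_assert(offsetof(struct bridge_sketch, list) -
	      offsetof(struct bridge_sketch, of_node) == sizeof(void *),
	      "of_node is expected one pointer before list");

/*
 * Address that would be read for entry->of_node when the list walk
 * hands back the raw pointer list_ptr. This mirrors container_of():
 * subtract the member offset to get the enclosing struct, then add
 * the offset of the field being accessed.
 */
static uintptr_t of_node_address(uintptr_t list_ptr)
{
	uintptr_t entry = list_ptr - offsetof(struct bridge_sketch, list);

	return entry + offsetof(struct bridge_sketch, of_node);
}
```

On a 64-bit build, of_node_address(0) wraps around to 0xfffffffffffffff8, the address in the subject line.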

This is linked list corruption, which typically happens as
part of a use-after-free, and could be the result of a
failed registration causing an object to be freed after
it is added to the list.

Unfortunately, there are no patches to this file between
next-20230109 and next-20230110, so the bug probably is
not actually in this file.

> [   15.515426] Call trace:
> [   15.517863] Insufficient stack space to handle exception!
> [   15.517867] ESR: 0x9647 -- DABT (current EL)
> [   15.517871] FAR: 0x8a047ff0
> [   15.517873] Task stack: [0x8a048000..0x8a04c000]
> [   15.517877] IRQ stack:  [0x88008000..0x8800c000]
> [   15.517880] Overflow stack: [0x7d9c1320..0x7d9c2320]
> [   15.517884] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> 6.2.0-rc3-next-20230110 #1
> [   15.517890] Hardware name: Libre Computer AML-S905X-CC (DT)
> [   15.517895] Workqueue: events_unbound deferred_probe_work_func
> [   15.517915] pstate: 83c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   15.517923] pc : el1_abort+0x4/0x5c
> [   15.517932] lr : el1h_64_sync_handler+0x60/0xac
> [   15.517939] sp : 8a048020

Not sure about the missing stack trace: I can see that the stack
pointer is on a task stack, which is reported as having overflown,
but I don't see why it's unable to print the stack while running
from the overflow stack.
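
The overflow itself can be cross-checked from the numbers in the quoted panic output. The following is a small sketch in C, using the addresses exactly as they appear in this copy of the log (their upper digits appear to be missing). Treating the printed ranges as half-open is an assumption, and the names stack_range, on_stack and bytes_below are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address range [low, high) as printed in the panic output. */
struct stack_range {
	uint64_t low;
	uint64_t high;
};

/* True if addr lies inside the stack. */
static bool on_stack(struct stack_range s, uint64_t addr)
{
	assert(s.low < s.high);
	return addr >= s.low && addr < s.high;
}

/* How far addr lies below the lowest valid stack address; 0 if it is not below. */
static uint64_t bytes_below(struct stack_range s, uint64_t addr)
{
	return addr < s.low ? s.low - addr : 0;
}
```

For the task stack 0x8a048000..0x8a04c000 (0x4000 bytes, i.e. 16 KiB), the reported FAR 0x8a047ff0 is not on the stack and lies 0x10 bytes below its base, while the reported sp 0x8a048020 is still inside it, 0x20 bytes above the base.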

A stack overflow is often caused by unbounded recursion, which
can happen when a device driver binds itself to a device that it
has just created. The log does look a bit suspicious here,
with multiple registrations for c883a000.hdmi-tx:

  986 08:02:56.487871  [   15.141218] meson-drm d010.vpu: Queued 2 outputs on vpu
  987 08:02:56.493572  [   15.141615] meson8b-dwmac c941.ethernet: Ring mode enabled
  988 08:02:56.504769  [   15.150744] meson-drm d010.vpu: bound c883a000.hdmi-tx (ops meson_dw_hdmi_ops [meson_dw_hdmi])
  989 08:02:56.515743  [   15.154970] meson8b-dwmac c941.ethernet: Enable RX Mitigation via HW Watchdog Timer
  990 08:02:56.521531  [   15.159175] lima d00c.gpu: pp0 - mali450 version major 0 minor 0
  991 08:02:56.526718  [   15.161436] meson-drm d010.vpu: Failed to find HDMI transceiver bridge
  992 08:02:56.532417  [   15.168933] lima d00c.gpu: pp1 - mali450 version major 0 minor 0
  993 08:02:56.537747  [   15.206102] meson-drm d010.vpu: Queued 2 outputs on vpu
  994 08:02:56.543435  [   15.209608] lima d00c.gpu: pp2 - mali450 version major 0 minor 0
  995 08:02:56.554307  [   15.217027] meson-drm d010.vpu: bound 

Re: next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-10 Thread Will Deacon
[+ James and Nathan]

On Tue, Jan 10, 2023 at 09:44:40PM +0530, Naresh Kamboju wrote:
> [ please ignore this email if this regression is already reported ]
> 
> Today's Linux next tag next-20230110 boot passes with defconfig but
> boot fails with
> defconfig + kselftest merge config on arm64 devices and qemu-arm64.
> 
> Reported-by: Linux Kernel Functional Testing 
> 
> We are bisecting this problem and will get back to you shortly.
> 
> GOOD: next-20230109  (defconfig + kselftests configs)
> BAD: next-20230110 (defconfig + kselftests configs)

I couldn't find a kselftests .config in the tree (assumedly I'm just not
looking hard enough), but does that happen to enable CONFIG_STACK_TRACER=y?

If so, since you're using clang, I wonder if this is an issue with
68a63a412d18 ("arm64: Fix build with CC=clang, CONFIG_FTRACE=y and
CONFIG_STACK_TRACER=y")?

Please let us know how the bisection goes...

Will

> kernel crash log [1]:
> 
> [   15.302140] Unable to handle kernel paging request at virtual
> address fff8
> [   15.309906] Mem abort info:
> [   15.312659]   ESR = 0x9604
> [   15.316365]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   15.321626]   SET = 0, FnV = 0
> [   15.324644]   EA = 0, S1PTW = 0
> [   15.327744]   FSC = 0x04: level 0 translation fault
> [   15.332619] Data abort info:
> [   15.335422]   ISV = 0, ISS = 0x0004
> [   15.339226]   CM = 0, WnR = 0
> [   15.342154] swapper pgtable: 4k pages, 48-bit VAs, pgdp=1496c000
> [   15.348795] [fff8] pgd=, p4d=
> [   15.355524] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   15.361729] Modules linked in: meson_gxl dwmac_generic
> snd_soc_meson_gx_sound_card snd_soc_meson_card_utils lima gpu_sched
> drm_shmem_helper meson_drm drm_dma_helper crct10dif_ce meson_ir
> rc_core meson_dw_hdmi dw_hdmi meson_canvas dwmac_meson8b
> stmmac_platform meson_rng stmmac rng_core cec meson_gxbb_wdt
> drm_display_helper snd_soc_meson_aiu snd_soc_meson_codec_glue pcs_xpcs
> snd_soc_meson_t9015 amlogic_gxl_crypto crypto_engine display_connector
> snd_soc_simple_amplifier drm_kms_helper drm nvmem_meson_efuse
> [   15.405976] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> 6.2.0-rc3-next-20230110 #1
> [   15.413563] Hardware name: Libre Computer AML-S905X-CC (DT)
> [   15.419086] Workqueue: events_unbound deferred_probe_work_func
> [   15.424863] pstate: 0005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   15.431762] pc : of_drm_find_bridge+0x38/0x70 [drm]
> [   15.436594] lr : of_drm_find_bridge+0x20/0x70 [drm]
> [   15.441423] sp : 8a04b9b0
> [   15.444700] x29: 8a04b9b0 x28: 08de5810 x27: 08de5808
> [   15.451772] x26: 08de5800 x25: 084cb8b0 x24: 01223c00
> [   15.458844] x23:  x22: 0001 x21: 7fa61a28
> [   15.465917] x20: 084ca080 x19: 7fa61a28 x18: 019bd700
> [   15.472989] x17: 6d64685f77645f6e x16:  x15: 0004
> [   15.480062] x14: 89bab410 x13:  x12: 0003
> [   15.487135] x11:  x10:  x9 : 
> [   15.494207] x8 : 810a70a0 x7 : 64410079616b6f01 x6 : 80416403
> [   15.501279] x5 : 03644100 x4 : 0080 x3 : 00416400
> [   15.508352] x2 : 01128000 x1 :  x0 : 
> [   15.515426] Call trace:
> [   15.517863] Insufficient stack space to handle exception!
> [   15.517867] ESR: 0x9647 -- DABT (current EL)
> [   15.517871] FAR: 0x8a047ff0
> [   15.517873] Task stack: [0x8a048000..0x8a04c000]
> [   15.517877] IRQ stack:  [0x88008000..0x8800c000]
> [   15.517880] Overflow stack: [0x7d9c1320..0x7d9c2320]
> [   15.517884] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
> 6.2.0-rc3-next-20230110 #1
> [   15.517890] Hardware name: Libre Computer AML-S905X-CC (DT)
> [   15.517895] Workqueue: events_unbound deferred_probe_work_func
> [   15.517915] pstate: 83c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   15.517923] pc : el1_abort+0x4/0x5c
> [   15.517932] lr : el1h_64_sync_handler+0x60/0xac
> [   15.517939] sp : 8a048020
> [   15.517941] x29: 8a048020 x28: 01128000 x27: 08de5808
> [   15.517950] x26: 08de5800 x25: 8a04b608 x24: 01128000
> [   15.517957] x23: a0c5 x22: 880321dc x21: 8a048180
> [   15.517965] x20: 898e1000 x19: 8a048290 x18: 019bd700
> [   15.517972] x17: 0011 x16:  x15: 0004
> [   15.517979] x14: 89bab410 x13:  x12: 
> [   15.517986] x11: 0030 x10: 89013a1c x9 : 890401a0
> [   15.517994] x8 : 0025 x7 : 

next-20230110: arm64: defconfig+kselftest config boot failed - Unable to handle kernel paging request at virtual address fffffffffffffff8

2023-01-10 Thread Naresh Kamboju
[ please ignore this email if this regression is already reported ]

Today's Linux next tag next-20230110 boot passes with defconfig but
boot fails with
defconfig + kselftest merge config on arm64 devices and qemu-arm64.

Reported-by: Linux Kernel Functional Testing 

We are bisecting this problem and will get back to you shortly.

GOOD: next-20230109  (defconfig + kselftests configs)
BAD: next-20230110 (defconfig + kselftests configs)

kernel crash log [1]:

[   15.302140] Unable to handle kernel paging request at virtual
address fff8
[   15.309906] Mem abort info:
[   15.312659]   ESR = 0x9604
[   15.316365]   EC = 0x25: DABT (current EL), IL = 32 bits
[   15.321626]   SET = 0, FnV = 0
[   15.324644]   EA = 0, S1PTW = 0
[   15.327744]   FSC = 0x04: level 0 translation fault
[   15.332619] Data abort info:
[   15.335422]   ISV = 0, ISS = 0x0004
[   15.339226]   CM = 0, WnR = 0
[   15.342154] swapper pgtable: 4k pages, 48-bit VAs, pgdp=1496c000
[   15.348795] [fff8] pgd=, p4d=
[   15.355524] Internal error: Oops: 9604 [#1] PREEMPT SMP
[   15.361729] Modules linked in: meson_gxl dwmac_generic
snd_soc_meson_gx_sound_card snd_soc_meson_card_utils lima gpu_sched
drm_shmem_helper meson_drm drm_dma_helper crct10dif_ce meson_ir
rc_core meson_dw_hdmi dw_hdmi meson_canvas dwmac_meson8b
stmmac_platform meson_rng stmmac rng_core cec meson_gxbb_wdt
drm_display_helper snd_soc_meson_aiu snd_soc_meson_codec_glue pcs_xpcs
snd_soc_meson_t9015 amlogic_gxl_crypto crypto_engine display_connector
snd_soc_simple_amplifier drm_kms_helper drm nvmem_meson_efuse
[   15.405976] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
6.2.0-rc3-next-20230110 #1
[   15.413563] Hardware name: Libre Computer AML-S905X-CC (DT)
[   15.419086] Workqueue: events_unbound deferred_probe_work_func
[   15.424863] pstate: 0005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   15.431762] pc : of_drm_find_bridge+0x38/0x70 [drm]
[   15.436594] lr : of_drm_find_bridge+0x20/0x70 [drm]
[   15.441423] sp : 8a04b9b0
[   15.444700] x29: 8a04b9b0 x28: 08de5810 x27: 08de5808
[   15.451772] x26: 08de5800 x25: 084cb8b0 x24: 01223c00
[   15.458844] x23:  x22: 0001 x21: 7fa61a28
[   15.465917] x20: 084ca080 x19: 7fa61a28 x18: 019bd700
[   15.472989] x17: 6d64685f77645f6e x16:  x15: 0004
[   15.480062] x14: 89bab410 x13:  x12: 0003
[   15.487135] x11:  x10:  x9 : 
[   15.494207] x8 : 810a70a0 x7 : 64410079616b6f01 x6 : 80416403
[   15.501279] x5 : 03644100 x4 : 0080 x3 : 00416400
[   15.508352] x2 : 01128000 x1 :  x0 : 
[   15.515426] Call trace:
[   15.517863] Insufficient stack space to handle exception!
[   15.517867] ESR: 0x9647 -- DABT (current EL)
[   15.517871] FAR: 0x8a047ff0
[   15.517873] Task stack: [0x8a048000..0x8a04c000]
[   15.517877] IRQ stack:  [0x88008000..0x8800c000]
[   15.517880] Overflow stack: [0x7d9c1320..0x7d9c2320]
[   15.517884] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted
6.2.0-rc3-next-20230110 #1
[   15.517890] Hardware name: Libre Computer AML-S905X-CC (DT)
[   15.517895] Workqueue: events_unbound deferred_probe_work_func
[   15.517915] pstate: 83c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   15.517923] pc : el1_abort+0x4/0x5c
[   15.517932] lr : el1h_64_sync_handler+0x60/0xac
[   15.517939] sp : 8a048020
[   15.517941] x29: 8a048020 x28: 01128000 x27: 08de5808
[   15.517950] x26: 08de5800 x25: 8a04b608 x24: 01128000
[   15.517957] x23: a0c5 x22: 880321dc x21: 8a048180
[   15.517965] x20: 898e1000 x19: 8a048290 x18: 019bd700
[   15.517972] x17: 0011 x16:  x15: 0004
[   15.517979] x14: 89bab410 x13:  x12: 
[   15.517986] x11: 0030 x10: 89013a1c x9 : 890401a0
[   15.517994] x8 : 0025 x7 : 205d363234353135 x6 : 352e35312020205b
[   15.518001] x5 : 89f766b7 x4 : 88fe695c x3 : 000c
[   15.518008] x2 : 9604 x1 : 9604 x0 : 8a048030
[   15.518017] Kernel panic - not syncing: kernel stack overflow
[   15.518020] SMP: stopping secondary CPUs
[   15.518027] Kernel Offset: disabled
[   15.518029] CPU features: 0x0,01000100,421b
[   15.518034] Memory Limit: none
[   15.679388] ---[ end Kernel panic - not syncing: kernel stack overflow ]---


[1]
https://storage.kernelci.org/next/master/next-20230110/arm64/defconfig/clang-16/lab-broonie/kselftest-arm64-meson-gxl-s905x-libretech-cc.html