Re: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

2020-11-02 Thread Krzysztof Kozlowski
On Mon, Nov 02, 2020 at 08:38:14AM +0100, Marek Szyprowski wrote:
> Hi Krzysztof,
> 
> On 31.10.2020 20:01, Krzysztof Kozlowski wrote:
> > I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
> > (actually I failed to do so):
> >
> > It happened during early reboot:
> >
> > [  OK  ] Stopped target Graphical Interface.
> > [  OK  ] Stopped target Multi-User System.
> > [  OK  ] Stopped target RPC Port Mapper.
> >   Stopping OpenSSH Daemonti[   75.447904] 8<--- cut here ---
> > [   75.449506] Unable to handle kernel NULL pointer dereference at virtual address 000c
> > ...
> > [   75.690850] [] (pl330_tasklet) from [] (tasklet_action_common+0x88/0x1f4)
> > [   75.699340] [] (tasklet_action_common) from [] (__do_softirq+0x108/0x428)
> > [   75.707850] [] (__do_softirq) from [] (run_ksoftirqd+0x2c/0x4c)
> > [   75.715486] [] (run_ksoftirqd) from [] (smpboot_thread_fn+0x13c/0x24c)
> > [   75.723693] [] (smpboot_thread_fn) from [] (kthread+0x13c/0x16c)
> > [   75.731390] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
> >
> > Full log:
> > https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0
> >
> > 1. Arch ARM Linux
> > 2. multi_v7_defconfig
> > 3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
> > 4. systemd, boot up with static IP set in kernel command line
> > 5. No swap
> > 6. Kernel, DTB and initramfs are downloaded with TFTP
> > 7. NFS root (NFS client) mounted from a NFSv4 server
> >
> > Since I was not able to reproduce it, obviously I did not run bisect. If
> > anyone has ideas, please share.
> 
> Well, I've also observed it a few times. IMHO it is related to the
> broken UART (in DMA mode) shutdown procedure. Usually it can be easily
> observed as some random parts of the previously transmitted data being
> flushed to the UART console during system shutdown. This also depends
> on the board and the system used (especially the presence of systemd,
> which plays with the UART differently than the old sysv init). IMHO
> there is a kind of use-after-free issue there, so the above pl330
> stack trace can also be observed, depending on the timing and system
> load. This issue has been there since the beginning of the DMA support.
> I have it on my todo list, but it had too low a priority to look into.
> I only briefly checked the related code a few years ago and noticed
> that the UART shutdown is not really synchronized with DMA. However,
> at that time I didn't find any simple fix, so I gave up.

Thanks for the explanation.

Best regards,
Krzysztof



Re: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

2020-11-01 Thread Marek Szyprowski
Hi Krzysztof,

On 31.10.2020 20:01, Krzysztof Kozlowski wrote:
> I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
> (actually I failed to do so):
>
> It happened during early reboot:
>
> [  OK  ] Stopped target Graphical Interface.
> [  OK  ] Stopped target Multi-User System.
> [  OK  ] Stopped target RPC Port Mapper.
>   Stopping OpenSSH Daemonti[   75.447904] 8<--- cut here ---
> [   75.449506] Unable to handle kernel NULL pointer dereference at virtual address 000c
> ...
> [   75.690850] [] (pl330_tasklet) from [] (tasklet_action_common+0x88/0x1f4)
> [   75.699340] [] (tasklet_action_common) from [] (__do_softirq+0x108/0x428)
> [   75.707850] [] (__do_softirq) from [] (run_ksoftirqd+0x2c/0x4c)
> [   75.715486] [] (run_ksoftirqd) from [] (smpboot_thread_fn+0x13c/0x24c)
> [   75.723693] [] (smpboot_thread_fn) from [] (kthread+0x13c/0x16c)
> [   75.731390] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
>
> Full log:
> https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0
>
> 1. Arch ARM Linux
> 2. multi_v7_defconfig
> 3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
> 4. systemd, boot up with static IP set in kernel command line
> 5. No swap
> 6. Kernel, DTB and initramfs are downloaded with TFTP
> 7. NFS root (NFS client) mounted from a NFSv4 server
>
> Since I was not able to reproduce it, obviously I did not run bisect. If
> anyone has ideas, please share.

Well, I've also observed it a few times. IMHO it is related to the
broken UART (in DMA mode) shutdown procedure. Usually it can be easily
observed as some random parts of the previously transmitted data being
flushed to the UART console during system shutdown. This also depends
on the board and the system used (especially the presence of systemd,
which plays with the UART differently than the old sysv init). IMHO
there is a kind of use-after-free issue there, so the above pl330
stack trace can also be observed, depending on the timing and system
load. This issue has been there since the beginning of the DMA support.
I have it on my todo list, but it had too low a priority to look into.
I only briefly checked the related code a few years ago and noticed
that the UART shutdown is not really synchronized with DMA. However,
at that time I didn't find any simple fix, so I gave up.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
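
[Illustration, not part of the original message] The ordering problem
described above (a client tearing down its buffers while deferred DMA
completion work is still queued) can be sketched with a small userspace
model. Everything here is hypothetical: toy_chan, toy_desc and the toy_*
functions are made-up names and are not the pl330 or UART driver code.
The model only shows why cancelling deferred work before freeing
matters; in the kernel the analogous step would presumably be something
like dmaengine_terminate_sync() on the channel before the buffers are
released, but the thread does not establish that this is the right fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; these are NOT the real driver structures. */
struct toy_desc {
    int bytes_done;
};

struct toy_chan {
    struct toy_desc *desc;  /* owned by the client (the "UART" side) */
    bool work_pending;      /* stands in for a scheduled tasklet */
    int callbacks_run;
};

static struct toy_chan toy_chan_new(void)
{
    struct toy_chan ch = { .desc = calloc(1, sizeof(struct toy_desc)) };
    return ch;
}

/* "Interrupt" side: a transfer finished, defer the completion handling. */
static void toy_irq(struct toy_chan *ch)
{
    ch->work_pending = true;
}

/*
 * "Tasklet" side: runs later and touches the descriptor.
 * Returns false where a real driver would have dereferenced a
 * descriptor that is already gone (the model checks instead of crashing).
 */
static bool toy_run_pending(struct toy_chan *ch)
{
    assert(ch != NULL);
    if (!ch->work_pending)
        return true;            /* nothing queued, nothing to do */
    ch->work_pending = false;
    if (ch->desc == NULL)
        return false;           /* stale work ran after teardown */
    ch->desc->bytes_done++;
    ch->callbacks_run++;
    return true;
}

/* Unsynchronized shutdown: frees the descriptor, leaves work queued. */
static void toy_shutdown_unsynced(struct toy_chan *ch)
{
    free(ch->desc);
    ch->desc = NULL;
}

/* Synchronized shutdown: cancel the deferred work first, then free. */
static void toy_shutdown_synced(struct toy_chan *ch)
{
    ch->work_pending = false;   /* models "terminate and wait" */
    free(ch->desc);
    ch->desc = NULL;
}
```

With the unsynchronized variant, a toy_irq() followed by
toy_shutdown_unsynced() leaves work queued, and toy_run_pending() then
reports the stale access. With toy_shutdown_synced() the queued work is
dropped before the free, so the later run is a no-op.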



dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

2020-10-31 Thread Krzysztof Kozlowski
Hi all,

I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
(actually I failed to do so):

It happened during early reboot:

[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped target RPC Port Mapper.
 Stopping OpenSSH Daemonti[   75.447904] 8<--- cut here ---
[   75.449506] Unable to handle kernel NULL pointer dereference at virtual address 000c
...
[   75.690850] [] (pl330_tasklet) from [] (tasklet_action_common+0x88/0x1f4)
[   75.699340] [] (tasklet_action_common) from [] (__do_softirq+0x108/0x428)
[   75.707850] [] (__do_softirq) from [] (run_ksoftirqd+0x2c/0x4c)
[   75.715486] [] (run_ksoftirqd) from [] (smpboot_thread_fn+0x13c/0x24c)
[   75.723693] [] (smpboot_thread_fn) from [] (kthread+0x13c/0x16c)
[   75.731390] [] (kthread) from [] (ret_from_fork+0x14/0x2c)

Full log:
https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0

1. Arch ARM Linux
2. multi_v7_defconfig
3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
4. systemd, boot up with static IP set in kernel command line
5. No swap
6. Kernel, DTB and initramfs are downloaded with TFTP
7. NFS root (NFS client) mounted from a NFSv4 server

Since I was not able to reproduce it, obviously I did not run bisect. If
anyone has ideas, please share.

Best regards,
Krzysztof