Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()
Hi Ji-Ze, On Wed, Jun 3, 2020 at 9:35 AM Ji-Ze Hong (Peter Hong) wrote: > Geert Uytterhoeven 於 2020/6/3 下午 03:13 寫道: > > If devres_head is not empty, you have a serious problem on your system, > > as those resources may be in an unknown state (e.g. freed but still in > > use). While I had missed the probe_count imbalance when implementing > > the original change, it may actually be safer to not decrease > > probe_count, to prevent further probes from happening. But I guess it > > doesn't matter: if you get here, your system is in a bad state anyway. > > We want to fix the shutdown/reboot freeze issue and bisect to this > patch and found if the probe_count != 0, the PC will stuck with > wait_for_device_probe() with shutdown/reboot forever. So we just > change the increment after return -EBUSY. IC. And before my change, you got a big fat warning backtrace, telling you something is seriously wrong? ;-) > In this case, it maybe 8250_PNP & serial 8250 platform driver resources > conflict. I'll try to dump more message to debug. OK. > IMO, the shutdown/reboot operation should not block. Well, it depends. If there's an issue with resources, the system may crash, too. > >> with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot > >> dmesg below: > >> > >> Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled > >> 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a > >> 16550A > >> 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a > >> 16550A > >> 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a > >> 16550A > >> serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a > >> 16550A > >> > >> Reboot/Shutdown will freeze in wait_for_device_probe(), message as > >> following: > >> INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds. > > > > Now, how did you get to this state, i.e. which driver triggered the > > "Resources present before probing" message? Because that is the root > > issue that must be fixed, and the probe_count imbalance is IMHO just a > > red herring. > > > > Sorry for lost important dmesg: > > Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled > 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A > 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A > 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A > serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A > platform serial8250: Resources present before probing > ^^ OK. So the serial8250 driver does something fishy. When the warning triggered for me, it was due to a driver calling a devm_*() function on a different device than the one being probed, cfr. https://lore.kernel.org/r/alpine.deb.2.21.1911201053330.25...@ramsan.of.borg which was fixed by commit 32085f25d7b68404 ("mdio_bus: don't use managed reset-controller"). The serial8250 driver, or the subdriver for an SoC-specific variant, may do something similar. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()
Hi Geert, Geert Uytterhoeven 於 2020/6/3 下午 03:13 寫道: Hi Ji-Ze, If devres_head is not empty, you have a serious problem on your system, as those resources may be in an unknown state (e.g. freed but still in use). While I had missed the probe_count imbalance when implementing the original change, it may actually be safer to not decrease probe_count, to prevent further probes from happening. But I guess it doesn't matter: if you get here, your system is in a bad state anyway. We want to fix the shutdown/reboot freeze issue and bisect to this patch and found if the probe_count != 0, the PC will stuck with wait_for_device_probe() with shutdown/reboot forever. So we just change the increment after return -EBUSY. In this case, it maybe 8250_PNP & serial 8250 platform driver resources conflict. I'll try to dump more message to debug. IMO, the shutdown/reboot operation should not block. with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot dmesg below: Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A Reboot/Shutdown will freeze in wait_for_device_probe(), message as following: INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds. Now, how did you get to this state, i.e. which driver triggered the "Resources present before probing" message? Because that is the root issue that must be fixed, and the probe_count imbalance is IMHO just a red herring. Sorry for lost important dmesg: Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A platform serial8250: Resources present before probing ^^ -- With Best Regards, Peter Hong
Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()
Hi Ji-Ze, On Wed, Jun 3, 2020 at 8:45 AM Ji-Ze Hong (Peter Hong) wrote: > In previous patch, using return -EBUSY in really_probe() instead WARN_ON() > only. The following is the partial code. > > ... > atomic_inc(_count); > pr_debug("bus: '%s': %s: probing driver %s with device %s\n", > drv->bus->name, __func__, drv->name, dev_name(dev)); > if (!list_empty(>devres_head)) { > dev_crit(dev, "Resources present before probing\n"); > return -EBUSY; > } > ... > > When the devres_head is not empty, this code will return -EBUSY to prevent > resource conflict, but it forgot to balance probe_count. We can move the > increasement code below the resource checking. > > ... > pr_debug("bus: '%s': %s: probing driver %s with device %s\n", > drv->bus->name, __func__, drv->name, dev_name(dev)); > if (!list_empty(>devres_head)) { > dev_crit(dev, "Resources present before probing\n"); > return -EBUSY; > } > atomic_inc(_count); If devres_head is not empty, you have a serious problem on your system, as those resources may be in an unknown state (e.g. freed but still in use). While I had missed the probe_count imbalance when implementing the original change, it may actually be safer to not decrease probe_count, to prevent further probes from happening. But I guess it doesn't matter: if you get here, your system is in a bad state anyway. > The original code will cause lots motherboard freeze in reboot/shutdown > with systemd message "Reached target Reboot" or "Reached target Shutdown" > with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot > dmesg below: > > Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled > 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A > 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A > 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A > serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a > 16550A > > Reboot/Shutdown will freeze in wait_for_device_probe(), message as > following: > INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds. Now, how did you get to this state, i.e. which driver triggered the "Resources present before probing" message? Because that is the root issue that must be fixed, and the probe_count imbalance is IMHO just a red herring. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()
In previous patch, using return -EBUSY in really_probe() instead WARN_ON() only. The following is the partial code. ... atomic_inc(_count); pr_debug("bus: '%s': %s: probing driver %s with device %s\n", drv->bus->name, __func__, drv->name, dev_name(dev)); if (!list_empty(>devres_head)) { dev_crit(dev, "Resources present before probing\n"); return -EBUSY; } ... When the devres_head is not empty, this code will return -EBUSY to prevent resource conflict, but it forgot to balance probe_count. We can move the increasement code below the resource checking. ... pr_debug("bus: '%s': %s: probing driver %s with device %s\n", drv->bus->name, __func__, drv->name, dev_name(dev)); if (!list_empty(>devres_head)) { dev_crit(dev, "Resources present before probing\n"); return -EBUSY; } atomic_inc(_count); ... The original code will cause lots motherboard freeze in reboot/shutdown with systemd message "Reached target Reboot" or "Reached target Shutdown" with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot dmesg below: Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A Reboot/Shutdown will freeze in wait_for_device_probe(), message as following: INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds. Not tainted 5.7.0-rc7-tty-next+ #241 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call Trace: __schedule+0x3d2/0x700 ? printk+0x52/0x6e schedule+0x4f/0xc0 wait_for_device_probe+0xbb/0xl40 ? wait_woken+0x80/0x80 device_shutdown+0xl5/0xle0 kernel_power_off+0x35/0x70 __do_sys_reboot+0xla0/0x220 ? do_sigtimedwait+0xld0/0x210 ? do.writev+0x6a/0xll0 ? do.writev+0x6a/0xll0 ? sigprocmask+0x6f/Oxa0 __64_sys_reboot+0xle/0x20 do_syscall_64+0x57/0xlb0 Fixes: 7c35e699c88b ("driver core: Print device when resources present in really_probe()") Signed-off-by: Ji-Ze Hong (Peter Hong) --- drivers/base/dd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 9a1d940342ac..5173b0766a26 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -487,7 +487,6 @@ static int really_probe(struct device *dev, struct device_driver *drv) if (ret) return ret; - atomic_inc(_count); pr_debug("bus: '%s': %s: probing driver %s with device %s\n", drv->bus->name, __func__, drv->name, dev_name(dev)); if (!list_empty(>devres_head)) { @@ -495,6 +494,8 @@ static int really_probe(struct device *dev, struct device_driver *drv) return -EBUSY; } + atomic_inc(_count); + re_probe: dev->driver = drv; -- 2.17.1