Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()

2020-06-03 Thread Geert Uytterhoeven
Hi Ji-Ze,

On Wed, Jun 3, 2020 at 9:35 AM Ji-Ze Hong (Peter Hong)  wrote:
> Geert Uytterhoeven 於 2020/6/3 下午 03:13 寫道:
> > If devres_head is not empty, you have a serious problem on your system,
> > as those resources may be in an unknown state (e.g. freed but still in
> > use).  While I had missed the probe_count imbalance when implementing
> > the original change, it may actually be safer to not decrease
> > probe_count, to prevent further probes from happening.  But I guess it
> > doesn't matter: if you get here, your system is in a bad state anyway.
>
> We want to fix the shutdown/reboot freeze issue and bisect to this
> patch and found if the probe_count != 0, the PC will stuck with
> wait_for_device_probe() with shutdown/reboot forever. So we just
> change the increment after return -EBUSY.

IC. And before my change, you got a big fat warning backtrace, telling you
something is seriously wrong? ;-)

> In this case, it maybe 8250_PNP & serial 8250 platform driver resources
> conflict. I'll try to dump more message to debug.

OK.

> IMO, the shutdown/reboot operation should not block.

Well, it depends.  If there's an issue with resources, the system may crash,
too.

> >> with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot
> >> dmesg below:
> >>
> >>  Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> >>  00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 
> >> 16550A
> >>  00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 
> >> 16550A
> >>  00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 
> >> 16550A
> >>  serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 
> >> 16550A
> >>
> >> Reboot/Shutdown will freeze in wait_for_device_probe(), message as
> >> following:
> >>  INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds.
> >
> > Now, how did you get to this state, i.e. which driver triggered the
> > "Resources present before probing" message? Because that is the root
> > issue that must be fixed, and the probe_count imbalance is IMHO just a
> > red herring.
> >
>
> Sorry for lost important dmesg:
>
> Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A
> 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A
> 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A
> serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A
> platform serial8250: Resources present before probing
> ^^

OK. So the serial8250 driver does something fishy.

When the warning triggered for me, it was due to a driver calling a devm_*()
function on a different device than the one being probed, cfr.
https://lore.kernel.org/r/alpine.deb.2.21.1911201053330.25...@ramsan.of.borg
which was fixed by commit 32085f25d7b68404 ("mdio_bus: don't use managed
reset-controller").

The serial8250 driver, or the subdriver for an SoC-specific variant, may
do something similar.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()

2020-06-03 Thread Ji-Ze Hong (Peter Hong)

Hi Geert,

Geert Uytterhoeven 於 2020/6/3 下午 03:13 寫道:

Hi Ji-Ze,

If devres_head is not empty, you have a serious problem on your system,
as those resources may be in an unknown state (e.g. freed but still in
use).  While I had missed the probe_count imbalance when implementing
the original change, it may actually be safer to not decrease
probe_count, to prevent further probes from happening.  But I guess it
doesn't matter: if you get here, your system is in a bad state anyway.


We want to fix the shutdown/reboot freeze issue and bisect to this
patch and found if the probe_count != 0, the PC will stuck with
wait_for_device_probe() with shutdown/reboot forever. So we just
change the increment after return -EBUSY.

In this case, it maybe 8250_PNP & serial 8250 platform driver resources
conflict. I'll try to dump more message to debug.

IMO, the shutdown/reboot operation should not block.



with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot
dmesg below:

 Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A
 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A
 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A
 serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 
16550A

Reboot/Shutdown will freeze in wait_for_device_probe(), message as
following:
 INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds.


Now, how did you get to this state, i.e. which driver triggered the
"Resources present before probing" message? Because that is the root
issue that must be fixed, and the probe_count imbalance is IMHO just a
red herring.



Sorry for lost important dmesg:

Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A
serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A
platform serial8250: Resources present before probing
^^

--
With Best Regards,
Peter Hong


Re: [PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()

2020-06-03 Thread Geert Uytterhoeven
Hi Ji-Ze,

On Wed, Jun 3, 2020 at 8:45 AM Ji-Ze Hong (Peter Hong)  wrote:
> In previous patch, using return -EBUSY in really_probe() instead WARN_ON()
> only. The following is the partial code.
>
> ...
> atomic_inc(_count);
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>  drv->bus->name, __func__, drv->name, dev_name(dev));
> if (!list_empty(>devres_head)) {
> dev_crit(dev, "Resources present before probing\n");
> return -EBUSY;
> }
> ...
>
> When the devres_head is not empty, this code will return -EBUSY to prevent
> resource conflict, but it forgot to balance probe_count. We can move the
> increasement code below the resource checking.
>
> ...
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>  drv->bus->name, __func__, drv->name, dev_name(dev));
> if (!list_empty(>devres_head)) {
> dev_crit(dev, "Resources present before probing\n");
> return -EBUSY;
> }
> atomic_inc(_count);

If devres_head is not empty, you have a serious problem on your system,
as those resources may be in an unknown state (e.g. freed but still in
use).  While I had missed the probe_count imbalance when implementing
the original change, it may actually be safer to not decrease
probe_count, to prevent further probes from happening.  But I guess it
doesn't matter: if you get here, your system is in a bad state anyway.

> The original code will cause lots motherboard freeze in reboot/shutdown
> with systemd message "Reached target Reboot" or "Reached target Shutdown"
> with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot
> dmesg below:
>
> Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A
> 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A
> 00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A
> serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 
> 16550A
>
> Reboot/Shutdown will freeze in wait_for_device_probe(), message as
> following:
> INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds.

Now, how did you get to this state, i.e. which driver triggered the
"Resources present before probing" message? Because that is the root
issue that must be fixed, and the probe_count imbalance is IMHO just a
red herring.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 1/1] driver core: Fix unbalance probe_count in really_probe()

2020-06-03 Thread Ji-Ze Hong (Peter Hong)
In previous patch, using return -EBUSY in really_probe() instead WARN_ON()
only. The following is the partial code.

...
atomic_inc(_count);
pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
 drv->bus->name, __func__, drv->name, dev_name(dev));
if (!list_empty(>devres_head)) {
dev_crit(dev, "Resources present before probing\n");
return -EBUSY;
}
...

When the devres_head is not empty, this code will return -EBUSY to prevent
resource conflict, but it forgot to balance probe_count. We can move the
increasement code below the resource checking.

...
pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
 drv->bus->name, __func__, drv->name, dev_name(dev));
if (!list_empty(>devres_head)) {
dev_crit(dev, "Resources present before probing\n");
return -EBUSY;
}
atomic_inc(_count);
...

The original code will cause lots motherboard freeze in reboot/shutdown
with systemd message "Reached target Reboot" or "Reached target Shutdown"
with serial8250 platform driver. e.g. AOPEN DE6200. The conflict boot
dmesg below:

Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 921600) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 921600) is a 16550A
serial8250: ttyS3 at I/O 0x2e8 (irq = 3, base_baud = 921600) is a 16550A

Reboot/Shutdown will freeze in wait_for_device_probe(), message as
following:
INFQ: task systemd-shutdown: 1 blocked for more than 120 seconds.
Not tainted 5.7.0-rc7-tty-next+ #241
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
Call Trace:
  __schedule+0x3d2/0x700
  ? printk+0x52/0x6e
  schedule+0x4f/0xc0
  wait_for_device_probe+0xbb/0xl40
  ? wait_woken+0x80/0x80
  device_shutdown+0xl5/0xle0
  kernel_power_off+0x35/0x70
  __do_sys_reboot+0xla0/0x220
  ? do_sigtimedwait+0xld0/0x210
  ? do.writev+0x6a/0xll0
  ? do.writev+0x6a/0xll0
  ? sigprocmask+0x6f/Oxa0
  __64_sys_reboot+0xle/0x20
  do_syscall_64+0x57/0xlb0

Fixes: 7c35e699c88b ("driver core: Print device when resources present in 
really_probe()")
Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/base/dd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9a1d940342ac..5173b0766a26 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -487,7 +487,6 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
if (ret)
return ret;
 
-   atomic_inc(_count);
pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
 drv->bus->name, __func__, drv->name, dev_name(dev));
if (!list_empty(>devres_head)) {
@@ -495,6 +494,8 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
return -EBUSY;
}
 
+   atomic_inc(_count);
+
 re_probe:
dev->driver = drv;
 
-- 
2.17.1