Re: [PATCH v1] driver core: Fix device_pm_lock() locking for device links

2020-09-01 Thread Rafael J. Wysocki
On Tue, Sep 1, 2020 at 12:10 AM Saravana Kannan wrote:
>
> This commit fixes two issues:
>
> 1. The lockdep warning reported by Dong Aisheng [1].
>
> It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
> that was introduced when device-link devices were added to expose device
> link information in sysfs.
>
> The patch that "introduced" this cycle can't be reverted because it fixes
> a real SRCU issue and also ensures that the device-link device is deleted
> as soon as the device-link is deleted. This is important to avoid sysfs
> name collisions if the device-link is created again immediately (this can
> happen a lot with deferred probing).
>
> 2. device_link_drop_managed() is not grabbing device_pm_lock().
>
> When device_link_del() calls __device_link_del() (device_link_del() ->
> device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
> device_pm_lock().
>
> However, when device_link_drop_managed() calls __device_link_del()
> (device_link_drop_managed() -> kref_put() -> __device_link_del()) it
> doesn't grab device_pm_lock(). There's nothing special about managed
> device-links that removes the need for grabbing device_pm_lock(). So, this
> patch makes sure device_pm_lock() is always held when deleting managed
> links.
>
> And thanks to Stephen Boyd for helping me understand the lockdep splat.
>
> Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
> Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
> [1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
> Reported-by: Dong Aisheng 
> Signed-off-by: Saravana Kannan 
> ---
>
> Rafael,
>
> A bigger question I had is why we need to grab device_pm_lock() around
> device_link_del() in the first place. I understand the need to grab it
> during device_link_add() -- it's because we are checking the supplier is
> in the dpm_list and because we are reordering devices on the dpm_list.
>
> But during deletion, we don't need to do either one of those.  So, why
> do we even need to grab the device_pm_lock() in the first place?

It is not strictly necessary AFAICS.

> The device_links_write_lock() that we already grab before deleting a device
> link seems like it'd be sufficient. If you agree we don't need to grab
> device_pm_lock() during deletion, then I can change this patch to just
> delete that locking.

Yes, please.

Thanks!


Re: [PATCH v1] driver core: Fix device_pm_lock() locking for device links

2020-09-01 Thread Dong Aisheng
Hi Saravana

On Tue, Sep 1, 2020 at 6:10 AM Saravana Kannan wrote:
>
> This commit fixes two issues:
>
> 1. The lockdep warning reported by Dong Aisheng [1].
>
> It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
> that was introduced when device-link devices were added to expose device
> link information in sysfs.
>
> The patch that "introduced" this cycle can't be reverted because it fixes
> a real SRCU issue and also ensures that the device-link device is deleted
> as soon as the device-link is deleted. This is important to avoid sysfs
> name collisions if the device-link is created again immediately (this can
> happen a lot with deferred probing).
>
> 2. device_link_drop_managed() is not grabbing device_pm_lock().
>
> When device_link_del() calls __device_link_del() (device_link_del() ->
> device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
> device_pm_lock().
>
> However, when device_link_drop_managed() calls __device_link_del()
> (device_link_drop_managed() -> kref_put() -> __device_link_del()) it
> doesn't grab device_pm_lock(). There's nothing special about managed
> device-links that removes the need for grabbing device_pm_lock(). So, this
> patch makes sure device_pm_lock() is always held when deleting managed
> links.
>
> And thanks to Stephen Boyd for helping me understand the lockdep splat.
>
> Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
> Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
> [1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
> Reported-by: Dong Aisheng 
> Signed-off-by: Saravana Kannan 

Thanks a lot for the quick fix. It worked for me.

Tested-by: Dong Aisheng 

Regards
Aisheng

> ---
>
> Rafael,
>
> A bigger question I had is why we need to grab device_pm_lock() around
> device_link_del() in the first place. I understand the need to grab it
> during device_link_add() -- it's because we are checking the supplier is
> in the dpm_list and because we are reordering devices on the dpm_list.
>
> But during deletion, we don't need to do either one of those.  So, why
> do we even need to grab the device_pm_lock() in the first place? The
> device_links_write_lock() that we already grab before deleting a device
> link seems like it'd be sufficient. If you agree we don't need to grab
> device_pm_lock() during deletion, then I can change this patch to just
> delete that locking.
>
> -Saravana
>
>  drivers/base/core.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f6f620aa9408..de1935e21d97 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -766,8 +766,10 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_pm_lock();
>  	list_del_rcu(&link->s_node);
>  	list_del_rcu(&link->c_node);
> +	device_pm_unlock();
>  	device_unregister(&link->link_dev);
>  }
>  #else /* !CONFIG_SRCU */
> @@ -781,8 +783,10 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_pm_lock();
>  	list_del(&link->s_node);
>  	list_del(&link->c_node);
> +	device_pm_unlock();
>  	device_unregister(&link->link_dev);
>  }
>  #endif /* !CONFIG_SRCU */
> @@ -807,9 +811,7 @@ static void device_link_put_kref(struct device_link *link)
>  void device_link_del(struct device_link *link)
>  {
>  	device_links_write_lock();
> -	device_pm_lock();
>  	device_link_put_kref(link);
> -	device_pm_unlock();
>  	device_links_write_unlock();
>  }
>  EXPORT_SYMBOL_GPL(device_link_del);
> @@ -830,7 +832,6 @@ void device_link_remove(void *consumer, struct device *supplier)
>  		return;
>
>  	device_links_write_lock();
> -	device_pm_lock();
>
>  	list_for_each_entry(link, &supplier->links.consumers, s_node) {
>  		if (link->consumer == consumer) {
> @@ -839,7 +840,6 @@ void device_link_remove(void *consumer, struct device *supplier)
>  		}
>  	}
>
> -	device_pm_unlock();
>  	device_links_write_unlock();
>  }
>  EXPORT_SYMBOL_GPL(device_link_remove);
> --
> 2.28.0.402.g5ffc5be6b7-goog
>


[PATCH v1] driver core: Fix device_pm_lock() locking for device links

2020-08-31 Thread Saravana Kannan
This commit fixes two issues:

1. The lockdep warning reported by Dong Aisheng [1].

It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
that was introduced when device-link devices were added to expose device
link information in sysfs.

The patch that "introduced" this cycle can't be reverted because it fixes
a real SRCU issue and also ensures that the device-link device is deleted
as soon as the device-link is deleted. This is important to avoid sysfs
name collisions if the device-link is created again immediately (this can
happen a lot with deferred probing).
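
To illustrate what kind of cycle the warning is about, here is a minimal
userspace sketch of lock-order tracking. It is not how lockdep is
implemented; the enum, dep[][], add_dependency() and
reported_chain_is_cyclic() are made-up names for this example, and the
closing edge (dpm_list_mtx taken while fw_lock is held) is an assumption
implied by the chain being reported as a cycle.

```c
#include <stdbool.h>

/* Lock classes named in the report; kn->active#3 is shortened to KN_ACTIVE. */
enum lock_class { DPM_LIST_MTX, KN_ACTIVE, FW_LOCK, NR_CLASSES };

/* dep[a][b] is true once lock b has been taken while lock a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' over recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "inner taken while outer is held". Returns false, and records
 * nothing, if the new edge would close a cycle.
 */
bool add_dependency(int outer, int inner)
{
	bool seen[NR_CLASSES] = { false };

	if (reachable(inner, outer, seen))
		return false;
	dep[outer][inner] = true;
	return true;
}

/* Replays the reported chain; returns true if the last edge is rejected. */
bool reported_chain_is_cyclic(void)
{
	add_dependency(DPM_LIST_MTX, KN_ACTIVE);
	add_dependency(KN_ACTIVE, FW_LOCK);
	/* Assumed closing edge: dpm_list_mtx taken with fw_lock held. */
	return !add_dependency(FW_LOCK, DPM_LIST_MTX);
}
```

In this model, breaking any one edge of the chain (for example by not
holding dpm_list_mtx across the sysfs removal) is enough to make the last
add_dependency() call succeed.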

2. device_link_drop_managed() is not grabbing device_pm_lock().

When device_link_del() calls __device_link_del() (device_link_del() ->
device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
device_pm_lock().

However, when device_link_drop_managed() calls __device_link_del()
(device_link_drop_managed() -> kref_put() -> __device_link_del()) it
doesn't grab device_pm_lock(). There's nothing special about managed
device-links that removes the need for grabbing device_pm_lock(). So, this
patch makes sure device_pm_lock() is always held when deleting managed
links.
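
The asymmetry between the two call paths can be sketched with a small
single-threaded userspace model. This is only an illustration under
stated assumptions: every model_* name below is hypothetical, and a plain
flag stands in for device_pm_lock() because nothing here runs
concurrently.

```c
#include <stdbool.h>

/* Toy stand-in for device_pm_lock(): a flag, since the model is single-threaded. */
static bool pm_lock_held;
/* Snapshot of the flag at the point where the link would leave the lists. */
static bool pm_lock_held_at_delete;

static void model_pm_lock(void)   { pm_lock_held = true; }
static void model_pm_unlock(void) { pm_lock_held = false; }

/* Models __device_link_del() before the patch: relies on the caller's locking. */
static void model_release_old(void)
{
	pm_lock_held_at_delete = pm_lock_held;
}

/* Models __device_link_del() after the patch: locks around the list removal. */
static void model_release_new(void)
{
	model_pm_lock();
	pm_lock_held_at_delete = pm_lock_held;
	model_pm_unlock();
}

/* Models device_link_del() before the patch: the caller takes the lock. */
bool model_del_old(void)
{
	model_pm_lock();
	model_release_old();
	model_pm_unlock();
	return pm_lock_held_at_delete;
}

/* Models device_link_drop_managed() before the patch: nobody takes the lock. */
bool model_drop_managed_old(void)
{
	model_release_old();
	return pm_lock_held_at_delete;
}

/* After the patch both entry points reach the same locked release path. */
bool model_del_new(void)
{
	model_release_new();
	return pm_lock_held_at_delete;
}

bool model_drop_managed_new(void)
{
	model_release_new();
	return pm_lock_held_at_delete;
}
```

Only model_drop_managed_old() returns false here, which corresponds to
the unlocked path described above; moving the locking into the release
function makes the result independent of the caller.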

And thanks to Stephen Boyd for helping me understand the lockdep splat.

Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
[1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
Reported-by: Dong Aisheng 
Signed-off-by: Saravana Kannan 
---

Rafael,

A bigger question I had is why we need to grab device_pm_lock() around
device_link_del() in the first place. I understand the need to grab it
during device_link_add() -- it's because we are checking the supplier is
in the dpm_list and because we are reordering devices on the dpm_list.

But during deletion, we don't need to do either one of those.  So, why
do we even need to grab the device_pm_lock() in the first place? The
device_links_write_lock() that we already grab before deleting a device
link seems like it'd be sufficient. If you agree we don't need to grab
device_pm_lock() during deletion, then I can change this patch to just
delete that locking.

-Saravana

 drivers/base/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f6f620aa9408..de1935e21d97 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -766,8 +766,10 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);
 
+	device_pm_lock();
 	list_del_rcu(&link->s_node);
 	list_del_rcu(&link->c_node);
+	device_pm_unlock();
 	device_unregister(&link->link_dev);
 }
 #else /* !CONFIG_SRCU */
@@ -781,8 +783,10 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);
 
+	device_pm_lock();
 	list_del(&link->s_node);
 	list_del(&link->c_node);
+	device_pm_unlock();
 	device_unregister(&link->link_dev);
 }
 #endif /* !CONFIG_SRCU */
@@ -807,9 +811,7 @@ static void device_link_put_kref(struct device_link *link)
 void device_link_del(struct device_link *link)
 {
 	device_links_write_lock();
-	device_pm_lock();
 	device_link_put_kref(link);
-	device_pm_unlock();
 	device_links_write_unlock();
 }
 EXPORT_SYMBOL_GPL(device_link_del);
@@ -830,7 +832,6 @@ void device_link_remove(void *consumer, struct device *supplier)
 		return;
 
 	device_links_write_lock();
-	device_pm_lock();
 
 	list_for_each_entry(link, &supplier->links.consumers, s_node) {
 		if (link->consumer == consumer) {
@@ -839,7 +840,6 @@ void device_link_remove(void *consumer, struct device *supplier)
 		}
 	}
 
-	device_pm_unlock();
 	device_links_write_unlock();
 }
 EXPORT_SYMBOL_GPL(device_link_remove);
-- 
2.28.0.402.g5ffc5be6b7-goog