Re: [PATCH v1] driver core: Fix device_pm_lock() locking for device links
On Tue, Sep 1, 2020 at 12:10 AM Saravana Kannan wrote:
>
> This commit fixes two issues:
>
> 1. The lockdep warning reported by Dong Aisheng [1].
>
> It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
> that was introduced when device-link devices were added to expose device
> link information in sysfs.
>
> The patch that "introduced" this cycle can't be reverted because it fixes
> a real SRCU issue and also ensures that the device-link device is deleted
> as soon as the device-link is deleted. This is important to avoid sysfs
> name collisions if the device-link is created again immediately (this can
> happen a lot with deferred probing).
>
> 2. device_link_drop_managed() is not grabbing device_pm_lock().
>
> When device_link_del() calls __device_link_del() (device_link_del() ->
> device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
> device_pm_lock().
>
> However, when device_link_drop_managed() calls __device_link_del()
> (device_link_drop_managed() -> kref_put() -> __device_link_del()) it
> doesn't grab device_pm_lock(). There's nothing special about managed
> device-links that removes the need for grabbing device_pm_lock(). So, this
> patch makes sure device_pm_lock() is always held when deleting managed
> links.
>
> And thanks to Stephen Boyd for helping me understand the lockdep splat.
>
> Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
> Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
> [1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
> Reported-by: Dong Aisheng
> Signed-off-by: Saravana Kannan
> ---
>
> Rafael,
>
> A bigger question I had is why we need to grab device_pm_lock() around
> device_link_del() in the first place. I understand the need to grab it
> during device_link_add() -- it's because we are checking the supplier is
> in the dpm_list and because we are reordering devices on the dpm_list.
>
> But during deletion, we don't need to do either one of those. So, why
> do we even need to grab the device_pm_lock() in the first place?

It is not strictly necessary AFAICS.

> The device_links_write_lock() that we already grab before deleting a device
> link seems like it'd be sufficient. If you agree we don't need to grab
> device_pm_lock() during deletion, then I can change this patch to just
> delete that locking.

Yes, please.

Thanks!
Re: [PATCH v1] driver core: Fix device_pm_lock() locking for device links
Hi Saravana

On Tue, Sep 1, 2020 at 6:10 AM Saravana Kannan wrote:
>
> This commit fixes two issues:
>
> 1. The lockdep warning reported by Dong Aisheng [1].
>
> It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
> that was introduced when device-link devices were added to expose device
> link information in sysfs.
>
> The patch that "introduced" this cycle can't be reverted because it fixes
> a real SRCU issue and also ensures that the device-link device is deleted
> as soon as the device-link is deleted. This is important to avoid sysfs
> name collisions if the device-link is created again immediately (this can
> happen a lot with deferred probing).
>
> 2. device_link_drop_managed() is not grabbing device_pm_lock().
>
> When device_link_del() calls __device_link_del() (device_link_del() ->
> device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
> device_pm_lock().
>
> However, when device_link_drop_managed() calls __device_link_del()
> (device_link_drop_managed() -> kref_put() -> __device_link_del()) it
> doesn't grab device_pm_lock(). There's nothing special about managed
> device-links that removes the need for grabbing device_pm_lock(). So, this
> patch makes sure device_pm_lock() is always held when deleting managed
> links.
>
> And thanks to Stephen Boyd for helping me understand the lockdep splat.
>
> Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
> Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
> [1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
> Reported-by: Dong Aisheng
> Signed-off-by: Saravana Kannan

Thanks a lot for the quick fix. It worked for me.

Tested-by: Dong Aisheng

Regards
Aisheng

> ---
>
> Rafael,
>
> A bigger question I had is why we need to grab device_pm_lock() around
> device_link_del() in the first place. I understand the need to grab it
> during device_link_add() -- it's because we are checking the supplier is
> in the dpm_list and because we are reordering devices on the dpm_list.
>
> But during deletion, we don't need to do either one of those. So, why
> do we even need to grab the device_pm_lock() in the first place? The
> device_links_write_lock() that we already grab before deleting a device
> link seems like it'd be sufficient. If you agree we don't need to grab
> device_pm_lock() during deletion, then I can change this patch to just
> delete that locking.
>
> -Saravana
>
>  drivers/base/core.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f6f620aa9408..de1935e21d97 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -766,8 +766,10 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_pm_lock();
>  	list_del_rcu(&link->s_node);
>  	list_del_rcu(&link->c_node);
> +	device_pm_unlock();
>  	device_unregister(&link->link_dev);
>  }
>  #else /* !CONFIG_SRCU */
> @@ -781,8 +783,10 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_pm_lock();
>  	list_del(&link->s_node);
>  	list_del(&link->c_node);
> +	device_pm_unlock();
>  	device_unregister(&link->link_dev);
>  }
>  #endif /* !CONFIG_SRCU */
> @@ -807,9 +811,7 @@ static void device_link_put_kref(struct device_link *link)
>  void device_link_del(struct device_link *link)
>  {
>  	device_links_write_lock();
> -	device_pm_lock();
>  	device_link_put_kref(link);
> -	device_pm_unlock();
>  	device_links_write_unlock();
>  }
>  EXPORT_SYMBOL_GPL(device_link_del);
> @@ -830,7 +832,6 @@ void device_link_remove(void *consumer, struct device *supplier)
>  		return;
>
>  	device_links_write_lock();
> -	device_pm_lock();
>
>  	list_for_each_entry(link, &supplier->links.consumers, s_node) {
>  		if (link->consumer == consumer) {
> @@ -839,7 +840,6 @@ void device_link_remove(void *consumer, struct device *supplier)
>  		}
>  	}
>
> -	device_pm_unlock();
>  	device_links_write_unlock();
>  }
>  EXPORT_SYMBOL_GPL(device_link_remove);
> --
> 2.28.0.402.g5ffc5be6b7-goog
>
[PATCH v1] driver core: Fix device_pm_lock() locking for device links
This commit fixes two issues:

1. The lockdep warning reported by Dong Aisheng [1].

It is a warning about a cycle (dpm_list_mtx --> kn->active#3 --> fw_lock)
that was introduced when device-link devices were added to expose device
link information in sysfs.

The patch that "introduced" this cycle can't be reverted because it fixes
a real SRCU issue and also ensures that the device-link device is deleted
as soon as the device-link is deleted. This is important to avoid sysfs
name collisions if the device-link is created again immediately (this can
happen a lot with deferred probing).

2. device_link_drop_managed() is not grabbing device_pm_lock().

When device_link_del() calls __device_link_del() (device_link_del() ->
device_link_put_kref() -> kref_put() -> __device_link_del()) it grabs the
device_pm_lock().

However, when device_link_drop_managed() calls __device_link_del()
(device_link_drop_managed() -> kref_put() -> __device_link_del()) it
doesn't grab device_pm_lock(). There's nothing special about managed
device-links that removes the need for grabbing device_pm_lock(). So, this
patch makes sure device_pm_lock() is always held when deleting managed
links.

And thanks to Stephen Boyd for helping me understand the lockdep splat.

Fixes: 843e600b8a2b ("driver core: Fix sleeping in invalid context during device link deletion")
Fixes: 515db266a9da ("driver core: Remove device link creation limitation")
[1] - https://lore.kernel.org/lkml/CAA+hA=S4eAreb7vo69LAXSk2t5=deknxhaiy1wspk4xtp9u...@mail.gmail.com/
Reported-by: Dong Aisheng
Signed-off-by: Saravana Kannan
---

Rafael,

A bigger question I had is why we need to grab device_pm_lock() around
device_link_del() in the first place. I understand the need to grab it
during device_link_add() -- it's because we are checking the supplier is
in the dpm_list and because we are reordering devices on the dpm_list.

But during deletion, we don't need to do either one of those. So, why
do we even need to grab the device_pm_lock() in the first place? The
device_links_write_lock() that we already grab before deleting a device
link seems like it'd be sufficient. If you agree we don't need to grab
device_pm_lock() during deletion, then I can change this patch to just
delete that locking.

-Saravana

 drivers/base/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f6f620aa9408..de1935e21d97 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -766,8 +766,10 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);
 
+	device_pm_lock();
 	list_del_rcu(&link->s_node);
 	list_del_rcu(&link->c_node);
+	device_pm_unlock();
 	device_unregister(&link->link_dev);
 }
 #else /* !CONFIG_SRCU */
@@ -781,8 +783,10 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);
 
+	device_pm_lock();
 	list_del(&link->s_node);
 	list_del(&link->c_node);
+	device_pm_unlock();
 	device_unregister(&link->link_dev);
 }
 #endif /* !CONFIG_SRCU */
@@ -807,9 +811,7 @@ static void device_link_put_kref(struct device_link *link)
 void device_link_del(struct device_link *link)
 {
 	device_links_write_lock();
-	device_pm_lock();
 	device_link_put_kref(link);
-	device_pm_unlock();
 	device_links_write_unlock();
 }
 EXPORT_SYMBOL_GPL(device_link_del);
@@ -830,7 +832,6 @@ void device_link_remove(void *consumer, struct device *supplier)
 		return;
 
 	device_links_write_lock();
-	device_pm_lock();
 
 	list_for_each_entry(link, &supplier->links.consumers, s_node) {
 		if (link->consumer == consumer) {
@@ -839,7 +840,6 @@ void device_link_remove(void *consumer, struct device *supplier)
 		}
 	}
 
-	device_pm_unlock();
 	device_links_write_unlock();
 }
 EXPORT_SYMBOL_GPL(device_link_remove);
-- 
2.28.0.402.g5ffc5be6b7-goog
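
For readers who want to poke at the locking shape outside the kernel, here is a rough user-space sketch of what the v1 patch does: the PM lock moves out of the callers and into the release callback, so it covers only the list removal and is dropped before the unregister step, on both the device_link_del() path and the device_link_drop_managed() path. Everything named sketch_* is hypothetical and exists only for illustration; the pthread mutexes merely stand in for device_links_write_lock() and device_pm_lock(), and none of this is the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins (assumptions): links_lock models device_links_write_lock(),
 * pm_lock models device_pm_lock(). */
static pthread_mutex_t links_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pm_lock = PTHREAD_MUTEX_INITIALIZER;
static int pm_lock_depth; /* non-zero while pm_lock is held in this sketch */

/* Hypothetical, much simplified model of a device link. */
struct sketch_link {
	int refcount;
	bool on_lists;              /* models s_node/c_node membership */
	bool unregistered;          /* models device_unregister() having run */
	bool pm_held_at_unregister; /* records the case the patch avoids */
};

static void sketch_pm_lock(void)
{
	pthread_mutex_lock(&pm_lock);
	pm_lock_depth++;
}

static void sketch_pm_unlock(void)
{
	pm_lock_depth--;
	pthread_mutex_unlock(&pm_lock);
}

/* Models device_unregister(): notes whether the PM lock was still held. */
static void sketch_unregister(struct sketch_link *l)
{
	l->unregistered = true;
	l->pm_held_at_unregister = (pm_lock_depth > 0);
}

/* Models __device_link_del() after the patch: the PM lock covers only the
 * list removal and is released before unregistering. */
static void sketch_link_release(struct sketch_link *l)
{
	sketch_pm_lock();
	l->on_lists = false;
	sketch_pm_unlock();
	sketch_unregister(l);
}

/* Models kref_put(): the last reference triggers the release callback. */
static void sketch_link_put(struct sketch_link *l)
{
	if (--l->refcount == 0)
		sketch_link_release(l);
}

/* Models device_link_del(): takes only the links lock itself. */
void sketch_link_del(struct sketch_link *l)
{
	pthread_mutex_lock(&links_lock);
	sketch_link_put(l);
	pthread_mutex_unlock(&links_lock);
}

/* Models device_link_drop_managed(); the caller is assumed to hold
 * links_lock already. */
void sketch_link_drop_managed(struct sketch_link *l)
{
	sketch_link_put(l);
}

/* Runs both deletion paths; returns 1 if neither unregistered a link
 * while the PM lock was held. */
int sketch_demo(void)
{
	struct sketch_link a = { .refcount = 1, .on_lists = true };
	struct sketch_link b = { .refcount = 1, .on_lists = true };

	sketch_link_del(&a);

	pthread_mutex_lock(&links_lock);
	sketch_link_drop_managed(&b);
	pthread_mutex_unlock(&links_lock);

	assert(!a.on_lists && !b.on_lists);
	return a.unregistered && b.unregistered &&
	       !a.pm_held_at_unregister && !b.pm_held_at_unregister;
}
```

Calling sketch_demo() from a small main() exercises both paths. Because the lock is taken inside the shared release function rather than in each caller, any path that drops the last reference gets the same locking, which is the property point 2 of the commit message is after.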