Re: [PATCH v1 2/2] clk: Add support for sync_state()

2021-04-14 Thread Saravana Kannan
On Tue, Apr 6, 2021 at 8:45 PM 'Saravana Kannan' via kernel-team
 wrote:
>
> Clocks can be turned on (by the hardware, bootloader, etc) upon a
> reset/boot of a hardware platform. These "boot clocks" could be clocking
> devices that are active before the kernel starts running. For example,
> clocks needed for the interconnects, UART console, display, CPUs, DDR,
> etc.
>
> When a boot clock is used by more than one consumer or multiple boot
> clocks share a parent clock, the boot clock (or the common parent) can
> be turned off when the first consumer probes. This can crash the device
> or cause poor user experience.
>
> Fix this by explicitly enabling the boot clocks during clock
> registration and then removing the enable vote when the clock provider
> device gets its sync_state() callback. Since sync_state() callback comes
> only when all the consumers of a device (not a specific clock) have
> probed, this ensures the boot clocks are kept on at least until all
> their consumers have had a chance to vote on them (in their respective
> probe functions).
>
> Also, if a clock provider is loaded as a module and it has some boot
> clocks, they get turned off only when a consumer explicitly turns them
> off. So clocks that are boot clocks and are unused never get turned off
> because the logic to turn off unused clocks has already run during
> late_initcall_sync(). Adding sync_state() support also makes sure these
> unused boot clocks are turned off once all the consumers have probed.
>
> Signed-off-by: Saravana Kannan 

Hi Stephen,

Gentle reminder.


-Saravana

> ---
>  drivers/clk/clk.c            | 84 +++-
>  include/linux/clk-provider.h |  1 +
>  2 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index d6301a3351f2..cd07f4d1254c 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -72,6 +72,8 @@ struct clk_core {
> unsigned long   flags;
> bool            orphan;
> bool            rpm_enabled;
> +   bool            need_sync;
> +   bool            boot_enabled;
> unsigned int    enable_count;
> unsigned int    prepare_count;
> unsigned int    protect_count;
> @@ -1215,6 +1217,15 @@ static void __init clk_unprepare_unused_subtree(struct clk_core *core)
> hlist_for_each_entry(child, &core->children, child_node)
> clk_unprepare_unused_subtree(child);
>
> +   /*
> +* Orphan clocks might still not have their state held if one of their
> +* ancestors hasn't been registered yet. We don't want to turn off
> +* these orphan clocks now as they will be turned off later when their
> +* device gets a sync_state() call.
> +*/
> +   if (dev_has_sync_state(core->dev))
> +   return;
> +
> if (core->prepare_count)
> return;
>
> @@ -1246,6 +1257,15 @@ static void __init clk_disable_unused_subtree(struct clk_core *core)
> hlist_for_each_entry(child, &core->children, child_node)
> clk_disable_unused_subtree(child);
>
> +   /*
> +* Orphan clocks might still not have their state held if one of their
> +* ancestors hasn't been registered yet. We don't want to turn off
> +* these orphan clocks now as they will be turned off later when their
> +* device gets a sync_state() call.
> +*/
> +   if (dev_has_sync_state(core->dev))
> +   return;
> +
> if (core->flags & CLK_OPS_PARENT_ENABLE)
> clk_core_prepare_enable(core->parent);
>
> @@ -1319,6 +1339,38 @@ static int __init clk_disable_unused(void)
>  }
>  late_initcall_sync(clk_disable_unused);
>
> +static void clk_unprepare_disable_dev_subtree(struct clk_core *core,
> + struct device *dev)
> +{
> +   struct clk_core *child;
> +
> +   lockdep_assert_held(&prepare_lock);
> +
> +   hlist_for_each_entry(child, &core->children, child_node)
> +           clk_unprepare_disable_dev_subtree(child, dev);
> +
> +   if (core->dev != dev || !core->need_sync)
> +   return;
> +
> +   clk_core_disable_unprepare(core);
> +}
> +
> +void clk_sync_state(struct device *dev)
> +{
> +   struct clk_core *core;
> +
> +   clk_prepare_lock();
> +
> +   hlist_for_each_entry(core, &clk_root_list, child_node)
> +           clk_unprepare_disable_dev_subtree(core, dev);
> +
> +   hlist_for_each_entry(core, &clk_orphan_list, chil

[PATCH] driver core: Fix locking bug in deferred_probe_timeout_work_func()

2021-04-12 Thread Saravana Kannan
commit eed6e41813deb9ee622cd9242341f21430d7789f upstream.

list_for_each_entry_safe() is only useful if we are deleting nodes in a
linked list within the loop. It doesn't protect against other threads
adding/deleting nodes to the list in parallel. We need to grab
deferred_probe_mutex when traversing the deferred_probe_pending_list.

Cc: sta...@vger.kernel.org
Fixes: 25b4e70dcce9 ("driver core: allow stopping deferred probe after init")
Signed-off-by: Saravana Kannan 
Link: https://lore.kernel.org/r/20210402040342.2944858-2-sarava...@google.com
Signed-off-by: Greg Kroah-Hartman 
---
Hi Greg,

This should apply cleanly to 4.19 and 5.4 if you think this should be
picked up.

-Saravana

 drivers/base/dd.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4ba9231a6be8..26ba7a99b7d5 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -254,14 +254,16 @@ int driver_deferred_probe_check_state(struct device *dev)
 
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
-   struct device_private *private, *p;
+   struct device_private *p;
 
deferred_probe_timeout = 0;
driver_deferred_probe_trigger();
flush_work(&deferred_probe_work);
 
-   list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
-           dev_info(private->device, "deferred probe pending");
+   mutex_lock(&deferred_probe_mutex);
+   list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
+           dev_info(p->device, "deferred probe pending\n");
+   mutex_unlock(&deferred_probe_mutex);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
-- 
2.31.1.295.g9ea45b61b8-goog
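
The point of the commit message above (the "_safe" iterator only guards against deletion inside the loop, not against other threads) can be illustrated with a small userspace sketch. This is not kernel code: the pthread mutex stands in for deferred_probe_mutex, and the names pending_dev, pending_add() and pending_count() are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on a "pending" list. */
struct pending_dev {
	const char *name;
	struct pending_dev *next;
};

/* One lock protects the list head and every ->next pointer. */
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pending_dev *pending_head;

/* Writers take the lock while they modify the list. */
void pending_add(struct pending_dev *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&pending_lock);
	d->next = pending_head;
	pending_head = d;
	pthread_mutex_unlock(&pending_lock);
}

/*
 * Readers must hold the same lock for the whole traversal. Caching the
 * next pointer (what a "_safe" iterator does) would not help if another
 * thread added or removed nodes while we walk the list.
 */
size_t pending_count(void)
{
	size_t n = 0;

	pthread_mutex_lock(&pending_lock);
	for (const struct pending_dev *d = pending_head; d; d = d->next)
		n++;
	pthread_mutex_unlock(&pending_lock);
	return n;
}
```

The traversal itself never deletes anything, so the plain iterator is sufficient once the lock is held, which is the same shape as the fix in the diff.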



Re: [PATCH v2] of: property: fw_devlink: do not link ".*,nr-gpios"

2021-04-09 Thread Saravana Kannan
On Fri, Apr 9, 2021 at 12:26 PM Rob Herring  wrote:
>
> On Wed, Apr 7, 2021 at 3:45 PM Ilya Lipnitskiy
>  wrote:
> >
> > On Tue, Apr 6, 2021 at 6:24 PM Saravana Kannan  wrote:
> > >
> > > On Tue, Apr 6, 2021 at 6:10 PM Rob Herring  wrote:
> > > >
> > > > On Tue, Apr 6, 2021 at 7:46 PM Saravana Kannan  
> > > > wrote:
> > > > >
> > > > > On Tue, Apr 6, 2021 at 5:34 PM Rob Herring  wrote:
> > > > > >
> > > > > > On Tue, Apr 06, 2021 at 04:09:10PM -0700, Saravana Kannan wrote:
> > > > > > > On Mon, Apr 5, 2021 at 3:26 PM Ilya Lipnitskiy
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > [,]nr-gpios property is used by some GPIO drivers[0] to 
> > > > > > > > indicate
> > > > > > > > the number of GPIOs present on a system, not define a GPIO. 
> > > > > > > > nr-gpios is
> > > > > > > > not configured by #gpio-cells and can't be parsed along with 
> > > > > > > > other
> > > > > > > > "*-gpios" properties.
> > > > > > > >
> > > > > > > > nr-gpios without the "," prefix is not allowed by the DT
> > > > > > > > spec[1], so only add exception for the ",nr-gpios" suffix and 
> > > > > > > > let the
> > > > > > > > error message continue being printed for non-compliant 
> > > > > > > > implementations.
> > > > > > > >
> > > > > > > > [0]: nr-gpios is referenced in 
> > > > > > > > Documentation/devicetree/bindings/gpio:
> > > > > > > >  - gpio-adnp.txt
> > > > > > > >  - gpio-xgene-sb.txt
> > > > > > > >  - gpio-xlp.txt
> > > > > > > >  - snps,dw-apb-gpio.yaml
> > > > > > > >
> > > > > > > > [1]:
> > > > > > > > Link: 
> > > > > > > > https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20
> > > > > > > >
> > > > > > > > Fixes errors such as:
> > > > > > > >   OF: /palmbus@30/gpio@600: could not find phandle
> > > > > > > >
> > > > > > > > Fixes: 7f00be96f125 ("of: property: Add device link support for 
> > > > > > > > interrupt-parent, dmas and -gpio(s)")
> > > > > > > > Signed-off-by: Ilya Lipnitskiy 
> > > > > > > > Cc: Saravana Kannan 
> > > > > > > > Cc:  # 5.5.x
> > > > > > > > ---
> > > > > > > >  drivers/of/property.c | 11 ++-
> > > > > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > > > index 2046ae311322..1793303e84ac 100644
> > > > > > > > --- a/drivers/of/property.c
> > > > > > > > +++ b/drivers/of/property.c
> > > > > > > > @@ -1281,7 +1281,16 @@ DEFINE_SIMPLE_PROP(pinctrl7, 
> > > > > > > > "pinctrl-7", NULL)
> > > > > > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > > > >  DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
> > > > > > > >  DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> > > > > > > > -DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> > > > > > > > +
> > > > > > > > +static struct device_node *parse_gpios(struct device_node *np,
> > > > > > > > +  const char *prop_name, 
> > > > > > > > int index)
> > > > > > > > +{
> > > > > > > > +   if (!strcmp_suffix(prop_name, ",nr-gpios"))
> > > > > > > > +   return NULL;
> > > > > > >
> > > > > > > > Ah I somehow missed this patch. This gives a blanket exception for
> > > > > > > vendor,nr-gpios. I'd prefer explicit exceptions for all the 
> > > > > > > instances
> > > > > > > of ",nr-gpios" we are grandfathering in. Any future additions 
> > > > > > > should
> > > > > > > be rejected. Can we do that please?
> > > > > > >
> > > > > > > Rob, you okay with making this list more explicit?
> > > > > >
> > > > > > Not the kernel's job IMO. A schema is the right way to handle that.
> > > > >
> > > > > Ok, that's fine by me. Btw, let's land this in driver-core? I've made
> > > > > changes there and this might cause conflicts. Not sure.
> > > >
> > > > It merges with linux-next fine. You'll need to resend this to Greg if
> > > > you want to do that.
> > > >
> > > > Reviewed-by: Rob Herring 
> > >
> > > Hi Greg,
> > >
> > > Can you pull this into driver-core please?
> > Do you want me to re-spin on top of driver-core? The patch is
> > currently based on dt/next in robh/linux.git
>
> I did say you need to resend the patch to Greg, but since there's no
> movement on this and I have other things to send upstream, I've
> applied it.

:'(

If it's not too late, can we please drop it? I'm sure Greg would be
okay with picking this up.

-Saravana


[PATCH v1 1/2] driver core: Add dev_set_drv_sync_state()

2021-04-06 Thread Saravana Kannan
This can be used by frameworks to set the sync_state() helper functions
for drivers that don't already have them set.

Signed-off-by: Saravana Kannan 
---
 include/linux/device.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..35e8833ca16b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -778,6 +778,18 @@ static inline bool dev_has_sync_state(struct device *dev)
return false;
 }
 
+static inline int dev_set_drv_sync_state(struct device *dev,
+void (*fn)(struct device *dev))
+{
+   if (!dev || !dev->driver)
+   return 0;
+   if (dev->driver->sync_state && dev->driver->sync_state != fn)
+   return -EBUSY;
+   if (!dev->driver->sync_state)
+   dev->driver->sync_state = fn;
+   return 0;
+}
+
 /*
  * High level routines for use by the bus drivers
  */
-- 
2.31.1.295.g9ea45b61b8-goog
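
For readers who want to see the return-value behaviour of the helper in isolation, here is a userspace sketch. The struct definitions are reduced stand-ins for the kernel's struct device and struct device_driver, and set_drv_sync_state(), framework_sync_state() and driver_own_sync_state() are illustrative names; only the branching mirrors the patch above.

```c
#include <errno.h>
#include <stddef.h>

struct device;
typedef void (*sync_state_fn)(struct device *dev);

/* Reduced stand-ins for the kernel structures (assumption). */
struct device_driver {
	sync_state_fn sync_state;
};

struct device {
	struct device_driver *driver;
};

/* Same branching as dev_set_drv_sync_state() in the patch. */
int set_drv_sync_state(struct device *dev, sync_state_fn fn)
{
	/* No device or no bound driver: nothing to set, not an error. */
	if (!dev || !dev->driver)
		return 0;
	/* The driver already has a different callback: refuse to override. */
	if (dev->driver->sync_state && dev->driver->sync_state != fn)
		return -EBUSY;
	/* Unset: install the framework's helper. */
	if (!dev->driver->sync_state)
		dev->driver->sync_state = fn;
	return 0;
}

/* Two dummy callbacks used only to tell the cases apart. */
void framework_sync_state(struct device *dev) { (void)dev; }
void driver_own_sync_state(struct device *dev) { (void)dev; }
```

Calling it twice with the same callback is harmless, while a driver that already set its own callback keeps it and the caller sees -EBUSY.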



[PATCH v1 2/2] clk: Add support for sync_state()

2021-04-06 Thread Saravana Kannan
Clocks can be turned on (by the hardware, bootloader, etc) upon a
reset/boot of a hardware platform. These "boot clocks" could be clocking
devices that are active before the kernel starts running. For example,
clocks needed for the interconnects, UART console, display, CPUs, DDR,
etc.

When a boot clock is used by more than one consumer or multiple boot
clocks share a parent clock, the boot clock (or the common parent) can
be turned off when the first consumer probes. This can crash the device
or cause poor user experience.

Fix this by explicitly enabling the boot clocks during clock
registration and then removing the enable vote when the clock provider
device gets its sync_state() callback. Since sync_state() callback comes
only when all the consumers of a device (not a specific clock) have
probed, this ensures the boot clocks are kept on at least until all
their consumers have had a chance to vote on them (in their respective
probe functions).

Also, if a clock provider is loaded as a module and it has some boot
clocks, they get turned off only when a consumer explicitly turns them
off. So clocks that are boot clocks and are unused never get turned off
because the logic to turn off unused clocks has already run during
late_initcall_sync(). Adding sync_state() support also makes sure these
unused boot clocks are turned off once all the consumers have probed.

Signed-off-by: Saravana Kannan 
---
 drivers/clk/clk.c            | 84 +++-
 include/linux/clk-provider.h |  1 +
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index d6301a3351f2..cd07f4d1254c 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -72,6 +72,8 @@ struct clk_core {
 unsigned long   flags;
 bool            orphan;
 bool            rpm_enabled;
+   bool            need_sync;
+   bool            boot_enabled;
 unsigned int    enable_count;
 unsigned int    prepare_count;
 unsigned int    protect_count;
@@ -1215,6 +1217,15 @@ static void __init clk_unprepare_unused_subtree(struct clk_core *core)
 hlist_for_each_entry(child, &core->children, child_node)
 clk_unprepare_unused_subtree(child);
 
+   /*
+* Orphan clocks might still not have their state held if one of their
+* ancestors hasn't been registered yet. We don't want to turn off
+* these orphan clocks now as they will be turned off later when their
+* device gets a sync_state() call.
+*/
+   if (dev_has_sync_state(core->dev))
+   return;
+
if (core->prepare_count)
return;
 
@@ -1246,6 +1257,15 @@ static void __init clk_disable_unused_subtree(struct clk_core *core)
 hlist_for_each_entry(child, &core->children, child_node)
 clk_disable_unused_subtree(child);
 
+   /*
+* Orphan clocks might still not have their state held if one of their
+* ancestors hasn't been registered yet. We don't want to turn off
+* these orphan clocks now as they will be turned off later when their
+* device gets a sync_state() call.
+*/
+   if (dev_has_sync_state(core->dev))
+   return;
+
if (core->flags & CLK_OPS_PARENT_ENABLE)
clk_core_prepare_enable(core->parent);
 
@@ -1319,6 +1339,38 @@ static int __init clk_disable_unused(void)
 }
 late_initcall_sync(clk_disable_unused);
 
+static void clk_unprepare_disable_dev_subtree(struct clk_core *core,
+ struct device *dev)
+{
+   struct clk_core *child;
+
+   lockdep_assert_held(&prepare_lock);
+
+   hlist_for_each_entry(child, &core->children, child_node)
+           clk_unprepare_disable_dev_subtree(child, dev);
+
+   if (core->dev != dev || !core->need_sync)
+   return;
+
+   clk_core_disable_unprepare(core);
+}
+
+void clk_sync_state(struct device *dev)
+{
+   struct clk_core *core;
+
+   clk_prepare_lock();
+
+   hlist_for_each_entry(core, &clk_root_list, child_node)
+           clk_unprepare_disable_dev_subtree(core, dev);
+
+   hlist_for_each_entry(core, &clk_orphan_list, child_node)
+           clk_unprepare_disable_dev_subtree(core, dev);
+
+   clk_prepare_unlock();
+}
+EXPORT_SYMBOL_GPL(clk_sync_state);
+
 static int clk_core_determine_round_nolock(struct clk_core *core,
   struct clk_rate_request *req)
 {
@@ -1725,6 +1777,30 @@ int clk_hw_get_parent_index(struct clk_hw *hw)
 }
 EXPORT_SYMBOL_GPL(clk_hw_get_parent_index);
 
+static void clk_core_hold_state(struct clk_core *core)
+{
+   if (core->need_sync || !core->boot_enabled)
+   return;
+
+   if (core->orphan || !dev_has_sync_state(core->dev))
+   return;
+
+   core->ne

[PATCH v1 0/2] Add sync_state() support to clock framework

2021-04-06 Thread Saravana Kannan
Stephen,

We can decide later if both these patches land through clk tree or the
driver-core tree. The meat of the series is in Patch 2/2 and that commit
text gives all the details.

Saravana Kannan (2):
  driver core: Add dev_set_drv_sync_state()
  clk: Add support for sync_state()

 drivers/clk/clk.c            | 84 +++-
 include/linux/clk-provider.h |  1 +
 include/linux/device.h   | 12 ++
 3 files changed, 96 insertions(+), 1 deletion(-)

-- 
2.31.1.295.g9ea45b61b8-goog
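
The idea in the commit text of patch 2/2 (take an extra enable vote at registration for clocks that were on at boot, and drop it at sync_state()) can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: one clock, no parents, no prepare/enable split, and the model_clk names are invented for the illustration rather than taken from drivers/clk/clk.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one clock; field names loosely follow the patch. */
struct model_clk {
	bool boot_enabled;         /* was already on when registered */
	bool need_sync;            /* framework still holds its extra vote */
	unsigned int enable_count; /* number of outstanding enable votes */
	bool hw_on;                /* modelled hardware state */
};

void model_clk_enable(struct model_clk *c)
{
	if (c->enable_count++ == 0)
		c->hw_on = true;
}

void model_clk_disable(struct model_clk *c)
{
	assert(c->enable_count > 0);
	if (--c->enable_count == 0)
		c->hw_on = false; /* last vote gone: gate the clock */
}

/* At registration, hold a vote on behalf of not-yet-probed consumers. */
void model_clk_register(struct model_clk *c, bool on_at_boot)
{
	c->boot_enabled = on_at_boot;
	c->need_sync = false;
	c->enable_count = 0;
	c->hw_on = on_at_boot;
	if (on_at_boot) {
		model_clk_enable(c);
		c->need_sync = true;
	}
}

/* Called once all consumers of the provider have probed. */
void model_clk_sync_state(struct model_clk *c)
{
	if (!c->need_sync)
		return;
	c->need_sync = false;
	model_clk_disable(c);
}
```

In this model a first consumer that enables and then disables the clock cannot gate it, because the framework's vote is still outstanding. After sync_state() only real consumer votes keep the clock on, so an unused boot clock is turned off at that point.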



Re: [PATCH v2] of: property: fw_devlink: do not link ".*,nr-gpios"

2021-04-06 Thread Saravana Kannan
On Tue, Apr 6, 2021 at 6:10 PM Rob Herring  wrote:
>
> On Tue, Apr 6, 2021 at 7:46 PM Saravana Kannan  wrote:
> >
> > On Tue, Apr 6, 2021 at 5:34 PM Rob Herring  wrote:
> > >
> > > On Tue, Apr 06, 2021 at 04:09:10PM -0700, Saravana Kannan wrote:
> > > > On Mon, Apr 5, 2021 at 3:26 PM Ilya Lipnitskiy
> > > >  wrote:
> > > > >
> > > > > [,]nr-gpios property is used by some GPIO drivers[0] to 
> > > > > indicate
> > > > > the number of GPIOs present on a system, not define a GPIO. nr-gpios 
> > > > > is
> > > > > not configured by #gpio-cells and can't be parsed along with other
> > > > > "*-gpios" properties.
> > > > >
> > > > > nr-gpios without the "," prefix is not allowed by the DT
> > > > > spec[1], so only add exception for the ",nr-gpios" suffix and let the
> > > > > error message continue being printed for non-compliant 
> > > > > implementations.
> > > > >
> > > > > [0]: nr-gpios is referenced in Documentation/devicetree/bindings/gpio:
> > > > >  - gpio-adnp.txt
> > > > >  - gpio-xgene-sb.txt
> > > > >  - gpio-xlp.txt
> > > > >  - snps,dw-apb-gpio.yaml
> > > > >
> > > > > [1]:
> > > > > Link: 
> > > > > https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20
> > > > >
> > > > > Fixes errors such as:
> > > > >   OF: /palmbus@30/gpio@600: could not find phandle
> > > > >
> > > > > Fixes: 7f00be96f125 ("of: property: Add device link support for 
> > > > > interrupt-parent, dmas and -gpio(s)")
> > > > > Signed-off-by: Ilya Lipnitskiy 
> > > > > Cc: Saravana Kannan 
> > > > > Cc:  # 5.5.x
> > > > > ---
> > > > >  drivers/of/property.c | 11 ++-
> > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > index 2046ae311322..1793303e84ac 100644
> > > > > --- a/drivers/of/property.c
> > > > > +++ b/drivers/of/property.c
> > > > > @@ -1281,7 +1281,16 @@ DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > >  DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
> > > > >  DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> > > > > -DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> > > > > +
> > > > > +static struct device_node *parse_gpios(struct device_node *np,
> > > > > +  const char *prop_name, int 
> > > > > index)
> > > > > +{
> > > > > +   if (!strcmp_suffix(prop_name, ",nr-gpios"))
> > > > > +   return NULL;
> > > >
> > > > Ah I somehow missed this patch. This gives a blanket exception for
> > > > vendor,nr-gpios. I'd prefer explicit exceptions for all the instances
> > > > of ",nr-gpios" we are grandfathering in. Any future additions should
> > > > be rejected. Can we do that please?
> > > >
> > > > Rob, you okay with making this list more explicit?
> > >
> > > Not the kernel's job IMO. A schema is the right way to handle that.
> >
> > Ok, that's fine by me. Btw, let's land this in driver-core? I've made
> > changes there and this might cause conflicts. Not sure.
>
> It merges with linux-next fine. You'll need to resend this to Greg if
> you want to do that.
>
> Reviewed-by: Rob Herring 

Hi Greg,

Can you pull this into driver-core please? I touch this file a lot and
might need to do so again if any fw_devlink=on issues come up. So
trying to preemptively avoid conflicts.

-Saravana


Re: [PATCH v2] of: property: fw_devlink: do not link ".*,nr-gpios"

2021-04-06 Thread Saravana Kannan
On Tue, Apr 6, 2021 at 5:34 PM Rob Herring  wrote:
>
> On Tue, Apr 06, 2021 at 04:09:10PM -0700, Saravana Kannan wrote:
> > On Mon, Apr 5, 2021 at 3:26 PM Ilya Lipnitskiy
> >  wrote:
> > >
> > > [,]nr-gpios property is used by some GPIO drivers[0] to indicate
> > > the number of GPIOs present on a system, not define a GPIO. nr-gpios is
> > > not configured by #gpio-cells and can't be parsed along with other
> > > "*-gpios" properties.
> > >
> > > nr-gpios without the "," prefix is not allowed by the DT
> > > spec[1], so only add exception for the ",nr-gpios" suffix and let the
> > > error message continue being printed for non-compliant implementations.
> > >
> > > [0]: nr-gpios is referenced in Documentation/devicetree/bindings/gpio:
> > >  - gpio-adnp.txt
> > >  - gpio-xgene-sb.txt
> > >  - gpio-xlp.txt
> > >  - snps,dw-apb-gpio.yaml
> > >
> > > [1]:
> > > Link: 
> > > https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20
> > >
> > > Fixes errors such as:
> > >   OF: /palmbus@30/gpio@600: could not find phandle
> > >
> > > Fixes: 7f00be96f125 ("of: property: Add device link support for 
> > > interrupt-parent, dmas and -gpio(s)")
> > > Signed-off-by: Ilya Lipnitskiy 
> > > Cc: Saravana Kannan 
> > > Cc:  # 5.5.x
> > > ---
> > >  drivers/of/property.c | 11 ++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > index 2046ae311322..1793303e84ac 100644
> > > --- a/drivers/of/property.c
> > > +++ b/drivers/of/property.c
> > > @@ -1281,7 +1281,16 @@ DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > >  DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
> > >  DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> > > -DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> > > +
> > > +static struct device_node *parse_gpios(struct device_node *np,
> > > +  const char *prop_name, int index)
> > > +{
> > > +   if (!strcmp_suffix(prop_name, ",nr-gpios"))
> > > +   return NULL;
> >
> > Ah I somehow missed this patch. This gives a blanket exception for
> > vendor,nr-gpios. I'd prefer explicit exceptions for all the instances
> > of ",nr-gpios" we are grandfathering in. Any future additions should
> > be rejected. Can we do that please?
> >
> > Rob, you okay with making this list more explicit?
>
> Not the kernel's job IMO. A schema is the right way to handle that.

Ok, that's fine by me. Btw, let's land this in driver-core? I've made
changes there and this might cause conflicts. Not sure.

-Saravana


Re: [PATCH v2] of: property: fw_devlink: do not link ".*,nr-gpios"

2021-04-06 Thread Saravana Kannan
On Mon, Apr 5, 2021 at 3:26 PM Ilya Lipnitskiy
 wrote:
>
> [,]nr-gpios property is used by some GPIO drivers[0] to indicate
> the number of GPIOs present on a system, not define a GPIO. nr-gpios is
> not configured by #gpio-cells and can't be parsed along with other
> "*-gpios" properties.
>
> nr-gpios without the "," prefix is not allowed by the DT
> spec[1], so only add exception for the ",nr-gpios" suffix and let the
> error message continue being printed for non-compliant implementations.
>
> [0]: nr-gpios is referenced in Documentation/devicetree/bindings/gpio:
>  - gpio-adnp.txt
>  - gpio-xgene-sb.txt
>  - gpio-xlp.txt
>  - snps,dw-apb-gpio.yaml
>
> [1]:
> Link: 
> https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20
>
> Fixes errors such as:
>   OF: /palmbus@30/gpio@600: could not find phandle
>
> Fixes: 7f00be96f125 ("of: property: Add device link support for 
> interrupt-parent, dmas and -gpio(s)")
> Signed-off-by: Ilya Lipnitskiy 
> Cc: Saravana Kannan 
> Cc:  # 5.5.x
> ---
>  drivers/of/property.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 2046ae311322..1793303e84ac 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1281,7 +1281,16 @@ DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
>  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
>  DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
>  DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> -DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> +
> +static struct device_node *parse_gpios(struct device_node *np,
> +  const char *prop_name, int index)
> +{
> +   if (!strcmp_suffix(prop_name, ",nr-gpios"))
> +   return NULL;

Ah I somehow missed this patch. This gives a blanket exception for
vendor,nr-gpios. I'd prefer explicit exceptions for all the instances
of ",nr-gpios" we are grandfathering in. Any future additions should
be rejected. Can we do that please?

Rob, you okay with making this list more explicit?

-Saravana
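
To make the "blanket exception" concrete, here is a userspace sketch of the suffix check in the quoted hunk. suffix_cmp() is written to behave like a strcmp-style suffix helper (0 on match, and no match when the string is not longer than the suffix), and is_gpio_supplier_prop() is an invented name that only models which property names would still be parsed as GPIO phandles; it is not the kernel's parse_gpios().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns 0 when str is longer than suffix and ends with it. */
int suffix_cmp(const char *str, const char *suffix)
{
	assert(str != NULL && suffix != NULL);
	size_t len = strlen(str);
	size_t suffix_len = strlen(suffix);

	if (len <= suffix_len)
		return -1;
	return strcmp(str + len - suffix_len, suffix);
}

/* Models the decision only: should this property be treated as "*-gpios"? */
bool is_gpio_supplier_prop(const char *prop_name)
{
	/* Blanket exception: any "<vendor>,nr-gpios" is a count, not a GPIO. */
	if (!suffix_cmp(prop_name, ",nr-gpios"))
		return false;
	return !suffix_cmp(prop_name, "-gpios");
}
```

With this check every vendor-prefixed ",nr-gpios" name is skipped, including ones added in the future, while a bare "nr-gpios" still falls through to the "-gpios" match and keeps producing the error the patch description mentions.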


Re: [PATCH] of: property: do not create device links from *nr-gpios

2021-04-06 Thread Saravana Kannan
On Tue, Apr 6, 2021 at 12:28 PM Ilya Lipnitskiy
 wrote:
>
> On Tue, Apr 6, 2021 at 10:40 AM Rob Herring  wrote:
> >
> > On Mon, Apr 05, 2021 at 01:18:56PM -0700, Saravana Kannan wrote:
> > > On Mon, Apr 5, 2021 at 1:10 PM Ilya Lipnitskiy
> > >  wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > On Mon, Apr 5, 2021 at 1:01 PM Saravana Kannan  
> > > > wrote:
> > > > >
> > > > > On Sun, Apr 4, 2021 at 8:14 PM Ilya Lipnitskiy
> > > > >  wrote:
> > > > > >
> > > > > > [,]nr-gpios property is used by some GPIO drivers[0] to 
> > > > > > indicate
> > > > > > the number of GPIOs present on a system, not define a GPIO. 
> > > > > > nr-gpios is
> > > > > > not configured by #gpio-cells and can't be parsed along with other
> > > > > > "*-gpios" properties.
> > > > > >
> > > > > > scripts/dtc/checks.c also has a special case for nr-gpio{s}. 
> > > > > > However,
> > > > > > nr-gpio is not really special, so we only need to fix nr-gpios 
> > > > > > suffix
> > > > > > here.
> > > > >
> > > > > The only example of this that I see is "snps,nr-gpios".
> > > > arch/arm64/boot/dts/apm/apm-shadowcat.dtsi uses "apm,nr-gpios", with
> > > > parsing code in drivers/gpio/gpio-xgene-sb.c. There is also code in
> > > > drivers/gpio/gpio-adnp.c and drivers/gpio/gpio-mockup.c using
> > > > "nr-gpios" without any vendor prefix.
> > >
> > > Ah ok. I just grepped the DT files. I'm not sure what Rob's position
> > > is on supporting DT files not in upstream. Thanks for the
> > > clarification.
> >
> > If it's something we had documented, then we have to support it
> Do I read this correctly as a sort-of Ack of my proposed [PATCH v2] in
> this thread, since it aligns the code with the published DT schema?

He's talking about the DT binding documentation in the kernel.

I interpret Rob's reply as, you can do all of this:
1. Just fix up all drivers that use "*nr-gpios" that don't have
binding documentation in the kernel. Change them to use ngpios.
2. Try to switch away old defunct ARM server DTs from nr-gpios to
ngpios (both drivers and DT) and see if people notice.
3. Change the fw_devlink parsing code to have exceptions only for
cases that are using nr-gpios after (1) and (2).

-Saravana
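
Step 3 above (exceptions only for the names that remain) could look roughly like the following userspace sketch. The two entries are simply the vendor-prefixed names mentioned earlier in this thread, used here as placeholders; the real list would be whatever is left after steps 1 and 2. is_grandfathered_nr_gpios() is a hypothetical name, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder list; the actual set depends on steps (1) and (2). */
static const char *const nr_gpios_exceptions[] = {
	"snps,nr-gpios",
	"apm,nr-gpios",
};

/* True only for exact matches against the explicit exception list. */
bool is_grandfathered_nr_gpios(const char *prop_name)
{
	assert(prop_name != NULL);
	size_t n = sizeof(nr_gpios_exceptions) / sizeof(nr_gpios_exceptions[0]);

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(prop_name, nr_gpios_exceptions[i]))
			return true;
	}
	return false;
}
```

Compared with a suffix match, an exact-match list means a newly invented "<vendor>,nr-gpios" property is not silently accepted.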


Re: linux-next: manual merge of the driver-core tree with the devicetree tree

2021-04-06 Thread Saravana Kannan
On Tue, Apr 6, 2021 at 2:11 AM Greg KH  wrote:
>
> On Tue, Apr 06, 2021 at 06:19:45PM +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > Today's linux-next merge of the driver-core tree got a conflict in:
> >
> >   drivers/of/property.c
> >
> > between commit:
> >
> >   3915fed92365 ("of: property: Provide missing member description and 
> > remove excess param")
> >
> > from the devicetree tree and commit:
> >
> >   f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint")
> >
> > from the driver-core tree.
> >
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
>
> Change looks good to me, thanks!

LGTM too.

-Saravana


Re: [RFC] clk: add boot clock support

2021-04-05 Thread Saravana Kannan
On Mon, Apr 5, 2021 at 3:43 PM Sebastian Reichel
 wrote:
>
> Hi,
>
> On Tue, Mar 30, 2021 at 10:05:45AM -0700, Saravana Kannan wrote:
> > On Tue, Mar 30, 2021 at 2:09 AM Sebastian Reichel
> >  wrote:
> > > On Mon, Mar 29, 2021 at 05:36:11PM -0700, Saravana Kannan wrote:
> > > > On Mon, Mar 29, 2021 at 2:53 PM Sebastian Reichel
> > > >  wrote:
> > > > > On Mon, Mar 29, 2021 at 01:03:20PM -0700, Saravana Kannan wrote:
> > > > > > On Fri, Mar 26, 2021 at 2:52 AM Sebastian Reichel
> > > > > >  wrote:
> > > > > > > On Thu, Mar 25, 2021 at 06:55:52PM -0700, Saravana Kannan wrote:
> > > > > > > > On Thu, Mar 25, 2021 at 6:27 PM Rob Herring  
> > > > > > > > wrote:
> > > > > > > > > On Thu, Mar 18, 2021 at 10:03:18PM +0100, Sebastian Reichel 
> > > > > > > > > wrote:
> > > > > > > > > > On Congatec's QMX6 system on module one of the i.MX6 fixed 
> > > > > > > > > > clocks
> > > > > > > > > > is provided by an I2C RTC. Specifying this properly results 
> > > > > > > > > > in a
> > > > > > > > > > circular dependency, since the I2C RTC (and thus its clock) 
> > > > > > > > > > cannot
> > > > > > > > > > be initialized without the i.MX6 clock controller being 
> > > > > > > > > > initialized.
> > > > > > > > > >
> > > > > > > > > > With current code the following path is executed when i.MX6 
> > > > > > > > > > clock
> > > > > > > > > > controller is probed (and ckil clock is specified to be the 
> > > > > > > > > > I2C RTC
> > > > > > > > > > via DT):
> > > > > > > > > >
> > > > > > > > > > 1. imx6q_obtain_fixed_clk_hw(ccm_node, "ckil", 0);
> > > > > > > > > > 2. of_clk_get_by_name(ccm_node, "ckil");
> > > > > > > > > > 3. __of_clk_get(ccm_node, 0, ccm_node->full_name, "ckil");
> > > > > > > > > > 4. of_clk_get_hw(ccm_node, 0, "ckil")
> > > > > > > > > > 5. spec = of_parse_clkspec(ccm_node, 0, "ckil"); // get 
> > > > > > > > > > phandle
> > > > > > > > > > 6. of_clk_get_hw_from_clkspec(); // returns 
> > > > > > > > > > -EPROBE_DEFER
> > > > > > > > > > 7. error is propagated back, i.MX6q clock controller is 
> > > > > > > > > > probe deferred
> > > > > > > > > > 8. I2C controller is never initialized without clock 
> > > > > > > > > > controller
> > > > > > > > > >I2C RTC is never initialized without I2C controller
> > > > > > > > > >CKIL clock is never initialized without I2C RTC
> > > > > > > > > >clock controller is never initialized without CKIL
> > > > > > > > > >
> > > > > > > > > > To fix the circular dependency this registers a dummy clock 
> > > > > > > > > > when
> > > > > > > > > > the RTC clock is tried to be acquired. The dummy clock will 
> > > > > > > > > > later
> > > > > > > > > > be unregistered when the proper clock is registered for the 
> > > > > > > > > > RTC
> > > > > > > > > > DT node. IIUIC clk_core_reparent_orphans() will take care of
> > > > > > > > > > fixing up the clock tree.
> > > > > > > > > >
> > > > > > > > > > NOTE: For now the patch is compile tested only. If this 
> > > > > > > > > > approach
> > > > > > > > > > is the correct one I will do some testing and properly 
> > > > > > > > > > submit this.
> > > > > > > > > > You can find all the details about the hardware in the 
> > > > > > > > > > following
> > > > > > > > > > patchset:
> > > > > > > > > >
>

Re: [PATCH] of: property: do not create device links from *nr-gpios

2021-04-05 Thread Saravana Kannan
On Mon, Apr 5, 2021 at 1:10 PM Ilya Lipnitskiy
 wrote:
>
> Hi Saravana,
>
> On Mon, Apr 5, 2021 at 1:01 PM Saravana Kannan  wrote:
> >
> > On Sun, Apr 4, 2021 at 8:14 PM Ilya Lipnitskiy
> >  wrote:
> > >
> > > [,]nr-gpios property is used by some GPIO drivers[0] to indicate
> > > the number of GPIOs present on a system, not define a GPIO. nr-gpios is
> > > not configured by #gpio-cells and can't be parsed along with other
> > > "*-gpios" properties.
> > >
> > > scripts/dtc/checks.c also has a special case for nr-gpio{s}. However,
> > > nr-gpio is not really special, so we only need to fix nr-gpios suffix
> > > here.
> >
> > The only example of this that I see is "snps,nr-gpios".
> arch/arm64/boot/dts/apm/apm-shadowcat.dtsi uses "apm,nr-gpios", with
> parsing code in drivers/gpio/gpio-xgene-sb.c. There is also code in
> drivers/gpio/gpio-adnp.c and drivers/gpio/gpio-mockup.c using
> "nr-gpios" without any vendor prefix.

Ah ok. I just grepped the DT files. I'm not sure what Rob's position
is on supporting DT files not in upstream. Thanks for the
clarification.

> I personally don't think causing regressions is good for any reason,

I agree, but this is not a functional regression. Just a warning
that's spit out. I don't have a strong opinion on the stack dump vs
not, but I think we should at least reject future additions like this
and limit the exceptions to exactly what's allowed today. nr-gpios
(without any vendor prefix) is especially annoying to me.

Looks like even the DT spec has an exception only for vendor,nr-gpios and not just nr-gpios.
https://github.com/devicetree-org/dt-schema/blob/master/schemas/gpio/gpio-consumer.yaml#L20

-Saravana

> so I think we need to fix this in stable releases. The patch can be
> reverted when nr-gpios is no longer special. The logic here should
> also be aligned with scripts/dtc/checks.c, I actually submitted a
> patch to warn about "nr-gpios" only and not "nr-gpio" in dtc as well:
> https://www.spinics.net/lists/devicetree-compiler/msg03619.html
>
> Ilya


Re: [PATCH] of: property: do not create device links from *nr-gpios

2021-04-05 Thread Saravana Kannan
On Sun, Apr 4, 2021 at 8:14 PM Ilya Lipnitskiy
 wrote:
>
> [<vendor>,]nr-gpios property is used by some GPIO drivers[0] to indicate
> the number of GPIOs present on a system, not define a GPIO. nr-gpios is
> not configured by #gpio-cells and can't be parsed along with other
> "*-gpios" properties.
>
> scripts/dtc/checks.c also has a special case for nr-gpio{s}. However,
> nr-gpio is not really special, so we only need to fix nr-gpios suffix
> here.

The only example of this that I see is "snps,nr-gpios". I personally
would like to deprecate such overlapping/ambiguous definitions.

Maybe fix up the DT? This warning is a nice reminder that the DT needs
to be updated (if it can be). Outside of that, it's not causing any
issues that I know of.

If they are, then we can pick up a patch similar to this. I'd also
limit this fix to "snps,nr-gpios" so that future attempts to use
-gpios for anything other than listing GPIOs triggers a warning.

Rob, thoughts?

Thanks,
Saravana

>
> [0]: nr-gpios is referenced in Documentation/devicetree/bindings/gpio:
>  - gpio-adnp.txt
>  - gpio-xgene-sb.txt
>  - gpio-xlp.txt
>  - snps,dw-apb-gpio.yaml
>
> Fixes errors such as:
>   OF: /palmbus@30/gpio@600: could not find phandle
>
> Call Trace:
>   of_phandle_iterator_next+0x8c/0x16c
>   __of_parse_phandle_with_args+0x38/0xb8
>   of_parse_phandle_with_args+0x28/0x3c
>   parse_suffix_prop_cells+0x80/0xac
>   parse_gpios+0x20/0x2c
>   of_link_to_suppliers+0x18c/0x288
>   of_link_to_suppliers+0x1fc/0x288
>   device_add+0x4e0/0x734
>   of_platform_device_create_pdata+0xb8/0xfc
>   of_platform_bus_create+0x170/0x214
>   of_platform_populate+0x88/0xf4
>   __dt_register_buses+0xbc/0xf0
>   plat_of_setup+0x1c/0x34
>
> Fixes: 7f00be96f125 ("of: property: Add device link support for 
> interrupt-parent, dmas and -gpio(s)")
> Signed-off-by: Ilya Lipnitskiy 
> Cc: Saravana Kannan 
> Cc:  # 5.5.x
> ---
>  drivers/of/property.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 2bb3158c9e43..24672c295603 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1271,7 +1271,16 @@ DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
>  DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
>  DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
>  DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> -DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> +
> +static struct device_node *parse_gpios(struct device_node *np,
> +  const char *prop_name, int index)
> +{
> +   if (!strcmp_suffix(prop_name, "nr-gpios"))
> +   return NULL;
> +
> +   return parse_suffix_prop_cells(np, prop_name, index, "-gpios",
> +  "#gpio-cells");
> +}
>
>  static struct device_node *parse_iommu_maps(struct device_node *np,
> const char *prop_name, int index)
> --
> 2.31.1
>
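
For illustration only, a minimal userspace sketch of the suffix check that
the patch above adds: a property is treated as a GPIO phandle list only if
it ends in "-gpios" and does not end in "nr-gpios". The names ends_with()
and is_gpios_phandle_prop() are hypothetical stand-ins and are not the
kernel's strcmp_suffix() or parse_gpios().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for a suffix test; not the kernel's strcmp_suffix(). */
static bool ends_with(const char *str, const char *suffix)
{
	size_t len, suffix_len;

	assert(str && suffix);
	len = strlen(str);
	suffix_len = strlen(suffix);
	if (len < suffix_len)
		return false;
	return strcmp(str + len - suffix_len, suffix) == 0;
}

/*
 * Hypothetical helper: true if prop_name should be parsed as a list of
 * GPIO phandles. "nr-gpios" (with or without a vendor prefix) is a count,
 * not a phandle list, so it is rejected before the "-gpios" test.
 */
bool is_gpios_phandle_prop(const char *prop_name)
{
	if (ends_with(prop_name, "nr-gpios"))
		return false;
	return ends_with(prop_name, "-gpios");
}
```

With this sketch, "reset-gpios" is accepted while "snps,nr-gpios" and a
bare "nr-gpios" are both rejected.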


[PATCH v1 2/2] driver core: Improve fw_devlink & deferred_probe_timeout interaction

2021-04-01 Thread Saravana Kannan
deferred_probe_timeout kernel commandline parameter allows probing of
consumer devices if the supplier devices don't have any drivers.

fw_devlink=on will indefinitely block probe() calls on a device if all
its suppliers haven't probed successfully. This completely skips calls
to driver_deferred_probe_check_state() since that's only called when a
.probe() function calls framework APIs. So fw_devlink=on breaks
deferred_probe_timeout.

deferred_probe_timeout in its current state also ignores a lot of
information that's now available to the kernel. It assumes all suppliers
that haven't probed when the timer expires (or when initcalls are done
on a static kernel) will never probe and fails any calls to acquire
resources from these unprobed suppliers.

However, this assumption by deferred_probe_timeout isn't true under many
conditions. For example:
- If the consumer happens to be before the supplier in the deferred
  probe list.
- If the supplier itself is waiting on its supplier to probe.

This patch fixes both these issues by relaxing device links between
devices only if the supplier doesn't have any driver that could match
with (NOT bound to) the supplier device. This way, we only fail attempts
to acquire resources from suppliers that truly don't have any driver vs
suppliers that just happen to not have probed yet.
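
The rule described above can be sketched as a small userspace model. This
is an assumption-laden illustration, not the driver core code: struct
model_device, struct model_link and both functions are hypothetical, and
only the can_match idea is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	const char *name;
	bool can_match;		/* some registered driver could match this device */
	bool probed;
};

struct model_link {
	struct model_device *supplier;
	struct model_device *consumer;
	bool blocks_probe;	/* false once the link has been relaxed */
};

/*
 * Relax only the links whose supplier has no driver that could match it.
 * Links to suppliers that merely have not probed yet stay blocking.
 * Returns the number of links relaxed by this call.
 */
size_t relax_links_without_driver(struct model_link *links, size_t n)
{
	size_t relaxed = 0;

	assert(links || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!links[i].blocks_probe || links[i].supplier->can_match)
			continue;
		links[i].blocks_probe = false;
		relaxed++;
	}
	return relaxed;
}

/* A consumer may probe once no blocking link points at an unprobed supplier. */
bool consumer_may_probe(const struct model_device *con,
			const struct model_link *links, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (links[i].consumer == con && links[i].blocks_probe &&
		    !links[i].supplier->probed)
			return false;
	}
	return true;
}
```

In this model a consumer with one driverless supplier and one slow
supplier stays blocked after relaxation, and unblocks only when the slow
supplier probes.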

Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h |  1 +
 drivers/base/core.c | 64 -
 drivers/base/dd.c   |  5 
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 1b44ed588f66..e5f9b7e656c3 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -191,6 +191,7 @@ extern void device_links_driver_cleanup(struct device *dev);
 extern void device_links_no_driver(struct device *dev);
 extern bool device_links_busy(struct device *dev);
 extern void device_links_unbind_consumers(struct device *dev);
+extern void fw_devlink_drivers_done(void);
 
 /* device pm support */
 void device_pm_move_to_tail(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index de518178ac36..c05dae75b696 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -51,6 +51,7 @@ static LIST_HEAD(deferred_sync);
 static unsigned int defer_sync_state_count = 1;
 static DEFINE_MUTEX(fwnode_link_lock);
 static bool fw_devlink_is_permissive(void);
+static bool fw_devlink_drv_reg_done;
 
 /**
  * fwnode_link_add - Create a link between two fwnode_handles.
@@ -1598,6 +1599,52 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle 
*fwnode)
fw_devlink_parse_fwtree(child);
 }
 
+static void fw_devlink_relax_link(struct device_link *link)
+{
+   if (!(link->flags & DL_FLAG_INFERRED))
+   return;
+
+   if (link->flags == (DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE))
+   return;
+
+   pm_runtime_drop_link(link);
+   link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE;
+   dev_dbg(link->consumer, "Relaxing link with %s\n",
+   dev_name(link->supplier));
+}
+
+static int fw_devlink_no_driver(struct device *dev, void *data)
+{
+   struct device_link *link = to_devlink(dev);
+
+   if (!link->supplier->can_match)
+   fw_devlink_relax_link(link);
+
+   return 0;
+}
+
+void fw_devlink_drivers_done(void)
+{
+   fw_devlink_drv_reg_done = true;
+   device_links_write_lock();
+   class_for_each_device(&devlink_class, NULL, NULL,
+ fw_devlink_no_driver);
+   device_links_write_unlock();
+}
+
+static void fw_devlink_unblock_consumers(struct device *dev)
+{
+   struct device_link *link;
+
+   if (!fw_devlink_flags || fw_devlink_is_permissive())
+   return;
+
+   device_links_write_lock();
+   list_for_each_entry(link, &dev->links.consumers, s_node)
+   fw_devlink_relax_link(link);
+   device_links_write_unlock();
+}
+
 /**
  * fw_devlink_relax_cycle - Convert cyclic links to SYNC_STATE_ONLY links
  * @con: Device to check dependencies for.
@@ -1634,13 +1681,7 @@ static int fw_devlink_relax_cycle(struct device *con, 
void *sup)
 
ret = 1;
 
-   if (!(link->flags & DL_FLAG_INFERRED))
-   continue;
-
-   pm_runtime_drop_link(link);
-   link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE;
-   dev_dbg(link->consumer, "Relaxing link with %s\n",
-   dev_name(link->supplier));
+   fw_devlink_relax_link(link);
}
return ret;
 }
@@ -3275,6 +3316,15 @@ int device_add(struct device *dev)
}
 
bus_probe_device(dev);
+
+   /*
+* If all driver registration is done and a newly added device doesn't
+* match with any driver, don't block its consumers from probing in
+* case the consumer device is ab

[PATCH v1 0/2] Fix deferred_probe_timeout and fw_devlink=on

2021-04-01 Thread Saravana Kannan
This series fixes existing bugs in deferred_probe_timeout and fixes some
interaction with fw_devlink=on.

Saravana Kannan (2):
  driver core: Fix locking bug in deferred_probe_timeout_work_func()
  driver core: Improve fw_devlink & deferred_probe_timeout interaction

 drivers/base/base.h |  1 +
 drivers/base/core.c | 64 -
 drivers/base/dd.c   | 13 ++---
 3 files changed, 68 insertions(+), 10 deletions(-)

-- 
2.31.0.208.g409f899ff0-goog



[PATCH v1 1/2] driver core: Fix locking bug in deferred_probe_timeout_work_func()

2021-04-01 Thread Saravana Kannan
list_for_each_entry_safe() is only useful if we are deleting nodes in a
linked list within the loop. It doesn't protect against other threads
adding/deleting nodes to the list in parallel. We need to grab
deferred_probe_mutex when traversing the deferred_probe_pending_list.
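
As a hedged userspace analogy (pthread mutex instead of the kernel mutex,
a hand-rolled singly linked list instead of list_head), the point is that
every traversal takes the same lock that writers take. All names below are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct pending_dev {
	const char *name;
	struct pending_dev *next;
};

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pending_dev *pending_head;

/* Writers modify the list only while holding pending_lock. */
void pending_add(struct pending_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&pending_lock);
	dev->next = pending_head;
	pending_head = dev;
	pthread_mutex_unlock(&pending_lock);
}

/*
 * Readers must hold the same lock for the whole walk. A "safe" iterator
 * would only help if this loop itself removed entries; it would not
 * protect against another thread changing the list concurrently.
 */
size_t pending_count(void)
{
	size_t count = 0;

	pthread_mutex_lock(&pending_lock);
	for (struct pending_dev *dev = pending_head; dev; dev = dev->next)
		count++;
	pthread_mutex_unlock(&pending_lock);
	return count;
}
```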

Cc: sta...@vger.kernel.org
Fixes: 25b4e70dcce9 ("driver core: allow stopping deferred probe after init")
Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 20b69b5e0e91..28ad8afd87bc 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -291,14 +291,16 @@ int driver_deferred_probe_check_state(struct device *dev)
 
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
-   struct device_private *private, *p;
+   struct device_private *p;
 
driver_deferred_probe_timeout = 0;
driver_deferred_probe_trigger();
flush_work(&deferred_probe_work);
 
-   list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
-   dev_info(private->device, "deferred probe pending\n");
+   mutex_lock(&deferred_probe_mutex);
+   list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
+   dev_info(p->device, "deferred probe pending\n");
+   mutex_unlock(&deferred_probe_mutex);
wake_up_all(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, 
deferred_probe_timeout_work_func);
-- 
2.31.0.208.g409f899ff0-goog



Re: [PATCH v1] of: property: fw_devlink: Add support for remote-endpoint

2021-03-30 Thread Saravana Kannan
On Tue, Mar 30, 2021 at 12:36 PM Stephen Boyd  wrote:
>
> Quoting Saravana Kannan (2021-03-30 11:50:55)
> > remote-endpoint property seems to always come in pairs where two devices
> > point to each other. So, we can't really tell from DT if there is a
> > functional probe order dependency between these two devices.
> >
> > However, there can be other dependencies between two devices that point
> > to each other with remote-endpoint. This non-remote-endpoint dependency
> > combined with one of the remote-endpoint dependencies can lead to a cyclic
> > dependency[1].
> >
> > To prevent this cyclic dependency from incorrectly blocking probes,
> > fw_devlink needs to be made aware of remote-endpoint dependencies even
> > though remote-endpoint dependencies by themselves won't affect probe
> > ordering (because fw_devlink will see the cyclic dependency between
> > remote-endpoint devices and ignore the dependencies that cause the
> > cycle).
> >
> > Also, if a device ever needs to know if a non-probe-blocking
> > remote-endpoint has finished probing, it can now use the sync_state() to
> > figure it out.
> >
> > [1] - 
> > https://lore.kernel.org/lkml/CAGETcx9Snf23wrXqjDhJiTok9M3GcoVYDSyNYSMj9QnSRrA=c...@mail.gmail.com/#t
> > Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by 
> > default""")
> > Reported-by: Stephen Boyd 
> > Signed-off-by: Saravana Kannan 
> > ---
>
> Tested-by: Stephen Boyd 

Thanks!

>
> > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > index 5036a362f52e..2bb3158c9e43 100644
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1225,6 +1230,8 @@ static struct device_node *parse_##fname(struct 
> > device_node *np,   \
> >   * @parse_prop.prop_name: Name of property holding a phandle value
> >   * @parse_prop.index: For properties holding a list of phandles, this is 
> > the
> >   *   index into the list
> > + * @optional: The property can be an optional dependency.
>
> This bit conflicted for me on linux-next today so I dropped it in favor
> of 3915fed92365 ("of: property: Provide missing member description and
> remove excess param").

Ah looks like a change went into DT git repo but not in driver-core
yet. Yeah, dropping this bit is fine.

Rob/Greg,

I'll leave it to you to deal with the conflict?  I can't send to DT
because the fix needs to land in driver-core because of boot issues
and I can't resolve the conflict in driver-core because the
conflicting change isn't in driver-core yet.

Thanks,
Saravana


[PATCH v1] of: property: fw_devlink: Add support for remote-endpoint

2021-03-30 Thread Saravana Kannan
remote-endpoint property seems to always come in pairs where two devices
point to each other. So, we can't really tell from DT if there is a
functional probe order dependency between these two devices.

However, there can be other dependencies between two devices that point
to each other with remote-endpoint. This non-remote-endpoint dependency
combined with one of the remote-endpoint dependencies can lead to a cyclic
dependency[1].

To prevent this cyclic dependency from incorrectly blocking probes,
fw_devlink needs to be made aware of remote-endpoint dependencies even
though remote-endpoint dependencies by themselves won't affect probe
ordering (because fw_devlink will see the cyclic dependency between
remote-endpoint devices and ignore the dependencies that cause the
cycle).

Also, if a device ever needs to know if a non-probe-blocking
remote-endpoint has finished probing, it can now use the sync_state() to
figure it out.
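
To make the cycle argument concrete, here is a minimal userspace sketch of
the kind of check involved: adding "consumer depends on supplier" closes a
cycle if the supplier already depends, directly or transitively, on the
consumer. This is an illustration under assumptions, not fw_devlink's
implementation; struct dep_graph and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

struct dep_graph {
	/* dep[c][s] is true when device c depends on device s */
	bool dep[MAX_DEVS][MAX_DEVS];
};

/* Depth-first search: can "to" be reached from "from" along dependencies? */
static bool reaches(const struct dep_graph *g, int from, int to, bool *visited)
{
	if (from == to)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;
	for (int next = 0; next < MAX_DEVS; next++) {
		if (g->dep[from][next] && reaches(g, next, to, visited))
			return true;
	}
	return false;
}

/* True if the new dependency consumer -> supplier would close a cycle. */
bool link_would_create_cycle(const struct dep_graph *g, int consumer,
			     int supplier)
{
	bool visited[MAX_DEVS] = { false };

	assert(consumer >= 0 && consumer < MAX_DEVS);
	assert(supplier >= 0 && supplier < MAX_DEVS);
	return reaches(g, supplier, consumer, visited);
}
```

Two devices whose endpoints point at each other are the smallest case:
once device 0 depends on device 1, the reverse link from 1 to 0 is
reported as closing a cycle.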

[1] - 
https://lore.kernel.org/lkml/CAGETcx9Snf23wrXqjDhJiTok9M3GcoVYDSyNYSMj9QnSRrA=c...@mail.gmail.com/#t
Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by 
default""")
Reported-by: Stephen Boyd 
Signed-off-by: Saravana Kannan 
---
Rob/Greg,

This needs to go into driver-core due to the Fixes.

-Saravana

 drivers/of/property.c | 48 ---
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 5036a362f52e..2bb3158c9e43 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1038,6 +1038,25 @@ static bool of_is_ancestor_of(struct device_node 
*test_ancestor,
return false;
 }
 
+static struct device_node *of_get_compat_node(struct device_node *np)
+{
+   of_node_get(np);
+
+   while (np) {
+   if (!of_device_is_available(np)) {
+   of_node_put(np);
+   np = NULL;
+   }
+
+   if (of_find_property(np, "compatible", NULL))
+   break;
+
+   np = of_get_next_parent(np);
+   }
+
+   return np;
+}
+
 /**
  * of_link_to_phandle - Add fwnode link to supplier from supplier phandle
  * @con_np: consumer device tree node
@@ -1061,25 +1080,11 @@ static int of_link_to_phandle(struct device_node 
*con_np,
struct device *sup_dev;
struct device_node *tmp_np = sup_np;
 
-   of_node_get(sup_np);
/*
 * Find the device node that contains the supplier phandle.  It may be
 * @sup_np or it may be an ancestor of @sup_np.
 */
-   while (sup_np) {
-
-   /* Don't allow linking to a disabled supplier */
-   if (!of_device_is_available(sup_np)) {
-   of_node_put(sup_np);
-   sup_np = NULL;
-   }
-
-   if (of_find_property(sup_np, "compatible", NULL))
-   break;
-
-   sup_np = of_get_next_parent(sup_np);
-   }
-
+   sup_np = of_get_compat_node(sup_np);
if (!sup_np) {
pr_debug("Not linking %pOFP to %pOFP - No device\n",
 con_np, tmp_np);
@@ -1225,6 +1230,8 @@ static struct device_node *parse_##fname(struct 
device_node *np,   \
  * @parse_prop.prop_name: Name of property holding a phandle value
  * @parse_prop.index: For properties holding a list of phandles, this is the
  *   index into the list
+ * @optional: The property can be an optional dependency.
+ * @node_not_dev: The consumer node containing the property is never a device.
  *
  * Returns:
  * parse_prop() return values are
@@ -1236,6 +1243,7 @@ struct supplier_bindings {
struct device_node *(*parse_prop)(struct device_node *np,
  const char *prop_name, int index);
bool optional;
+   bool node_not_dev;
 };
 
 DEFINE_SIMPLE_PROP(clocks, "clocks", "#clock-cells")
@@ -1260,6 +1268,7 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
 DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
 DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
 DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
+DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
 DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
 DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
 DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
@@ -1334,6 +1343,7 @@ static const struct supplier_bindings 
of_supplier_bindings[] = {
{ .parse_prop = parse_pinctrl6, },
{ .parse_prop = parse_pinctrl7, },
{ .parse_prop = parse_pinctrl8, },
+   { .parse_prop = parse_remote_endpoint, .node_not_dev = true, },
{ .parse_prop = parse_gpio_compat, },
{ .parse_prop = parse_interrupts, },
{ .parse_prop = parse_regulators, },
@@ -1378,10 

Re: [RFC] clk: add boot clock support

2021-03-30 Thread Saravana Kannan
On Tue, Mar 30, 2021 at 2:09 AM Sebastian Reichel
 wrote:
>
> Hi,
>
> On Mon, Mar 29, 2021 at 05:36:11PM -0700, Saravana Kannan wrote:
> > On Mon, Mar 29, 2021 at 2:53 PM Sebastian Reichel
> >  wrote:
> > > On Mon, Mar 29, 2021 at 01:03:20PM -0700, Saravana Kannan wrote:
> > > > On Fri, Mar 26, 2021 at 2:52 AM Sebastian Reichel
> > > >  wrote:
> > > > > On Thu, Mar 25, 2021 at 06:55:52PM -0700, Saravana Kannan wrote:
> > > > > > On Thu, Mar 25, 2021 at 6:27 PM Rob Herring  wrote:
> > > > > > > On Thu, Mar 18, 2021 at 10:03:18PM +0100, Sebastian Reichel wrote:
> > > > > > > > On Congatec's QMX6 system on module one of the i.MX6 fixed 
> > > > > > > > clocks
> > > > > > > > is provided by an I2C RTC. Specifying this properly results in a
> > > > > > > > circular dependency, since the I2C RTC (and thus its clock) 
> > > > > > > > cannot
> > > > > > > > be initialized without the i.MX6 clock controller being 
> > > > > > > > initialized.
> > > > > > > >
> > > > > > > > With current code the following path is executed when i.MX6 
> > > > > > > > clock
> > > > > > > > controller is probed (and ckil clock is specified to be the I2C 
> > > > > > > > RTC
> > > > > > > > via DT):
> > > > > > > >
> > > > > > > > 1. imx6q_obtain_fixed_clk_hw(ccm_node, "ckil", 0);
> > > > > > > > 2. of_clk_get_by_name(ccm_node, "ckil");
> > > > > > > > 3. __of_clk_get(ccm_node, 0, ccm_node->full_name, "ckil");
> > > > > > > > 4. of_clk_get_hw(ccm_node, 0, "ckil")
> > > > > > > > 5. spec = of_parse_clkspec(ccm_node, 0, "ckil"); // get phandle
> > > > > > > > 6. of_clk_get_hw_from_clkspec(); // returns -EPROBE_DEFER
> > > > > > > > 7. error is propagated back, i.MX6q clock controller is probe 
> > > > > > > > deferred
> > > > > > > > 8. I2C controller is never initialized without clock controller
> > > > > > > >I2C RTC is never initialized without I2C controller
> > > > > > > >CKIL clock is never initialized without I2C RTC
> > > > > > > >clock controller is never initialized without CKIL
> > > > > > > >
> > > > > > > > To fix the circular dependency this registers a dummy clock when
> > > > > > > > the RTC clock is tried to be acquired. The dummy clock will 
> > > > > > > > later
> > > > > > > > be unregistered when the proper clock is registered for the RTC
> > > > > > > > DT node. IIUIC clk_core_reparent_orphans() will take care of
> > > > > > > > fixing up the clock tree.
> > > > > > > >
> > > > > > > > NOTE: For now the patch is compile tested only. If this approach
> > > > > > > > is the correct one I will do some testing and properly submit 
> > > > > > > > this.
> > > > > > > > You can find all the details about the hardware in the following
> > > > > > > > patchset:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-devicetree/20210222171247.97609-1-sebastian.reic...@collabora.com/
> > > > > > > >
> > > > > > > > Signed-off-by: Sebastian Reichel 
> > > > > > > > 
> > > > > > > > ---
> > > > > > > >  .../bindings/clock/clock-bindings.txt |   7 +
> > > > > > > >  drivers/clk/clk.c | 146 
> > > > > > > > ++
> > > > > > > >  2 files changed, 153 insertions(+)
> > > > > > > >
> > > > > > > > diff --git 
> > > > > > > > a/Documentation/devicetree/bindings/clock/clock-bindings.txt 
> > > > > > > > b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > > > > > index f2ea53832ac6..66d67ff4aa0f 100644
> > > > > > > > --- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
>

Re: [PATCH] clk: Mark fwnodes when their clock provider is added

2021-03-30 Thread Saravana Kannan
On Tue, Mar 30, 2021 at 8:42 AM Guenter Roeck  wrote:
>
> On Wed, Feb 10, 2021 at 01:44:34PM +0200, Tudor Ambarus wrote:
> > This is a follow-up for:
> > commit 3c9ea42802a1 ("clk: Mark fwnodes when their clock provider is 
> > added/removed")
> >
> > The above commit updated the deprecated of_clk_add_provider(),
> > but did not update the preferred of_clk_add_hw_provider().
> > Update it now.
> >
> > Signed-off-by: Tudor Ambarus 
> > Reviewed-by: Saravana Kannan 
> > ---
> >  drivers/clk/clk.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > index 27ff90eacb1f..9370e4dfecae 100644
> > --- a/drivers/clk/clk.c
> > +++ b/drivers/clk/clk.c
> > @@ -4594,6 +4594,8 @@ int of_clk_add_hw_provider(struct device_node *np,
> >   if (ret < 0)
> >   of_clk_del_provider(np);
> >
> > + fwnode_dev_initialized(&np->fwnode, true);
> > +
>
> This causes a crash when booting raspi2 images in qemu.
>
> [   22.123618] Unable to handle kernel NULL pointer dereference at virtual 
> address 0028
> [   22.123830] pgd = (ptrval)
> [   22.123992] [0028] *pgd=
> [   22.124579] Internal error: Oops: 5 [#1] SMP ARM
> ...
> [   22.141624] [] (of_clk_add_hw_provider) from [] 
> (devm_of_clk_add_hw_provider+0x48/0x80)
> [   22.141819] [] (devm_of_clk_add_hw_provider) from [] 
> (raspberrypi_clk_probe+0x25c/0x384)
> [   22.141976] [] (raspberrypi_clk_probe) from [] 
> (platform_probe+0x5c/0xb8)
> [   22.142114] [] (platform_probe) from [] 
> (really_probe+0xf0/0x39c)
> [   22.142246] [] (really_probe) from [] 
> (driver_probe_device+0x68/0xc0)
> [   22.142377] [] (driver_probe_device) from [] 
> (bus_for_each_drv+0x84/0xc8)...
>
> np can (and will) be NULL here. See of_clk_set_defaults().

Thanks for the report. It was reported earlier by Marek and there's a
discussion going on about it in the thread.

-Saravana


Re: [RFC] clk: add boot clock support

2021-03-29 Thread Saravana Kannan
On Mon, Mar 29, 2021 at 2:53 PM Sebastian Reichel
 wrote:
>
> Hi,
>
> On Mon, Mar 29, 2021 at 01:03:20PM -0700, Saravana Kannan wrote:
> > On Fri, Mar 26, 2021 at 2:52 AM Sebastian Reichel
> >  wrote:
> > > On Thu, Mar 25, 2021 at 06:55:52PM -0700, Saravana Kannan wrote:
> > > > On Thu, Mar 25, 2021 at 6:27 PM Rob Herring  wrote:
> > > > > On Thu, Mar 18, 2021 at 10:03:18PM +0100, Sebastian Reichel wrote:
> > > > > > On Congatec's QMX6 system on module one of the i.MX6 fixed clocks
> > > > > > is provided by an I2C RTC. Specifying this properly results in a
> > > > > > circular dependency, since the I2C RTC (and thus its clock) cannot
> > > > > > be initialized without the i.MX6 clock controller being initialized.
> > > > > >
> > > > > > With current code the following path is executed when i.MX6 clock
> > > > > > controller is probed (and ckil clock is specified to be the I2C RTC
> > > > > > via DT):
> > > > > >
> > > > > > 1. imx6q_obtain_fixed_clk_hw(ccm_node, "ckil", 0);
> > > > > > 2. of_clk_get_by_name(ccm_node, "ckil");
> > > > > > 3. __of_clk_get(ccm_node, 0, ccm_node->full_name, "ckil");
> > > > > > 4. of_clk_get_hw(ccm_node, 0, "ckil")
> > > > > > 5. spec = of_parse_clkspec(ccm_node, 0, "ckil"); // get phandle
> > > > > > 6. of_clk_get_hw_from_clkspec(); // returns -EPROBE_DEFER
> > > > > > 7. error is propagated back, i.MX6q clock controller is probe 
> > > > > > deferred
> > > > > > 8. I2C controller is never initialized without clock controller
> > > > > >I2C RTC is never initialized without I2C controller
> > > > > >CKIL clock is never initialized without I2C RTC
> > > > > >clock controller is never initialized without CKIL
> > > > > >
> > > > > > To fix the circular dependency this registers a dummy clock when
> > > > > > the RTC clock is tried to be acquired. The dummy clock will later
> > > > > > be unregistered when the proper clock is registered for the RTC
> > > > > > DT node. IIUIC clk_core_reparent_orphans() will take care of
> > > > > > fixing up the clock tree.
> > > > > >
> > > > > > NOTE: For now the patch is compile tested only. If this approach
> > > > > > is the correct one I will do some testing and properly submit this.
> > > > > > You can find all the details about the hardware in the following
> > > > > > patchset:
> > > > > >
> > > > > > https://lore.kernel.org/linux-devicetree/20210222171247.97609-1-sebastian.reic...@collabora.com/
> > > > > >
> > > > > > Signed-off-by: Sebastian Reichel 
> > > > > > ---
> > > > > >  .../bindings/clock/clock-bindings.txt |   7 +
> > > > > >  drivers/clk/clk.c | 146 
> > > > > > ++
> > > > > >  2 files changed, 153 insertions(+)
> > > > > >
> > > > > > diff --git 
> > > > > > a/Documentation/devicetree/bindings/clock/clock-bindings.txt 
> > > > > > b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > > > index f2ea53832ac6..66d67ff4aa0f 100644
> > > > > > --- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > > > +++ b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > > > @@ -32,6 +32,13 @@ clock-output-names: Recommended to be a list of 
> > > > > > strings of clock output signal
> > > > > >   Clock consumer nodes must never directly reference
> > > > > >   the provider's clock-output-names property.
> > > > > >
> > > > > > +boot-clock-frequencies: This property is used to specify that a 
> > > > > > clock is enabled
> > > > > > + by default with the provided frequency at 
> > > > > > boot time. This
> > > > > > + is required to break circular clock 
> > > > > > dependencies. For clock
> > > > > > + providers with #clock-cells = 0 this is a 
> > > > 

Re: [PATCH] clk: Mark fwnodes when their clock provider is added

2021-03-29 Thread Saravana Kannan
On Mon, Mar 29, 2021 at 2:25 PM Stephen Boyd  wrote:
>
> Quoting Geert Uytterhoeven (2021-03-26 11:29:55)
> > On Fri, Mar 26, 2021 at 7:13 PM Stephen Boyd  wrote:
> > > Quoting Nicolas Saenz Julienne (2021-03-25 11:25:24)
> > > > >
> > > > > This patch mainly revealed that clk/bcm/clk-raspberrypi.c driver calls
> > > > > devm_of_clk_add_hw_provider(), with a device pointer, which has a NULL
> > > > > dev->of_node. I'm not sure if adding a check for a NULL np in
> > > > > of_clk_add_hw_provider() is a right fix, though.
> > > >
> > > > I believe the right fix is not to call 'devm_of_clk_add_hw_provider()' 
> > > > if
> > > > 'pdev->dev.of_node == NULL'. In such case, which is RPi3's, only the 
> > > > CPU clock
> > > > is used, and it's defined and queried later through
> > > > devm_clk_hw_register_clkdev().
> > > >
> > > > @Marek, I don't mind taking care of it if it's OK with you.
> > > >
> > >
> > > Ah I see this is related to the patch I just reviewed. Can you reference
> > > this in the commit text? And instead of putting the change into the clk
> > > provider let's check for NULL 'np' in of_clk_add_hw_provider() instead
> > > and return 0 if there's nothing to do. That way we don't visit this
> > > problem over and over again.
> >
> > I'm not sure the latter is what we really want: shouldn't calling
> > *of*_clk_add_hw_provider() with a NULL np be a bug in the provider?
> >
>
> I don't have a strong opinion either way. Would it be useful if the
> function returned an error when 'np' is NULL?

I lean towards returning an error. Not a strong opinion either.

-Saravana

> I guess the caller could
> use that to figure out that it should register a clkdev. But it
> shouldn't hurt to register both a clkdev lookup and a DT provider for
> the same clk. The framework will try the DT path first and then fallback
> to a clkdev lookup otherwise, so we'll be wasting memory for clkdev but
> otherwise be fine.
>
> Really it feels like we should try to unify around a
> devm_clk_add_hw_provider() API that figures out what to do based on if
> the device has an of_node or not. That would mean implementing something
> like clkdev but for a whole provider instead of a single clk. Then this
> question of returning an error would be moot here.
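
A minimal userspace sketch of the "return an error" option discussed in
this thread, assuming -EINVAL as the error code (the thread does not settle
on one). struct model_node, struct model_fwnode and model_add_hw_provider()
are hypothetical and only model the NULL check, not the clk framework.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fwnode {
	bool initialized;
};

struct model_node {
	struct model_fwnode fwnode;
};

/*
 * Hypothetical stand-in for a provider registration function. Checking np
 * before touching np->fwnode avoids the NULL dereference reported above.
 * Returning an error (rather than 0) tells the caller that no DT provider
 * was registered.
 */
int model_add_hw_provider(struct model_node *np)
{
	if (!np)
		return -EINVAL;	/* assumed error code, for illustration */

	np->fwnode.initialized = true;
	return 0;
}
```

A caller without a DT node could use the error to decide to fall back to a
clkdev-style lookup instead.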


Re: [RFC] clk: add boot clock support

2021-03-29 Thread Saravana Kannan
On Fri, Mar 26, 2021 at 2:52 AM Sebastian Reichel
 wrote:
>
> Hi Saravana,
>
> On Thu, Mar 25, 2021 at 06:55:52PM -0700, Saravana Kannan wrote:
> > On Thu, Mar 25, 2021 at 6:27 PM Rob Herring  wrote:
> > >
> > > +Saravana
> > >
> > > On Thu, Mar 18, 2021 at 10:03:18PM +0100, Sebastian Reichel wrote:
> > > > On Congatec's QMX6 system on module one of the i.MX6 fixed clocks
> > > > is provided by an I2C RTC. Specifying this properly results in a
> > > > circular dependency, since the I2C RTC (and thus its clock) cannot
> > > > be initialized without the i.MX6 clock controller being initialized.
> > > >
> > > > With current code the following path is executed when i.MX6 clock
> > > > controller is probed (and ckil clock is specified to be the I2C RTC
> > > > via DT):
> > > >
> > > > 1. imx6q_obtain_fixed_clk_hw(ccm_node, "ckil", 0);
> > > > 2. of_clk_get_by_name(ccm_node, "ckil");
> > > > 3. __of_clk_get(ccm_node, 0, ccm_node->full_name, "ckil");
> > > > 4. of_clk_get_hw(ccm_node, 0, "ckil")
> > > > 5. spec = of_parse_clkspec(ccm_node, 0, "ckil"); // get phandle
> > > > 6. of_clk_get_hw_from_clkspec(); // returns -EPROBE_DEFER
> > > > 7. error is propagated back, i.MX6q clock controller is probe deferred
> > > > 8. I2C controller is never initialized without clock controller
> > > >I2C RTC is never initialized without I2C controller
> > > >CKIL clock is never initialized without I2C RTC
> > > >clock controller is never initialized without CKIL
> > > >
> > > > To fix the circular dependency this registers a dummy clock when
> > > > the RTC clock is tried to be acquired. The dummy clock will later
> > > > be unregistered when the proper clock is registered for the RTC
> > > > DT node. IIUIC clk_core_reparent_orphans() will take care of
> > > > fixing up the clock tree.
> > > >
> > > > NOTE: For now the patch is compile tested only. If this approach
> > > > is the correct one I will do some testing and properly submit this.
> > > > You can find all the details about the hardware in the following
> > > > patchset:
> > > >
> > > > https://lore.kernel.org/linux-devicetree/20210222171247.97609-1-sebastian.reic...@collabora.com/
> > > >
> > > > Signed-off-by: Sebastian Reichel 
> > > > ---
> > > >  .../bindings/clock/clock-bindings.txt |   7 +
> > > >  drivers/clk/clk.c | 146 ++
> > > >  2 files changed, 153 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/clock/clock-bindings.txt 
> > > > b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > index f2ea53832ac6..66d67ff4aa0f 100644
> > > > --- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > +++ b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > > > @@ -32,6 +32,13 @@ clock-output-names: Recommended to be a list of 
> > > > strings of clock output signal
> > > >   Clock consumer nodes must never directly reference
> > > >   the provider's clock-output-names property.
> > > >
> > > > +boot-clock-frequencies: This property is used to specify that a clock 
> > > > is enabled
> > > > + by default with the provided frequency at boot 
> > > > time. This
> > > > + is required to break circular clock dependencies. 
> > > > For clock
> > > > + providers with #clock-cells = 0 this is a single 
> > > > u32
> > > > + with the frequency in Hz. Otherwise it's a list of
> > > > + clock cell specifier + frequency in Hz.
> > >
> > > Seems alright to me. I hadn't thought about the aspect of needing to
> > > know the frequency. Other cases probably don't as you only need the
> > > clocks once both components have registered.
> > >
> > > Note this could be lost being threaded in the other series.
> >
> > I read this thread and tried to understand it. But my head isn't right
> > today (lack of sleep) so I couldn't wrap my head around it. I'll look
> > at it again after the weekend. In the meantime, Sebastian can you
> > ple

Re: [RFC] clk: add boot clock support

2021-03-25 Thread Saravana Kannan
On Thu, Mar 25, 2021 at 6:27 PM Rob Herring  wrote:
>
> +Saravana
>
> On Thu, Mar 18, 2021 at 10:03:18PM +0100, Sebastian Reichel wrote:
> > On Congatec's QMX6 system on module one of the i.MX6 fixed clocks
> > is provided by an I2C RTC. Specifying this properly results in a
> > circular dependency, since the I2C RTC (and thus its clock) cannot
> > be initialized without the i.MX6 clock controller being initialized.
> >
> > With current code the following path is executed when i.MX6 clock
> > controller is probed (and ckil clock is specified to be the I2C RTC
> > via DT):
> >
> > 1. imx6q_obtain_fixed_clk_hw(ccm_node, "ckil", 0);
> > 2. of_clk_get_by_name(ccm_node, "ckil");
> > 3. __of_clk_get(ccm_node, 0, ccm_node->full_name, "ckil");
> > 4. of_clk_get_hw(ccm_node, 0, "ckil")
> > 5. spec = of_parse_clkspec(ccm_node, 0, "ckil"); // get phandle
> > 6. of_clk_get_hw_from_clkspec(); // returns -EPROBE_DEFER
> > 7. error is propagated back, i.MX6q clock controller is probe deferred
> > 8. I2C controller is never initialized without clock controller
> >    I2C RTC is never initialized without I2C controller
> >    CKIL clock is never initialized without I2C RTC
> >    clock controller is never initialized without CKIL
> >
> > To fix the circular dependency this registers a dummy clock when
> > the RTC clock is tried to be acquired. The dummy clock will later
> > be unregistered when the proper clock is registered for the RTC
> > DT node. IIUIC clk_core_reparent_orphans() will take care of
> > fixing up the clock tree.
> >
> > NOTE: For now the patch is compile tested only. If this approach
> > is the correct one I will do some testing and properly submit this.
> > You can find all the details about the hardware in the following
> > patchset:
> >
> > https://lore.kernel.org/linux-devicetree/20210222171247.97609-1-sebastian.reic...@collabora.com/
> >
> > Signed-off-by: Sebastian Reichel 
> > ---
> >  .../bindings/clock/clock-bindings.txt |   7 +
> >  drivers/clk/clk.c | 146 ++
> >  2 files changed, 153 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/clock/clock-bindings.txt 
> > b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > index f2ea53832ac6..66d67ff4aa0f 100644
> > --- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > +++ b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > @@ -32,6 +32,13 @@ clock-output-names: Recommended to be a list of strings 
> > of clock output signal
> >   Clock consumer nodes must never directly reference
> >   the provider's clock-output-names property.
> >
> > +boot-clock-frequencies: This property is used to specify that a clock is 
> > enabled
> > + by default with the provided frequency at boot time. 
> > This
> > + is required to break circular clock dependencies. For 
> > clock
> > + providers with #clock-cells = 0 this is a single u32
> > + with the frequency in Hz. Otherwise it's a list of
> > + clock cell specifier + frequency in Hz.
>
> Seems alright to me. I hadn't thought about the aspect of needing to
> know the frequency. Other cases probably don't as you only need the
> clocks once both components have registered.
>
> Note this could be lost being threaded in the other series.

I read this thread and tried to understand it. But my head isn't right
today (lack of sleep) so I couldn't wrap my head around it. I'll look
at it again after the weekend. In the meantime, Sebastian can you
please point me to the DT file and the specific device nodes (names or
line number) where this cycle is present?

Keeping a clock on until all its consumers probe is part of my TODO
list (next item after fw_devlink=on lands). I already have it working
in AOSP, but need to clean it up for upstream. fw_devlink can also
break *some* cycles (not all). So I'm wondering if the kernel will
solve this automatically soon(ish). If it can solve it automatically,
I'd rather not add new DT bindings because it'll make it more work for
fw_devlink.

Thanks,
Saravana


Re: [PATCH v1 3/3] Revert "Revert "driver core: Set fw_devlink=on by default""

2021-03-25 Thread Saravana Kannan
On Thu, Mar 25, 2021 at 2:19 PM Stephen Boyd  wrote:
>
> Quoting Saravana Kannan (2021-03-02 13:11:32)
> > This reverts commit 3e4c982f1ce75faf5314477b8da296d2d00919df.
> >
> > Since all reported issues due to fw_devlink=on should be addressed by
> > this series, revert the revert. fw_devlink=on Take II.
> >
> > Signed-off-by: Saravana Kannan 
> > ---
>
> This seems to break the display on lazor (see
> arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi) on linux next today
> (next-20210325). I tried booting with fw_devlink=permissive on the
> commandline and the display came up again. Looking at the drivers that
> are in the deferred state there are three:
>
>  localhost ~ # cat /sys/kernel/debug/devices_deferred
>  ae94000.dsi
>  panel
>  2-002d
>
> and the panel has these suppliers:
>
>  localhost ~ # ls /sys/devices/platform/panel/
>  driver_override  power                supplier:platform:pp3300-dx-edp-regulator
>  modalias         subsystem            uevent
>  of_node          supplier:i2c:2-002d  waiting_for_supplier
>
> Is there some sort of circular dependency going on that is preventing
> either driver from probing? My understanding is 2-002d is the dsi bridge
> (compatible is ti,sn65dsi86) and that is waiting for the panel to come
> up, and the panel does a circular dependency where it requests the hpd
> gpio from the dsi bridge at probe but then ignores it and tries to get
> the hpd gpio later when powering on the panel. If it didn't do this it
> would probe defer forever because the bridge supplies the hpd gpio to
> the panel and the panel provides the panel to the bridge driver.

I had a side chat with Stephen. The problem is due to a cycle of
dependency between panel and bridge (supplier:i2c:2-002d). panel needs
GPIO from bridge, and bridge has panel as a remote-endpoint. But
fw_devlink isn't able to break the cycle because it doesn't parse
"remote-endpoint" yet. So for now, only permissive will work for this
case.

I'll look into adding remote-endpoint support. But that's a bit more
complicated.
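
To make the failure mode concrete, here is a rough userspace model (not
kernel code; every name in it is made up for illustration). It assumes a
cycle can only be handled if every edge in it is visible to the parser, so
hiding the remote-endpoint edge makes the panel/bridge loop look like a
plain one-way dependency:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: two devices and the dependency edges between them. */
enum { PANEL, BRIDGE, NDEV };

struct dep_edge {
    int consumer;
    int supplier;
    bool parsed; /* true if the firmware-link parser understands the property */
};

/* Does 'from' reach 'to' by following only parsed consumer->supplier edges? */
static bool reaches(const struct dep_edge *e, size_t n, int from, int to,
                    int depth)
{
    if (from == to)
        return true;
    if (depth > NDEV) /* bound the walk; the graph is tiny */
        return false;
    for (size_t i = 0; i < n; i++) {
        if (e[i].parsed && e[i].consumer == from &&
            reaches(e, n, e[i].supplier, to, depth + 1))
            return true;
    }
    return false;
}

/* A cycle is visible if some parsed edge's supplier can reach its consumer. */
static bool cycle_visible(const struct dep_edge *e, size_t n)
{
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (e[i].parsed &&
            reaches(e, n, e[i].supplier, e[i].consumer, 0))
            return true;
    }
    return false;
}
```

With the GPIO edge (panel -> bridge) parsed and the remote-endpoint edge
(bridge -> panel) not parsed, cycle_visible() reports no cycle, which is
the situation described above.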

-Saravana

>
> >  drivers/base/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 45c75cc96fdc..de518178ac36 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1538,7 +1538,7 @@ static void device_links_purge(struct device *dev)
> >  #define FW_DEVLINK_FLAGS_RPM   (FW_DEVLINK_FLAGS_ON | \
> >  DL_FLAG_PM_RUNTIME)
> >
> > -static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
> > +static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
> >  static int __init fw_devlink_setup(char *arg)
> >  {
> > if (!arg)


Re: [net-next PATCH v7 02/16] net: phy: Introduce fwnode_mdio_find_device()

2021-03-10 Thread Saravana Kannan
On Wed, Mar 10, 2021 at 10:21 PM Calvin Johnson
 wrote:
>
> Define fwnode_mdio_find_device() to get a pointer to the
> mdio_device from fwnode passed to the function.
>
> Refactor of_mdio_find_device() to use fwnode_mdio_find_device().
>
> Signed-off-by: Calvin Johnson 
> ---
>
> Changes in v7:
> - correct fwnode_mdio_find_device() description
>
> Changes in v6:
> - fix warning for function parameter of fwnode_mdio_find_device()
>
> Changes in v5: None
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
>
>  drivers/net/mdio/of_mdio.c   | 11 +--
>  drivers/net/phy/phy_device.c | 23 +++
>  include/linux/phy.h  |  6 ++
>  3 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c
> index ea9d5855fb52..d5e0970b2561 100644
> --- a/drivers/net/mdio/of_mdio.c
> +++ b/drivers/net/mdio/of_mdio.c
> @@ -347,16 +347,7 @@ EXPORT_SYMBOL(of_mdiobus_register);
>   */
>  struct mdio_device *of_mdio_find_device(struct device_node *np)
>  {
> -   struct device *d;
> -
> -   if (!np)
> -   return NULL;
> -
> -   d = bus_find_device_by_of_node(&mdio_bus_type, np);
> -   if (!d)
> -   return NULL;
> -
> -   return to_mdio_device(d);
> +   return fwnode_mdio_find_device(of_fwnode_handle(np));
>  }
>  EXPORT_SYMBOL(of_mdio_find_device);
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index cc38e326405a..daabb17bba00 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2819,6 +2819,29 @@ static bool phy_drv_supports_irq(struct phy_driver 
> *phydrv)
> return phydrv->config_intr && phydrv->handle_interrupt;
>  }
>
> +/**
> + * fwnode_mdio_find_device - Given a fwnode, find the mdio_device
> + * @fwnode: pointer to the mdio_device's fwnode
> + *
> + * If successful, returns a pointer to the mdio_device with the embedded
> + * struct device refcount incremented by one, or NULL on failure.
> + * The caller should call put_device() on the mdio_device after its use.
> + */
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
> +{
> +   struct device *d;
> +
> +   if (!fwnode)
> +   return NULL;
> +
> +   d = bus_find_device_by_fwnode(&mdio_bus_type, fwnode);

Sorry about the late review, but can you look into using
get_dev_from_fwnode()? As long as you aren't registering two devices
for the same fwnode, it's an O(1) operation instead of having to loop
through a list of devices in a bus. You can check the returned
device's bus type if you aren't sure about not registering two devices
with the same fw_node and then fall back to this looping.
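
Roughly what I have in mind, as a userspace sketch only. The structs and
function names below are hypothetical stand-ins, not the kernel API, and
reference counting is left out entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct bus_type_model { const char *name; };

struct device_model;

struct fwnode_model {
    struct device_model *dev; /* back-pointer set when a device registers */
};

struct device_model {
    const struct bus_type_model *bus;
    struct fwnode_model *fwnode;
};

/* O(n) path: walk every device and compare bus and fwnode. */
static struct device_model *scan_bus(struct device_model **devs, size_t n,
                                     const struct bus_type_model *bus,
                                     const struct fwnode_model *fwnode)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i]->bus == bus && devs[i]->fwnode == fwnode)
            return devs[i];
    }
    return NULL;
}

/* O(1) path first; fall back to the scan if the back-pointer is unusable. */
static struct device_model *find_on_bus(struct device_model **devs, size_t n,
                                        const struct bus_type_model *bus,
                                        struct fwnode_model *fwnode)
{
    if (!fwnode)
        return NULL;
    if (fwnode->dev && fwnode->dev->bus == bus)
        return fwnode->dev;
    return scan_bus(devs, n, bus, fwnode);
}
```

The bus check is what makes the fast path safe when a second device on a
different bus happens to share the fwnode.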

-Saravana

> +   if (!d)
> +   return NULL;
> +
> +   return to_mdio_device(d);
> +}
> +EXPORT_SYMBOL(fwnode_mdio_find_device);
> +
>  /**
>   * phy_probe - probe and init a PHY device
>   * @dev: device to probe and init
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 1a12e4436b5b..f5eb1e3981a1 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1366,11 +1366,17 @@ struct phy_device *phy_device_create(struct mii_bus 
> *bus, int addr, u32 phy_id,
>  bool is_c45,
>  struct phy_c45_device_ids *c45_ids);
>  #if IS_ENABLED(CONFIG_PHYLIB)
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode);
>  struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool 
> is_c45);
>  int phy_device_register(struct phy_device *phy);
>  void phy_device_free(struct phy_device *phydev);
>  #else
>  static inline
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
> +{
> +   return 0;
> +}
> +static inline
>  struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
>  {
> return NULL;
> --
> 2.17.1
>


Re: [PATCH v1 1/3] driver core: Avoid pointless deferred probe attempts

2021-03-09 Thread Saravana Kannan
On Tue, Mar 2, 2021 at 1:11 PM Saravana Kannan  wrote:
>
> There's no point in adding a device to the deferred probe list if we
> know for sure that it doesn't have a matching driver. So, check if a
> device can match with a driver before adding it to the deferred probe
> list.
>
> Signed-off-by: Saravana Kannan 

Rafael/Greg,

Any concerns with this specific patch? Do you see any bugs? I'm asking
because some of the other improvements I'm working on depend on this
flag. So I want to make sure this can land before I take my work in
progress too far.
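
For anyone skimming, the behaviour the flag is meant to give can be
modelled in a few lines of userspace C. This is only a sketch of the idea;
the struct and function names are invented and the real list handling and
locking are not shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_model {
    bool can_match;   /* mirrors the flag the patch adds */
    bool on_deferred; /* stand-in for membership in the deferred probe list */
};

/* Model of the gated add: skip devices that never matched any driver. */
static void deferred_probe_add(struct dev_model *dev)
{
    assert(dev != NULL);
    if (!dev->can_match)
        return;
    dev->on_deferred = true;
}

/* Model of a driver match that is followed by a probe deferral. */
static void match_then_defer(struct dev_model *dev)
{
    assert(dev != NULL);
    dev->can_match = true;
    deferred_probe_add(dev);
}
```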

-Saravana

> ---
>  drivers/base/dd.c  | 6 ++
>  include/linux/device.h | 4 
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9179825ff646..f18963f42e21 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, 
> deferred_probe_work_func);
>
>  void driver_deferred_probe_add(struct device *dev)
>  {
> +   if (!dev->can_match)
> +   return;
> +
> mutex_lock(&deferred_probe_mutex);
> if (list_empty(&dev->p->deferred_probe)) {
> dev_dbg(dev, "Added to deferred list\n");
> @@ -726,6 +729,7 @@ static int driver_probe_device(struct device_driver *drv, 
> struct device *dev)
> if (!device_is_registered(dev))
> return -ENODEV;
>
> +   dev->can_match = true;
> pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
>  drv->bus->name, __func__, dev_name(dev), drv->name);
>
> @@ -829,6 +833,7 @@ static int __device_attach_driver(struct device_driver 
> *drv, void *_data)
> return 0;
> } else if (ret == -EPROBE_DEFER) {
> dev_dbg(dev, "Device match requests probe deferral\n");
> +   dev->can_match = true;
> driver_deferred_probe_add(dev);
> } else if (ret < 0) {
> dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> @@ -1064,6 +1069,7 @@ static int __driver_attach(struct device *dev, void 
> *data)
> return 0;
> } else if (ret == -EPROBE_DEFER) {
> dev_dbg(dev, "Device match requests probe deferral\n");
> +   dev->can_match = true;
> driver_deferred_probe_add(dev);
> } else if (ret < 0) {
> dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index ba660731bd25..569932d282c0 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -439,6 +439,9 @@ struct dev_links_info {
>   * @state_synced: The hardware state of this device has been synced to match
>   *   the software state of this device by calling the driver/bus
>   *   sync_state() callback.
> + * @can_match: The device has matched with a driver at least once or it is in
> + * a bus (like AMBA) which can't check for matching drivers until
> + * other devices probe successfully.
>   * @dma_coherent: this particular device is dma coherent, even if the
>   * architecture supports non-coherent devices.
>   * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> @@ -545,6 +548,7 @@ struct device {
> bool offline:1;
> bool of_node_reused:1;
> bool state_synced:1;
> +   bool can_match:1;
>  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> --
> 2.30.1.766.gb4fecdf3b7-goog
>


Re: [PATCH v3] amba: Remove deferred device addition

2021-03-08 Thread Saravana Kannan
On Sun, Mar 7, 2021 at 11:28 PM Marek Szyprowski
 wrote:
>
> Hi Saravana,
>
> On 05.03.2021 19:02, Saravana Kannan wrote:
> > On Fri, Mar 5, 2021 at 3:45 AM Marek Szyprowski
> >  wrote:
> >> On 04.03.2021 20:51, Saravana Kannan wrote:
> >>> The uevents generated for an amba device need PID and CID information
> >>> that's available only when the amba device is powered on, clocked and
> >>> out of reset. So, if those resources aren't available, the information
> >>> can't be read to generate the uevents. To workaround this requirement,
> >>> if the resources weren't available, the device addition was deferred and
> >>> retried periodically.
> >>>
> >>> However, this deferred addition retry isn't based on resources becoming
> >>> available. Instead, it's retried every 5 seconds and causes arbitrary
> >>> probe delays for amba devices and their consumers.
> >>>
> >>> Also, maintaining a separate deferred-probe like mechanism is a
> >>> maintenance headache.
> >>>
> >>> With this commit, instead of deferring the device addition, we simply
> >>> defer the generation of uevents for the device and probing of the device
> >>> (because drivers need PID and CID to match) until the PID and CID
> >>> information can be read. This allows us to delete all the amba specific
> >>> deferring code and also avoid the arbitrary probing delays.
> >>>
> >>> Cc: Rob Herring 
> >>> Cc: Ulf Hansson 
> >>> Cc: John Stultz 
> >>> Cc: Saravana Kannan 
> >>> Cc: Linus Walleij 
> >>> Cc: Sudeep Holla 
> >>> Cc: Nicolas Saenz Julienne 
> >>> Cc: Geert Uytterhoeven 
> >>> Cc: Marek Szyprowski 
> >>> Cc: Russell King 
> >>> Signed-off-by: Saravana Kannan 
> >>> ---
> >>>
> >>> v1 -> v2:
> >>> - Dropped RFC tag
> >>> - Complete rewrite to not use stub devices.
> >>> v2 -> v3:
> >>> - Flipped the if() condition for hard-coded periphids.
> >>> - Added a stub driver to handle the case where all amba drivers are
> >>> modules loaded by uevents.
> >>> - Cc Marek after I realized I forgot to add him.
> >>>
> >>> Marek,
> >>>
> >>> Would you mind testing this? It looks okay with my limited testing.
> >> It looks it works fine on my test systems. I've checked current
> >> linux-next and this patch. You can add:
> >>
> >> Tested-by: Marek Szyprowski 
> > Hi Marek,
> >
> > Thanks! Does your test set up have amba drivers that are loaded based
> > on uevents? That's the one I couldn't test.
>
> I've checked both, the built-in and all amba drivers compiled as
> modules, loaded by udev. Both works fine here.
>
> >> I've briefly scanned the code and I'm curious how does it work. Does it
> >> depend on the recently introduced "fw_devlink=on" feature? I don't see
> >> other mechanism, which would trigger matching amba device if pm domains,
> >> clocks or resets were not available on time to read pid/cid while adding
> >> a device...
> > No, it does not depend on fw_devlink or device links in any way.
> >
> > When a device is attempted to be probed (when it's added or during
> > deferred probe), it's matched with all the drivers on the bus.
> > When a new driver is registered to a bus, all devices in that bus are
> > matched with the driver to see if they'll work together.
> > That's how match is called. And match() can return -EPROBE_DEFER and
> > that'll cause the device to be put in the deferred probe list by
> > driver core.
> >
> > The tricky part in this patch was the uevent handling and the
> > chicken-and-egg issue I talk about in the comments.
>
> Thanks for the explanation. This EPROBE_DEFER support in match()
> callback must be something added after I crafted that periodic retry
> based workaround.
>

I think it got in just a few months before your patches, but your
patches worked :) I actually don't like match returning -EPROBE_DEFER,
but I can work around it for some of my fw_devlink optimization plans.

More context here:
https://lore.kernel.org/lkml/CAGETcx_qO4vxTSyBtBR2k7fd_3rGJF42iBbJH37HPNw=fhe...@mail.gmail.com/

-Saravana


Re: [PATCH v3] amba: Remove deferred device addition

2021-03-05 Thread Saravana Kannan
On Fri, Mar 5, 2021 at 3:45 AM Marek Szyprowski
 wrote:
>
> Hi Saravana,
>
> On 04.03.2021 20:51, Saravana Kannan wrote:
> > The uevents generated for an amba device need PID and CID information
> > that's available only when the amba device is powered on, clocked and
> > out of reset. So, if those resources aren't available, the information
> > can't be read to generate the uevents. To workaround this requirement,
> > if the resources weren't available, the device addition was deferred and
> > retried periodically.
> >
> > However, this deferred addition retry isn't based on resources becoming
> > available. Instead, it's retried every 5 seconds and causes arbitrary
> > probe delays for amba devices and their consumers.
> >
> > Also, maintaining a separate deferred-probe like mechanism is a
> > maintenance headache.
> >
> > With this commit, instead of deferring the device addition, we simply
> > defer the generation of uevents for the device and probing of the device
> > (because drivers need PID and CID to match) until the PID and CID
> > information can be read. This allows us to delete all the amba specific
> > deferring code and also avoid the arbitrary probing delays.
> >
> > Cc: Rob Herring 
> > Cc: Ulf Hansson 
> > Cc: John Stultz 
> > Cc: Saravana Kannan 
> > Cc: Linus Walleij 
> > Cc: Sudeep Holla 
> > Cc: Nicolas Saenz Julienne 
> > Cc: Geert Uytterhoeven 
> > Cc: Marek Szyprowski 
> > Cc: Russell King 
> > Signed-off-by: Saravana Kannan 
> > ---
> >
> > v1 -> v2:
> > - Dropped RFC tag
> > - Complete rewrite to not use stub devices.
> > v2 -> v3:
> > - Flipped the if() condition for hard-coded periphids.
> > - Added a stub driver to handle the case where all amba drivers are
> >modules loaded by uevents.
> > - Cc Marek after I realized I forgot to add him.
> >
> > Marek,
> >
> > Would you mind testing this? It looks okay with my limited testing.
>
> It looks it works fine on my test systems. I've checked current
> linux-next and this patch. You can add:
>
> Tested-by: Marek Szyprowski 

Hi Marek,

Thanks! Does your test set up have amba drivers that are loaded based
on uevents? That's the one I couldn't test.

> I've briefly scanned the code and I'm curious how does it work. Does it
> depend on the recently introduced "fw_devlink=on" feature? I don't see
> other mechanism, which would trigger matching amba device if pm domains,
> clocks or resets were not available on time to read pid/cid while adding
> a device...

No, it does not depend on fw_devlink or device links in any way.

When a device is attempted to be probed (when it's added or during
deferred probe), it's matched with all the drivers on the bus.
When a new driver is registered to a bus, all devices in that bus are
matched with the driver to see if they'll work together.
That's how match is called. And match() can return -EPROBE_DEFER and
that'll cause the device to be put in the deferred probe list by
driver core.
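
A simplified userspace model of that flow, in case it helps. This is a
sketch under stated assumptions and not the amba code: the struct, the
function names and the single "resources_ready" flag are invented, and
only the numeric value 517 is taken from the kernel's EPROBE_DEFER:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER_MODEL 517 /* same numeric value as the kernel's EPROBE_DEFER */

struct amba_dev_model {
    unsigned int periphid;    /* 0 until the ID registers can be read */
    bool resources_ready;     /* power domain, clock and reset available */
    unsigned int hw_periphid; /* value the hardware would report */
    bool deferred;            /* stand-in for the deferred probe list */
};

/* Model of a bus match(): 1 on match, 0 on no match, negative on error. */
static int match_model(struct amba_dev_model *dev, unsigned int drv_id)
{
    assert(dev != NULL);
    if (!dev->periphid) {
        if (!dev->resources_ready)
            return -EPROBE_DEFER_MODEL;
        dev->periphid = dev->hw_periphid;
    }
    return dev->periphid == drv_id;
}

/* Model of the driver core reacting to the match() result. */
static int attach_model(struct amba_dev_model *dev, unsigned int drv_id)
{
    int ret = match_model(dev, drv_id);

    /* A deferral parks the device; any other result clears the flag. */
    dev->deferred = (ret == -EPROBE_DEFER_MODEL);
    return ret;
}
```

The first attach attempt defers while the resources are missing, and a
later retry reads the ID and matches, with no periodic timer involved.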

The tricky part in this patch was the uevent handling and the
chicken-and-egg issue I talk about in the comments.

Russell,

Does this look good now? Plan to pick it up some time?

Thanks,
Saravana

>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>


Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-04 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 2:21 AM Michael Walle  wrote:
>
> Am 2021-03-03 10:28, schrieb Saravana Kannan:
> > On Wed, Mar 3, 2021 at 12:59 AM Michael Walle  wrote:
> >>
> >> Am 2021-03-02 23:47, schrieb Saravana Kannan:
> >> > On Tue, Mar 2, 2021 at 2:42 PM Saravana Kannan 
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 2, 2021 at 2:24 PM Michael Walle  wrote:
> >> >> >
> >> >> > Am 2021-03-02 22:11, schrieb Saravana Kannan:
> >> >> > > I think Patch 1 should fix [4] without [5]. Can you test the series
> >> >> > > please?
> >> >> >
> >> >> > Mh, I'm on latest linux-next (next-20210302) and I've applied patch 
> >> >> > 3/3
> >> >> > and
> >> >> > reverted commit 7007b745a508 ("PCI: layerscape: Convert to
> >> >> > builtin_platform_driver()"). I'd assumed that PCIe shouldn't be 
> >> >> > working,
> >> >> > right? But it is. Did I miss something?
> >> >>
> >> >> You need to revert [5].
> >> >
> >> > My bad. You did revert it. Ah... I wonder if it was due to
> >> > fw_devlink.strict that I added. To break PCI again, also set
> >> > fw_devlink.strict=1 in the kernel command line.
> >>
> >> Indeed, adding fw_devlink.strict=1 will break PCI again. But if
> >> I then apply 1/3 and 2/3 again, PCI is still broken. Just to be clear:
> >> I'm keeping the fw_devlink.strict=1 parameter.
> >
> > Thanks for your testing! I assume you are also setting fw_devlink=on?
>
> I've applied patch 3/3 and added nothing to the commandline, so yes.
>
> > Hmmm... ok. In the working case, does your PCI probe before IOMMU? If
> > yes, then your results make sense.
>
> Yes that was the conclusion last time. That the probe is deferred and
> the __init section is already discarded when there might a second
> try of the probe.

Long response below, but the TL;DR is:
The real fix for your case was the implementation of fw_devlink.strict
and NOT Patch 1 of this series. So, sorry for wasting your test
effort.

During the earlier debugging (for take I), this is what I thought:

With fw_devlink=permissive, your boot sequence was (Case 1):
1. IOMMU probe
2. PCI builtin_platform_driver_probe() attempt
- Driver core sets up PCI with IOMMU
- PCI probe succeeds.
- PCI works with IOMMU. < Remember this point.

And with fw_devlink=on, I thought the IOMMU probe order was
unnecessarily changed and caused this (Case 2):
1. IOMMU probe reordered for some reason to be attempted before its
suppliers. Gets deferred.
2. PCI probe attempt
- fw_devlink + device links defers the probe because IOMMU isn't ready.
- builtin_platform_driver_probe() replaces drv->probe with
platform_probe_fail()
3. IOMMU deferred probe succeeds eventually.
4. PCI deferred probe is attempted
- platform_probe_fail() which is a stub just returns -ENXIO

And if this was the case, patch 1 in this series would have fixed it
by removing unnecessary reordering of probes.

But what was really happening was (after I went through your logs
again and looked at the code):
With fw_devlink=permissive, your boot sequence was really (Case 3):
1. PCI builtin_platform_driver_probe() attempt
- Driver core does NOT set up PCI with IOMMU
- PCI probe succeeds.
- PCI works without IOMMU. < Remember this point.
2. IOMMU probes

And with fw_devlink=on what was happening was (Case 4):
1. PCI builtin_platform_driver_probe() attempt
- fw_devlink + device links defers the probe because it thinks
IOMMU is mandatory and isn't ready.
- builtin_platform_driver_probe() replaces drv->probe with
platform_probe_fail()
2. IOMMU probes.
3. PCI deferred probe is attempted
- platform_probe_fail() which is a stub just returns -ENXIO
4. PCI is broken now.
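
Case 4 can be reproduced in a small userspace model. This is only a
sketch: the names are invented, "strict_link" stands for fw_devlink
treating the IOMMU as a mandatory supplier, and the stub mirrors what I
described for platform_probe_fail() above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER_MODEL 517 /* same numeric value as the kernel's EPROBE_DEFER */

struct drv_model {
    int (*probe)(bool supplier_ready);
};

/* The driver's real probe: in this model it works with or without the IOMMU. */
static int real_probe(bool supplier_ready)
{
    (void)supplier_ready;
    return 0;
}

/* Stub installed after the one-shot attempt; it always fails. */
static int probe_fail_stub(bool supplier_ready)
{
    (void)supplier_ready;
    return -ENXIO;
}

/* One-shot registration: try once, then swap in the failing stub. */
static int register_probe_once(struct drv_model *drv, bool strict_link,
                               bool supplier_ready)
{
    int ret;

    assert(drv != NULL && drv->probe != NULL);
    /* An enforced supplier link defers before the driver's probe runs. */
    if (strict_link && !supplier_ready)
        ret = -EPROBE_DEFER_MODEL;
    else
        ret = drv->probe(supplier_ready);
    drv->probe = probe_fail_stub;
    return ret;
}

/* Deferred retry: calls whatever probe pointer is installed now. */
static int deferred_retry(struct drv_model *drv, bool supplier_ready)
{
    assert(drv != NULL && drv->probe != NULL);
    return drv->probe(supplier_ready);
}
```

With strict_link set and the supplier not ready, the first attempt defers
and the retry hits the stub (-ENXIO). With strict_link clear, the first
attempt succeeds, which corresponds to Case 3.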

In your case IOMMU is not mandatory and PCI works without IOMMU even
when fw_devlink=off/permissive. So the real fix for your case is the
addition of fw_devlink.strict and defaulting it to 0. Because of my
misunderstanding of your case, I didn't realize I already fixed your
case and I thought Patch 1 in this series would fix your case.

Patch 1 in this series is still important for other reasons, just not for you.

> So I guess, Patch 1/3 and Patch 2/3 doesn't fix that and the drivers
> still need to be converted to builtin_platform_driver(), right?

So there is no real issue between fw_devlink=on and
builtin_platform_driver_probe() anymore. At least none that I know of
or has been reported.

If you really want your PCI to work _with_ IOMMU, then
builtin_platform_driver_probe() is wrong even with fw_devlink=off. And
if you wanted PCI to work with IOMMU support and fw_devlink wasn't
available, you'll have to play initcall chicken with the IOMMU driver
or implement some IOMMU check + deferred probing in your PCI probe
function.

However, with fw_devlink=on, all you have to do is set fw_devlink=on
and fw_devlink.strict=1 and use builtin_platform_driver() and not have
to care about initcall orders or figure out how to defer when IOMMU
isn't ready yet.

-Saravana


[PATCH v3] amba: Remove deferred device addition

2021-03-04 Thread Saravana Kannan
The uevents generated for an amba device need PID and CID information
that's available only when the amba device is powered on, clocked and
out of reset. So, if those resources aren't available, the information
can't be read to generate the uevents. To workaround this requirement,
if the resources weren't available, the device addition was deferred and
retried periodically.

However, this deferred addition retry isn't based on resources becoming
available. Instead, it's retried every 5 seconds and causes arbitrary
probe delays for amba devices and their consumers.

Also, maintaining a separate deferred-probe like mechanism is a
maintenance headache.

With this commit, instead of deferring the device addition, we simply
defer the generation of uevents for the device and probing of the device
(because drivers need PID and CID to match) until the PID and CID
information can be read. This allows us to delete all the amba specific
deferring code and also avoid the arbitrary probing delays.

Cc: Rob Herring 
Cc: Ulf Hansson 
Cc: John Stultz 
Cc: Saravana Kannan 
Cc: Linus Walleij 
Cc: Sudeep Holla 
Cc: Nicolas Saenz Julienne 
Cc: Geert Uytterhoeven 
Cc: Marek Szyprowski 
Cc: Russell King 
Signed-off-by: Saravana Kannan 
---

v1 -> v2:
- Dropped RFC tag
- Complete rewrite to not use stub devices.
v2 -> v3:
- Flipped the if() condition for hard-coded periphids.
- Added a stub driver to handle the case where all amba drivers are
  modules loaded by uevents.
- Cc Marek after I realized I forgot to add him.

Marek,

Would you mind testing this? It looks okay with my limited testing.

-Saravana

 drivers/amba/bus.c | 329 +
 1 file changed, 151 insertions(+), 178 deletions(-)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 939ca220bf78..836d6d23bba3 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -149,11 +149,101 @@ static struct attribute *amba_dev_attrs[] = {
 };
 ATTRIBUTE_GROUPS(amba_dev);
 
+static int amba_read_periphid(struct amba_device *dev)
+{
+   u32 size;
+   void __iomem *tmp;
+   u32 pid, cid;
+   struct reset_control *rstc;
+   int i, ret;
+
+   /*
+* Dynamically calculate the size of the resource
+* and use this for iomap
+*/
+   size = resource_size(&dev->res);
+   tmp = ioremap(dev->res.start, size);
+   if (!tmp)
+   return -ENOMEM;
+
+   ret = dev_pm_domain_attach(&dev->dev, true);
+   if (ret)
+   goto err_pm;
+
+   ret = amba_get_enable_pclk(dev);
+   if (ret)
+   goto err_clk;
+
+   /*
+* Find reset control(s) of the amba bus and de-assert them.
+*/
+   rstc = of_reset_control_array_get_optional_shared(dev->dev.of_node);
+   if (IS_ERR(rstc)) {
+   ret = PTR_ERR(rstc);
+   if (ret != -EPROBE_DEFER)
+   dev_err(&dev->dev, "can't get reset: %d\n",
+   ret);
+   goto err_reset;
+   }
+   reset_control_deassert(rstc);
+   reset_control_put(rstc);
+
+   /*
+* Read pid and cid based on size of resource
+* they are located at end of region
+*/
+   for (pid = 0, i = 0; i < 4; i++)
+   pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) <<
+   (i * 8);
+   for (cid = 0, i = 0; i < 4; i++)
+   cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) <<
+   (i * 8);
+
+   if (cid == CORESIGHT_CID) {
+   /* set the base to the start of the last 4k block */
+   void __iomem *csbase = tmp + size - 4096;
+
+   dev->uci.devarch =
+   readl(csbase + UCI_REG_DEVARCH_OFFSET);
+   dev->uci.devtype =
+   readl(csbase + UCI_REG_DEVTYPE_OFFSET) & 0xff;
+   }
+
+   amba_put_disable_pclk(dev);
+
+   if (cid == AMBA_CID || cid == CORESIGHT_CID) {
+   dev->periphid = pid;
+   dev->cid = cid;
+   }
+
+   if (!dev->periphid)
+   ret = -ENODEV;
+
+   return ret;
+
+err_reset:
+   amba_put_disable_pclk(dev);
+err_clk:
+   dev_pm_domain_detach(&dev->dev, true);
+err_pm:
+   iounmap(tmp);
+   return ret;
+}
+
 static int amba_match(struct device *dev, struct device_driver *drv)
 {
struct amba_device *pcdev = to_amba_device(dev);
struct amba_driver *pcdrv = to_amba_driver(drv);
 
+   if (!pcdev->periphid) {
+   int ret = amba_read_periphid(pcdev);
+
+   if (ret)
+   return ret;
+   dev_set_uevent_suppress(dev, false);
+   kobject_uevent(&dev->kobj, KOBJ_ADD);
+   }
+
/* When driver_override is set, only bind to the matching driver */
if (pcdev->driver_override)
return !strcmp(p

Re: [PATCH v2] amba: Remove deferred device addition

2021-03-04 Thread Saravana Kannan
On Thu, Mar 4, 2021 at 6:12 AM Russell King - ARM Linux admin
 wrote:
>
> On Wed, Mar 03, 2021 at 08:08:44PM -0800, Saravana Kannan wrote:
> > Marek,
> >
> > I tested it and saw the device get added before the resources were
> > available and the uevent file looked okay. Would you mind testing it
> > further?
>
> To put it bluntly, if you have tested this, the testing was not very
> effective. Deleting the lines that are removed by the patch so we can
> see what the new code looks like below:
>
> > > +int amba_device_add(struct amba_device *dev, struct resource *parent)
> > >  {
> > > +   int ret;
> > >
> > > WARN_ON(dev->irq[0] == (unsigned int)-1);
> > > WARN_ON(dev->irq[1] == (unsigned int)-1);
> > >
> > > ret = request_resource(parent, &dev->res);
> > > if (ret)
> > > +   return ret;
> > >
> > > +   /* If primecell ID isn't hard-coded, figure it out */
> > > +   if (dev->periphid) {
> > > +   ret = amba_read_periphid(dev);
>
> So, if the peripheral ID has _already_ been set, we attempt to read the
> peripheral ID from the device. Isn't that just wrong?
>
> > > +   if (ret && ret != -EPROBE_DEFER)
> > > +   goto err_release;
> > > /*
> > > +* AMBA device uevents require reading its pid and cid
> > > +* registers.  To do this, the device must be on, clocked and
> > > +* out of reset.  However in some cases those resources might
> > > +* not yet be available.  If that's the case, we suppress the
> > > +* generation of uevents until we can read the pid and cid
> > > +* registers.  See also amba_match().
> > >  */
> > > +   if (ret)
> > > +   dev_set_uevent_suppress(&dev->dev, true);
> > > }
>
> If the peripheral ID has not been set, we don't attempt to read it, and
> we generate an add event when the amba device is added with a zero
> peripheral ID.
>
> I guess that if() statement should be negated - and with such an error,
> I fail to see how this code could have been properly tested.

Yeah, the if() needs to be flipped. I even flipped it and then
unflipped it before I sent the patch. Thanks for catching it.

It worked in my testing because the device didn't have a hard-coded PID.
So it worked out fine.
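
For anyone following along, here is a minimal user-space sketch of the
add-time flow with the if() negated as Russell suggests. It is not the
kernel code: struct sketch_dev, sketch_read_periphid() and
sketch_device_add() are hypothetical stand-ins, the ID read is stubbed
out, and the errno values are copied in as local constants.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENODEV        19   /* same value as ENODEV on Linux */
#define SKETCH_EPROBE_DEFER  517  /* same value as the kernel-internal EPROBE_DEFER */

/* Hypothetical stand-in for the few amba_device fields that matter here. */
struct sketch_dev {
    uint32_t periphid;        /* non-zero means the ID is hard-coded or already read */
    int read_result;          /* what the stubbed ID read should return */
    int read_calls;           /* how many times the stub was invoked */
    bool uevent_suppressed;   /* models dev_set_uevent_suppress(dev, true) */
};

/* Stub for the register read; a real read needs power, clocks and reset. */
static int sketch_read_periphid(struct sketch_dev *d)
{
    d->read_calls++;
    if (d->read_result == 0)
        d->periphid = 0x00041011; /* arbitrary example ID */
    return d->read_result;
}

/* Add-time flow with the corrected (negated) condition. */
static int sketch_device_add(struct sketch_dev *d)
{
    int ret;

    if (!d->periphid) {                     /* only read when the ID is unknown */
        ret = sketch_read_periphid(d);
        if (ret && ret != -SKETCH_EPROBE_DEFER)
            return ret;                     /* hard failure: give up */
        if (ret)
            d->uevent_suppressed = true;    /* deferred: add quietly, retry later */
    }
    return 0;
}
```

With the condition the wrong way round, the first case below would hit
the hardware for no reason and the second would never try the read at
all: a device with a hard-coded ID, and a device whose resources are
not ready yet.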

But I now realize I still have a chicken-and-egg problem if ALL amba
drivers are modules. amba_match() will never be called because none of
the amba drivers have been loaded. None of the amba drivers would be
loaded (depending on the setup) because none of the uevents were sent
out. But there's a simple fix for this. I'll send that as part of v3.

Marek,

It'd still be nice if you can test this with the if() above flipped.
If all your amba drivers are modules and loaded based on uevents,
manually loading one of them will kick off everything.

-Saravana
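
P.S. Since the whole thread hinges on being able to read the ID
registers, here is a small user-space sketch of how the quoted
amba_read_periphid() assembles each 32-bit ID: four consecutive 32-bit
registers near the end of the mapped region each contribute their low
byte. The function name sketch_read_id() and the plain array standing
in for the ioremap()ed region are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * regs:         the device region viewed as an array of 32-bit words
 * size_bytes:   length of that region in bytes
 * off_from_end: 0x20 for the peripheral ID, 0x10 for the component ID
 *
 * Only the low 8 bits of each register are significant; register i
 * supplies bits [8*i+7 : 8*i] of the result.
 */
static uint32_t sketch_read_id(const uint32_t *regs, size_t size_bytes,
                               size_t off_from_end)
{
    uint32_t id = 0;
    int i;

    for (i = 0; i < 4; i++) {
        size_t byte_off = size_bytes - off_from_end + 4 * (size_t)i;

        id |= (regs[byte_off / 4] & 255u) << (i * 8);
    }
    return id;
}
```

For a 4 KiB region this reads the words at byte offsets 0xfe0..0xfec
for the PID and 0xff0..0xffc for the CID, which is why the region has
to be powered, clocked and out of reset before the read can work.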


Re: [PATCH v2] amba: Remove deferred device addition

2021-03-03 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 8:00 PM Saravana Kannan  wrote:
>
> The uevents generated for an amba device need PID and CID information
> that's available only when the amba device is powered on, clocked and
> out of reset. So, if those resources aren't available, the information
> can't be read to generate the uevents. To work around this requirement,
> if the resources weren't available, the device addition was deferred and
> retried periodically.
>
> However, this deferred addition retry isn't based on resources becoming
> available. Instead, it's retried every 5 seconds and causes arbitrary
> probe delays for amba devices and their consumers.
>
> Also, maintaining a separate deferred-probe-like mechanism is a
> maintenance headache.
>
> With this commit, instead of deferring the device addition, we simply
> defer the generation of uevents for the device and probing of the device
> (because drivers need PID and CID to match) until the PID and CID
> information can be read. This allows us to delete all the amba-specific
> deferring code and also avoid the arbitrary probing delays.
>
> Cc: Rob Herring 
> Cc: Ulf Hansson 
> Cc: John Stultz 
> Cc: Saravana Kannan 
> Cc: Linus Walleij 
> Cc: Sudeep Holla 
> Cc: Nicolas Saenz Julienne 
> Cc: Geert Uytterhoeven 
> Cc: Russell King 
> Signed-off-by: Saravana Kannan 
> ---
>  drivers/amba/bus.c | 293 ++---
>  1 file changed, 115 insertions(+), 178 deletions(-)
>
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 939ca220bf78..fac4110b2f58 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -149,11 +149,101 @@ static struct attribute *amba_dev_attrs[] = {
>  };
>  ATTRIBUTE_GROUPS(amba_dev);
>
> +static int amba_read_periphid(struct amba_device *dev)
> +{
> +   u32 size;
> +   void __iomem *tmp;
> +   u32 pid, cid;
> +   struct reset_control *rstc;
> +   int i, ret;
> +
> +   /*
> +* Dynamically calculate the size of the resource
> +* and use this for iomap
> +*/
> +   size = resource_size(&dev->res);
> +   tmp = ioremap(dev->res.start, size);
> +   if (!tmp)
> +   return -ENOMEM;
> +
> +   ret = dev_pm_domain_attach(&dev->dev, true);
> +   if (ret)
> +   goto err_pm;
> +
> +   ret = amba_get_enable_pclk(dev);
> +   if (ret)
> +   goto err_clk;
> +
> +   /*
> +* Find reset control(s) of the amba bus and de-assert them.
> +*/
> +   rstc = of_reset_control_array_get_optional_shared(dev->dev.of_node);
> +   if (IS_ERR(rstc)) {
> +   ret = PTR_ERR(rstc);
> +   if (ret != -EPROBE_DEFER)
> +   dev_err(&dev->dev, "can't get reset: %d\n",
> +   ret);
> +   goto err_reset;
> +   }
> +   reset_control_deassert(rstc);
> +   reset_control_put(rstc);
> +
> +   /*
> +* Read pid and cid based on size of resource
> +* they are located at end of region
> +*/
> +   for (pid = 0, i = 0; i < 4; i++)
> +   pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) <<
> +   (i * 8);
> +   for (cid = 0, i = 0; i < 4; i++)
> +   cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) <<
> +   (i * 8);
> +
> +   if (cid == CORESIGHT_CID) {
> +   /* set the base to the start of the last 4k block */
> +   void __iomem *csbase = tmp + size - 4096;
> +
> +   dev->uci.devarch =
> +   readl(csbase + UCI_REG_DEVARCH_OFFSET);
> +   dev->uci.devtype =
> +   readl(csbase + UCI_REG_DEVTYPE_OFFSET) & 0xff;
> +   }
> +
> +   amba_put_disable_pclk(dev);
> +
> +   if (cid == AMBA_CID || cid == CORESIGHT_CID) {
> +   dev->periphid = pid;
> +   dev->cid = cid;
> +   }
> +
> +   if (!dev->periphid)
> +   ret = -ENODEV;
> +
> +   return ret;
> +
> +err_reset:
> +   amba_put_disable_pclk(dev);
> +err_clk:
> +   dev_pm_domain_detach(&dev->dev, true);
> +err_pm:
> +   iounmap(tmp);
> +   return ret;
> +}
> +
>  static int amba_match(struct device *dev, struct device_driver *drv)
>  {
> struct amba_device *pcdev = to_amba_device(dev);
> struct amba_driver *pcdrv = to_amba_driver(drv);
>
> +   if (!pcdev->periphid) {
> +   int ret = amba_read_periphid(pcdev);
> +
> +   if (ret)
> +

[PATCH v2] amba: Remove deferred device addition

2021-03-03 Thread Saravana Kannan
The uevents generated for an amba device need PID and CID information
that's available only when the amba device is powered on, clocked and
out of reset. So, if those resources aren't available, the information
can't be read to generate the uevents. To work around this requirement,
if the resources weren't available, the device addition was deferred and
retried periodically.

However, this deferred addition retry isn't based on resources becoming
available. Instead, it's retried every 5 seconds and causes arbitrary
probe delays for amba devices and their consumers.

Also, maintaining a separate deferred-probe-like mechanism is a
maintenance headache.

With this commit, instead of deferring the device addition, we simply
defer the generation of uevents for the device and probing of the device
(because drivers need PID and CID to match) until the PID and CID
information can be read. This allows us to delete all the amba-specific
deferring code and also avoid the arbitrary probing delays.

Cc: Rob Herring 
Cc: Ulf Hansson 
Cc: John Stultz 
Cc: Saravana Kannan 
Cc: Linus Walleij 
Cc: Sudeep Holla 
Cc: Nicolas Saenz Julienne 
Cc: Geert Uytterhoeven 
Cc: Russell King 
Signed-off-by: Saravana Kannan 
---
 drivers/amba/bus.c | 293 ++---
 1 file changed, 115 insertions(+), 178 deletions(-)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 939ca220bf78..fac4110b2f58 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -149,11 +149,101 @@ static struct attribute *amba_dev_attrs[] = {
 };
 ATTRIBUTE_GROUPS(amba_dev);
 
+static int amba_read_periphid(struct amba_device *dev)
+{
+   u32 size;
+   void __iomem *tmp;
+   u32 pid, cid;
+   struct reset_control *rstc;
+   int i, ret;
+
+   /*
+* Dynamically calculate the size of the resource
+* and use this for iomap
+*/
+   size = resource_size(&dev->res);
+   tmp = ioremap(dev->res.start, size);
+   if (!tmp)
+   return -ENOMEM;
+
+   ret = dev_pm_domain_attach(&dev->dev, true);
+   if (ret)
+   goto err_pm;
+
+   ret = amba_get_enable_pclk(dev);
+   if (ret)
+   goto err_clk;
+
+   /*
+* Find reset control(s) of the amba bus and de-assert them.
+*/
+   rstc = of_reset_control_array_get_optional_shared(dev->dev.of_node);
+   if (IS_ERR(rstc)) {
+   ret = PTR_ERR(rstc);
+   if (ret != -EPROBE_DEFER)
+   dev_err(&dev->dev, "can't get reset: %d\n",
+   ret);
+   goto err_reset;
+   }
+   reset_control_deassert(rstc);
+   reset_control_put(rstc);
+
+   /*
+* Read pid and cid based on size of resource
+* they are located at end of region
+*/
+   for (pid = 0, i = 0; i < 4; i++)
+   pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) <<
+   (i * 8);
+   for (cid = 0, i = 0; i < 4; i++)
+   cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) <<
+   (i * 8);
+
+   if (cid == CORESIGHT_CID) {
+   /* set the base to the start of the last 4k block */
+   void __iomem *csbase = tmp + size - 4096;
+
+   dev->uci.devarch =
+   readl(csbase + UCI_REG_DEVARCH_OFFSET);
+   dev->uci.devtype =
+   readl(csbase + UCI_REG_DEVTYPE_OFFSET) & 0xff;
+   }
+
+   amba_put_disable_pclk(dev);
+
+   if (cid == AMBA_CID || cid == CORESIGHT_CID) {
+   dev->periphid = pid;
+   dev->cid = cid;
+   }
+
+   if (!dev->periphid)
+   ret = -ENODEV;
+
+   return ret;
+
+err_reset:
+   amba_put_disable_pclk(dev);
+err_clk:
+   dev_pm_domain_detach(&dev->dev, true);
+err_pm:
+   iounmap(tmp);
+   return ret;
+}
+
 static int amba_match(struct device *dev, struct device_driver *drv)
 {
struct amba_device *pcdev = to_amba_device(dev);
struct amba_driver *pcdrv = to_amba_driver(drv);
 
+   if (!pcdev->periphid) {
+   int ret = amba_read_periphid(pcdev);
+
+   if (ret)
+   return ret;
+   dev_set_uevent_suppress(dev, false);
+   kobject_uevent(&dev->kobj, KOBJ_ADD);
+   }
+
/* When driver_override is set, only bind to the matching driver */
if (pcdev->driver_override)
return !strcmp(pcdev->driver_override, drv->name);
@@ -373,98 +463,43 @@ static void amba_device_release(struct device *dev)
kfree(d);
 }
 
-static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
+/**
+ * amba_device_add - add a previously allocated AMBA device structure
+ * @dev: AMBA device allocated by amba_device_alloc
+ * @parent: resource parent for this devi

Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-03 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 2:03 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Wed, Mar 3, 2021 at 10:24 AM Saravana Kannan  wrote:
> > On Wed, Mar 3, 2021 at 1:22 AM Geert Uytterhoeven  wrote:
> > > On Tue, Mar 2, 2021 at 10:11 PM Saravana Kannan  wrote:
> > > > This series fixes the last few remaining issues reported when fw_devlink=on
> > > > by default.
> > >
> > > [...]
> > >
> > > Thanks for your series!
> > >
> > > > Geert/Marek,
> > > >
> > > > As far as I know, there shouldn't be any more issues you reported that
> > > > are still left unfixed after this series. Please correct me if I'm wrong or
> > > > if you find new issues.
> > >
> > > While this fixes the core support, there may still be driver fixes left
> > > that were not developed in time for the v5.12-rc1 merge window.
> > > Personally, I'm aware of "soc: renesas: rmobile-sysc: Mark fwnode
> > > when PM domain is added", which I have queued for v5.13[1].
> > > There may be other fixes for other platforms.
> >
> > Right, I intended this series for 5.13. Is that what you are trying to say too?
>
> OK, v5.13 is fine for me.
> It wasn't clear to me if you intended (the last patch of) this series to
> be merged for v5.12-rcX or v5.13.

The entire series is meant for 5.13.

I don't want to land Patch 1/3 in 5.12 in case it causes a
regression. And 2/3 isn't urgent.

-Saravana

>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds


[PATCH v1] RFC: amba: Remove amba specific deferred probe handling

2021-03-03 Thread Saravana Kannan
The addition/probe of amba devices has its own weird deferred probe
mechanism that needs to be maintained separately. It doesn't
automatically get any bug fixes or improvements to the common deferred
probe mechanism.

It also has an arbitrary 5 second periodic attempt. So, even if the
resources are available, there can be an arbitrary delay before amba
devices are probed.

This patch uses a proxy/stub device so that amba devices can hook into
the common deferred probe mechanism. This also means amba devices get
probed as soon as their resources are available.

Cc: Linus Walleij 
Cc: Ulf Hansson 
Cc: John Stultz 
Cc: Saravana Kannan 
Cc: Sudeep Holla 
Cc: Nicolas Saenz Julienne 
Cc: Geert Uytterhoeven 
Cc: Russell King 
Cc: Rob Herring 
Signed-off-by: Saravana Kannan 
---

We talked about this almost a year ago[1] and it has been nagging me all
this time. So, finally got around to giving it a shot. This actually
seems to work -- I tested it on a device that was lying around.

Thoughts?

[1] - https://lore.kernel.org/linux-arm-kernel/cagetcx8cn-b6l2y10lkb91s3n06b6+be2z_a0402eyny-8y...@mail.gmail.com/
-Saravana

 drivers/amba/bus.c   | 116 ++-
 include/linux/amba/bus.h |   1 +
 2 files changed, 53 insertions(+), 64 deletions(-)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 939ca220bf78..393d189b6bca 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -24,6 +24,9 @@
 
 #define to_amba_driver(d)  container_of(d, struct amba_driver, drv)
 
+static int amba_proxy_probe(struct amba_device *adev,
+   const struct amba_id *id);
+
 /* called on periphid match and class 0x9 coresight device. */
 static int
 amba_cs_uci_id_match(const struct amba_id *table, struct amba_device *dev)
@@ -46,6 +49,8 @@ amba_cs_uci_id_match(const struct amba_id *table, struct amba_device *dev)
 static const struct amba_id *
 amba_lookup(const struct amba_id *table, struct amba_device *dev)
 {
+   if (!table)
+   return NULL;
while (table->mask) {
if (((dev->periphid & table->mask) == table->id) &&
((dev->cid != CORESIGHT_CID) ||
@@ -185,6 +190,9 @@ static int amba_probe(struct device *dev)
const struct amba_id *id = amba_lookup(pcdrv->id_table, pcdev);
int ret;
 
+   if (!pcdev->periphid)
+   return pcdrv->probe(pcdev, 0);
+
do {
ret = of_clk_set_defaults(dev->of_node, false);
if (ret < 0)
@@ -224,6 +232,9 @@ static int amba_remove(struct device *dev)
struct amba_device *pcdev = to_amba_device(dev);
struct amba_driver *drv = to_amba_driver(dev->driver);
 
+   if (!pcdev->periphid)
+   return 0;
+
pm_runtime_get_sync(dev);
if (drv->remove)
drv->remove(pcdev);
@@ -325,9 +336,20 @@ struct bus_type amba_bustype = {
 };
 EXPORT_SYMBOL_GPL(amba_bustype);
 
+static struct amba_driver amba_proxy_drv = {
+   .drv = {
+   .name = "amba-proxy",
+   },
+   .probe = amba_proxy_probe,
+};
+
 static int __init amba_init(void)
 {
-   return bus_register(&amba_bustype);
+   int ret = bus_register(&amba_bustype);
+
+   if (ret)
+   return ret;
+   return amba_driver_register(&amba_proxy_drv);
 }
 
 postcore_initcall(amba_init);
@@ -490,58 +512,19 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
goto err_release;
 }
 
-/*
- * Registration of AMBA device require reading its pid and cid registers.
- * To do this, the device must be turned on (if it is a part of power domain)
- * and have clocks enabled. However in some cases those resources might not be
- * yet available. Returning EPROBE_DEFER is not a solution in such case,
- * because callers don't handle this special error code. Instead such devices
- * are added to the special list and their registration is retried from
- * periodic worker, until all resources are available and registration succeeds.
- */
-struct deferred_device {
-   struct amba_device *dev;
-   struct resource *parent;
-   struct list_head node;
-};
-
-static LIST_HEAD(deferred_devices);
-static DEFINE_MUTEX(deferred_devices_lock);
-
-static void amba_deferred_retry_func(struct work_struct *dummy);
-static DECLARE_DELAYED_WORK(deferred_retry_work, amba_deferred_retry_func);
-
-#define DEFERRED_DEVICE_TIMEOUT (msecs_to_jiffies(5 * 1000))
-
-static int amba_deferred_retry(void)
+static int amba_proxy_probe(struct amba_device *adev,
+   const struct amba_id *id)
 {
-   struct deferred_device *ddev, *tmp;
-
-   mutex_lock(&deferred_devices_lock);
-
-   list_for_each_entry_safe(ddev, tmp, &deferred_devices, node) {
-   int ret = amba_device_try_add(ddev->dev, ddev->parent);
-
-   if (ret == -EPROBE_DEFER)
-   contin

Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-03 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 1:22 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Tue, Mar 2, 2021 at 10:11 PM Saravana Kannan  wrote:
> > This series fixes the last few remaining issues reported when fw_devlink=on
> > by default.
>
> [...]
>
> Thanks for your series!
>
> > Geert/Marek,
> >
> > As far as I know, there shouldn't be any more issues you reported that
> > are still left unfixed after this series. Please correct me if I'm wrong or
> > if you find new issues.
>
> While this fixes the core support, there may still be driver fixes left
> that were not developed in time for the v5.12-rc1 merge window.
> Personally, I'm aware of "soc: renesas: rmobile-sysc: Mark fwnode
> when PM domain is added", which I have queued for v5.13[1].
> There may be other fixes for other platforms.

Right, I intended this series for 5.13. Is that what you are trying to say too?

-Saravana


>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel.git/commit/?h=renesas-drivers-for-v5.13&id=fb13bbd6c90ee4fb983c0e9a341bd2832a3857cf
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds


Re: [PATCH v1] RFC: amba: Remove amba specific deferred probe handling

2021-03-03 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 12:32 AM Saravana Kannan  wrote:
>
> The addition/probe of amba devices has its own weird deferred probe
> mechanism that needs to be maintained separately. It doesn't
> automatically get any bug fixes or improvements to the common deferred
> probe mechanism.
>
> It also has an arbitrary 5 second periodic attempt. So, even if the
> resources are available, there can be an arbitrary delay before amba
> devices are probed.
>
> This patch uses a proxy/stub device so that amba devices can hook into
> the common deferred probe mechanism. This also means amba devices get
> probed as soon as their resources are available.
>
> Cc: Linus Walleij 
> Cc: Ulf Hansson 
> Cc: John Stultz 
> Cc: Saravana Kannan 
> Cc: Sudeep Holla 
> Cc: Nicolas Saenz Julienne 
> Cc: Geert Uytterhoeven 
> Cc: Russell King 
> Cc: Rob Herring 
> Signed-off-by: Saravana Kannan 
> ---
>
> We talked about this almost a year ago[1] and it has been nagging me all
> this time. So, finally got around to giving it a shot. This actually
> seems to work -- I tested it on a device that was lying around.

Btw, what really is the requirement wrt the uevents? Will this whole
thing work if I figure out a way to do this:

1. Add an amba device without the AMBA_ID and MODALIAS uevent vars and
without periphid set.
2. Once the resources (clocks, etc) are available, set periphid and
add those uevents.
3. Trigger a normal deferred probe attempt.

Will userspace properly load the right driver and will things work if
there is a couple of seconds of (theoretical) delay between (1) and
(2)? If so, that might be pretty easy to do without a stub device too.
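
To make the three steps above concrete, here is a rough user-space
model of what I have in mind. Everything in it is hypothetical (struct
model_dev, model_read_periphid(), model_match()); it only models the
ordering of events, not the real driver core, and the defer code is a
local copy of the kernel's EPROBE_DEFER value.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_EPROBE_DEFER 517 /* same value as the kernel-internal EPROBE_DEFER */

/* Hypothetical model of an amba device added without a known ID (step 1). */
struct model_dev {
    uint32_t periphid;        /* 0 means "not known yet" */
    uint32_t hw_periphid;     /* what the ID registers would return */
    bool resources_ready;     /* power domain, clock and reset available? */
    bool uevent_suppressed;   /* set at add time when the ID read deferred */
    int add_uevents;          /* number of ADD uevents emitted so far */
};

/* The ID can only be read once the resources are up. */
static int model_read_periphid(struct model_dev *d)
{
    if (!d->resources_ready)
        return -MODEL_EPROBE_DEFER;
    d->periphid = d->hw_periphid;
    return 0;
}

/*
 * Steps 2 and 3: on every match attempt, retry the ID read if needed.
 * On the first success, lift the suppression and emit the ADD uevent
 * exactly once. A negative return asks the caller to try again later;
 * 1 means matching can proceed.
 */
static int model_match(struct model_dev *d)
{
    if (!d->periphid) {
        int ret = model_read_periphid(d);

        if (ret)
            return ret;
        d->uevent_suppressed = false;
        d->add_uevents++;
    }
    return 1;
}
```

The point of the model is that the retry is driven by match attempts,
so the only delay between (1) and (2) is however long the resources
take to show up, and userspace sees a single ADD uevent that already
carries the ID.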

-Saravana

>
> Thoughts?
>
> [1] - https://lore.kernel.org/linux-arm-kernel/cagetcx8cn-b6l2y10lkb91s3n06b6+be2z_a0402eyny-8y...@mail.gmail.com/
>
> -Saravana
>
>  drivers/amba/bus.c   | 116 ++-
>  include/linux/amba/bus.h |   1 +
>  2 files changed, 53 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 939ca220bf78..393d189b6bca 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -24,6 +24,9 @@
>
>  #define to_amba_driver(d)  container_of(d, struct amba_driver, drv)
>
> +static int amba_proxy_probe(struct amba_device *adev,
> +   const struct amba_id *id);
> +
>  /* called on periphid match and class 0x9 coresight device. */
>  static int
>  amba_cs_uci_id_match(const struct amba_id *table, struct amba_device *dev)
> @@ -46,6 +49,8 @@ amba_cs_uci_id_match(const struct amba_id *table, struct amba_device *dev)
>  static const struct amba_id *
>  amba_lookup(const struct amba_id *table, struct amba_device *dev)
>  {
> +   if (!table)
> +   return NULL;
> while (table->mask) {
> if (((dev->periphid & table->mask) == table->id) &&
> ((dev->cid != CORESIGHT_CID) ||
> @@ -185,6 +190,9 @@ static int amba_probe(struct device *dev)
> const struct amba_id *id = amba_lookup(pcdrv->id_table, pcdev);
> int ret;
>
> +   if (!pcdev->periphid)
> +   return pcdrv->probe(pcdev, 0);
> +
> do {
> ret = of_clk_set_defaults(dev->of_node, false);
> if (ret < 0)
> @@ -224,6 +232,9 @@ static int amba_remove(struct device *dev)
> struct amba_device *pcdev = to_amba_device(dev);
> struct amba_driver *drv = to_amba_driver(dev->driver);
>
> +   if (!pcdev->periphid)
> +   return 0;
> +
> pm_runtime_get_sync(dev);
> if (drv->remove)
> drv->remove(pcdev);
> @@ -325,9 +336,20 @@ struct bus_type amba_bustype = {
>  };
>  EXPORT_SYMBOL_GPL(amba_bustype);
>
> +static struct amba_driver amba_proxy_drv = {
> +   .drv = {
> +   .name = "amba-proxy",
> +   },
> +   .probe = amba_proxy_probe,
> +};
> +
>  static int __init amba_init(void)
>  {
> -   return bus_register(&amba_bustype);
> +   int ret = bus_register(&amba_bustype);
> +
> +   if (ret)
> +   return ret;
> +   return amba_driver_register(&amba_proxy_drv);
>  }
>
>  postcore_initcall(amba_init);
> @@ -490,58 +512,19 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
> goto err_release;
>  }
>
> -/*
> - * Registration of AMBA device require reading its pid and cid registers.
> - * To do this, the device must be turned on (if it is a part of power domain)
> - * and have clocks enabled. However in some cases those resources might not be
> - 

Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-03 Thread Saravana Kannan
On Wed, Mar 3, 2021 at 12:59 AM Michael Walle  wrote:
>
> Am 2021-03-02 23:47, schrieb Saravana Kannan:
> > On Tue, Mar 2, 2021 at 2:42 PM Saravana Kannan 
> > wrote:
> >>
> >> On Tue, Mar 2, 2021 at 2:24 PM Michael Walle  wrote:
> >> >
> >> > Am 2021-03-02 22:11, schrieb Saravana Kannan:
> >> > > I think Patch 1 should fix [4] without [5]. Can you test the series
> >> > > please?
> >> >
> >> > Mh, I'm on latest linux-next (next-20210302) and I've applied patch 3/3
> >> > and
> >> > reverted commit 7007b745a508 ("PCI: layerscape: Convert to
> >> > builtin_platform_driver()"). I'd assumed that PCIe shouldn't be working,
> >> > right? But it is. Did I miss something?
> >>
> >> You need to revert [5].
> >
> > My bad. You did revert it. Ah... I wonder if it was due to
> > fw_devlink.strict that I added. To break PCI again, also set
> > fw_devlink.strict=1 in the kernel command line.
>
> Indeed, adding fw_devlink.strict=1 will break PCI again. But if
> I then apply 1/3 and 2/3 again, PCI is still broken. Just to be clear:
> I'm keeping the fw_devlink.strict=1 parameter.

Thanks for your testing! I assume you are also setting fw_devlink=on?

Hmmm... ok. In the working case, does your PCI probe before IOMMU? If
yes, then your results make sense.

If your PCI does probe after IOMMU and uses IOMMU, then I'm not sure
what else could be changing the order of the device probing. In any
case, glad that the default case works and we have a fix merged even
for .strict=1.

-Saravana


Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-02 Thread Saravana Kannan
On Tue, Mar 2, 2021 at 2:42 PM Saravana Kannan  wrote:
>
> On Tue, Mar 2, 2021 at 2:24 PM Michael Walle  wrote:
> >
> > Am 2021-03-02 22:11, schrieb Saravana Kannan:
> > > I think Patch 1 should fix [4] without [5]. Can you test the series
> > > please?
> >
> > Mh, I'm on latest linux-next (next-20210302) and I've applied patch 3/3
> > and
> > reverted commit 7007b745a508 ("PCI: layerscape: Convert to
> > builtin_platform_driver()"). I'd assumed that PCIe shouldn't be working,
> > right? But it is. Did I miss something?
>
> You need to revert [5].

My bad. You did revert it. Ah... I wonder if it was due to
fw_devlink.strict that I added. To break PCI again, also set
fw_devlink.strict=1 in the kernel command line.

-Saravana


Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-02 Thread Saravana Kannan
On Tue, Mar 2, 2021 at 2:24 PM Michael Walle  wrote:
>
> Am 2021-03-02 22:11, schrieb Saravana Kannan:
> > I think Patch 1 should fix [4] without [5]. Can you test the series
> > please?
>
> Mh, I'm on latest linux-next (next-20210302) and I've applied patch 3/3
> and
> reverted commit 7007b745a508 ("PCI: layerscape: Convert to
> builtin_platform_driver()"). I'd assumed that PCIe shouldn't be working,
> right? But it is. Did I miss something?

You need to revert [5].

-Saravana

>
> Anyway, I've also applied Patch 1/3 and 2/3 and it still works. But I
> guess that doesn't say much.
>
> -michael


[PATCH v1 2/3] driver core: Update device link status properly for device_bind_driver()

2021-03-02 Thread Saravana Kannan
Device link status was not getting updated correctly when
device_bind_driver() is called on a device. This causes a warning[1].
Fix this by updating device links that can be updated and dropping
device links that can't be updated to a sensible state.

[1] - https://lore.kernel.org/lkml/56f7d032-ba5a-a8c7-23de-2969d98c5...@nvidia.com/
Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h |  1 +
 drivers/base/core.c | 35 +++
 drivers/base/dd.c   |  4 +++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 52b3d7b75c27..1b44ed588f66 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -185,6 +185,7 @@ extern int device_links_read_lock(void);
 extern void device_links_read_unlock(int idx);
 extern int device_links_read_lock_held(void);
 extern int device_links_check_suppliers(struct device *dev);
+extern void device_links_force_bind(struct device *dev);
 extern void device_links_driver_bound(struct device *dev);
 extern void device_links_driver_cleanup(struct device *dev);
 extern void device_links_no_driver(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..45c75cc96fdc 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1153,6 +1153,41 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(waiting_for_supplier);
 
+/**
+ * device_links_force_bind - Prepares device to be force bound
+ * @dev: Consumer device.
+ *
+ * device_bind_driver() force binds a device to a driver without calling any
+ * driver probe functions. So the consumer really isn't going to wait for any
+ * supplier before it's bound to the driver. We still want the device link
+ * states to be sensible when this happens.
+ *
+ * In preparation for device_bind_driver(), this function goes through each
+ * supplier device links and checks if the supplier is bound. If it is, then
+ * the device link status is set to CONSUMER_PROBE. Otherwise, the device link
+ * is dropped. Links without the DL_FLAG_MANAGED flag set are ignored.
+ */
+void device_links_force_bind(struct device *dev)
+{
+	struct device_link *link, *ln;
+
+	device_links_write_lock();
+
+	list_for_each_entry_safe(link, ln, &dev->links.suppliers, c_node) {
+		if (!(link->flags & DL_FLAG_MANAGED))
+			continue;
+
+		if (link->status != DL_STATE_AVAILABLE) {
+			device_link_drop_managed(link);
+			continue;
+		}
+		WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
+	}
+	dev->links.status = DL_DEV_PROBING;
+
+	device_links_write_unlock();
+}
+
 /**
  * device_links_driver_bound - Update device links after probing its driver.
  * @dev: Device to update the links for.
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f18963f42e21..eb201c6d5a6a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -460,8 +460,10 @@ int device_bind_driver(struct device *dev)
 	int ret;
 
 	ret = driver_sysfs_add(dev);
-	if (!ret)
+	if (!ret) {
+		device_links_force_bind(dev);
 		driver_bound(dev);
+	}
 	else if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
-- 
2.30.1.766.gb4fecdf3b7-goog



[PATCH v1 1/3] driver core: Avoid pointless deferred probe attempts

2021-03-02 Thread Saravana Kannan
There's no point in adding a device to the deferred probe list if we
know for sure that it doesn't have a matching driver. So, check if a
device can match with a driver before adding it to the deferred probe
list.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c  | 6 ++
 include/linux/device.h | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9179825ff646..f18963f42e21 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
 
 void driver_deferred_probe_add(struct device *dev)
 {
+	if (!dev->can_match)
+		return;
+
 	mutex_lock(&deferred_probe_mutex);
 	if (list_empty(&dev->p->deferred_probe)) {
 		dev_dbg(dev, "Added to deferred list\n");
@@ -726,6 +729,7 @@ static int driver_probe_device(struct device_driver *drv, struct device *dev)
 	if (!device_is_registered(dev))
 		return -ENODEV;
 
+	dev->can_match = true;
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
@@ -829,6 +833,7 @@ static int __device_attach_driver(struct device_driver *drv, void *_data)
 		return 0;
 	} else if (ret == -EPROBE_DEFER) {
 		dev_dbg(dev, "Device match requests probe deferral\n");
+		dev->can_match = true;
 		driver_deferred_probe_add(dev);
 	} else if (ret < 0) {
 		dev_dbg(dev, "Bus failed to match device: %d\n", ret);
@@ -1064,6 +1069,7 @@ static int __driver_attach(struct device *dev, void *data)
 		return 0;
 	} else if (ret == -EPROBE_DEFER) {
 		dev_dbg(dev, "Device match requests probe deferral\n");
+		dev->can_match = true;
 		driver_deferred_probe_add(dev);
 	} else if (ret < 0) {
 		dev_dbg(dev, "Bus failed to match device: %d\n", ret);
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..569932d282c0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -439,6 +439,9 @@ struct dev_links_info {
  * @state_synced: The hardware state of this device has been synced to match
  *   the software state of this device by calling the driver/bus
  *   sync_state() callback.
+ * @can_match: The device has matched with a driver at least once or it is in
+ * a bus (like AMBA) which can't check for matching drivers until
+ * other devices probe successfully.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
  * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
@@ -545,6 +548,7 @@ struct device {
 	bool			offline:1;
 	bool			of_node_reused:1;
 	bool			state_synced:1;
+	bool			can_match:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
-- 
2.30.1.766.gb4fecdf3b7-goog



[PATCH v1 3/3] Revert "Revert "driver core: Set fw_devlink=on by default""

2021-03-02 Thread Saravana Kannan
This reverts commit 3e4c982f1ce75faf5314477b8da296d2d00919df.

Since all reported issues due to fw_devlink=on should be addressed by
this series, revert the revert. fw_devlink=on Take II.

Signed-off-by: Saravana Kannan 
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 45c75cc96fdc..de518178ac36 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1538,7 +1538,7 @@ static void device_links_purge(struct device *dev)
 #define FW_DEVLINK_FLAGS_RPM   (FW_DEVLINK_FLAGS_ON | \
 DL_FLAG_PM_RUNTIME)
 
-static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
+static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
 static int __init fw_devlink_setup(char *arg)
 {
if (!arg)
-- 
2.30.1.766.gb4fecdf3b7-goog



[PATCH v1 0/3] driver core: Set fw_devlink=on take II

2021-03-02 Thread Saravana Kannan
This series fixes the last few remaining issues reported when fw_devlink=on
by default.

Patch 1 is just [6] pulled in without changes into this series. It reduces
some unnecessary probe reordering caused by a combination of fw_devlink and
existing device link code. This fixes an issue caused by fw_devlink=on
with respect to DMAs and IOMMUs [1].

Patch 2 fixes a warning [2] present in code unrelated to fw_devlink. It was
just exposed by fw_devlink.
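
For anyone skimming the cover letter, the link handling that Patch 2 adds can be modelled in userspace roughly as below. This is only a sketch under assumptions: every model_* name and the dropped flag are made up for illustration and are not kernel API. The real code walks struct device_link entries under the device links write lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the device link states, for illustration only. */
enum model_link_state {
	MODEL_DL_STATE_DORMANT,
	MODEL_DL_STATE_AVAILABLE,
	MODEL_DL_STATE_CONSUMER_PROBE,
};

/* Hypothetical stand-in for a supplier link of the consumer being bound. */
struct model_link {
	bool managed;			/* models DL_FLAG_MANAGED */
	bool dropped;			/* models the link having been dropped */
	enum model_link_state status;
};

/*
 * Models what happens before a forced bind: unmanaged links are ignored,
 * managed links whose supplier is not yet available are dropped, and the
 * remaining managed links move to the consumer-probe state.
 * Returns the number of links dropped.
 */
static size_t model_force_bind(struct model_link *links, size_t n)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_link *link = &links[i];

		if (!link->managed || link->dropped)
			continue;

		if (link->status != MODEL_DL_STATE_AVAILABLE) {
			link->dropped = true;
			dropped++;
			continue;
		}
		link->status = MODEL_DL_STATE_CONSUMER_PROBE;
	}
	return dropped;
}
```

The point of the model is that no managed link is left in a state that contradicts the consumer already being bound, which is what the warning was complaining about.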

Jon,

Patch 2 should address the issues you reported[2] even without [3]. Could
you test this series please?

Michael,

I think Patch 1 should fix [4] without [5]. Can you test the series please?

Geert/Marek,

As far as I know, there shouldn't be any more issues you reported that
are still left unfixed after this series. Please correct me if I'm wrong or
if you find new issues.

[1] - 
https://lore.kernel.org/lkml/camuhmduvvr8jes51_8_ypoicr-nwad_2nklyukwey8mbxx9...@mail.gmail.com/
[2] - 
https://lore.kernel.org/lkml/56f7d032-ba5a-a8c7-23de-2969d98c5...@nvidia.com/
[3] - 
https://lore.kernel.org/lkml/5176f496-facb-d7b0-9f4e-a9e4b8974...@nvidia.com/
[4] - https://lore.kernel.org/lkml/4b9ae679b6f76d2f7e340e2ec229d...@walle.cc/
[5] - https://lore.kernel.org/lkml/20210120105246.23218-1-mich...@walle.cc/
[6] - 
https://lore.kernel.org/lkml/20210217235130.1744843-1-sarava...@google.com/

Cc: Michael Walle 
Cc: Jon Hunter 
Cc: Marek Szyprowski 
Cc: Geert Uytterhoeven 
Cc: Guenter Roeck 

Thanks,
Saravana

Saravana Kannan (3):
  driver core: Avoid pointless deferred probe attempts
  driver core: Update device link status properly for
device_bind_driver()
  Revert "Revert "driver core: Set fw_devlink=on by default""

 drivers/base/base.h|  1 +
 drivers/base/core.c| 37 -
 drivers/base/dd.c  | 10 +-
 include/linux/device.h |  4 
 4 files changed, 50 insertions(+), 2 deletions(-)

-- 
2.30.1.766.gb4fecdf3b7-goog



Re: [PATCH v2 0/2] gpio: regression fixes

2021-03-01 Thread Saravana Kannan
On Mon, Mar 1, 2021 at 1:12 AM Bartosz Golaszewski
 wrote:
>
> On Mon, Mar 1, 2021 at 10:05 AM Johan Hovold  wrote:
> >
> > Here's a fix for a regression in 5.12 due to the new stub-driver hack,
> > and a fix for potential list corruption due to missing locking which has
> > been there since the introduction of the character-device interface in
> > 4.6.
> >
> > Johan
> >
> > Changes in v2
> >  - drop the corresponding drv_set_drvdata() which is no longer needed
> >after patch 1/2
> >  - add Saravana's reviewed-by tag to patch 2/2
> >
> >
> > Johan Hovold (2):
> >   gpio: fix NULL-deref-on-deregistration regression
> >   gpio: fix gpio-device list corruption
> >
> >  drivers/gpio/gpiolib.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > --
> > 2.26.2
> >
>
> Patches applied, thanks!

Thanks Johan and Bartosz!

-Saravana


Re: [GIT PULL] Driver core / debugfs changes for 5.12-rc1

2021-02-28 Thread Saravana Kannan
On Sat, Feb 27, 2021 at 6:27 AM Greg KH  wrote:
>
> On Wed, Feb 24, 2021 at 10:20:44AM -0800, Linus Torvalds wrote:
> > On Wed, Feb 24, 2021 at 6:27 AM Greg KH  wrote:
> > >
> > >  [..] I've reverted that change at
> > > the very end so we don't have to worry about regressions in 5.12.
> >
> > Side note: it would have been really nice to see links to the actual
> > problem reports in the revert commit.
>
> Odd, this showed up in my gmail spam folder, just saw this now :(

Yup, went to spam for me too!

> > Yes, there's a "Link:" line there, but that points to the
> > less-than-useful patch submission for the revert, not to the actual
> > _reasons_ for the revert.
> >
> > Now I'm looking at that revert, and I have absolutely no idea why it
> > happened. Only a very vague "there are still reported regressions
> > happening".
> >
> > I've pulled it, but wanted to just point out that when there's some
> > fairly fundamental revert like this, it really would be good to link
> > to the problems, so that when people try to re-enable it, they have
> > the history for why it didn't work the first time.
> >
> > Now all that history is basically lost (well, hopefully Saravana and
> > you actually remember, but you get my point).
>
> Sorry, the history is on the original commit Link that was reverted, and
> in lots of other emails on lkml over the past few weeks.  Next time I'll
> include links to those threads as well.

These are links of interest. All the fixes are at least linked to from
these two threads. And if they are not, they all mention "Fixes" and
list the commit that was reverted. So it's all trackable if we really
find the need to do so in the future.

The original series that set fw_devlink=on where most of the issues
were reported.
[1] - 
https://lore.kernel.org/lkml/20201218031703.3053753-1-sarava...@google.com/

This is the series that made fw_devlink=on more forgiving. And a few
issues were reported there.
[2] - 
https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/

To summarize, the issues fell into one of these types:
* Drivers would initialize the hardware without actually probing a
struct device that existed AND didn't use the existing mechanisms (e.g.
IRQCHIP_DECLARE) meant to allow this. So fw_devlink makes their
consumers wait forever. Bunch of driver fixes for this, but [2] also
works around most of these by making fw_devlink/fwnode code a bit
smarter.
* Drivers would initialize the hardware without creating a struct
device at all despite the DT node having a compatible property. [2]
handles this by making fw_devlink a bit smarter.
* Some device link status not getting updated correctly when a driver
is force bound with a device (I know the fix, haven't gotten around to
submitting it).
* fw_devlink causing some probe reordering that should technically be
harmless (all suppliers probed before the consumer), but drivers still
don't like it. Bunch of fixes + reducing/removing unnecessary
reordering of probes by fw_devlink (latter is under review).
* Spinlock/corruption errors that fw_devlink exposed by reordering
some probes, but the actual issue was unrelated to fw_devlink.

So it was a mix of making fw_devlink smarter to deal with existing
code and fixing some drivers.

Thanks,
Saravana


Re: [PATCH 1/2] gpio: fix NULL-deref-on-deregistration regression

2021-02-26 Thread Saravana Kannan
On Fri, Feb 26, 2021 at 6:55 AM Johan Hovold  wrote:
>
> Fix a NULL-pointer dereference when deregistering the gpio character
> device that was introduced by the recent stub-driver hack. When the new
> "driver" is unbound as part of deregistration, driver core clears the
> driver-data pointer which is used to retrieve the struct gpio_device in
> its release callback.
>
> Fix this by using container_of() in the release callback as should have
> been done all along.
>
> Fixes: 4731210c09f5 ("gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default")
> Cc: Saravana Kannan 
> Cc: Greg Kroah-Hartman 
> Reported-by: syzbot+d27b4c8adbbff70fb...@syzkaller.appspotmail.com
> Signed-off-by: Johan Hovold 
> ---
>  drivers/gpio/gpiolib.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index adf55db080d8..e1016bc8cf14 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -474,7 +474,7 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid);
>
>  static void gpiodevice_release(struct device *dev)
>  {
> -   struct gpio_device *gdev = dev_get_drvdata(dev);
> +   struct gpio_device *gdev = container_of(dev, struct gpio_device, dev);

Can you also delete the dev_set_drvdata() in
gpiochip_add_data_with_key() if the drvdata is not used
elsewhere anymore? I skimmed the code and it doesn't look like it, but
I could be wrong.
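
To spell out why container_of() is the safer lookup here, below is a small userspace sketch. It assumes simplified stand-ins for struct device and struct gpio_device, and model_unbind() is a hypothetical helper that mimics the driver core clearing the driver data on unbind. None of this is the actual gpiolib code.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified model of struct device. */
struct device {
	void *driver_data;	/* what dev_get_drvdata() would hand back */
};

/* Hypothetical, simplified model of struct gpio_device. */
struct gpio_device {
	int id;
	struct device dev;	/* embedded member, not a pointer */
};

/* Models the driver core clearing drvdata when the driver is unbound. */
static void model_unbind(struct device *dev)
{
	dev->driver_data = NULL;
}

/* Old lookup: only works while drvdata is still set. */
static struct gpio_device *gdev_from_drvdata(struct device *dev)
{
	return dev->driver_data;
}

/* New lookup: pointer arithmetic on the embedded member, no stored state. */
static struct gpio_device *gdev_from_container(struct device *dev)
{
	return container_of(dev, struct gpio_device, dev);
}
```

After model_unbind(), gdev_from_drvdata() returns NULL while gdev_from_container() still finds the outer structure, which is the situation the release callback ends up in.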

-Saravana


Re: [PATCH 2/2] gpio: fix gpio-device list corruption

2021-02-26 Thread Saravana Kannan
On Fri, Feb 26, 2021 at 6:55 AM Johan Hovold  wrote:
>
> Make sure to hold the gpio_lock when removing the gpio device from the
> gpio_devices list (when dropping the last reference) to avoid corrupting
> the list when there are concurrent accesses.
>
> Fixes: ff2b13592299 ("gpio: make the gpiochip a real device")
> Cc: sta...@vger.kernel.org  # 4.6
> Signed-off-by: Johan Hovold 
> ---
>  drivers/gpio/gpiolib.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index e1016bc8cf14..42bdc55a15f9 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -475,8 +475,12 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid);
>  static void gpiodevice_release(struct device *dev)
>  {
> struct gpio_device *gdev = container_of(dev, struct gpio_device, dev);
> +   unsigned long flags;
>
> +   spin_lock_irqsave(&gpio_lock, flags);
> list_del(&gdev->list);
> +   spin_unlock_irqrestore(&gpio_lock, flags);
> +

Reviewed-by: Saravana Kannan 

-Saravana

> ida_free(&gpio_ida, gdev->id);
> kfree_const(gdev->label);
> kfree(gdev->descs);
> --
> 2.26.2
>


Re: [PATCH v3] ARM: imx: avic: Convert to using IRQCHIP_DECLARE

2021-02-26 Thread Saravana Kannan
On Mon, Feb 22, 2021 at 6:33 PM Saravana Kannan  wrote:
>
> On Thu, Feb 4, 2021 at 6:07 PM Saravana Kannan  wrote:
> >
> > On Thu, Feb 4, 2021 at 5:54 PM Fabio Estevam  wrote:
> > >
> > > Hi Saravana,
> > >
> > > On Thu, Feb 4, 2021 at 10:39 PM Saravana Kannan  
> > > wrote:
> > > >
> > > > Using IRQCHIP_DECLARE lets fw_devlink know that it should not wait for
> > > > these interrupt controllers to be populated as struct devices. Without
> > > > this change, fw_devlink=on will make the consumers of these interrupt
> > > > controllers wait for the struct device to be added and thereby block the
> > > > consumers' probes forever. Converting to IRQCHIP_DECLARE addresses boot
> > > > issues on imx25 with fw_devlink=on that were reported by Martin.
> > > >
> > > > This also removes a lot of boilerplate code.
> > > >
> > > > Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> > > > Reported-by: Martin Kaiser 
> > > > Signed-off-by: Saravana Kannan 
> > > > Tested-by: Martin Kaiser 
> > >
> > > Thanks for the respin:
> > >
> > > Reviewed-by: Fabio Estevam 
> >
> > Thanks for the quick review.
> >
>
> Maintainers,
>
> Is this getting picked up for 5.12?
>

Gentle reminder.

-Saravana


Re: [PATCH] staging: board: Fix uninitialized spinlock when attaching genpd

2021-02-25 Thread Saravana Kannan
On Thu, Feb 25, 2021 at 1:25 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 10:03 PM Saravana Kannan  wrote:
> > On Mon, Feb 15, 2021 at 11:10 AM Geert Uytterhoeven
> >  wrote:
> > > On Mon, Feb 15, 2021 at 7:37 PM Saravana Kannan  
> > > wrote:
> > > > On Mon, Feb 15, 2021 at 7:14 AM Geert Uytterhoeven
> > > > > @@ -148,7 +149,11 @@ static int board_staging_add_dev_domain(struct 
> > > > > platform_device *pdev,
> > > > > pd_args.np = np;
> > > > > pd_args.args_count = 0;
> > > > >
> > > > > > -   return of_genpd_add_device(&pd_args, &pdev->dev);
> > > > > +   /* Cfr. device_pm_init_common() */
> > > >
> > > > What's Cfr?
> > >
> > > "compare to" (from Latin "confer").
> >
> > Can you please change this to "refer to" or "similar to"? Also, not
> > sure if this comment is even adding anything useful even if you switch
> > the words.
>
> I changed it to "Initialization similar to device_pm_init_common()"
>
> > Also, device_pm_init_common() is used in two places outside of
> > drivers/base/ with this change. Maybe better to move it to
> > linux/device.h?
>
> arch/sh/drivers/platform_early.c has a separate definition, and this
> is intentional, cfr. commit 507fd01d5387 ("drivers: move the early
> platform device support to arch/sh"):
>
> In order not to export internal drivers/base functions to arch code for
> this temporary solution - copy the two needed routines for driver
> matching from drivers/base/platform.c to arch/sh/drivers/platform_early.c.
>

Thanks. The comments and decision to copy the code sounds okay to me.
But I'll still leave the Ack/Review to Rafael or someone else as I'm
not too familiar with the intent of this flag.

-Saravana


Re: [PATCH] driver core: Avoid pointless deferred probe attempts

2021-02-24 Thread Saravana Kannan
On Thu, Feb 18, 2021 at 9:24 AM Saravana Kannan  wrote:
>
> On Thu, Feb 18, 2021 at 9:18 AM Rafael J. Wysocki  wrote:
> >
> > On Thu, Feb 18, 2021 at 12:51 AM Saravana Kannan  
> > wrote:
> > >
> > > There's no point in adding a device to the deferred probe list if we
> > > know for sure that it doesn't have a matching driver. So, check if a
> > > device can match with a driver before adding it to the deferred probe
> > > list.
> >
> > What if a matching driver module loads in the meantime?
>
> Driver registration always triggers a match attempt and this flag will
> get set at that point. Yes, the user can disable autoprobe, but
> that'll block deferred probes too.
>

Let me know what you think Rafael.

-Saravana

>
> >
> > >
> > > Signed-off-by: Saravana Kannan 
> > > ---
> > > Geert,
> > >
> > > Can you give this a shot for your I2C DMA issue with fw_devlink=on?
> > >
> > > -Saravana
> > >
> > >  drivers/base/dd.c  | 6 ++
> > >  include/linux/device.h | 4 
> > >  2 files changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index 9179825ff646..f18963f42e21 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, 
> > > deferred_probe_work_func);
> > >
> > >  void driver_deferred_probe_add(struct device *dev)
> > >  {
> > > +   if (!dev->can_match)
> > > +   return;
> > > +
> > > > mutex_lock(&deferred_probe_mutex);
> > > > if (list_empty(&dev->p->deferred_probe)) {
> > > dev_dbg(dev, "Added to deferred list\n");
> > > @@ -726,6 +729,7 @@ static int driver_probe_device(struct device_driver 
> > > *drv, struct device *dev)
> > > if (!device_is_registered(dev))
> > > return -ENODEV;
> > >
> > > +   dev->can_match = true;
> > > pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> > >  drv->bus->name, __func__, dev_name(dev), drv->name);
> > >
> > > @@ -829,6 +833,7 @@ static int __device_attach_driver(struct 
> > > device_driver *drv, void *_data)
> > > return 0;
> > > } else if (ret == -EPROBE_DEFER) {
> > > dev_dbg(dev, "Device match requests probe deferral\n");
> > > +   dev->can_match = true;
> > > driver_deferred_probe_add(dev);
> > > } else if (ret < 0) {
> > > dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> > > @@ -1064,6 +1069,7 @@ static int __driver_attach(struct device *dev, void 
> > > *data)
> > > return 0;
> > > } else if (ret == -EPROBE_DEFER) {
> > > dev_dbg(dev, "Device match requests probe deferral\n");
> > > +   dev->can_match = true;
> > > driver_deferred_probe_add(dev);
> > > } else if (ret < 0) {
> > > dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 7619a84f8ce4..1f9cc1ba78bc 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -438,6 +438,9 @@ struct dev_links_info {
> > >   * @state_synced: The hardware state of this device has been synced to 
> > > match
> > >   *   the software state of this device by calling the 
> > > driver/bus
> > >   *   sync_state() callback.
> > > + * @can_match: The device has matched with a driver at least once or it 
> > > is in
> > > + * a bus (like AMBA) which can't check for matching drivers 
> > > until
> > > + * other devices probe successfully.
> > >   * @dma_coherent: this particular device is dma coherent, even if the
> > >   * architecture supports non-coherent devices.
> > >   * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> > > @@ -544,6 +547,7 @@ struct device {
> > > > 	bool			offline:1;
> > > > 	bool			of_node_reused:1;
> > > > 	bool			state_synced:1;
> > > > +	bool			can_match:1;
> > >  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> > >  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> > >  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> > > --


Re: [PATCH] driver core: Avoid pointless deferred probe attempts

2021-02-23 Thread Saravana Kannan
On Tue, Feb 23, 2021 at 2:10 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Thu, Feb 18, 2021 at 12:51 AM Saravana Kannan  wrote:
> > There's no point in adding a device to the deferred probe list if we
> > know for sure that it doesn't have a matching driver. So, check if a
> > device can match with a driver before adding it to the deferred probe
> > list.
> >
> > Signed-off-by: Saravana Kannan 
>
> Thanks for your patch!
>
> > ---
> > Geert,
> >
> > Can you give this a shot for your I2C DMA issue with fw_devlink=on?
>
> Yes, this makes I2C use DMA again on Salvator-XS during kernel boot-up.

Thanks for testing Geert!

> I haven't run any more elaborate tests on other platforms.

Yeah, this change should only go into 5.13 after it gets tested as
part of driver-core-next.

-Saravana


Re: [PATCH v3] ARM: imx: avic: Convert to using IRQCHIP_DECLARE

2021-02-22 Thread Saravana Kannan
On Thu, Feb 4, 2021 at 6:07 PM Saravana Kannan  wrote:
>
> On Thu, Feb 4, 2021 at 5:54 PM Fabio Estevam  wrote:
> >
> > Hi Saravana,
> >
> > On Thu, Feb 4, 2021 at 10:39 PM Saravana Kannan  
> > wrote:
> > >
> > > Using IRQCHIP_DECLARE lets fw_devlink know that it should not wait for
> > > these interrupt controllers to be populated as struct devices. Without
> > > this change, fw_devlink=on will make the consumers of these interrupt
> > > controllers wait for the struct device to be added and thereby block the
> > > consumers' probes forever. Converting to IRQCHIP_DECLARE addresses boot
> > > issues on imx25 with fw_devlink=on that were reported by Martin.
> > >
> > > This also removes a lot of boilerplate code.
> > >
> > > Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> > > Reported-by: Martin Kaiser 
> > > Signed-off-by: Saravana Kannan 
> > > Tested-by: Martin Kaiser 
> >
> > Thanks for the respin:
> >
> > Reviewed-by: Fabio Estevam 
>
> Thanks for the quick review.
>

Maintainers,

Is this getting picked up for 5.12?

-Saravana


Re: [PATCH] driver core: Avoid pointless deferred probe attempts

2021-02-18 Thread Saravana Kannan
On Thu, Feb 18, 2021 at 9:24 AM Saravana Kannan  wrote:
>
> On Thu, Feb 18, 2021 at 9:18 AM Rafael J. Wysocki  wrote:
> >
> > On Thu, Feb 18, 2021 at 12:51 AM Saravana Kannan  
> > wrote:
> > >
> > > There's no point in adding a device to the deferred probe list if we
> > > know for sure that it doesn't have a matching driver. So, check if a
> > > device can match with a driver before adding it to the deferred probe
> > > list.
> >
> > What if a matching driver module loads in the meantime?
>
> Driver registration always triggers a match attempt and this flag will
> get set at that point. Yes, the user can disable autoprobe, but
> that'll block deferred probes too.
>

Btw, this can wait for 5.13. Doesn't need to go into 5.12-rcX.

> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index 9179825ff646..f18963f42e21 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, 
> > > deferred_probe_work_func);
> > >
> > >  void driver_deferred_probe_add(struct device *dev)
> > >  {
> > > +   if (!dev->can_match)
> > > +   return;
> > > +

Also, if you are worried about this check, for now, I can move it
inside device_links_driver_bound() which is the only place that
currently adds a device to the deferred probe list before the driver
is present. But it seemed like a good check in general to have in
driver_deferred_probe_add(), so I put it there.

-Saravana


Re: [PATCH] driver core: Avoid pointless deferred probe attempts

2021-02-18 Thread Saravana Kannan
On Thu, Feb 18, 2021 at 9:18 AM Rafael J. Wysocki  wrote:
>
> On Thu, Feb 18, 2021 at 12:51 AM Saravana Kannan  wrote:
> >
> > There's no point in adding a device to the deferred probe list if we
> > know for sure that it doesn't have a matching driver. So, check if a
> > device can match with a driver before adding it to the deferred probe
> > list.
>
> What if a matching driver module loads in the meantime?

Driver registration always triggers a match attempt and this flag will
get set at that point. Yes, the user can disable autoprobe, but
that'll block deferred probes too.
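
In case the ordering argument is easier to follow as code, here is a rough userspace model of it. It is a sketch under assumptions: the model_* names are invented for illustration, and matching is reduced to a string compare, which is not how bus match callbacks actually work.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified model of struct device. */
struct model_device {
	const char *name;
	bool can_match;		/* set once a driver has matched this device */
	bool on_deferred_list;	/* models membership in the deferred probe list */
};

/* Mirrors the early return the patch adds to driver_deferred_probe_add(). */
static void model_deferred_probe_add(struct model_device *dev)
{
	if (!dev->can_match)
		return;
	dev->on_deferred_list = true;
}

/*
 * Models a driver being registered and attempting a match: a successful
 * match sets can_match before any probe deferral is recorded, so a device
 * whose driver shows up later is not locked out of the deferred list.
 * Returns true if the driver matched the device.
 */
static bool model_driver_register(struct model_device *dev,
				  const char *drv_name, bool probe_defers)
{
	if (strcmp(dev->name, drv_name) != 0)
		return false;

	dev->can_match = true;
	if (probe_defers)
		model_deferred_probe_add(dev);
	return true;
}
```

In this model a device with no matching driver is never queued, and the first registration of a matching driver flips the flag, so later deferrals are queued as before.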

-Saravana

>
> >
> > Signed-off-by: Saravana Kannan 
> > ---
> > Geert,
> >
> > Can you give this a shot for your I2C DMA issue with fw_devlink=on?
> >
> > -Saravana
> >
> >  drivers/base/dd.c  | 6 ++
> >  include/linux/device.h | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9179825ff646..f18963f42e21 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, 
> > deferred_probe_work_func);
> >
> >  void driver_deferred_probe_add(struct device *dev)
> >  {
> > +   if (!dev->can_match)
> > +   return;
> > +
> > mutex_lock(&deferred_probe_mutex);
> > if (list_empty(&dev->p->deferred_probe)) {
> > dev_dbg(dev, "Added to deferred list\n");
> > @@ -726,6 +729,7 @@ static int driver_probe_device(struct device_driver 
> > *drv, struct device *dev)
> > if (!device_is_registered(dev))
> > return -ENODEV;
> >
> > +   dev->can_match = true;
> > pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> >  drv->bus->name, __func__, dev_name(dev), drv->name);
> >
> > @@ -829,6 +833,7 @@ static int __device_attach_driver(struct device_driver 
> > *drv, void *_data)
> > return 0;
> > } else if (ret == -EPROBE_DEFER) {
> > dev_dbg(dev, "Device match requests probe deferral\n");
> > +   dev->can_match = true;
> > driver_deferred_probe_add(dev);
> > } else if (ret < 0) {
> > dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> > @@ -1064,6 +1069,7 @@ static int __driver_attach(struct device *dev, void 
> > *data)
> > return 0;
> > } else if (ret == -EPROBE_DEFER) {
> > dev_dbg(dev, "Device match requests probe deferral\n");
> > +   dev->can_match = true;
> > driver_deferred_probe_add(dev);
> > } else if (ret < 0) {
> > dev_dbg(dev, "Bus failed to match device: %d\n", ret);
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 7619a84f8ce4..1f9cc1ba78bc 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -438,6 +438,9 @@ struct dev_links_info {
> >   * @state_synced: The hardware state of this device has been synced to 
> > match
> >   *   the software state of this device by calling the 
> > driver/bus
> >   *   sync_state() callback.
> > + * @can_match: The device has matched with a driver at least once or it is 
> > in
> > + * a bus (like AMBA) which can't check for matching drivers 
> > until
> > + * other devices probe successfully.
> >   * @dma_coherent: this particular device is dma coherent, even if the
> >   * architecture supports non-coherent devices.
> >   * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> > @@ -544,6 +547,7 @@ struct device {
> > 	bool			offline:1;
> > 	bool			of_node_reused:1;
> > 	bool			state_synced:1;
> > +	bool			can_match:1;
> >  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> >  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> >  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> > --


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-17 Thread Saravana Kannan
On Tue, Feb 16, 2021 at 12:31 PM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan  wrote:
> > On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven
> >  wrote:
> > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan  
> > > wrote:
> > > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven 
> > > >  wrote:
> > > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan 
> > > > >  wrote:
> > > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven 
> > > > > >  wrote:
> > > > > > >   - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > > > > /sys/kernel/debug/dmaengine/summary:
> > > > > > >
> > > > > > > -dma4chan0| e66d8000.i2c:tx
> > > > > > > -dma4chan1| e66d8000.i2c:rx
> > > > > > > -dma5chan0| e651.i2c:tx
> > > > > >
> > > > > > I think I need more context on the problem before I can try to fix 
> > > > > > it.
> > > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > > > > I2C was using DMA? If so, the next step is to see if the I2C 
> > > > > > relative
> > > > > > probe order with DMA is getting changed and if so, why.
> > > > >
> > > > > More detailed log:
> > > > >
> > > > > platform e66d8000.i2c: Linked as a consumer to 
> > > > > e615.clock-controller
> > > > > platform e66d8000.i2c: Linked as a sync state only consumer to 
> > > > > e6055400.gpio
> > > > >
> > > > > Why is e66d8000.i2c not linked as a consumer to 
> > > > > e670.dma-controller?
> > > >
> > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> > > > "optional"/"driver decides" dependency.
> > >
> > > Oh, I thought dma/iommu were considered mandatory initially,
> > > but dropped as dependencies in the late boot process?
> >
> > No, I didn't do that in case the drivers that didn't need the
> > IOMMU/DMA were sensitive to probe order.
> >
> > My goal was for fw_devlink=on to not affect probe order for devices
> > that currently don't need to defer probe. But see below...
> >
> > >
> > > >
> > > > > platform e670.dma-controller: Linked as a consumer to
> > > > > e615.clock-controller
> > > >
> > > > Is this the only supplier of dma-controller?
> > >
> > > No, e618.system-controller is also a supplier.
> > >
> > > > > platform e66d8000.i2c: Added to deferred list
> > > > > platform e670.dma-controller: Added to deferred list
> > > > >
> > > > > bus: 'platform': driver_probe_device: matched device
> > > > > e670.dma-controller with driver rcar-dmac
> > > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > > device e670.dma-controller
> > > > > platform e670.dma-controller: Driver rcar-dmac requests probe 
> > > > > deferral
> > > > >
> > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > > > > with driver i2c-rcar
> > > > > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > > > > e66d8000.i2c
> > > > >
> > > > > I2C becomes available...
> > > > >
> > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > > > > [...]
> > > > >
> > > > > but DMA is not available yet, so the driver falls back to PIO.
> > > > >
> > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > > > > bus: 'platform': really_probe: bound device e66d8000.i2c to 
> > > > > driver i2c-rcar
> > > > >
> > > > > platform e670.dma-controller: Retrying from deferred list
> > > > > bus: 'platform': driver_probe_device: matched device
> > > > > e670.dma-controller with driver rcar-dmac
> > > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > > device e670.dma-controller

[PATCH] driver core: Avoid pointless deferred probe attempts

2021-02-17 Thread Saravana Kannan
There's no point in adding a device to the deferred probe list if we
know for sure that it doesn't have a matching driver. So, check if a
device can match with a driver before adding it to the deferred probe
list.

Signed-off-by: Saravana Kannan 
---
Geert,

Can you give this a shot for your I2C DMA issue with fw_devlink=on?

-Saravana

 drivers/base/dd.c  | 6 ++
 include/linux/device.h | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9179825ff646..f18963f42e21 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -123,6 +123,9 @@ static DECLARE_WORK(deferred_probe_work, 
deferred_probe_work_func);
 
 void driver_deferred_probe_add(struct device *dev)
 {
+   if (!dev->can_match)
+   return;
+
mutex_lock(&deferred_probe_mutex);
if (list_empty(&dev->p->deferred_probe)) {
dev_dbg(dev, "Added to deferred list\n");
@@ -726,6 +729,7 @@ static int driver_probe_device(struct device_driver *drv, 
struct device *dev)
if (!device_is_registered(dev))
return -ENODEV;
 
+   dev->can_match = true;
pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 drv->bus->name, __func__, dev_name(dev), drv->name);
 
@@ -829,6 +833,7 @@ static int __device_attach_driver(struct device_driver 
*drv, void *_data)
return 0;
} else if (ret == -EPROBE_DEFER) {
dev_dbg(dev, "Device match requests probe deferral\n");
+   dev->can_match = true;
driver_deferred_probe_add(dev);
} else if (ret < 0) {
dev_dbg(dev, "Bus failed to match device: %d\n", ret);
@@ -1064,6 +1069,7 @@ static int __driver_attach(struct device *dev, void *data)
return 0;
} else if (ret == -EPROBE_DEFER) {
dev_dbg(dev, "Device match requests probe deferral\n");
+   dev->can_match = true;
driver_deferred_probe_add(dev);
} else if (ret < 0) {
dev_dbg(dev, "Bus failed to match device: %d\n", ret);
diff --git a/include/linux/device.h b/include/linux/device.h
index 7619a84f8ce4..1f9cc1ba78bc 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -438,6 +438,9 @@ struct dev_links_info {
  * @state_synced: The hardware state of this device has been synced to match
  *   the software state of this device by calling the driver/bus
  *   sync_state() callback.
+ * @can_match: The device has matched with a driver at least once or it is in
+ * a bus (like AMBA) which can't check for matching drivers until
+ * other devices probe successfully.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
  * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
@@ -544,6 +547,7 @@ struct device {
bool offline:1;
bool of_node_reused:1;
bool state_synced:1;
+   bool can_match:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
-- 
2.30.0.478.g8a0d178c01-goog
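
The gate this patch adds can be modeled in userspace. The following is a minimal sketch, assuming a made-up struct sketch_device in place of struct device and a boolean in place of real list membership; the sketch_* names are hypothetical and only mirror the shape of the change above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct device; not the kernel type. */
struct sketch_device {
	const char *name;
	bool can_match;		/* set once a driver has matched, or may match later */
	bool on_deferred_list;	/* models membership in the deferred probe list */
};

/* Models driver_deferred_probe_add() with the new early return. */
static bool sketch_deferred_probe_add(struct sketch_device *dev)
{
	assert(dev != NULL);

	if (!dev->can_match)
		return false;	/* no driver can bind, so queueing is pointless */

	dev->on_deferred_list = true;
	return true;
}

/* Models the match paths in the patch that set can_match before deferring. */
static void sketch_driver_matched(struct sketch_device *dev)
{
	assert(dev != NULL);
	dev->can_match = true;
}
```

In this model a device that has never matched a driver is refused by sketch_deferred_probe_add(), and the same device is accepted once sketch_driver_matched() has run.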



Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

2021-02-16 Thread Saravana Kannan
On Tue, Feb 16, 2021 at 7:05 PM Guenter Roeck  wrote:
>
> On Tue, Feb 16, 2021 at 06:39:55PM -0800, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck  wrote:
> > >
> > > On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > > > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck  
> > > > wrote:
> > > >>
> > > >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > > >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck  
> > > >>> wrote:
> > > >>>>
> > > >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > > > >>>>> Cyclic dependencies in some firmware were one of the last remaining
> > > >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > >>>>> dependencies don't block probing, set fw_devlink=on by default.
> > > >>>>>
> > > >>>>> Setting fw_devlink=on by default brings a bunch of benefits 
> > > >>>>> (currently,
> > > >>>>> only for systems with device tree firmware):
> > > >>>>> * Significantly cuts down deferred probes.
> > > >>>>> * Device probe is effectively attempted in graph order.
> > > >>>>> * Makes it much easier to load drivers as modules without having to
> > > >>>>>   worry about functional dependencies between modules (depmod is 
> > > >>>>> still
> > > >>>>>   needed for symbol dependencies).
> > > >>>>>
> > > >>>>> If this patch prevents some devices from probing, it's very likely 
> > > >>>>> due
> > > >>>>> to the system having one or more device drivers that "probe"/set up 
> > > >>>>> a
> > > >>>>> device (DT node with compatible property) without creating a struct
> > > >>>>> device for it.  If we hit such cases, the device drivers need to be
> > > >>>>> fixed so that they populate struct devices and probe them like 
> > > >>>>> normal
> > > >>>>> device drivers so that the driver core is aware of the devices and 
> > > >>>>> their
> > > >>>>> status. See [1] for an example of such a case.
> > > >>>>>
> > > >>>>> [1] - 
> > > >>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/
> > > >>>>> Signed-off-by: Saravana Kannan 
> > > >>>>
> > > >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck 
> > > >>>> when
> > > >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> > > >>>> is attached.
> > > >>>
> > > >>> Thanks for the report Guenter. Can you please try this series?
> > > >>> https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/
> > > >>>
> > > >>
> > > >> Not this week. I have lots of reviews to complete before the end of 
> > > >> the week,
> > > >> with the 5.12 commit window coming up.
> > > >
> > > > Ok. By next week, all the fixes should be in linux-next too. So it
> > > > should be easier if you choose to test.
> > > >
> > > >> Given the number of problems observed, I personally think that it is 
> > > >> way
> > > >> too early for this patch. We'll have no end of problems if it is 
> > > >> applied
> > > >> to the upstream kernel in the next commit window. Of course, that is 
> > > >> just
> > > >> my personal opinion.
> > > >
> > > > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > > Just to be sure I understand it right, you are not saying this patch
> > > > > caused them all, right? You are just saying that there are 115 general
> > > > > boot failures that might mask fw_devlink issues in some of them, right?
> > > >
> > >
> > > Correct.
> >
> > Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
> > [1] - 
> > https://lore.kernel.org/lkml/20210215224258.1231449-1-sarava...@google.com/
> >
>
> I honestly don't know. Current status of -next in my tests is:
>
> Build results:
> total: 149 pass: 144 fail: 5
> Qemu test results:
> total: 432 pass: 371 fail: 61
>
> This is for next-20210216. Newly introduced failures keep popping up. Some
> of the failures have been persistent for weeks, so it is all but impossible
> to say if affected platforms experience more than one failure.
>
> Also, please keep in mind that my boot tests are very shallow, along the
> line of "it boots, therefore it works". It only tests hardware which is
> emulated by qemu and is needed for booting. It tests probably much less
> than 1% of driver code. It cannot and should not be used for any useful
> fw_devlink related test coverage.

Agreed. I'm not using this for fw_devlink=on test coverage. Just
checking to make sure I've addressed any issues you've seen.

FYI, you can change it at runtime using the kernel commandline param
fw_devlink=permissive. So, you don't have to build all these kernels
again to test if fw_devlink=on is making things worse.

-Saravana


Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

2021-02-16 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck  wrote:
>
> On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck  wrote:
> >>
> >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck  wrote:
> >>>>
> >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> >>>>> Cyclic dependencies in some firmware were one of the last remaining
> >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>>>> dependencies don't block probing, set fw_devlink=on by default.
> >>>>>
> >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>>>> only for systems with device tree firmware):
> >>>>> * Significantly cuts down deferred probes.
> >>>>> * Device probe is effectively attempted in graph order.
> >>>>> * Makes it much easier to load drivers as modules without having to
> >>>>>   worry about functional dependencies between modules (depmod is still
> >>>>>   needed for symbol dependencies).
> >>>>>
> >>>>> If this patch prevents some devices from probing, it's very likely due
> >>>>> to the system having one or more device drivers that "probe"/set up a
> >>>>> device (DT node with compatible property) without creating a struct
> >>>>> device for it.  If we hit such cases, the device drivers need to be
> >>>>> fixed so that they populate struct devices and probe them like normal
> >>>>> device drivers so that the driver core is aware of the devices and their
> >>>>> status. See [1] for an example of such a case.
> >>>>>
> >>>>> [1] - 
> >>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/
> >>>>> Signed-off-by: Saravana Kannan 
> >>>>
> >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> >>>> is attached.
> >>>
> >>> Thanks for the report Guenter. Can you please try this series?
> >>> https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/
> >>>
> >>
> >> Not this week. I have lots of reviews to complete before the end of the 
> >> week,
> >> with the 5.12 commit window coming up.
> >
> > Ok. By next week, all the fixes should be in linux-next too. So it
> > should be easier if you choose to test.
> >
> >> Given the number of problems observed, I personally think that it is way
> >> too early for this patch. We'll have no end of problems if it is applied
> >> to the upstream kernel in the next commit window. Of course, that is just
> >> my personal opinion.
> >
> > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > Just to be sure I understand it right, you are not saying this patch
> > caused them all, right? You are just saying that there are 115 general
> > boot failures that might mask fw_devlink issues in some of them, right?
> >
>
> Correct.

Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
[1] - 
https://lore.kernel.org/lkml/20210215224258.1231449-1-sarava...@google.com/

-Saravana


Re: [PATCH] of: property: fw_devlink: Ignore interrupts property for some configs

2021-02-16 Thread Saravana Kannan
On Tue, Feb 16, 2021 at 12:20 PM Enrico Weigelt, metux IT consult
 wrote:
>
> On 15.02.21 23:42, Saravana Kannan wrote:
>
> Hi,
>
> > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > index 79b68519fe30..5036a362f52e 100644
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1300,6 +1300,9 @@ static struct device_node *parse_interrupts(struct 
> > device_node *np,
> >   {
> >   struct of_phandle_args sup_args;
> >
> > + if (!IS_ENABLED(CONFIG_OF_IRQ) || IS_ENABLED(CONFIG_PPC))
> > + return NULL;
> > +
> >   if (strcmp(prop_name, "interrupts") &&
> >   strcmp(prop_name, "interrupts-extended"))
> >   return NULL;
>
> wouldn't it be better to #ifdef-out the whole code in this case ?

No, #ifdef is not preferred. That's why we even have the IS_ENABLED()
macros in the first place. The compiler will optimize out the code.

-Saravana
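
To illustrate the point about IS_ENABLED() versus #ifdef: the condition becomes a compile-time constant, so the guarded code is still parsed and type-checked in every configuration while the compiler remains free to discard the dead branch. Below is a minimal userspace sketch, assuming simplified SKETCH_* macros modeled on the kernel's kconfig.h trick (the real IS_ENABLED() also handles =m options) and made-up CONFIG_SKETCH_* symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified imitation of the IS_ENABLED() trick, using SKETCH_* names so
 * nothing clashes with real headers. An enabled option is defined to 1;
 * a disabled option is not defined at all.
 */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED3(arg1_or_junk) SKETCH_SECOND(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_DEFINED2(val) SKETCH_IS_DEFINED3(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) SKETCH_IS_DEFINED2(option)

#define CONFIG_SKETCH_OF_IRQ 1	/* pretend CONFIG_OF_IRQ=y */
/* CONFIG_SKETCH_PPC is intentionally left undefined (pretend CONFIG_PPC=n). */

/* Returns the property name when it would be parsed, NULL when skipped. */
static const char *sketch_parse_interrupts(const char *prop_name)
{
	assert(prop_name != NULL);

	/* Constant condition: the compiler can drop whichever side is dead. */
	if (!SKETCH_IS_ENABLED(CONFIG_SKETCH_OF_IRQ) ||
	    SKETCH_IS_ENABLED(CONFIG_SKETCH_PPC))
		return NULL;

	if (strcmp(prop_name, "interrupts") &&
	    strcmp(prop_name, "interrupts-extended"))
		return NULL;

	return prop_name;	/* stand-in for the real supplier node lookup */
}
```

With an #ifdef around the body, a configuration that disables the option would never compile the guarded lines, so a build break there could go unnoticed.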


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-16 Thread Saravana Kannan
On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan  wrote:
> > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven  
> > wrote:
> > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan  
> > > wrote:
> > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven 
> > > >  wrote:
> > > > >   - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > > /sys/kernel/debug/dmaengine/summary:
> > > > >
> > > > > -dma4chan0| e66d8000.i2c:tx
> > > > > -dma4chan1| e66d8000.i2c:rx
> > > > > -dma5chan0| e651.i2c:tx
> > > >
> > > > I think I need more context on the problem before I can try to fix it.
> > > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > > I2C was using DMA? If so, the next step is to see if the I2C relative
> > > > probe order with DMA is getting changed and if so, why.
> > >
> > > More detailed log:
> > >
> > > platform e66d8000.i2c: Linked as a consumer to 
> > > e615.clock-controller
> > > platform e66d8000.i2c: Linked as a sync state only consumer to 
> > > e6055400.gpio
> > >
> > > Why is e66d8000.i2c not linked as a consumer to e670.dma-controller?
> >
> > Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> > "optional"/"driver decides" dependency.
>
> Oh, I thought dma/iommu were considered mandatory initially,
> but dropped as dependencies in the late boot process?

No, I didn't do that in case the drivers that didn't need the
IOMMU/DMA were sensitive to probe order.

My goal was for fw_devlink=on to not affect probe order for devices
that currently don't need to defer probe. But see below...

>
> >
> > > platform e670.dma-controller: Linked as a consumer to
> > > e615.clock-controller
> >
> > Is this the only supplier of dma-controller?
>
> No, e618.system-controller is also a supplier.
>
> > > platform e66d8000.i2c: Added to deferred list
> > > platform e670.dma-controller: Added to deferred list
> > >
> > > bus: 'platform': driver_probe_device: matched device
> > > e670.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e670.dma-controller
> > > platform e670.dma-controller: Driver rcar-dmac requests probe 
> > > deferral
> > >
> > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > > with driver i2c-rcar
> > > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > > e66d8000.i2c
> > >
> > > I2C becomes available...
> > >
> > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > > [...]
> > >
> > > but DMA is not available yet, so the driver falls back to PIO.
> > >
> > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver 
> > > i2c-rcar
> > >
> > > platform e670.dma-controller: Retrying from deferred list
> > > bus: 'platform': driver_probe_device: matched device
> > > e670.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e670.dma-controller
> > > platform e670.dma-controller: Driver rcar-dmac requests probe 
> > > deferral
> > > platform e670.dma-controller: Added to deferred list
> > > platform e670.dma-controller: Retrying from deferred list
> > > bus: 'platform': driver_probe_device: matched device
> > > e670.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e670.dma-controller
> > > driver: 'rcar-dmac': driver_bound: bound to device 
> > > 'e670.dma-controller'
> > > bus: 'platform': really_probe: bound device
> > > e670.dma-controller to driver rcar-dmac
> > >
> > > DMA becomes available.
> > >
> > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> > > that the I2C controllers do not have DMA channels allocated, as the
> > > kernel has performed 

Re: [PATCH v2] soc: renesas: rmobile-sysc: Mark fwnode when PM domain is added

2021-02-16 Thread Saravana Kannan
On Tue, Feb 16, 2021 at 4:40 AM Geert Uytterhoeven
 wrote:
>
> Currently, there are two drivers binding to the R-Mobile System
> Controller (SYSC):
>   - The rmobile-sysc driver registers PM domains from a core_initcall(),
> and does not use a platform driver,
>   - The optional rmobile-reset driver registers a reset handler, and
> does use a platform driver.
>
> As fw_devlink only considers devices, commit bab2d712eeaf9d60 ("PM:
> domains: Mark fwnodes when their powerdomain is added/removed") works
> only for PM Domain drivers where the DT node is a real device node, and
> not for PM Domain drivers using a hierarchical representation inside a
> subnode.  Hence if fw_devlink is enabled, probing of on-chip devices
> that are part of the SYSC PM domain is deferred until the optional
> rmobile-reset driver has been bound.   If the rmobile-reset driver is
> not available, this will never happen, and thus lead to complete system
> boot failures.
>
> Fix this by explicitly marking the fwnode initialized.
>
> Suggested-by: Saravana Kannan 
> Signed-off-by: Geert Uytterhoeven 
> ---
> This is v2 of "soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> reset handling".
> To be queued in renesas-devel as a fix for v5.12 if v5.12-rc1 will have
> fw_devlink enabled.
>
> v2:
>   - Call fwnode_dev_initialized() instead of setting OF_POPULATED,
>   - Drop reset handling move, as fwnode_dev_initialized() does not
> prevent the rmobile-reset driver from binding against the same
> device.
> ---
>  drivers/soc/renesas/rmobile-sysc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/soc/renesas/rmobile-sysc.c 
> b/drivers/soc/renesas/rmobile-sysc.c
> index bf64d052f9245db5..204e6135180b919c 100644
> --- a/drivers/soc/renesas/rmobile-sysc.c
> +++ b/drivers/soc/renesas/rmobile-sysc.c
> @@ -342,6 +342,8 @@ static int __init rmobile_init_pm_domains(void)
> of_node_put(np);
> break;
> }
> +
> +   fwnode_dev_initialized(&np->fwnode, true);
> }
>
> put_special_pds();

Acked-by: Saravana Kannan 

Keep in mind that this might have to land in driver-core-next since
that API is currently only in driver-core-next.

-Saravana
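
The effect described in the commit message can be modeled roughly as follows. This is a minimal sketch under stated assumptions: struct sketch_fwnode and the sketch_* helpers are hypothetical, and the wait rule is a simplification of what fw_devlink does, not its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a firmware node; not the kernel's struct. */
struct sketch_fwnode {
	const char *name;
	bool has_bound_device;	/* a struct device exists and a driver bound to it */
	bool initialized;	/* the node was set up by code without a driver */
};

/* Models the effect of fwnode_dev_initialized(fwnode, true). */
static void sketch_fwnode_dev_initialized(struct sketch_fwnode *fwnode, bool done)
{
	assert(fwnode != NULL);
	fwnode->initialized = done;
}

/* Returns true if a consumer of this supplier must keep deferring its probe. */
static bool sketch_consumer_must_wait(const struct sketch_fwnode *supplier)
{
	assert(supplier != NULL);
	return !supplier->has_bound_device && !supplier->initialized;
}
```

In this model, a system controller node that was initialized from an initcall and never gets a bound driver stops blocking its consumers once it has been marked.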


Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on

2021-02-16 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 12:59 PM Saravana Kannan  wrote:
>
> On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
>  wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan  
> > wrote:
> > > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki  
> > > wrote:
> > > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > >  wrote:
> > > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > > >
> > > > > With fw_devlink=on, devices are added to the deferred probe pending 
> > > > > list
> > > > > if they are determined to be a consumer,
> > >
> > > If they are determined to be a consumer or if they are determined to
> > > have a supplier that hasn't probed yet?
> >
> > When the supplier has probed:
> >
> > bus: 'platform': driver_probe_device: matched device
> > e615.clock-controller with driver renesas-cpg-mssr
> > bus: 'platform': really_probe: probing driver renesas-cpg-mssr
> > with device e615.clock-controller
> > PM: Added domain provider from /soc/clock-controller@e615
> > driver: 'renesas-cpg-mssr': driver_bound: bound to device
> > 'e615.clock-controller'
> > platform e6055800.gpio: Added to deferred list
> > [...]
> > platform e602.watchdog: Added to deferred list
> > [...]
> > platform fe00.pcie: Added to deferred list
> >
> > > > > which happens before their
> > > > > driver's .probe() method is called.  If the actual probe fails later
> > > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > > deferred probe pending list, and it will be probed again when deferred
> > > > > probing kicks in, which is futile.
> > > > >
> > > > > Fix this by explicitly removing the device from the deferred probe
> > > > > pending list in case of probe failures.
> > > > >
> > > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > > Signed-off-by: Geert Uytterhoeven 
> > > >
> > > > Good catch:
> > > >
> > > > Reviewed-by: Rafael J. Wysocki 
> > >
> > > The issue is real and needs to be fixed. But I'm confused how this can
> > > happen. We won't even enter really_probe() if the driver isn't ready.
> > > We also won't get to run the driver's .probe() if the suppliers aren't
> > > ready. So how does the device get added to the deferred probe list
> > > before the driver is ready? Is this due to device_links_driver_bound()
> > > on the supplier?
> > >
> > > Can you give a more detailed step by step on the case you are hitting?
> >
> > The device is added to the list due to device_links_driver_bound()
> > calling driver_deferred_probe_add() on all consumer devices.
>
> Thanks for the explanation. Maybe add more details like this to the
> commit text or in the code?
>
> For the code:
> Reviewed-by: Saravana Kannan

Ugh... I just realized that I might have to give this a Nak because of
bad locking in deferred_probe_work_func(). The unlock/lock inside the
loop is a terrible hack. If we add this patch, we can end up modifying
a linked list while it's being traversed and cause a crash or busy
loop (you'll accidentally end up on an "empty list"). I ran into a
similar issue during one of my unrelated refactors.

-Saravana
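
The "empty list" hazard mentioned above can be shown in isolation. The following is a generic sketch, assuming a hand-rolled circular list modeled on list_head; it makes no claim about what deferred_probe_work_func() actually does. It only shows why a walker that still holds a pointer to a node removed with list_del_init()-style semantics never reaches the list head again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list *node, struct sketch_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink the node and make it point at itself, like list_del_init(). */
static void sketch_list_del_init(struct sketch_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	sketch_list_init(node);
}

/*
 * Walk forward from a saved cursor until the list head is reached.
 * Returns false if the head is not reached within max_steps, which is how
 * this sketch reports a walk that would otherwise spin forever.
 */
static bool sketch_walk_reaches_head(const struct sketch_list *cursor,
				     const struct sketch_list *head,
				     unsigned int max_steps)
{
	while (max_steps--) {
		if (cursor == head)
			return true;
		cursor = cursor->next;
	}
	return false;
}
```

After sketch_list_del_init() the removed node is its own one-element list, so a cursor left on it keeps following next back to itself.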


[PATCH] of: property: fw_devlink: Ignore interrupts property for some configs

2021-02-15 Thread Saravana Kannan
When CONFIG_OF_IRQ is not defined, it doesn't make sense to parse
interrupts property.

Also, parsing and tracking interrupts property breaks some PPC
devices[1].  But none of the IRQ drivers in PPC seem ready to be
converted to a proper platform (or any bus) driver. So, there's not much
of a point in tracking the interrupts property for CONFIG_PPC. So, let's
stop parsing interrupts for CONFIG_PPC.

[1] - https://lore.kernel.org/lkml/20210213185422.ga195...@roeck-us.net/
Fixes: 4104ca776ba3 ("of: property: Add fw_devlink support for interrupts")
Reported-by: Guenter Roeck 
Signed-off-by: Saravana Kannan 
---
Greg/Rob,

I believe this needs to land on driver-core-next.

-Saravana

 drivers/of/property.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 79b68519fe30..5036a362f52e 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1300,6 +1300,9 @@ static struct device_node *parse_interrupts(struct 
device_node *np,
 {
struct of_phandle_args sup_args;
 
+   if (!IS_ENABLED(CONFIG_OF_IRQ) || IS_ENABLED(CONFIG_PPC))
+   return NULL;
+
if (strcmp(prop_name, "interrupts") &&
strcmp(prop_name, "interrupts-extended"))
return NULL;
-- 
2.30.0.478.g8a0d178c01-goog



Re: [PATCH v2 2/2] of: property: Add fw_devlink support for interrupts

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 1:09 AM Marc Zyngier  wrote:
>
> Hi Saravana,
>
> On Mon, 15 Feb 2021 08:29:53 +,
> Saravana Kannan  wrote:
> >
> > On Sun, Feb 14, 2021 at 7:58 PM Guenter Roeck  wrote:
> > >
> > > On 2/14/21 1:12 PM, Saravana Kannan wrote:
> > > [ ... ]
> > > >
> > > > Can you please give me the following details:
> > > > * The DTS file for the board (not the SoC).
> > >
> > > The devicetree file extracted from the running system is attached.
> > > Hope it helps.
> >
> > Hi Guenter,
> >
> > Thanks for the DTS file and logs. That helps a lot.
> >
> > Looking at the attachment and this line from the earlier email:
> > [   14.084606][   T11] pci 0005:01:00.0: probe deferral - wait for
> > supplier interrupt-controller@0
> >
> > It's clear the PCI node is waiting on:
> > interrupt-controller@0 {
> > #address-cells = <0x00>;
> > device_type = "PowerPC-Interrupt-Source-Controller";
> > compatible = "ibm,opal-xive-vc\0IBM,opal-xics";
> > #interrupt-cells = <0x02>;
> > reg = <0x00 0x00 0x00 0x00>;
> > phandle = <0x804b>;
> > interrupt-controller;
> > };
> >
> > If I grep for "ibm,opal-xive-vc", I see only one instance of it in the
> > code. And that eventually ends up getting called like this:
> > irq_find_matching_fwspec() -> xive_irq_domain_match() -> xive_native_match()
> >
> > static bool xive_native_match(struct device_node *node)
> > {
> > return of_device_is_compatible(node, "ibm,opal-xive-vc");
> > }
> >
> > However, when the IRQ domain is first registered, in xive_init_host()
> > the "np" passed in is NOT the same node that xive_native_match() would
> > match.
> > static void __init xive_init_host(struct device_node *np)
> > {
> > xive_irq_domain = irq_domain_add_nomap(np, XIVE_MAX_IRQ,
> > &xive_irq_domain_ops, NULL);
> > if (WARN_ON(xive_irq_domain == NULL))
> > return;
> > irq_set_default_host(xive_irq_domain);
> > }
> >
> > Instead, the "np" here is:
> > interrupt-controller@603020318 {
> > ibm,xive-provision-page-size = <0x1>;
> > ibm,xive-eq-sizes = <0x0c 0x10 0x15 0x18>;
> > single-escalation-support;
> > ibm,xive-provision-chips = <0x00>;
> > ibm,xive-#priorities = <0x08>;
> > compatible = "ibm,opal-xive-pe\0ibm,opal-intc";
> > reg = <0x60302 0x318 0x00 0x1 0x60302
> > 0x319 0x00 0x1 0x60302 0x31a 0x00 0x1 0x60302
> > 0x31b 0x00 0x1>;
> > phandle = <0x8051>;
> > };
> >
> > There are many ways to fix this, but I first want to make sure this is
> > a valid way to register irqdomains before trying to fix it. I just
> > find it weird that the node that's registered is unrelated (not a
> > parent/child) of the node that matches.
> >
> > Marc,
> >
> > Is this a valid way to register irqdomains? Just registering
> > interrupt-controller@603020318 DT node where there are multiple
> > interrupt controllers?
>
> Absolutely.
>
> The node is only one of the many possible ways to retrieve a
> domain. In general, what you pass as the of_node/fwnode_handle can be
> anything you want. It doesn't have to represent anything in the system
> (we even create them ex-nihilo in some cases), and the match/select
> callbacks are authoritative when they exist.
>
> There is also the use of a default domain, which is used as a fallback
> when no domain is found via the normal matching procedure.
>
> PPC has established a way of dealing with domains long before ARM did,
> closer to the board files of old than what we would do today (code
> driven rather than data structure driven).
>
> Strictly mapping domains onto HW blocks is a desirable property, but
> that is all it is. That doesn't affect the very purpose of the IRQ
> domains, which is to translate numbers from one context into another.
>
> I'd be all for rationalising this, but it is pretty hard to introduce
> semantic where there is none.

Ok, I'm going to disable parsing "interrupts" for PPC. It doesn't look
like any of the irq drivers are even remotely ready to be converted to
a proper device driver anyway.

And if this continues for other properties, I'll just disable
fw_devlink for PPC entirely.

-Saravana
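
The lookup behaviour Marc describes can be sketched in userspace. This is a simplified model, assuming hypothetical sketch_* types and a single compatible string per node (the real property is a string list); it is not the irqdomain API. It shows a match callback deciding regardless of which node the domain was registered with, plus a default-domain fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the kernel's irqdomain types. */
struct sketch_node {
	const char *compatible;
};

struct sketch_domain {
	const struct sketch_node *registered_node;	/* node given at registration */
	bool (*match)(const struct sketch_node *node);	/* optional callback */
};

/* Mirrors the quoted xive_native_match(): keyed on a compatible string. */
static bool sketch_xive_match(const struct sketch_node *node)
{
	return strcmp(node->compatible, "ibm,opal-xive-vc") == 0;
}

/*
 * Look up the domain for a node. A match callback, when present, decides;
 * otherwise the registered node is compared. The default domain is the
 * fallback when nothing matches.
 */
static const struct sketch_domain *
sketch_find_domain(const struct sketch_domain *domains, size_t count,
		   const struct sketch_node *node,
		   const struct sketch_domain *default_domain)
{
	assert(node != NULL);

	for (size_t i = 0; i < count; i++) {
		const struct sketch_domain *d = &domains[i];
		bool hit = d->match ? d->match(node) : d->registered_node == node;

		if (hit)
			return d;
	}
	return default_domain;
}
```

In this model the domain registered against the "ibm,opal-xive-pe" node is found for the "ibm,opal-xive-vc" node and not for the node it was registered with, which is the mismatch that a node-based dependency tracker trips over.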


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-15 Thread Saravana Kannan
Hi Geert,

On Mon, Feb 15, 2021 at 7:16 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan  wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven  
> > wrote:
> > >   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > >   - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > > node OF_POPULATED after init") is no longer needed (but already
> > > queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > it anymore, so maybe work it out with him? It's a balance between some
> > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> > >   2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
> > >
> > >   - "[PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> > > reset handling" is no longer needed
> > > 
> > > https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+rene...@glider.be/
> >
> > Good to see more evidence that this series is fixing things at a more
> > generic level.
>
> I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n,
> booting fails again, as everything is waiting on the system controller,
> which never becomes available.
> Rcar-sysc doesn't suffer from this problem, cfr. above.
> Perhaps because the rmobile-sysc bindings use a hierarchical instead
> of a linear PM domain description, and thus consumers point to the
> children of the system controller node?
> Cfr. system-controller@e618 in arch/arm/boot/dts/r8a7740.dtsi.

Ok, I see what's going on. The problem is that the "power domain"
fwnode being registered is not the node that contains the "compatible"
property and becomes a device. So this patch[1] is not helping here.
Fix is to do something like this (to avoid using OF_POPULATED flag and
breaking reset):

diff --git a/drivers/soc/renesas/rmobile-sysc.c
b/drivers/soc/renesas/rmobile-sysc.c
index 9046b8c933cb..b7e66139ef7d 100644
--- a/drivers/soc/renesas/rmobile-sysc.c
+++ b/drivers/soc/renesas/rmobile-sysc.c
@@ -344,6 +344,7 @@ static int __init rmobile_init_pm_domains(void)
of_node_put(np);
break;
}
+   fwnode_dev_initialized(&np->fwnode, true);
}

put_special_pds();

Can you give it a shot?

[1] - 
https://lore.kernel.org/lkml/20210205222644.2357303-8-sarava...@google.com/

> > >   - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > > BUG: spinlock bad magic on CPU#0, swapper/1
> > >  lock: lcdc0_device+0x10c/0x308, .magic: , .owner:
> > > /-1, .owner_cpu: 0
> > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
> > > Hardware name: Generic R8A7740 (Flattened Device Tree)
> > > [] (unwind_backtrace) from []
> > > (show_stack+0x10/0x14)
> > > [] (show_stack) from []
> > > (do_raw_spin_lock+0x20/0x94)
> > > [] (do_raw_spin_lock) from []
> > > (dev_pm_get_subsys_data+0x30/0xa0)
> > > [] (dev_pm_get_subsys_data) from []
> > > (genpd_add_device+0x34/0x1c0)
> > > [] (genpd_add_device) from []
> > > (of_genpd_add_device+0x34/0x4c)
> > > [] (of_genpd_add_device) from []
> > > (board_staging_register_device+0xf8/0x118)
> > > [] (board_staging_register_device) from
>
> This is indeed a pre-existing problem.
> of_genpd_add_device() is called before platform_device_register(),
> as it needs to attach the genpd before the device is probed.
> But the spinlock is only initialized when the device is registered.
> This was masked before due to an unrelated wait context check failure,
> which disabled any further spinlock checks, and exposed by fw_devlinks
> changing probe order.
> Patch sent.
> "[PATCH] staging: board: Fix uninitialized spinlock when attaching genpd"
> https://lore.kernel.org/r/20210215151405.2551143-1-geert+rene...@glider.be
>

Great!

Thanks,
Saravana


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan  wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven  
> > wrote:
> > >   - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > /sys/kernel/debug/dmaengine/summary:
> > >
> > > -dma4chan0| e66d8000.i2c:tx
> > > -dma4chan1| e66d8000.i2c:rx
> > > -dma5chan0| e651.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it.
> > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > I2C was using DMA? If so, the next step is to see if the I2C relative
> > probe order with DMA is getting changed and if so, why.
>
> More detailed log:
>
> platform e66d8000.i2c: Linked as a consumer to e615.clock-controller
> platform e66d8000.i2c: Linked as a sync state only consumer to 
> e6055400.gpio
>
> Why is e66d8000.i2c not linked as a consumer to e670.dma-controller?

Because fw_devlink.strict=1 is not set and dma/iommu is considered an
"optional"/"driver decides" dependency.

> platform e670.dma-controller: Linked as a consumer to
> e615.clock-controller

Is this the only supplier of dma-controller?

> platform e66d8000.i2c: Added to deferred list
> platform e670.dma-controller: Added to deferred list
>
> bus: 'platform': driver_probe_device: matched device
> e670.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e670.dma-controller
> platform e670.dma-controller: Driver rcar-dmac requests probe deferral
>
> bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> with driver i2c-rcar
> bus: 'platform': really_probe: probing driver i2c-rcar with device
> e66d8000.i2c
>
> I2C becomes available...
>
> i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> [...]
>
> but DMA is not available yet, so the driver falls back to PIO.
>
> driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> bus: 'platform': really_probe: bound device e66d8000.i2c to driver 
> i2c-rcar
>
> platform e670.dma-controller: Retrying from deferred list
> bus: 'platform': driver_probe_device: matched device
> e670.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e670.dma-controller
> platform e670.dma-controller: Driver rcar-dmac requests probe deferral
> platform e670.dma-controller: Added to deferred list
> platform e670.dma-controller: Retrying from deferred list
> bus: 'platform': driver_probe_device: matched device
> e670.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e670.dma-controller
> driver: 'rcar-dmac': driver_bound: bound to device 
> 'e670.dma-controller'
> bus: 'platform': really_probe: bound device
> e670.dma-controller to driver rcar-dmac
>
> DMA becomes available.
>
> Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> that the I2C controllers do not have DMA channels allocated, as the
> kernel has performed no more I2C transfers after DMA became available.
>
> Using i2cdetect shows that DMA is used, which is good:
>
> i2c-rcar e66d8000.i2c: got DMA channel for rx
>
> With permissive devlinks, the clock controller consumers are not added
> to the deferred probing list, and probe order is slightly different.
> The I2C controllers are still probed before the DMA controllers.
> But DMA becomes available a bit earlier, before the probing of the last
> I2C slave driver.

This seems like a race? I'm guessing it's two different threads
probing those two devices? And it just happens to work for
"permissive" assuming the boot timing doesn't change?

> Hence /sys/kernel/debug/dmaengine/summary shows that
> some I2C transfers did use DMA.
>
> So the real issue is that e66d8000.i2c not linked as a consumer to
> e670.dma-controller.

That's because fw_devlink.strict=1 isn't set. If you need DMA to be
treated as a mandatory supplier, you'll need to set the flag.

Is fw_devlink=on really breaking anything here? It just seems like
"permissive" got lucky with the timing and it could break at any point
in the future. Thoughts?

-Saravana
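
For readers following the log above, the ordering can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: every toy_* name is hypothetical and nothing here is taken from the i2c-rcar or rcar-dmac sources. It only mirrors what the log shows: a channel request that fails with -517 while the DMA controller is unbound, a fallback to PIO, and a later transfer that does get a channel.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric value as the -517 seen in the log (EPROBE_DEFER). */
#define TOY_EPROBE_DEFER 517

enum toy_xfer_mode { TOY_PIO, TOY_DMA };

/* Hypothetical stand-in for the DMA controller device. */
struct toy_dmac {
	bool bound;		/* true once its driver has bound */
};

/* Hypothetical stand-in for the I2C controller, with DMA as an optional supplier. */
struct toy_i2c {
	struct toy_dmac *dmac;
	bool have_chan;		/* true once a DMA channel was obtained */
};

/* Returns 0 on success, -TOY_EPROBE_DEFER while the provider is not bound. */
int toy_dma_request(struct toy_i2c *i2c)
{
	if (!i2c->dmac || !i2c->dmac->bound)
		return -TOY_EPROBE_DEFER;
	i2c->have_chan = true;
	return 0;
}

/* Each transfer retries the channel request and falls back to PIO on failure. */
enum toy_xfer_mode toy_i2c_xfer(struct toy_i2c *i2c)
{
	if (!i2c->have_chan && toy_dma_request(i2c) != 0)
		return TOY_PIO;
	return TOY_DMA;
}
```

In this model a transfer issued before the controller binds reports TOY_PIO and the next one after binding reports TOY_DMA, so whether any transfer ever uses DMA during boot depends purely on timing unless the dependency is enforced.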


Re: [PATCH] staging: board: Fix uninitialized spinlock when attaching genpd

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 11:10 AM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 7:37 PM Saravana Kannan  wrote:
> > On Mon, Feb 15, 2021 at 7:14 AM Geert Uytterhoeven
> > > @@ -148,7 +149,11 @@ static int board_staging_add_dev_domain(struct 
> > > platform_device *pdev,
> > > pd_args.np = np;
> > > pd_args.args_count = 0;
> > >
> > > -   return of_genpd_add_device(_args, >dev);
> > > +   /* Cfr. device_pm_init_common() */
> >
> > What's Cfr?
>
> "compare to" (from Latin "confer").

Can you please change this to "refer to" or "similar to"? Also, not
sure if this comment is even adding anything useful even if you switch
the words.

Also, device_pm_init_common() is used in two places outside of
drivers/base/ with this change. Maybe better to move it to
linux/device.h?

-Saravana


Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan  wrote:
> > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki  wrote:
> > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > >  wrote:
> > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > >
> > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > if they are determined to be a consumer,
> >
> > If they are determined to be a consumer or if they are determined to
> > have a supplier that hasn't probed yet?
>
> When the supplier has probed:
>
> bus: 'platform': driver_probe_device: matched device
> e615.clock-controller with driver renesas-cpg-mssr
> bus: 'platform': really_probe: probing driver renesas-cpg-mssr
> with device e615.clock-controller
> PM: Added domain provider from /soc/clock-controller@e615
> driver: 'renesas-cpg-mssr': driver_bound: bound to device
> 'e615.clock-controller'
> platform e6055800.gpio: Added to deferred list
> [...]
> platform e602.watchdog: Added to deferred list
> [...]
> platform fe00.pcie: Added to deferred list
>
> > > > which happens before their
> > > > driver's .probe() method is called.  If the actual probe fails later
> > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > deferred probe pending list, and it will be probed again when deferred
> > > > probing kicks in, which is futile.
> > > >
> > > > Fix this by explicitly removing the device from the deferred probe
> > > > pending list in case of probe failures.
> > > >
> > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > Signed-off-by: Geert Uytterhoeven 
> > >
> > > Good catch:
> > >
> > > Reviewed-by: Rafael J. Wysocki 
> >
> > The issue is real and needs to be fixed. But I'm confused how this can
> > happen. We won't even enter really_probe() if the driver isn't ready.
> > We also won't get to run the driver's .probe() if the suppliers aren't
> > ready. So how does the device get added to the deferred probe list
> > before the driver is ready? Is this due to device_links_driver_bound()
> > on the supplier?
> >
> > Can you give a more detailed step by step on the case you are hitting?
>
> The device is added to the list due to device_links_driver_bound()
> calling driver_deferred_probe_add() on all consumer devices.

Thanks for the explanation. Maybe add more details like this to the
commit text or in the code?

For the code:
Reviewed-by: Saravana Kannan

-Saravana

>
> > > > +++ b/drivers/base/dd.c
> > > > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, 
> > > > struct device_driver *drv)
> > > > case -ENXIO:
> > > > pr_debug("%s: probe of %s rejects match %d\n",
> > > >  drv->name, dev_name(dev), ret);
> > > > +   driver_deferred_probe_del(dev);
> > > > break;
> > > > default:
> > > > /* driver matched but the probe failed */
> > > > pr_warn("%s: probe of %s failed with error %d\n",
> > > > drv->name, dev_name(dev), ret);
> > > > +   driver_deferred_probe_del(dev);
> > > > }
> > > > /*
> > > >  * Ignore errors returned by ->probe so that the next driver 
> > > > can try
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds
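
The double probe discussed in this message can be illustrated with a small user-space model in C. It is a sketch only: the toy_* functions are hypothetical stand-ins for device_links_driver_bound(), really_probe() and a deferred probe pass, and the remove_on_failure flag models the driver_deferred_probe_del() calls the patch adds. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */
#define TOY_ENODEV 19		/* an example of a "real" probe failure */

/* Hypothetical consumer device. */
struct toy_dev {
	bool on_deferred_list;	/* is it on the deferred probe pending list? */
	int probe_result;	/* what its driver's probe would return */
	int probe_calls;	/* how many times probe ran */
};

/* Models the supplier binding and queueing its consumer for a (re)probe. */
void toy_supplier_bound(struct toy_dev *consumer)
{
	consumer->on_deferred_list = true;
}

/* Models a probe attempt; remove_on_failure is the behaviour added by the patch. */
int toy_really_probe(struct toy_dev *dev, bool remove_on_failure)
{
	int ret = dev->probe_result;

	dev->probe_calls++;
	if (ret == 0)
		dev->on_deferred_list = false;	/* bound, nothing left to retry */
	else if (ret == -TOY_EPROBE_DEFER)
		dev->on_deferred_list = true;	/* retry later */
	else if (remove_on_failure)
		dev->on_deferred_list = false;	/* real failure, do not retry */
	return ret;
}

/* Models one deferred probe pass over this device. */
void toy_deferred_probe_pass(struct toy_dev *dev, bool remove_on_failure)
{
	if (!dev->on_deferred_list)
		return;
	dev->on_deferred_list = false;	/* taken off the list for the retry */
	toy_really_probe(dev, remove_on_failure);
}
```

With remove_on_failure false, a device queued by its supplier and then failing with -TOY_ENODEV is probed twice; with it true, the futile second probe does not happen.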


Re: [PATCH] staging: board: Fix uninitialized spinlock when attaching genpd

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 7:14 AM Geert Uytterhoeven
 wrote:
>
> On Armadillo-800-EVA with CONFIG_DEBUG_SPINLOCK=y:
>
> BUG: spinlock bad magic on CPU#0, swapper/1
>  lock: lcdc0_device+0x10c/0x308, .magic: , .owner: /-1, 
> .owner_cpu: 0
> CPU: 0 PID: 1 Comm: swapper Not tainted 
> 5.11.0-rc5-armadillo-00036-gbbca04be7a80-dirty #287
> Hardware name: Generic R8A7740 (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (do_raw_spin_lock+0x20/0x94)
> [] (do_raw_spin_lock) from [] 
> (dev_pm_get_subsys_data+0x8c/0x11c)
> [] (dev_pm_get_subsys_data) from [] 
> (genpd_add_device+0x78/0x2b8)
> [] (genpd_add_device) from [] 
> (of_genpd_add_device+0x34/0x4c)
> [] (of_genpd_add_device) from [] 
> (board_staging_register_device+0x11c/0x148)
> [] (board_staging_register_device) from [] 
> (board_staging_register_devices+0x24/0x28)
>
> of_genpd_add_device() is called before platform_device_register(), as it
> needs to attach the genpd before the device is probed.  But the spinlock
> is only initialized when the device is registered.
>
> Fix this by open-coding the spinlock initialization, cfr.
> device_pm_init_common() in the internal drivers/base code, and in the
> SuperH early platform code.
>
> Signed-off-by: Geert Uytterhoeven 
> ---
> Exposed by fw_devlinks changing probe order.
> Masked before due to an unrelated wait context check failure, which
> disabled any further spinlock checks.
> https://lore.kernel.org/linux-acpi/camuhmdvl-1rkj5u-hdva4f4w_+8ygvqqujqbczmsdv4yxzz...@mail.gmail.com
> ---
>  drivers/staging/board/board.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/board/board.c b/drivers/staging/board/board.c
> index cb6feb34dd401ae3..604612937f038e92 100644
> --- a/drivers/staging/board/board.c
> +++ b/drivers/staging/board/board.c
> @@ -136,6 +136,7 @@ int __init board_staging_register_clock(const struct 
> board_staging_clk *bsc)
>  static int board_staging_add_dev_domain(struct platform_device *pdev,
> const char *domain)
>  {
> +   struct device *dev = &pdev->dev;
> struct of_phandle_args pd_args;
> struct device_node *np;
>
> @@ -148,7 +149,11 @@ static int board_staging_add_dev_domain(struct 
> platform_device *pdev,
> pd_args.np = np;
> pd_args.args_count = 0;
>
> -   return of_genpd_add_device(&pd_args, &pdev->dev);
> +   /* Cfr. device_pm_init_common() */

What's Cfr?

> +   spin_lock_init(&dev->power.lock);
> +   dev->power.early_init = true;

Also, I tried looking it up, but it's not exactly clear what this flag
represents other than the fact that the spinlock has been initialized,
which is weird to me. So maybe Rafael can double check this?

-Saravana

> +
> +   return of_genpd_add_device(&pd_args, dev);
>  }
>  #else
>  static inline int board_staging_add_dev_domain(struct platform_device *pdev,
> --
> 2.25.1
>


Re: [PATCH] driver core: Fix double failed probing with fw_devlink=on

2021-02-15 Thread Saravana Kannan
On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki  wrote:
>
> On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
>  wrote:
> >
> > With fw_devlink=permissive, devices are added to the deferred probe
> > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> >
> > With fw_devlink=on, devices are added to the deferred probe pending list
> > if they are determined to be a consumer,

If they are determined to be a consumer or if they are determined to
have a supplier that hasn't probed yet?

> > which happens before their
> > driver's .probe() method is called.  If the actual probe fails later
> > (real failure, not -EPROBE_DEFER), the device will still be on the
> > deferred probe pending list, and it will be probed again when deferred
> > probing kicks in, which is futile.
> >
> > Fix this by explicitly removing the device from the deferred probe
> > pending list in case of probe failures.
> >
> > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > Signed-off-by: Geert Uytterhoeven 
>
> Good catch:
>
> Reviewed-by: Rafael J. Wysocki 

Geert,

The issue is real and needs to be fixed. But I'm confused how this can
happen. We won't even enter really_probe() if the driver isn't ready.
We also won't get to run the driver's .probe() if the suppliers aren't
ready. So how does the device get added to the deferred probe list
before the driver is ready? Is this due to device_links_driver_bound()
on the supplier?

Can you give a more detailed step by step on the case you are hitting?

Greg/Rafael,

Let's hold off picking this patch till I get to take a closer look
(within a day or two) please.

-Saravana

>
> > ---
> > Seen on various Renesas R-Car platforms, cfr.
> > https://lore.kernel.org/linux-acpi/camuhmdvl-1rkj5u-hdva4f4w_+8ygvqqujqbczmsdv4yxzz...@mail.gmail.com
> > ---
> >  drivers/base/dd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9179825ff646f4e3..91c4181093c43709 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct 
> > device_driver *drv)
> > case -ENXIO:
> > pr_debug("%s: probe of %s rejects match %d\n",
> >  drv->name, dev_name(dev), ret);
> > +   driver_deferred_probe_del(dev);
> > break;
> > default:
> > /* driver matched but the probe failed */
> > pr_warn("%s: probe of %s failed with error %d\n",
> > drv->name, dev_name(dev), ret);
> > +   driver_deferred_probe_del(dev);
> > }
> > /*
> >  * Ignore errors returned by ->probe so that the next driver can try
> > --
> > 2.25.1
> >


Re: [PATCH v2 2/2] of: property: Add fw_devlink support for interrupts

2021-02-15 Thread Saravana Kannan
On Sun, Feb 14, 2021 at 7:58 PM Guenter Roeck  wrote:
>
> On 2/14/21 1:12 PM, Saravana Kannan wrote:
> [ ... ]
> >
> > Can you please give me the following details:
> > * The DTS file for the board (not the SoC).
>
> The devicetree file extracted from the running system is attached.
> Hope it helps.

Hi Guenter,

Thanks for the DTS file and logs. That helps a lot.

Looking at the attachment and this line from the earlier email:
[   14.084606][   T11] pci 0005:01:00.0: probe deferral - wait for
supplier interrupt-controller@0

It's clear the PCI node is waiting on:
interrupt-controller@0 {
#address-cells = <0x00>;
device_type = "PowerPC-Interrupt-Source-Controller";
compatible = "ibm,opal-xive-vc\0IBM,opal-xics";
#interrupt-cells = <0x02>;
reg = <0x00 0x00 0x00 0x00>;
phandle = <0x804b>;
interrupt-controller;
};

If I grep for "ibm,opal-xive-vc", I see only one instance of it in the
code. And that eventually ends up getting called like this:
irq_find_matching_fwspec() -> xive_irq_domain_match() -> xive_native_match()

static bool xive_native_match(struct device_node *node)
{
return of_device_is_compatible(node, "ibm,opal-xive-vc");
}

However, when the IRQ domain is first registered, in xive_init_host()
the "np" passed in is NOT the same node that xive_native_match() would
match.
static void __init xive_init_host(struct device_node *np)
{
xive_irq_domain = irq_domain_add_nomap(np, XIVE_MAX_IRQ,
   &xive_irq_domain_ops, NULL);
if (WARN_ON(xive_irq_domain == NULL))
return;
irq_set_default_host(xive_irq_domain);
}

Instead, the "np" here is:
interrupt-controller@603020318 {
ibm,xive-provision-page-size = <0x1>;
ibm,xive-eq-sizes = <0x0c 0x10 0x15 0x18>;
single-escalation-support;
ibm,xive-provision-chips = <0x00>;
ibm,xive-#priorities = <0x08>;
compatible = "ibm,opal-xive-pe\0ibm,opal-intc";
reg = <0x60302 0x318 0x00 0x1 0x60302
0x319 0x00 0x1 0x60302 0x31a 0x00 0x1 0x60302
0x31b 0x00 0x1>;
phandle = <0x8051>;
};

There are many ways to fix this, but I first want to make sure this is
a valid way to register irqdomains before trying to fix it. I just
find it weird that the node that's registered is unrelated (not a
parent/child) of the node that matches.

Marc,

Is this a valid way to register irqdomains? Just registering
interrupt-controller@603020318 DT node where there are multiple
interrupt controllers?

Thanks,
Saravana


Re: [PATCH v4 8/8] clk: Mark fwnodes when their clock provider is added/removed

2021-02-14 Thread Saravana Kannan
On Fri, Feb 12, 2021 at 4:39 PM Stephen Boyd  wrote:
>
> Quoting Saravana Kannan (2021-02-05 14:26:44)
> > This allows fw_devlink to recognize clock provider drivers that don't
> > use the device-driver model to initialize the device. fw_devlink will
> > use this information to make sure consumers of such clock providers
> > aren't indefinitely blocked from probing, waiting for the power domain
> > device to appear and bind to a driver.
>
> The "power domain" part of this commit text doesn't make any sense. Is
> it copy/pasted from some other patch? Should probably say "waiting for
> the clk providing device"?

Yeah, copy-pasta.

>
> >
> > Signed-off-by: Saravana Kannan 
> > ---
>
> Acked-by: Stephen Boyd 

Thanks,
Saravana


Re: [PATCH v2 2/2] of: property: Add fw_devlink support for interrupts

2021-02-14 Thread Saravana Kannan
On Sat, Feb 13, 2021 at 10:54 AM Guenter Roeck  wrote:
>
> Hi,
>
> On Thu, Jan 21, 2021 at 02:57:12PM -0800, Saravana Kannan wrote:
> > This allows fw_devlink to create device links between consumers of an
> > interrupt and the supplier of the interrupt.
> >
> > Cc: Marc Zyngier 
> > Cc: Kevin Hilman 
> > Cc: Greg Kroah-Hartman 
> > Reviewed-by: Rob Herring 
> > Reviewed-by: Thierry Reding 
> > Reviewed-by: Linus Walleij 
> > Signed-off-by: Saravana Kannan 
>
> This patch causes all ppc64:powernv qemu emulations to fail.
> The problem is always the same: The root file system can not be mounted.
>
> Example:
>
> [   14.245672][T1] VFS: Cannot open root device "sda" or 
> unknown-block(0,0): error -6
> [   14.246063][T1] Please append a correct "root=" boot option; here are 
> the available partitions:
> [   14.246609][T1] 1f00  131072 mtdblock0
> [   14.246648][T1]  (driver?)
> [   14.247137][T1] Kernel panic - not syncing: VFS: Unable to mount root 
> fs on unknown-block(0,0)
> [   14.247631][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.11.0-rc7-next-20210212 #1
> [   14.248166][T1] Call Trace:
> [   14.248344][T1] [c2c07a70] [c08f052c] 
> dump_stack+0x100/0x174 (unreliable)
> [   14.248780][T1] [c2c07ab0] [c010d0e0] panic+0x190/0x450
> [   14.249097][T1] [c2c07b50] [c14d1af8] 
> mount_block_root+0x320/0x430
> [   14.249442][T1] [c2c07c50] [c14d1e64] 
> prepare_namespace+0x1b0/0x204
> [   14.249798][T1] [c2c07cc0] [c14d1544] 
> kernel_init_freeable+0x3dc/0x438
> [   14.250145][T1] [c2c07da0] [c0012b7c] 
> kernel_init+0x2c/0x170
> [   14.250466][T1] [c2c07e10] [c000d56c] 
> ret_from_kernel_thread+0x5c/0x70
> [   28.068945385,5] OPAL: Reboot request...
>
> Another:
>
> [   14.273398][T1] md: Autodetecting RAID arrays.
> [   14.273665][T1] md: autorun ...
> [   14.273860][T1] md: ... autorun DONE.
> [   14.275078][T1] Waiting for root device /dev/mmcblk0...
>
> [ waits until terminated ]
>
> Key difference seems to be that PCI devices are no longer instantiated
> with this patch applied. Specifically, I see
>
> [1.153780][T1] pci 0005:01 : [PE# fd] Setting up window#0 0..7fff pg=1
> [1.154475][T1] pci 0005:01 : [PE# fd] Enabling 64-bit DMA bypass
> [1.155749][T1] pci 0005:01:00.0: Adding to iommu group 0
> [1.160543][T1] pci 0005:00:00.0: enabling device (0105 -> 0107)
>
> in both cases, but (example: nvme) I don't see
>
> [   13.520561][   T11] nvme nvme0: pci function 0005:01:00.0
> [   13.521747][   T45] nvme 0005:01:00.0: enabling device (0100 -> 0102)
>
> after this patch has been applied.
>
> Reverting the patch plus its fix resolves the problem.
>
> Bisect log attached.

Hi Guenter,

Thanks for the report.

Can you please give me the following details:
* The DTS file for the board (not the SoC).
* A boot log with the logs enabled in device_links_check_suppliers()
and device_link_add()

That should help me debug this.

Rob,

Looks like Guenter has this patch[1] too. What PPC specific IRQ hack
am I missing? Any ideas?

[1] - 
https://lore.kernel.org/lkml/20210209010439.3529036-1-sarava...@google.com/

Thanks,
Saravana


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-12 Thread Saravana Kannan
On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan  wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven  
> > wrote:
> > >   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > >   - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > > node OF_POPULATED after init") is no longer needed (but already
> > > queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > it anymore, so maybe work it out with him? It's a balance between some
> > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> Rob: should it be reverted?  For v5.13?
> I guess other similar "fixes" went in in the mean time.
>
> > >   - Some devices are reprobed, despite their drivers returning
> > > a real error code, and not -EPROBE_DEFER:
> >
> > Sorry, it's not obvious from the logs below where "reprobing" is
> > happening. Can you give more pointers please?
>
> My log was indeed not a full log, but just the reprobes happening.
> I'll send you a full log by private email.
>
> > Also, thinking more about this, the only way I could see this happen is:
> > 1. Device fails with error that's not -EPROBE_DEFER
> > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
> > flag) where it's a consumer.
> > 3. The supplier probes and the device gets added to the deferred probe
> > list again.
> >
> > But I can't see how this sequence can happen. Device links are created
> > only when a device is added. And if the supplier isn't added yet, the
> > consumer wouldn't have probed in the first place.
>
> The full log doesn't show any evidence of the device being added
> to a list in between the two probes.
>
> > Other than "annoying waste of time" is this causing any other problems?
>
> Probably not.  But see below.
>
> > >   - The PCI reprobing leads to a memory leak, for which I've sent a 
> > > fix
> > > "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> > > 
> > > https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+rene...@glider.be/
> >
> > Wrt PCI reprobing,
> > 1. Is this PCI never expected to probe, but it's being reattempted
> > despite the NOT EPROBE_DEFER error? Or
>
> There is no PCIe card present, so the failure is expected.
> Later it is reprobed, which of course fails again.
>
> > 2. The PCI was deferred probe when it should have probed and then when
> > it's finally reattempted and it could succeed, we are hitting this mem
> > leak issue?
>
> I think the leak has always been there, but it was just exposed by
> this unneeded reprobe.  I don't think a reprobe after that specific
> error path had ever happened before.
>
> > I'm basically trying to distinguish between "this stuff should never
> > be retried" vs "this/its suppliers got probe deferred with
> > fw_devlink=on but didn't get probe deferred with
> > fw_devlink=permissive and that's causing issues"
>
> There should not be a probe deferral, as no -EPROBE_DEFER was
> returned.
>
> > >   - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > /sys/kernel/debug/dmaengine/summary:
> > >
> > > -dma4chan0| e66d8000.i2c:tx
> > > -dma4chan1| e66d8000.i2c:rx
> > > -dma5chan0| e651.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it.
> > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > I2C was using DMA? If so, the next step is to see if the I2C relative
> > probe order with DMA is getting changed and if so, why.
>
> Yes, I plan to dig deeper to see what really happens...

Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes
it and you also don't see this issue with fw_devlink=permissive, then
it means there's probably some unnecessary probe deferral that we
should try to avoid. At least, that's my hunch right now.

Thanks,
Saravana

>
> > >   - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > > BUG: spinlock bad magic on CPU#0, swapper/1
>
> >
> > Hmm... I looked at this in bits and pieces throughout the day. At
> > least spent an hour looking at this. This doesn't make a lot of sense
> > to me. I don't even touch anything in this code path AFAICT.  Are
> > modules/kernel mixed up somehow? I need more info before I can help.
> > Does reverting my pm domain change make any difference (assume it
> > boots this far without it).
>
> I plan to dig deeper to see what really happens...
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


Re: phy_attach_direct()'s use of device_bind_driver()

2021-02-11 Thread Saravana Kannan
On Thu, Feb 11, 2021 at 5:57 AM Andrew Lunn  wrote:
>
> > Yeah, I plan to fix this. So I have a few more questions. In the
> > example I gave, what should happen if the gpios listed in the phy's DT
> > node aren't ready yet?
>
> There are four different use cases for GPIO.
>
> 1) The GPIO is used to reset all devices on the MDIO bus. When the bus
> is registered with the core, the core will try to get this GPIO. If we
> get EPROBE_DEFER, the registration of the bus is deferred and tried
> again later. If the MAC driver tries to get the PHY device before the
> MDIO bus is enumerated, it should also get EPROBE_DEFER, and in the
> end everything should work.
>
> 2) The GPIO is for a specific PHY. Here we have an oddity in the
> code. If the PHY responds to bus enumeration, before we start doing
> anything with the reset GPIO, it will be discovered on the bus. At
> this point, we try to get the GPIO. If that fails with EPROBE_DEFER,
> all the PHYs on the bus are unregistered, and the bus registration
> process fails with EPROBE_DEFER.
>
> 3) The GPIO is for a specific PHY. However, the device does not
> respond to enumeration, because it is held in reset. You can get
> around this by placing the ID values into device tree. The bus is
> first enumerated in the normal way. And then devices which are listed
> in DT, but have not been found, and have ID registers are registered
> to the bus. This follows pretty much the same path as for a device
> which is discovered. Before the device is registered with the device
> core, we get the GPIOs, and handle the EPROBE_DEFER, unwinding
> everything.
>
> 4) The GPIO does not use the normal name in DT. Or the PHY has some
> other resource, which phylib does nothing with. The driver specific to
> the hardware has code to handle the resource. It should try to get
> those resources during probe. If probe returns EPROBE_DEFER, the probe
> will be retried later. And when the MAC driver tries to find the PHY,
> it should also get EPROBE_DEFER.
>
> In case 4, the fallback driver has no idea about these PHY devices
> specific properties. They are not part of 802.3 clause 22. So it will
> ignore them. Probably the PHY will not work, because it is missing a
> reset, or a clock, or a regulator. But we don't really care about
> that. In order that the DT was accepted into the kernel, there must be
> a device specific driver which uses those properties. So the kernel
> installation is broken, that hardware specific driver is missing.

Thanks! I don't know anything about mdio (other than the generic bus
stuff) or "MAC driver" (except for "MAC address"). So I had to read
this multiple times and I think I finally got it at a high level. So,
to summarize it and ignoring case 4, the phy device would never get
added to driver core before all its required resources are available
just because of how it's part of an ethernet controller/mdio bus. So
by the time we force bind a PHY to the generic driver, all the
required resources should already be set up and work with the generic
driver.

So the plan to fix this warning is, when device_bind_driver() is called:
1. Delete all device links from the device (in this case, the PHY) to
suppliers that haven't probed yet because there's no probe function
that can defer at this point.
2. Then call the usual device link status update code so that it
updates the status of the remaining device links correctly. This will
avoid the warning.

This seems like a generic solution that works for PHY and for any
device that is force bound.

Thanks for the help!

-Saravana
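
The two-step plan above can be sketched as a small user-space model in C. This is an illustration under assumptions, not the eventual driver core change: struct toy_link, the toy status values and toy_force_bind() are all hypothetical, and the status names are only loosely modelled on device link states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy link states: supplier unbound, supplier bound, both ends bound. */
enum toy_link_status { TOY_LINK_DORMANT, TOY_LINK_AVAILABLE, TOY_LINK_ACTIVE };

/* Hypothetical link from the force-bound consumer to one supplier. */
struct toy_link {
	bool supplier_bound;	/* has the supplier's driver bound? */
	bool present;		/* false once the link has been dropped */
	enum toy_link_status status;
};

/*
 * Models what would happen when a device is force bound:
 * step 1 drops links to suppliers that have not bound yet, since no
 * probe function exists that could defer on them;
 * step 2 moves the remaining links to the active state.
 * Returns the number of links dropped.
 */
size_t toy_force_bind(struct toy_link *links, size_t n)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!links[i].present)
			continue;
		if (!links[i].supplier_bound) {
			links[i].present = false;
			dropped++;
			continue;
		}
		links[i].status = TOY_LINK_ACTIVE;
	}
	return dropped;
}
```

In this model no surviving link jumps from a "supplier not bound" state straight to active, which is the kind of state jump the warning complains about.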


Re: [PATCH v1 0/5] Enable fw_devlink=on by default

2021-02-11 Thread Saravana Kannan
On Thu, Feb 11, 2021 at 9:48 AM Rafael J. Wysocki  wrote:
>
> On Thu, Feb 11, 2021 at 6:15 PM Saravana Kannan  wrote:
> >
> > On Thu, Feb 11, 2021 at 7:03 AM Rafael J. Wysocki  wrote:
> > >
> > > On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan  
> > > wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter  wrote:
> > > > >
> > > > >
> > > > > On 14/01/2021 16:56, Jon Hunter wrote:
> > > > > >
> > > > > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> Yes this is the warning shown here [0] and this is coming from
> > > > > >>> the 'Generic PHY stmmac-0:00' device.
> > > > > >>
> > > > > >> Can you print the supplier and consumer device when this warning is
> > > > > >> happening and let me know? That'd help too. I'm guessing the phy is
> > > > > >> the consumer.
> > > > > >
> > > > > >
> > > > > > Sorry I should have included that. I added a print to dump this on
> > > > > > another build but failed to include here.
> > > > > >
> > > > > > WARNING KERN Generic PHY stmmac-0:00: supplier 220.gpio (status 
> > > > > > 1)
> > > > > >
> > > > > > The status is the link->status and looks like the supplier is the
> > > > > > gpio controller. I have verified that the gpio controller is probed
> > > > > > before this successfully.
> > > > > >
> > > > > >> So the warning itself isn't a problem -- it's not breaking 
> > > > > >> anything or
> > > > > >> leaking memory or anything like that. But the device link is 
> > > > > >> jumping
> > > > > >> states in an incorrect manner. With enough context of this code 
> > > > > >> (why
> > > > > >> the device_bind_driver() is being called directly instead of going
> > > > > >> through the normal probe path), it should be easy to fix (I'll just
> > > > > >> need to fix up the device link state).
> > > > > >
> > > > > > Correct, the board seems to boot fine, we just get this warning.
> > > > >
> > > > >
> > > > > Have you had chance to look at this further?
> > > >
> > > > Hi Jon,
> > > >
> > > > I finally got around to looking into this. Here's the email[1] that
> > > > describes why it's done this way.
> > > >
> > > > [1] - https://lore.kernel.org/lkml/ycrjmpkjk0pxk...@lunn.ch/
> > > >
> > > > >
> > > > > The following does appear to avoid the warning, but I am not sure if
> > > > > this is the correct thing to do ...
> > > > >
> > > > > index 9179825ff646..095aba84f7c2 100644
> > > > > --- a/drivers/base/dd.c
> > > > > +++ b/drivers/base/dd.c
> > > > > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > > > >  {
> > > > > int ret;
> > > > >
> > > > > +   ret = device_links_check_suppliers(dev);
> > > > > +   if (ret)
> > > > > +   return ret;
> > > > > +
> > > > > ret = driver_sysfs_add(dev);
> > > > > if (!ret)
> > > > > driver_bound(dev);
> > > >
> > > > So digging deeper into the usage of device_bind_driver and looking at
> > > > [1], it doesn't look like returning an error here is a good option.
> > > > When device_bind_driver() is called, the driver's probe function isn't
> > > > even called. So, there's no way for the driver to even defer probing
> > > > based on any of the suppliers. So, we have a couple of options:
> > > >
> > > > 1. Delete all the links to suppliers that haven't bound.
> > >
> > > Or maybe convert them to stateless links?  Would that be doable at all?
> >
> > Yeah, I think it should be doable.
> >
> > >
> > > > > We'll still leave the links to active suppliers alone in case it helps with
> > > > > suspend/resume correctness.
> > > > 2. Fix the warn

Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-11 Thread Saravana Kannan
On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven  wrote:
>
> Hi Saravana,
>
> On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan  wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patches 1 and 2 address the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
> >
> > Thanks,
> > Saravana
> >
> > Saravana Kannan (8):
> >   driver core: fw_devlink: Detect supplier devices that will never be
> > added
> >   of: property: Don't add links to absent suppliers
> >   driver core: Add fw_devlink.strict kernel param
> >   of: property: Add fw_devlink support for optional properties
> >   driver core: fw_devlink: Handle suppliers that don't use driver core
> >   irqdomain: Mark fwnodes when their irqdomain is added/removed
> >   PM: domains: Mark fwnodes when their powerdomain is added/removed
> >   clk: Mark fwnodes when their clock provider is added/removed
>
> Thanks for your series, which is now part of driver-core-next.
> I gave driver-core-next + [1] a try on various Renesas boards.

Thanks!

> Test results are below.
> In general, the result looks much better than before.

Ah, good to hear this.

> [1] - 
> https://lore.kernel.org/lkml/20210210114435.122242-1-tudor.amba...@microchip.com/
>
>   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
>
>   - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> node OF_POPULATED after init") is no longer needed (but already
> queued for v5.12 anyway)

Rob doesn't like the proliferation of OF_POPULATED and we don't need
it anymore, so maybe work it out with him? It's a balance between some
wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

>   - Some devices are reprobed, despite their drivers returning
> a real error code, and not -EPROBE_DEFER:

Sorry, it's not obvious from the logs below where "reprobing" is
happening. Can you give more pointers please?

Also, thinking more about this, the only way I could see this happen is:
1. Device fails with error that's not -EPROBE_DEFER
2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
flag) where it's a consumer.
3. The supplier probes and the device gets added to the deferred probe
list again.

But I can't see how this sequence can happen. Device links are created
only when a device is added. And if the supplier isn't added yet, the
consumer wouldn't have probed in the first place.

Other than "annoying waste of time" is this causing any other problems?

> renesas_wdt e602.watchdog: Watchdog blacklisted on r8a7791 ES1.*
> (rwdt_probe() returns -ENODEV)
>
> sh-pfc e606.pinctrl: pin GP_7_23 already requested by ee09.pci; cannot claim for e659.usb
> sh-pfc e606.pinctrl: pin-247 (e659.usb) status -22
> sh-pfc e606.pinctrl: could not request pin 247 (GP_7_23) from group usb0  on device sh-pfc
> renesas_usbhs e659.usb: Error applying setting, reverse things back
> renesas_usbhs: probe of e659.usb failed with error -22
>
> rcar-pcie fe00.pcie: host bridge /soc/pcie@fe00 ranges:
> rcar-pcie fe00.pcie:   IO 0x00fe10..0x00fe1f -> 0x00
> rcar-pcie fe00.pcie:  MEM 0x00fe20..0x00fe3f -> 0x00fe20
> rcar-pcie fe00.pcie:  MEM 0x003000..0x0037ff -> 0x003000
> rcar-pcie fe00.pcie:  MEM 0x003800..0x003fff -> 0x003800
> rcar-pcie fe00.pcie:   IB MEM 0x004000..0x00bfff -> 0x004000
> rcar-pcie fe00.pcie:   IB MEM 0x02..0x02 -> 0x02
> rcar-pcie fe00.pcie: PCIe link down
> (rcar_pcie_probe() returns -ENODEV)
>
> xhci-hcd ee00.usb: 

Re: [PATCH v1 0/5] Enable fw_devlink=on by default

2021-02-11 Thread Saravana Kannan
On Thu, Feb 11, 2021 at 7:03 AM Rafael J. Wysocki  wrote:
>
> On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan  wrote:
> >
> > On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter  wrote:
> > >
> > >
> > > On 14/01/2021 16:56, Jon Hunter wrote:
> > > >
> > > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > > >
> > > > ...
> > > >
> > > >>> Yes this is the warning shown here [0] and this is coming from
> > > >>> the 'Generic PHY stmmac-0:00' device.
> > > >>
> > > >> Can you print the supplier and consumer device when this warning is
> > > >> happening and let me know? That'd help too. I'm guessing the phy is
> > > >> the consumer.
> > > >
> > > >
> > > > Sorry I should have included that. I added a print to dump this on
> > > > another build but failed to include here.
> > > >
> > > > WARNING KERN Generic PHY stmmac-0:00: supplier 220.gpio (status 1)
> > > >
> > > > The status is the link->status and looks like the supplier is the
> > > > gpio controller. I have verified that the gpio controller is probed
> > > > before this successfully.
> > > >
> > > >> So the warning itself isn't a problem -- it's not breaking anything or
> > > >> leaking memory or anything like that. But the device link is jumping
> > > >> states in an incorrect manner. With enough context of this code (why
> > > >> the device_bind_driver() is being called directly instead of going
> > > >> through the normal probe path), it should be easy to fix (I'll just
> > > >> need to fix up the device link state).
> > > >
> > > > Correct, the board seems to boot fine, we just get this warning.
> > >
> > >
> > > Have you had chance to look at this further?
> >
> > Hi Jon,
> >
> > I finally got around to looking into this. Here's the email[1] that
> > describes why it's done this way.
> >
> > [1] - https://lore.kernel.org/lkml/ycrjmpkjk0pxk...@lunn.ch/
> >
> > >
> > > The following does appear to avoid the warning, but I am not sure if
> > > this is the correct thing to do ...
> > >
> > > index 9179825ff646..095aba84f7c2 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > >  {
> > > int ret;
> > >
> > > +   ret = device_links_check_suppliers(dev);
> > > +   if (ret)
> > > +   return ret;
> > > +
> > > ret = driver_sysfs_add(dev);
> > > if (!ret)
> > > driver_bound(dev);
> >
> > So digging deeper into the usage of device_bind_driver and looking at
> > [1], it doesn't look like returning an error here is a good option.
> > When device_bind_driver() is called, the driver's probe function isn't
> > even called. So, there's no way for the driver to even defer probing
> > based on any of the suppliers. So, we have a couple of options:
> >
> > 1. Delete all the links to suppliers that haven't bound.
>
> Or maybe convert them to stateless links?  Would that be doable at all?

Yeah, I think it should be doable.

>
> > We'll still leave the links to active suppliers alone in case it helps with
> > suspend/resume correctness.
> > 2. Fix the warning to not warn on suppliers that haven't probed if the
> > device's driver has no probe function. But this will also need fixing
> > up the cleanup part when device_release_driver() is called. Also, I'm
> > not sure if device_bind_driver() is ever called when the driver
> > actually has a probe() function.
> >
> > Rafael,
> >
> > Option 1 above is pretty straightforward.
>
> I would prefer this ->

Ok

>
> > Option 2 would look something like what's at the end of this email +
> > caveat about whether the probe check is sufficient.
>
> -> because "fix the warning" really means that we haven't got the
> device link state machine right and getting it right may imply a major
> redesign.
>
> Overall, I'd prefer to take a step back and allow things to stabilize
> for a while to let people catch up with this.

Are you referring to if/when we implement Option 2? Or do you want to
step back for a while even before implementing Option 1?


-Saravana

>
> > Do you have a preference between Option 1 vs 2? Or do you have

Re: phy_attach_direct()'s use of device_bind_driver()

2021-02-11 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 11:31 PM Heiner Kallweit  wrote:
>
> On 11.02.2021 00:29, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn  wrote:
> >>
> >> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> >>> Hi,
> >>>
> >>> This email was triggered by this other email[1].
> >>>
> >>> Why is phy_attach_direct() directly calling device_bind_driver()
> >>> instead of using bus_probe_device()?
> >>
> >> Hi Saravana
> >>
> >> So this is to do with the generic PHY, which is a special case.
> >>
> >> First the normal case. The MDIO bus driver registers an MDIO bus using
> >> mdiobus_register(). This will enumerate the bus, finding PHYs on
> >> it. Each PHY device is registered with the device core, using the
> >> usual device_add(). The core will go through the registered PHY
> >> drivers and see if one can drive this hardware, based on the ID
> >> registers the PHY has at address 2 and 3. If a match is found, the
> >> driver probes the device, all in the usual way.
> >>
> >> Sometime later, the MAC driver wants to make use of the PHY
> >> device. This is often in the open() call of the MAC driver, when the
> >> interface is configured up. The MAC driver asks phylib to associate a
> >> PHY device to the MAC device. In the normal case, the PHY has been
> >> probed, and everything is good to go.
> >>
> >> However, sometimes, there is no driver for the PHY. There is no driver
> >> for that hardware. Or the driver has not been built, or it is not on
> >> the disk, etc. So the device core has not been able to probe
> >> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
> >> PHY should support. And most PHY devices have this minimum. So there
> >> is a fall back driver, the generic PHY driver. It assumes the minimum
> >> registers are available, and does its best to drive the hardware. It
> >> often works, but not always. So if the MAC asks phylib to connect to a
> >> PHY which does not have a driver, we forcefully bind the generic
> >> driver to the device, and hope for the best.
> >
> > Thanks for the detailed answer Andrew! I think it gives me enough
> > info/context to come up with a proper fix.
> >
> >> We don't actually recommend using the generic driver. Use the specific
> >> driver for the hardware. But the generic driver can at least get you
> >> going, allow you to scp the correct driver onto the system, etc.
> >
> > I'm not sure if I can control what driver they use. If I can fix this
> > warning, I'll probably try to do that.
> >
> The genphy driver is a last resort, at least they lose functionality like
> downshift detection and control. Therefore they should go with the
> dedicated Marvell PHY driver.
>
> But right, this avoids the warning, but the underlying issue (probably
> in device_bind_driver()) still exists. Would be good if you can fix it.

Yeah, I plan to fix this. So I have a few more questions. In the
example I gave, what should happen if the gpios listed in the phy's DT
node aren't ready yet? The generic phy driver itself probably isn't
using any GPIO? But will the phy work without the GPIO hardware being
initialized? The reason I'm asking this question is, if the phy is
linked to a supplier and the supplier is not ready, should the
device_bind_driver() succeed or not?

-Saravana


Re: [PATCH v1 0/5] Enable fw_devlink=on by default

2021-02-10 Thread Saravana Kannan
On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter  wrote:
>
>
> On 14/01/2021 16:56, Jon Hunter wrote:
> >
> > On 14/01/2021 16:47, Saravana Kannan wrote:
> >
> > ...
> >
> >>> Yes this is the warning shown here [0] and this is coming from
> >>> the 'Generic PHY stmmac-0:00' device.
> >>
> >> Can you print the supplier and consumer device when this warning is
> >> happening and let me know? That'd help too. I'm guessing the phy is
> >> the consumer.
> >
> >
> > Sorry I should have included that. I added a print to dump this on
> > another build but failed to include here.
> >
> > WARNING KERN Generic PHY stmmac-0:00: supplier 220.gpio (status 1)
> >
> > The status is the link->status and looks like the supplier is the
> > gpio controller. I have verified that the gpio controller is probed
> > before this successfully.
> >
> >> So the warning itself isn't a problem -- it's not breaking anything or
> >> leaking memory or anything like that. But the device link is jumping
> >> states in an incorrect manner. With enough context of this code (why
> >> the device_bind_driver() is being called directly instead of going
> >> through the normal probe path), it should be easy to fix (I'll just
> >> need to fix up the device link state).
> >
> > Correct, the board seems to boot fine, we just get this warning.
>
>
> Have you had chance to look at this further?

Hi Jon,

I finally got around to looking into this. Here's the email[1] that
describes why it's done this way.

[1] - https://lore.kernel.org/lkml/ycrjmpkjk0pxk...@lunn.ch/

>
> The following does appear to avoid the warning, but I am not sure if
> this is the correct thing to do ...
>
> index 9179825ff646..095aba84f7c2 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
>  {
> int ret;
>
> +   ret = device_links_check_suppliers(dev);
> +   if (ret)
> +   return ret;
> +
> ret = driver_sysfs_add(dev);
> if (!ret)
> driver_bound(dev);

So digging deeper into the usage of device_bind_driver and looking at
[1], it doesn't look like returning an error here is a good option.
When device_bind_driver() is called, the driver's probe function isn't
even called. So, there's no way for the driver to even defer probing
based on any of the suppliers. So, we have a couple of options:

1. Delete all the links to suppliers that haven't bound. We'll still
leave the links to active suppliers alone in case it helps with
suspend/resume correctness.
2. Fix the warning to not warn on suppliers that haven't probed if the
device's driver has no probe function. But this will also need fixing
up the cleanup part when device_release_driver() is called. Also, I'm
not sure if device_bind_driver() is ever called when the driver
actually has a probe() function.
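
To make Option 1 concrete, here is a minimal userspace sketch of the idea.
It is only a model: struct demo_link and demo_drop_unbound_supplier_links()
are hypothetical names made up for illustration, not the driver core's
struct device_link or any existing helper. The sketch drops links whose
supplier has not bound and leaves links to bound (active) suppliers alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the kernel's struct device_link. */
struct demo_link {
	const char *supplier_name;
	bool supplier_bound;	/* the supplier's driver has bound */
	bool dropped;		/* set once this link has been removed */
};

/*
 * Model of Option 1: when a driver is bound without going through probe,
 * drop every link whose supplier has not bound yet. Links to bound
 * suppliers are kept. Returns the number of links dropped by this call.
 */
static size_t demo_drop_unbound_supplier_links(struct demo_link *links,
					       size_t n)
{
	size_t dropped = 0;

	assert(links != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (links[i].dropped || links[i].supplier_bound)
			continue;
		links[i].dropped = true;
		dropped++;
	}

	return dropped;
}
```

With one unbound and one bound supplier, a call drops only the link to the
unbound one, and a second call finds nothing left to drop.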

Rafael,

Option 1 above is pretty straightforward.
Option 2 would look something like what's at the end of this email +
caveat about whether the probe check is sufficient.

Do you have a preference between Option 1 vs 2? Or do you have some
other option in mind?

Thanks,
Saravana

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5481b6940a02..8102b3c48bbc 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1247,7 +1247,8 @@ void device_links_driver_bound(struct device *dev)
 */
device_link_drop_managed(link);
} else {
-   WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
+   WARN_ON(link->status != DL_STATE_CONSUMER_PROBE &&
+   dev->driver->probe);
WRITE_ONCE(link->status, DL_STATE_ACTIVE);
}

@@ -1302,7 +1303,8 @@ static void __device_links_no_driver(struct device *dev)
if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {
WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
} else {
-   WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY));
+   WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY) &&
+   dev->driver->probe);
WRITE_ONCE(link->status, DL_STATE_DORMANT);
}
}


Re: phy_attach_direct()'s use of device_bind_driver()

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn  wrote:
>
> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> > Hi,
> >
> > This email was triggered by this other email[1].
> >
> > Why is phy_attach_direct() directly calling device_bind_driver()
> > instead of using bus_probe_device()?
>
> Hi Saravana
>
> So this is to do with the generic PHY, which is a special case.
>
> First the normal case. The MDIO bus driver registers an MDIO bus using
> mdiobus_register(). This will enumerate the bus, finding PHYs on
> it. Each PHY device is registered with the device core, using the
> usual device_add(). The core will go through the registered PHY
> drivers and see if one can drive this hardware, based on the ID
> registers the PHY has at address 2 and 3. If a match is found, the
> driver probes the device, all in the usual way.
>
> Sometime later, the MAC driver wants to make use of the PHY
> device. This is often in the open() call of the MAC driver, when the
> interface is configured up. The MAC driver asks phylib to associate a
> PHY device to the MAC device. In the normal case, the PHY has been
> probed, and everything is good to go.
>
> However, sometimes, there is no driver for the PHY. There is no driver
> for that hardware. Or the driver has not been built, or it is not on
> the disk, etc. So the device core has not been able to probe
> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
> PHY should support. And most PHY devices have this minimum. So there
> is a fall back driver, the generic PHY driver. It assumes the minimum
> registers are available, and does its best to drive the hardware. It
> often works, but not always. So if the MAC asks phylib to connect to a
> PHY which does not have a driver, we forcefully bind the generic
> driver to the device, and hope for the best.

Thanks for the detailed answer Andrew! I think it gives me enough
info/context to come up with a proper fix.
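
For my own notes, the fallback described above can be modelled roughly as
below. This is a simplified sketch, not phylib code: the demo_* names and
the ID values are made up, and it collapses into one function two steps
that happen at different times in the kernel (ID matching when the PHY
device is added, and the generic driver being force-bound later, when the
MAC asks phylib to attach).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified driver description; not struct phy_driver. */
struct demo_phy_driver {
	const char *name;
	uint32_t phy_id;	/* expected value of the PHY ID registers */
	uint32_t phy_id_mask;	/* which ID bits must match */
};

/* Last-resort driver that only assumes the basic register set. */
static const struct demo_phy_driver demo_genphy = { "Generic PHY", 0, 0 };

/*
 * Return the first driver whose masked ID matches the ID read from the
 * PHY. If none matches, fall back to the generic driver.
 */
static const struct demo_phy_driver *
demo_pick_phy_driver(const struct demo_phy_driver *drivers, size_t n,
		     uint32_t phy_id)
{
	assert(drivers != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = drivers[i].phy_id_mask;

		if ((phy_id & mask) == (drivers[i].phy_id & mask))
			return &drivers[i];
	}

	return &demo_genphy;
}
```

The point that matters for device links is the last line: the fallback
path never runs a probe function, so nothing gets a chance to defer on
suppliers.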

> We don't actually recommend using the generic driver. Use the specific
> driver for the hardware. But the generic driver can at least get you
> going, allow you to scp the correct driver onto the system, etc.

I'm not sure if I can control what driver they use. If I can fix this
warning, I'll probably try to do that.

-Saravana


phy_attach_direct()'s use of device_bind_driver()

2021-02-10 Thread Saravana Kannan
Hi,

This email was triggered by this other email[1].

Why is phy_attach_direct() directly calling device_bind_driver()
instead of using bus_probe_device()? I'm asking because this is
causing device links status to not get updated correctly and causes
this[2] warning.

We can fix the device links issue with something like this[3], but
want to understand the reason for the current implementation of
phy_attach_direct() before we go ahead and put in that fix.

Thanks,
Saravana

[1] - 
https://lore.kernel.org/lkml/e11bc6a2-ec9d-ea3b-71f7-13c9f764b...@nvidia.com/#t
[2] - 
https://lore.kernel.org/lkml/56f7d032-ba5a-a8c7-23de-2969d98c5...@nvidia.com/
[3] - 
https://lore.kernel.org/lkml/6a43e209-1d2d-b10a-4564-0289d5413...@nvidia.com/


Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck  wrote:
>
> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck  wrote:
> >>
> >> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> >>> Cyclic dependencies in some firmware were one of the last remaining
> >>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>> dependencies don't block probing, set fw_devlink=on by default.
> >>>
> >>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>> only for systems with device tree firmware):
> >>> * Significantly cuts down deferred probes.
> >>> * Device probe is effectively attempted in graph order.
> >>> * Makes it much easier to load drivers as modules without having to
> >>>   worry about functional dependencies between modules (depmod is still
> >>>   needed for symbol dependencies).
> >>>
> >>> If this patch prevents some devices from probing, it's very likely due
> >>> to the system having one or more device drivers that "probe"/set up a
> >>> device (DT node with compatible property) without creating a struct
> >>> device for it.  If we hit such cases, the device drivers need to be
> >>> fixed so that they populate struct devices and probe them like normal
> >>> device drivers so that the driver core is aware of the devices and their
> >>> status. See [1] for an example of such a case.
> >>>
> >>> [1] - 
> >>> https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/
> >>> Signed-off-by: Saravana Kannan 
> >>
> >> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> >> trying to reboot. Reverting this patch fixes the problem. Bisect log
> >> is attached.
> >
> > Thanks for the report Guenter. Can you please try this series?
> > https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/
> >
>
> Not this week. I have lots of reviews to complete before the end of the week,
> with the 5.12 commit window coming up.

Ok. By next week, all the fixes should be in linux-next too. So it
should be easier if you choose to test.

> Given the number of problems observed, I personally think that it is way
> too early for this patch. We'll have no end of problems if it is applied
> to the upstream kernel in the next commit window. Of course, that is just
> my personal opinion.

You had said "with 115 of 430 boot tests failing in -next" earlier.
Just to be sure I understand it right, you are not saying this patch
caused them all, right? You are just saying that there are 115 general
boot failures, some of which might be masking fw_devlink issues, right?

Thanks,
Saravana


Re: linux-next: build failure after merge of the driver-core tree

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 12:15 PM Rob Herring  wrote:
>
> On Wed, Feb 10, 2021 at 1:17 PM Saravana Kannan  wrote:
> >
> > > On Wed, Feb 10, 2021 at 11:06 AM Saravana Kannan  wrote:
> > >
> > > On Wed, Feb 10, 2021 at 10:18 AM Greg KH  wrote:
> > > >
> > > > On Wed, Feb 10, 2021 at 09:47:20PM +1100, Stephen Rothwell wrote:
> > > > > Hi all,
> > > > >
> > > > > After merging the driver-core tree, today's linux-next build (sparc64
> > > > > defconfig) failed like this:
> > > > >
> > > > > drivers/of/property.o: In function `parse_interrupts':
> > > > > property.c:(.text+0x14e0): undefined reference to `of_irq_parse_one'
> > > > >
> > > > > Caused by commit
> > > > >
> > > > > >   f265f06af194 ("of: property: Fix fw_devlink handling of interrupts/interrupts-extended")
> > > > >
> > > > > CONFIG_OF_IRQ depends on !SPARC so of_irq_parse_one() needs a stub.
>
> It's always Sparc!
>
> > > > > I have added the following patch for today.
> > > > >
> > > > > From: Stephen Rothwell 
> > > > > Date: Wed, 10 Feb 2021 21:27:56 +1100
> > > > > Subject: [PATCH] of: irq: make a stub for of_irq_parse_one()
> > > > >
> > > > > Signed-off-by: Stephen Rothwell 
> > > > > ---
> > > > >  include/linux/of_irq.h | 9 +++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > Thanks Stephen!
> >
> > Actually the stub needs to return an error. 0 indicates it found the interrupt.
>
> I have a slight preference if you could add an 'if
> (!IS_ENABLED(CONFIG_OF_IRQ))' at the caller instead.
>
> If you grep of_irq_parse_one, you'll see there's only a few users
> which means it's on my hit list to make it private. Stub functions
> give the impression 'use everywhere'.

I already sent out a fix :(

Will that check optimize out the code and not cause build errors? If
so, I can send out a patch later.

-Saravana


[PATCH] of: irq: Fix the return value for of_irq_parse_one() stub

2021-02-10 Thread Saravana Kannan
When commit 1852ebd13542 ("of: irq: make a stub for of_irq_parse_one()")
added a stub for of_irq_parse_one() it set the return value to 0. Return
value of 0 in this instance means the call succeeded and the out_irq
pointer was filled with valid data. So, fix it to return an error value.

Fixes: 1852ebd13542 ("of: irq: make a stub for of_irq_parse_one()")
Signed-off-by: Saravana Kannan 
---

This needs to go into driver-core.

-Saravana

 include/linux/of_irq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h
index f898d838d201..aaf219bd0354 100644
--- a/include/linux/of_irq.h
+++ b/include/linux/of_irq.h
@@ -60,7 +60,7 @@ u32 of_msi_map_id(struct device *dev, struct device_node *msi_np, u32 id_in);
 static inline int of_irq_parse_one(struct device_node *device, int index,
   struct of_phandle_args *out_irq)
 {
-   return 0;
+   return -EINVAL;
 }
 static inline int of_irq_count(struct device_node *dev)
 {
-- 
2.30.0.478.g8a0d178c01-goog



Re: linux-next: build failure after merge of the driver-core tree

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 11:06 AM Saravana Kannan  wrote:
>
> On Wed, Feb 10, 2021 at 10:18 AM Greg KH  wrote:
> >
> > On Wed, Feb 10, 2021 at 09:47:20PM +1100, Stephen Rothwell wrote:
> > > Hi all,
> > >
> > > After merging the driver-core tree, today's linux-next build (sparc64
> > > defconfig) failed like this:
> > >
> > > drivers/of/property.o: In function `parse_interrupts':
> > > property.c:(.text+0x14e0): undefined reference to `of_irq_parse_one'
> > >
> > > Caused by commit
> > >
> > > >   f265f06af194 ("of: property: Fix fw_devlink handling of interrupts/interrupts-extended")
> > >
> > > CONFIG_OF_IRQ depends on !SPARC so of_irq_parse_one() needs a stub.
> > > I have added the following patch for today.
> > >
> > > From: Stephen Rothwell 
> > > Date: Wed, 10 Feb 2021 21:27:56 +1100
> > > Subject: [PATCH] of: irq: make a stub for of_irq_parse_one()
> > >
> > > Signed-off-by: Stephen Rothwell 
> > > ---
> > >  include/linux/of_irq.h | 9 +++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Thanks Stephen!

Actually the stub needs to return an error. 0 indicates it found the interrupt.
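
To spell out why the return value matters, here is a small userspace
sketch. All demo_* names are hypothetical and this is not the code in
drivers/of/property.c; it only assumes a caller that keeps asking for the
next interrupt for as long as the parse call returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the out_irq argument. */
struct demo_irq_args {
	int args_count;
};

typedef int (*demo_parse_fn)(int index, struct demo_irq_args *out);

/* Stub that claims success without filling in *out. */
static int demo_parse_stub_returns_zero(int index, struct demo_irq_args *out)
{
	(void)index;
	(void)out;
	return 0;
}

/* Stub that reports "nothing here", as a stub should. */
static int demo_parse_stub_returns_error(int index, struct demo_irq_args *out)
{
	(void)index;
	(void)out;
	return -EINVAL;
}

/*
 * Count interrupts by parsing index 0, 1, 2, ... until the parse call
 * fails. The limit only exists so that this demo always terminates.
 */
static int demo_count_interrupts(demo_parse_fn parse, int limit)
{
	struct demo_irq_args args;
	int count = 0;

	assert(parse != NULL);

	while (count < limit && parse(count, &args) == 0)
		count++;

	return count;
}
```

With the stub returning 0, such a caller believes every index is a valid
interrupt and would go on to use an out_irq that was never filled in. With
-EINVAL it stops at the first call.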

-Saravana


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 2:02 AM  wrote:
>
> On 2/10/21 10:54 AM, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 12:19 AM  wrote:
> >>
> >> Hi, Saravana,
> >>
> >> On 2/6/21 12:26 AM, Saravana Kannan wrote:
> >>> There are a lot of devices/drivers where they never have a struct device
> >>> created for them or the driver initializes the hardware without ever
> >>> binding to the struct device.
> >>>
> >>> This series is intended to avoid any boot regressions due to such
> >>> devices/drivers when fw_devlink=on and also address the handling of
> >>> optional suppliers.
> >>>
> >>> Patches 1 and 2 address the issue of firmware nodes that look like
> >>> they'll have struct devices created for them, but will never actually
> >>> have struct devices added for them. For example, DT nodes with a
> >>> compatible property that don't have devices added for them.
> >>>
> >>> Patch 3 and 4 allow for handling optional DT bindings.
> >>>
> >>> Patch 5 sets up a generic API to handle drivers that never bind with
> >>> their devices.
> >>>
> >>> Patch 6 through 8 update different frameworks to use the new API.
> >>>
> >>> Thanks,
> >>> Saravana
> >>>
> >>> Saravana Kannan (8):
> >>>   driver core: fw_devlink: Detect supplier devices that will never be
> >>> added
> >>>   of: property: Don't add links to absent suppliers
> >>>   driver core: Add fw_devlink.strict kernel param
> >>>   of: property: Add fw_devlink support for optional properties
> >>>   driver core: fw_devlink: Handle suppliers that don't use driver core
> >>>   irqdomain: Mark fwnodes when their irqdomain is added/removed
> >>>   PM: domains: Mark fwnodes when their powerdomain is added/removed
> >>>   clk: Mark fwnodes when their clock provider is added/removed
> >>>
> >>>  .../admin-guide/kernel-parameters.txt |  5 ++
> >>>  drivers/base/core.c   | 58 ++-
> >>>  drivers/base/power/domain.c   |  2 +
> >>>  drivers/clk/clk.c |  3 +
> >>>  drivers/of/property.c | 16 +++--
> >>>  include/linux/fwnode.h| 20 ++-
> >>>  kernel/irq/irqdomain.c|  2 +
> >>>  7 files changed, 98 insertions(+), 8 deletions(-)
> >>>
> >>
> >> Even with this patch set applied, sama5d2_xplained can not boot.
> >> Patch at [1] makes sama5d2_xplained boot again. Stephen applied it
> >> to clk-next.
> >
> > I'm glad you won't actually have any boot issues in 5.12, but the fact
> > you need [1] with this series doesn't make a lot of sense to me
> > because:
> >
> > 1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode
> > in question way before any consumer devices are added.
>
> Looks like in my case FWNODE_FLAG_INITIALIZED is not set, because
> drivers/clk/at91/sama5d2.c uses of_clk_add_hw_provider().

Ah, that explains it.

> > 2. Any consumer device added after (1) will stop trying to link to the
> > clock device.
> >
> > Are you somehow adding a consumer to the clock fwnode before (1)?
> >
> > Can you try this patch without your clk fix? I was trying to avoid
> > looping through a list, but looks like your case might somehow need
> > it?
> >
>
> I tried it, didn't solve my boot problem.

Thanks! I should stop coding past midnight!

> The following patch makes the
> sama5d2_xplained boot again, even without the patch from [1]:

Great! I gave a reviewed-by.

-Saravana


Re: [PATCH] clk: Mark fwnodes when their clock provider is added

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 3:44 AM Tudor Ambarus
 wrote:
>
> This is a follow-up for:
> commit 3c9ea42802a1 ("clk: Mark fwnodes when their clock provider is added/removed")
>
> The above commit updated the deprecated of_clk_add_provider(),
> but did not update the preferred of_clk_add_hw_provider().
> Update it now.

Thanks Tudor! Good catch!

I checked to make sure the deregistration path undoes this one. So, it
looks good to me.

Reviewed-by: Saravana Kannan 

-Saravana

>
> Signed-off-by: Tudor Ambarus 
> ---
>  drivers/clk/clk.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 27ff90eacb1f..9370e4dfecae 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -4594,6 +4594,8 @@ int of_clk_add_hw_provider(struct device_node *np,
> if (ret < 0)
> of_clk_del_provider(np);
>
> +   fwnode_dev_initialized(&np->fwnode, true);
> +
> return ret;
>  }
>  EXPORT_SYMBOL_GPL(of_clk_add_hw_provider);
> --
> 2.25.1
>


Re: linux-next: build failure after merge of the driver-core tree

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 10:18 AM Greg KH  wrote:
>
> On Wed, Feb 10, 2021 at 09:47:20PM +1100, Stephen Rothwell wrote:
> > Hi all,
> >
> > After merging the driver-core tree, today's linux-next build (sparc64
> > defconfig) failed like this:
> >
> > drivers/of/property.o: In function `parse_interrupts':
> > property.c:(.text+0x14e0): undefined reference to `of_irq_parse_one'
> >
> > Caused by commit
> >
> >   f265f06af194 ("of: property: Fix fw_devlink handling of interrupts/interrupts-extended")
> >
> > CONFIG_OF_IRQ depends on !SPARC so of_irq_parse_one() needs a stub.
> > I have added the following patch for today.
> >
> > From: Stephen Rothwell 
> > Date: Wed, 10 Feb 2021 21:27:56 +1100
> > Subject: [PATCH] of: irq: make a stub for of_irq_parse_one()
> >
> > Signed-off-by: Stephen Rothwell 
> > ---
> >  include/linux/of_irq.h | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)

Thanks Stephen!
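
For readers unfamiliar with the pattern, the fix above follows the usual "stub when the config option is off" idiom. The snippet below is only a minimal userspace sketch of that idiom, not the actual contents of include/linux/of_irq.h: `TOY_CONFIG_OF_IRQ`, `toy_irq_parse_one()` and `toy_parse_interrupts()` are hypothetical names, and the -EINVAL return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/* TOY_CONFIG_OF_IRQ is a hypothetical switch standing in for CONFIG_OF_IRQ. */
#ifdef TOY_CONFIG_OF_IRQ
/* Real implementation would live in another translation unit. */
int toy_irq_parse_one(int index, int *out_irq);
#else
/* Stub: lets callers compile and link, and reports failure at run time. */
static inline int toy_irq_parse_one(int index, int *out_irq)
{
	(void)index;
	(void)out_irq;
	return -EINVAL;
}
#endif

/* A caller that builds whether or not the option is enabled. */
static int toy_parse_interrupts(int index)
{
	int irq = 0;
	int ret;

	assert(index >= 0);
	ret = toy_irq_parse_one(index, &irq);

	return ret ? ret : irq;
}
```

With the option left undefined, the caller links against the stub and simply sees an error instead of an undefined reference at link time.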

-Saravana

>
> Thanks for this, I'll go queue it up now.
>
> greg k-h


Re: [PATCH] clk: at91: Fix the declaration of the clocks

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 12:51 AM Geert Uytterhoeven
 wrote:
>
> Hi Saravana,
>
> On Wed, Feb 10, 2021 at 1:57 AM Saravana Kannan  wrote:
> > On Tue, Feb 9, 2021 at 4:54 PM Stephen Boyd  wrote:
> > > Quoting tudor.amba...@microchip.com (2021-02-08 01:49:45)
> > > > Do you plan to take this patch for v5.12?
> > > > If fw_devlink will remain set to ON for v5.12, some of our boards will
> > > > no longer boot without this patch.
> > >
> > > Is fw_devlink defaulted to on for v5.12?
> >
> > Yes.
>
> Have all issues been identified and understood?
> Have all issues been fixed, reviewed, and committed?
> Have all fixes entered linux-next?
> Have all fixes been migrated from submaintainers to maintainers?

I'm hoping Tudor has reported all his issues and that the fixes that have
gone in so far address them. Otherwise, they need to be reported so we
can fix them.

As of now, there's no pending fix that hasn't landed in maintainer
trees. So that's good.

-Saravana

>
> We're already at v5.11-rc7.
> Yes, we can get fixes into v5.12-rc7. Or v5.12-rc9...
>
> Gr{oetje,eeting}s,
>
> Geert
>
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds


Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

2021-02-10 Thread Saravana Kannan
On Wed, Feb 10, 2021 at 12:19 AM  wrote:
>
> Hi, Saravana,
>
> On 2/6/21 12:26 AM, Saravana Kannan wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patch 1 and 2 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
> >
> > Thanks,
> > Saravana
> >
> > Saravana Kannan (8):
> >   driver core: fw_devlink: Detect supplier devices that will never be
> > added
> >   of: property: Don't add links to absent suppliers
> >   driver core: Add fw_devlink.strict kernel param
> >   of: property: Add fw_devlink support for optional properties
> >   driver core: fw_devlink: Handle suppliers that don't use driver core
> >   irqdomain: Mark fwnodes when their irqdomain is added/removed
> >   PM: domains: Mark fwnodes when their powerdomain is added/removed
> >   clk: Mark fwnodes when their clock provider is added/removed
> >
> >  .../admin-guide/kernel-parameters.txt |  5 ++
> >  drivers/base/core.c   | 58 ++-
> >  drivers/base/power/domain.c   |  2 +
> >  drivers/clk/clk.c |  3 +
> >  drivers/of/property.c | 16 +++--
> >  include/linux/fwnode.h| 20 ++-
> >  kernel/irq/irqdomain.c|  2 +
> >  7 files changed, 98 insertions(+), 8 deletions(-)
> >
>
> Even with this patch set applied, sama5d2_xplained can not boot.
> Patch at [1] makes sama5d2_xplained boot again. Stephen applied it
> to clk-next.

I'm glad you won't actually have any boot issues in 5.12, but the fact
you need [1] with this series doesn't make a lot of sense to me
because:

1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode
in question way before any consumer devices are added.
2. Any consumer device added after (1) will stop trying to link to the
clock device.

Are you somehow adding a consumer to the clock fwnode before (1)?

Can you try this patch without your clk fix? I was trying to avoid
looping through a list, but looks like your case might somehow need
it?

-Saravana

+++ b/drivers/base/core.c
@@ -943,6 +943,31 @@ static void device_links_missing_supplier(struct device *dev)
 	}
 }

+static int fw_devlink_check_suppliers(struct device *dev)
+{
+	struct fwnode_link *link;
+	int ret = 0;
+
+	if (!dev->fwnode || fw_devlink_is_permissive())
+		return 0;
+
+	/*
+	 * Device waiting for supplier to become available is not allowed to
+	 * probe.
+	 */
+	mutex_lock(&fwnode_link_lock);
+	list_for_each_entry(link, &dev->fwnode->suppliers, c_hook) {
+		if (link->supplier->flags & FWNODE_FLAG_INITIALIZED)
+			continue;
+
+		ret = -EPROBE_DEFER;
+		break;
+	}
+	mutex_unlock(&fwnode_link_lock);
+
+	return ret;
+}
+
 /**
  * device_links_check_suppliers - Check presence of supplier drivers.
  * @dev: Consumer device.
@@ -964,21 +989,13 @@ int device_links_check_suppliers(struct device *dev)
 	struct device_link *link;
 	int ret = 0;

-	/*
-	 * Device waiting for supplier to become available is not allowed to
-	 * probe.
-	 */
-	mutex_lock(&fwnode_link_lock);
-	if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
-	    !fw_devlink_is_permissive()) {
+	if (fw_devlink_check_suppliers(dev)) {
 		dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
 			list_first_entry(&dev->fwnode->suppliers,
 			struct fwnode_link,
 			c_hook)->supplier);
-		mutex_unlock(&fwnode_link_lock);
 		return -EPROBE_DEFER;
 	}
-	mutex_unlock(&fwnode_link_lock);

 	device_links_write_lock();
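
To make the difference between the two checks in this diff concrete, here is a minimal userspace sketch. It is not kernel code: `toy_fwnode`, `old_check()` and `new_check()` are hypothetical names, the supplier list is a plain array rather than a list_head, locking is omitted, and `TOY_FLAG_INITIALIZED` and `TOY_EPROBE_DEFER` only stand in for FWNODE_FLAG_INITIALIZED and -EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel values; the flag bit is an assumption. */
#define TOY_FLAG_INITIALIZED	(1u << 0)
#define TOY_EPROBE_DEFER	(-517)	/* mirrors the kernel's -EPROBE_DEFER */

struct toy_fwnode {
	unsigned int flags;
	struct toy_fwnode **suppliers;	/* suppliers still linked to this node */
	size_t nr_suppliers;
};

/* Old behaviour: defer whenever any supplier link is still present. */
static int old_check(const struct toy_fwnode *fwnode, int permissive)
{
	if (!fwnode || permissive)
		return 0;

	return fwnode->nr_suppliers ? TOY_EPROBE_DEFER : 0;
}

/* Proposed behaviour: defer only for suppliers not yet marked initialized. */
static int new_check(const struct toy_fwnode *fwnode, int permissive)
{
	size_t i;

	if (!fwnode || permissive)
		return 0;

	for (i = 0; i < fwnode->nr_suppliers; i++) {
		assert(fwnode->suppliers[i] != NULL);
		if (fwnode->suppliers[i]->flags & TOY_FLAG_INITIALIZED)
			continue;

		return TOY_EPROBE_DEFER;
	}

	return 0;
}
```

In this model, a consumer that was linked to the clock node before the flag was set is still deferred by `old_check()`, but is allowed to probe by `new_check()` once the supplier carries the flag.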



>
> Cheers,
> ta
>
> [1] https://lore.kernel.org/lkml/20210203154332.470587-1-tudor.amba...@microchip.com/


Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

2021-02-10 Thread Saravana Kannan
On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck  wrote:
>
> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > Cyclic dependencies in some firmware were one of the last remaining
> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > dependencies don't block probing, set fw_devlink=on by default.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> >   worry about functional dependencies between modules (depmod is still
> >   needed for symbol dependencies).
> >
> > If this patch prevents some devices from probing, it's very likely due
> > to the system having one or more device drivers that "probe"/set up a
> > device (DT node with compatible property) without creating a struct
> > device for it.  If we hit such cases, the device drivers need to be
> > fixed so that they populate struct devices and probe them like normal
> > device drivers so that the driver core is aware of the devices and their
> > status. See [1] for an example of such a case.
> >
> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/
> > Signed-off-by: Saravana Kannan 
>
> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> trying to reboot. Reverting this patch fixes the problem. Bisect log
> is attached.

Thanks for the report Guenter. Can you please try this series?
https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/

It's in driver-core-testing too if that's easier.

-Saravana


Re: [PATCH] clk: at91: Fix the declaration of the clocks

2021-02-09 Thread Saravana Kannan
On Tue, Feb 9, 2021 at 4:54 PM Stephen Boyd  wrote:
>
> Quoting tudor.amba...@microchip.com (2021-02-08 01:49:45)
> > Hi, Michael, Stephen,
> >
> > Do you plan to take this patch for v5.12?
> > If fw_devlink will remain set to ON for v5.12, some of our boards will
> > no longer boot without this patch.
>
> Is fw_devlink defaulted to on for v5.12?

Yes.

-Saravana


Re: [PATCH v4 4/8] of: property: Add fw_devlink support for optional properties

2021-02-09 Thread Saravana Kannan
On Tue, Feb 9, 2021 at 1:33 PM Rob Herring  wrote:
>
> On Fri, Feb 05, 2021 at 02:26:40PM -0800, Saravana Kannan wrote:
> > Not all DT bindings are mandatory bindings. Add support for optional DT
> > bindings and mark iommus, iommu-map, dmas as optional DT bindings.
>
> I don't think we can say these are optional or not. It's got to be a
> driver decision somehow.

Right, so maybe the word "optional" isn't a good name for it. I can
change that if you want.

The point being, fw_devlink can't block the probe of this driver based
on iommu property. We let the driver decide if it wants to
-EPROBE_DEFER or not or however it wants to handle this.

> For example, if IOMMU is optional, what happens with this sequence:
>
> driver probes without IOMMU
> driver calls dma_map_?()
> IOMMU driver probes
> h/w accesses DMA buffer --> BOOM!

Right. But how is this really related to fw_devlink? AFAICT, this is
an issue even today. If the driver needs the IOMMU, then it needs to
make sure the IOMMU has probed? What am I missing?

-Saravana


Re: [PATCH] clk: at91: sama5d2: Mark device OF_POPULATED after setup

2021-02-09 Thread Saravana Kannan
On Tue, Feb 9, 2021 at 7:21 AM  wrote:
>
> Hi, Saravana,
>
> On 2/9/21 11:11 AM, Saravana Kannan wrote:
> >
> > On Mon, Feb 8, 2021 at 11:55 PM Stephen Boyd  wrote:
> >>
> >> Quoting Saravana Kannan (2021-01-28 09:01:41)
> >>> On Thu, Jan 28, 2021 at 2:45 AM Tudor Ambarus
> >>>  wrote:
> >>>>
> >>>> The sama5d2 requires the clock provider initialized before timers.
> >>>> We can't use a platform driver for the sama5d2-pmc driver, as the
> >>>> platform_bus_init() is called later on, after time_init().
> >>>>
> >>>> As fw_devlink considers only devices, it does not know that the
> >>>> pmc is ready. Hence probing of devices that depend on it fail:
> >>>> probe deferral - supplier f0014000.pmc not ready
> >>>>
> >>>> Fix this by setting the OF_POPULATED flag for the sama5d2_pmc
> >>>> device node after successful setup. This will make
> >>>> of_link_to_phandle() ignore the sama5d2_pmc device node as a
> >>>> dependency, and consumer devices will be probed again.
> >>>>
> >>>> Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> >>>> Signed-off-by: Tudor Ambarus 
> >>>> ---
> >>>> I'll be out of office, will check the rest of the at91 SoCs
> >>>> at the beginning of next week.
> >>>>
> >>>>  drivers/clk/at91/sama5d2.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/drivers/clk/at91/sama5d2.c b/drivers/clk/at91/sama5d2.c
> >>>> index 9a5cbc7cd55a..5eea2b4a63dd 100644
> >>>> --- a/drivers/clk/at91/sama5d2.c
> >>>> +++ b/drivers/clk/at91/sama5d2.c
> >>>> @@ -367,6 +367,8 @@ static void __init sama5d2_pmc_setup(struct device_node *np)
> >>>>
> >>>> of_clk_add_hw_provider(np, of_clk_hw_pmc_get, sama5d2_pmc);
> >>>>
> >>>> +   of_node_set_flag(np, OF_POPULATED);
> >>>> +
> >>>> return;
> >>>
> >>> Hi Tudor,
> >>>
> >>> Thanks for looking into this.
> >>>
> >>> I already accounted for early clocks like this when I designed
> >>> fw_devlink. Each driver shouldn't need to set OF_POPULATED.
> >>> drivers/clk/clk.c already does this for you.
> >>>
> >>> I think the problem is that your driver is using
> >>> CLK_OF_DECLARE_DRIVER() instead of CLK_OF_DECLARE(). The comments for
> >>> CLK_OF_DECLARE_DRIVER() says:
> >>> /*
> >>>  * Use this macro when you have a driver that requires two initialization
> >>>  * routines, one at of_clk_init(), and one at platform device probe
> >>>  */
> >>>
> >>> In your case, you are explicitly NOT having a driver bind to this
> >>> clock later. So you should be using CLK_OF_DECLARE() instead.
> >>>
> >>
> >> I see
> >>
> >> drivers/power/reset/at91-sama5d2_shdwc.c:   { .compatible = "atmel,sama5d2-pmc" },
> >>
> >> so isn't that the driver that wants to bind to the same device node
> >> again? First at of_clk_init() time here and then second for the reset
> >> driver?
> >
> > You are right. I assumed that when Tudor was setting OF_POPULATED,
>
> No, there's a single driver that binds to that compatible.
>
> > they didn't want to create a struct device and they knew it was right
> > for their platform.
> >
> > However...
> > $ git grep "atmel,sama5d2-pmc"
> > arch/arm/boot/dts/sama5d2.dtsi: compatible = "atmel,sama5d2-pmc", "syscon";
> > arch/arm/mach-at91/pm.c:{ .compatible = "atmel,sama5d2-pmc", .data = &pmc_infos[1] },
> > drivers/clk/at91/pmc.c: { .compatible = "atmel,sama5d2-pmc" },
> > drivers/clk/at91/sama5d2.c:CLK_OF_DECLARE_DRIVER(sama5d2_pmc, "atmel,sama5d2-pmc", sama5d2_pmc_setup);
> > drivers/power/reset/at91-sama5d2_shdwc.c:   { .compatible = "atmel,sama5d2-pmc" },
> >
> > Geez! How many drivers are there for this one device? Clearly not all
> > of them are going to bind. But I'm not going to dig into this. You can
>
> From this entire list only the drivers/clk/at91/sama5d2.c driver binds to the
> "atmel,sama5d2-pmc" compatible, the rest are just using the compatible to
> map the PMC memory.
>
> > reject this patch. I expect this series [1] to take care of the issue
> > Tudor was trying to fix.
> >
> > Tudor,
> >
> > Want to give this series [1] a shot?
>
> The series at [1] doesn't apply cleanly on either next-20210209 or
> driver-core-next. On top of which sha1 should I apply it?

It's on top of driver-core-next:
4731210c09f5 gpiolib: Bind gpio_device to a driver to enable
fw_devlink=on by default

> Anyway, I think the patch at [2] is still needed, regardless of the outcome
> of [1].

Right, [2] is still a good clean up based on your comment above.

-Saravana

> >
> > [1] - https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/
>
> [2] https://lore.kernel.org/lkml/20210203154332.470587-1-tudor.amba...@microchip.com/
>
> Cheers,
> ta
>


Re: [PATCH] clk: at91: sama5d2: Mark device OF_POPULATED after setup

2021-02-09 Thread Saravana Kannan
On Mon, Feb 8, 2021 at 11:55 PM Stephen Boyd  wrote:
>
> Quoting Saravana Kannan (2021-01-28 09:01:41)
> > On Thu, Jan 28, 2021 at 2:45 AM Tudor Ambarus
> >  wrote:
> > >
> > > The sama5d2 requires the clock provider initialized before timers.
> > > We can't use a platform driver for the sama5d2-pmc driver, as the
> > > platform_bus_init() is called later on, after time_init().
> > >
> > > As fw_devlink considers only devices, it does not know that the
> > > pmc is ready. Hence probing of devices that depend on it fail:
> > > probe deferral - supplier f0014000.pmc not ready
> > >
> > > Fix this by setting the OF_POPULATED flag for the sama5d2_pmc
> > > device node after successful setup. This will make
> > > of_link_to_phandle() ignore the sama5d2_pmc device node as a
> > > dependency, and consumer devices will be probed again.
> > >
> > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > Signed-off-by: Tudor Ambarus 
> > > ---
> > > I'll be out of office, will check the rest of the at91 SoCs
> > > at the beginning of next week.
> > >
> > >  drivers/clk/at91/sama5d2.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/clk/at91/sama5d2.c b/drivers/clk/at91/sama5d2.c
> > > index 9a5cbc7cd55a..5eea2b4a63dd 100644
> > > --- a/drivers/clk/at91/sama5d2.c
> > > +++ b/drivers/clk/at91/sama5d2.c
> > > @@ -367,6 +367,8 @@ static void __init sama5d2_pmc_setup(struct device_node *np)
> > >
> > > of_clk_add_hw_provider(np, of_clk_hw_pmc_get, sama5d2_pmc);
> > >
> > > +   of_node_set_flag(np, OF_POPULATED);
> > > +
> > > return;
> >
> > Hi Tudor,
> >
> > Thanks for looking into this.
> >
> > I already accounted for early clocks like this when I designed
> > fw_devlink. Each driver shouldn't need to set OF_POPULATED.
> > drivers/clk/clk.c already does this for you.
> >
> > I think the problem is that your driver is using
> > CLK_OF_DECLARE_DRIVER() instead of CLK_OF_DECLARE(). The comments for
> > CLK_OF_DECLARE_DRIVER() says:
> > /*
> >  * Use this macro when you have a driver that requires two initialization
> >  * routines, one at of_clk_init(), and one at platform device probe
> >  */
> >
> > In your case, you are explicitly NOT having a driver bind to this
> > clock later. So you should be using CLK_OF_DECLARE() instead.
> >
>
> I see
>
> drivers/power/reset/at91-sama5d2_shdwc.c:   { .compatible = "atmel,sama5d2-pmc" },
>
> so isn't that the driver that wants to bind to the same device node
> again? First at of_clk_init() time here and then second for the reset
> driver?

You are right. I assumed that when Tudor was setting OF_POPULATED,
they didn't want to create a struct device and they knew it was right
for their platform.

However...
$ git grep "atmel,sama5d2-pmc"
arch/arm/boot/dts/sama5d2.dtsi: compatible = "atmel,sama5d2-pmc", "syscon";
arch/arm/mach-at91/pm.c:{ .compatible = "atmel,sama5d2-pmc", .data = &pmc_infos[1] },
drivers/clk/at91/pmc.c: { .compatible = "atmel,sama5d2-pmc" },
drivers/clk/at91/sama5d2.c:CLK_OF_DECLARE_DRIVER(sama5d2_pmc, "atmel,sama5d2-pmc", sama5d2_pmc_setup);
drivers/power/reset/at91-sama5d2_shdwc.c:   { .compatible = "atmel,sama5d2-pmc" },

Geez! How many drivers are there for this one device? Clearly not all
of them are going to bind. But I'm not going to dig into this. You can
reject this patch. I expect this series [1] to take care of the issue
Tudor was trying to fix.

Tudor,

Want to give this series [1] a shot?

[1] - https://lore.kernel.org/lkml/20210205222644.2357303-1-sarava...@google.com/
-Saravana

