Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-24 Thread Benoit Cousson
On 10/23/2012 06:15 PM, Sebastien Guiriec wrote:
 Hi Benoit and John,
 
 On 10/23/2012 06:07 PM, Benoit Cousson wrote:
 On 10/23/2012 05:59 PM, Jon Hunter wrote:

 On 10/23/2012 10:09 AM, Benoit Cousson wrote:
 On 10/23/2012 04:49 PM, Jon Hunter wrote:
 Hi Seb,

 On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for OMAP5

 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;

 I am wondering if we should add the interrupt-parent property to add
 nodes in the device-tree source. I know that today the interrupt-parent
 is being defined globally, but when device-tree maps an interrupt for a
 device it searches for the interrupt-parent starting the current device
 node.

 So in other words, for gpio1 it will search the gpio1 binding for
 interrupt-parent and if not found move up a level and search again. It
 will keep doing this until it finds the interrupt-parent.

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

 Mmm, I'm not that sure. it will increase the size of the blob, so
 increase the time to load it and then to parse it. Where in the current
 case, it is just going up to the parent node using the already
 un-flatten tree in memory and thus that should not take that much time.

 Yes it will definitely increase the size, so that could slow things down.

 That being said, it might be interesting to benchmark that to see what
 is the real impact.

 Right, I wonder what the key functions are we need to benchmark to get
 an overall feel for what is best? Right now I am seeing some people add
 the interrupt-parent for device nodes and others not. Ideally we should
 be consistent, but at the same time it is probably something that we can
 easily sort out later. So not a big deal either way.

 For consistency, I'd rather not add it at all for the moment.
 Later, when we will only support DT boot, people will start complaining
 about the boot time increase and then we will start optimizing a little
 bit :-)
 
 I just do it like that to be consistent with what is inside OMAP4 dtsi
 for those IPs (GPIO/UART/MMC/I2C). Now after checking Peter already add
 the interrupt-parent for all audio IPs (OMAP3/4/5). But here we need
 also interrupts name. So here we should try to be consistent.
 
 So I can send back the series for OMAP5 and update the OMAP4 with
   interrupt-parent = <&gic>

No, you should not, as explained previously. You'd better remove the ones
already in the audio IPs for consistency.


Regards,
Benoit

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Sebastien Guiriec
Add base address and interrupt line inside Device Tree data for
OMAP5

Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
---
 arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
index 42c78be..9e39f9f 100644
--- a/arch/arm/boot/dts/omap5.dtsi
+++ b/arch/arm/boot/dts/omap5.dtsi
@@ -104,6 +104,8 @@
 
 		gpio1: gpio@4ae10000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x4ae10000 0x200>;
+			interrupts = <0 29 0x4>;
 			ti,hwmods = "gpio1";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -113,6 +115,8 @@
 
 		gpio2: gpio@48055000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x48055000 0x200>;
+			interrupts = <0 30 0x4>;
 			ti,hwmods = "gpio2";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -122,6 +126,8 @@
 
 		gpio3: gpio@48057000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x48057000 0x200>;
+			interrupts = <0 31 0x4>;
 			ti,hwmods = "gpio3";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -131,6 +137,8 @@
 
 		gpio4: gpio@48059000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x48059000 0x200>;
+			interrupts = <0 32 0x4>;
 			ti,hwmods = "gpio4";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -140,6 +148,8 @@
 
 		gpio5: gpio@4805b000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x4805b000 0x200>;
+			interrupts = <0 33 0x4>;
 			ti,hwmods = "gpio5";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -149,6 +159,8 @@
 
 		gpio6: gpio@4805d000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x4805d000 0x200>;
+			interrupts = <0 34 0x4>;
 			ti,hwmods = "gpio6";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -158,6 +170,8 @@
 
 		gpio7: gpio@48051000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x48051000 0x200>;
+			interrupts = <0 35 0x4>;
 			ti,hwmods = "gpio7";
 			gpio-controller;
 			#gpio-cells = <2>;
@@ -167,6 +181,8 @@
 
 		gpio8: gpio@48053000 {
 			compatible = "ti,omap4-gpio";
+			reg = <0x48053000 0x200>;
+			interrupts = <0 121 0x4>;
 			ti,hwmods = "gpio8";
 			gpio-controller;
 			#gpio-cells = <2>;
-- 
1.7.10.4



Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Jon Hunter
Hi Seb,

On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for
 OMAP5
 
 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)
 
 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;

I am wondering if we should add the interrupt-parent property to all
nodes in the device-tree source. I know that today the interrupt-parent
is being defined globally, but when device-tree maps an interrupt for a
device it searches for the interrupt-parent starting at the current
device node.

So in other words, for gpio1 it will search the gpio1 node for
interrupt-parent and, if not found, move up a level and search again. It
will keep doing this until it finds the interrupt-parent.

Therefore, I believe it will improve search time, and hence boot time, if
we have interrupt-parent defined in each node.
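
To make the lookup described above concrete, here is a minimal sketch in
C. It is not the kernel's implementation: struct node, its fields and
find_interrupt_parent() are hypothetical names made up for illustration
(the real code works on struct device_node and phandles). It only shows
the walk up the tree and counts how many levels are climbed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node; NOT the kernel's struct device_node. */
struct node {
	const char *name;
	struct node *parent;		/* NULL at the root */
	struct node *interrupt_parent;	/* NULL if the property is absent */
};

/*
 * Walk from 'n' towards the root and return the first interrupt-parent
 * found, or NULL if no node on the path defines one. If 'steps' is not
 * NULL it receives the number of levels climbed.
 */
struct node *find_interrupt_parent(struct node *n, int *steps)
{
	int climbed = 0;

	assert(n != NULL);

	while (n && !n->interrupt_parent) {
		n = n->parent;
		climbed++;
	}
	if (steps)
		*steps = climbed;
	return n ? n->interrupt_parent : NULL;
}
```

With interrupt-parent defined only at the root and gpio1 sitting two
levels below it, the walk climbs two levels; with the property in the
gpio1 node itself it climbs none.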

Cheers
Jon


Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Benoit Cousson
On 10/23/2012 04:49 PM, Jon Hunter wrote:
 Hi Seb,
 
 On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for
 OMAP5

 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;
 
 I am wondering if we should add the interrupt-parent property to add
 nodes in the device-tree source. I know that today the interrupt-parent
 is being defined globally, but when device-tree maps an interrupt for a
 device it searches for the interrupt-parent starting the current device
 node.
 
 So in other words, for gpio1 it will search the gpio1 binding for
 interrupt-parent and if not found move up a level and search again. It
 will keep doing this until it finds the interrupt-parent.
 
 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

Mmm, I'm not that sure. It will increase the size of the blob, and so
increase the time to load it and then to parse it. Whereas in the current
case, it is just going up to the parent node using the already
unflattened tree in memory, and thus that should not take that much time.

That being said, it might be interesting to benchmark that to see what
the real impact is.

Regards,
Benoit



Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Jon Hunter

On 10/23/2012 10:09 AM, Benoit Cousson wrote:
 On 10/23/2012 04:49 PM, Jon Hunter wrote:
 Hi Seb,

 On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for
 OMAP5

 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;

 I am wondering if we should add the interrupt-parent property to add
 nodes in the device-tree source. I know that today the interrupt-parent
 is being defined globally, but when device-tree maps an interrupt for a
 device it searches for the interrupt-parent starting the current device
 node.

 So in other words, for gpio1 it will search the gpio1 binding for
 interrupt-parent and if not found move up a level and search again. It
 will keep doing this until it finds the interrupt-parent.

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.
 
 Mmm, I'm not that sure. it will increase the size of the blob, so
 increase the time to load it and then to parse it. Where in the current
 case, it is just going up to the parent node using the already
 un-flatten tree in memory and thus that should not take that much time.

Yes it will definitely increase the size, so that could slow things down.

 That being said, it might be interesting to benchmark that to see what
 is the real impact.

Right, I wonder what the key functions are that we need to benchmark to
get an overall feel for what is best? Right now I am seeing some people add
the interrupt-parent for device nodes and others not. Ideally we should
be consistent, but at the same time it is probably something that we can
easily sort out later. So not a big deal either way.

Cheers
Jon


Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Benoit Cousson
On 10/23/2012 05:59 PM, Jon Hunter wrote:
 
 On 10/23/2012 10:09 AM, Benoit Cousson wrote:
 On 10/23/2012 04:49 PM, Jon Hunter wrote:
 Hi Seb,

 On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for
 OMAP5

 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;

 I am wondering if we should add the interrupt-parent property to add
 nodes in the device-tree source. I know that today the interrupt-parent
 is being defined globally, but when device-tree maps an interrupt for a
 device it searches for the interrupt-parent starting the current device
 node.

 So in other words, for gpio1 it will search the gpio1 binding for
 interrupt-parent and if not found move up a level and search again. It
 will keep doing this until it finds the interrupt-parent.

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

 Mmm, I'm not that sure. it will increase the size of the blob, so
 increase the time to load it and then to parse it. Where in the current
 case, it is just going up to the parent node using the already
 un-flatten tree in memory and thus that should not take that much time.
 
 Yes it will definitely increase the size, so that could slow things down.
 
 That being said, it might be interesting to benchmark that to see what
 is the real impact.
 
 Right, I wonder what the key functions are we need to benchmark to get
 an overall feel for what is best? Right now I am seeing some people add
 the interrupt-parent for device nodes and others not. Ideally we should
 be consistent, but at the same time it is probably something that we can
 easily sort out later. So not a big deal either way.

For consistency, I'd rather not add it at all for the moment.
Later, when we only support DT boot, people will start complaining
about the boot time increase and then we will start optimizing a little
bit :-)

Regards,
Benoit




Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Sebastien Guiriec

Hi Benoit and Jon,

On 10/23/2012 06:07 PM, Benoit Cousson wrote:
 On 10/23/2012 05:59 PM, Jon Hunter wrote:

 On 10/23/2012 10:09 AM, Benoit Cousson wrote:
 On 10/23/2012 04:49 PM, Jon Hunter wrote:
 Hi Seb,

 On 10/23/2012 03:37 AM, Sebastien Guiriec wrote:
 Add base address and interrupt line inside Device Tree data for
 OMAP5

 Signed-off-by: Sebastien Guiriec <s-guir...@ti.com>
 ---
  arch/arm/boot/dts/omap5.dtsi |   16 ++++++++++++++++
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
 index 42c78be..9e39f9f 100644
 --- a/arch/arm/boot/dts/omap5.dtsi
 +++ b/arch/arm/boot/dts/omap5.dtsi
 @@ -104,6 +104,8 @@
  
  		gpio1: gpio@4ae10000 {
  			compatible = "ti,omap4-gpio";
 +			reg = <0x4ae10000 0x200>;
 +			interrupts = <0 29 0x4>;
  			ti,hwmods = "gpio1";
  			gpio-controller;
  			#gpio-cells = <2>;

 I am wondering if we should add the interrupt-parent property to add
 nodes in the device-tree source. I know that today the interrupt-parent
 is being defined globally, but when device-tree maps an interrupt for a
 device it searches for the interrupt-parent starting the current device
 node.

 So in other words, for gpio1 it will search the gpio1 binding for
 interrupt-parent and if not found move up a level and search again. It
 will keep doing this until it finds the interrupt-parent.

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

 Mmm, I'm not that sure. it will increase the size of the blob, so
 increase the time to load it and then to parse it. Where in the current
 case, it is just going up to the parent node using the already
 un-flatten tree in memory and thus that should not take that much time.

 Yes it will definitely increase the size, so that could slow things down.

 That being said, it might be interesting to benchmark that to see what
 is the real impact.

 Right, I wonder what the key functions are we need to benchmark to get
 an overall feel for what is best? Right now I am seeing some people add
 the interrupt-parent for device nodes and others not. Ideally we should
 be consistent, but at the same time it is probably something that we can
 easily sort out later. So not a big deal either way.

 For consistency, I'd rather not add it at all for the moment.
 Later, when we will only support DT boot, people will start complaining
 about the boot time increase and then we will start optimizing a little
 bit :-)

I just did it like that to be consistent with what is inside the OMAP4 dtsi
for those IPs (GPIO/UART/MMC/I2C). Now, after checking, Peter already added
the interrupt-parent for all audio IPs (OMAP3/4/5). But here we also need
the interrupt names. So here we should try to be consistent.

So I can send back the series for OMAP5 and update the OMAP4 with
  interrupt-parent = <&gic>

As of today we are not consistent.

 Regards,
 Benoit






Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Mitch Bradley
On 10/23/2012 4:49 AM, Jon Hunter wrote:

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

I strongly suspect (based on many years of performance tuning, with
special focus on boot time) that the time difference will be completely
insignificant.  The total extra time for walking up the interrupt tree
for every interrupt in a large system is comparable to the time it takes
to send a few characters out a UART.  So you can get more improvement
from eliminating a single printk() than from globally adding per-node
interrupt-parent.

Furthermore, the cost of processing all of the interrupt-parent
properties is probably similar to the cost of the avoided tree walks.

CPU cycles are very fast compared to I/O register accesses, say a factor
of 100.  Now consider that many modern devices contain embedded
microcontrollers (SD cards, network interface modules, USB hubs and
devices, ...), and those devices usually require various delays measured
in milliseconds, to ensure that the microcontroller is ready for the
next initialization step.  Those delays are extremely long compared to
CPU cycles.  Obviously, some of that can be overlapped by careful
multithreading, but that isn't free either.

The bottom line is that I'm pretty sure that adding per-node
interrupt-parent would not be worthwhile from the standpoint of speeding
up boot time.


Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Jon Hunter
Hi Mitch,

On 10/23/2012 11:55 AM, Mitch Bradley wrote:
 On 10/23/2012 4:49 AM, Jon Hunter wrote:
 
 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.
 
 I strongly suspect (based on many years of performance tuning, with
 special focus on boot time) that the time difference will be completely
 insignificant.  The total extra time for walking up the interrupt tree
 for every interrupt in a large system is comparable to the time it takes
 to send a few characters out a UART.  So you can get more improvement
 from eliminating a single printk() than from globally adding per-node
 interrupt-parent.
 
 Furthermore, the cost of processing all of the interrupt-parent
 properties is probably similar to the cost of the avoided tree walks.
 
 CPU cycles are very fast compared to I/O register accesses, say a factor
 of 100.  Now consider that many modern devices contain embedded
 microcontrollers (SD cards, network interface modules, USB hubs and
 devices, ...), and those devices usually require various delays measured
 in milliseconds, to ensure that the microcontroller is ready for the
 next initialization step.  Those delays are extremely long compared to
 CPU cycles.  Obviously, some of that can be overlapped by careful
 multithreading, but that isn't free either.
 
 The bottom line is that I'm pretty sure that adding per-node
 interrupt-parent would not be worthwhile from the standpoint of speeding
 up boot time.

Absolutely, I don't expect this to miraculously improve the boot time, nor
am I suggesting that this is a major contributor to boot time. But what is
the best approach in general in terms of efficiency (memory and time)? In
other words, is there a best practice? And from your feedback, I
understand that adding a global interrupt-parent is a good practice.

For a bit of fun, I took an omap4430 board and benchmarked the time
taken by of_irq_find_parent() with and without interrupt-parent defined
for each node using interrupts.

There were a total of 47 device nodes using interrupts. Adding the
interrupt-parent to all 47 nodes increased the dtb from 13211 bytes to
13963 bytes.

On boot-up I saw 117 calls to of_irq_find_parent() for this platform
(there appear to be multiple calls for a given device). Without
interrupt-parent defined for each node, the total time spent in
of_irq_find_parent() was 1.028 ms, whereas with interrupt-parent defined
for each node the total time was 0.4032 ms. This was done using a
38.4 MHz timer, and the overhead of reading the timer 117 times was about
36 us.
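
For anyone who wants to redo the arithmetic on these figures, here is a
small sketch in C. The constant and function names are made up for
illustration; only the numbers come from the measurements above. It
assumes the timer-read overhead is the same in both runs, so it cancels
out of the difference.

```c
#include <assert.h>

/* Figures quoted above for the omap4430 board. */
enum {
	NODES_WITH_IRQ    = 47,
	DTB_BYTES_GLOBAL  = 13211,	/* interrupt-parent defined once */
	DTB_BYTES_PERNODE = 13963,	/* interrupt-parent in every node */
	FIND_PARENT_CALLS = 117,
};

static const double TOTAL_US_GLOBAL  = 1028.0;	/* 1.028 ms */
static const double TOTAL_US_PERNODE = 403.2;	/* 0.4032 ms */

/* Extra blob bytes added by each per-node interrupt-parent property. */
int dtb_bytes_per_node(void)
{
	return (DTB_BYTES_PERNODE - DTB_BYTES_GLOBAL) / NODES_WITH_IRQ;
}

/* Total time saved across all calls, in microseconds. */
double saved_us_total(void)
{
	return TOTAL_US_GLOBAL - TOTAL_US_PERNODE;
}

/* Average time per call, in microseconds. */
double avg_us_per_call(double total_us, int calls)
{
	assert(calls > 0);
	return total_us / calls;
}
```

With these inputs the blob grows by 16 bytes per node (752 bytes over 47
nodes), which is consistent with a 12-byte property header plus a 4-byte
phandle in the flattened format. The saving is about 625 us in total, or
roughly 5.3 us per call averaged over the 117 calls.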

I understand that this does not provide the full picture, but I wanted
to get a better handle on the times here. So yes the overall overhead
here is not significant for us to worry about.

Cheers
Jon




Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts

2012-10-23 Thread Mitch Bradley
On 10/23/2012 1:15 PM, Jon Hunter wrote:
 Hi Mitch,
 
 On 10/23/2012 11:55 AM, Mitch Bradley wrote:
 On 10/23/2012 4:49 AM, Jon Hunter wrote:

 Therefore, I believe it will improve search time and hence, boot time if
 we have interrupt-parent defined in each node.

 I strongly suspect (based on many years of performance tuning, with
 special focus on boot time) that the time difference will be completely
 insignificant.  The total extra time for walking up the interrupt tree
 for every interrupt in a large system is comparable to the time it takes
 to send a few characters out a UART.  So you can get more improvement
 from eliminating a single printk() than from globally adding per-node
 interrupt-parent.

 Furthermore, the cost of processing all of the interrupt-parent
 properties is probably similar to the cost of the avoided tree walks.

 CPU cycles are very fast compared to I/O register accesses, say a factor
 of 100.  Now consider that many modern devices contain embedded
 microcontrollers (SD cards, network interface modules, USB hubs and
 devices, ...), and those devices usually require various delays measured
 in milliseconds, to ensure that the microcontroller is ready for the
 next initialization step.  Those delays are extremely long compared to
 CPU cycles.  Obviously, some of that can be overlapped by careful
 multithreading, but that isn't free either.

 The bottom line is that I'm pretty sure that adding per-node
 interrupt-parent would not be worthwhile from the standpoint of speeding
 up boot time.
 
 Absolutely, I don't expect this to miraculously improve the boot time or
 suggest that this is a major contributor to boot time, but what is the
 best approach in general in terms of efficiency (memory and time). In
 other words, is there a best practice? And from your feedback, I
 understand that adding a global interrupt-parent is a good practice.

From a maintenance standpoint, saying it once is best practice.  Time
that you don't spend doing unnecessary maintenance can be spent looking
for other, higher value, improvements.  And when you do need to optimize
something, it's much easier if the function is centralized.

Pushing the interrupt parent up the tree to the appropriate point can
make the next platform easier, opening the possibility of changing just
one thing instead of several dozen.

There have been several cases when I have violated good factoring in
order to save a little time, only to have to undo it later when the next
system was different enough that the de-factored version didn't work.

So, while there are certainly cases where you are forced to do
otherwise, I generally like the "don't repeat yourself" mantra.

 
 For a bit of fun, I took an omap4430 board and benchmarked the time
 taken by the of_irq_find_parent() when interrupt-parent was defined for
 each node using interrupts and without.
 
 There were a total of 47 device nodes using interrupts. Adding the
 interrupt-parent to all 47 nodes increased the dtb from 13211 bytes to
 13963 bytes.
 
 On boot-up I saw 117 calls to of_irq_find_parent() for this platform
 (there appears to be multiple calls for a given device). Without
 interrupt-parent defined for each node total time spent in
 of_irq_find_parent() was 1.028 ms where as with interrupt-parent defined
 for each node the total time was 0.4032 ms. This was done using a
 38.4MHz timer and the overhead of reading the timer 117 times was about
 36 us.

That sounds about right.  The savings of about 600 us is the time it
takes to send 6 or 7 characters at 115200 baud.
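
The UART comparison can be checked with a short sketch in C. It assumes
8N1 framing, i.e. 10 bits on the wire per character (start bit, 8 data
bits, stop bit), which is not stated in this thread; uart_tx_time_us()
is a made-up helper name.

```c
#include <assert.h>

/*
 * Time in microseconds to send 'chars' characters at 'baud' bits per
 * second, with 'bits_per_char' bits on the wire per character
 * (10 for the assumed 8N1 framing).
 */
double uart_tx_time_us(unsigned int chars, unsigned int baud,
		       unsigned int bits_per_char)
{
	assert(baud > 0);
	return (double)chars * bits_per_char * 1e6 / baud;
}
```

Under that assumption, uart_tx_time_us(1, 115200, 10) is about 86.8 us,
so 600 us corresponds to a little under 7 characters.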

 
 I understand that this does not provide the full picture, but I wanted
 to get a better handle on the times here. So yes the overall overhead
 here is not significant for us to worry about.

Big ticket items for boot time improvement are time spent waiting for
peripheral devices to become ready and time spent spewing diagnostic
messages.  But in the final analysis, you just have to measure what is
happening and see what you can do to improve it.  In my experience, CPU
cycles are rarely problematic, unless they are artificially slowed down
due to caches being off or due to direct execution from slow memory like
ROMs.

I once shaved an hour off the startup time for a PowerPC system by
moving some critical code into cache.  This was on a prototype chip
that was being emulated by arrays of FPGAs.

On the first generation OLPC XO-1 machine we were really interested in
super-fast wakeup from suspend.  I tuned that firmware code path to the
nth degree, finally getting stuck at 2 ms because you had to wait that
long before accessing the PCI bus interface, otherwise the SD controller
chip would lock up.  Then I transferred control to the kernel, which had
to wait something like 40 ms (two display frame times) to re-sync the
video subsystem, then it had to re-enable the USB subsystem, which ended
up taking a good fraction of a second.

Things haven't gotten much better (in fact they are probably worse),
because, even though the CPUs have gotten