RE: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-04 Thread David Laight
From: Larry Finger
> Sent: 03 February 2016 19:45
...
> The performance will depend on where you satisfy the condition. All switch 
> cases
> have the same execution time, but in the if .. else if .. else form, the 
> earlier
> tests execute more quickly. I'm not sure that one can make any blanket 
> statement
> about performance. Certainly, the switch version will be larger. For a switch
> with 8 cases plus default, the object code if 43 bytes larger than the nested
> ifs in a test program that I created. That is a significant penalty.

There is also the penalty of the (likely) data cache miss reading the jump 
table.
But given this code is all about generating a variable delay the execution
speed is probably irrelevant.

It would be much more interesting if the delay could be changed for sleeps.

David


Re: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-04 Thread Larry Finger

On 02/04/2016 03:48 AM, David Laight wrote:

From: Larry Finger

Sent: 03 February 2016 19:45

...

The performance will depend on where you satisfy the condition. All switch cases
have the same execution time, but in the if .. else if .. else form, the earlier
tests execute more quickly. I'm not sure that one can make any blanket statement
about performance. Certainly, the switch version will be larger. For a switch
with 8 cases plus default, the object code if 43 bytes larger than the nested
ifs in a test program that I created. That is a significant penalty.


There is also the penalty of the (likely) data cache miss reading the jump 
table.
But given this code is all about generating a variable delay the execution
speed is probably irrelevant.

It would be much more interesting if the delay could be changed for sleeps.


Unfortunately, sleeping is not possible for the routines that call 
rtl_addr_delay().

Larry




RE: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-04 Thread David Laight
From: Larry Finger
> Sent: 04 February 2016 15:44
> On 02/04/2016 03:48 AM, David Laight wrote:
> > From: Larry Finger
> >> Sent: 03 February 2016 19:45
> > ...
> >> The performance will depend on where you satisfy the condition. All switch 
> >> cases
> >> have the same execution time, but in the if .. else if .. else form, the 
> >> earlier
> >> tests execute more quickly. I'm not sure that one can make any blanket 
> >> statement
> >> about performance. Certainly, the switch version will be larger. For a 
> >> switch
> >> with 8 cases plus default, the object code if 43 bytes larger than the 
> >> nested
> >> ifs in a test program that I created. That is a significant penalty.
> >
> > There is also the penalty of the (likely) data cache miss reading the jump 
> > table.
> > But given this code is all about generating a variable delay the execution
> > speed is probably irrelevant.
> >
> > It would be much more interesting if the delay could be changed for sleeps.
> 
> Unfortunately, sleeping is not possible for the routines that call 
> rtl_addr_delay().

I hope none of my systems ever busy-delay for 50ms.

Actually possibly just as troublesome is udelay(1).

I presume this is used after a 'hardware' write in order to meet a minimum
pulse width (or an inter-cycle recovery time).

Unfortunately the initial write cycle will almost certainly be 'posted' so
may not actually be seen by the target until some time later.

This means that although the cpu delayed 1us, the target hardware might
see the second access back to back with the first one.

Forcing the posted write to complete almost certainly involves reading
back from exactly the same physical address (possibly after some sync
instruction(s)).

David



RE: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-03 Thread David Laight
From: Byeoungwook Kim
> Sent: 03 February 2016 02:00
> Conditional codes in rtl_addr_delay() were improved in readability and
> performance by using switch codes.

I'd like to see the performance data :-)

> diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
> b/drivers/net/wireless/realtek/rtlwifi/core.c
> index 4ae421e..05f432c 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/core.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/core.c
> @@ -37,18 +37,26 @@
> 
>  void rtl_addr_delay(u32 addr)
>  {
> - if (addr == 0xfe)
> + switch (addr) {
> + case 0xfe:
>   mdelay(50);
> - else if (addr == 0xfd)
> + break;
> + case 0xfd:
>   mdelay(5);
> - else if (addr == 0xfc)
> + break;
> + case 0xfc:
>   mdelay(1);
> - else if (addr == 0xfb)
> + break;
> + case 0xfb:
>   udelay(50);
> - else if (addr == 0xfa)
> + break;
> + case 0xfa:
>   udelay(5);
> - else if (addr == 0xf9)
> + break;
> + case 0xf9:
>   udelay(1);
> + break;
> + };

Straight 'performance' can't matter here, not with mdelay(50)!
The most likely effect is from speeding up the 'don't delay' path
and reducing the number of conditionals (and hence accuracy of) udelay(1).
Reversing the if-chain might be better still.

David



Re: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-03 Thread ByeoungWook Kim
Hi David,

2016-02-03 23:41 GMT+09:00 David Laight :
> From: Byeoungwook Kim
>> Sent: 03 February 2016 02:00
>> Conditional codes in rtl_addr_delay() were improved in readability and
>> performance by using switch codes.
>> ...
>>  void rtl_addr_delay(u32 addr)
>>  {
>> - if (addr == 0xfe)
>> + switch (addr) {
>> + case 0xfe:
>>   mdelay(50);
>> - else if (addr == 0xfd)
>> + break;
>> + case 0xfd:
>>   mdelay(5);
>> - else if (addr == 0xfc)
>> + break;
>> + case 0xfc:
>>   mdelay(1);
>> - else if (addr == 0xfb)
>> + break;
>> + case 0xfb:
>>   udelay(50);
>> - else if (addr == 0xfa)
>> + break;
>> + case 0xfa:
>>   udelay(5);
>> - else if (addr == 0xf9)
>> + break;
>> + case 0xf9:
>>   udelay(1);
>> + break;
>> + };
>
> Straight 'performance' can't matter here, not with mdelay(50)!
> The most likely effect is from speeding up the 'don't delay' path
> and reducing the number of conditionals (and hence accuracy of) udelay(1).
> Reversing the if-chain might be better still.
>

I agree with your assists about "The most likely effect is from
speeding up the 'don't delay' path and reducing the number of
conditionals (and hence accuracy of) udelay(1).".

I converted to assembly codes like next line from conditionals.

---

  if (addr == 0xf9)
00951445  cmpdword ptr [addr],0F9h
0095144C  jnemain+35h (0951455h)
  a();
0095144E  calla (09510EBh)
00951453  jmpmain+83h (09514A3h)
  else if (addr == 0xfa)
00951455  cmpdword ptr [addr],0FAh
0095145C  jnemain+45h (0951465h)
  a();
0095145E  calla (09510EBh)
00951463  jmpmain+83h (09514A3h)
  else if (addr == 0xfb)
00951465  cmpdword ptr [addr],0FBh
0095146C  jnemain+55h (0951475h)
  a();
0095146E  calla (09510EBh)
00951473  jmpmain+83h (09514A3h)
  else if (addr == 0xfc)
00951475  cmpdword ptr [addr],0FCh
0095147C  jnemain+65h (0951485h)
  b();
0095147E  callb (09510E6h)
00951483  jmpmain+83h (09514A3h)
  else if (addr == 0xfd)
00951485  cmpdword ptr [addr],0FDh
0095148C  jnemain+75h (0951495h)
  b();
0095148E  callb (09510E6h)
00951493  jmpmain+83h (09514A3h)
  else if (addr == 0xfe)
00951495  cmpdword ptr [addr],0FEh
0095149C  jnemain+83h (09514A3h)
  b();
0095149E  callb (09510E6h)

---

if the addr value was 0xfe, Big-O-notation is O(1).
but if the addr value was 0xf9, Big-O-notation is O(n).

2016-02-03 23:41 GMT+09:00 David Laight :
> From: Byeoungwook Kim
>> Sent: 03 February 2016 02:00
>> Conditional codes in rtl_addr_delay() were improved in readability and
>> performance by using switch codes.
>
> I'd like to see the performance data :-)

I used switch codes to solve of this problem.

If the addr variable was increment consecutive, switch codes is able
to use branch table for optimize.
so I converted to assembly codes like next line from same codes about my patch.

 switch (addr)
011C1445  moveax,dword ptr [addr]
011C1448  movdword ptr [ebp-0D0h],eax
011C144E  movecx,dword ptr [ebp-0D0h]
011C1454  subecx,0F9h
011C145A  movdword ptr [ebp-0D0h],ecx
011C1460  cmpdword ptr [ebp-0D0h],5
011C1467  ja  $LN6+28h (011C149Eh)
011C1469  movedx,dword ptr [ebp-0D0h]
011C146F  jmpdword ptr [edx*4+11C14B4h]
  {
  case 0xf9: a(); break;
011C1476  calla (011C10EBh)
011C147B  jmp$LN6+28h (011C149Eh)
  case 0xfa: a(); break;
011C147D  calla (011C10EBh)
011C1482  jmp$LN6+28h (011C149Eh)
  case 0xfb: a(); break;
011C1484  calla (011C10EBh)
011C1489  jmp$LN6+28h (011C149Eh)
  case 0xfc: b(); break;
011C148B  callb (011C10E6h)
011C1490  jmp$LN6+28h (011C149Eh)
  case 0xfd: b(); break;
011C1492  callb (011C10E6h)
011C1497  jmp$LN6+28h (011C149Eh)
  case 0xfe: b(); break;
011C1499  callb (011C10E6h)
  }

===[[branch table]]===
011C14B4 011C1476h
011C14B8 011C147Dh
011C14BC 011C1484h
011C14C0 011C148Bh
011C14C4 011C1492h
011C14C8 011C1499h

So conditional codes into rtl_addr_delay() can improve to readability
and performance that used switch codes.

Regards,
Byeoungwook.


Re: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-03 Thread Larry Finger

On 02/03/2016 11:49 AM, ByeoungWook Kim wrote:

Hi David,

2016-02-03 23:41 GMT+09:00 David Laight :

From: Byeoungwook Kim

Sent: 03 February 2016 02:00
Conditional codes in rtl_addr_delay() were improved in readability and
performance by using switch codes.
...
  void rtl_addr_delay(u32 addr)
  {
- if (addr == 0xfe)
+ switch (addr) {
+ case 0xfe:
   mdelay(50);
- else if (addr == 0xfd)
+ break;
+ case 0xfd:
   mdelay(5);
- else if (addr == 0xfc)
+ break;
+ case 0xfc:
   mdelay(1);
- else if (addr == 0xfb)
+ break;
+ case 0xfb:
   udelay(50);
- else if (addr == 0xfa)
+ break;
+ case 0xfa:
   udelay(5);
- else if (addr == 0xf9)
+ break;
+ case 0xf9:
   udelay(1);
+ break;
+ };


Straight 'performance' can't matter here, not with mdelay(50)!
The most likely effect is from speeding up the 'don't delay' path
and reducing the number of conditionals (and hence accuracy of) udelay(1).
Reversing the if-chain might be better still.



I agree with your assists about "The most likely effect is from
speeding up the 'don't delay' path and reducing the number of
conditionals (and hence accuracy of) udelay(1).".

I converted to assembly codes like next line from conditionals.

---

   if (addr == 0xf9)
00951445  cmpdword ptr [addr],0F9h
0095144C  jnemain+35h (0951455h)
   a();
0095144E  calla (09510EBh)
00951453  jmpmain+83h (09514A3h)
   else if (addr == 0xfa)
00951455  cmpdword ptr [addr],0FAh
0095145C  jnemain+45h (0951465h)
   a();
0095145E  calla (09510EBh)
00951463  jmpmain+83h (09514A3h)
   else if (addr == 0xfb)
00951465  cmpdword ptr [addr],0FBh
0095146C  jnemain+55h (0951475h)
   a();
0095146E  calla (09510EBh)
00951473  jmpmain+83h (09514A3h)
   else if (addr == 0xfc)
00951475  cmpdword ptr [addr],0FCh
0095147C  jnemain+65h (0951485h)
   b();
0095147E  callb (09510E6h)
00951483  jmpmain+83h (09514A3h)
   else if (addr == 0xfd)
00951485  cmpdword ptr [addr],0FDh
0095148C  jnemain+75h (0951495h)
   b();
0095148E  callb (09510E6h)
00951493  jmpmain+83h (09514A3h)
   else if (addr == 0xfe)
00951495  cmpdword ptr [addr],0FEh
0095149C  jnemain+83h (09514A3h)
   b();
0095149E  callb (09510E6h)

---

if the addr value was 0xfe, Big-O-notation is O(1).
but if the addr value was 0xf9, Big-O-notation is O(n).

2016-02-03 23:41 GMT+09:00 David Laight :

From: Byeoungwook Kim

Sent: 03 February 2016 02:00
Conditional codes in rtl_addr_delay() were improved in readability and
performance by using switch codes.


I'd like to see the performance data :-)


I used switch codes to solve of this problem.

If the addr variable was increment consecutive, switch codes is able
to use branch table for optimize.
so I converted to assembly codes like next line from same codes about my patch.

  switch (addr)
011C1445  moveax,dword ptr [addr]
011C1448  movdword ptr [ebp-0D0h],eax
011C144E  movecx,dword ptr [ebp-0D0h]
011C1454  subecx,0F9h
011C145A  movdword ptr [ebp-0D0h],ecx
011C1460  cmpdword ptr [ebp-0D0h],5
011C1467  ja  $LN6+28h (011C149Eh)
011C1469  movedx,dword ptr [ebp-0D0h]
011C146F  jmpdword ptr [edx*4+11C14B4h]
   {
   case 0xf9: a(); break;
011C1476  calla (011C10EBh)
011C147B  jmp$LN6+28h (011C149Eh)
   case 0xfa: a(); break;
011C147D  calla (011C10EBh)
011C1482  jmp$LN6+28h (011C149Eh)
   case 0xfb: a(); break;
011C1484  calla (011C10EBh)
011C1489  jmp$LN6+28h (011C149Eh)
   case 0xfc: b(); break;
011C148B  callb (011C10E6h)
011C1490  jmp$LN6+28h (011C149Eh)
   case 0xfd: b(); break;
011C1492  callb (011C10E6h)
011C1497  jmp$LN6+28h (011C149Eh)
   case 0xfe: b(); break;
011C1499  callb (011C10E6h)
   }

===[[branch table]]===
011C14B4 011C1476h
011C14B8 011C147Dh
011C14BC 011C1484h
011C14C0 011C148Bh
011C14C4 011C1492h
011C14C8 011C1499h

So conditional codes into rtl_addr_delay() can improve to readability
and performance that used switch codes.


My advice is that you relax. I was out of my office for a day and a half, and I 
return to find my inbox full of this topic. The discussion is OK, but submitting 
3 versions of a patch before I (the maintainer) even have a chance to read the 
original submission. When resubmitting a new version of a multi-patch set, every 
member of that set should be resubmitted with the new version even though a 
particular member has not changed. This convention makes it easier for the 
maintainer to keep track of the changes. In addition, all patches are 

Re: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-02 Thread Julian Calaby
Hi Byeounwook,

On Wed, Feb 3, 2016 at 12:59 PM, Byeoungwook Kim  wrote:
> Conditional codes in rtl_addr_delay() were improved in readability and
> performance by using switch codes.
>
> Signed-off-by: Byeoungwook Kim 
> Reported-by: Julian Calaby 

Reviewed-by: Julian Calaby 

Thanks,

Julian Calaby


> ---
>  drivers/net/wireless/realtek/rtlwifi/core.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
> b/drivers/net/wireless/realtek/rtlwifi/core.c
> index 4ae421e..05f432c 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/core.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/core.c
> @@ -37,18 +37,26 @@
>
>  void rtl_addr_delay(u32 addr)
>  {
> -   if (addr == 0xfe)
> +   switch (addr) {
> +   case 0xfe:
> mdelay(50);
> -   else if (addr == 0xfd)
> +   break;
> +   case 0xfd:
> mdelay(5);
> -   else if (addr == 0xfc)
> +   break;
> +   case 0xfc:
> mdelay(1);
> -   else if (addr == 0xfb)
> +   break;
> +   case 0xfb:
> udelay(50);
> -   else if (addr == 0xfa)
> +   break;
> +   case 0xfa:
> udelay(5);
> -   else if (addr == 0xf9)
> +   break;
> +   case 0xf9:
> udelay(1);
> +   break;
> +   };
>  }
>  EXPORT_SYMBOL(rtl_addr_delay);
>
> --
> 2.5.0
>



-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/