Re: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-16 Thread David Miller
From: David Laight 
Date: Thu, 16 Mar 2017 12:12:06 +

> From: Shannon Nelson
>> Sent: 16 March 2017 00:18
>> To: David Laight; net...@vger.kernel.org; da...@davemloft.net
>> On 3/15/2017 1:50 AM, David Laight wrote:
>> > From: Shannon Nelson
>> >> Sent: 14 March 2017 17:25
>> > ...
>> >> + if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
>> >> + dev->stats.multicast++;
>> >
>> > I'd guess that:
>> >dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
>> > generates faster code.
>> > Especially if is_multicast_ether_addr(x) is (*x >> 7).
> 
> I'd clearly got brain-fade there, mcast bit is the first transmitted bit
> (on ethernet) but the bytes are sent LSB first (like async).
>> >David
>> 
>> Hi David, thanks for the comment.  My local instruction level
>> performance guru is on vacation this week so I can't do a quick check
>> with him today on this.  However, I"m not too worried here since the
>> inline code for is_multicast_ether_addr() is simply
>> 
>>  return 0x01 & addr[0];
>> 
>> and objdump tells me that on sparc it compiles down to a simple single
>> byte load and compare:
>> 
>>  325c:   c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
>>  3260:   80 88 60 01 btst  1, %g1
>>  3264:   32 60 00 b3 bne,a,pn   %xcc, 3530 
>>  3268:   c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
>>  dev->stats.multicast++;
> 
> Followed by a branch that might be marked 'assume taken' so the
> normal path takes the branch.

The branch is predicted not taken, so the fallthrough happens most
often.  And this is optimal for most Niagara parts as taken branches
make the cpu thread yield whereas non-taken branches do not.

But this is such a petty thing to be discussing compared to the substance
of this person's changes.  David, I really wish you wouldn't waste people's
time with this stuff.

Maybe if you had to review hundreds of networking patches every day like
I do, you would start to understand the costs of the interference you
place into the review process when you bring up such small matters like
this all the time.

I'd much rather you review the substance of a person's changes,
because that actually helps things more forward.  If you want to micro
optimize then _do it on your own time_, submit patches that do the
micro optimization, and have it go through the review process like
everyone else's changes.

I very much appreciate your cooperation on this matter.

Thanks.


Re: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-16 Thread David Miller
From: David Laight 
Date: Thu, 16 Mar 2017 12:12:06 +

> From: Shannon Nelson
>> Sent: 16 March 2017 00:18
>> To: David Laight; net...@vger.kernel.org; da...@davemloft.net
>> On 3/15/2017 1:50 AM, David Laight wrote:
>> > From: Shannon Nelson
>> >> Sent: 14 March 2017 17:25
>> > ...
>> >> + if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
>> >> + dev->stats.multicast++;
>> >
>> > I'd guess that:
>> >dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
>> > generates faster code.
>> > Especially if is_multicast_ether_addr(x) is (*x >> 7).
> 
> I'd clearly got brain-fade there, mcast bit is the first transmitted bit
> (on ethernet) but the bytes are sent LSB first (like async).
>> >David
>> 
>> Hi David, thanks for the comment.  My local instruction level
>> performance guru is on vacation this week so I can't do a quick check
>> with him today on this.  However, I"m not too worried here since the
>> inline code for is_multicast_ether_addr() is simply
>> 
>>  return 0x01 & addr[0];
>> 
>> and objdump tells me that on sparc it compiles down to a simple single
>> byte load and compare:
>> 
>>  325c:   c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
>>  3260:   80 88 60 01 btst  1, %g1
>>  3264:   32 60 00 b3 bne,a,pn   %xcc, 3530 
>>  3268:   c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
>>  dev->stats.multicast++;
> 
> Followed by a branch that might be marked 'assume taken' so the
> normal path takes the branch.

The branch is predicted not taken, so the fallthrough happens most
often.  And this is optimal for most Niagara parts as taken branches
make the cpu thread yield whereas non-taken branches do not.

But this is such a petty thing to be discussing compared to the substance
of this person's changes.  David, I really wish you wouldn't waste people's
time with this stuff.

Maybe if you had to review hundreds of networking patches every day like
I do, you would start to understand the costs of the interference you
place into the review process when you bring up such small matters like
this all the time.

I'd much rather you review the substance of a person's changes,
because that actually helps things more forward.  If you want to micro
optimize then _do it on your own time_, submit patches that do the
micro optimization, and have it go through the review process like
everyone else's changes.

I very much appreciate your cooperation on this matter.

Thanks.


RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-16 Thread David Laight
From: Shannon Nelson
> Sent: 16 March 2017 00:18
> To: David Laight; net...@vger.kernel.org; da...@davemloft.net
> On 3/15/2017 1:50 AM, David Laight wrote:
> > From: Shannon Nelson
> >> Sent: 14 March 2017 17:25
> > ...
> >> +  if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> >> +  dev->stats.multicast++;
> >
> > I'd guess that:
> > dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
> > generates faster code.
> > Especially if is_multicast_ether_addr(x) is (*x >> 7).

I'd clearly got brain-fade there, mcast bit is the first transmitted bit
(on ethernet) but the bytes are sent LSB first (like async).
> > David
> 
> Hi David, thanks for the comment.  My local instruction level
> performance guru is on vacation this week so I can't do a quick check
> with him today on this.  However, I"m not too worried here since the
> inline code for is_multicast_ether_addr() is simply
> 
>   return 0x01 & addr[0];
> 
> and objdump tells me that on sparc it compiles down to a simple single
> byte load and compare:
> 
>  325c:c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
>  3260:80 88 60 01 btst  1, %g1
>  3264:32 60 00 b3 bne,a,pn   %xcc, 3530 
>  3268:c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
>   dev->stats.multicast++;

Followed by a branch that might be marked 'assume taken' so the
normal path takes the branch.
I guess that is followed by 'add 1 to %g1', 'stx %g1, [ %l1 + 0x168 ]'
and a branch to 3530.
GCC must be using that condition to generate get the bottom of a loop
to 'fallthrough' to its top!

My version should generate something like:
ldub  [ %g2 + %g3 ], %g1
ldx   [ %l1 + 0x168 ], %g2
and   1, %g1
add   %g1, %g2, %g2
stx   %g2, [ %l1 + 0x168 ]
While this looks like 5 instructions (rather than 2) it has fewer pipeline
stalls and can be 'spread out' into the surrounding lines of code to
reduce the stalls further.

> I don't think this driver will ever be used on anything bug sparc, so
> I'm not worried about how x86 might compile this.

On x86 gcc is likely to ignore the 'unlikely' and generate a forwards
(predicted not taken) branch around the increment.
I've had to but asm comments in the else part of conditionals like
that to force gcc to generate a forwards jump to the 'unlikely' statements.

David





RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-16 Thread David Laight
From: Shannon Nelson
> Sent: 16 March 2017 00:18
> To: David Laight; net...@vger.kernel.org; da...@davemloft.net
> On 3/15/2017 1:50 AM, David Laight wrote:
> > From: Shannon Nelson
> >> Sent: 14 March 2017 17:25
> > ...
> >> +  if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> >> +  dev->stats.multicast++;
> >
> > I'd guess that:
> > dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
> > generates faster code.
> > Especially if is_multicast_ether_addr(x) is (*x >> 7).

I'd clearly got brain-fade there, mcast bit is the first transmitted bit
(on ethernet) but the bytes are sent LSB first (like async).
> > David
> 
> Hi David, thanks for the comment.  My local instruction level
> performance guru is on vacation this week so I can't do a quick check
> with him today on this.  However, I"m not too worried here since the
> inline code for is_multicast_ether_addr() is simply
> 
>   return 0x01 & addr[0];
> 
> and objdump tells me that on sparc it compiles down to a simple single
> byte load and compare:
> 
>  325c:c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
>  3260:80 88 60 01 btst  1, %g1
>  3264:32 60 00 b3 bne,a,pn   %xcc, 3530 
>  3268:c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
>   dev->stats.multicast++;

Followed by a branch that might be marked 'assume taken' so the
normal path takes the branch.
I guess that is followed by 'add 1 to %g1', 'stx %g1, [ %l1 + 0x168 ]'
and a branch to 3530.
GCC must be using that condition to generate get the bottom of a loop
to 'fallthrough' to its top!

My version should generate something like:
ldub  [ %g2 + %g3 ], %g1
ldx   [ %l1 + 0x168 ], %g2
and   1, %g1
add   %g1, %g2, %g2
stx   %g2, [ %l1 + 0x168 ]
While this looks like 5 instructions (rather than 2) it has fewer pipeline
stalls and can be 'spread out' into the surrounding lines of code to
reduce the stalls further.

> I don't think this driver will ever be used on anything bug sparc, so
> I'm not worried about how x86 might compile this.

On x86 gcc is likely to ignore the 'unlikely' and generate a forwards
(predicted not taken) branch around the increment.
I've had to but asm comments in the else part of conditionals like
that to force gcc to generate a forwards jump to the 'unlikely' statements.

David





Re: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-15 Thread Shannon Nelson

On 3/15/2017 1:50 AM, David Laight wrote:

From: Shannon Nelson

Sent: 14 March 2017 17:25

...

+   if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
+   dev->stats.multicast++;


I'd guess that:
dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
generates faster code.
Especially if is_multicast_ether_addr(x) is (*x >> 7).

David


Hi David, thanks for the comment.  My local instruction level 
performance guru is on vacation this week so I can't do a quick check 
with him today on this.  However, I"m not too worried here since the 
inline code for is_multicast_ether_addr() is simply


return 0x01 & addr[0];

and objdump tells me that on sparc it compiles down to a simple single 
byte load and compare:


325c:   c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
3260:   80 88 60 01 btst  1, %g1
3264:   32 60 00 b3 bne,a,pn   %xcc, 3530 
3268:   c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
dev->stats.multicast++;

I don't think this driver will ever be used on anything bug sparc, so 
I'm not worried about how x86 might compile this.


sln



Re: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-15 Thread Shannon Nelson

On 3/15/2017 1:50 AM, David Laight wrote:

From: Shannon Nelson

Sent: 14 March 2017 17:25

...

+   if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
+   dev->stats.multicast++;


I'd guess that:
dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
generates faster code.
Especially if is_multicast_ether_addr(x) is (*x >> 7).

David


Hi David, thanks for the comment.  My local instruction level 
performance guru is on vacation this week so I can't do a quick check 
with him today on this.  However, I"m not too worried here since the 
inline code for is_multicast_ether_addr() is simply


return 0x01 & addr[0];

and objdump tells me that on sparc it compiles down to a simple single 
byte load and compare:


325c:   c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
3260:   80 88 60 01 btst  1, %g1
3264:   32 60 00 b3 bne,a,pn   %xcc, 3530 
3268:   c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
dev->stats.multicast++;

I don't think this driver will ever be used on anything bug sparc, so 
I'm not worried about how x86 might compile this.


sln



RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-15 Thread David Laight
From: Shannon Nelson
> Sent: 14 March 2017 17:25
...
> + if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> + dev->stats.multicast++;

I'd guess that:
dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
generates faster code.
Especially if is_multicast_ether_addr(x) is (*x >> 7).

David



RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-15 Thread David Laight
From: Shannon Nelson
> Sent: 14 March 2017 17:25
...
> + if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> + dev->stats.multicast++;

I'd guess that:
dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
generates faster code.
Especially if is_multicast_ether_addr(x) is (*x >> 7).

David