RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 06:17 -0700, Eric Dumazet wrote: > On Mon, 2014-03-24 at 10:30 +, David Laight wrote: > > ip_fast_csum() either needs an explicit "m" constraint for the actual > > buffer (and target) bytes, or the stronger "memory" constraint. > > The 'volatile' is then not needed. I

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 10:30 +, David Laight wrote: > From: Eric Dumazet > > On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > > > From: Eric Dumazet > > > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > > > > > It looks like a barrier() would be more appropriate. > > > > > > barrier() ==

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread David Laight
From: Eric Dumazet > On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > > From: Eric Dumazet > > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > > > It looks like a barrier() would be more appropriate. > > > > barrier() == __asm__ __volatile__(:::"memory") > > Indeed, but now you mention it,

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread David Laight
From: Eric Dumazet On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: From: Eric Dumazet eric.duma...@gmail.com Date: Fri, 21 Mar 2014 05:50:50 -0700 It looks like a barrier() would be more appropriate. barrier() == __asm__ __volatile__(:::memory) Indeed, but now you mention

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 10:30 +, David Laight wrote: From: Eric Dumazet On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: From: Eric Dumazet eric.duma...@gmail.com Date: Fri, 21 Mar 2014 05:50:50 -0700 It looks like a barrier() would be more appropriate. barrier() ==

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 06:17 -0700, Eric Dumazet wrote: On Mon, 2014-03-24 at 10:30 +, David Laight wrote: ip_fast_csum() either needs an explicit m constraint for the actual buffer (and target) bytes, or the stronger memory constraint. The 'volatile' is then not needed. I am testing

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-23 Thread Eric Dumazet
On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > From: Eric Dumazet > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > It looks like a barrier() would be more appropriate. > > barrier() == __asm__ __volatile__(:::"memory") Indeed, but now you mention it, ip_fast_csum() do not uses volatile

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-23 Thread Eric Dumazet
On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: From: Eric Dumazet eric.duma...@gmail.com Date: Fri, 21 Mar 2014 05:50:50 -0700 It looks like a barrier() would be more appropriate. barrier() == __asm__ __volatile__(:::memory) Indeed, but now you mention it, ip_fast_csum() do not

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Miller
From: Eric Dumazet Date: Fri, 21 Mar 2014 05:50:50 -0700 > It looks like a barrier() would be more appropriate. barrier() == __asm__ __volatile__(:::"memory") -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Laight
From: Eric Dumazet > On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > > Eric Dumazet writes: > > > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > > is insane... > > > > > > Couldn't it just be the cache miss? > > Or the fact that we mix 16 bit stores and

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:47 -0700, Eric Dumazet wrote: > Another idea would be to move the ip_fast_csum() call at the end of > inet_gro_complete() > > I'll try this : > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 8c54870db792..0ca8f350a532 100644 > --- a/net/ipv4/af_inet.c >

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:32 -0700, Eric Dumazet wrote: > On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: > > > Or the fact that we mix 16 bit stores and 32bit loads ? > > > > iph->tot_len = newlen; > > iph->check = 0; > > iph->check = ip_fast_csum(iph, 5); > > Yep definitely. Using 16

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: > Or the fact that we mix 16 bit stores and 32bit loads ? > > iph->tot_len = newlen; > iph->check = 0; > iph->check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads in ip_fast_csum() totally removes the stall. I no longer see

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Andi Kleen
On Fri, Mar 21, 2014 at 05:50:50AM -0700, Eric Dumazet wrote: > On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > > Eric Dumazet writes: > > > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > > is insane... > > > > > > Couldn't it just be the cache miss? >

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > Eric Dumazet writes: > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > is insane... > > > Couldn't it just be the cache miss? Or the fact that we mix 16 bit stores and 32bit loads ? iph->tot_len = newlen;

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: Eric Dumazet eric.duma...@gmail.com writes: I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that is insane... Couldn't it just be the cache miss? Or the fact that we mix 16 bit stores and 32bit loads ?

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Andi Kleen
On Fri, Mar 21, 2014 at 05:50:50AM -0700, Eric Dumazet wrote: On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: Eric Dumazet eric.duma...@gmail.com writes: I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that is insane... Couldn't it just be the cache

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: Or the fact that we mix 16 bit stores and 32bit loads ? iph-tot_len = newlen; iph-check = 0; iph-check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads in ip_fast_csum() totally removes the stall. I no longer see

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:32 -0700, Eric Dumazet wrote: On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: Or the fact that we mix 16 bit stores and 32bit loads ? iph-tot_len = newlen; iph-check = 0; iph-check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads in

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:47 -0700, Eric Dumazet wrote: Another idea would be to move the ip_fast_csum() call at the end of inet_gro_complete() I'll try this : diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8c54870db792..0ca8f350a532 100644 --- a/net/ipv4/af_inet.c +++

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Laight
From: Eric Dumazet On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: Eric Dumazet eric.duma...@gmail.com writes: I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that is insane... Couldn't it just be the cache miss? Or the fact that we mix 16 bit stores

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com Date: Fri, 21 Mar 2014 05:50:50 -0700 It looks like a barrier() would be more appropriate. barrier() == __asm__ __volatile__(:::memory) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to