Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 13:29 +0100, Mel Gorman wrote:
> Do we really switch more though?
> 
> Look at the difference in interrupts vs context switch. IPIs are an interrupt
> so if TTWU_QUEUE wakes process B using an IPI, does that count as a context
> switch?

Nope. Nor would it for NO_TTWU_QUEUE. A process waking another is just
that, a wakeup.

A context switch is when we stop running a process and start running
another. A wakeup can lead to us deciding the newly woken task is a
better task to run, however it's not a given.
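
For reference, the whole thing TTWU_QUEUE controls is roughly this,
paraphrased from memory of the 3.x kernel/sched/core.c (a sketch, not the
exact source):

	static void ttwu_queue(struct task_struct *p, int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		if (sched_feat(TTWU_QUEUE) &&
		    !cpus_share_cache(smp_processor_id(), cpu)) {
			/* defer: put p on cpu's wake_list, kick it with an IPI */
			ttwu_queue_remote(p, cpu);
			return;
		}

		/* otherwise the waker takes the (possibly remote) rq->lock itself */
		raw_spin_lock(&rq->lock);
		ttwu_do_activate(rq, p, 0);
		raw_spin_unlock(&rq->lock);
	}

Either way the wakee only gets queued; whether a context switch follows is
decided later, by the preemption check and the next schedule().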



Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-10 Thread Mike Galbraith
On Wed, 2012-10-10 at 13:29 +0100, Mel Gorman wrote: 
> On Wed, Oct 03, 2012 at 03:30:01PM +0200, Mike Galbraith wrote:

> > nohz=off, pipe-test with one half pinned to CPU0, the other to CPU1.
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu-----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo     in      cs us sy id wa st
> > TTWU_QUEUE
> >  1  0      0 3039488  50948 444720    0    0     0     0 539724 1013417  1 15 84  0  0
> >  1  0      0 3039488  50956 444720    0    0     0     1 540853 1015679  1 15 84  0  0
> >  1  0      0 3039364  50956 444720    0    0     0     0 541630 1017239  1 16 83  0  0
> >  2  0      0 3038992  50956 444720    0    0     0     0 335550 1096569  4 20 76  0  0
> > NO_TTWU_QUEUE
> >  1  0      0 3038992  50956 444720    0    0     0     0  33100 1318984  1 27 71  0  0
> >  1  0      0 3038868  50956 444720    0    0     0     0  33100 1319126  2 27 71  0  0
> >  1  0      0 3038868  50956 444720    0    0     0     0  33097 1317968  1 27 72  0  0
> >  2  0      0 3038868  50964 444720    0    0     0     1  33104 1318558  2 27 71  0  0
> > 
> > We can switch faster with NO_TTWU_QUEUE, so we switch more, and that
> > hurts netperf UDP_STREAM throughput.. somehow.  "Fatter is better" is not
> > the way context-switch-happy benchmarks usually work.
> > 
> 
> Do we really switch more though?

Yup, pipe-test measures the full round trip, and agrees with vmstat.
netperf TCP_RR agrees, tbench agrees... this STREAM thingy is the only
high frequency switcher I've seen go all weird like this.
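
For anyone wanting to play along, pipe-test is just two pinned tasks
ping-ponging a byte over a pair of pipes, so every round trip is two wakeups
and two switches.  A minimal stand-in (hypothetical, not the actual tool)
looks something like:

	/* cross-CPU pipe ping-pong: parent on CPU0, child on CPU1 */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <unistd.h>

	static void pin(int cpu)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		sched_setaffinity(0, sizeof(set), &set);
	}

	int main(void)
	{
		int ab[2], ba[2];
		char c = 0;
		long i, loops = 1000000;

		if (pipe(ab) || pipe(ba))
			return 1;

		if (fork() == 0) {
			pin(1);			/* child echoes on CPU1 */
			for (i = 0; i < loops; i++) {
				if (read(ab[0], &c, 1) != 1 || write(ba[1], &c, 1) != 1)
					break;
			}
			_exit(0);
		}

		pin(0);				/* parent drives from CPU0 */
		for (i = 0; i < loops; i++) {
			write(ab[1], &c, 1);
			read(ba[0], &c, 1);
		}
		printf("%ld round trips\n", loops);
		return 0;
	}

Run vmstat alongside it and the cs column should agree with what the tool
itself reports.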

(hm, wonder how fast box can blast IPIs.. 1MHz?)

-Mike



Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-10 Thread Mel Gorman
On Wed, Oct 03, 2012 at 03:30:01PM +0200, Mike Galbraith wrote:
> On Wed, 2012-10-03 at 10:13 +0200, Mike Galbraith wrote:
> 
> > Watching all cores instead. 
> > 
> > switch rate ~890KHz                           switch rate ~570KHz
> > NO_TTWU_QUEUE nohz=off                        TTWU_QUEUE nohz=off
> > 5.38%  [kernel]  [k] __schedule                4.81%  [kernel]  [k] _raw_spin_lock_irqsave
> > 4.29%  [kernel]  [k] _raw_spin_lock_irqsave    3.36%  [kernel]  [k] __skb_recv_datagram
> > 2.88%  [kernel]  [k] resched_task              2.71%  [kernel]  [k] copy_user_generic_string
> > 2.60%  [kernel]  [k] copy_user_generic_string  2.67%  [kernel]  [k] reschedule_interrupt
> > 2.38%  [kernel]  [k] __switch_to               2.62%  [kernel]  [k] sock_alloc_send_pskb
> > 2.15%  [kernel]  [k] sock_alloc_send_pskb      2.52%  [kernel]  [k] __schedule
> > 2.08%  [kernel]  [k] __skb_recv_datagram       2.31%  [kernel]  [k] try_to_wake_up
> > 1.81%  [kernel]  [k] udp_sendmsg               2.14%  [kernel]  [k] system_call
> > 1.76%  [kernel]  [k] system_call               1.98%  [kernel]  [k] udp_sendmsg
> > 1.73%  [kernel]  [k] __udp4_lib_lookup         1.96%  [kernel]  [k] __udp4_lib_lookup
> > 1.65%  [kernel]  [k] __slab_free.isra.42       1.78%  [kernel]  [k] sock_def_readable
> > 1.62%  [kernel]  [k] try_to_wake_up            1.63%  [kernel]  [k] __slab_free.isra.42
> > 1.43%  [kernel]  [k] update_rq_clock           1.60%  [kernel]  [k] __switch_to
> > 1.43%  [kernel]  [k] sock_def_readable         1.52%  [kernel]  [k] dma_issue_pending_all
> > 1.41%  [kernel]  [k] dma_issue_pending_all     1.48%  [kernel]  [k] __ip_append_data.isra.35
> > 1.40%  [kernel]  [k] menu_select               1.44%  [kernel]  [k] _raw_spin_lock
> > 1.36%  [kernel]  [k] finish_task_switch        1.38%  [kernel]  [k] _raw_spin_unlock_irqrestore
> > 1.30%  [kernel]  [k] ksize                     1.33%  [kernel]  [k] __udp4_lib_rcv
> > 
> > Strange.
> 
> nohz=off, pipe-test with one half pinned to CPU0, the other to CPU1.
> 
> procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo     in      cs us sy id wa st
> TTWU_QUEUE
>  1  0      0 3039488  50948 444720    0    0     0     0 539724 1013417  1 15 84  0  0
>  1  0      0 3039488  50956 444720    0    0     0     1 540853 1015679  1 15 84  0  0
>  1  0      0 3039364  50956 444720    0    0     0     0 541630 1017239  1 16 83  0  0
>  2  0      0 3038992  50956 444720    0    0     0     0 335550 1096569  4 20 76  0  0
> NO_TTWU_QUEUE
>  1  0      0 3038992  50956 444720    0    0     0     0  33100 1318984  1 27 71  0  0
>  1  0      0 3038868  50956 444720    0    0     0     0  33100 1319126  2 27 71  0  0
>  1  0      0 3038868  50956 444720    0    0     0     0  33097 1317968  1 27 72  0  0
>  2  0      0 3038868  50964 444720    0    0     0     1  33104 1318558  2 27 71  0  0
> 
> We can switch faster with NO_TTWU_QUEUE, so we switch more, and that
> hurts netperf UDP_STREAM throughput.. somehow.  "Fatter is better" is not
> the way context-switch-happy benchmarks usually work.
> 

Do we really switch more though?

Look at the difference in interrupts vs context switches. IPIs are interrupts,
so if TTWU_QUEUE wakes process B using an IPI, does that count as a context
switch? It probably does not get accounted as a context switch even though
it's functionally similar in this case, but I'd like to hear confirmation
of that.

If we assume that these IPIs are effectively context switches then look
at the TTWU_QUEUE figures. There are roughly 530K interrupts versus 33K
interrupts for NO_TTWU_QUEUE. If each of those IPIs is effectively a context
switch then the actual switch rates are 1.5M switches versus 1.3M switches
and TTWU_QUEUE is actually switching faster.
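
Back-of-envelope from the vmstat samples above, treating each reschedule IPI
as a deferred switch:

  TTWU_QUEUE:      ~540K in/s + ~1.01M cs/s ~= 1.55M/s
  NO_TTWU_QUEUE:    ~33K in/s + ~1.32M cs/s ~= 1.35M/s

which is where the 1.5M vs 1.3M figures come from.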

-- 
Mel Gorman
SUSE Labs


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-05 Thread Mel Gorman
On Wed, Oct 03, 2012 at 11:04:16AM -0700, Rick Jones wrote:
> On 10/03/2012 02:47 AM, Mel Gorman wrote:
> >On Tue, Oct 02, 2012 at 03:48:57PM -0700, Rick Jones wrote:
> >>On 10/02/2012 01:45 AM, Mel Gorman wrote:
> >>
> >>>SIZE=64
> >>>taskset -c 0 netserver
> >>>taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- 
> >>>-P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> >>
> >>Just FYI, unless you are running a hacked version of netperf, the
> >>"50" in "-i 50,6" will be silently truncated to 30.
> >>
> >
> >I'm not using a hacked version of netperf. The 50,6 has been there a long
> >time so I'm not sure where I took it from any more. It might have been an
> >older version or me being over-zealous at the time.
> 
> No version has ever gone past 30.  It has been that way since the
> confidence interval code was contributed.  It doesn't change
> anything, so it hasn't messed-up any results.  It would be good to
> fix but not critical.
> 

It's fixed already. I don't know where the 50 came from in that case.
I was probably thinking "many iterations" at the time without reading
the docs properly.
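
For what it's worth, my reading of the manual now is that -i max,min caps the
number of test iterations and -I conf,width asks for conf% confidence that the
result is within +/- width/2 percent, so the invocation above effectively
becomes

  netperf -t UDP_STREAM -i 30,6 -I 99,1 ...

i.e. between 6 and 30 iterations aiming for 99% confidence within +/- 0.5%.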

> >>PS - I trust it is the receive-side throughput being reported/used
> >>with UDP_STREAM :)
> >
> >Good question. Now that I examine the scripts, it is in fact the sending
> >side that is being reported which is flawed. Granted I'm not expecting any
> >UDP loss on loopback and looking through a range of results, the
> >difference is marginal. It's still wrong to report just the sending side
> >for UDP_STREAM and I'll correct the scripts for it in the future.
> 
> Switching from sending to receiving throughput in UDP_STREAM could
> be a non-trivial disconnect in throughputs.  As Eric mentions, the
> receiver could be dropping lots of datagrams if it cannot keep up,
> and netperf makes no attempt to provide any application-layer
> flow-control.
> 

I'll bear that in mind in the future if there is a sudden change in
throughput and determine if that is because of this send/receive disconnect
or something else. The raw data is recorded in either case so it should
be manageable.

> Not sure which version of netperf you are using to know whether or
> not it has gone to the "omni" code path.  If you aren't using 2.5.0
> or 2.6.0 then the confidence intervals will have been computed based
> on the receive side throughput, so you will at least know that it
> was stable, even if it wasn't the same as the sending side.
> 

I'm using 2.4.5 because there was "no reason" to upgrade. I'll take a
closer look at upgrading soon based on this comment because there is
a possibility that the confidence interval detection is a little
broken in the version I'm using.

> The top of trunk will use the remote's receive stats for the omni
> migration of a UDP_STREAM test too.  I think it is that way in 2.5.0
> and 2.6.0 as well but I've not gone into the repository to check.
> 
> Of course, that means you don't necessarily know that the sending
> throughput met your confidence intervals :)
> 

:)

> If you are on 2.5.0 or later, you may find:
> 
> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection
> 
> helpful when looking to parse results.
> 

Thanks.

> One more, little thing - taskset may indeed be better for what you
> are doing (it will happen "sooner" certainly), but there is also the
> global -T option to bind netperf/netserver to the specified CPU id. 
> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002dT_002c-Global-41
> 

I was aware of the option but had avoided using it as I think the option
didn't exist when I initially started working with taskset.

Thanks a lot Rick for the suggestions, they are very helpful.

-- 
Mel Gorman
SUSE Labs


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Rick Jones

On 10/03/2012 02:47 AM, Mel Gorman wrote:

On Tue, Oct 02, 2012 at 03:48:57PM -0700, Rick Jones wrote:

On 10/02/2012 01:45 AM, Mel Gorman wrote:


SIZE=64
taskset -c 0 netserver
taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 
15895 -s 32768 -S 32768 -m $SIZE -M $SIZE


Just FYI, unless you are running a hacked version of netperf, the
"50" in "-i 50,6" will be silently truncated to 30.



I'm not using a hacked version of netperf. The 50,6 has been there a long
time so I'm not sure where I took it from any more. It might have been an
older version or me being over-zealous at the time.


No version has ever gone past 30.  It has been that way since the 
confidence interval code was contributed.  It doesn't change anything, 
so it hasn't messed up any results.  It would be good to fix but not 
critical.



PS - I trust it is the receive-side throughput being reported/used
with UDP_STREAM :)


Good question. Now that I examine the scripts, it is in fact the sending
side that is being reported which is flawed. Granted I'm not expecting any
UDP loss on loopback and looking through a range of results, the
difference is marginal. It's still wrong to report just the sending side
for UDP_STREAM and I'll correct the scripts for it in the future.


Switching from sending to receiving throughput in UDP_STREAM could be a 
non-trivial disconnect in throughputs.  As Eric mentions, the receiver 
could be dropping lots of datagrams if it cannot keep up, and netperf 
makes no attempt to provide any application-layer flow-control.


Not sure which version of netperf you are using to know whether or not 
it has gone to the "omni" code path.  If you aren't using 2.5.0 or 2.6.0 
then the confidence intervals will have been computed based on the 
receive side throughput, so you will at least know that it was stable, 
even if it wasn't the same as the sending side.


The top of trunk will use the remote's receive stats for the omni 
migration of a UDP_STREAM test too.  I think it is that way in 2.5.0 and 
2.6.0 as well but I've not gone into the repository to check.


Of course, that means you don't necessarily know that the sending 
throughput met your confidence intervals :)


If you are on 2.5.0 or later, you may find:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

helpful when looking to parse results.

One more little thing - taskset may indeed be better for what you are 
doing (it will happen "sooner" certainly), but there is also the global 
-T option to bind netperf/netserver to the specified CPU id. 
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002dT_002c-Global-41
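
For example, something along the lines of

  netperf -T 1,0 -t UDP_STREAM -H 127.0.0.1 -- -m 64

should bind netperf to CPU 1 and netserver to CPU 0 without needing taskset.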


happy benchmarking,

rick jones


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Mike Galbraith
On Wed, 2012-10-03 at 10:13 +0200, Mike Galbraith wrote:

> Watching all cores instead. 
> 
> switch rate ~890KHz                           switch rate ~570KHz
> NO_TTWU_QUEUE nohz=off                        TTWU_QUEUE nohz=off
> 5.38%  [kernel]  [k] __schedule                4.81%  [kernel]  [k] _raw_spin_lock_irqsave
> 4.29%  [kernel]  [k] _raw_spin_lock_irqsave    3.36%  [kernel]  [k] __skb_recv_datagram
> 2.88%  [kernel]  [k] resched_task              2.71%  [kernel]  [k] copy_user_generic_string
> 2.60%  [kernel]  [k] copy_user_generic_string  2.67%  [kernel]  [k] reschedule_interrupt
> 2.38%  [kernel]  [k] __switch_to               2.62%  [kernel]  [k] sock_alloc_send_pskb
> 2.15%  [kernel]  [k] sock_alloc_send_pskb      2.52%  [kernel]  [k] __schedule
> 2.08%  [kernel]  [k] __skb_recv_datagram       2.31%  [kernel]  [k] try_to_wake_up
> 1.81%  [kernel]  [k] udp_sendmsg               2.14%  [kernel]  [k] system_call
> 1.76%  [kernel]  [k] system_call               1.98%  [kernel]  [k] udp_sendmsg
> 1.73%  [kernel]  [k] __udp4_lib_lookup         1.96%  [kernel]  [k] __udp4_lib_lookup
> 1.65%  [kernel]  [k] __slab_free.isra.42       1.78%  [kernel]  [k] sock_def_readable
> 1.62%  [kernel]  [k] try_to_wake_up            1.63%  [kernel]  [k] __slab_free.isra.42
> 1.43%  [kernel]  [k] update_rq_clock           1.60%  [kernel]  [k] __switch_to
> 1.43%  [kernel]  [k] sock_def_readable         1.52%  [kernel]  [k] dma_issue_pending_all
> 1.41%  [kernel]  [k] dma_issue_pending_all     1.48%  [kernel]  [k] __ip_append_data.isra.35
> 1.40%  [kernel]  [k] menu_select               1.44%  [kernel]  [k] _raw_spin_lock
> 1.36%  [kernel]  [k] finish_task_switch        1.38%  [kernel]  [k] _raw_spin_unlock_irqrestore
> 1.30%  [kernel]  [k] ksize                     1.33%  [kernel]  [k] __udp4_lib_rcv
> 
> Strange.

nohz=off, pipe-test with one half pinned to CPU0, the other to CPU1.

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo     in      cs us sy id wa st
TTWU_QUEUE
 1  0      0 3039488  50948 444720    0    0     0     0 539724 1013417  1 15 84  0  0
 1  0      0 3039488  50956 444720    0    0     0     1 540853 1015679  1 15 84  0  0
 1  0      0 3039364  50956 444720    0    0     0     0 541630 1017239  1 16 83  0  0
 2  0      0 3038992  50956 444720    0    0     0     0 335550 1096569  4 20 76  0  0
NO_TTWU_QUEUE
 1  0      0 3038992  50956 444720    0    0     0     0  33100 1318984  1 27 71  0  0
 1  0      0 3038868  50956 444720    0    0     0     0  33100 1319126  2 27 71  0  0
 1  0      0 3038868  50956 444720    0    0     0     0  33097 1317968  1 27 72  0  0
 2  0      0 3038868  50964 444720    0    0     0     1  33104 1318558  2 27 71  0  0

We can switch faster with NO_TTWU_QUEUE, so we switch more, and that
hurts netperf UDP_STREAM throughput.. somehow.  "Fatter is better" is not
the way context-switch-happy benchmarks usually work.

-Mike



Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Eric Dumazet
On Wed, 2012-10-03 at 10:47 +0100, Mel Gorman wrote:
> On Tue, Oct 02, 2012 at 03:48:57PM -0700, Rick Jones wrote:

> > PS - I trust it is the receive-side throughput being reported/used
> > with UDP_STREAM :)
> 
> Good question. Now that I examine the scripts, it is in fact the sending
> side that is being reported which is flawed. Granted I'm not expecting any
> UDP loss on loopback and looking through a range of results, the
> difference is marginal. It's still wrong to report just the sending side
> for UDP_STREAM and I'll correct the scripts for it in the future.

You can have drops even on loopback if the receiver doesn't drain its
socket receive queue fast enough.





Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Mel Gorman
On Tue, Oct 02, 2012 at 03:48:57PM -0700, Rick Jones wrote:
> On 10/02/2012 01:45 AM, Mel Gorman wrote:
> 
> >SIZE=64
> >taskset -c 0 netserver
> >taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 
> >15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> 
> Just FYI, unless you are running a hacked version of netperf, the
> "50" in "-i 50,6" will be silently truncated to 30.
> 

I'm not using a hacked version of netperf. The 50,6 has been there a long
time so I'm not sure where I took it from any more. It might have been an
older version or me being over-zealous at the time.

> PS - I trust it is the receive-side throughput being reported/used
> with UDP_STREAM :)

Good question. Now that I examine the scripts, it is in fact the sending
side that is being reported which is flawed. Granted I'm not expecting any
UDP loss on loopback and looking through a range of results, the
difference is marginal. It's still wrong to report just the sending side
for UDP_STREAM and I'll correct the scripts for it in the future.

Thanks!

-- 
Mel Gorman
SUSE Labs


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Mike Galbraith
On Wed, 2012-10-03 at 08:50 +0200, Mike Galbraith wrote: 
> On Tue, 2012-10-02 at 14:14 +0100, Mel Gorman wrote: 
> > On Tue, Oct 02, 2012 at 11:31:22AM +0200, Mike Galbraith wrote:
> > > On Tue, 2012-10-02 at 09:45 +0100, Mel Gorman wrote: 
> > > > On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:
> > > 
> > > > > Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> > > > > this?
> > > > > 
> > > > 
> > > > You saw problems with TCP_RR whereas this is UDP_STREAM.
> > > 
> > > Yeah, but I wanted to stare at UDP_STREAM as you run it to see if it
> > > would tell me anything about why those numbers happen.
> > > 
> > > > I'm running this through MMTests with a version of the
> > > > configs/config-global-dhp__network-performance file that only runs
> > > > netperf-udp. Ultimately it runs netperf for a size something like
> > > > this
> > > > 
> > > > SIZE=64
> > > > taskset -c 0 netserver
> > > > taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 
> > > > -- -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> > > 
> > 
> > lock_stat points at the runqueue lock which makes sense as without the
> > IPI the rq->lock has to be taken
> 
> Perf top says we're spinning in ttwu() with NO_TTWU_QUEUE.
> 
> nohz=off idle=halt, netperf -l bignum vs -i 50,6 -I 99,1 -l 20, watching
> with taskset -c 3 perf top -C 1 -U.
> 
> Switch rate rises by nearly 200k/s with NO_TTWU_QUEUE, ttwu() climbs to
> #1 spot.  Annotate shows while (p->on_cpu) cpu_relax() eating ~50% of
> all ttwu() cycles.  Turn TTWU_QUEUE back on, spin evaporates, switch
> rate drops, throughput climbs.  Nifty.

But drop idle=halt, switch rate climbs by ~320k/s, and resched_task()
takes #1 spot.  Hm, maybe perf gets a tad confused monitoring 1 CPU?

NO_TTWU_QUEUE nohz=off idle=halt                TTWU_QUEUE nohz=off idle=halt                   NO_TTWU_QUEUE nohz=off
5.73%  [kernel]  [k] try_to_wake_up             4.40%  [kernel]  [k] sock_alloc_send_pskb       5.02%  [kernel]  [k] resched_task
3.60%  [kernel]  [k] sock_alloc_send_pskb       4.20%  [kernel]  [k] __udp4_lib_lookup          4.00%  [kernel]  [k] sock_alloc_send_pskb
3.25%  [kernel]  [k] udp_sendmsg                3.69%  [kernel]  [k] udp_sendmsg                3.30%  [kernel]  [k] udp_sendmsg
3.05%  [kernel]  [k] __udp4_lib_rcv             3.52%  [kernel]  [k] _raw_spin_lock             3.11%  [kernel]  [k] _raw_spin_lock_irqsave
3.02%  [kernel]  [k] __udp4_lib_lookup          2.96%  [kernel]  [k] dma_issue_pending_all      2.95%  [kernel]  [k] __udp4_lib_lookup
2.73%  [kernel]  [k] _raw_spin_lock_irqsave     2.91%  [kernel]  [k] try_to_wake_up             2.65%  [kernel]  [k] try_to_wake_up
2.68%  [kernel]  [k] sock_def_readable          2.62%  [kernel]  [k] copy_user_generic_string   2.45%  [kernel]  [k] dma_issue_pending_all
2.61%  [kernel]  [k] _raw_spin_lock             2.57%  [kernel]  [k] __ip_append_data.isra.35   2.44%  [kernel]  [k] ksize
2.25%  [kernel]  [k] dma_issue_pending_all      2.36%  [kernel]  [k] fib_table_lookup           2.43%  [kernel]  [k] sock_def_readable
2.21%  [kernel]  [k] __ip_append_data.isra.35   2.27%  [kernel]  [k] sock_def_readable          2.18%  [kernel]  [k] copy_user_generic_string
2.09%  [kernel]  [k] resched_task               2.23%  [kernel]  [k] system_call                2.15%  [kernel]  [k] __ip_append_data.isra.35
1.99%  [kernel]  [k] copy_user_generic_string   2.15%  [kernel]  [k] _raw_spin_lock_irqsave     1.91%  [kernel]  [k] check_preempt_curr
1.96%  [kernel]  [k] fib_table_lookup           1.97%  [kernel]  [k] __udp4_lib_rcv             1.85%  [kernel]  [k] udp_send_skb
1.96%  [kernel]  [k] udp_send_skb               1.81%  [kernel]  [k] __alloc_skb                1.84%  [kernel]  [k] system_call
1.82%  [kernel]  [k] ksize                      1.79%  [kernel]  [k] udp_send_skb               1.80%  [kernel]  [k] _raw_spin_lock
1.75%  [kernel]  [k] system_call                1.77%  [kernel]  [k] __do_softirq               1.78%  [kernel]  [k] __netif_receive_skb
1.73%  [kernel]  [k] __wake_up_common           1.76%  [kernel]  [k] sock_sendmsg               1.74%  [kernel]  [k] task_waking_fair
1.60%  [kernel]  [k] task_waking_fair           1.69%  [kernel]  [k] enqueue_to_backlog         1.70%  [kernel]  [k] __do_softirq

Watching all cores instead. 

switch rate ~890KHz                           switch rate ~570KHz
NO_TTWU_QUEUE nohz=off                        TTWU_QUEUE nohz=off
5.38%  [kernel]  [k] __schedule                4.81%  [kernel]  [k] _raw_spin_lock_irqsave
4.29%  [kernel]  [k] _raw_spin_lock_irqsave    3.36%  [kernel]  [k] __skb_recv_datagram
2.88%  [kernel]  [k] resched_task              2.71%  [kernel]  [k] copy_user_generic_string
2.60%  [kernel]  [k] copy_user_generic_string  2.67%  [kernel]  [k] reschedule_interrupt
2.38%  [kernel]  [k] __switch_to               2.62%  [kernel]  [k] sock_alloc_send_pskb
2.15%  [kernel]  [k] sock_alloc_send_pskb      2.52%  [kernel]  [k] __schedule

Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Mike Galbraith
On Tue, 2012-10-02 at 14:14 +0100, Mel Gorman wrote: 
> On Tue, Oct 02, 2012 at 11:31:22AM +0200, Mike Galbraith wrote:
> > On Tue, 2012-10-02 at 09:45 +0100, Mel Gorman wrote: 
> > > On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:
> > 
> > > > Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> > > > this?
> > > > 
> > > 
> > > You saw problems with TCP_RR whereas this is UDP_STREAM.
> > 
> > Yeah, but I wanted to stare at UDP_STREAM as you run it to see if it
> > would tell me anything about why those numbers happen.
> > 
> > > I'm running this through MMTests with a version of the
> > > configs/config-global-dhp__network-performance file that only runs
> > > netperf-udp. Ultimately it runs netperf for a size something like
> > > this
> > > 
> > > SIZE=64
> > > taskset -c 0 netserver
> > > taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- 
> > > -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> > 
> 
> lock_stat points at the runqueue lock which makes sense as without the
> IPI the rq->lock has to be taken

Perf top says we're spinning in ttwu() with NO_TTWU_QUEUE.

nohz=off idle=halt, netperf -l bignum vs -i 50,6 -I 99,1 -l 20, watching
with taskset -c 3 perf top -C 1 -U.

Switch rate rises by nearly 200k/s with NO_TTWU_QUEUE, ttwu() climbs to
#1 spot.  Annotate shows while (p->on_cpu) cpu_relax() eating ~50% of
all ttwu() cycles.  Turn TTWU_QUEUE back on, spin evaporates, switch
rate drops, throughput climbs.  Nifty.
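
The spin in question, quoting try_to_wake_up() in kernel/sched/core.c roughly
from memory:

	/*
	 * If the owning (remote) cpu is still in the middle of schedule()
	 * with this task as prev, wait until it's done referencing the task.
	 */
	while (p->on_cpu)
		cpu_relax();

At these switch rates the waker gets plenty of chances to land in that loop.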

-Mike



Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Rick Jones

On 10/02/2012 01:45 AM, Mel Gorman wrote:


SIZE=64
taskset -c 0 netserver
taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 
15895 -s 32768 -S 32768 -m $SIZE -M $SIZE


Just FYI, unless you are running a hacked version of netperf, the "50" 
in "-i 50,6" will be silently truncated to 30.


happy benchmarking,

rick jones

PS - I trust it is the receive-side throughput being reported/used with 
UDP_STREAM :)



Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Mike Galbraith
On Tue, 2012-10-02 at 14:14 +0100, Mel Gorman wrote: 
> On Tue, Oct 02, 2012 at 11:31:22AM +0200, Mike Galbraith wrote:
> > On Tue, 2012-10-02 at 09:45 +0100, Mel Gorman wrote: 
> > > On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:
> > 
> > > > Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> > > > this?
> > > > 
> > > 
> > > You saw problems with TCP_RR whereas this is UDP_STREAM.
> > 
> > Yeah, but I wanted to stare at UDP_STREAM as you run it to see if it
> > would tell me anything about why those numbers happen.
> > 
> > > I'm running this through MMTests with a version of the
> > > configs/config-global-dhp__network-performance file that only runs
> > > netperf-udp. Ultimately it runs netperf for a size something like
> > > this
> > > 
> > > SIZE=64
> > > taskset -c 0 netserver
> > > taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- 
> > > -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> > 
> 
> lock_stat points at the runqueue lock which makes sense as without the
> IPI the rq->lock has to be taken

Well, it'll still be locked, but locally when the IPI queues the wakee.

I can confirm your UDP_STREAM findings.. but do the same cross-pin with
tbench, and the story reverses.  A difference is that netperf/netserver
utilization is 100%/63% with NO_TTWU_QUEUE and 100%/57% with TTWU_QUEUE,
i.e. netserver is using more cycles with NO_TTWU_QUEUE.

tbench/tbench_srv is around 73%/56%, both sleep vs 1, both take the same
locks in ttwu().. and perform better with NO_TTWU_QUEUE.

Ponder ponder speculate.. what won't happen with TTWU_QUEUE is you won't
contend on the remote lock while a task is trying to schedule off.  We
don't contend with tbench or TCP_RR, so NO_TTWU_QUEUE is the cheaper.. we
do contend with UDP_STREAM, so TTWU_QUEUE _becomes_ the cheaper.  Oh dear.
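
To spell out what I mean by not contending: with TTWU_QUEUE the waker never
touches the remote rq->lock at all, it just queues the task and kicks the CPU,
and the lock is then taken locally by the target when it drains its wake_list.
Roughly, from memory of the 3.x code (a sketch, not the exact source):

	static void ttwu_queue_remote(struct task_struct *p, int cpu)
	{
		/* lockless add to the target rq's wake_list, then IPI it */
		if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
			smp_send_reschedule(cpu);
	}

	static void sched_ttwu_pending(void)	/* run from scheduler_ipi() */
	{
		struct rq *rq = this_rq();
		struct llist_node *llist = llist_del_all(&rq->wake_list);
		struct task_struct *p;

		raw_spin_lock(&rq->lock);	/* local lock, no cross-CPU spinning */
		while (llist) {
			p = llist_entry(llist, struct task_struct, wake_entry);
			llist = llist_next(llist);
			ttwu_do_activate(rq, p, 0);
		}
		raw_spin_unlock(&rq->lock);
	}

So the cost moves from waker-side lock contention to an IPI per remote wakeup,
and which of the two is cheaper apparently depends on the workload.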

> 3.3.0-vanilla
>   class name    con-bounces    contentions   waittime-min   waittime-max  waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max  holdtime-total
>   -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   &rq->lock:          37062          37063           0.08          10.43        11037.66      410701252     1029063029           0.00          14.35    234556106.12
>   ----------
>   &rq->lock           14064   [] __schedule+0xc6/0x710
>   &rq->lock              33   [] idle_balance+0x13d/0x190
>   &rq->lock           11810   [] ttwu_queue+0x47/0xf0
>   &rq->lock             283   [] task_rq_lock+0x56/0xa0
>   ----------
>   &rq->lock           22305   [] ttwu_queue+0x47/0xf0
>   &rq->lock           11260   [] __schedule+0xc6/0x710
>   &rq->lock             158   [] idle_balance+0x13d/0x190
>   &rq->lock               8   [] load_balance+0x356/0x500
> 
> 3.3.0-revert
>   &rq->lock:          10831          10833           0.09          10.47         4448.19          87877      768253556           0.00          16.00    140103672.33
>   ----------
>   &rq->lock             685   [] load_balance+0x348/0x500
>   &rq->lock            8688   [] try_to_wake_up+0x215/0x2e0
>   &rq->lock            1010   [] __schedule+0xc6/0x710
>   &rq->lock             228   [] task_rq_lock+0x56/0xa0
>   ----------
>   &rq->lock            3317   [] __schedule+0xc6/0x710
>   &rq->lock             789   [] load_balance+0x348/0x500
>   &rq->lock             363   [] load_balance+0x214/0x500
>   &rq->lock               2   [] load_balance+0x356/0x500
> 
> Note the difference in acq-bounces. I had to stop at this point and move
> back to some CMA breakage I introduced.
> 




Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Mel Gorman
On Tue, Oct 02, 2012 at 11:31:22AM +0200, Mike Galbraith wrote:
> On Tue, 2012-10-02 at 09:45 +0100, Mel Gorman wrote: 
> > On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:
> 
> > > Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> > > this?
> > > 
> > 
> > You saw problems with TCP_RR whereas this is UDP_STREAM.
> 
> Yeah, but I wanted to stare at UDP_STREAM as you run it to see if it
> would tell me anything about why those numbers happen.
> 
> > I'm running this through MMTests with a version of the
> > configs/config-global-dhp__network-performance file that only runs
> > netperf-udp. Ultimately it runs netperf for a size something like
> > this
> > 
> > SIZE=64
> > taskset -c 0 netserver
> > taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 
> > 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
> 

lock_stat points at the runqueue lock which makes sense as without the
IPI the rq->lock has to be taken

3.3.0-vanilla
  class name    con-bounces    contentions   waittime-min   waittime-max  waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max  holdtime-total
  -------------------------------------------------------------------------------------------------------------------------------------------------------------------
  &rq->lock:          37062          37063           0.08          10.43        11037.66      410701252     1029063029           0.00          14.35    234556106.12
  ----------
  &rq->lock           14064   [] __schedule+0xc6/0x710
  &rq->lock              33   [] idle_balance+0x13d/0x190
  &rq->lock           11810   [] ttwu_queue+0x47/0xf0
  &rq->lock             283   [] task_rq_lock+0x56/0xa0
  ----------
  &rq->lock           22305   [] ttwu_queue+0x47/0xf0
  &rq->lock           11260   [] __schedule+0xc6/0x710
  &rq->lock             158   [] idle_balance+0x13d/0x190
  &rq->lock               8   [] load_balance+0x356/0x500

3.3.0-revert
  &rq->lock:          10831          10833           0.09          10.47         4448.19          87877      768253556           0.00          16.00    140103672.33
  ----------
  &rq->lock             685   [] load_balance+0x348/0x500
  &rq->lock            8688   [] try_to_wake_up+0x215/0x2e0
  &rq->lock            1010   [] __schedule+0xc6/0x710
  &rq->lock             228   [] task_rq_lock+0x56/0xa0
  ----------
  &rq->lock            3317   [] __schedule+0xc6/0x710
  &rq->lock             789   [] load_balance+0x348/0x500
  &rq->lock             363   [] load_balance+0x214/0x500
  &rq->lock               2   [] load_balance+0x356/0x500

Note the difference in acq-bounces. I had to stop at this point and move
back to some CMA breakage I introduced.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Mike Galbraith
On Tue, 2012-10-02 at 09:45 +0100, Mel Gorman wrote: 
> On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:

> > Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> > this?
> > 
> 
> You saw problems with TCP_RR whereas this is UDP_STREAM.

Yeah, but I wanted to stare at UDP_STREAM as you run it to see if it
would tell me anything about why those numbers happen.

> I'm running this through MMTests with a version of the
> configs/config-global-dhp__network-performance file that only runs
> netperf-udp. Ultimately it runs netperf for a size something like
> this
> 
> SIZE=64
> taskset -c 0 netserver
> taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE

Thanks.

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Mel Gorman
On Tue, Oct 02, 2012 at 09:49:36AM +0200, Mike Galbraith wrote:
> On Tue, 2012-10-02 at 07:51 +0100, Mel Gorman wrote: 
> > I'm going through old test results to see if I could find any leftover
> > performance regressions that have not yet been fixed (most have at this point
> > or at least changed in such a way to make a plain revert impossible). One
> > major regression still left is the netperf UDP_STREAM regression. Bisection
> > points the finger straight at 518cd623 (sched: Only queue remote wakeups
> > when crossing cache boundaries).  The problem was introduced between 3.2 and
> > 3.3, and the current kernel still sucks, as the following results show.
> > 
> > NETPERF UDP
> >                          3.3.0                3.3.0                3.6.0
> >                        vanilla      revert-518cd623              vanilla
> > Tput 64        328.38 (  0.00%)     436.58 ( 32.95%)     312.51 ( -4.83%)
> > Tput 128       661.43 (  0.00%)     869.88 ( 31.52%)     625.70 ( -5.40%)
> > Tput 256      1310.27 (  0.00%)    1724.45 ( 31.61%)    1243.65 ( -5.08%)
> > Tput 1024     5466.85 (  0.00%)    6601.43 ( 20.75%)    4838.86 (-11.49%)
> > Tput 2048    10885.95 (  0.00%)   12694.06 ( 16.61%)    9161.75 (-15.84%)
> > Tput 3312    15930.33 (  0.00%)   19327.67 ( 21.33%)   14106.26 (-11.45%)
> > Tput 4096    18025.47 (  0.00%)   22183.12 ( 23.07%)   16636.01 ( -7.71%)
> > Tput 8192    30076.42 (  0.00%)   37280.86 ( 23.95%)   28575.84 ( -4.99%)
> > Tput 16384   47742.12 (  0.00%)   56123.21 ( 17.55%)   46060.57 ( -3.52%)
> 
> Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
> this?
> 

You saw problems with TCP_RR whereas this is UDP_STREAM.

I'm running this through MMTests with a version of the
configs/config-global-dhp__network-performance file that only runs
netperf-udp. Ultimately it runs netperf for a size something like
this

SIZE=64
taskset -c 0 netserver
taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
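
Spelled out, the per-size runs behind the table quoted above boil down to a
loop like this (a sketch only; MMTests wraps more logic around it, and the
sizes are the ones reported in the results):

taskset -c 0 netserver
for SIZE in 64 128 256 1024 2048 3312 4096 8192 16384; do
        taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- \
                -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE
done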

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Mike Galbraith
On Tue, 2012-10-02 at 07:51 +0100, Mel Gorman wrote: 
> I'm going through old test results to see if I could find any leftover
> performance regressions that have not yet been fixed (most have at this point
> or at least changed in such a way to make a plain revert impossible). One
> major regression still left is the netperf UDP_STREAM regression. Bisection
> points the finger straight at 518cd623 (sched: Only queue remote wakeups
> when crossing cache boundaries).  The problem was introduced between 3.2 and
> 3.3, and the current kernel still sucks, as the following results show.
> 
> NETPERF UDP
>                          3.3.0                3.3.0                3.6.0
>                        vanilla      revert-518cd623              vanilla
> Tput 64        328.38 (  0.00%)     436.58 ( 32.95%)     312.51 ( -4.83%)
> Tput 128       661.43 (  0.00%)     869.88 ( 31.52%)     625.70 ( -5.40%)
> Tput 256      1310.27 (  0.00%)    1724.45 ( 31.61%)    1243.65 ( -5.08%)
> Tput 1024     5466.85 (  0.00%)    6601.43 ( 20.75%)    4838.86 (-11.49%)
> Tput 2048    10885.95 (  0.00%)   12694.06 ( 16.61%)    9161.75 (-15.84%)
> Tput 3312    15930.33 (  0.00%)   19327.67 ( 21.33%)   14106.26 (-11.45%)
> Tput 4096    18025.47 (  0.00%)   22183.12 ( 23.07%)   16636.01 ( -7.71%)
> Tput 8192    30076.42 (  0.00%)   37280.86 ( 23.95%)   28575.84 ( -4.99%)
> Tput 16384   47742.12 (  0.00%)   56123.21 ( 17.55%)   46060.57 ( -3.52%)

Hm, 518cd623 fixed up the troubles I saw.  How exactly are you running
this?

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-02 Thread Rick Jones

On 10/02/2012 01:45 AM, Mel Gorman wrote:


SIZE=64
taskset -c 0 netserver
taskset -c 1 netperf -t UDP_STREAM -i 50,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 15895 -s 32768 -S 32768 -m $SIZE -M $SIZE


Just FYI, unless you are running a hacked version of netperf, the 50 
in -i 50,6 will be silently truncated to 30.
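
In other words (assuming nothing else changes), with a stock netperf the run
quoted above behaves as if it had been invoked with -i 30,6, e.g. for SIZE=64:

taskset -c 1 netperf -t UDP_STREAM -i 30,6 -I 99,1 -l 20 -H 127.0.0.1 -- -P 15895 -s 32768 -S 32768 -m 64 -M 64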


happy benchmarking,

rick jones

PS - I trust it is the receive-side throughput being reported/used with 
UDP_STREAM :)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/