You just demonstrated one of the basic properties of vector packet processing: 
as the offered load increases, the cost per vector element decreases. Although 
you didn’t explicitly report the vector sizes involved, the vector size 
necessarily increases as the offered load increases. Anyhow, it’s easy to fish 
that statistic out of the node runtime stats:

<start traffic>
“clear run”
<wait a while>
“show run”
<stop traffic>

You might ask: OK, why should the cost per packet decrease as the number of 
packets in a vector increases? When you first enter a dispatch function, none 
of the code involved is likely to be in the i-cache. The first packet incurs a 
bunch of fixed overhead to drag code into the i-cache, and to warm up the 
branch predictor. All of the other packets in the vector profit. On a 
per-packet basis, cost decreases as the vector size increases.

There are a number of secondary effects with the same net result. Until the 
vector size reaches 2, none of the graph nodes bother about prefetching. When 
dealing with quad-looped nodes: ‘s/2/4/’.

This property gives rise to a second interesting property: given a specific 
offered load and configuration, the vector size reaches a stable equilibrium. 
Imagine the graph dispatch circuit (rx ... process ... tx ... repeat) running in equilibrium. Now add a small delay (a clock interrupt at kernel level, say) which increases the circuit time.

The next rx vector size will be larger, but since it will be processed more 
efficiently, the vector size will eventually return to the equilibrium value.

HTH... Dave


From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Mikado
Sent: Thursday, November 22, 2018 10:22 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Performance test issues : Average cpu time cost changes as 
the tx speed changes.

Hi,
Recently I've been developing a plugin based on VPP 18.07 to decode specific 
packets. I added it between dpdk-input and interface-output, using the same 
method as the sample plugin in the VPP source code. To test its theoretical 
performance, I used clib_cpu_time_now() to measure the average cpu time cost 
when packets go through my plugin. When I send packets to the device (running 
VPP with my plugin) at different tx speeds, it turns out that the average cpu 
time changes as the tx speed changes. At first, I assumed this was caused by 
the cpu time VPP spends moving packets to the next node, so I calculated the 
average cpu time of each node, but the result was the same. Then I ran the 
same test with the sample plugin. The result is similar, although it does not 
fluctuate as much.
Now I'm confused. Don't all packets go through the same code and cost the 
same cpu time?

Here is my code and test result.

Code added in sample/node.c:
static uword
sample_node_fn (vlib_main_t * vm,
      vlib_node_runtime_t * node,
      vlib_frame_t * frame)
{
  u32 n_left_from, *from, next_index;

  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  next_index = node->cached_next_index;

  sample_main.last_cpu_time = clib_cpu_time_now ();

  sample_main.total_pkts += n_left_from;
       ………
  while (n_left_from > 0){
       ………
       }
       ………
  vlib_node_increment_counter (vm, sample_node.index,
                               SAMPLE_ERROR_SWAPPED, pkts_swapped);

  sample_main.total_cpu_time += clib_cpu_time_now() - sample_main.last_cpu_time;

  return frame->n_vectors;
}


Test result:
Tx speed (Mb/s)    Average cpu time
190                14
475                12
665                 9
950                 7
1140                7
1900                6
2895                6
View/Reply Online (#11385): https://lists.fd.io/g/vpp-dev/message/11385