Thanks Yanqin,
What does this define mean? Some kind of bookkeeping of the packet processing
cycles every 10 seconds? Are you suggesting making this interval even bigger,
1000 seconds or something? And if I want to disable it, what do I do?
Thanks, Shahaji

On Mon, Jul 6, 2020 at 10:30 PM Yanqin Wei <yanqin....@arm.com> wrote:

> Hi Shahaji,
>
>
>
> It seems to be caused by some periodic task. In the pmd thread, pmd auto
> load balance is done periodically.
>
> /* Time in microseconds of the interval in which rxq processing cycles used
>  * in rxq to pmd assignments is measured and stored. */
> #define PMD_RXQ_INTERVAL_LEN 10000000LL
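For concreteness, the constant above is in microseconds, so it works out to a ten-second bookkeeping period; a small sketch restating that arithmetic (`pmd_rxq_interval_seconds` is my name, not OVS code):

```c
/* Same constant as in OVS lib/dpif-netdev.c (value is in microseconds). */
#define PMD_RXQ_INTERVAL_LEN 10000000LL

/* 10,000,000 us == 10 s: rxq cycle usage is sampled on a 10-second period. */
static long long pmd_rxq_interval_seconds(void)
{
    return PMD_RXQ_INTERVAL_LEN / 1000000LL;
}
```

On the disable question: the cycle measurement itself is compiled in; what can be toggled is the auto load balancer that consumes it (`other_config:pmd-auto-lb`, added around OVS 2.11 and off by default), so it is worth checking which applies to your build.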
>
>
>
> Would you like to disable it if it is not necessary?
>
>
>
> Best Regards,
>
> Wei Yanqin
>
>
>
> *From:* Shahaji Bhosle <shahaji.bho...@broadcom.com>
> *Sent:* Monday, July 6, 2020 8:24 PM
> *To:* Yanqin Wei <yanqin....@arm.com>
> *Cc:* Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <
> n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <
> lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex
> Barba <alex.ba...@broadcom.com>
> *Subject:* Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP
> (iperf3)
>
>
>
> Hi Yanqin,
>
> The drops are at random intervals; sometimes I can run for minutes without
> drops. The case is very borderline, with CPUs close to 99% and around 1000
> flows. We see the drops once every 10-15 seconds and it's random in nature.
> If I use one ring per core the drops go away; if I enable EMC the drops go
> away, etc.
>
> Thanks, Shahaji
>
>
>
> On Mon, Jul 6, 2020 at 5:27 AM Yanqin Wei <yanqin....@arm.com> wrote:
>
> Hi Shahaji,
>
>
>
> I have not measured context switch overhead, but I feel it should be
> acceptable, because 10 Mpps throughput with zero packet drop (20 s) can be
> achieved on some Arm servers. Maybe you could run performance profiling on
> your test bench to find the root cause of the performance degradation with
> multiple rings.
>
>
>
> Best Regards,
>
> Wei Yanqin
>
>
>
> *From:* Shahaji Bhosle <shahaji.bho...@broadcom.com>
> *Sent:* Thursday, July 2, 2020 9:27 PM
> *To:* Yanqin Wei <yanqin....@arm.com>
> *Cc:* Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <
> n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <
> lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex
> Barba <alex.ba...@broadcom.com>
> *Subject:* Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP
> (iperf3)
>
>
>
> Thanks Yanqin,
>
> I am not seeing any context switches beyond 40 usec in our do-nothing-loop
> test. But when OvS polls multiple rings (queues) on the same CPU and the
> number of packets it batches (MAX_BURST_SIZE) grows, the loops take more
> time, and I can see the rings getting filled up. Then it's a feedback loop:
> the CPUs are running close to 100%, so any disturbance at that point is, I
> think, too much.
>
> Do you have any data that you use to monitor OvS? I am doing all of the
> above experiments without OvS.
>
> Thanks, Shahaji
>
>
>
> On Thu, Jul 2, 2020 at 4:43 AM Yanqin Wei <yanqin....@arm.com> wrote:
>
> Hi Shahaji,
>
> IIUC, the 1 Hz timer tick cannot be disabled even with full dynticks, right?
> But I have no idea why it causes packet loss, because it should add only a
> small overhead when rcu_nocbs is enabled.
>
> Best Regards,
> Wei Yanqin
>
> ===========
>
> From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
> Sent: Thursday, July 2, 2020 6:11 AM
> To: Yanqin Wei <yanqin....@arm.com>
> Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <
> n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <
> lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex
> Barba <alex.ba...@broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP
> (iperf3)
>
> Hi Yanqin,
> I added the patch you gave me to my script, which runs a do-nothing for
> loop. You can see the spikes in the plot below: 976/1000 times we are
> perfect, but around every 1 second you can see something going wrong. I
> don't see anything wrong in the trace-cmd output.
> Thanks, Shahaji
>
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=37500000
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> Using ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=37500000
> Running 1000 iterations of do_nothing_loop for (N)=37500000
>
> Average =          100282.193430333 u-secs
> Max     =          124777.488666667 u-secs
> Min     =          100000.017666667 u-secs
> σ       =            1931.352376508 u-secs
>
> Average =              300846580.29 cycles
> Max     =              374332466.00 cycles
> Min     =              300000053.00 cycles
> σ       =                5794057.13 cycles
>
> #σ = events
>  0 = 976
>  1 = 3
>  2 = 4
>  3 = 3
>  4 = 3
>  5 = 2
>  6 = 2
>  7 = 2
>  8 = 1
>  9 = 1
> 10 = 1
> 12 = 2
>
>
>
>
> On Wed, Jul 1, 2020 at 3:57 AM Yanqin Wei <mailto:yanqin....@arm.com>
> wrote:
> Hi Shahaji,
>
> Adding an isb instruction helps make rdtsc precise, since it syncs the
> system counter to cntvct_el0. There is a patch in DPDK:
> https://patchwork.dpdk.org/patch/66561/
> So it may not be related to the intermittent drops you observed.
>
> Best Regards,
> Wei Yanqin
>
> > -----Original Message-----
> > From: dev <mailto:ovs-dev-boun...@openvswitch.org> On Behalf Of Shahaji Bhosle via dev
> > Sent: Wednesday, July 1, 2020 6:05 AM
> > To: Flavio Leitner <mailto:f...@sysclose.org>
> > Cc: mailto:ovs-dev@openvswitch.org; Ilya Maximets <mailto:
> i.maxim...@samsung.com>;
> > Lee Reed <mailto:lee.r...@broadcom.com>; Vinay Gupta
> > <mailto:vinay.gu...@broadcom.com>; Alex Barba <mailto:
> alex.ba...@broadcom.com>
> > Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP
> (iperf3)
> >
> > Hi Flavio,
> > I still see intermittent drops with rcu_nocbs. So I wrote that do_nothing()
> > loop to avoid all the other distractions and see whether Linux is messing
> > with the OVS loop. The interesting thing is the *BOLD* case below, where I
> > use an ISB() instruction: the STD deviation blows up. Both results are
> > basically DO NOTHING FOR 100 msec and see what happens to the time :)
> > Thanks, Shahaji
> >
> > static inline uint64_t
> > rte_get_tsc_cycles(void)
> > {
> >     uint64_t tsc;
> > #ifdef USE_ISB
> >     asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(tsc));
> > #else
> >     asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > #endif
> >     return tsc;
> > }
> > #endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
> >
> > ==================================
> > usleep(100);
> > for (volatile int i = 0; i < num_iter; i++) {
> >     const uint64_t tsc_start = rte_get_tsc_cycles();
> >     /* do-nothing busy loop */
> > #ifdef USE_ISB
> >     for (volatile int j = 0; j < num_us; j++);   /* <<< THIS IS MESSED UP:
> >                 100 msec of do-nothing, yet 2033 usec STD DEVIATION */
> > #else
> >     for (volatile int j = 0; j < num_us; j++);   /* <<< THIS LOOP HAS VERY
> >                 LOW STD DEVIATION */
> >     rte_isb();
> > #endif
> >     volatile uint64_t tsc_end = rte_get_tsc_cycles();
> >     cycles[i] = tsc_end - tsc_start;
> > }
> > usleep(100);
> > calc_avg_var_stddev(num_iter, &cycles[0]);
> > ===================================
> > *#ifdef USE_ISB*
> > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> > + TARGET=2
> > + MASK=4
> > + NUM_ITER=1000
> > + NUM_MS=100
> > + N=37500000
> > + LOGFILE=loop_1000iter_100ms.log
> > + tee loop_1000iter_100ms.log
> > + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
> >   plugin 'function_graph'
> > Cycles/Second (Hz) = 3000000000
> > Nano-seconds per cycle = 0.3333
> >
> > Using ISB() before rte_rdtsc()
> > num_iter: 1000
> > do_nothing_loop for (N)=37500000
> > Running 1000 iterations of do_nothing_loop for (N)=37500000
> >
> > Average =          100328.158561667 u-secs
> > Max =          123024.795333333 u-secs
> > Min =          100000.017666667 u-secs
> > *σ  =            2033.118969489 u-secs*
> >
> > Average =              300984475.69 cycles
> > Max =              369074386.00 cycles
> > Min =              300000053.00 cycles
> > σ  =                6099356.91 cycles
> >
> > #σ = events
> >  0 = 968
> >  1 = 8
> >  2 = 5
> >  3 = 3
> >  4 = 3
> >  5 = 3
> >  6 = 3
> >  8 = 3
> > 10 = 3
> > 11 = 1
> >
> > *#ELSE*
> > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_loop
> > + TARGET=2
> > + MASK=4
> > + NUM_ITER=1000
> > + NUM_MS=100
> > + N=7316912
> > + LOGFILE=loop_1000iter_100ms.log
> > + tee loop_1000iter_100ms.log
> > + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_loop 1000 7316912
> >   plugin 'function_graph'
> > Cycles/Second (Hz) = 3000000000
> > Nano-seconds per cycle = 0.3333
> >
> > NO ISB() before rte_rdtsc()
> > num_iter: 1000
> > do_nothing_loop for (N)=7316912
> > Running 1000 iterations of do_nothing_loop for (N)=7316912
> >
> > Average =           99999.863256333 u-secs
> > Max =          100052.790333333 u-secs
> > Min =           99997.807333333 u-secs
> > *σ =               6.497043982 u-secs*
> >
> > Average =              299999589.77 cycles
> > Max =              300158371.00 cycles
> > Min =              299993422.00 cycles
> > σ =                  19491.13 cycles
> >
> > #σ = events
> >  0 = 900
> >  2 = 79
> >  4 = 17
> >  5 = 3
> >  8 = 1
> >
> >
> > On Tue, Jun 30, 2020 at 4:42 PM Flavio Leitner <mailto:f...@sysclose.org>
> wrote:
> >
> > >
> > >
> > > Hi Shahaji,
> > >
> > > Did it help with the rcu_nocbs?
> > >
> > > fbl
> > >
> > > On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> > > > Thanks Flavio,
> > > > Are there any special requirements for RCU on ARM vs x86?
> > > >
> > > > I am following what the above document says. Do you think I need to
> > > > do something more than the below?
> > > > Thanks again and appreciate the help. Shahaji
> > > >
> > > > 1. Isolate the CPU cores:
> > > >    isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7
> > > > 2. Set CONFIG_NO_HZ_FULL=y:
> > > > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> > > > CONFIG_NO_HZ_COMMON=y
> > > > # CONFIG_HZ_PERIODIC is not set
> > > > # CONFIG_NO_HZ_IDLE is not set
> > > > *CONFIG_NO_HZ_FULL*=y
> > > > # CONFIG_NO_HZ_FULL_ALL is not set
> > > > # CONFIG_NO_HZ is not set
> > > > # CONFIG_HZ_100 is not set
> > > > CONFIG_HZ_250=y
> > > > # CONFIG_HZ_300 is not set
> > > > # CONFIG_HZ_1000 is not set
> > > > CONFIG_HZ=250
> > > >
> > > >
> > > >
> > > > On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <mailto:
> f...@sysclose.org>
> > > wrote:
> > > >
> > > > >
> > > > > Right, you might want to review Documentation/timers/no_hz.rst
> > > > > from the kernel sources and look for RCU implications section
> > > > > where it explains how to move RCU callbacks.
> > > > >
> > > > > fbl
> > > > >
> > > > > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > > > > Hi Flavio,
> > > > > > I wrote a small program which has a do-nothing for loop, and I
> > > > > > measure the timestamps across the loop. About 3% of the time,
> > > > > > around the 1 second mark when the arch_timer fires, I get
> > > > > > timestamps off by 25% of the expected value. I ran trace-cmd to
> > > > > > see what is going on and see the below. Looks like some issue
> > > > > > with *gic_handle_irq*(); I am not seeing this behaviour on an
> > > > > > x86 host, so something special with ARMv8.
> > > > > > Thanks, Shahaji
> > > > > >
> > > > > >   %21.77  (14181) arm_stb_user_lo                    rcu_dyntick #922
> > > > > >          |
> > > > > >          --- *rcu_dyntick*
> > > > > >             |
> > > > > >             |--%46.85-- gic_handle_irq  # 432
> > > > > >             |
> > > > > >             |--%23.32-- context_tracking_user_exit  # 215
> > > > > >             |
> > > > > >             |--%22.34-- context_tracking_user_enter  # 206
> > > > > >             |
> > > > > >             |--%2.60-- SyS_execve  # 24
> > > > > >             |
> > > > > >             |--%1.30-- do_page_fault  # 12
> > > > > >             |
> > > > > >             |--%0.65-- SyS_write  # 6
> > > > > >             |
> > > > > >             |--%0.65-- schedule  # 6
> > > > > >             |
> > > > > >             |--%0.65-- SyS_nanosleep  # 6
> > > > > >             |
> > > > > >             |--%0.65-- syscall_trace_enter  # 6
> > > > > >             |
> > > > > >             |--%0.65-- SyS_faccessat  # 6
> > > > > >
> > > > > >   %5.01  (14181) arm_stb_user_lo                rcu_utilization #212
> > > > > >          |
> > > > > >          --- *rcu_utilization*
> > > > > >             |
> > > > > >             |--%96.23-- gic_handle_irq  # 204
> > > > > >             |
> > > > > >             |--%1.89-- SyS_nanosleep  # 4
> > > > > >             |
> > > > > >             |--%0.94-- SyS_exit_group  # 2
> > > > > >             |
> > > > > >             |--%0.94-- do_notify_resume  # 2
> > > > > >
> > > > > >   %4.86  (14181) arm_stb_user_lo                      user_exit #206
> > > > > >          |
> > > > > >          --- *user_exit*
> > > > > >           context_tracking_user_exit
> > > > > >
> > > > > >   %4.86  (14181) arm_stb_user_lo     context_tracking_user_exit #206
> > > > > >          |
> > > > > >          --- context_tracking_user_exit
> > > > > >
> > > > > >   %4.86  (14181) arm_stb_user_lo    context_tracking_user_enter #206
> > > > > >          |
> > > > > >          --- context_tracking_user_enter
> > > > > >
> > > > > >   %4.86  (14181) arm_stb_user_lo                     user_enter #206
> > > > > >          |
> > > > > >          --- *user_enter*
> > > > > >           context_tracking_user_enter
> > > > > >
> > > > > >   %2.95  (14181) arm_stb_user_lo                 gic_handle_irq #125
> > > > > >          |
> > > > > >          --- gic_handle_irq
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner
> > > > > > <mailto:f...@sysclose.org>
> > > wrote:
> > > > > >
> > > > > > > On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > > > > Hi Flavio,
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > > I have captured the suggested information but do not see
> > > > > > > > anything that could cause the packet drops. Can you please
> > > > > > > > take a look at the data below and see if you can find
> > > > > > > > something unusual? The PMDs are running on CPUs 1-4, and
> > > > > > > > CPUs 1-7 are isolated cores.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > > > > > ----------------------------------------------------------------
> > > > > > > > root@bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > > > > pmd thread numa_id 0 core_id 1:
> > > > > > > >   idle cycles: 99140849 (7.93%)
> > > > > > > >   processing cycles: 1151423715 (92.07%)
> > > > > > > >   avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > > > >   avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > > > > pmd thread numa_id 0 core_id 2:
> > > > > > > >   idle cycles: 118373662 (9.47%)
> > > > > > > >   processing cycles: 1132193442 (90.53%)
> > > > > > > >   avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > > > >   avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > > > > pmd thread numa_id 0 core_id 3:
> > > > > > > >   idle cycles: 53805933 (4.30%)
> > > > > > > >   processing cycles: 1196762002 (95.70%)
> > > > > > > >   avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > > > >   avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > > > > pmd thread numa_id 0 core_id 4:
> > > > > > > >   idle cycles: 189102938 (15.12%)
> > > > > > > >   processing cycles: 1061463293 (84.88%)
> > > > > > > >   avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > > > >   avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > > > > pmd thread numa_id 0 core_id 5:
> > > > > > > > pmd thread numa_id 0 core_id 6:
> > > > > > > > pmd thread numa_id 0 core_id 7:
> > > > > > >
> > > > > > >
> > > > > > > The core_id 3 is highly loaded, and then it's more likely to
> > > > > > > show the drop issue when some other event happens.
> > > > > > >
> > > > > > > I think you need to run perf as I recommended before and see
> > > > > > > whether context switches are happening and why.
> > > > > > >
> > > > > > > If a context switch happens, it's either because the core is
> > > > > > > not well isolated or some other thing is going on. It will
> > > > > > > help to understand why the queue wasn't serviced for a certain
> > > > > > > amount of time.
> > > > > > >
> > > > > > > The issue is that running perf might introduce some load, so
> > > > > > > you will need to adjust the traffic rate accordingly.
> > > > > > >
> > > > > > > HTH,
> > > > > > > fbl
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > *Runtime summary*
> > > > > > > >                           comm  parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > > > >                                          (count)    (msec)   (msec)   (msec)   (msec)       %
> > > > > > > > ---------------------------------------------------------------------------------------------------------
> > > > > > > >                 ksoftirqd/0[7]       2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > > >                   rcu_sched[8]       2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > > >                    rcuos/4[38]       2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > > >                    rcuos/5[45]       2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > > >                kworker/0:1[71]       2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > > >                  mmcqd/0[1230]       2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > > >             kworker/0:1H[1248]       2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > > >            kworker/u16:2[1547]       2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > > >                     ntpd[5282]       1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > > >                 watchdog[6988]       1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > > >             ovs-vswitchd[9239]       1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > > >        revalidator8[9309/9239]    9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > > >                    perf[27150]   27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > > > >
> > > > > > > > Terminated tasks:
> > > > > > > >                   sleep[27151]   27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > > > > >
> > > > > > > > Idle stats:
> > > > > > > >     CPU  0 idle for    999.814  msec  ( 99.84%)
> > > > > > > >     *CPU  1 idle entire time window*
> > > > > > > >     *CPU  2 idle entire time window*
> > > > > > > >     *CPU  3 idle entire time window*
> > > > > > > >     *CPU  4 idle entire time window*
> > > > > > > >     CPU  5 idle for    500.326  msec  ( 49.96%)
> > > > > > > >     CPU  6 idle entire time window
> > > > > > > >     CPU  7 idle entire time window
> > > > > > > >
> > > > > > > >     Total number of unique tasks: 14
> > > > > > > >     Total number of context switches: 115
> > > > > > > >     Total run time (msec): 3.198
> > > > > > > >     Total scheduling time (msec): 1001.425  (x 8)
> > > > > > > > (END)
> > > > > > > >
> > > > > > > > *02:16:22      UID      TGID       TID    %usr  %system  %guest   %wait    %CPU   CPU  Command*
> > > > > > > > 02:16:23        0      9239         -  100.00     0.00    0.00    0.00  100.00     5  ovs-vswitchd
> > > > > > > > 02:16:23        0         -      9239    2.00     0.00    0.00    0.00    2.00     5  |__ovs-vswitchd
> > > > > > > > 02:16:23        0         -      9240    0.00     0.00    0.00    0.00    0.00     0  |__vfio-sync
> > > > > > > > 02:16:23        0         -      9241    0.00     0.00    0.00    0.00    0.00     5  |__eal-intr-thread
> > > > > > > > 02:16:23        0         -      9242    0.00     0.00    0.00    0.00    0.00     5  |__dpdk_watchdog1
> > > > > > > > 02:16:23        0         -      9244    0.00     0.00    0.00    0.00    0.00     5  |__urcu2
> > > > > > > > 02:16:23        0         -      9279    0.00     0.00    0.00    0.00    0.00     5  |__ct_clean3
> > > > > > > > 02:16:23        0         -      9308    0.00     0.00    0.00    0.00    0.00     5  |__handler9
> > > > > > > > 02:16:23        0         -      9309    0.00     0.00    0.00    0.00    0.00     5  |__revalidator8
> > > > > > > > 02:16:23        0         -      9328    0.00     0.00    0.00    0.00    0.00     6  |__pmd13
> > > > > > > > 02:16:23        0         -      9330  100.00     0.00    0.00    0.00  100.00     3  |__pmd12
> > > > > > > > 02:16:23        0         -      9331  100.00     0.00    0.00    0.00  100.00     1  |__pmd11
> > > > > > > > 02:16:23        0         -      9332    0.00     0.00    0.00    0.00    0.00     7  |__pmd10
> > > > > > > > 02:16:23        0         -      9333    0.00     0.00    0.00    0.00    0.00     5  |__pmd16
> > > > > > > > 02:16:23        0         -      9334  100.00     0.00    0.00    0.00  100.00     2  |__pmd15
> > > > > > > > 02:16:23        0         -      9335  100.00     0.00    0.00    0.00  100.00     4  |__pmd14
> > > > > > > > ---------------------------------------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Vinay
> > > > > > > >
> > > > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner
> > > > > > > > <mailto:f...@sysclose.org
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > > > > Hi Ben/Ilya,
> > > > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > > > chasing a weird problem with small drops, and I think that
> > > > > > > > > > is causing lots of TCP retransmissions.
> > > > > > > > > >
> > > > > > > > > > Setup details:
> > > > > > > > > > iPerf3 (1k-5K servers) <-- DPDK2:OvS+DPDK (VxLAN:BOND)
> > > > > > > > > > [DPDK0+DPDK1] <==== 2x25G <====
> > > > > > > > > > [DPDK0+DPDK1] (VxLAN:BOND) OvS+DPDK:DPDK2 <--- iPerf3 (clients)
> > > > > > > > > >
> > > > > > > > > > All the drops are ring drops on the BONDed functions on the
> > > > > > > > > > server side. I have 4 CPUs, each with 3 PMD threads; DPDK0,
> > > > > > > > > > DPDK1 and DPDK2 all run with 4 Rx rings each.
> > > > > > > > > >
> > > > > > > > > > What is interesting is that when I give each Rx ring its
> > > > > > > > > > own CPU the drops go away. Or if I set
> > > > > > > > > > other_config:emc-insert-inv-prob=1 the drops go away. But I
> > > > > > > > > > need to scale up the number of flows, so I am trying to run
> > > > > > > > > > this with EMC disabled.
> > > > > > > > > >
> > > > > > > > > > I can tell that the rings are not getting serviced for
> > > > > > > > > > 30-40 usec because of some kind of context switch or
> > > > > > > > > > interrupts on these cores. I have tried the usual
> > > > > > > > > > isolation: nohz_full, rcu_nocbs, etc., and moved all the
> > > > > > > > > > interrupts away from these cores. But nothing helps. I
> > > > > > > > > > mean it improves, but the drops still happen.
> > > > > > > > >
> > > > > > > > > When you disable the EMC (or reduce its efficiency) the
> > > > > > > > > per-packet cost increases, and then it becomes more
> > > > > > > > > sensitive to variations. If you share a CPU with multiple
> > > > > > > > > queues, you decrease the amount of time available to
> > > > > > > > > process each queue. In either case, there will be less
> > > > > > > > > room to tolerate variations.
> > > > > > > > >
> > > > > > > > > Well, you might want to use 'perf' and monitor the
> > > > > > > > > scheduling events, and then based on the stack trace see
> > > > > > > > > what is causing them and try to prevent it.
> > > > > > > > >
> > > > > > > > > For example:
> > > > > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > > > > >
> > > > > > > > > For instance, you might see that another NIC used for
> > > > > > > > > management has IRQs assigned to an isolated CPU. You can
> > > > > > > > > move it to another CPU to reduce the noise, etc.
> > > > > > > > >
> > > > > > > > > Another suggestion is to look at the PMD thread idle
> > > > > > > > > statistics, because they will tell you how much "extra"
> > > > > > > > > room you have left. As it approaches 0, the more finely
> > > > > > > > > tuned your setup needs to be to avoid drops.
> > > > > > > > >
> > > > > > > > > HTH,
> > > > > > > > > --
> > > > > > > > > fbl
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > fbl
> > > > > > >
> > > > >
> > > > > --
> > > > > fbl
> > > > >
> > >
> > > --
> > > fbl
> > >
> > _______________________________________________
> > dev mailing list
> > mailto:d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>