Thanks Yanqin,
What does this define mean? Some kind of bookkeeping of the packet processing cycles every 10 seconds? Are you saying to make this even bigger in time, 1000 seconds or something? And if I want to disable it, what do I do?
Thanks, Shahaji
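(A note for readers following the thread: the interval Yanqin quotes below is a compile-time constant, while the auto load balance feature itself has a runtime switch. A minimal sketch of both options, assuming OVS 2.11 or later, where the pmd-auto-lb knob exists:

    # Runtime: PMD auto load balance is toggled through ovsdb; it defaults
    # to disabled.
    ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb="false"

    # Compile time: enlarging the measurement interval means editing the
    # define in lib/dpif-netdev.c and rebuilding, e.g. for 60 s:
    #     #define PMD_RXQ_INTERVAL_LEN 60000000LL   /* microseconds */

Disabling the load balance feature does not stop the per-interval cycle measurement itself; that would require the rebuild.)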
On Mon, Jul 6, 2020 at 10:30 PM Yanqin Wei <yanqin....@arm.com> wrote:
> Hi Shahaji,
>
> It seems to be caused by some periodic task. In the pmd thread, pmd auto
> load balance would be done periodically.
>
>     /* Time in microseconds of the interval in which rxq processing cycles
>      * used in rxq to pmd assignments is measured and stored. */
>     #define PMD_RXQ_INTERVAL_LEN 10000000LL
>
> Would you like to disable it if it is not necessary?
>
> Best Regards,
> Wei Yanqin
>
> From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
> Sent: Monday, July 6, 2020 8:24 PM
> To: Yanqin Wei <yanqin....@arm.com>
> Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
>
> Hi Yanqin,
> The drops come at random intervals; sometimes I can run for minutes without
> drops. The case is very borderline, with CPUs close to 99% and around 1000
> flows. We see the drops once every 10-15 seconds, and they are random in
> nature. If I use one ring per core the drops go away; if I enable EMC the
> drops go away; and so on.
> Thanks, Shahaji
>
> On Mon, Jul 6, 2020 at 5:27 AM Yanqin Wei <yanqin....@arm.com> wrote:
> > Hi Shahaji,
> >
> > I have not measured context switch overhead, but I feel it should be
> > acceptable, because 10 Mpps throughput with zero packet drop (20 s) could
> > be achieved on some Arm servers. Maybe you could do performance profiling
> > on your test bench to find out the root cause of the performance
> > degradation with multiple rings.
> >
> > Best Regards,
> > Wei Yanqin
> >
> > From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
> > Sent: Thursday, July 2, 2020 9:27 PM
> > To: Yanqin Wei <yanqin....@arm.com>
> > Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> > Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
> >
> > Thanks Yanqin,
> > I am not seeing any context switches beyond 40 usec in our do-nothing-loop
> > test. But when OvS polls multiple rings (queues) on the same CPU and the
> > number of packets it batches (MAX_BURST_SIZE) grows, the loops take more
> > time, and I can see the rings getting filled up. And then it is a feedback
> > loop: the CPUs are running close to 100%, and any disturbance at that
> > point I think is too much.
> > Do you have any data that you use to monitor OvS? I am doing all the above
> > experiments without OvS.
> > Thanks, Shahaji
> >
> > On Thu, Jul 2, 2020 at 4:43 AM Yanqin Wei <yanqin....@arm.com> wrote:
> > Hi Shahaji,
> > IIUC, the 1 Hz time tick cannot be disabled even with full dynticks,
> > right? But I have no idea why it would cause packet loss, because it
> > should be only a small overhead when rcu_nocbs is enabled.
> > Best Regards,
> > Wei Yanqin
> >
> ===========
>
> From: Shahaji Bhosle <shahaji.bho...@broadcom.com>
> Sent: Thursday, July 2, 2020 6:11 AM
> To: Yanqin Wei <yanqin....@arm.com>
> Cc: Flavio Leitner <f...@sysclose.org>; ovs-dev@openvswitch.org; nd <n...@arm.com>; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
>
> Hi Yanqin,
> I added the patch you gave me to my script, which runs a do-nothing for
> loop. You can see the spikes in the plot below: 976/1000 times we are
> perfect, but around every 1 second you can see something going wrong. I
> don't see anything wrong in the trace-cmd world.
> Thanks, Shahaji
>
> root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=37500000
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> Using ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=37500000
> Running 1000 iterations of do_nothing_loop for (N)=37500000
>
> Average = 100282.193430333 u-secs
> Max     = 124777.488666667 u-secs
> Min     = 100000.017666667 u-secs
> σ       = 1931.352376508 u-secs
>
> Average = 300846580.29 cycles
> Max     = 374332466.00 cycles
> Min     = 300000053.00 cycles
> σ       = 5794057.13 cycles
>
> #σ = events
>  0 = 976
>  1 = 3
>  2 = 4
>  3 = 3
>  4 = 3
>  5 = 2
>  6 = 2
>  7 = 2
>  8 = 1
>  9 = 1
> 10 = 1
> 12 = 2
>
> On Wed, Jul 1, 2020 at 3:57 AM Yanqin Wei <yanqin....@arm.com> wrote:
> Hi Shahaji,
>
> Adding an isb instruction can help make rdtsc precise; it syncs the system
> counter to cntvct_el0. There is a patch in DPDK:
> https://patchwork.dpdk.org/patch/66561/
> So it may not be related to the intermittent drops you observed.
>
> Best Regards,
> Wei Yanqin
>
> > -----Original Message-----
> > From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Shahaji Bhosle via dev
> > Sent: Wednesday, July 1, 2020 6:05 AM
> > To: Flavio Leitner <f...@sysclose.org>
> > Cc: ovs-dev@openvswitch.org; Ilya Maximets <i.maxim...@samsung.com>; Lee Reed <lee.r...@broadcom.com>; Vinay Gupta <vinay.gu...@broadcom.com>; Alex Barba <alex.ba...@broadcom.com>
> > Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
> >
> > Hi Flavio,
> > I still see intermittent drops with rcu_nocbs. So I wrote that
> > do_nothing() loop, to avoid all the other distractions and see whether
> > Linux is messing with the OVS loop, just to see what is going on.
> > The interesting thing is the case marked USE_ISB below, where I use an
> > ISB() instruction and my std deviation blows up. Both results are
> > basically DO NOTHING FOR 100 msec and see what happens to time :)
> > Thanks, Shahaji
> >
> > static inline uint64_t
> > rte_get_tsc_cycles(void)
> > {
> >     uint64_t tsc;
> > #ifdef USE_ISB
> >     asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(tsc));
> > #else
> >     asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > #endif
> >     return tsc;
> > }
> > #endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
> >
> > ==================================
> > usleep(100);
> > for (volatile int i = 0; i < num_iter; i++) {
> >     const uint64_t tsc_start = rte_get_tsc_cycles();
> >     /* do nothing for 1 us */
> > #ifdef USE_ISB
> >     for (volatile int j = 0; j < num_us; j++);  /* <<< THIS IS MESSED UP:
> >                                                    100 msec of do-nothing,
> >                                                    I am getting 2033 usec
> >                                                    STD DEVIATION */
> > #else
> >     for (volatile int j = 0; j < num_us; j++);  /* <<< THIS LOOP HAS VERY
> >                                                    LOW STD DEVIATION */
> >     rte_isb();
> > #endif
> >     volatile uint64_t tsc_end = rte_get_tsc_cycles();
> >     cycles[i] = tsc_end - tsc_start;
> > }
> > usleep(100);
> > calc_avg_var_stddev(num_iter, &cycles[0]);
> > ===================================
> >
> > #ifdef USE_ISB:
> > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> > + TARGET=2
> > + MASK=4
> > + NUM_ITER=1000
> > + NUM_MS=100
> > + N=37500000
> > + LOGFILE=loop_1000iter_100ms.log
> > + tee loop_1000iter_100ms.log
> > + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
> >   plugin 'function_graph'
> > Cycles/Second (Hz) = 3000000000
> > Nano-seconds per cycle = 0.3333
> >
> > Using ISB() before rte_rdtsc()
> > num_iter: 1000
> > do_nothing_loop for (N)=37500000
> > Running 1000 iterations of do_nothing_loop for (N)=37500000
> >
> > Average = 100328.158561667 u-secs
> > Max     = 123024.795333333 u-secs
> > Min     = 100000.017666667 u-secs
> > σ       = 2033.118969489 u-secs
> >
> > Average = 300984475.69 cycles
> > Max     = 369074386.00 cycles
> > Min     = 300000053.00 cycles
> > σ       = 6099356.91 cycles
> >
> > #σ = events
> >  0 = 968
> >  1 = 8
> >  2 = 5
> >  3 = 3
> >  4 = 3
> >  5 = 3
> >  6 = 3
> >  8 = 3
> > 10 = 3
> > 11 = 1
> >
> > #else (no ISB):
> > root@bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_loop
> > + TARGET=2
> > + MASK=4
> > + NUM_ITER=1000
> > + NUM_MS=100
> > + N=7316912
> > + LOGFILE=loop_1000iter_100ms.log
> > + tee loop_1000iter_100ms.log
> > + trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_loop 1000 7316912
> >   plugin 'function_graph'
> > Cycles/Second (Hz) = 3000000000
> > Nano-seconds per cycle = 0.3333
> >
> > NO ISB() before rte_rdtsc()
> > num_iter: 1000
> > do_nothing_loop for (N)=7316912
> > Running 1000 iterations of do_nothing_loop for (N)=7316912
> >
> > Average = 99999.863256333 u-secs
> > Max     = 100052.790333333 u-secs
> > Min     = 99997.807333333 u-secs
> > σ       = 6.497043982 u-secs
> >
> > Average = 299999589.77 cycles
> > Max     = 300158371.00 cycles
> > Min     = 299993422.00 cycles
> > σ       = 19491.13 cycles
> >
> > #σ = events
> >  0 = 900
> >  2 = 79
> >  4 = 17
> >  5 = 3
> >  8 = 1
> >
> > On Tue, Jun 30, 2020 at 4:42 PM Flavio Leitner <f...@sysclose.org> wrote:
> > >
> > > Hi Shahaji,
> > >
> > > Did it help with the rcu_nocbs?
> > >
> > > fbl
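(A note for readers: Flavio's rcu_nocbs question can be answered with data. A minimal sketch for verifying that the isolation parameters discussed further down actually took effect; standard procfs/sysfs paths, with the 1-7 core range taken from Shahaji's settings quoted below:

    # The kernel command line should show the isolation parameters:
    cat /proc/cmdline    # expect isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7

    # The cores the scheduler actually treats as isolated:
    cat /sys/devices/system/cpu/isolated

    # Is anything besides the pinned PMD threads still landing on cores 1-7?
    ps -eLo pid,psr,comm | awk '$2 >= 1 && $2 <= 7'

If the last command lists kernel workers or daemons on the isolated cores, that is a candidate source of the 30-40 usec service gaps described later in the thread.)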
> > > On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> > > > Thanks Flavio,
> > > > Are there any special requirements for RCU on ARM vs x86?
> > > >
> > > > I am following what the above document is saying. Do you think I need
> > > > to do something more than the below?
> > > > Thanks again and appreciate the help. Shahaji
> > > >
> > > > 1. Isolate the CPU cores:
> > > >    isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7
> > > > 2. Set CONFIG_NO_HZ_FULL=y:
> > > >    root@bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> > > >    CONFIG_NO_HZ_COMMON=y
> > > >    # CONFIG_HZ_PERIODIC is not set
> > > >    # CONFIG_NO_HZ_IDLE is not set
> > > >    CONFIG_NO_HZ_FULL=y
> > > >    # CONFIG_NO_HZ_FULL_ALL is not set
> > > >    # CONFIG_NO_HZ is not set
> > > >    # CONFIG_HZ_100 is not set
> > > >    CONFIG_HZ_250=y
> > > >    # CONFIG_HZ_300 is not set
> > > >    # CONFIG_HZ_1000 is not set
> > > >    CONFIG_HZ=250
> > > >
> > > > On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > > >
> > > > > Right, you might want to review Documentation/timers/no_hz.rst from
> > > > > the kernel sources and look for the RCU implications section, where
> > > > > it explains how to move RCU callbacks.
> > > > >
> > > > > fbl
> > > > >
> > > > > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > > > > Hi Flavio,
> > > > > > I wrote a small program which has a do-nothing for loop, and I
> > > > > > measure the timestamps across the do-nothing loop. 3% of the time,
> > > > > > around the 1-second mark when the arch_timer fires, the timestamps
> > > > > > are off by 25% of the expected value. I ran trace-cmd to see what
> > > > > > is going on and see the below. Looks like some issue with
> > > > > > gic_handle_irq(); I am not seeing this behaviour on an x86 host,
> > > > > > so something special with ARM v8.
> > > > > > Thanks, Shahaji
> > > > > >
> > > > > > %21.77 (14181) arm_stb_user_lo rcu_dyntick                 #922
> > > > > >  --- rcu_dyntick
> > > > > >     |--%46.85-- gic_handle_irq               # 432
> > > > > >     |--%23.32-- context_tracking_user_exit   # 215
> > > > > >     |--%22.34-- context_tracking_user_enter  # 206
> > > > > >     |--%2.60--  SyS_execve                   # 24
> > > > > >     |--%1.30--  do_page_fault                # 12
> > > > > >     |--%0.65--  SyS_write                    # 6
> > > > > >     |--%0.65--  schedule                     # 6
> > > > > >     |--%0.65--  SyS_nanosleep                # 6
> > > > > >     |--%0.65--  syscall_trace_enter          # 6
> > > > > >     |--%0.65--  SyS_faccessat                # 6
> > > > > >
> > > > > > %5.01 (14181) arm_stb_user_lo rcu_utilization             #212
> > > > > >  --- rcu_utilization
> > > > > >     |--%96.23-- gic_handle_irq               # 204
> > > > > >     |--%1.89--  SyS_nanosleep                # 4
> > > > > >     |--%0.94--  SyS_exit_group               # 2
> > > > > >     |--%0.94--  do_notify_resume             # 2
> > > > > >
> > > > > > %4.86 (14181) arm_stb_user_lo user_exit                   #206
> > > > > >  --- user_exit
> > > > > >      context_tracking_user_exit
> > > > > >
> > > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_exit  #206
> > > > > >  --- context_tracking_user_exit
> > > > > >
> > > > > > %4.86 (14181) arm_stb_user_lo context_tracking_user_enter #206
> > > > > >  --- context_tracking_user_enter
> > > > > >
> > > > > > %4.86 (14181) arm_stb_user_lo user_enter                  #206
> > > > > >  --- user_enter
> > > > > >      context_tracking_user_enter
> > > > > >
> > > > > > %2.95 (14181) arm_stb_user_lo gic_handle_irq              #125
> > > > > >  --- gic_handle_irq
> > > > > >
> > > > > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner <f...@sysclose.org> wrote:
> > > > > > > On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > > > > Hi Flavio,
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > > I have captured the suggested information but do not see
> > > > > > > > anything that could cause the packet drops.
> > > > > > > > Can you please take a look at the below data and see if you
> > > > > > > > can find something unusual?
> > > > > > > > The PMDs are running on CPU 1,2,3,4 and CPU 1-7 are isolated
> > > > > > > > cores.
> > > > > > > > ---------------------------------------------------------------
> > > > > > > > root@bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > > > > pmd thread numa_id 0 core_id 1:
> > > > > > > >   idle cycles: 99140849 (7.93%)
> > > > > > > >   processing cycles: 1151423715 (92.07%)
> > > > > > > >   avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > > > >   avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > > > > pmd thread numa_id 0 core_id 2:
> > > > > > > >   idle cycles: 118373662 (9.47%)
> > > > > > > >   processing cycles: 1132193442 (90.53%)
> > > > > > > >   avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > > > >   avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > > > > pmd thread numa_id 0 core_id 3:
> > > > > > > >   idle cycles: 53805933 (4.30%)
> > > > > > > >   processing cycles: 1196762002 (95.70%)
> > > > > > > >   avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > > > >   avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > > > > pmd thread numa_id 0 core_id 4:
> > > > > > > >   idle cycles: 189102938 (15.12%)
> > > > > > > >   processing cycles: 1061463293 (84.88%)
> > > > > > > >   avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > > > >   avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > > > > pmd thread numa_id 0 core_id 5:
> > > > > > > > pmd thread numa_id 0 core_id 6:
> > > > > > > > pmd thread numa_id 0 core_id 7:
> > > > > > >
> > > > > > > The core_id 3 is highly loaded, and so it's more likely to show
> > > > > > > the drop issue when some other event happens.
> > > > > > >
> > > > > > > I think you need to run perf as I recommended before and see if
> > > > > > > there are context switches happening and why they are happening.
> > > > > > >
> > > > > > > If a context switch happens, it's either because the core is not
> > > > > > > well isolated or some other thing is going on. It will help to
> > > > > > > understand why the queue wasn't serviced for a certain amount of
> > > > > > > time.
> > > > > > >
> > > > > > > The issue is that running perf might introduce some load, so you
> > > > > > > will need to adjust the traffic rate accordingly.
> > > > > > >
> > > > > > > HTH,
> > > > > > > fbl
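(A note for readers: before re-running the full capture Flavio describes, a cheaper first check is to simply count scheduler events on the PMD cores. A sketch, assuming the PMD cores 1-4 from the data above:

    # Any non-zero count here means a supposedly dedicated PMD core was
    # preempted or had a thread migrate during the 10 s window.
    perf stat -e context-switches,cpu-migrations -C 1,2,3,4 -- sleep 10

perf stat only reads counters, so it adds far less load than perf record and will not distort a near-100%-busy PMD the way a full trace can.)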
> > > > > > > >                          Runtime summary
> > > > > > > >  comm                    parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > > > >                                   (count)    (msec)   (msec)   (msec)   (msec)     (%)
> > > > > > > > ---------------------------------------------------------------------------------------------------
> > > > > > > >  ksoftirqd/0[7]               2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > > >  rcu_sched[8]                 2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > > >  rcuos/4[38]                  2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > > >  rcuos/5[45]                  2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > > >  kworker/0:1[71]              2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > > >  mmcqd/0[1230]                2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > > >  kworker/0:1H[1248]           2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > > >  kworker/u16:2[1547]          2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > > >  ntpd[5282]                   1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > > >  watchdog[6988]               1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > > >  ovs-vswitchd[9239]           1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > > >  revalidator8[9309/9239]   9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > > >  perf[27150]              27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > > > >
> > > > > > > > Terminated tasks:
> > > > > > > >  sleep[27151]             27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > > > > >
> > > > > > > > Idle stats:
> > > > > > > >  CPU 0 idle for 999.814 msec (99.84%)
> > > > > > > >  CPU 1 idle entire time window
> > > > > > > >  CPU 2 idle entire time window
> > > > > > > >  CPU 3 idle entire time window
> > > > > > > >  CPU 4 idle entire time window
> > > > > > > >  CPU 5 idle for 500.326 msec (49.96%)
> > > > > > > >  CPU 6 idle entire time window
> > > > > > > >  CPU 7 idle entire time window
> > > > > > > >
> > > > > > > >  Total number of unique tasks: 14
> > > > > > > >  Total number of context switches: 115
> > > > > > > >  Total run time (msec): 3.198
> > > > > > > >  Total scheduling time (msec): 1001.425 (x 8)
> > > > > > > >
> > > > > > > > 02:16:22   UID   TGID    TID    %usr  %system  %guest  %wait    %CPU  CPU  Command
> > > > > > > > 02:16:23     0   9239      -  100.00     0.00    0.00   0.00  100.00    5  ovs-vswitchd
> > > > > > > > 02:16:23     0      -   9239    2.00     0.00    0.00   0.00    2.00    5  |__ovs-vswitchd
> > > > > > > > 02:16:23     0      -   9240    0.00     0.00    0.00   0.00    0.00    0  |__vfio-sync
> > > > > > > > 02:16:23     0      -   9241    0.00     0.00    0.00   0.00    0.00    5  |__eal-intr-thread
> > > > > > > > 02:16:23     0      -   9242    0.00     0.00    0.00   0.00    0.00    5  |__dpdk_watchdog1
> > > > > > > > 02:16:23     0      -   9244    0.00     0.00    0.00   0.00    0.00    5  |__urcu2
> > > > > > > > 02:16:23     0      -   9279    0.00     0.00    0.00   0.00    0.00    5  |__ct_clean3
> > > > > > > > 02:16:23     0      -   9308    0.00     0.00    0.00   0.00    0.00    5  |__handler9
> > > > > > > > 02:16:23     0      -   9309    0.00     0.00    0.00   0.00    0.00    5  |__revalidator8
> > > > > > > > 02:16:23     0      -   9328    0.00     0.00    0.00   0.00    0.00    6  |__pmd13
> > > > > > > > 02:16:23     0      -   9330  100.00     0.00    0.00   0.00  100.00    3  |__pmd12
> > > > > > > > 02:16:23     0      -   9331  100.00     0.00    0.00   0.00  100.00    1  |__pmd11
> > > > > > > > 02:16:23     0      -   9332    0.00     0.00    0.00   0.00    0.00    7  |__pmd10
> > > > > > > > 02:16:23     0      -   9333    0.00     0.00    0.00   0.00    0.00    5  |__pmd16
> > > > > > > > 02:16:23     0      -   9334  100.00     0.00    0.00   0.00  100.00    2  |__pmd15
> > > > > > > > 02:16:23     0      -   9335  100.00     0.00    0.00   0.00  100.00    4  |__pmd14
> > > > > > > > ---------------------------------------------------------------
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Vinay
> > > > > > > >
> > > > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <f...@sysclose.org> wrote:
> > > > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > > > > Hi Ben/Ilya,
> > > > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > > > chasing a weird problem with small drops, and I think that
> > > > > > > > > > is causing lots of TCP retransmissions.
> > > > > > > > > >
> > > > > > > > > > Setup details:
> > > > > > > > > > iPerf3 (1k-5K servers) <-- DPDK2:OvS+DPDK (VxLAN:BOND) [DPDK0+DPDK1] <==== 2x25G ==== [DPDK0+DPDK1] (VxLAN:BOND) OvS+DPDK:DPDK2 <--- iPerf3 (clients)
> > > > > > > > > >
> > > > > > > > > > All the drops are ring drops on BONDed functions on the
> > > > > > > > > > server side. I have 4 CPUs each with 3 PMD threads; DPDK0,
> > > > > > > > > > DPDK1 and DPDK2 are all running with 4 Rx rings each.
> > > > > > > > > >
> > > > > > > > > > What is interesting is that when I give each Rx ring its
> > > > > > > > > > own CPU, the drops go away. Or if I set
> > > > > > > > > > other_config:emc-insert-inv-prob=1, the drops go away. But
> > > > > > > > > > I need to scale up the number of flows, so I am trying to
> > > > > > > > > > run this with EMC disabled.
> > > > > > > > > >
> > > > > > > > > > I can tell that the rings are not getting serviced for
> > > > > > > > > > 30-40 usec because of some kind of context switch or
> > > > > > > > > > interrupts on these cores. I have tried to do the usual
> > > > > > > > > > isolation: nohz_full, rcu_nocbs, etc. Moved all the
> > > > > > > > > > interrupts away from these cores, etc. But nothing helps.
> > > > > > > > > > I mean it improves, but the drops still happen.
> > > > > > > > > When you disable the EMC (or reduce its efficiency) the per
> > > > > > > > > packet cost increases, and then it becomes more sensitive to
> > > > > > > > > variations. If you share a CPU with multiple queues, you
> > > > > > > > > decrease the amount of time available to process the queue.
> > > > > > > > > In either case, there will be less room to tolerate
> > > > > > > > > variations.
> > > > > > > > >
> > > > > > > > > Well, you might want to use 'perf' and monitor for the
> > > > > > > > > scheduling events, and then based on the stack trace see
> > > > > > > > > what is causing it and try to prevent it.
> > > > > > > > >
> > > > > > > > > For example:
> > > > > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > > > > >
> > > > > > > > > For instance, you might see that another NIC used for
> > > > > > > > > management has IRQs assigned to one isolated CPU. You can
> > > > > > > > > move it to another CPU to reduce the noise, etc...
> > > > > > > > >
> > > > > > > > > Another suggestion is to look at the PMD thread idle
> > > > > > > > > statistics, because they tell you how much "extra" room you
> > > > > > > > > have left. The closer it gets to 0, the more fine-tuned your
> > > > > > > > > setup needs to be to avoid drops.
> > > > > > > > >
> > > > > > > > > HTH,
> > > > > > > > > --
> > > > > > > > > fbl
> > > > > > >
> > > > > > > --
> > > > > > > fbl
> > > > >
> > > > > --
> > > > > fbl
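(A note for readers: Flavio's last suggestion, watching how much idle headroom the PMD threads have left, maps to the standard appctl counters. This is a sketch; the cstats/cycles commands in the captures above look like local aliases for the same thing:

    # Reset the PMD counters, let traffic run, then read back the split
    # between idle and processing cycles per PMD thread.
    ovs-appctl dpif-netdev/pmd-stats-clear
    sleep 10
    ovs-appctl dpif-netdev/pmd-stats-show | grep -E 'core_id|idle|processing'

In the data earlier in the thread, core_id 3 shows only 4.30% idle cycles, which is consistent with Flavio's point that the most loaded PMD is the first to drop when any disturbance lands on its core.)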