On 12/05/2021 03:11, Flavio Leitner wrote:
> Hi,
>
> On Fri, Apr 30, 2021 at 11:31:26AM -0400, Mark Gray wrote:
>> This series proposes a new method of distributing upcalls
>> to user space threads attempting to resolve a number of
>> issues with the current method.
>>
>
> I ran some tests with old V10, current master and this RFC
> including the kernel (based on 5.11.0) on a 28 cores system.
>

Thanks Flavio

> The old v10 had the issue of not scaling up in case of a high
> load of upcalls. The test sends a burst of UDP packets which
> causes upcalls. The table below shows how many packets could
> be sent without increasing the upcall loss counter.
>
>              v10     master    rfc
>   packets    2k5     >55k      10k
>
> So, it reproduced the same old v10 value. Regarding to branch
> master then it's not determined due to test limitation. It is
> at least above 55k (last time I think it was 63k). The RFC patch
> resulted in a better number compared with v10 though the test
> should be using only one thread as v10. I think that keeping
> the CPU context could explain the difference.

As this patch distributes packets to different kernel space threads
(and hence user space threads) based on a flow hash, a single flow will
only get distributed to one user space thread. I think this is what you
are seeing here? Although "master" will currently distribute that to
multiple user space threads (performing better), it means that upcalls
can be processed out of order which is incorrect and undesired.

I think this is ok because in real-world scenarios, there will always be
multiple flows so they will get distributed between user space threads.
A single flow consuming the throughput of a single thread is probably
only going to be seen in benchmarks?

>
> Running the test with 8 parallel threads sending one burst of
> UDP packets each resulted in the following table:
>   Branch     missed    lost
>   v10        52018     50288
>   master     52022     0
>   RFC        52021     0
>

This looks good!

> Now the wake ups, one thread:
>   Branch     wake     processing
>   master     20+      16+
>   RFC        3        1
>

This looks great!

> Column wake: number of different threads receiving
> sched:sched_wakeup or irq:softirq_entry.
> Column processing: number of CPUs with double digits
> usage.
>
> And 8 parallel threads:
>   Branch     wake     processing
>   master     20+      20+
>   RFC        10       8+
>
> The results show that this new patch-set addressed the main
> thundering herd issue and the scalability issue I reported
> during V10 review.

Great!

>
> Unfortunately I can review the patches only next week.
>

No problem. Thanks again for the independent benchmarking.

> Thanks,
> fbl
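
As a footnote to the flow-hash point above, here is a toy model of why a
single flow lands on exactly one user space thread while many flows
spread out. It is only a sketch and is not taken from the patch: the
handler count, the 5-tuple key, the CRC32 hash and the names
`flow_key`, `pick_handler` and `dispatch` are all assumptions made for
illustration. The real datapath uses its own hash and dispatch logic.

```python
import zlib
from collections import defaultdict

N_HANDLERS = 8  # hypothetical number of user space handler threads


def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Serialize a 5-tuple into bytes so it can be hashed deterministically."""
    return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()


def pick_handler(key, n_handlers=N_HANDLERS):
    """Map a flow to one handler; the same flow always yields the same index."""
    # CRC32 stands in for the datapath's flow hash (an assumption).
    return zlib.crc32(key) % n_handlers


def dispatch(packets, n_handlers=N_HANDLERS):
    """Append each (flow, seq) packet to the queue of the handler its flow maps to."""
    queues = defaultdict(list)
    for flow, seq in packets:
        queues[pick_handler(flow_key(*flow), n_handlers)].append((flow, seq))
    return queues


# One flow, five packets: everything ends up in a single queue, in order.
single = [(("10.0.0.1", "10.0.0.2", 1234, 5678, "udp"), seq) for seq in range(5)]
single_queues = dispatch(single)

# 64 flows that differ only in source port: these can spread across handlers.
many = [(("10.0.0.1", "10.0.0.2", 1000 + i, 5678, "udp"), 0) for i in range(64)]
many_queues = dispatch(many)

print(len(single_queues))  # number of handlers used by the single flow
print(len(many_queues))    # number of handlers used by the 64 flows
```

In this model the single-flow benchmark is bounded by one handler, which
would match the one-thread behaviour discussed above, and per-flow
ordering is kept because a flow never changes queue.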
