[this is obviously(?) a step between part 1 and part 2] So to better understand what was going on, I wanted to get a clear account of the context switches that go on when a packet is received and shoved up the stack.
I put a profiling howto on the wiki. It uses LTTng to determine where thread context switches happen: https://github.com/rumpkernel/wiki/wiki/Howto:-Profiling-the-TCP-IP-stack-with-LTTng

The picture at the bottom shows that there's still a bit of work left to do, and I didn't even pick a place in the capture where things are particularly wild. In a quick experiment I got >300kpps and >6Gbps for single-core receive when pinning all threads to a single host core. That's a 50% improvement over what I get using two host cores (which is what the figure shows). Now if I could only figure out how to use half a core ...
