"Nicholas Piggin" <npig...@gmail.com> writes: > On Wed Oct 11, 2023 at 11:23 PM AEST, Aaron Conole wrote: >> Nicholas Piggin <npig...@gmail.com> writes: >> >> > Hi, >> > >> > I'll post this out again to keep discussion going. Thanks all for the >> > testing and comments so far. >> >> Thanks for the update - did you mean for this to be tagged RFC as well? > > Yeah, it wasn't intended for merge with no RB or tests of course. > I intended to tag it RFC v2.
I only did a basic test with this because of some other stuff, and I only
tested 1, 2, and 3.  I didn't see any real performance changes, but that
is only with a simple port-port test.  I plan to do some additional
testing with some recursive calls.  That will also help to understand the
limits a bit.

That said, I'm very nervous about the key allocator, especially if it is
possible that it runs out.  We probably will need the limit to be bigger,
but I want to get a worst-case flow from the OVN side to understand.

>> I don't see any performance data with the deployments on x86_64 and
>> ppc64le that cause these stack overflows.  Are you able to provide the
>> impact on ppc64le and x86_64?
>
> Don't think it'll be easy but they are not pushing such rates so it
> wouldn't say much. If you want to show the worst case, those tput and
> latency microbenchmarks should do it.
>
> It's the same tradeoff and reasons the per-cpu key allocator was added
> in the first place, presumably. Probably more expensive than stack, but
> similar order of magnitude O(cycles) vs slab which is probably O(100s
> of cycles).
>
>> I guess the change probably should be tagged as -next since it doesn't
>> really have a specific set of commits it is "fixing."  It's really like
>> a major change and shouldn't really go through stable trees, but I'll
>> let the maintainers tell me off if I got it wrong.
>
> It should go upstream first if anything. I thought it was relatively
> simple and elegant to reuse the per-cpu key allocator though :(
>
> It is a kernel crash, so we need something for stable. But in a case
> like this there's not one single problem. Linux kernel stack use has
> always been pretty dumb - "don't use too much", for some values of too
> much, and just cross fingers that config and compiler and workload
> don't hit some overflow case.
>
> And powerpc has always used more stack than x86, so probably it should
> stay one power-of-two larger to be safe. And that may be the best fix
> for -stable.

Given the reply from David (with msg-id:
<ff6cd12e28894f158d9a6c9f71574...@acums.aculab.com>), are there other
things we can look at with respect to the compiler as well?

> But also, ovs uses too much stack. Look at the stack sizes in the first
> RFC patch, and ovs takes the 5 largest. That's because it has always
> been the practice to not put large variables on stack, and when you're
> introducing significant recursion that puts extra onus on you to be
> lean. Even if it costs a percent. There are probably lots of places in
> the kernel that could get a few cycles by sticking large structures on
> stack, but unfortunately we can't all do that.

Well, OVS has operated this way for at least 6 years, so it isn't a
recent thing.  But we should look at it.

I also wonder if we need to recurse in the internal devices, or if we
shouldn't just push the skb into the packet queue instead (rough,
untested sketch at the end of this mail).  That would cut out 1/3 of the
stack frames that you reported originally, and then when doing the xmit,
would cut out 2/3rds.  I have no idea what the performance hit there
might be.  Maybe it looks more like a latency hit rather than a
throughput hit, but I'm just speculating.

> Thanks,
> Nick
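
As a very rough illustration of the "push the skb into the packet queue"
idea above: something along the lines of the below, where all of the
ovs_internal_* names are made up for the example.  This is not the actual
OVS internal-dev code, just the general shape of deferring to a per-CPU
queue plus deferred work so each packet starts from a fresh stack.
Completely untested.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/workqueue.h>
#include <linux/percpu.h>

struct ovs_internal_backlog {
	struct sk_buff_head queue;
	struct work_struct work;
};

/* One backlog per CPU; queue and work would need to be initialized at
 * module load with skb_queue_head_init() and INIT_WORK() for each
 * possible CPU.
 */
static DEFINE_PER_CPU(struct ovs_internal_backlog, ovs_internal_cpu_backlog);

static void ovs_internal_backlog_work(struct work_struct *work)
{
	struct ovs_internal_backlog *bl =
		container_of(work, struct ovs_internal_backlog, work);
	struct sk_buff *skb;

	/* Deliver from a fresh stack instead of the sender's (possibly
	 * already deep) stack.
	 */
	while ((skb = skb_dequeue(&bl->queue)))
		netif_rx(skb);	/* or hand back to the datapath receive path */
}

static netdev_tx_t ovs_internal_dev_xmit(struct sk_buff *skb,
					 struct net_device *dev)
{
	struct ovs_internal_backlog *bl =
		this_cpu_ptr(&ovs_internal_cpu_backlog);

	/* Queue instead of recursing back into the datapath here. */
	skb_queue_tail(&bl->queue, skb);
	schedule_work(&bl->work);
	return NETDEV_TX_OK;
}

Whether the extra queue/work round trip shows up as added latency is
exactly the part I'd want to measure.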