Here are my current patches for comments.
--
Richard Russo
to...@enslaves.us
On Fri, Jul 5, 2019, at 12:23 PM, Richard Russo wrote:
> Hi,
>
> I've been experimenting with Receive Side Scaling (RSS) for a TCP proxy
> application. The basic idea with RSS is that by configuring the NICs,
> kernel, and application to use the same CPU for a given socket, cross-CPU
> locking and communication are eliminated or at least significantly
> reduced. On my system, configuring RSS allowed me to handle about three
> times as many sessions before reaching CPU saturation. The remaining
> bottleneck seems to be kernel processing around socket creation and
> closing, which requires cross-CPU coordination.
>
> Aligning the incoming sockets is very simple: setting a socket option
> (IP_RSS_LISTEN_BUCKET) on the listen socket restricts the accepted
> sockets to that bucket, and that's straightforward to add to the TCP
> listener code and configuration.
>
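For reference, a minimal sketch of the listen-side call. It assumes a FreeBSD kernel built with RSS support and that the option takes an int bucket number at the IPPROTO_IP level; the helper name set_rss_listen_bucket is only for illustration and is not taken from the patches. On systems without the option it just fails with ENOPROTOOPT.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: restrict a listening socket to one RSS bucket.
 * Assumption: the option value is an int bucket index (FreeBSD).
 * Returns 0 on success, -1 with errno set on failure. */
int set_rss_listen_bucket(int fd, int bucket)
{
#ifdef IP_RSS_LISTEN_BUCKET
    return setsockopt(fd, IPPROTO_IP, IP_RSS_LISTEN_BUCKET,
                      &bucket, sizeof(bucket));
#else
    /* Option not available on this platform. */
    (void)fd;
    (void)bucket;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

The call would go between socket() and listen() on each per-bucket listener.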
> Aligning outgoing sockets is trickier -- there's no kernel help with a
> socket option or otherwise. An application has to run the hash
> (Toeplitz) on the 4-tuple of {local ip, local port, remote ip, remote
> port} and only use an outgoing port if the hash matches. I've had
> trouble finding a good approach to handle this.
>
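To make the hashing step concrete, here is a rough sketch of a Toeplitz hash over an IPv4/TCP 4-tuple. The function names are illustrative, not from the patches. Assumptions: the key must be the same one the NIC and kernel use, the input is serialised as source address, destination address, source port, destination port as seen on a received packet (so for an outgoing connection the remote side is the source), and the bucket is taken from the low bits of the hash with a power-of-two bucket count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: for every set bit of the input, XOR in the 32-bit
 * window of the key that starts at that bit position. */
uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                       const uint8_t *data, size_t datalen)
{
    assert(keylen >= datalen + 4); /* key must cover the sliding window */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < datalen; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1;
        }
    }
    return hash;
}

/* Hash an IPv4/TCP tuple as seen on a received packet.
 * Addresses and ports are given in host byte order. */
uint32_t rss_hash_tcp4(const uint8_t *key, size_t keylen,
                       uint32_t src_ip, uint32_t dst_ip,
                       uint16_t src_port, uint16_t dst_port)
{
    uint8_t in[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8),  (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8),  (uint8_t)dst_ip,
        (uint8_t)(src_port >> 8), (uint8_t)src_port,
        (uint8_t)(dst_port >> 8), (uint8_t)dst_port,
    };
    return toeplitz_hash(key, keylen, in, sizeof(in));
}

/* Assumption: bucket = low bits of the hash, nbuckets a power of two.
 * Returns non-zero if local_port would land in 'bucket'. */
int port_in_bucket(const uint8_t *key, size_t keylen,
                   uint32_t remote_ip, uint16_t remote_port,
                   uint32_t local_ip, uint16_t local_port,
                   unsigned nbuckets, unsigned bucket)
{
    uint32_t h = rss_hash_tcp4(key, keylen, remote_ip, local_ip,
                               remote_port, local_port);
    return (h & (nbuckets - 1)) == bucket;
}
```

An application would call port_in_bucket() for each candidate local port before binding and skip the ports that do not match.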
> The simplest thing would be to run the hash when a port is assigned by
> port_range and return the port to the range if it hashes to the wrong
> bucket; but if you've already used all the acceptable ports for that
> port range, you spend a lot of time hashing the ports that are still in
> the range without making any progress.
>
> If you have a port range per RSS bucket, you could hash on port
> assignment and not return the ports that hash to the wrong bucket; but
> if the remote IP changes, because you've configured it to use DNS or
> because you change the IP via "set server addr", the previously
> computed hashes are no longer valid -- you would really want to try all
> the ports again.
>
> What I ended up with was a lock on port ranges (instead of atomics as
> used in 07425de71777b688e77a9c70a7088c13e66e41e9 BUG/MEDIUM:
> port_range: Make the ring buffer lock-free), adding a revision counter
> to the port range, and resetting the port range whenever the server IP
> changed. To avoid running the hash during steady state, and because
> checking all the ports when the range needs to be filled is expensive,
> I also made port range filling incremental.
>
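The shape of that approach, as a simplified sketch rather than the code in the attached patches: a mutex-protected range with a revision counter, filled a few ports at a time. All names here (rss_port_range, range_get, RING_SIZE, and so on) are made up for illustration, and the predicate is a stand-in for the real Toeplitz bucket check.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64 /* hypothetical capacity of the ready-port ring */

/* Hypothetical predicate: non-zero if 'port' hashes to the wanted bucket. */
typedef int (*port_ok_fn)(uint16_t port, void *ctx);

struct rss_port_range {
    pthread_mutex_t lock;
    unsigned revision;        /* bumped on every server address change */
    uint16_t low, high;       /* configured range, inclusive */
    uint32_t scan;            /* next port not hashed yet (> high: all seen) */
    uint16_t ring[RING_SIZE]; /* ports already known to match the bucket */
    size_t head, count;
};

void range_init(struct rss_port_range *r, uint16_t low, uint16_t high)
{
    assert(low <= high);
    pthread_mutex_init(&r->lock, NULL);
    r->revision = 0;
    r->low = low;
    r->high = high;
    r->scan = low;
    r->head = r->count = 0;
}

/* Server address changed: forget cached ports and restart the scan. */
void range_reset(struct rss_port_range *r)
{
    pthread_mutex_lock(&r->lock);
    r->revision++;
    r->scan = r->low;
    r->head = r->count = 0;
    pthread_mutex_unlock(&r->lock);
}

/* Take one port, hashing at most 'budget' not-yet-seen ports if the ring
 * is empty. Returns 0 when nothing is available yet; *rev receives the
 * revision the port belongs to. */
uint16_t range_get(struct rss_port_range *r, port_ok_fn ok, void *ctx,
                   unsigned budget, unsigned *rev)
{
    uint16_t port = 0;

    pthread_mutex_lock(&r->lock);
    for (; r->count == 0 && budget > 0 && r->scan <= r->high; budget--) {
        uint16_t cand = (uint16_t)r->scan++;
        if (ok(cand, ctx))
            r->ring[(r->head + r->count++) % RING_SIZE] = cand;
    }
    if (r->count > 0) {
        port = r->ring[r->head];
        r->head = (r->head + 1) % RING_SIZE;
        r->count--;
    }
    *rev = r->revision;
    pthread_mutex_unlock(&r->lock);
    return port;
}

/* Give a port back. Ports from an older revision are dropped because
 * their hash was computed against the previous server address. A port
 * is also dropped if the ring is full. */
void range_put(struct rss_port_range *r, uint16_t port, unsigned rev)
{
    pthread_mutex_lock(&r->lock);
    if (rev == r->revision && r->count < RING_SIZE)
        r->ring[(r->head + r->count++) % RING_SIZE] = port;
    pthread_mutex_unlock(&r->lock);
}

/* Stand-in predicate for demonstration only: accepts even ports. */
int demo_even_port(uint16_t port, void *ctx)
{
    (void)ctx;
    return (port % 2) == 0;
}
```

This sketch does not track ports still held by connections opened before a reset, so a rescanned port can be handed out again while the old connection is alive.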
> This approach works, but it feels complicated, and it made my config
> much more verbose -- I had to duplicate my frontend sections, one for
> each RSS bucket, which sends to corresponding duplicated backends for
> each bucket; the backends had additional configuration to indicate the
> RSS bucket (and the number of buckets). Incidentally, because each RSS
> bucket has a distinct set of ports, and because my use case doesn't use
> any features which benefit from coordination within HAProxy (such as
> stick tables, etc.), this makes it possible to run in process mode rather
> than threaded mode without running into the many port-already-in-use
> warnings/errors that would otherwise happen when sharing a port range.
>
> If it's helpful for the discussion, I can share my patches as-is, but
> if there are better ideas on how to structure this, I'd rather try to
> get the changes done in a nice way before sharing.
>
> Thanks!
>
> --
> Richard Russo
> to...@enslaves.us
>
>
0001-Allow-for-binding-listen-sockets-to-a-provided-RSS-b.patch
Description: Binary data
0002-Revert-BUG-MEDIUM-port_range-Make-the-ring-buffer-lo.patch
Description: Binary data
0003-add-port_range-locking-to-protect-against-concurrent.patch
Description: Binary data
0004-refill-port-ranges-when-addresses-change.patch
Description: Binary data
0005-Allow-for-RSS-aligned-port-selection-for-outgoing-co.patch
Description: Binary data