On 2015/09/18 13:36, Martin Pieuchot wrote:
> On 18/09/15(Fri) 15:55, David Gwynne wrote:
> > hashing bits of packet headers to tie connections to particular
> > physical interfaces within a trunk turns out to be fairly expensive.
> > in my very unscientific testing it is about 20% of the cost of udp
> > traffic generated with tcpbench -u.
> >
> > we could tune or change the hash. eg, going from siphash 2 4 to
> > siphash 1 1 halves the overhead of hashing. however, it occurred
> > to me that sometimes we already know about connections. why not
> > reuse that info if it is available?
>
> Why not, but I'd argue that's orthogonal to the fact that siphash
> 2 4 has a high cost.
>
> > this lets pf embed the state id into the mbuf as a "flow id" so
> > other subsystems can use it. eg, trunk can pull it out and use it.
> >
> > this diff steals the pad field in mbuf packet headers and uses it
> > to embed a flow id. it makes pf fill it in, and trunk use it. this
> > avoids the cost of hashing in trunk altogether.
> >
> > it could be used in other places too, eg, picking an upstream when
> > we're going multipath routing.
>
> I've been through RFC 2992 again and indeed I believe we could use that.

As far as trunk(4) goes, 802.3-2000 section 43.2.1 says:

    f) Frame ordering must be maintained for certain sequences of
       frame exchanges between MAC Clients (known as conversations,
       see 1.4). The Distributor ensures that all frames of a given
       conversation are passed to a single port. For any given port,
       the Collector is required to pass frames to the MAC Client in
       the order that they are received from that port. The Collector
       is otherwise free to select frames received from the aggregated
       ports in any order. Since there are no means for frames to be
       mis-ordered on a single link, this guarantees that frame
       ordering is maintained for any conversation.

so we're OK from that perspective.

> What about carp(4) and bridge(4)?

I don't think it applies to bridge; load balancing is done at a lower
level there (i.e. you'd have trunk as a member of a bridge if you
wanted to balance across links).

Probably the same for carp. There might be some opportunity, but it's
already a bit of a minefield to have things working nicely with
pfsync/defer in various different situations.
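
To illustrate the 43.2.1 point, here is a rough sketch of a distributor that uses a flow id when one is present and falls back to hashing otherwise. This is not the diff under discussion: struct pkt, has_flowid, fallback_hash() and pick_port() are hypothetical names, FNV-1a merely stands in for the real hash, and reducing the id modulo the port count is my assumption. The only property it is meant to show is that every frame carrying the same flow id lands on the same port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf packet header, not the real struct. */
struct pkt {
	uint32_t	 flowid;	/* e.g. derived from the pf state id */
	int		 has_flowid;	/* nonzero when flowid is valid */
	const uint8_t	*hdr;		/* header bytes for the fallback hash */
	size_t		 hdrlen;
};

/* Fallback path: FNV-1a over the header bytes (stand-in for the real hash). */
static uint32_t
fallback_hash(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Pick an output port. The same flow id always maps to the same port,
 * so frames of one conversation stay on a single link.
 */
static unsigned int
pick_port(const struct pkt *p, unsigned int nports)
{
	assert(nports > 0);

	if (p->has_flowid)
		return (p->flowid % nports);	/* no hashing needed */
	return (fallback_hash(p->hdr, p->hdrlen) % nports);
}
```

Packets that never matched a pf state have no flow id and still take the hashing path, so the hash cost only goes away for traffic pf already knows about.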
