On Tue, 25 Sep 2012 15:33:54 -0400 Derrick Brashear <sha...@gmail.com> wrote:
> > Here you are talking about enabling the Linux IP_MTU_DISCOVER
> > functionality, and the ICMP error queue stuff, correct?
>
> No. This is code which pads packets to discover when they stop being
> passed.

Okay, so in my mind there are effectively three different MTU-related
systems right now: hi/lo/icmp(linux); agree? They're not all working or
even enabled, and they may try to do similar things, but... I mean, in
terms of code presence in the tree. It's just a little confusing, since
each of them is kinda spread out, and they all use the same terminology
and vars/fields and such ("MTU"). I'm not complaining (though sure, it'd
be nice to have this easier to understand); just trying to note this
explicitly so we're clear.

> > So, this sounds like either RX_ACK_MTU, lastPacketSize,
> > lastPingSize, etc, or it sounds like the 'mtuout' label in
> > rxi_CheckCall. One of those, yes?
>
> well, the lastPacket/lastPing is related to low and high. the mtuout
> case is low.

lastPacket/lastPing can't lower the MTU, though, as I understand it.
They can only raise it.

> > What we could possibly do for Linux is to have two sockets open, one
> > which is set to always set DF, and one to never set DF, and we could
> > choose ourselves (Linux doesn't let you set this per-call; we'd have
> > to setsockopt every time we want to switch... I think other
> > platforms may let us set this per-call). What we could then do is
> > always send the MTU pings with DF set, and everything else with DF
> > not set, and only adjust MTU based on those MTU pings.
>
> that sounds like a reasonable approach, though, I suspect this is more
> portable than just Linux quite simply, and the more places we can have
> it, the better.

Yes, and I'm trying to think of how to do this so the Linux ICMP
processing can still be incorporated, without having to have a
completely separate "Linux implementation". I don't mean to exclude the
other platforms; I'm just mentioning Linux since it seems harder there
to specify DF-vs-not at a fine granularity.

If I can try to draw out how this works / would work:

 - "Normally", every N seconds we send out a padded DF ping a little
   larger than the known path MTU. If we get a response or an ICMP
   frag-needed error, set the pmtu.

 - After X seconds/packets of packet loss, we send out a padded DF ping
   smaller than the known path MTU. If we get a response or an ICMP
   frag-needed error, set the pmtu. If we don't get either after Y
   seconds, repeat with smaller packets.

 - All other packets clear DF.

Currently the 'pings' are done as call events, which I think is really
adding to the complexity. If we could do this per-peer (as has been
suggested for the NAT-ping stuff, too), I think it would make this
easier to follow and would reduce overhead. Rx ping acks, IIRC, need to
be tied to a call, though; would it be possible to use "version"
packets for this again?

That could possibly be done with event objects tied to peers, but ever
since the NAT-ping thing I've been wondering about a separate thread
for handling peer processing. Since NAT-ping is potentially quite
frequent (every 6 seconds or whatever, for potentially hundreds or even
thousands of peers), it seems like that alone is a lot of pressure on
the event thread for just idle behavior. If we instead had a thread
that just slept for 6 seconds and then traversed every peer for
NAT-ping, we could handle other things along the way, like MTU
processing.

So, thoughts/etc?
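To make the DF part concrete, here's roughly the setsockopt dance I'm
picturing on the Linux side. This is just a sketch, not Rx code; the
helper names and the buffer handling are made up, and it shows the
"setsockopt every time we want to switch" variant rather than the
two-socket one:

#include <netinet/in.h>
#include <sys/socket.h>
#include <string.h>

/* Flip the DF behavior on a UDP socket via IP_MTU_DISCOVER.  With
 * IP_PMTUDISC_DO, outgoing packets carry DF; with IP_PMTUDISC_DONT,
 * they never do. */
static int
set_df(int sock, int on)
{
    int val = on ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
    return setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER,
                      &val, sizeof(val));
}

/* Hypothetical helper: send one padded probe of 'probelen' bytes with
 * DF set, then restore the no-DF default so all other traffic can be
 * fragmented. */
static ssize_t
send_df_probe(int sock, const struct sockaddr *peer, socklen_t peerlen,
              size_t probelen)
{
    char buf[9000];   /* padding; the contents don't matter */
    ssize_t code;

    if (probelen > sizeof(buf))
        return -1;
    memset(buf, 0, probelen);

    if (set_df(sock, 1) < 0)
        return -1;
    code = sendto(sock, buf, probelen, 0, peer, peerlen);
    (void)set_df(sock, 0);   /* everything else clears DF */
    return code;
}

Per-call DF on other platforms would presumably avoid the two extra
setsockopt calls per probe, but on Linux, as far as I can tell, it's
either this or the two-socket approach.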
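And here's roughly what I'm picturing for the dedicated peer thread;
again just a sketch with stand-in types and helpers (the real traversal
would walk the Rx peer table with the appropriate locking):

#include <pthread.h>
#include <unistd.h>

/* Stand-in peer record; the real struct rx_peer obviously has much
 * more in it. */
struct sketch_peer {
    struct sketch_peer *next;
    int natping_due;
    int mtu_probe_due;
};

static struct sketch_peer *peer_list;   /* stand-in for the peer table */

/* Placeholders for the actual packet-sending routines. */
static void send_natping(struct sketch_peer *p) { (void)p; }
static void send_mtu_probe(struct sketch_peer *p) { (void)p; }

#define PEER_THREAD_PERIOD 6   /* seconds; the NAT-ping interval */

/* Thread body; started once via pthread_create. */
static void *
peer_maintenance_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(PEER_THREAD_PERIOD);

        /* One pass over every peer per period: NAT pings, plus any
         * other idle per-peer work like MTU probing, instead of
         * scheduling a separate event per peer (or per call). */
        for (struct sketch_peer *p = peer_list; p != NULL; p = p->next) {
            if (p->natping_due)
                send_natping(p);
            if (p->mtu_probe_due)
                send_mtu_probe(p);
        }
    }
    return NULL;
}

The point is just that the idle-time work costs one wakeup per period
rather than one scheduled event per peer, and MTU probing can ride
along on the same pass.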
-- 
Andrew Deason
adea...@sinenomine.net