I'm using this email to report on the problem and what I've found, and to lay out what our options are.

Background
With the advent of Solaris 10 8/07 (aka s10u4), the internal Private kernel interfaces AFS used to access network interface properties changed due to the integration of the pfhooks/netstack feature. Specifically, an argument was added to the ill_* functions to accommodate the netstack changes. This rules out using them if we want to maintain AFS driver binary compatibility across all permutations of the Solaris kernel.

Situation
The AFS driver code uses the ILL_* macros and functions (defined in <inet/ip.h>) to walk a list of network interfaces and, as is the case in SetServerPrefs() in src/afs/afs_server.c, pick the best interface to bind to in order to talk to the AFS server holding the cell's root volume. They are also used in src/rx/SOLARIS/rx_knet.c to get the MTU of the interface an rx packet was received on; the retrieved value is then used to adjust the RX UDP packet size to prevent fragmentation.

My research has concluded that there are no straightforward Public interfaces in the Solaris kernel for this which exist all the way back to Solaris 10 FCS. Also, there are no Private interfaces which directly address our needs and are stable back to Solaris 10 FCS.

What to do?
There are a few alternatives we can consider, and I'd like to present them for discussion... ordered from "most likely" to "least likely":

1) We can mimic what we've traditionally done and, instead of using ILL_*, use the Public ldi_ioctl() interface to make sockio calls to /dev/udp and fill Private structs with the returned network interface information. While this may be alright to do in the case of SetServerPrefs(), it would be a huge performance hit in the rx code. When an rx UDP packet is received via , the call stack looks like this:

rxi_ReceivePacket->rxi_FindConnection->rxi_FindPeer->rxi_InitPeerParams()->rxi_FindIfMTU()->rxi_GetIFInfo()

Both rxi_FindIfMTU() and rxi_GetIFInfo() walk the ILL structs to get the interface address and MTU and, from what I can tell, they do this for *every* *received* *packet*. So, given that AFS seems rather obsessive about staying up-to-date on an interface's MTU, it would mean that we would be doing ioctls on a file (/dev/udp) for every rx packet we get. This would be hellishly expensive. Would this be a correct assumption?


2) Option 2 would be to use the above-mentioned ioctl-based method, but to remove it entirely from the critical code path. We could, at AFSinit() time, create a worker thread which would periodically update a global struct of interface telemetry. The worker thread would wake up every, say, 30 seconds (tunable), lock the struct via mutex, update it, unlock, and return to sleep. The RX and SetServerPrefs() code can read their desired values from this struct when they need them, spinning if need be.
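
To make option 2 a bit more concrete, here is a rough userland sketch of the idea, using pthreads as a stand-in for kernel threads and mutexes. Everything in it is hypothetical: the names (if_cache, query_interfaces(), if_cache_get_mtu(), and so on) are made up for illustration, and query_interfaces() just returns canned data where the real thing would do the ldi_ioctl() calls against /dev/udp. Treat it as a sketch under those assumptions, not as proposed driver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define MAX_IFS 8

typedef struct if_entry { char name[16]; unsigned int mtu; } if_entry_t;

typedef struct if_cache {
    pthread_mutex_t lock;
    pthread_cond_t wake;       /* lets us interrupt the worker's sleep */
    int stop;
    unsigned long generation;  /* bumped on every refresh */
    size_t count;
    if_entry_t ifs[MAX_IFS];
} if_cache_t;

static if_cache_t cache = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .wake = PTHREAD_COND_INITIALIZER
};
static unsigned int if_cache_interval_sec = 30;  /* the tunable */

/* Hypothetical stand-in for the ioctl-based query; returns canned data. */
static size_t query_interfaces(if_entry_t *out, size_t max)
{
    assert(max >= 2);
    strcpy(out[0].name, "lo0");  out[0].mtu = 8232;
    strcpy(out[1].name, "bge0"); out[1].mtu = 1500;
    return 2;
}

/* Do the slow query outside the lock, then swap the results in under it. */
static void if_cache_refresh(void)
{
    if_entry_t tmp[MAX_IFS];
    size_t n = query_interfaces(tmp, MAX_IFS);

    pthread_mutex_lock(&cache.lock);
    memcpy(cache.ifs, tmp, n * sizeof(tmp[0]));
    cache.count = n;
    cache.generation++;
    pthread_mutex_unlock(&cache.lock);
}

/* Worker: refresh, sleep for the interval (or until stopped), repeat. */
static void *if_cache_worker(void *arg)
{
    (void)arg;
    for (;;) {
        if_cache_refresh();
        pthread_mutex_lock(&cache.lock);
        if (!cache.stop) {
            struct timespec deadline;
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_sec += if_cache_interval_sec;
            pthread_cond_timedwait(&cache.wake, &cache.lock, &deadline);
        }
        int stop = cache.stop;
        pthread_mutex_unlock(&cache.lock);
        if (stop)
            return NULL;
    }
}

static void if_cache_stop(void)
{
    pthread_mutex_lock(&cache.lock);
    cache.stop = 1;
    pthread_cond_signal(&cache.wake);
    pthread_mutex_unlock(&cache.lock);
}

/* Fast-path reader: returns 0 and fills *mtu, or -1 if the name is unknown. */
static int if_cache_get_mtu(const char *name, unsigned int *mtu)
{
    int rc = -1;
    pthread_mutex_lock(&cache.lock);
    for (size_t i = 0; i < cache.count; i++) {
        if (strcmp(cache.ifs[i].name, name) == 0) {
            *mtu = cache.ifs[i].mtu;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&cache.lock);
    return rc;
}
```

The point of the shape is that the per-packet path only ever takes a short lock and reads memory; the expensive query happens at most once per interval, and the lock is not held while it runs.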


3) This is Rob's idea, so blame him if you reel back in horror. We set a conditional by testing for a netstack symbol in the kernel ip module. If TRUE, we have a function pointer that points to the new ILL_ functions with the extra argument. If FALSE, we point to the old ones. Yum. This would certainly involve the least amount of code.
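
For the sake of discussion, the dispatch in option 3 might look roughly like the following. This is a plain C illustration, not kernel code: lookup_old() and lookup_new() are hypothetical stand-ins for the two flavors of the ill_* functions (their real signatures are not reproduced here), the interface table is canned data, and the has_netstack flag passed to iface_compat_init() stands in for whatever test we would do for a netstack symbol in the ip module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct iface_info { const char *name; unsigned int mtu; } iface_info_t;

/* Canned data standing in for the kernel's interface list. */
static const iface_info_t fake_ifaces[] = { { "lo0", 8232 }, { "bge0", 1500 } };
#define N_IFACES (sizeof(fake_ifaces) / sizeof(fake_ifaces[0]))

static int calls_old, calls_new;  /* only here to show which path ran */

static const iface_info_t *find_iface(const char *name)
{
    for (size_t i = 0; i < N_IFACES; i++)
        if (strcmp(fake_ifaces[i].name, name) == 0)
            return &fake_ifaces[i];
    return NULL;
}

/* Hypothetical old-style lookup: no netstack argument. */
static const iface_info_t *lookup_old(const char *name)
{
    calls_old++;
    return find_iface(name);
}

/* Hypothetical new-style lookup: same job, one extra argument. */
static const iface_info_t *lookup_new(const char *name, void *netstack)
{
    (void)netstack;
    calls_new++;
    return find_iface(name);
}

/* Adapters give both flavors one common signature. */
static void *default_netstack;  /* placeholder for the global stack */

static const iface_info_t *adapter_old(const char *name)
{
    return lookup_old(name);
}

static const iface_info_t *adapter_new(const char *name)
{
    return lookup_new(name, default_netstack);
}

typedef const iface_info_t *(*iface_lookup_fn)(const char *);
static iface_lookup_fn iface_lookup;

/* Called once at init time with the result of the symbol test. */
static void iface_compat_init(int has_netstack)
{
    iface_lookup = has_netstack ? adapter_new : adapter_old;
}

/* Callers never care which kernel flavor they are running on. */
static unsigned int iface_mtu(const char *name)
{
    assert(iface_lookup != NULL);
    const iface_info_t *info = iface_lookup(name);
    return info != NULL ? info->mtu : 0;
}
```

The decision is made once at init, so the rest of the code just calls iface_mtu() and never sees the extra argument.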


4) We toss caution to the wind and let modern routers deal with UDP frags the way they should, and dispense with the UDP packet size adjustments based on MTU, or at least nail them to 1500. If you're still using AFS over a PPP connection... well... sorry 'bout that. We also let the kernel routing table do its job and dispense with selecting interfaces. I don't think even the NFS code jumps through these kinds of hoops. Is there a reason we should be? I admit I'm not too familiar with the inner details and history of things here, so feel free to gently clue me in.


5) Continue to use the ILL method and release OpenAFS 1.4.5 with the code being compatible with s10u4. We simply tell people that if you want to run OpenAFS client version 1.4.5 or greater, you also need to run Solaris KU 120012-14 (x86) or whatever the analog is if you're running SPARC.

6) Any other idears?

/dale


--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705



_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
