Hi all,
We have noticed file corruption in certain circumstances when using the OpenAFS for Windows client in combination with antivirus products and non-default network MTU settings.

Isolating exactly what factors _must_ be present has been a challenge, but with tremendous help from Jeff Altman, we think we are close enough now to report a general warning.

The combination of properties we have investigated most thoroughly is:

 Windows XP (32-bit)
 Symantec EndPoint Protection 11.0.4 firewall (Network Threat Protection)
 OpenAFS 1.5.34 or later
 Windows network interface with a non-default MTU set

Only the 'Network Threat Protection' component of Symantec EndPoint Protection (SEP) affects OpenAFS. Disabling that component makes the problem go away. It seems that SEP 11.0.5 (latest) does NOT cause corruption, although we see terrible performance with that version. It also seems to be necessary that the network MTU be explicitly set. Several of our problematic systems turned out to have this set to 1300, although this parameter isn't exposed through the normal Windows tools. It is a registry setting that 3rd-party network tuning tools might tweak, though. We don't know yet what range of MTU values causes a problem, or whether just setting it at all is enough.

The relevant change with OpenAFS 1.5.34 is that, prior to that version, the client internally capped packet size at 1260 to work around problems with some VPNs, and that reduced packet size may have been shielding this problem. The default behaviour since then is to use whatever Windows reports for MTU.


Other observations:

- The corruption appears as blocks of changed bytes in the file, often a
multiple of 168 adjacent bytes, and shows up clearly if checksums are
performed, or the file format has inherent checksums, but might be subtle
and hard to detect if there is nothing to compare to.  The corruption
occurs only on file writes to AFS, never reads.

- Small files (<10MB) were never affected. Files > 50MB had a high probability of corruption.

- The corruption is similar in general characteristics, but different in detail, each time we copy a file.

- Re-reading a file just written to AFS, and small enough to be entirely in cache, gives a correct checksum on the Windows client, but a wrong one from any other client, implying the data were changed between the cache manager and the network.

- We also tried some variants of Windows (XP 32/64, Vista 32/64) running under Parallels on a Mac. Parallels has its own 'security' tool which implements antivirus/firewall functionality, although there is not much documentation about exactly what it does. Some of the tests with Parallels security turned on _also_ generated file corruption. The Windows guests did not have the SEP firewall installed. The Parallels tests were all done with OpenAFS 1.5.66. We haven't been able to do much testing of the Parallels VMs, but the corruption was superficially similar to that produced by SEP -- isolated blocks of adjacent changed bytes scattered around the file. As far as I can tell, Parallels for Mac licenses 'Kaspersky' antivirus software; it doesn't use Symantec behind the scenes.
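
For anyone who wants to check their own files, here is a minimal Python sketch of the kind of comparison described above: checksum the local original and the copy in AFS, and if they differ, list each run of adjacent changed bytes. This is not the tool we used; the function names sha256_of and differing_runs are made up for illustration. Note that the copy in AFS should be read from a different client than the one that wrote it, since a re-read on the writing client can be satisfied from cache and look correct.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_runs(good, bad):
    """Return (offset, length) for each run of adjacent differing bytes.

    'good' and 'bad' are bytes objects; only the common prefix length is
    compared, so a size mismatch has to be checked separately.
    """
    runs = []
    start = None
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            if start is None:
                start = offset          # a new run of changed bytes begins
        elif start is not None:
            runs.append((start, offset - start))
            start = None
    if start is not None:               # run extends to the end of the data
        runs.append((start, min(len(good), len(bad)) - start))
    return runs
```

A byte-by-byte loop in pure Python is slow on files of 50MB or more, so in practice one would compare checksums first and only run differing_runs on files that fail. Checking whether the reported lengths are multiples of 168 is then a one-liner over the result.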


Given that other software and registry settings on the client systems are largely beyond our control, we'll probably be setting the RxMaxMTU parameter to its old value of 1260 as a workaround for now.

The relevant registry keys to look for are:

HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<adapter-id>\MTU
(not present by default)

and

HKLM\System\CurrentControlSet\Services\TransarcAFSDaemon\Parameters\RxMaxMTU
(default value is 0)

The MTU on the real network interface is the relevant one (not the loopback adapter).
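
To survey client systems for these two values, something like the following Python sketch could be used. It relies on the standard-library winreg module, so the registry part only works on Windows. The helper names and the at_risk heuristic are my own assumptions, not anything provided by OpenAFS: it simply flags a machine where an MTU is explicitly set and RxMaxMTU is 0 or absent, which matches the combination we saw but is not a confirmed rule, since we don't yet know which MTU values matter. It also does not distinguish the loopback adapter from the real interface; that has to be checked by hand.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

TCPIP_IFACES = r"System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"
AFS_PARAMS = r"System\CurrentControlSet\Services\TransarcAFSDaemon\Parameters"


def read_value(subkey, name):
    """Return a value under HKLM, or None if the key or value is absent."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            return winreg.QueryValueEx(key, name)[0]
    except FileNotFoundError:
        return None


def explicit_mtus():
    """Map adapter id -> MTU for each interface with an explicit MTU value."""
    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_IFACES) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        for index in range(subkey_count):
            adapter = winreg.EnumKey(key, index)
            mtu = read_value(TCPIP_IFACES + "\\" + adapter, "MTU")
            if mtu is not None:
                found[adapter] = mtu
    return found


def at_risk(interface_mtu, rx_max_mtu):
    """Hypothetical heuristic: explicit interface MTU and no RxMaxMTU cap."""
    return interface_mtu is not None and not rx_max_mtu


if __name__ == "__main__" and winreg is not None:
    rx_max_mtu = read_value(AFS_PARAMS, "RxMaxMTU")
    for adapter, mtu in explicit_mtus().items():
        flag = "CHECK" if at_risk(mtu, rx_max_mtu) else "ok"
        print(adapter, "MTU =", mtu, "RxMaxMTU =", rx_max_mtu, flag)
```

The script only reads the registry. Setting RxMaxMTU to 1260 is left as a deliberate manual (or group policy) step.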

Richard

--
Richard Brittain,  Research Computing Group,
                   Kiewit Computing Services, 6224 Baker/Berry Library
                   Dartmouth College, Hanover NH 03755
[email protected] 6-2085