On Dec 19, 2011, at 16:28 , [email protected] wrote:

> Are you testing this in "lab" conditions? I'm curious as to how you are
> replicating the issue.

I think it's described fairly accurately in https://rt.central.org/rt/Ticket/Display.html?id=130327 . In short: have a few dozen clients writing large files to the same fileserver, then wait for O(30m). See how 1.4 clients - and 1.6 clients with idledead disabled - succeed, while unmodified 1.6 clients fail and hang.

> Also if you can, can you try this by running it on a single core or disabling
> threads and getting the same results?

If it helps shed more light on the issue and find a solution more satisfactory than disabling idledead (which I'd be absolutely happy with), I will. But according to this thread https://lists.openafs.org/pipermail/openafs-devel/2011-November/018583.html , it seems what's going on is already well understood.

> Quoting Stephan Wiesand <[email protected]>:
>
>> OS: EL6.1
>> arch: amd64
>> kernels: 2.6.32-131.21.1.el6, 2.6.32-220.1.1.el6 (module built against
>> 2.6.32-71.el6)
>>
>> It builds, and it basically works.
>>
>> It seems to partially address the NAT ping issue, but servers still get
>> pinged more often than intended.
>>
>> It fails to fix RT #130327. If a fileserver is very busy, clients fail
>> writing to it and then hang, making AFS unusable on the client machine until
>> it's rebooted.

-- 
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
