> Folks,
>
> One of our AFS file servers crashed this afternoon. OpenAFS 1.6.1 on
> RHEL 6 with kernel 2.6.32-279.9.1.el6.x86_64. It looks like the
> salvager hung and eventually the dafileserver stopped responding to
> clients.
>

I had a similar problem on Monday and Tuesday this week. The dafileserver crashed and was restarted by bosserver, but after some time the salvager stopped salvaging (the configured number of salvage processes was present, but they were only sleeping and not repairing data), and some FSSYNC error messages appeared in the log. I then restarted the fileserver process manually and it worked for a while, salvaging volumes, but only until the next dafileserver crash. I saw this several times, including with the older binaries from openafs-1.6.1 (the current ones were openafs-1.6.1a).

After recompiling OpenAFS with debug info and waiting for the next crash, I found that it segfaulted in FD_ISSET in the function CallHandler in src/vol/fssync-server.c. I saw that the code can use the poll() interface instead of select(), so I forced it to use the poll() code (#define HAVE_POLL), and it has been running without a crash from Tuesday until now. I don't know whether this change has other issues; I did not find a test for poll() in the configure script, so the poll() code does not seem to be used normally.

Pavel Semerad
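
For reference, here is a minimal sketch of why a segfault in FD_ISSET can go away when select() is replaced by poll(). It rests on an assumption that this thread does not confirm: that the file server had a descriptor numbered FD_SETSIZE or higher (FD_SETSIZE is typically 1024 with glibc on Linux). An fd_set is a fixed-size bitmap, so FD_SET or FD_ISSET on such a descriptor is undefined behaviour, while poll() takes an array of descriptors and has no such limit. The helper names below are hypothetical and are not taken from the OpenAFS source.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: true if fd may be stored in an fd_set.
 * FD_SET/FD_ISSET on a descriptor outside [0, FD_SETSIZE) is
 * undefined behaviour and can read or write out of bounds. */
static bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: select()-based wait that refuses unsafe fds.
 * Returns 1 if readable, 0 on timeout, -1 on error or unsafe fd. */
static int wait_readable_select(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    assert(timeout_ms >= 0);        /* a negative timeval is invalid */
    if (!fd_fits_in_fd_set(fd))
        return -1;                  /* refuse rather than corrupt memory */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;                  /* 0 = timeout, -1 = error */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical helper: poll()-based wait, valid for any fd number.
 * Returns 1 if readable (or hung up / in error), 0 on timeout,
 * -1 on error. */
static int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;                  /* 0 = timeout, -1 = error */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

With these helpers, wait_readable_select(FD_SETSIZE, 0) returns -1 instead of touching memory past the end of the fd_set, while wait_readable_poll() accepts the same descriptor number without any special case. Whether this is really what happened in CallHandler would have to be checked against the descriptor values in the core dump.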
