Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-26 Thread Jack Neely
On Thu, Nov 08, 2012 at 06:20:18PM +0100, Stephan Wiesand wrote: Hi Dan, On Nov 8, 2012, at 16:41 , Dan Van Der Ster wrote: [...] All of the nasty details of this incident here: https://afs.web.cern.ch/afs/reports/html/afs200SegFaults.html We're now running with a workaround,

Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-09 Thread Dan Van Der Ster
On Nov 8, 2012, at 5:18 PM, Andrew Deason adea...@sinenomine.net wrote: Note that 1.6 and beyond is safe from this RHEL kernel change since Simon already patched fssync to use poll() 5 years ago ;) That's not true; the code was written to use poll() but was not enabled until very recently.

[OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Dan Van Der Ster
Dear OpenAFS 1.4.x Users, At CERN we just suffered from a confusing problem where the fileserver process would regularly segfault (on only one new server just put into production). Since a gdb of the fileserver core file was showing random bit flips here and there, we initially suspected a bad

Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Derrick Brashear
On Thu, Nov 8, 2012 at 10:41 AM, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Dear OpenAFS 1.4.x Users, At CERN we just suffered from a confusing problem where the fileserver process would regularly segfault (on only one new server just put into production). Since a gdb of the

Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Renata Maria Dart
Hi, does this issue apply to both rhel5 and 6? Thanks, Renata Unless you manually set HAVE_POLL, you may not have it enabled in 1.6: we didn't actually do the configure test for it. It will be fixed in 1.6.2. Incidentally, of note, currently salvsync unlike fssync doesn't ever try poll().

RE: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Dan Van Der Ster
Hi, does this issue apply to both rhel5 and 6? Thanks, Renata Unless you manually set HAVE_POLL, you may not have it enabled in 1.6: we didn't actually do the configure test

Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Arne Wiebalck
From what I see on our most recent RHEL derived SLC kernels this change is only in 6. Cheers, Arne On Nov 8, 2012, at 5:46 PM, Renata Maria Dart ren...@slac.stanford.edu wrote: Hi, does this issue apply to both rhel5 and 6? Thanks, Renata Unless you manually set HAVE_POLL, you

Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Stephan Wiesand
Hi Dan, On Nov 8, 2012, at 16:41 , Dan Van Der Ster wrote: [...] All of the nasty details of this incident here: https://afs.web.cern.ch/afs/reports/html/afs200SegFaults.html We're now running with a workaround, ulimit -Hn 1024; ulimit -Sn 1024 in our init scripts until we manage to
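
A minimal sketch of how the quoted workaround might be applied before starting the server, assuming a Bourne-style shell. The helper name `cap_fd_limits` and the placeholder for the start command are hypothetical and do not come from the thread. The sketch lowers the soft limit first, which differs from the order quoted above, so that it also works when the soft limit starts out above 1024.

```shell
# cap_fd_limits is a hypothetical helper name, not part of OpenAFS.
# It caps both the soft and hard open-files limits at 1024, the value
# the thread reports as safe. The soft limit is lowered first because
# a hard limit cannot be set below the current soft limit.
cap_fd_limits() {
    ulimit -Sn 1024 && ulimit -Hn 1024
}

# Run in a subshell so the calling shell keeps its own limits.
(
    cap_fd_limits || exit 1
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
    # The command that starts the AFS server processes would go here
    # (placeholder; the actual init script is not shown in the thread).
)
```

Once the hard limit is lowered, an unprivileged process cannot raise it again, so the cap is inherited by everything started from that subshell.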

RE: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware

2012-11-08 Thread Renata Maria Dart
Hi Dan, thanks for your efforts in researching this problem and posting it. And thanks to Arne for his response as well. Renata On Thu, 8 Nov 2012, Dan Van Der Ster wrote: Just run ulimit -Hn If it says 4096 your AFS will probably crash. If it says 1024 you are safe (as far as we've
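
The check quoted above can be scripted. What follows is a minimal sketch, assuming that 1024 is the relevant threshold (the value the thread reports as safe, and also the default `FD_SETSIZE` for `select()` on glibc). The function name `fd_limit_status` is hypothetical and is not an OpenAFS tool.

```shell
# fd_limit_status is a hypothetical helper, not part of OpenAFS.
# It classifies an open-files limit the way the thread describes:
# 1024 or less is treated as safe, anything higher (or "unlimited")
# as at risk for a select()-based 1.4.x fileserver.
fd_limit_status() {
    limit="$1"
    case "$limit" in
        unlimited) echo "at risk"; return 0 ;;
        ''|*[!0-9]*) echo "unknown"; return 0 ;;   # not a plain number
    esac
    if [ "$limit" -le 1024 ]; then
        echo "safe"
    else
        echo "at risk"
    fi
}

# Report on the hard limit of the current shell.
echo "hard limit $(ulimit -Hn): $(fd_limit_status "$(ulimit -Hn)")"
```

The result only reflects the shell it runs in. The limit that matters is the one inherited by the fileserver process, so the check is most useful when run from the same context that starts the server.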