Re: Fwd: uninterruptable fcntl calls
On Fri, 2007-02-02 at 14:28 -0500, Aaron Wiebe wrote:
> Greetings,
>
> I've run into a situation where fcntl F_SETLKW calls lock up nearly
> completely. I've tried several approaches to handle this case, and
> have yet to come up with some method of handling this. I've never
> really ventured outside userspace, so I'm turning to this list to try
> and get a handle on this.
>
> Over NFSv3 udp, this situation takes place VERY rarely, however with
> the volume I do, it's creating a problem.
>
> In short, I am attempting to read or write lock, and the call hangs to
> the point where a SIGKILL is not captured - no signal is. I've tried
> alarming out and I've tried switching the socket to nonblocking -
> nothing I can think of prevents or even allows me to handle the case.
> I understand NFS locking can be rather sketchy at times - but all I
> need is the ability to handle the case.
>
> I can force the process to die by sending a SIGKILL, then stracing.
> The strace reports the process as SIGSTOP, then processes the kill
> signal.
>
> All I need here is a method of capturing this case. I can "repair"
> the stuck lock by regenerating the file, but I can't capture the case
> in order to handle this in code.
>
> Any help would be useful - I am currently running 2.6.15.6 compiled
> with the NFS patches from linux-nfs.org, but this case was happening
> before applying those patches. I'd be happy to provide any more
> information necessary. I've been struggling with this one for a few
> months now.
>
> Thanks,
> -Aaron
>
>
> Straces:
>
> rt_sigaction(SIGALRM, {0xb7f56640, [ALRM], 0}, {SIG_DFL}, 8) = 0
> alarm(120) = 0
> fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}
> [hangs]
>
> Or:
>
> fcntl64(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
> fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}
>
>
>
> Code used for locking:
>
> static int db_lock(int fd, int type)
> {
>     struct flock fl;
>     struct timespec *tv = (struct timespec *) malloc(sizeof(struct timespec));
>     int ret, c = 0;
>
>     if(!(fd > 0))
>         return -1;
>
> #ifdef SIGALRM_HACK
>     /* after two minutes, wig out */
>     sigalrm_set();
>     alarm(120);
> #endif
>
>     fl.l_whence = SEEK_SET;
>     fl.l_start = 0;
>     fl.l_len = 0;
>     fl.l_type = type;
>
> #ifdef NONBLOCKING_HACK
>     set_nonblocking(fd);
> #endif
>
>     while((ret = fcntl(fd, F_SETLKW, &fl)) < 0)
>     {
>         c++;
>         if(c > 600)
>         {
>             /* we've been waiting for 60 seconds... */
>             my_error("stuck on fcntl request, aborting");
>             return -1;
>         }
>         tv->tv_nsec = 100; /* 10th of a second wait */
>         tv->tv_sec = 0;
>         nanosleep(tv, NULL);
>     }
>     free(tv);
> #ifdef SIGALRM_HACK
>     sigalrm_unset();
> #endif
> #ifdef NONBLOCKING_HACK
>     unset_nonblocking(fd);
> #endif
>     return ret;
> }

Should have been fixed in mainline in 2.6.16 by the following patch

http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commitdiff;h=a9a801787a761616589a6526d7a29c13f4deb3d8;hp=03f28e3a2059fc466761d872122f30acb7be61ae

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
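
For readers following the thread, here is a minimal userspace sketch of what the quoted retry loop seems to be aiming for: a lock attempt that gives up after a bounded wait. It is not from the original post and not a fix for the kernel issue. It rests on a few points: O_NONBLOCK has no effect on fcntl() record locks, so it uses F_SETLK (which returns instead of waiting) rather than F_SETLKW; a tenth of a second is 100000000 ns, not 100; and the timespec lives on the stack so no early return can leak it. The name db_lock_timed and its millisecond parameters are hypothetical. On a kernel with the NFS problem described above, even an F_SETLK request may still block inside the kernel, so treat this only as an illustration of the polling pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): try to take or release
 * a whole-file lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK),
 * polling with F_SETLK until it succeeds or timeout_ms has elapsed.
 * Returns 0 on success, -1 on failure with errno set (ETIMEDOUT on timeout).
 */
int db_lock_timed(int fd, int type, long timeout_ms, long interval_ms)
{
    struct flock fl = {0};
    struct timespec delay;
    long waited_ms = 0;

    /* Keep the interval below one second so tv_nsec stays in range. */
    assert(timeout_ms >= 0 && interval_ms > 0 && interval_ms < 1000);

    fl.l_type = (short) type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                  /* 0 means "to end of file" */

    delay.tv_sec = 0;
    delay.tv_nsec = interval_ms * 1000000L;

    for (;;) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;

        /* EAGAIN/EACCES: someone else holds a conflicting lock.
         * EINTR: interrupted by a signal. Anything else is a hard error. */
        if (errno != EAGAIN && errno != EACCES && errno != EINTR)
            return -1;

        if (waited_ms >= timeout_ms) {
            errno = ETIMEDOUT;
            return -1;
        }

        nanosleep(&delay, NULL);
        waited_ms += interval_ms;
    }
}
```

A caller might use db_lock_timed(fd, F_RDLCK, 60000, 100) to wait up to roughly 60 seconds in 100 ms steps, then release with db_lock_timed(fd, F_UNLCK, 0, 100).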