Re: syslog() thread unsafety
15.06.2017 0:09, Konstantin Belousov wrote: > On Wed, Jun 14, 2017 at 11:39:39PM +0700, Eugene Grosbein wrote: >> 14.06.2017 21:12, Konstantin Belousov wrote: >> >>> If the issue is that mpd5 cancels logging thread, and this leaves the >>> mutex in the locked state, the right solution is to establish a cleanup >>> handler around the locked region. Note that this can only work if the >>> cancellation is in deferred mode, async mode is unsafe by definition. >>> >>> Try something like this, untested even a minimal bit. >> >> [skip] >> >> I've given it a spin with unpatched mpd5 and it seems to work just fine now. >> I'm curious, should these two lines be swapped? >> >> +THREAD_LOCK(); >> +pthread_cleanup_push(syslog_cancel_cleanup, NULL); >> >> It seems it could be a race between another thread's pthread_cancel() >> and pthread_cleanup_push() here. > As I mentioned in my previous reply, you can only rely on some > guarantees for the deferred cancellation mode. In this mode, only some > calls are allowed to become cancellation points. In particular, the > cancellation cannot happen between THREAD_LOCK() and push(). Forgot to mention that mpd5 does not change the mode to PTHREAD_CANCEL_ASYNCHRONOUS from the default deferred. > That is, popen() looks safe. >> getlogin.c >> lib/libc/stdio: findfp.c, fclose.c > I already wrote about fclose() above, might be it indeed should grow the > cleanup handler for the general case. I do not see any code which would > need cancellation protection in findfp, what exactly do you mean there ? > Same for getlogin(). That was just quick scan for THREAD_LOCK() :-) I'm new to these details of pthreads. >> Please consider committing the fix at least for syslog.c > Confirm that it fixed your case, then I will commit the change. Well, my stress test for mpd5 has been running for several hours and no hang still. Without your patch, it locked in a minute, so it seems the case is fixed, thanks! Eugene Grosbein ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 11.1-BETA1: lsof build failure
Larry Rosenman wrote: Current lsof is 4.90M. Ack, that does. Larry Sysutils/lsof maintainer On 6/14/17, 8:13 AM, "Peter"wrote: FYI, please check if reproducible and/or issue: Installed this from SVN & local build: 11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319858:319867M ... amd64 Then tried to update lsof-4.90.f,8 and got this error: cc -pipe -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_VM_MEMATTR_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=11000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="11.1-BETA1" -I/usr/src/sys -O2 -c dvch.c -o dvch.o --- dnode.o --- dnode.c:906:13: error: no member named 'i_dev' in 'struct inode' if (i->i_dev ~ ^ dnode.c:916:27: error: no member named 'i_dev' in 'struct inode' dev = Dev2Udev((KA_T)i->i_dev); ~ ^ 2 errors generated. *** [dnode.o] Error code 1 ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: syslog() thread unsafety
On Wed, Jun 14, 2017 at 11:39:39PM +0700, Eugene Grosbein wrote: > 14.06.2017 21:12, Konstantin Belousov wrote: > > > If the issue is that mpd5 cancels logging thread, and this leaves the > > mutex in the locked state, the right solution is to establish a cleanup > > handler around the locked region. Note that this can only work if the > > cancellation is in deferred mode, async mode is unsafe by definition. > > > > Try something like this, untested even a minimal bit. > > [skip] > > I've given it a spin with unpatched mpd5 and it seems to work just fine now. > I'm curious, should these two lines be swapped? > > + THREAD_LOCK(); > + pthread_cleanup_push(syslog_cancel_cleanup, NULL); > > It seems it could be a race between another thread's pthread_cancel() > and pthread_cleanup_push() here. As I mentioned in my previous reply, you can only rely on some guarantees for the deferred cancellation mode. In this mode, only some calls are allowed to become cancellation points. In particular, the cancellation cannot happen between THREAD_LOCK() and push(). OTOH, if you configured the thread for async cancellation mode, then cancellation can happen anywhere. If you called any non-signal-safe function, any invariant can be broken. That is, the cancellation cleanup is not supposed to be useful with async cancellation. And you are not supposed to call into stdio or syslog() while in async. > > Anyway, we have several other places in the lib/ with similar code > possibly missing pthread_cleanup_push(): > > lib/libc/gen: popen.c, I doubt that popen() needs the cancellation protection. The _close() calls there are explicitely not the cancellation points, they are direct syscall invocations, and are not subject to cancellation testing. The only potentially problematic case could be the fclose() call in the locked region. But again, fclose(3) only potential cancellation point is the fp->_close() call. For fdopen()-ed FILEs, _close is set to the __sclose() wrapper which only calls _close(). That is, popen() looks safe. > getlogin.c > lib/libc/stdio: findfp.c, fclose.c I already wrote about fclose() above, might be it indeed should grow the cleanup handler for the general case. I do not see any code which would need cancellation protection in findfp, what exactly do you mean there ? Same for getlogin(). > > Please consider committing the fix at least for syslog.c Confirm that it fixed your case, then I will commit the change. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: syslog() thread unsafety
14.06.2017 21:12, Konstantin Belousov wrote: > If the issue is that mpd5 cancels logging thread, and this leaves the > mutex in the locked state, the right solution is to establish a cleanup > handler around the locked region. Note that this can only work if the > cancellation is in deferred mode, async mode is unsafe by definition. > > Try something like this, untested even a minimal bit. [skip] I've given it a spin with unpatched mpd5 and it seems to work just fine now. I'm curious, should these two lines be swapped? + THREAD_LOCK(); + pthread_cleanup_push(syslog_cancel_cleanup, NULL); It seems it could be a race between another thread's pthread_cancel() and pthread_cleanup_push() here. Anyway, we have several other places in the lib/ with similar code possibly missing pthread_cleanup_push(): lib/libc/gen: popen.c, getlogin.c lib/libc/stdio: findfp.c, fclose.c Please consider committing the fix at least for syslog.c ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: syslog() thread unsafety
On Wed, Jun 14, 2017 at 08:49:36PM +0700, Eugene Grosbein wrote: > Hi! > > Our [v]syslog() implementation in src/lib/libc/gen/syslog.c > tries to be thread-safe and uses "syslog_mutex". > > It may lock this mutex then call blocking system calls like sendto(). > If a thread owning this mutex is pthread_cancel()'d in process, > the mutex stays UMUTEX_CONTESTED and every other thread calling [v]syslog() > deadlocks. > > I can reproduce this with net/mpd5 daemon reliably within some seconds > from the beginning of my stress test involving RADIUS so mpd5 runs > multi-threaded. > > I've tried to wrap every mpd5's [v]syslog() call around with > pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, ) and > pthread_setcancelstate(oldstate, NULL) statements and this eliminates the > problem > but that's not a solution in any way. > > This problem seems to be root cause of multiple complaints > about mpd5 being unstable under FreeBSD 9.x+ version (PR: 186114, 214482). If the issue is that mpd5 cancels logging thread, and this leaves the mutex in the locked state, the right solution is to establish a cleanup handler around the locked region. Note that this can only work if the cancellation is in deferred mode, async mode is unsafe by definition. Try something like this, untested even a minimal bit. diff --git a/lib/libc/gen/syslog.c b/lib/libc/gen/syslog.c index dfca67360fa..2c390efe95e 100644 --- a/lib/libc/gen/syslog.c +++ b/lib/libc/gen/syslog.c @@ -129,8 +129,8 @@ syslog(int pri, const char *fmt, ...) va_end(ap); } -void -vsyslog(int pri, const char *fmt, va_list ap) +static void +vsyslog1(int pri, const char *fmt, va_list ap) { int cnt; char ch, *p; @@ -151,13 +151,9 @@ vsyslog(int pri, const char *fmt, va_list ap) saved_errno = errno; - THREAD_LOCK(); - /* Check priority against setlogmask values. */ - if (!(LOG_MASK(LOG_PRI(pri)) & LogMask)) { - THREAD_UNLOCK(); + if (!(LOG_MASK(LOG_PRI(pri)) & LogMask)) return; - } /* Set default facility if none specified. */ if ((pri & LOG_FACMASK) == 0) @@ -167,10 +163,8 @@ vsyslog(int pri, const char *fmt, va_list ap) tbuf_cookie.base = tbuf; tbuf_cookie.left = sizeof(tbuf); fp = fwopen(_cookie, writehook); - if (fp == NULL) { - THREAD_UNLOCK(); + if (fp == NULL) return; - } /* Build the message. */ (void)time(); @@ -200,7 +194,6 @@ vsyslog(int pri, const char *fmt, va_list ap) fmt_fp = fwopen(_cookie, writehook); if (fmt_fp == NULL) { fclose(fp); - THREAD_UNLOCK(); return; } @@ -285,10 +278,8 @@ vsyslog(int pri, const char *fmt, va_list ap) */ disconnectlog(); connectlog(); - if (send(LogFile, tbuf, cnt, 0) >= 0) { - THREAD_UNLOCK(); + if (send(LogFile, tbuf, cnt, 0) >= 0) return; - } /* * if the resend failed, fall through to * possible scenario 2 @@ -303,15 +294,11 @@ vsyslog(int pri, const char *fmt, va_list ap) if (status == CONNPRIV) break; _usleep(1); - if (send(LogFile, tbuf, cnt, 0) >= 0) { - THREAD_UNLOCK(); + if (send(LogFile, tbuf, cnt, 0) >= 0) return; - } } - } else { - THREAD_UNLOCK(); + } else return; - } /* * Output the message to the console; try not to block @@ -333,10 +320,25 @@ vsyslog(int pri, const char *fmt, va_list ap) (void)_writev(fd, iov, 2); (void)_close(fd); } +} + +static void +syslog_cancel_cleanup(void *arg __unused) +{ THREAD_UNLOCK(); } +void +vsyslog(int pri, const char *fmt, va_list ap) +{ + + THREAD_LOCK(); + pthread_cleanup_push(syslog_cancel_cleanup, NULL); + vsyslog1(pri, fmt, ap); + pthread_cleanup_pop(1); +} + /* Should be called with mutex acquired */ static void disconnectlog(void) @@ -423,9 +425,11 @@ openlog_unlocked(const char *ident, int logstat, int logfac) void openlog(const char *ident, int logstat, int logfac) { + THREAD_LOCK(); + pthread_cleanup_push(syslog_cancel_cleanup, NULL); openlog_unlocked(ident, logstat, logfac); - THREAD_UNLOCK(); + pthread_cleanup_pop(1); } ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To
syslog() thread unsafety
Hi! Our [v]syslog() implementation in src/lib/libc/gen/syslog.c tries to be thread-safe and uses "syslog_mutex". It may lock this mutex then call blocking system calls like sendto(). If a thread owning this mutex is pthread_cancel()'d in process, the mutex stays UMUTEX_CONTESTED and every other thread calling [v]syslog() deadlocks. I can reproduce this with net/mpd5 daemon reliably within some seconds from the beginning of my stress test involving RADIUS so mpd5 runs multi-threaded. I've tried to wrap every mpd5's [v]syslog() call around with pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, ) and pthread_setcancelstate(oldstate, NULL) statements and this eliminates the problem but that's not a solution in any way. This problem seems to be root cause of multiple complaints about mpd5 being unstable under FreeBSD 9.x+ version (PR: 186114, 214482). Please take a look. Eugene Grosbein ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 11.1-BETA1: lsof build failure
Current lsof is 4.90M. Larry Sysutils/lsof maintainer On 6/14/17, 8:13 AM, "Peter"wrote: FYI, please check if reproducible and/or issue: Installed this from SVN & local build: 11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319858:319867M ... amd64 Then tried to update lsof-4.90.f,8 and got this error: cc -pipe -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_VM_MEMATTR_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=11000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="11.1-BETA1" -I/usr/src/sys -O2 -c dvch.c -o dvch.o --- dnode.o --- dnode.c:906:13: error: no member named 'i_dev' in 'struct inode' if (i->i_dev ~ ^ dnode.c:916:27: error: no member named 'i_dev' in 'struct inode' dev = Dev2Udev((KA_T)i->i_dev); ~ ^ 2 errors generated. *** [dnode.o] Error code 1 ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
11.1-BETA1: lsof build failure
FYI, please check if reproducible and/or issue: Installed this from SVN & local build: 11.1-BETA1 FreeBSD 11.1-BETA1 #0 r319858:319867M ... amd64 Then tried to update lsof-4.90.f,8 and got this error: cc -pipe -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_VM_MEMATTR_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=11000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="11.1-BETA1" -I/usr/src/sys -O2 -c dvch.c -o dvch.o --- dnode.o --- dnode.c:906:13: error: no member named 'i_dev' in 'struct inode' if (i->i_dev ~ ^ dnode.c:916:27: error: no member named 'i_dev' in 'struct inode' dev = Dev2Udev((KA_T)i->i_dev); ~ ^ 2 errors generated. *** [dnode.o] Error code 1 ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"