Re: CURRENT (r248061): Thunderbird SIGNAL 11 with OpenLDAP / nscd(1) broken pipe
On 2013-03-10 00:36, Hartmann, O. wrote:
On 03/09/13 23:21, Per olof Ljungmark wrote:
On 2013-03-09 10:25, Hartmann, O. wrote:
On 03/09/13 10:07, Hartmann, O. wrote:

By way of introduction: I filed a PR for this at the beginning of 2012, and had already suffered from the very same problem for close to two years before that, on ALL FreeBSD versions and platforms using OpenLDAP as the user backend:

ports/164239: [PATCH] mail/thunderbird: crash with nss_ldap

Even with the patch suggested by the maintainer, the problem remained.

With the bad code introduced by the updates around r247804 and the ensuing SIGNAL 13/broken pipe issues, the problem is now even worse in FreeBSD 10.0 r248061. From my limited point of view, I guess this long-standing unresolved problem may now have revealed itself, and I hope it can be fixed along with nscd(1).

Again, Thunderbird in all flavours since 2010 crashes on FreeBSD 8/9 and now 10.0-CURRENT when it is used on systems whose user backend is OpenLDAP or any LDAP (Thunderbird works on non-OpenLDAP-backed systems of the same OS revision).

I was able to work around the problem by starting Firefox first; only starting Firefox prior to Thunderbird resolved the problem for a while. Closing Firefox and waiting a bit left Thunderbird unstartable again, until Firefox was closed and reopened.

I guess this strange behaviour reveals a deeper issue that is not necessarily bound to nscd(1) (since the problem with Thunderbird also occurs without nscd(1)), BUT is always bound to the use of an OpenLDAP backend (with security/pam_ldap and net/nss_ldap from ports).

Now, on FreeBSD 10.0-CURRENT r248061/amd64, Thunderbird dies immediately with SIGNAL 11 on those boxes with an OpenLDAP backend, and no trick makes Thunderbird start anymore.

In my desperation, I did a truss, see below, and it seems to me that there is a problem getting the effective UID, since the SIGNAL 11 arises after geteuid().

At the moment, I have switched off nscd(1) by default since it is broken in CURRENT or doing very strange things (see the list about broken pipes in the base system, sudo(1), or even the ports system (SIGNAL 13)).

I think a major issue is hidden here, and I hope the problems now being triggered lead to it being solved. It is hard to believe that I'm the only one using FreeBSD for both workstation and server environments in conjunction with OpenLDAP and facing this problem with software as popular as Thunderbird. If it is a stupid configuration problem, then it must be a very, very special one, since it has been sticking with me for years.

Here comes the truss ...:

open(/etc/pwd.db,O_RDONLY,00) = 4 (0x4)
fcntl(4,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fstat(4,{ mode=-rw-r--r-- ,inode=117927,size=40960,blksize=16384 }) = 0 (0x0)
read(4,\0\^F\^Ua\0\0\0\^B\0\0\^D\M-R\0...,260) = 260 (0x104)
pread(0x4,0x801bfc000,0x1000,0x6000,0x1,0x0) = 4096 (0x1000)
pread(0x4,0x813927000,0x1000,0x2000,0x1,0x0) = 4096 (0x1000)
close(4) = 0 (0x0)
socket(PF_LOCAL,SOCK_STREAM,0) = 4 (0x4)
connect(4,{ AF_UNIX /var/run/nscd },15) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
sigprocmask(SIG_SETMASK,SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigaction(SIGPIPE,{ SIG_IGN 0x0 ss_t },{ SIG_IGN SA_RESTART ss_t }) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
getpid() = 3235 (0xca3)
geteuid() = 2002 (0x7d2)
open(/usr/local/etc/nss_ldap.conf,O_RDONLY,0666) = 4 (0x4)
fstat(4,{ mode=-rw-r--r-- ,inode=5818085,size=7997,blksize=16384 }) = 0 (0x0)
fstat(4,{ mode=-rw-r--r-- ,inode=5818085,size=7997,blksize=16384 }) = 0 (0x0)
read(4,@(#)$Id: ldap.conf,v 2.47 2006/0...,16384) = 7997 (0x1f3d)
read(4,0x813928000,16384) = 0 (0x0)
close(4) = 0 (0x0)
__sysctl(0x7fffb1f8,0x2,0x7fffb220,0x7fffb200,0x0,0x0) = 0 (0x0)
gettimeofday({1362819606.123684 },0x0) = 0 (0x0)
getpid() = 3235 (0xca3)
issetugid(0x35001c1c,0x80,0x801b1b600,0x10,0x2,0x1) = 0 (0x0)
open(/etc/resolv.conf,O_RDONLY,0666) = 4 (0x4)
fstat(4,{ mode=-rw-r--r-- ,inode=117845,size=101,blksize=16384 }) = 0 (0x0)
read(4,# Generated by resolvconf\nnames...,16384) = 101 (0x65)
read(4,0x813928000,16384) = 0 (0x0)
close(4) = 0 (0x0)
__sysctl(0x7fffab28,0x2,0x7fffad20,0x7fffab30,0x0,0x0) = 0 (0x0)
issetugid(0x801526ae8,0x2e,0x2e,0x2e,0x101010101010101,0x8080808080808080) = 0 (0x0)
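A note on the connect() line in the truss above: ERR#2 is ENOENT, meaning the socket file /var/run/nscd does not exist, which is consistent with nscd(1) being switched off on that machine. The process then carries on and opens nss_ldap.conf directly. The following is only a minimal sketch of the same probe, not code from libc or from this thread; the helper name probe_unix_socket is made up for illustration.

```python
import errno
import socket


def probe_unix_socket(path):
    """Try to connect to a Unix-domain stream socket.

    Returns 0 on success, otherwise the errno of the failed connect().
    (Hypothetical helper; it mirrors the socket()/connect()/close()
    sequence visible in the truss output.)
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    code = probe_unix_socket("/var/run/nscd")
    # Prints "connected" when a daemon is listening, else the errno name.
    print("connected" if code == 0 else errno.errorcode.get(code, code))
```

A result of ENOENT means no daemon has created the socket; ECONNREFUSED would instead point to a stale socket file with nobody listening.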
CURRENT (r248128): PKGNG weirdness: openldap-sasl-client-2.4.33_1 conflicts with installed package(s): openldap-sasl-client-2.4.33_1 AND /usr/local AND net/openldap24-sasl-client
On FreeBSD 10.0-CURRENT #0 r248128: Sun Mar 10 10:41:10 CET 2013, I receive the error message below when trying to update the installed port openldap24-sasl-client:

=== Cleaning for openldap-sasl-client-2.4.33_1
=== Waiting on fetch checksum for net/openldap24-sasl-client ===
=== openldap-sasl-client-2.4.33_1 conflicts with installed package(s):
      openldap-sasl-client-2.4.33_1
      /usr/local
      net/openldap24-sasl-client

      They install files into the same place.
      You may want to stop build with Ctrl + C.

This looks weird to me: how can /usr/local/ be a port?

My update tool is ports-mgmt/portmaster.

Regards,
Oliver

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
[head tinderbox] failure on i386/i386
TB --- 2013-03-10 17:20:18 - tinderbox 2.10 running on freebsd-current.sentex.ca
TB --- 2013-03-10 17:20:18 - FreeBSD freebsd-current.sentex.ca 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #0: Mon Mar 26 13:54:12 EDT 2012 d...@freebsd-current.sentex.ca:/usr/obj/usr/src/sys/GENERIC amd64
TB --- 2013-03-10 17:20:18 - starting HEAD tinderbox run for i386/i386
TB --- 2013-03-10 17:20:18 - cleaning the object tree
TB --- 2013-03-10 17:20:18 - /usr/local/bin/svn stat /src
TB --- 2013-03-10 17:20:22 - At svn revision 248133
TB --- 2013-03-10 17:20:23 - building world
TB --- 2013-03-10 17:20:23 - CROSS_BUILD_TESTING=YES
TB --- 2013-03-10 17:20:23 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-03-10 17:20:23 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-03-10 17:20:23 - SRCCONF=/dev/null
TB --- 2013-03-10 17:20:23 - TARGET=i386
TB --- 2013-03-10 17:20:23 - TARGET_ARCH=i386
TB --- 2013-03-10 17:20:23 - TZ=UTC
TB --- 2013-03-10 17:20:23 - __MAKE_CONF=/dev/null
TB --- 2013-03-10 17:20:23 - cd /src
TB --- 2013-03-10 17:20:23 - /usr/bin/make -B buildworld
 Building an up-to-date make(1)
 World build started on Sun Mar 10 17:20:28 UTC 2013
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
[...]
c++ -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen -I. -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ -DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ -DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ -I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c /src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGRTTI.cpp -o CGRTTI.o
c++ -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen -I. -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ -DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ -DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ -I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c /src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp -o CGRecordLayoutBuilder.o
c++ -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen -I. -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ -DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ -DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ -I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c /src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp -o CGStmt.o
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp: In member function 'void clang::CodeGen::CodeGenFunction::EmitAsmStmt(const clang::AsmStmt)':
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp:1418: internal compiler error: in var_ann, at tree-flow-inline.h:127
Please submit a full bug report, with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.
*** [CGStmt.o] Error code 1

Stop in /src/lib/clang/libclangcodegen.
*** [all] Error code 1

Stop in /src/lib/clang.
*** [cross-tools] Error code 1

Stop in /src.
*** [_cross-tools] Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2013-03-10 17:37:54 - WARNING: /usr/bin/make returned exit code 1
TB --- 2013-03-10 17:37:54 - ERROR: failed to build world
TB --- 2013-03-10 17:37:54 - 890.21 user 110.57 system 1056.11 real

http://tinderbox.freebsd.org/tinderbox-head-ss-build-HEAD-i386-i386.full
Re: r247839: broken pipe - for top, sudo and ports
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
> On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
> > On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld, several systems (3) with the same symptoms), many services sporadically drop with a broken pipe.
> >
> > This happens to the system's top (I have to type it several times to finally get a top), it happens to sudo su -, it happens to SSH (drops the connection with broken pipe) and, as I reported earlier, it seems to affect the entire ports system, since I cannot build any port. I receive
> >
> > *** [do-extract] Signal 13
> >
> > This is dramatic for me, because several modules (rtc, linux_adobe ...) cannot be recompiled as required by the last /usr/src/UPDATING entry, 20130304. dbus fails to start, and even the nVidia driver (which is a kernel module) cannot be built, and therefore ...
> >
> > Dimitry, I put you into CC, just in case. It seems that the last commits (not only the new DRM2 mess) broke something. I hope that others using FreeBSD 10.0-CURRENT with CLANG can confirm this.
>
> Have you tried backing up to just before all of pjd@'s file descriptor and capsicum commits? It broke some other stuff initially related to fd passing, so I don't think it is beyond imagination that it broke something with UNIX domain sockets in general.

Is there a consensus already on whether this is the result of my changes or davide's r247804? I just upgraded my laptop to today's HEAD and I don't see any weird behaviour yet. If someone can provide a way to reproduce the problem, I'd be happy to investigate.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl
Re: r247839: broken pipe - for top, sudo and ports
On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
> Jilles Tjoelker jil...@stack.nl writes:
> > On Tue, Mar 05, 2013 at 08:59:09PM +0100, Hartmann, O. wrote:
> > > A truss top reveals this, is this of help?
> > > [...]
> > > stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> > > stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> > > stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> > > stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> > > stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> > > socket(PF_LOCAL,SOCK_STREAM,0) = 4 (0x4)
> > > connect(4,{ AF_UNIX /var/run/nscd },15) = 0 (0x0)
> > > fcntl(4,F_SETFL,O_NONBLOCK) = 0 (0x0)
> > > kqueue(0x80183b000,0x80122fc58,0x10,0x80062b308,0x80183b010,0x2) = 5 (0x5)
> > > kevent(5,{0x4,EVFILT_WRITE,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
> > > kqueue(0x5,0x7fffd2e0,0x1,0x0,0x0,0x0) = 6 (0x6)
> > > kevent(6,{0x4,EVFILT_READ,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
> > > kevent(5,{0x4,EVFILT_WRITE,EV_ADD,1,0x4,0x0},1,0x0,0,0x0) = 0 (0x0)
> > > kevent(5,0x0,0,{0x4,EVFILT_WRITE,EV_EOF,0,0x2000,0x0},1,0x0) = 1 (0x1)
> > > sendmsg(0x4,0x7fffd290,0x0,0x1,0x1,0x0) ERR#32 'Broken pipe'
> > > SIGNAL 13 (SIGPIPE)
> > > process exit, rval = 0
> >
> > Apparently there is a bug that causes nscd to close the connection immediately but even then it is wrong that this terminates the calling program with SIGPIPE.
> >
> > The below patch prevents the SIGPIPE but cannot revive the connection to nscd. This may cause numeric UIDs in top or increase the load on the directory server. It is compile tested only.
> > [...]
>
> The patch seems to fix the issue in a world after r247804. I don't see numeric UIDs in top but without the patch top crashes with SIGPIPE a lot less frequently than sudo or make install (in base/ports) for me.
>
> In my case shutting down nscd helped, too. Compared to stock nsswitch.conf I only have cache added.

Can you find what causes nscd to close the connection quickly, such as using ktrace?

--
Jilles Tjoelker
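A note on the SIGPIPE discussion above: the patch itself is elided in this archive, so the following is not that patch. It is only a minimal sketch of one common way to turn "peer hung up" into an ordinary EPIPE error instead of a fatal signal, namely passing MSG_NOSIGNAL to the send call. The names send_without_sigpipe and demo are made up for illustration, and a socketpair stands in for the connection to nscd.

```python
import errno
import signal
import socket


def send_without_sigpipe(sock, payload):
    """Send payload; return None on success or the errno on failure.

    MSG_NOSIGNAL asks the kernel not to raise SIGPIPE for this call,
    so a vanished peer shows up as EPIPE instead of killing the caller.
    """
    try:
        sock.send(payload, socket.MSG_NOSIGNAL)
        return None
    except OSError as exc:
        return exc.errno


def demo():
    """Hypothetical stand-in for a client whose daemon hangs up at once."""
    # Python ignores SIGPIPE at startup; restore the default disposition
    # to mimic a C program such as top(1), then put it back afterwards.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_DFL)
    try:
        client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        server.close()  # the "daemon" closes the connection immediately
        try:
            return send_without_sigpipe(client, b"query")
        finally:
            client.close()
    finally:
        signal.signal(signal.SIGPIPE, previous)


if __name__ == "__main__":
    print(errno.errorcode.get(demo()))
```

As the message above says of its own patch, suppressing the signal does not revive the connection; the caller still has to fall back to an uncached lookup.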
Re: r247839: broken pipe - for top, sudo and ports
On 03/10/13 21:44, Pawel Jakub Dawidek wrote:
> On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
> > On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
> > > [...]
> >
> > Have you tried backing up to just before all of pjd@'s file descriptor and capsicum commits? It broke some other stuff initially related to fd passing, so I don't think it is beyond imagination that it broke something with UNIX domain sockets in general.
>
> Is there a consensus already if this is result of my changes or davide's r247804? I just upgraded my laptop to today's HEAD and I don't see any weird behaviour yet. If someone can provide a way to reproduce the problem, I'd be happy to investigate.

I just checked on one of my servers running the most recent FreeBSD:

FreeBSD 10.0-CURRENT #0 r248106: Sat Mar 9 16:43:06 CET 2013 amd64

Starting nscd with

service nscd onestart

and then trying to recompile sudo with

portmaster sudo

results in

[...]
=== Extracting for sudo-1.8.6.p7
= SHA256 Checksum OK for sudo-1.8.6p7.tar.gz.
*** [do-extract] Signal 13

/etc/nscd.conf is as provided in /usr/share/examples/etc/nscd.conf.

sudo itself also acts weird and sometimes, sporadically, fails with broken pipe.

If I start nscd at system startup, even now with the recent version of FreeBSD CURRENT shown above, OpenLDAP's slapd sometimes doesn't start up - this is critical.

As stated before, switching off nscd solves the problem.

Regards,
oh
Re: r247839: broken pipe - for top, sudo and ports
Jilles Tjoelker jil...@stack.nl writes:
> On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
> > [...]
> > In my case shutting down nscd helped, too. Compared to stock nsswitch.conf I only have cache added.
>
> Can you find what causes nscd to close the connection quickly, such as using ktrace?

# single user mode
$ ktrace -p $(pgrep nscd); top -b; ktrace -c; kdump
    71 nscd GIO fd 5 wrote 0 bytes
    71 nscd GIO fd 5 read 32 bytes
       0x 0400 1000 0100 |..|
       0x0012 |..|
    71 nscd RET kevent 1
    71 nscd CALL accept(0x4,0,0)
    71 nscd RET accept 6
    71 nscd CALL getsockopt(0x6,0,0x1,0x7f9fce28,0x7f9fce24)
    71 nscd RET getsockopt 0
    71 nscd CALL kevent(0x5,0x7f9fcf00,0x2,0,0,0x7f9fcf50)
    71 nscd GIO fd 5 wrote 64 bytes
       0x 0600 f9ff 1100 401f |@.|
       0x0012 401f 40e6 4002 0800 0600 |..@...@.@.|
       0x0024 1100 0100 0400 |..|
       0x0036 40e6 4002 0800 |..@.@.|
    71 nscd GIO fd 5 read 0 bytes
    71 nscd RET kevent 0
    71 nscd CALL kevent(0x5,0x7f9fcec0,0x1,0,0,0x7f9fcee0)
    71 nscd GIO fd 5 wrote 32 bytes
       0x 0400 1100 |..|
       0x0012 |..|
    71 nscd GIO fd 5 read 0 bytes
    71 nscd RET kevent 0
    71 nscd CALL kevent(0x5,0,0,0x7f9fcec0,0x1,0)
    71 nscd GIO fd 5 wrote 0 bytes
    71 nscd GIO fd 5 read 32 bytes
       0x 0600 f9ff 3000 0100 |..0...|
       0x0012 40e6 4002 0800 |..@.@.|
    71 nscd RET kevent 1
    71 nscd CALL close(0x6)
    71 nscd RET close 0
    71 nscd CALL kevent(0x5,0,0,0x7f9fcec0,0x1,0)
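A note on reading the GIO hex dumps above: kdump prints the raw bytes of the kevent structures in memory order, so on a little-endian machine such as amd64 each four-digit group has its low byte first. The archive has dropped the all-zero groups from these dumps, so field offsets cannot be recovered here; the sketch below only decodes individual groups. It is an illustration and not part of the thread, and the helper names are made up. The one outside fact it relies on is that EVFILT_TIMER is defined as -7 in FreeBSD's sys/event.h.

```python
import struct


def le_words(hex_text):
    """Decode space-separated 4-hex-digit groups as little-endian 16-bit words."""
    raw = bytes.fromhex(hex_text.replace(" ", ""))
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def as_signed16(word):
    """Reinterpret an unsigned 16-bit word as signed (kevent filters are negative)."""
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    words = le_words("0600 f9ff 1100 401f")
    print([hex(w) for w in words])
    # The group "f9ff" is the 16-bit value -7, i.e. EVFILT_TIMER.
    print(as_signed16(words[1]))
    # The group "401f" is 0x1f40, which is 8000 decimal.
    print(words[3])
    # Two adjacent "401f" groups read as one 32-bit value give 0x1f401f40.
    print(hex(int.from_bytes(bytes.fromhex("401f401f"), "little")))
```

Read this way, a single "401f" group corresponds to a value of 8000, while two such groups in a row form a much larger number when interpreted as one wider field.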
Re: r247839: broken pipe - for top, sudo and ports
On Sun, Mar 10, 2013 at 08:26:03PM -0200, Jan Beich wrote:
> Jilles Tjoelker jil...@stack.nl writes:
> > On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
> > > [...]
> > > In my case shutting down nscd helped, too. Compared to stock nsswitch.conf I only have cache added.
> >
> > Can you find what causes nscd to close the connection quickly, such as using ktrace?
>
> # single user mode
> $ ktrace -p $(pgrep nscd); top -b; ktrace -c; kdump
>     71 nscd GIO fd 5 wrote 0 bytes
>     71 nscd GIO fd 5 read 32 bytes
>        0x 0400 1000 0100 |..|
>        0x0012 |..|
>     71 nscd RET kevent 1
>     71 nscd CALL accept(0x4,0,0)
>     71 nscd RET accept 6

We are in usr.sbin/nscd/nscd.c accept_connection() here.

>     71 nscd CALL getsockopt(0x6,0,0x1,0x7f9fce28,0x7f9fce24)
>     71 nscd RET getsockopt 0

Probably getpeereid(). On another note, nscd leaks the file descriptor if this, the below init_query_state() or the below kevent() fails.

>     71 nscd CALL kevent(0x5,0x7f9fcf00,0x2,0,0,0x7f9fcf50)
>     71 nscd GIO fd 5 wrote 64 bytes
>        0x 0600 f9ff 1100 401f |@.|
>        0x0012 401f 40e6 4002 0800 0600 |..@...@.@.|
>        0x0024 1100 0100 0400 |..|
>        0x0036 40e6 4002 0800 |..@.@.|

Adding an EVFILT_TIMER and an EVFILT_READ. The data field for the EVFILT_TIMER is a bit strange. I would expect 0x1f40 (8000 decimal), but instead it contains 0x1f401f40. This does not happen when I run tools/regression/kqueue/kqtest on a stable/9 amd64 machine or on ref10-amd64, which currently runs r247722. On a head (r248047) i386 machine, the data field looks right.

>     71 nscd GIO fd 5 read 0 bytes
>     71 nscd RET kevent 0
>     71 nscd CALL kevent(0x5,0x7f9fcec0,0x1,0,0,0x7f9fcee0)
>     71 nscd GIO fd 5 wrote 32 bytes
>        0x 0400 1100 |..|
>        0x0012 |..|

Probably registering interest for the next connection.

>     71 nscd GIO fd 5 read 0 bytes
>     71 nscd RET kevent 0
>     71 nscd CALL kevent(0x5,0,0,0x7f9fcec0,0x1,0)
>     71 nscd GIO fd 5 wrote 0 bytes
>     71 nscd GIO fd 5 read 32 bytes
>        0x 0600 f9ff 3000 0100 |..0...|
>        0x0012 40e6 4002 0800 |..@.@.|

The timer has already expired. This cannot be right. (It cannot be that EVFILT_READ is broken and eight seconds actually passed, because the send calls would have worked in that case.)

tools/regression/kqueue/kqtest works correctly on the aforementioned