Re: CURRENT (r248061):Thunderbird SIGNAL 11 with OpenLDAP / nscd(1) broken pipe/

2013-03-10 Thread Per olof Ljungmark
On 2013-03-10 00:36, Hartmann, O. wrote:
 On 03/09/13 23:21, Per olof Ljungmark wrote:
 On 2013-03-09 10:25, Hartmann, O. wrote:
 On 03/09/13 10:07, Hartmann, O. wrote:
 By way of introduction: I filed a PR for this at the beginning of 2012,
 and had already suffered from the very same problem for close to two
 years before that, on ALL FreeBSD versions and platforms using OpenLDAP
 as the user backend:

 ports/164239: [PATCH] mail/thunderbird: crash with nss_ldap

 Even with the patch suggested by the maintainer, the problem remained.

 Since the regressions introduced by the updates around r247804 and the
 ensuing SIGNAL 13/broken pipe issues, the problem is now even worse
 in FreeBSD 10.0 r248061.
 From my limited point of view, I guess this long-standing unresolved
 problem may now have revealed itself, and I hope it can be fixed
 along with nscd(1).

 Again, Thunderbird in all flavours since 2010 crashes on FreeBSD 8/9 and
 now 10.0-CURRENT when it is used on systems with user backend in
 OpenLDAP or any LDAP (Thunderbird works on non-OpenLDAP backed systems
 of the same OS revision).

 I was able to work around the problem by starting Firefox first; only
 starting Firefox before Thunderbird resolved the problem for a
 while. Closing Firefox and waiting a bit left Thunderbird
 unstartable again until Firefox was closed and reopened again.

 I guess this strange behaviour reveals a deeper issue not necessarily
 bound to nscd(1) (since the problem with Thunderbird also occurs without
 nscd(1)), BUT always bound to the use of an OpenLDAP backend (with
 security/pam_ldap and net/nss_ldap from ports).

 Now, on FreeBSD 10.0-CURRENT r248061/amd64, Thunderbird dies immediately
 with SIGNAL 11 on those boxes with an OpenLDAP backend, and no trick makes
 Thunderbird start anymore.

 In my desperation, I ran truss (see below), and it seems to me that
 there is a problem getting the effective UID, since the SIGNAL 11 arises
 after geteuid().

 At the moment, I have switched off nscd(1) by default, since it is broken
 in CURRENT or doing very strange things (see the list thread about broken
 pipes in the base system, sudo(1) and even the ports system (SIGNAL 13)).

 I think a major issue is hidden here, and I hope the problems triggered
 now will lead to it being solved.

 It is hard to believe that I'm the only one using FreeBSD for both
 workstation and server environments in conjunction with OpenLDAP and
 facing this problem with software as popular as Thunderbird.

 If it is a stupid configuration problem, then it must be a very, very
 special one, since it has stuck with me for years.

 Here comes the truss output:

 open(/etc/pwd.db,O_RDONLY,00)  = 4 (0x4)
 fcntl(4,F_SETFD,FD_CLOEXEC)  = 0 (0x0)
 fstat(4,{ mode=-rw-r--r-- ,inode=117927,size=40960,blksize=16384 }) = 0 (0x0)
 read(4,\0\^F\^Ua\0\0\0\^B\0\0\^D\M-R\0...,260) = 260 (0x104)
 pread(0x4,0x801bfc000,0x1000,0x6000,0x1,0x0) = 4096 (0x1000)
 pread(0x4,0x813927000,0x1000,0x2000,0x1,0x0) = 4096 (0x1000)
 close(4) = 0 (0x0)
 socket(PF_LOCAL,SOCK_STREAM,0)   = 4 (0x4)
 connect(4,{ AF_UNIX /var/run/nscd },15) ERR#2 'No such file or directory'
 close(4) = 0 (0x0)
 sigprocmask(SIG_SETMASK,SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
 sigaction(SIGPIPE,{ SIG_IGN 0x0 ss_t },{ SIG_IGN SA_RESTART ss_t }) = 0 (0x0)
 sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
 getpid() = 3235 (0xca3)
 geteuid() = 2002 (0x7d2)
 open(/usr/local/etc/nss_ldap.conf,O_RDONLY,0666) = 4 (0x4)
 fstat(4,{ mode=-rw-r--r-- ,inode=5818085,size=7997,blksize=16384 }) = 0 (0x0)
 fstat(4,{ mode=-rw-r--r-- ,inode=5818085,size=7997,blksize=16384 }) = 0 (0x0)
 read(4,@(#)$Id: ldap.conf,v 2.47 2006/0...,16384) = 7997 (0x1f3d)
 read(4,0x813928000,16384) = 0 (0x0)
 close(4) = 0 (0x0)
 __sysctl(0x7fffb1f8,0x2,0x7fffb220,0x7fffb200,0x0,0x0) = 0 (0x0)
 gettimeofday({1362819606.123684 },0x0)   = 0 (0x0)
 getpid() = 3235 (0xca3)
 issetugid(0x35001c1c,0x80,0x801b1b600,0x10,0x2,0x1) = 0 (0x0)
 open(/etc/resolv.conf,O_RDONLY,0666)   = 4 (0x4)
 fstat(4,{ mode=-rw-r--r-- ,inode=117845,size=101,blksize=16384 }) = 0 (0x0)
 read(4,# Generated by resolvconf\nnames...,16384) = 101 (0x65)
 read(4,0x813928000,16384) = 0 (0x0)
 close(4) = 0 (0x0)
 __sysctl(0x7fffab28,0x2,0x7fffad20,0x7fffab30,0x0,0x0) = 0 (0x0)
 issetugid(0x801526ae8,0x2e,0x2e,0x2e,0x101010101010101,0x8080808080808080) = 0 (0x0)
 

CURRENT (r248128): PKGNG weirdness: openldap-sasl-client-2.4.33_1 conflicts with installed package(s): openldap-sasl-client-2.4.33_1 AND /usr/local AND net/openldap24-sasl-client

2013-03-10 Thread O. Hartmann
On FreeBSD 10.0-CURRENT #0 r248128 (Sun Mar 10 10:41:10 CET 2013),
I receive the error message below when trying to update the installed port
openldap24-sasl-client:

===  Cleaning for openldap-sasl-client-2.4.33_1
=== Waiting on fetch  checksum for net/openldap24-sasl-client ===

===  openldap-sasl-client-2.4.33_1 conflicts with installed
package(s): 
  openldap-sasl-client-2.4.33_1
  /usr/local
  net/openldap24-sasl-client

  They install files into the same place.
  You may want to stop build with Ctrl + C.


This looks weird to me: how can /usr/local/ be a port?

My update tool is ports-mgmt/portmaster.

Regards,

Oliver

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


[head tinderbox] failure on i386/i386

2013-03-10 Thread FreeBSD Tinderbox
TB --- 2013-03-10 17:20:18 - tinderbox 2.10 running on freebsd-current.sentex.ca
TB --- 2013-03-10 17:20:18 - FreeBSD freebsd-current.sentex.ca 8.3-PRERELEASE 
FreeBSD 8.3-PRERELEASE #0: Mon Mar 26 13:54:12 EDT 2012 
d...@freebsd-current.sentex.ca:/usr/obj/usr/src/sys/GENERIC  amd64
TB --- 2013-03-10 17:20:18 - starting HEAD tinderbox run for i386/i386
TB --- 2013-03-10 17:20:18 - cleaning the object tree
TB --- 2013-03-10 17:20:18 - /usr/local/bin/svn stat /src
TB --- 2013-03-10 17:20:22 - At svn revision 248133
TB --- 2013-03-10 17:20:23 - building world
TB --- 2013-03-10 17:20:23 - CROSS_BUILD_TESTING=YES
TB --- 2013-03-10 17:20:23 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-03-10 17:20:23 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-03-10 17:20:23 - SRCCONF=/dev/null
TB --- 2013-03-10 17:20:23 - TARGET=i386
TB --- 2013-03-10 17:20:23 - TARGET_ARCH=i386
TB --- 2013-03-10 17:20:23 - TZ=UTC
TB --- 2013-03-10 17:20:23 - __MAKE_CONF=/dev/null
TB --- 2013-03-10 17:20:23 - cd /src
TB --- 2013-03-10 17:20:23 - /usr/bin/make -B buildworld
 Building an up-to-date make(1)
 World build started on Sun Mar 10 17:20:28 UTC 2013
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
[...]
c++  -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen 
-I. 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include 
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS 
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ 
-DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ 
-DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ 
-I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c 
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGRTTI.cpp
 -o CGRTTI.o
c++  -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen 
-I. 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include 
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS 
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ 
-DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ 
-DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ 
-I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c 
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp
 -o CGRecordLayoutBuilder.o
c++  -O2 -pipe -I/src/lib/clang/libclangcodegen/../../../contrib/llvm/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/include 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen 
-I. 
-I/src/lib/clang/libclangcodegen/../../../contrib/llvm/../../lib/clang/include 
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS 
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\i386-unknown-freebsd10.0\ 
-DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ 
-DDEFAULT_SYSROOT=\/obj/i386.i386/src/tmp\ 
-I/obj/i386.i386/src/tmp/legacy/usr/include -fno-exceptions -fno-rtti -c 
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp
 -o CGStmt.o
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp:
 In member function 'void clang::CodeGen::CodeGenFunction::EmitAsmStmt(const 
clang::AsmStmt)':
/src/lib/clang/libclangcodegen/../../../contrib/llvm/tools/clang/lib/CodeGen/CGStmt.cpp:1418:
 internal compiler error: in var_ann, at tree-flow-inline.h:127
Please submit a full bug report,
with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.
*** [CGStmt.o] Error code 1

Stop in /src/lib/clang/libclangcodegen.
*** [all] Error code 1

Stop in /src/lib/clang.
*** [cross-tools] Error code 1

Stop in /src.
*** [_cross-tools] Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2013-03-10 17:37:54 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2013-03-10 17:37:54 - ERROR: failed to build world
TB --- 2013-03-10 17:37:54 - 890.21 user 110.57 system 1056.11 real


http://tinderbox.freebsd.org/tinderbox-head-ss-build-HEAD-i386-i386.full


Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
 On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
  On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld; several systems
  (3) show the same symptoms), many services drop a sporadic

  broken pipe

  This happens to the system's top (I have to type it several times to
  finally get a top), it happens to sudo su -, it happens to SSH (drops
  the connection with broken pipe) and, as I reported earlier, it seems to
  affect the entire ports system: I cannot build any port, I receive

  *** [do-extract] Signal 13

  This is dramatic for me, because several modules (rtc, linux_adobe ...)
  cannot be recompiled as required by the last /usr/src/UPDATING
  entry, 20130304.

  dbus fails to start, and even the nVidia driver (which is a kernel
  module) cannot be built, and therefore ...

  Dimitry, I put you in CC, just in case. It seems that the last commits
  (not only the new DRM2 mess) broke something.

  I hope that others using FreeBSD 10.0-CURRENT with CLANG can confirm this.
 
 Have you tried backing up to just before all of pjd@'s file descriptor and
 capsicum commits?  It broke some other stuff initially related to fd passing,
 so I don't think it is beyond imagination that it broke something with UNIX
 domain sockets in general.

Is there a consensus yet on whether this is the result of my changes or
davide's r247804?

I just upgraded my laptop to today's HEAD and I don't see any weird
behaviour yet. If someone can provide a way to reproduce the problem,
I'd be happy to investigate.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Jilles Tjoelker
On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
 Jilles Tjoelker jil...@stack.nl writes:

  On Tue, Mar 05, 2013 at 08:59:09PM +0100, Hartmann, O. wrote:

  A truss top reveals this, is this of help?

  [...]
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  socket(PF_LOCAL,SOCK_STREAM,0)   = 4 (0x4)
  connect(4,{ AF_UNIX /var/run/nscd },15)= 0 (0x0)
  fcntl(4,F_SETFL,O_NONBLOCK)  = 0 (0x0)
  kqueue(0x80183b000,0x80122fc58,0x10,0x80062b308,0x80183b010,0x2) = 5 (0x5)
  kevent(5,{0x4,EVFILT_WRITE,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
  kqueue(0x5,0x7fffd2e0,0x1,0x0,0x0,0x0)   = 6 (0x6)
  kevent(6,{0x4,EVFILT_READ,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
  kevent(5,{0x4,EVFILT_WRITE,EV_ADD,1,0x4,0x0},1,0x0,0,0x0) = 0 (0x0)
  kevent(5,0x0,0,{0x4,EVFILT_WRITE,EV_EOF,0,0x2000,0x0},1,0x0) = 1 (0x1)
  sendmsg(0x4,0x7fffd290,0x0,0x1,0x1,0x0)  ERR#32 'Broken pipe'
  SIGNAL 13 (SIGPIPE)
  process exit, rval = 0

  Apparently there is a bug that causes nscd to close the connection
  immediately but even then it is wrong that this terminates the calling
  program with SIGPIPE.

  The below patch prevents the SIGPIPE but cannot revive the connection to
  nscd. This may cause numeric UIDs in top or increase the load on the
  directory server. It is compile tested only.
 [...]

 The patch seems to fix the issue in a world after r247804. I don't see
 numeric UIDs in top but without the patch top crashes with SIGPIPE a lot
 less frequently than sudo or make install (in base/ports) for me.

 In my case shutting down nscd helped, too. Compared to stock
 nsswitch.conf I only have cache added.

Can you find what causes nscd to close the connection quickly, such as
using ktrace?
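
For anyone who wants to see the failure mode from the truss output in isolation, here is a minimal Python sketch. It is not the patch referred to in the quoted text (which is elided in this archive), only an illustration. It assumes a socketpair as a stand-in for the connection to /var/run/nscd, and the function name is made up. With SIGPIPE ignored, a send to a peer that has already closed fails with EPIPE instead of terminating the caller.

```python
import errno
import signal
import socket


def send_to_closed_peer():
    """Send on a Unix stream socket whose peer has already closed.

    Returns the errno reported by send(), or None if the send succeeded.
    The socketpair is only a stand-in for a client talking to nscd.
    """
    # Ignore SIGPIPE so the failure surfaces as an error code, not a signal.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.close()  # the "daemon" closes the connection immediately
        try:
            client.send(b"query")
        except OSError as exc:
            return exc.errno
        return None
    finally:
        client.close()
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)


if __name__ == "__main__":
    print(errno.errorcode.get(send_to_closed_peer()))
```

With the default SIGPIPE disposition, the same send would kill the process, which is what truss shows for top.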

-- 
Jilles Tjoelker


Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Hartmann, O.
On 03/10/13 21:44, Pawel Jakub Dawidek wrote:
 On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
 On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
 On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld; several systems
 (3) show the same symptoms), many services drop a sporadic

 broken pipe

 This happens to the system's top (I have to type it several times to
 finally get a top), it happens to sudo su -, it happens to SSH (drops
 the connection with broken pipe) and, as I reported earlier, it seems to
 affect the entire ports system: I cannot build any port, I receive

 *** [do-extract] Signal 13

 This is dramatic for me, because several modules (rtc, linux_adobe ...)
 cannot be recompiled as required by the last /usr/src/UPDATING
 entry, 20130304.

 dbus fails to start, and even the nVidia driver (which is a kernel
 module) cannot be built, and therefore ...

 Dimitry, I put you in CC, just in case. It seems that the last commits
 (not only the new DRM2 mess) broke something.

 I hope that others using FreeBSD 10.0-CURRENT with CLANG can confirm this.

 Have you tried backing up to just before all of pjd@'s file descriptor and
 capsicum commits?  It broke some other stuff initially related to fd passing,
 so I don't think it is beyond imagination that it broke something with UNIX
 domain sockets in general.
 
 Is there a consensus yet on whether this is the result of my changes or
 davide's r247804?
 
 I just upgraded my laptop to today's HEAD and I don't see any weird
 behaviour yet. If someone can provide a way to reproduce the problem,
 I'd be happy to investigate.
 


Just checked on one of my servers running most recent FreeBSD:
FreeBSD 10.0-CURRENT #0 r248106: Sat Mar  9 16:43:06 CET 2013 amd64

Starting nscd with

service nscd onestart

and trying to recompile sudo with

portmaster sudo

results in

[...]
===  Extracting for sudo-1.8.6.p7
= SHA256 Checksum OK for sudo-1.8.6p7.tar.gz.
*** [do-extract] Signal 13

/etc/nscd.conf is as it is provided in /usr/share/examples/etc/nscd.conf.

sudo itself also behaves strangely and sporadically fails with broken
pipe. If I start nscd at system startup, even with the recent version of
FreeBSD CURRENT shown above, OpenLDAP's slapd sometimes doesn't start
up - this is critical.

As stated before, switching off nscd solves the problem.

Regards,

oh


Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Jan Beich
Jilles Tjoelker jil...@stack.nl writes:

 On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:

 Jilles Tjoelker jil...@stack.nl writes:

  On Tue, Mar 05, 2013 at 08:59:09PM +0100, Hartmann, O. wrote:

  A truss top reveals this, is this of help?

  [...]
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
  socket(PF_LOCAL,SOCK_STREAM,0)   = 4 (0x4)
  connect(4,{ AF_UNIX /var/run/nscd },15) = 0 (0x0)
  fcntl(4,F_SETFL,O_NONBLOCK)  = 0 (0x0)
  kqueue(0x80183b000,0x80122fc58,0x10,0x80062b308,0x80183b010,0x2) = 5 (0x5)
  kevent(5,{0x4,EVFILT_WRITE,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
  kqueue(0x5,0x7fffd2e0,0x1,0x0,0x0,0x0)   = 6 (0x6)
  kevent(6,{0x4,EVFILT_READ,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
  kevent(5,{0x4,EVFILT_WRITE,EV_ADD,1,0x4,0x0},1,0x0,0,0x0) = 0 (0x0)
  kevent(5,0x0,0,{0x4,EVFILT_WRITE,EV_EOF,0,0x2000,0x0},1,0x0) = 1 (0x1)
  sendmsg(0x4,0x7fffd290,0x0,0x1,0x1,0x0)  ERR#32 'Broken pipe'
  SIGNAL 13 (SIGPIPE)
  process exit, rval = 0

  Apparently there is a bug that causes nscd to close the connection
  immediately but even then it is wrong that this terminates the calling
  program with SIGPIPE.

  The below patch prevents the SIGPIPE but cannot revive the connection to
  nscd. This may cause numeric UIDs in top or increase the load on the
  directory server. It is compile tested only.
 [...]

 The patch seems to fix the issue in a world after r247804. I don't see
 numeric UIDs in top but without the patch top crashes with SIGPIPE a lot
 less frequently than sudo or make install (in base/ports) for me.

 In my case shutting down nscd helped, too. Compared to stock
 nsswitch.conf I only have cache added.

 Can you find what causes nscd to close the connection quickly, such as
 using ktrace?

# single user mode
$ ktrace -p $(pgrep nscd); top -b; ktrace -c; kdump
71 nscd GIO   fd 5 wrote 0 bytes
   
71 nscd GIO   fd 5 read 32 bytes
   0x 0400     1000   0100  |..|
   0x0012       |..|

71 nscd RET   kevent 1
71 nscd CALL  accept(0x4,0,0)
71 nscd RET   accept 6
71 nscd CALL  getsockopt(0x6,0,0x1,0x7f9fce28,0x7f9fce24)
71 nscd RET   getsockopt 0
71 nscd CALL  kevent(0x5,0x7f9fcf00,0x2,0,0,0x7f9fcf50)
71 nscd GIO   fd 5 wrote 64 bytes
   0x 0600    f9ff 1100   401f  |@.|
   0x0012  401f  40e6 4002 0800  0600   |..@...@.@.|
   0x0024    1100 0100  0400    |..|
   0x0036  40e6 4002 0800   |..@.@.|

71 nscd GIO   fd 5 read 0 bytes
   
71 nscd RET   kevent 0
71 nscd CALL  kevent(0x5,0x7f9fcec0,0x1,0,0,0x7f9fcee0)
71 nscd GIO   fd 5 wrote 32 bytes
   0x 0400     1100     |..|
   0x0012       |..|

71 nscd GIO   fd 5 read 0 bytes
   
71 nscd RET   kevent 0
71 nscd CALL  kevent(0x5,0,0,0x7f9fcec0,0x1,0)
71 nscd GIO   fd 5 wrote 0 bytes
   
71 nscd GIO   fd 5 read 32 bytes
   0x 0600    f9ff 3000   0100  |..0...|
   0x0012    40e6 4002 0800 |..@.@.|

71 nscd RET   kevent 1
71 nscd CALL  close(0x6)
71 nscd RET   close 0
71 nscd CALL  kevent(0x5,0,0,0x7f9fcec0,0x1,0)


Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Jilles Tjoelker
On Sun, Mar 10, 2013 at 08:26:03PM -0200, Jan Beich wrote:
 Jilles Tjoelker jil...@stack.nl writes:
  On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
  Jilles Tjoelker jil...@stack.nl writes:
   On Tue, Mar 05, 2013 at 08:59:09PM +0100, Hartmann, O. wrote:
   A truss top reveals this, is this of help?

   [...]
   stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
   stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
   stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
   stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
   stat(/etc/nsswitch.conf,{ mode=-rw-r--r-- ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
   socket(PF_LOCAL,SOCK_STREAM,0)   = 4 (0x4)
   connect(4,{ AF_UNIX /var/run/nscd },15) = 0 (0x0)
   fcntl(4,F_SETFL,O_NONBLOCK)  = 0 (0x0)
   kqueue(0x80183b000,0x80122fc58,0x10,0x80062b308,0x80183b010,0x2) = 5 (0x5)
   kevent(5,{0x4,EVFILT_WRITE,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
   kqueue(0x5,0x7fffd2e0,0x1,0x0,0x0,0x0)   = 6 (0x6)
   kevent(6,{0x4,EVFILT_READ,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
   kevent(5,{0x4,EVFILT_WRITE,EV_ADD,1,0x4,0x0},1,0x0,0,0x0) = 0 (0x0)
   kevent(5,0x0,0,{0x4,EVFILT_WRITE,EV_EOF,0,0x2000,0x0},1,0x0) = 1 (0x1)
   sendmsg(0x4,0x7fffd290,0x0,0x1,0x1,0x0)  ERR#32 'Broken pipe'
   SIGNAL 13 (SIGPIPE)
   process exit, rval = 0

   Apparently there is a bug that causes nscd to close the connection
   immediately but even then it is wrong that this terminates the calling
   program with SIGPIPE.

   The below patch prevents the SIGPIPE but cannot revive the connection to
   nscd. This may cause numeric UIDs in top or increase the load on the
   directory server. It is compile tested only.
  [...]

  The patch seems to fix the issue in a world after r247804. I don't see
  numeric UIDs in top but without the patch top crashes with SIGPIPE a lot
  less frequently than sudo or make install (in base/ports) for me.

  In my case shutting down nscd helped, too. Compared to stock
  nsswitch.conf I only have cache added.

  Can you find what causes nscd to close the connection quickly, such as
  using ktrace?

 # single user mode
 $ ktrace -p $(pgrep nscd); top -b; ktrace -c; kdump
 71 nscd GIO   fd 5 wrote 0 bytes

 71 nscd GIO   fd 5 read 32 bytes
0x 0400     1000   0100  
 |..|
0x0012       |..|
 
 71 nscd RET   kevent 1
 71 nscd CALL  accept(0x4,0,0)
 71 nscd RET   accept 6

We are in usr.sbin/nscd/nscd.c accept_connection() here.

 71 nscd CALL  getsockopt(0x6,0,0x1,0x7f9fce28,0x7f9fce24)
 71 nscd RET   getsockopt 0

Probably getpeereid().

On another note, nscd leaks the file descriptor if this, the below
init_query_state() or the below kevent() fails.

 71 nscd CALL  kevent(0x5,0x7f9fcf00,0x2,0,0,0x7f9fcf50)
 71 nscd GIO   fd 5 wrote 64 bytes
0x 0600    f9ff 1100   401f  
 |@.|
0x0012  401f  40e6 4002 0800  0600   
 |..@...@.@.|
0x0024    1100 0100  0400    
 |..|
0x0036  40e6 4002 0800   |..@.@.|
 

Adding an EVFILT_TIMER and an EVFILT_READ.

The data field for the EVFILT_TIMER is a bit strange. I would expect
0x1f40 (8000 decimal), but the trace shows
0x1f401f40 instead. This does not happen when I run
tools/regression/kqueue/kqtest on a stable/9 amd64 machine or on
ref10-amd64 which currently runs r247722.

On a head (r248047) i386 machine, the data field looks right.
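
As a reading aid for the GIO dumps, the following Python sketch decodes one struct kevent the way it would appear in such a dump. It assumes the 32-byte little-endian amd64 layout (64-bit ident, data and udata), which fits the 32- and 64-byte transfer sizes in the trace. The helper name is made up, and the bytes are built by hand for illustration, not copied from the trace. The constants are the values from sys/event.h.

```python
import struct

# Assumed amd64 layout of struct kevent at the time of this thread:
# ident (8), filter (2), flags (2), fflags (4), data (8), udata (8).
KEVENT = struct.Struct("<QhHIqQ")

EVFILT_TIMER = -7   # sys/event.h
EV_ADD = 0x0001
EV_ONESHOT = 0x0010


def decode_kevent(raw):
    """Unpack one 32-byte kevent record into a dict of named fields."""
    ident, filt, flags, fflags, data, udata = KEVENT.unpack(raw)
    return {
        "ident": ident,
        "filter": filt,
        "flags": flags,
        "fflags": fflags,
        "data": data,
        "udata": udata,
    }


# Hand-built example: a one-shot timer on ident 6 with the expected
# 8000 ms (0x1f40) timeout in the data field.
example = KEVENT.pack(6, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 8000, 0)
event = decode_kevent(example)

print(example.hex())
print(event)
```

In the hex string, the filter shows up as f9ff, the flags as 1100 and the low bytes of the data field as 401f, the same byte groups that are visible in the dump above.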

 71 nscd GIO   fd 5 read 0 bytes

 71 nscd RET   kevent 0
 71 nscd CALL  kevent(0x5,0x7f9fcec0,0x1,0,0,0x7f9fcee0)
 71 nscd GIO   fd 5 wrote 32 bytes
0x 0400     1100     
 |..|
0x0012       |..|
 

Probably registering interest for the next connection.

 71 nscd GIO   fd 5 read 0 bytes

 71 nscd RET   kevent 0
 71 nscd CALL  kevent(0x5,0,0,0x7f9fcec0,0x1,0)
 71 nscd GIO   fd 5 wrote 0 bytes

 71 nscd GIO   fd 5 read 32 bytes
0x 0600    f9ff 3000   0100  
 |..0...|
0x0012    40e6 4002 0800 |..@.@.|
 

The timer has already expired. This cannot be right. (It cannot be that
EVFILT_READ is broken and eight seconds actually passed because the send
calls would have worked in that case.)

tools/regression/kqueue/kqtest works correctly on the aforementioned