Bug#435056: nfs server hangs on /proc bind-mount
On Sun, Aug 05, 2007 at 08:53:08PM +0200, Steinar H. Gunderson wrote:
> Hm. I'm trying to reproduce it now, and I can perhaps do it once out of ten.
> Sort of hard to track down...

Finally! I got it to be weird while I was stracing it. From there it was only
a matter of remembering the following commit:

| From dd087896285da9e160e13ee9f7d75381b67895e3 Mon Sep 17 00:00:00 2001
| From: J. Bruce Fields [EMAIL PROTECTED]
| Date: Thu, 26 Jul 2007 16:30:46 -0400
| Subject: [PATCH] Use __fpurge to ensure single-line writes to cache files
|
| On a recent Debian/Sid machine, I saw libc retrying stdio writes that
| returned write errors. The result is that if an export downcall returns
| an error (which it can in normal operation, since it currently
| (incorrectly) returns -ENOENT on any negative downcall), then subsequent
| downcalls will write multiple lines (including the original line that
| received the error).
|
| The result is that the server fails to respond to any rpc call that
| refers to an unexported mount point (such as a readdir of a directory
| containing such a mountpoint), so client commands hang.
|
| I don't know whether this libc behavior is correct or expected, but it
| seems safest to add the __fpurge() (suggested by Neil) to ensure data is
| thrown away.
|
| Signed-off-by: J. Bruce Fields [EMAIL PROTECTED]
| Signed-off-by: Neil Brown [EMAIL PROTECTED]

I uploaded the current git version to unstable earlier today, and it contains
this fix. It sounds like it should fix your issue -- could you please give it
a shot?

/* Steinar */
--
Homepage: http://www.sesse.net/

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#435056: nfs server hangs on /proc bind-mount
Steinar H. Gunderson wrote: [Sun Aug 05 2007, 02:35:48PM EDT]
> This is interesting -- I got home to my testing machine, and I managed to
> reproduce it -- but only once. When I restarted nfs-kernel-server (via the
> init.d script), the hanging processes resumed, and from there it was
> completely unreproducible. Does this fix it for you too?

If I put the server restart in a loop, the client makes slow progress.
Restarting just once does not alleviate the problem forever. It's possible
that I'm seeing it hang once per subdirectory, since I have a number of
mounts under the export. I would need to test more to discover.

This is the loop I used:

    while true; do
        /etc/init.d/nfs-kernel-server restart
        sleep 5
    done

Thanks for reproducing it. I'm glad it's not something peculiar to my
configuration.
Bug#435056: nfs server hangs on /proc bind-mount
On Sat, Jul 28, 2007 at 08:21:47PM -0400, Aron Griffis wrote:
> server
> --
> mkdir -p /test/proc
> mount -o bind /proc /test/proc
> echo '/test 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)' > /etc/exports
> exportfs -a
>
> client
> --
> mkdir /test
> mount server:/test /test
> /bin/ls /test       # works
> /bin/ls /test/proc  # hangs

This is interesting -- I got home to my testing machine, and I managed to
reproduce it -- but only once. When I restarted nfs-kernel-server (via the
init.d script), the hanging processes resumed, and from there it was
completely unreproducible. Does this fix it for you too?

/* Steinar */
--
Homepage: http://www.sesse.net/
Bug#435056: nfs server hangs on /proc bind-mount
On Sun, Aug 05, 2007 at 02:41:18PM -0400, Aron Griffis wrote:
> If I put the server restart in a loop, the client makes slow progress.
> Restarting just once does not alleviate the problem forever. It's possible
> that I'm seeing it hang once per subdirectory, since I have a number of
> mounts under the export. I would need to test more to discover.

Hm. I'm trying to reproduce it now, and I can perhaps do it once out of ten.
Sort of hard to track down...

When rpc.mountd is hanging, is it perchance using a lot of CPU? Could you
strace it while it hangs? (It looks a bit to me like it's doing lots of
devmapper stuff, which has been problematic earlier.) Also, the output of
rpc.mountd -d all -F during a problematic session would be useful, in
particular at what point it hangs.

/* Steinar */
--
Homepage: http://www.sesse.net/
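The three pieces of data requested above (CPU usage, an strace, and the debug output of rpc.mountd) could be collected roughly as follows. This is only a sketch: the helper name `mountd_diag_plan` and the output paths under /tmp are assumptions, and the function just prints the commands so they can be reviewed before being run as root. Running rpc.mountd in the foreground presumably requires stopping the already running daemon first.

```shell
#!/bin/sh
# mountd_diag_plan PID: print (not execute) the commands for collecting
# the diagnostics for the rpc.mountd process with the given PID.
# The helper name and the /tmp file names are hypothetical.
mountd_diag_plan() {
    pid=$1
    # Is the process burning CPU, and what state is it in?
    echo "ps -o pid,pcpu,stat,comm -p $pid"
    # Attach to the running daemon and log its system calls with timestamps.
    echo "strace -f -tt -p $pid -o /tmp/rpc.mountd.strace"
    # Foreground run with all debug output, as requested in the message.
    echo "rpc.mountd -d all -F 2>&1 | tee /tmp/rpc.mountd.debug"
}
```

A typical call would be `mountd_diag_plan "$(pidof rpc.mountd)"`, after which the printed lines can be run one by one while the client is hanging.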
Bug#435056: nfs server hangs on /proc bind-mount
Hello Aron and Steinar,

On 2007-07-28 21:15:30, Aron Griffis wrote:
> I don't want to export my /proc, I want to export a filesystem that
> has proc mounted on a subdir. Consider an NFS-root structure:
>
>     /chroots/foo/proc
>     /chroots/foo/sys
>     /chroots/foo/usr
>     /chroots/foo/lib
>     /chroots/foo/etc
>     /chroots/foo/dev
>     /chroots/foo/tmp

Which can not work, since /proc must be the /proc of the machine WHICH is
mounting the nfs-share.

/proc contains information about the processes running on the current
machine, and if ANY program accesses /proc and does not find the
corresponding PID or related information, it will hang. It will NOT even
find its OWN PID.

MANY programs depend on the local /proc and not on a bind-mounted /proc
from another machine.

Have you already tried to run a test machine where /proc is not mounted?
Try it and you will see tons of unexpected errors and program behaviors.
My development station (dual Opteron) refuses to enter init 2 while booting.

Thanks, greetings and have a nice day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant
Bug#435056: nfs server hangs on /proc bind-mount
Michelle Konzack wrote: [Fri Aug 03 2007, 07:57:36AM EDT]
> Which can not work, since /proc must be the /proc of the machine WHICH is
> mounting the nfs-share.

Your statements represent a misreading of the bug. Let's take
a step-by-step approach:

1. The server has /etc/exports:

       /foo 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)

2. The client can see the content of that filesystem, for example:

       /foo/bar/baz.txt

3. The server now mounts a filesystem on a directory:

       mount /dev/sdb1 /foo/bar

4. Now at this point, the server should see new content on /foo/bar,
   but the client should continue to see the underlying content. In
   other words, the client can still access /foo/bar/baz.txt

HOWEVER, in some cases at least, the NFS connection is instead hanging
on step 4. The client sends a LOOKUP on /foo/bar and the server never
responds. The client retransmits the LOOKUP indefinitely.

This seems to be easy to demonstrate by mounting procfs on /foo/bar, but
I've now seen it using other filesystems.

The only reason I use the chroot example is because it is common to
export a chroot environment as nfs-root. The clients see only the one
filesystem, yes, but the server mounts additional directories so that
it's possible to build and install software more easily in the server
environment.
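The four steps in the message above can be written down as a small dry-run script. This is a sketch rather than anything posted in the thread: the `run`, `server_side` and `client_side` helpers and the `DRY_RUN` switch are assumptions, `server` is a placeholder host name, and the client mount point /foo is assumed to exist. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run walk-through of steps 1-4. With DRY_RUN=1 (the default) commands
# are only printed; set DRY_RUN=0 and run as root on the matching host to
# execute them. All helper names here are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

server_side() {
    run exportfs -a               # step 1: publish the /foo entry from /etc/exports
    run mount /dev/sdb1 /foo/bar  # step 3: cover /foo/bar on the server
}

client_side() {
    run mount server:/foo /foo    # "server" is a placeholder host name
    run ls /foo/bar/baz.txt       # steps 2 and 4: should work before and after step 3
}
```

Running `client_side` once before and once after `server_side` reaches step 3 mirrors the sequence in which the hang was described.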
Bug#435056: nfs server hangs on /proc bind-mount
On Sat, Jul 28, 2007 at 09:33:38PM -0400, Aron Griffis wrote:
> > But NFS exports filesystems, so at least in theory, the /proc in there
> > should be ignored completely unless you export it.
>
> Exactly. But instead it's hanging

Well, nfs-utils does not mess with the mount after it has handed it over to
the kernel, so on the surface of it, it looks like a kernel bug. Then again,
if downgrading nfs-utils helps...

> (Note the simple reproducer I gave in the start of this bug report
> means that you also can test ;-)

Unfortunately, I am on vacation and in no position to test NFS-related
matters at all.

> > Is it mountd or the kernel that freezes?
>
> Not sure. It can be worked around by continuously restarting the nfs
> server, so that sounds like userland. In wireshark it shows as
> a LOOKUP on /proc, and the server never replies.

On what port? The mountd port?

/* Steinar */
--
Homepage: http://www.sesse.net/
Bug#435056: nfs server hangs on /proc bind-mount
Package: nfs-kernel-server
Version: 1:1.1.0-11
Severity: serious

--- Please enter the report below this line. ---

When a client attempts to read a bind-mounted proc directory from the
server, the server never responds. Found this with wireshark and
narrowed down to a simple test. Other bind-mounts seem to be
unaffected (tested with lvm volumes)

server
--
mkdir -p /test/proc
mount -o bind /proc /test/proc
echo '/test 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)' > /etc/exports
exportfs -a

client
--
mkdir /test
mount server:/test /test
/bin/ls /test       # works
/bin/ls /test/proc  # hangs

The problem goes away if I directly mount proc on /test/proc instead
of using a bind-mount.

--- System information. ---
Architecture: i386
Kernel:       Linux 2.6.21-2-686

Debian Release: lenny/sid
   650 testing   security.debian.org
   650 testing   ftp.us.debian.org
   650 testing   debian-multimedia.fx-services.com
   600 unstable  debian-multimedia.fx-services.com
  1002 unstable  n01se.net

--- Package information. ---
Depends (Version)                               | Installed
================================================+================================
nfs-common (= 1:1.0.8-1)                        | 1:1.1.0-11
ucf                                             | 3.001
lsb-base (= 1.3-9ubuntu3)                       | 3.1-23.1
libblkid1 (= 1.39-1)                            | 1.39+1.40-WIP-2006.11.14+dfsg-2
libc6 (= 2.5-5)                                 | 2.6-2
libcomerr2 (= 1.33-3)                           | 1.39+1.40-WIP-2006.11.14+dfsg-2
libgssapi2                                      | 0.11-1
libkrb53 (= 1.6.dfsg.1)                         | 1.6.dfsg.1-6
libnfsidmap2                                    | 0.19-0+b1
librpcsecgss3                                   | 0.14-2
libwrap0                                        | 7.6.dbs-13
libblkid1 (= 1.39+1.40-WIP-2006.11.14+dfsg-2)   | 1.39+1.40-WIP-2006.11.14+dfsg-2
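To tell a hang like the one in the last client command apart from a merely slow reply, the command can be bounded with a time limit. The following is a minimal sketch and not part of the report: the `probe` helper and the 10-second limit are assumptions, and it relies on coreutils `timeout`, which exits with status 124 when the limit is reached. On a hard NFS mount the blocked `ls` may not react to the signal, so an interruptible mount is assumed.

```shell
#!/bin/sh
# probe LIMIT CMD...: run CMD under a time limit and classify the result.
# "probe" is a hypothetical helper, not part of nfs-utils.
probe() {
    limit=$1
    shift
    # coreutils timeout exits with 124 if CMD was still running at LIMIT.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "error $status"
    fi
}

# Possible use on the client from the report:
#   probe 10 /bin/ls /test
#   probe 10 /bin/ls /test/proc
```

Reporting the exit status separately keeps an ordinary failure, such as a permission error, from being mistaken for the server not replying.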
Bug#435056: nfs server hangs on /proc bind-mount
severity 435056 normal
thanks

On Sat, Jul 28, 2007 at 08:21:47PM -0400, Aron Griffis wrote:
> When a client attempts to read a bind-mounted proc directory from the
> server, the server never responds. Found this with wireshark and
> narrowed down to a simple test. Other bind-mounts seem to be
> unaffected (tested with lvm volumes)

Uhm, why do you want to export your /proc? I'm unsure if that's supported at
all. In any case, the severity is massively inflated -- I assume you meant
grave and not serious, but not being able to export bind-mounted /proc
surely does not make the entire package unusable or nearly so.

/* Steinar */
--
Homepage: http://www.sesse.net/
Bug#435056: nfs server hangs on /proc bind-mount
Steinar H. Gunderson wrote: [Sat Jul 28 2007, 08:45:47PM EDT]
> In any case, the severity is massively inflated -- I assume you meant
> grave and not serious,

Regarding this part, the NFS server freezing in a situation where it
previously worked is surely a serious error. And I think that the
scenario is a common one for NFS-root environments.
Bug#435056: nfs server hangs on /proc bind-mount
Steinar H. Gunderson wrote: [Sat Jul 28 2007, 08:45:47PM EDT]
> Uhm, why do you want to export your /proc? I'm unsure if that's supported at
> all. In any case, the severity is massively inflated -- I assume you meant
> grave and not serious, but not being able to export bind-mounted /proc
> surely does not make the entire package unusable or nearly so.

I don't want to export my /proc, I want to export a filesystem that
has proc mounted on a subdir. Consider an NFS-root structure:

    /chroots/foo/proc
    /chroots/foo/sys
    /chroots/foo/usr
    /chroots/foo/lib
    /chroots/foo/etc
    /chroots/foo/dev
    /chroots/foo/tmp

This is exported via

    /chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)

(It's read-only because the client uses unionfs to make a writeable
tmpfs over the NFS-root.)

On the server I like to do fast builds and such. So I treat it like
a chroot there and bind-mount important dirs.

    /dev/mapper/raid_vg-foo  /chroots/foo                ext3    noatime   0 2
    /dev                     /chroots/foo/dev            none    bind      0 0
    /dev/pts                 /chroots/foo/dev/pts        devpts  defaults  0 0
    /dev/shm                 /chroots/foo/dev/shm        none    bind      0 0
    /home/agriffis           /chroots/foo/home/agriffis  none    bind      0 0
    /proc                    /chroots/foo/proc           none    bind      0 0
    /sys                     /chroots/foo/sys            none    bind      0 0
    /tmp                     /chroots/foo/tmp            none    bind      0 0

This is quite a common scenario when working on large-scale NFS-root
installations. This worked well with 1.0.9 but has stopped working
with 1.1.0 because of the NFS server freeze when it encounters the
bind-mounted proc dir.

In fact I determined it also freezes when proc is mounted directly,
nevermind bind-mounting. My original report said that it worked in
that configuration, but it doesn't.
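Since every filesystem mounted below the export is a candidate for the freeze, it can help to list those mount points from fstab-style entries like the ones above. The snippet below is only an illustration: `submounts` is a hypothetical helper, and it assumes whitespace-separated fields with the mount point in the second column, as in /etc/fstab or /proc/mounts.

```shell
#!/bin/sh
# submounts ROOT: read fstab-style lines on stdin and print every mount
# point that lies strictly below ROOT. "submounts" is a hypothetical helper.
submounts() {
    awk -v root="$1" '
        /^[ \t]*#/ { next }                     # skip comment lines
        NF < 2     { next }                     # skip blank or short lines
        index($2, root "/") == 1 { print $2 }   # field 2 is the mount point
    '
}
```

For example, `submounts /chroots/foo < /etc/fstab` on the server would print the mount points under the export, one per line, which is the set of directories worth probing from a client.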
Bug#435056: nfs server hangs on /proc bind-mount
On Sat, Jul 28, 2007 at 09:15:30PM -0400, Aron Griffis wrote:
> I don't want to export my /proc, I want to export a filesystem that
> has proc mounted on a subdir.

But NFS exports filesystems, so at least in theory, the /proc in there should
be ignored completely unless you export it.

> /chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)
>
> In fact I determined it also freezes when proc is mounted directly,
> nevermind bind-mounting. My original report said that it worked in
> that configuration, but it doesn't.

Have you tried changing to subtree_check? Is it mountd or the kernel that
freezes?

/* Steinar */
--
Homepage: http://www.sesse.net/
Bug#435056: nfs server hangs on /proc bind-mount
Steinar H. Gunderson wrote: [Sat Jul 28 2007, 09:18:05PM EDT]
> On Sat, Jul 28, 2007 at 09:15:30PM -0400, Aron Griffis wrote:
> > I don't want to export my /proc, I want to export a filesystem that
> > has proc mounted on a subdir.
>
> But NFS exports filesystems, so at least in theory, the /proc in there
> should be ignored completely unless you export it.

Exactly. But instead it's hanging

> > /chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)
> >
> > In fact I determined it also freezes when proc is mounted directly,
> > nevermind bind-mounting. My original report said that it worked in
> > that configuration, but it doesn't.
>
> Have you tried changing to subtree_check?

I haven't yet. I thought no_subtree_check only referred to
permissions, and nohide refers to sub-filesystems? I was also using
no_subtree_check on 1.0.9 with this configuration.

(Note the simple reproducer I gave in the start of this bug report
means that you also can test ;-)

> Is it mountd or the kernel that freezes?

Not sure. It can be worked around by continuously restarting the nfs
server, so that sounds like userland. In wireshark it shows as
a LOOKUP on /proc, and the server never replies.