Bug#435056: nfs server hangs on /proc bind-mount

2007-08-06 Thread Steinar H. Gunderson
On Sun, Aug 05, 2007 at 08:53:08PM +0200, Steinar H. Gunderson wrote:
> Hm. I'm trying to reproduce it now, and I can perhaps do it once out of ten.
> Sort of hard to track down...

Finally!

I got it to be weird while I was stracing it. From there it was only a
matter of remembering the following commit:

| From dd087896285da9e160e13ee9f7d75381b67895e3 Mon Sep 17 00:00:00 2001
| From: J. Bruce Fields [EMAIL PROTECTED]
| Date: Thu, 26 Jul 2007 16:30:46 -0400
| Subject: [PATCH] Use __fpurge to ensure single-line writes to cache files
| 
| On a recent Debian/Sid machine, I saw libc retrying stdio writes that
| returned write errors.  The result is that if an export downcall returns
| an error (which it can in normal operation, since it currently
| (incorrectly) returns -ENOENT on any negative downcall), then subsequent
| downcalls will write multiple lines (including the original line that
| received the error).
| 
| The result is that the server fails to respond to any rpc call that
| refers to an unexported mount point (such as a readdir of a directory
| containing such a mountpoint), so client commands hang.
| 
| I don't know whether this libc behavior is correct or expected, but it
| seems safest to add the __fpurge() (suggested by Neil) to ensure data is
| thrown away.
| 
| Signed-off-by: J. Bruce Fields [EMAIL PROTECTED]
| Signed-off-by: Neil Brown [EMAIL PROTECTED]

I uploaded the current git version to unstable earlier today, and it contains
this fix. It sounds like it should fix your issue -- could you please give it
a shot?

/* Steinar */
-- 
Homepage: http://www.sesse.net/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#435056: nfs server hangs on /proc bind-mount

2007-08-05 Thread Aron Griffis
Steinar H. Gunderson wrote:  [Sun Aug 05 2007, 02:35:48PM EDT]
> This is interesting -- I got home to my testing machine, and
> I managed to reproduce it -- but only once. When I restarted
> nfs-kernel-server (via the init.d script), the hanging processes
> resumed, and from there it was completely unreproducible.
>
> Does this fix it for you too?

If I put the server restart in a loop, the client makes slow progress.
Restarting just once does not alleviate the problem forever.  It's
possible that I'm seeing it hang once per subdirectory, since I have
a number of mounts under the export.  I would need to test more to
discover.

This is the loop I used:

while true; do
    /etc/init.d/nfs-kernel-server restart
    sleep 5
done

Thanks for reproducing it.  I'm glad it's not something peculiar to my
configuration.





Bug#435056: nfs server hangs on /proc bind-mount

2007-08-05 Thread Steinar H. Gunderson
On Sat, Jul 28, 2007 at 08:21:47PM -0400, Aron Griffis wrote:
> server
> --
> mkdir -p /test/proc
> mount -o bind /proc /test/proc
> echo '/test 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)' >> /etc/exports
> exportfs -a
>
> client
> --
> mkdir /test
> mount server:/test /test
> /bin/ls /test   #  works
> /bin/ls /test/proc  #  hangs

This is interesting -- I got home to my testing machine, and I managed to
reproduce it -- but only once. When I restarted nfs-kernel-server (via the
init.d script), the hanging processes resumed, and from there it was
completely unreproducible.

Does this fix it for you too?

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Bug#435056: nfs server hangs on /proc bind-mount

2007-08-05 Thread Steinar H. Gunderson
On Sun, Aug 05, 2007 at 02:41:18PM -0400, Aron Griffis wrote:
> If I put the server restart in a loop, the client makes slow progress.
> Restarting just once does not alleviate the problem forever.  It's
> possible that I'm seeing it hang once per subdirectory, since I have
> a number of mounts under the export.  I would need to test more to
> discover.

Hm. I'm trying to reproduce it now, and I can perhaps do it once out of ten.
Sort of hard to track down...

When rpc.mountd is hanging, is it perchance using a lot of CPU? Could you
strace it while it hangs? (It looks a bit to me like it's doing lots of
devmapper stuff, which has been problematic earlier.) Also, the output of
rpc.mountd -d all -F during a problematic session would be useful, in
particular at what point it hangs.
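
A minimal sketch of how the two traces requested above might be captured on the server. The helper names and the default log path are illustrative assumptions, not something taken from this thread; the helpers only print the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the commands for capturing
# what rpc.mountd is doing while a client hangs.

# strace_cmd PID [LOGFILE]: attach to a running rpc.mountd.
#   -f   follow forked children
#   -tt  prefix each syscall with a microsecond timestamp
#   -p   attach to an existing process
#   -o   write the trace to LOGFILE (default path is an assumption)
strace_cmd() {
    pid=$1
    log=${2:-/tmp/rpc.mountd.strace}
    echo "strace -f -tt -p $pid -o $log"
}

# debug_cmd: run mountd in the foreground with all debug output enabled,
# exactly as asked for in the message above.
debug_cmd() {
    echo "rpc.mountd -d all -F"
}

# Example with a made-up PID.
strace_cmd 1234
debug_cmd
```

On a live server one would presumably substitute `$(pidof rpc.mountd)` for the PID, and stop the instance started by the init script before launching a foreground one.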

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Bug#435056: nfs server hangs on /proc bind-mount

2007-08-03 Thread Michelle Konzack
Hello Aron and Steinar,

On 2007-07-28 21:15:30, Aron Griffis wrote:
> I don't want to export my /proc, I want to export a filesystem that
> has proc mounted on a subdir.
>
> Consider an NFS-root structure:
>
> /chroots/foo/proc
> /chroots/foo/sys
> /chroots/foo/usr
> /chroots/foo/lib
> /chroots/foo/etc
> /chroots/foo/dev
> /chroots/foo/tmp


Which can not work, since /proc must be the /proc of the machine WHICH
is mounting the nfs-share.

/proc contains information about the processes running on the current
machine, and if ANY program accesses /proc and does not find the
corresponding PID or other information, it will hang.  It will NOT even
find its OWN PID.

MANY programs depend on the local /proc and not on a bind-mounted /proc
from another machine.

Have you already tried running a test machine where /proc is not
mounted?  Try it and you will see tons of unexpected errors and odd
program behavior.  My development station (dual Opteron) refuses to
enter init 2 while booting.

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
   50, rue de Soultz MSN LinuxMichi
0033/6/6192519367100 Strasbourg/France   IRC #Debian (irc.icq.com)




Bug#435056: nfs server hangs on /proc bind-mount

2007-08-03 Thread Aron Griffis
Michelle Konzack wrote:  [Fri Aug 03 2007, 07:57:36AM EDT]
> Which can not work, since /proc must be the /proc of the machine WHICH
> is mounting the nfs-share.

Your statements represent a misreading of the bug.  Let's take
a step-by-step approach:

1. The server has /etc/exports:

/foo 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)

2. The client can see the content of that filesystem, for example:

/foo/bar/baz.txt

3. The server now mounts a directory:

mount /dev/sdb1 /foo/bar

4. Now at this point, the server should see new content on /foo/bar,
   but the client should continue to see the underlying content.  In
   other words, the client can still access /foo/bar/baz.txt

HOWEVER, in some cases at least, the NFS connection is instead hanging
on step 4.  The client sends a LOOKUP on /foo/bar and the server never
responds.  The client retransmits the LOOKUP indefinitely.
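
A minimal sketch of how a client could test for the hang in step 4 without tying up the terminal. The `probe` function is a hypothetical helper, and `sleep` stands in for the blocked `ls`; it assumes the coreutils `timeout` command is available. Note that a process blocked on a hard NFS mount may not react to signals on some kernels, in which case the probe itself can block.

```shell
#!/bin/sh
# probe SECONDS COMMAND...: run COMMAND, but give up after SECONDS.
# Prints "ok", "hang", or "error(N)" where N is the exit status.
probe() {
    secs=$1
    shift
    timeout -s KILL "$secs" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)       echo ok ;;
        124|137) echo hang ;;   # time limit hit (137 = ended by SIGKILL)
        *)       echo "error($status)" ;;
    esac
}

# Stand-in for a LOOKUP that never gets a reply: sleep outlives the limit.
probe 1 sleep 5

# On a real client this might be (path taken from the steps above):
#   probe 10 ls /foo/bar
```

A status of 137 is also what a command killed by someone else would produce, so "hang" here is a hint to go and look at the wire trace, not proof.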

This seems to be easy to demonstrate by mounting procfs on /foo/bar,
but I've now seen it using other filesystems as well.

The only reason I use the chroot example is because it is common to
export a chroot environment as nfs-root.  The clients see only the one
filesystem, yes, but the server mounts additional directories so that
it's possible to build and install software more easily in the server
environment.





Bug#435056: nfs server hangs on /proc bind-mount

2007-07-29 Thread Steinar H. Gunderson
On Sat, Jul 28, 2007 at 09:33:38PM -0400, Aron Griffis wrote:
> > But NFS exports filesystems, so at least in theory, the /proc in there should
> > be ignored completely unless you export it.
> Exactly.  But instead it's hanging

Well, nfs-utils does not mess with the mount after it has handed it over to
the kernel, so on the surface of it, it looks like a kernel bug. Then again,
if downgrading nfs-utils helps...

> (Note the simple reproducer I gave in the start of this bug report
> means that you also can test ;-)

Unfortunately, I am on vacation and in no position to test NFS-related
matters at all.

> > Is it mountd or the kernel that freezes?
> Not sure.  It can be worked around by continuously restarting the nfs
> server, so that sounds like userland.  In wireshark it shows as
> a LOOKUP on /proc, and the server never replies.

On what port? The mountd port?

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Aron Griffis
Package: nfs-kernel-server
Version: 1:1.1.0-11
Severity: serious

--- Please enter the report below this line. ---
When a client attempts to read a bind-mounted proc directory from the
server, the server never responds.  Found this with wireshark and
narrowed down to a simple test.  Other bind-mounts seem to be
unaffected (tested with lvm volumes)

server
--
mkdir -p /test/proc
mount -o bind /proc /test/proc
echo '/test 10.0.0.0/16(rw,no_root_squash,async,no_subtree_check)' >> /etc/exports
exportfs -a

client
--
mkdir /test
mount server:/test /test
/bin/ls /test   #  works
/bin/ls /test/proc  #  hangs

The problem goes away if I directly mount proc on /test/proc instead
of using a bind-mount.

--- System information. ---
Architecture: i386
Kernel:   Linux 2.6.21-2-686

Debian Release: lenny/sid
  650 testing   security.debian.org
  650 testing   ftp.us.debian.org
  650 testing   debian-multimedia.fx-services.com
  600 unstable  debian-multimedia.fx-services.com
 1002 unstable  n01se.net

--- Package information. ---
Depends (Version)                              | Installed
==============================================-+-===============================
nfs-common (>= 1:1.0.8-1)                      | 1:1.1.0-11
ucf                                            | 3.001
lsb-base (>= 1.3-9ubuntu3)                     | 3.1-23.1
libblkid1 (>= 1.39-1)                          | 1.39+1.40-WIP-2006.11.14+dfsg-2
libc6 (>= 2.5-5)                               | 2.6-2
libcomerr2 (>= 1.33-3)                         | 1.39+1.40-WIP-2006.11.14+dfsg-2
libgssapi2                                     | 0.11-1
libkrb53 (>= 1.6.dfsg.1)                       | 1.6.dfsg.1-6
libnfsidmap2                                   | 0.19-0+b1
librpcsecgss3                                  | 0.14-2
libwrap0                                       | 7.6.dbs-13
libblkid1 (= 1.39+1.40-WIP-2006.11.14+dfsg-2)  | 1.39+1.40-WIP-2006.11.14+dfsg-2






Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Steinar H. Gunderson
severity 435056 normal
thanks

On Sat, Jul 28, 2007 at 08:21:47PM -0400, Aron Griffis wrote:
> When a client attempts to read a bind-mounted proc directory from the
> server, the server never responds.  Found this with wireshark and
> narrowed down to a simple test.  Other bind-mounts seem to be
> unaffected (tested with lvm volumes)

Uhm, why do you want to export your /proc? I'm unsure if that's supported at
all. In any case, the severity is massively inflated -- I assume you meant
grave and not serious, but not being able to export bind-mounted /proc
surely does not make the entire package unusable or nearly so.

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Aron Griffis
Steinar H. Gunderson wrote:  [Sat Jul 28 2007, 08:45:47PM EDT]
> In any case, the severity is massively inflated -- I assume you meant
> grave and not serious,

Regarding this part, the NFS server freezing in a situation when it
previously worked is surely a serious error.  And I think that the
scenario is a common one for NFS-root environments.





Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Aron Griffis
Steinar H. Gunderson wrote:  [Sat Jul 28 2007, 08:45:47PM EDT]
> Uhm, why do you want to export your /proc? I'm unsure if that's supported at
> all. In any case, the severity is massively inflated -- I assume you meant
> grave and not serious, but not being able to export bind-mounted /proc
> surely does not make the entire package unusable or nearly so.

I don't want to export my /proc, I want to export a filesystem that
has proc mounted on a subdir. 

Consider an NFS-root structure:

/chroots/foo/proc
/chroots/foo/sys
/chroots/foo/usr
/chroots/foo/lib
/chroots/foo/etc
/chroots/foo/dev
/chroots/foo/tmp

This is exported via

/chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)

(It's read-only because the client uses unionfs to make a writeable
tmpfs over the NFS-root.)

On the server I like to do fast builds and such.  So I treat it like
a chroot there and bind-mount important dirs.

/dev/mapper/raid_vg-foo  /chroots/foo                ext3    noatime   0 2
/dev                     /chroots/foo/dev            none    bind      0 0
/dev/pts                 /chroots/foo/dev/pts        devpts  defaults  0 0
/dev/shm                 /chroots/foo/dev/shm        none    bind      0 0
/home/agriffis           /chroots/foo/home/agriffis  none    bind      0 0
/proc                    /chroots/foo/proc           none    bind      0 0
/sys                     /chroots/foo/sys            none    bind      0 0
/tmp                     /chroots/foo/tmp            none    bind      0 0

This is quite a common scenario when working on large-scale NFS-root
installations.  This worked well with 1.0.9 but has stopped working
with 1.1.0 because of the NFS server freeze when it encounters the
bind-mounted proc dir.
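
A minimal sketch for listing every mount that sits beneath an exported directory, which is where the hang shows up in this setup. The `submounts` function is a hypothetical helper; it assumes the usual /proc/mounts layout in which the second field is the mount point.

```shell
#!/bin/sh
# submounts ROOT [MOUNTS_FILE]: print every mount point strictly below ROOT.
# MOUNTS_FILE defaults to /proc/mounts; a different file can be passed
# for testing.
submounts() {
    root=${1%/}                 # drop a trailing slash, if any
    file=${2:-/proc/mounts}
    # Match only paths that start with "ROOT/", so /chroots/foobar is not
    # mistaken for something under /chroots/foo.
    awk -v root="$root" \
        '$2 != root "/" && index($2, root "/") == 1 { print $2 }' "$file"
}

# Usage on the server described above:
#   submounts /chroots/foo
```

Each path printed is a candidate for the client-side test, one `ls` per sub-mount.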

In fact I determined it also freezes when proc is mounted directly,
never mind bind-mounting.  My original report said that it worked in
that configuration, but it doesn't.






Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Steinar H. Gunderson
On Sat, Jul 28, 2007 at 09:15:30PM -0400, Aron Griffis wrote:
> I don't want to export my /proc, I want to export a filesystem that
> has proc mounted on a subdir.

But NFS exports filesystems, so at least in theory, the /proc in there should
be ignored completely unless you export it.

> /chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)
>
> In fact I determined it also freezes when proc is mounted directly,
> never mind bind-mounting.  My original report said that it worked in
> that configuration, but it doesn't.

Have you tried changing to subtree_check?

Is it mountd or the kernel that freezes?

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Bug#435056: nfs server hangs on /proc bind-mount

2007-07-28 Thread Aron Griffis
Steinar H. Gunderson wrote:  [Sat Jul 28 2007, 09:18:05PM EDT]
> On Sat, Jul 28, 2007 at 09:15:30PM -0400, Aron Griffis wrote:
> > I don't want to export my /proc, I want to export a filesystem that
> > has proc mounted on a subdir.
>
> But NFS exports filesystems, so at least in theory, the /proc in there should
> be ignored completely unless you export it.

Exactly.  But instead it's hanging

> > /chroots/foo 10.0.0.0/8(ro,no_root_squash,no_subtree_check)
> >
> > In fact I determined it also freezes when proc is mounted directly,
> > never mind bind-mounting.  My original report said that it worked in
> > that configuration, but it doesn't.
>
> Have you tried changing to subtree_check?

I haven't yet.  I thought no_subtree_check only referred to
permissions, and nohide refers to sub-filesystems?  I was also using
no_subtree_check on 1.0.9 with this configuration.

(Note the simple reproducer I gave in the start of this bug report
means that you also can test ;-)

> Is it mountd or the kernel that freezes?

Not sure.  It can be worked around by continuously restarting the nfs
server, so that sounds like userland.  In wireshark it shows as
a LOOKUP on /proc, and the server never replies.

