Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
FWIW, this still happens when both client and server are running Linux 3.11.0-rc5 (vanilla).

    $ dpkg -l | grep nfs | cut -c-70
    ii  libnfsidmap2:amd64   0.25-4      amd64
    ii  nfs-common           1:1.2.6-4   amd64
    ii  nfs-kernel-server    1:1.2.6-4   amd64

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/alpine.deb.2.10.1308120113360.7...@trent.utfs.org
Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
Sorry for the noise, here's another oddity, same setup (client and server running 3.11-rc5):

    $ find /mnt/nfs/usr/share/ -name getopt.awk -ls
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk

It's the same file, but it gets reported 10 times! Hence the error when trying to tar(1) the directory:

    $ tar -cf - /mnt/nfs/usr/share/awk/ > /dev/null
    tar: Removing leading `/' from member names
    tar: /mnt/nfs/usr/share/awk/: Cannot savedir: Too many levels of symbolic links
    tar: Exiting with failure status due to previous errors

On the server:

    $ find /mnt/disk/usr/share/ -name getopt.awk -ls
    250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/disk/usr/share/awk/getopt.awk

So, is JFS over NFS really br0ken and nobody noticed?
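The two symptoms above (one name listed several times, and tar failing with "Too many levels of symbolic links", which is the standard error text for ELOOP) can both be checked from the client with a few lines of Python. This is only a sketch: the function name is made up for illustration, and whether a given client kernel returns duplicates or ELOOP for a particular directory is not something the thread establishes.

```python
import errno
import os
from collections import Counter


def duplicate_names(path):
    """Return {name: count} for every name listed more than once in `path`.

    Returns None if reading the directory fails with ELOOP, the errno
    behind "Too many levels of symbolic links".
    """
    try:
        with os.scandir(path) as entries:
            counts = Counter(entry.name for entry in entries)
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            return None
        raise
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # "." is a placeholder; on an affected client one would pass a
    # directory on the NFS mount instead.
    print(duplicate_names("."))
```

On a healthy filesystem the function returns an empty dict, since a single pass through a directory should never yield the same name twice.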
Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
On Mon, Aug 12, 2013 at 01:29:15AM -0700, Christian Kujau wrote:
> Sorry for the noise, here's another oddity, same setup (client and server running 3.11-rc5):
>
> $ find /mnt/nfs/usr/share/ -name getopt.awk -ls
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
>
> It's the same file, but it gets reported 10 times! Hence the error when trying to tar(1) the directory:
>
> $ tar -cf - /mnt/nfs/usr/share/awk/ > /dev/null
> tar: Removing leading `/' from member names
> tar: /mnt/nfs/usr/share/awk/: Cannot savedir: Too many levels of symbolic links
> tar: Exiting with failure status due to previous errors
>
> On the server:
>
> $ find /mnt/disk/usr/share/ -name getopt.awk -ls
> 250724 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/disk/usr/share/awk/getopt.awk
>
> So, is JFS over NFS really br0ken and nobody noticed?

It does sound like a jfs bug, and I don't know if anyone tests nfs exports of jfs regularly.

It might be interesting to get a network trace: something like "tcpdump -s0 -wtmp.pcap", then "wireshark tmp.pcap", and look at the cookie fields in the readdir calls and replies. The server shouldn't return the same cookie twice on one read through the directory. And when the client uses a cookie, it should get the next entries, not already-returned entries.

You could also just run "strace -egetdents64 -v ls" on the server on the exported filesystem, in a problem directory, and see if the offsets are unique.

--b.
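The first rule stated above (no cookie returned twice in one pass through a directory) is easy to express as a check. A minimal sketch in Python, assuming a directory read is modelled as a list of (name, cookie) pairs; the function name and the sample entries are invented for illustration and are not taken from the kernel or from any capture in this thread.

```python
def find_duplicate_cookies(entries):
    """Return (name, cookie, first_name) for each cookie that repeats.

    entries: (name, cookie) pairs in the order the server returned them
    during one pass through a directory.
    """
    seen = {}        # cookie -> name of the first entry that carried it
    duplicates = []
    for name, cookie in entries:
        if cookie in seen:
            duplicates.append((name, cookie, seen[cookie]))
        else:
            seen[cookie] = name
    return duplicates


# Invented listing: the last two entries share cookie 18.
listing = [("file_a", 16), ("file_b", 17), ("file_c", 18), ("file_d", 18)]
print(find_duplicate_cookies(listing))
```

A client that resumes reading from a repeated cookie cannot tell which of the two positions it refers to, which is why a repeat is treated as a loop.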
Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
On Mon, 12 Aug 2013 at 12:29, J. Bruce Fields wrote:
> It might be interesting to get a network trace (something like
> tcpdump -s0 -wtmp.pcap; then wireshark tmp.pcap and look at the
> cookie fields in the readdir calls and replies.

I've created #60737[0] to track this issue upstream and attached a pcap to the bug, obtained while running "find dir -ls" on the client. But I can't find the right details in tcpdump/wireshark; I don't see any cookie information...

> You could also just run strace -egetdents64 -v ls on the server on
> the exported filesystem, in a problem directory, and see if the
> offsets are unique.

strace returned nothing for getdents64, only getdents. My test filesystems are 256 MB in size, maybe this is too small for getdents64 to be used? All the calls to getdents, however, return unique offsets, if I did this right:

    $ strace -egetdents -v ls /mnt/disk_jfs/usr/share/terminfo/q 2>&1 | egrep -o d_off=[0-9]* | sort

When running ls (even w/o -l) on the client on that NFS share, this readdir loop message is printed.

HTH,
Christian.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=60737
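The pipeline above sorts the d_off values but leaves spotting repeats to the eye (appending "| uniq -d" would print only repeated values). The same check can be sketched in Python, assuming the strace output has been captured as text. The function name and the sample text are made up for illustration; only the d_off=N tokens matter, so the rest of the strace line format is not relied on.

```python
import re
from collections import Counter

# Matches the d_off=<number> tokens that "strace -v" prints per entry.
D_OFF = re.compile(r"d_off=(\d+)")


def duplicate_offsets(strace_text):
    """Return the sorted list of d_off values that occur more than once."""
    counts = Counter(int(value) for value in D_OFF.findall(strace_text))
    return sorted(offset for offset, n in counts.items() if n > 1)


# Invented sample, not real output from the filesystems in this thread.
sample = (
    '{d_ino=2, d_off=4, d_name="."} {d_ino=1, d_off=6, d_name=".."} '
    '{d_ino=7, d_off=6, d_name="x"} {d_ino=8, d_off=9, d_name="y"}'
)
print(duplicate_offsets(sample))
```

An empty list means every offset was unique, which matches what was observed locally on the server here.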
Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
Interesting stuff. Out of curiosity I just tried this myself; both client and server are virtual machines running Debian/stable (3.2.0-4-amd64), and I was able to reproduce this. A test case would be:

## server:

    $ apt-get install nfs-kernel-server jfsutils
    $ dd if=/dev/zero bs=1M count=256 > /var/test.img
    $ losetup -f /var/test.img
    $ mkfs.jfs /dev/loop0
    $ mount -t jfs /dev/loop0 /mnt/disk
    $ tar -C / -cf - usr/share | tar -C /mnt/disk/ -xf -
    $ tail -1 /etc/exports
    /mnt/disk 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
    $ service nfs-kernel-server restart

## client

    $ apt-get install nfs-common
    $ showmount -e server | tail -1
    /mnt/disk 192.168.0.0/24
    $ tail -1 /etc/fstab
    server:/mnt/disk /mnt/nfs nfs rsize=8192,wsize=8192,intr 0 0
    $ mount /mnt/nfs
    $ mount | tail -1
    server:/mnt/disk on /mnt/nfs type nfs4 (rw,relatime,vers=4,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.137,minorversion=0,local_lock=none,addr=192.168.0.138)

    $ tar -cf - /mnt/nfs/ > /dev/null
    tar: Removing leading `/' from member names
    tar: Removing leading `/' from hard link targets
    tar: /mnt/nfs/usr/share/perl/5.14.2/Pod/: Cannot savedir: Too many levels of symbolic links
    tar: Exiting with failure status due to previous errors

    $ dmesg | tail
    [ 63.912327] RPC: Registered named UNIX socket transport module.
    [ 63.913801] RPC: Registered udp transport module.
    [ 63.914713] RPC: Registered tcp transport module.
    [ 63.915644] RPC: Registered tcp NFSv4.1 backchannel transport module.
    [ 63.949485] FS-Cache: Loaded
    [ 63.972688] FS-Cache: Netfs 'nfs' registered for caching
    [ 63.993300] Installing knfsd (copyright (C) 1996 o...@monad.swb.de).
    [ 284.733629] loop: module loaded
    [ 840.372846] NFS: directory 5.14.2/Pod contains a readdir loop.Please contact your server vendor. The file: Simple has duplicate cookie 18
    [ 840.375842] NFS: directory 5.14.2/Pod contains a readdir loop.Please contact your server vendor. The file: Simple has duplicate cookie 18

There are no messages on the server when this happens. The message on the client repeats on every attempt; the "Cannot savedir" failure above may be what triggers it.
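When collecting reports from several clients, it can help to pull the directory, file name and cookie out of the kernel message automatically. A small Python sketch, assuming the message wording is exactly as shown in the dmesg output above (other kernel versions may word it differently); the function name is hypothetical.

```python
import re

# Pattern built from the message text quoted in this thread; note that
# the kernel prints no space between "loop." and "Please".
LOOP_RE = re.compile(
    r"NFS: directory (?P<directory>.+?) contains a readdir loop\."
    r"\s*Please contact your server vendor\."
    r"\s*The file: (?P<name>.+?) has duplicate cookie (?P<cookie>\d+)"
)


def parse_readdir_loops(log_text):
    """Return (directory, file name, cookie) for each readdir loop report."""
    return [
        (m.group("directory"), m.group("name"), int(m.group("cookie")))
        for m in LOOP_RE.finditer(log_text)
    ]


log = (
    "[ 840.372846] NFS: directory 5.14.2/Pod contains a readdir loop."
    "Please contact your server vendor. "
    "The file: Simple has duplicate cookie 18\n"
)
print(parse_readdir_loops(log))
```

Feeding it the output of "dmesg" on an affected client should yield one tuple per logged report.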
Bug#714974: [Jfs-discussion] NFS 'readdir loop' error on JFS
On 08/10/2013 02:28 AM, Christian Kujau wrote:
> Interesting stuff. Out of curiosity I just tried this myself; both
> client and server are virtual machines running Debian/stable
> (3.2.0-4-amd64), and I was able to reproduce this. A test case would be:

I still haven't rebooted that machine - last chance to ask for any test info - as it looks like you have a test case anyway.

I haven't lost any data that I know of - just programs complaining etc.

IMO, at one time, jfs was really a better choice (good set of tools). Even in a few cases where hardware failed, the jfs tools worked well. Today, with everyone banging on ext4, it has become the better choice. (I don't think IBM is interested in supporting jfs - no idea if they are phasing out jfs2?)

Karl Schmidt                        EMail k...@xtronics.com
Transtronics, Inc.                  WEB http://secure.transtronics.com
3209 West 9th Street                Ph (785) 841-3089
Lawrence, KS 66049                  FAX (785) 841-0434

The world runs on individuals pursuing their separate interests. The great achievements of civilization have not come from government bureaus. Einstein didn't construct his theory under order from a bureaucrat. Henry Ford didn't revolutionize the automobile industry that way.