Bug#426571: nfs-kernel-server: Frequent can't read superblock, rpc.mountd spinning at 100% CPU

2007-06-03 Thread Steinar H. Gunderson
On Fri, Jun 01, 2007 at 02:48:42PM +0200, Tobias Diedrich wrote:
 I also noticed that while I had upgraded nfs-kernel-server I was
 still using an older version of libblkid1.  Maybe that upgrade fixed
 it.

I remember issues with it, yes -- those were leaks, but still...

/* Steinar */
-- 
Homepage: http://www.sesse.net/


Bug#426571: nfs-kernel-server: Frequent can't read superblock, rpc.mountd spinning at 100% CPU

2007-06-02 Thread Tobias Diedrich
Tobias Diedrich wrote:
 Steinar H. Gunderson wrote:
  On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
   After upgrading nfs-kernel-server to a new testing version,
   NFS started behaving erratically. Upgrading to the newest unstable
   version did not help.
  
  Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
  should probably be able to help you.
  
   [blkid_probe_all_new taking _really long_]
  
  That's a bit odd, too. :-)
  
   blkid_probe_all_new seems to scan /proc/partitions and then do some
   probing on the devices?
  
  Mm. I take it you don't have any really weird devices attached that could be
  responsible for the slowdown?
 
 Not that I can think of.
 However, I just upgraded the server to new hardware (faster CPU,
 bigger disks) and the problem seems to be gone now...
 I probably should try ltracing rpc.mountd to look at the execution
 time of blkid_probe_all_new again, just in case it's still
 excessive, but the faster CPU might compensate for that.
 
 Hmm, no... Seems fine now:
 14:41:50.754679 blkid_probe_all_new(0x8063810, 0x805c940, 34, 17, 0xf7ef4023) = 0
 14:41:51.094040 __xstat64(3, "/home/ranma/mail", 0xffe7ef94) = 0
 
 I also noticed that while I had upgraded nfs-kernel-server I was
 still using an older version of libblkid1.  Maybe that upgrade fixed
 it.
 
 Anyway, I should be able to boot up the old components to do some
 tests with them...  Maybe it's a timing problem that only shows up
 on slow CPUs? (PentiumIII 800MHz vs. Athlon64 3200+)

Unfortunately (or maybe fortunately ;)), while I did boot up the old
parts, I could not reproduce the problem again (during the short
testing time).  I also tried downgrading libblkid1 back to testing,
but it still worked flawlessly.

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
This email is made from 100% recycled bits.


Bug#426571: nfs-kernel-server: Frequent can't read superblock, rpc.mountd spinning at 100% CPU

2007-06-02 Thread Tobias Diedrich
Steinar H. Gunderson wrote:
 On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
  After upgrading nfs-kernel-server to a new testing version,
  NFS started behaving erratically. Upgrading to the newest unstable
  version did not help.
 
 Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
 should probably be able to help you.
 
  [blkid_probe_all_new taking _really long_]
 
 That's a bit odd, too. :-)
 
  blkid_probe_all_new seems to scan /proc/partitions and then do some
  probing on the devices?
 
 Mm. I take it you don't have any really weird devices attached that could be
 responsible for the slowdown?

Not that I can think of.
However, I just upgraded the server to new hardware (faster CPU,
bigger disks) and the problem seems to be gone now...
I probably should try ltracing rpc.mountd to look at the execution
time of blkid_probe_all_new again, just in case it's still
excessive, but the faster CPU might compensate for that.

Hmm, no... Seems fine now:
14:41:50.754679 blkid_probe_all_new(0x8063810, 0x805c940, 34, 17, 0xf7ef4023) = 0
14:41:51.094040 __xstat64(3, "/home/ranma/mail", 0xffe7ef94) = 0

I also noticed that while I had upgraded nfs-kernel-server I was
still using an older version of libblkid1.  Maybe that upgrade fixed
it.

Anyway, I should be able to boot up the old components to do some
tests with them...  Maybe it's a timing problem that only shows up
on slow CPUs? (PentiumIII 800MHz vs. Athlon64 3200+)

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
This email is made from 100% recycled bits.


Bug#426571: nfs-kernel-server: Frequent can't read superblock, rpc.mountd spinning at 100% CPU

2007-05-30 Thread Steinar H. Gunderson
On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
 After upgrading nfs-kernel-server to a new testing version,
 NFS started behaving erratically. Upgrading to the newest unstable
 version did not help.

Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
should probably be able to help you.

 [blkid_probe_all_new taking _really long_]

That's a bit odd, too. :-)

 blkid_probe_all_new seems to scan /proc/partitions and then do some
 probing on the devices?

Mm. I take it you don't have any really weird devices attached that could be
responsible for the slowdown?

/* Steinar */
-- 
Homepage: http://www.sesse.net/


Bug#426571: nfs-kernel-server: Frequent can't read superblock, rpc.mountd spinning at 100% CPU

2007-05-29 Thread Tobias Diedrich
Package: nfs-kernel-server
Version: 1:1.1.0-2
Severity: important

After upgrading nfs-kernel-server to a new testing version,
NFS started behaving erratically. Upgrading to the newest unstable
version did not help.

Mounting now often fails with "can't read superblock" and yields an
almost immediate "nfs: server nukunuku not responding, still trying"
in the kernel log.

Also, when the mount did work, accessing a directory after some period
of inactivity takes forever (again with "nfs: server nukunuku not
responding, still trying" and some time later "nfs: server nukunuku OK").

While the server is not responding (and even for some time after that,
if the mount attempt times out), rpc.mountd spins at 100% CPU.
ltrace reveals that each call to blkid_probe_all_new takes ages:

[rpc.mountd is idle]
19:03:05.293900 select(1024, 0xbfef925c, 0, 0, 0) = 1
[incoming mount request]
19:03:05.301495 svc_getreqset(0xbfef925c, 0xbfef925c, 0, 0, 0 <unfinished ...>
19:03:05.302757 inet_ntoa(0xf108a8c0) = "192.168.8.241"
19:03:05.305276 hosts_ctl(0x805677e, 0x8057485, 0xb7daa6a0, 0x8057485, 0xbfef8bd0) = 1
[...]
[blkid_probe_all_new taking _really long_]
19:03:06.641248 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
[back from blkid_probe_all_new]
19:03:26.403212 __xstat64(3, "/home/ranma/mail", 0xbfef7914) = 0
19:03:26.404184 blkid_devno_to_devname(64771, 0, 0xbfef7914, 17, 0xb7f9f023) = 0x8063f50
19:03:26.405733 blkid_get_dev(0x806ba80, 0x8063f50, 3, 17, 0xb7f9f023) = 0x806ef00
19:03:26.426065 free(0x8063f50) = <void>
19:03:26.426640 blkid_tag_iterate_begin(0x806ef00, 0x8063f50, 3, 17, 0xb7f9f023) = 0x806ced0
19:03:26.428163 blkid_tag_next(0x806ced0, 0xbfef7974, 0xbfef7978, 17, 0xb7f9f023) = 0
19:03:26.428751 blkid_tag_iterate_end(0x806ced0, 0xbfef7974, 0xbfef7978, 17, 0xb7f9f023) = 0x806ced0
19:03:26.429343 memset(0xbfef79ac, '\000', 16) = 0xbfef79ac
19:03:26.429878 __ctype_b_loc() = 0xb7daa690
19:03:26.430506 fwrite("uuid /ranma/mail ya.is-a-geek.or"..., 5, 1, 0x8063de8) = 1
[...]
[lather, rinse, repeat?]
19:03:27.535890 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
19:03:44.935949 __xstat64(3, "/home/ranma/mail", 0xbfef7914) = 0
[...]
19:03:45.966056 blkid_probe_all_new(0x806ba80, 0x46460470, 0, 0x465c5cc5, 0) = 0
19:04:03.208417 __xstat64(3, "/home/ranma/mail", 0xbfef9074) = 0
[...]
19:04:03.09 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:23.466075 __xstat64(3, "/home", 0xbfef9074) = 0
[I see a pattern]
19:04:24.462181 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:41.805271 __xstat64(3, "/usr", 0xbfef9074) = 0
[This might take a while]
19:04:41.817657 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:58.288851 __xstat64(3, "/var", 0xbfef9074) = 0
[...]
19:05:52.771674 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
19:06:13.230946 __xstat64(3, "/home/ranma/mail", 0xbfef8f84) = 0
[...]
19:06:15.953164 select(1024, 0xbfef925c, 0, 0, 0
[finally idle again!]

So a single mount request caused rpc.mountd to spin at 100% CPU for
about 3 minutes.
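
For illustration, the sequence above corresponds roughly to the
following libblkid calls strung together in C (a sketch reconstructed
from the trace, not rpc.mountd's actual code; the helper name and
compile line are made up):

/* Sketch only: mirrors the traced call sequence, not mountd's real code.
 * Compile with something like: gcc -Wall -o pathuuid pathuuid.c -lblkid */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <blkid/blkid.h>

static void print_tags_for_path(blkid_cache cache, const char *path)
{
    struct stat st;
    char *devname;
    blkid_dev dev;
    blkid_tag_iterate iter;
    const char *type, *value;

    blkid_probe_all_new(cache);          /* the ~20 s step in the trace above */
    if (stat(path, &st) < 0)             /* __xstat64 in the ltrace output */
        return;
    devname = blkid_devno_to_devname(st.st_dev);
    if (devname == NULL)
        return;
    dev = blkid_get_dev(cache, devname, BLKID_DEV_NORMAL);
    free(devname);
    if (dev == NULL)
        return;
    iter = blkid_tag_iterate_begin(dev);
    while (blkid_tag_next(iter, &type, &value) == 0)
        printf("%s=%s\n", type, value);  /* e.g. UUID=..., TYPE=... */
    blkid_tag_iterate_end(iter);
}

int main(void)
{
    blkid_cache cache;

    if (blkid_get_cache(&cache, NULL) < 0)
        return 1;
    print_tags_for_path(cache, "/home/ranma/mail");
    blkid_put_cache(cache);
    return 0;
}

If mountd really does run blkid_probe_all_new for every lookup, that
would match the repeated ~20-second gaps before each __xstat64 above.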

blkid_probe_all_new seems to scan /proc/partitions and then do some
probing on the devices?

This is my /proc/partitions:
|major minor  #blocks  name
|
|   8     0  160086528 sda
|   8     1     497983 sda1
|   8     2    1461915 sda2
|   8     3    3911827 sda3
|   8     4  150416595 sda4
|   8    16  156290904 sdb
|   8    17     497983 sdb1
|   8    18    1461915 sdb2
|   8    19    3911827 sdb3
|   8    20  150416595 sdb4
|   8    32  156290904 sdc
|   8    33     497983 sdc1
|   8    34    1461915 sdc2
|   8    35    3911827 sdc3
|   8    36  150416595 sdc4
|   9     0     497856 md0
|   9     3  300833024 md3
|   9     2    3911744 md2
|   9     1    1461824 md1
| 253     0    8388608 dm-0
| 253     1    8388608 dm-1
| 253     2  146800640 dm-2
| 253     3    1048576 dm-3
| 253     4   62914560 dm-4
| 253     5   41943040 dm-5
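
In case it ever gets slow again, a minimal standalone test along
these lines can time blkid_probe_all_new against these devices
outside of rpc.mountd (a sketch; the file name and compile line are
just examples, and with an empty or missing cache file it will probe
every device listed above):

/* Sketch: time blkid_probe_all_new() outside rpc.mountd.
 * Compile with something like: gcc -Wall -o blkidtime blkidtime.c -lblkid */
#include <stdio.h>
#include <sys/time.h>
#include <blkid/blkid.h>

int main(void)
{
    blkid_cache cache;
    struct timeval t0, t1;

    if (blkid_get_cache(&cache, NULL) < 0) {   /* NULL = default cache file */
        fprintf(stderr, "blkid_get_cache failed\n");
        return 1;
    }
    gettimeofday(&t0, NULL);
    blkid_probe_all_new(cache);                /* probe devices not yet in the cache */
    gettimeofday(&t1, NULL);
    printf("blkid_probe_all_new: %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    blkid_put_cache(cache);
    return 0;
}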

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid 1.39+1.40-WIP-2006.11.14+dfsg-2 block device id library
ii  libc62.5-7   GNU C Library: Shared libraries
ii  libcomer 1.39+1.40-WIP-2006.11.14+dfsg-2 common error description library
ii  libgssap 0.10-4  A mechanism-switch gssapi library
ii  libkrb53 1.4.4-8 MIT Kerberos runtime libraries
ii  libnfsid 0.18-0  An nfs idmapping library
ii  librpcse 0.14-2