Re: [autofs] BUG() in shrink_dcache_for_umount_subtree on nfs4 mount

2011-05-29 Thread Mark Moseley
On Thu, May 26, 2011 at 8:02 AM, Ian Kent ra...@themaw.net wrote:
 On Thu, 2011-05-26 at 09:49 -0400, Jeff Layton wrote:
 On Wed, 25 May 2011 16:08:15 -0400
 Jeff Layton jlay...@redhat.com wrote:

  On Wed, 27 Apr 2011 16:23:07 -0700
  Mark Moseley moseleym...@gmail.com wrote:
 
  
   I posted this to bugzilla a while back but I figured I'd paste it here too:
  
   -
  
   I've been getting bit by the exact same bug and been bisecting for the past
   couple of weeks. It's slow going as it can sometimes take a day for the BUG()
   to show up (though can also at times take 10 minutes). And I've also seen it
   more than once where something was good after a day and then BUG()'d later on,
   just to make things more complicated. So the upshot is that while I feel
   confident enough about this latest batch of bisecting to post it here, I
   wouldn't bet my life on it. I hope this isn't a case where bisecting just shows
   where the bug gets exposed but not where it actually got planted :)
  
   Incidentally, I tried the patch from the top of this thread and it didn't seem
   to make a difference. I still got bit.
  
   I've posted on the linux-fsdevel thread that Jeff Layton started about it,
   http://www.spinics.net/lists/linux-nfs/msg20280.html if you need more details
   on my setup (though I'll be happy to provide anything else you need). Though in
   that thread you'll see that I'm not using autofs explicitly, the Netapp GX
   cluster NFS appears to use autofs to do the implicit submounts (I'm not 100%
   sure that's the correct terminology, so hopefully you know what I mean).
  
   Here's my bisect log, ending up at commit
   e61da20a50d21725ff27571a6dff9468e4fb7146
  
   git bisect start 'fs'
   # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
   git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
   # bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
   git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
   # good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
   git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
   # good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
   git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
   # bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
   git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
   # good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
   git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
   # bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
   git bisect bad 6651149371b842715906311b4631b8489cebf7e8
   # good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
   git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
   # good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
   git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
   # good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
   git bisect good b5b801779d59165c4ecf1009009109545bd1f642
   # bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
   git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
   # good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
   git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
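
[Editorial aside, not part of the original message.] A bisect log like the one above can be checked mechanically. The following is only a minimal sketch, assuming the log has been saved with each "# good:" / "# bad:" comment on a single line, as `git bisect log` writes it. The helper names `parse_bisect_log` and `last_bad` are hypothetical, and the sample is just the last three entries of the log quoted above.

```python
import re

# Matches the comment lines written by `git bisect log`, for example:
#   # bad: [e61da20a...] autofs4: Clean up inode operations
ENTRY = re.compile(r"^#\s*(good|bad):\s*\[([0-9a-f]{40})\]\s*(.*)$")


def parse_bisect_log(text):
    """Return a list of (verdict, sha, subject) tuples in log order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append((match.group(1), match.group(2), match.group(3)))
    return entries


def last_bad(entries):
    """Return the most recently marked bad entry, or None if there is none."""
    bad = [entry for entry in entries if entry[0] == "bad"]
    return bad[-1] if bad else None


# Excerpt (last three entries) of the bisect log quoted in this thread.
SAMPLE = """\
# good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
git bisect good b5b801779d59165c4ecf1009009109545bd1f642
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
"""

if __name__ == "__main__":
    verdict, sha, subject = last_bad(parse_bisect_log(SAMPLE))
    print(verdict, sha, subject)
```

The last commit marked bad is only a candidate for the culprit. As the message itself cautions, an intermittent BUG() can cause a commit to be marked good by mistake. A saved log can also be re-run in a kernel tree with `git bisect replay <logfile>`.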
 
  I can more or less reproduce this at will now, I think even with very
  few NFS operations on an automounted nfsv4 mount. Here's an oops from a
  2.6.39 kernel:
 
  [  119.419789] tun0: Features changed: 0x4800 - 0x4000
  [  178.242917] FS-Cache: Netfs 'nfs' registered for caching
  [  178.269980] SELinux: initialized (dev 0:2c, type nfs4), uses genfs_contexts
  [  178.282296] SELinux: initialized (dev 0:2d, type nfs4), uses genfs_contexts
  [  523.953284] BUG: Dentry 8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
  [  523.953306] [ cut here ]
  [  523.954013] kernel BUG at fs/dcache.c:925!
  [  523.954013] invalid opcode:  [#1] SMP
  [  523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
  [  523.954013] CPU 1
  [  523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse 
  ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache 
  cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs 
  snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec 
  snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi 

Re: [autofs] BUG() in shrink_dcache_for_umount_subtree on nfs4 mount

2011-05-26 Thread Jeff Layton
On Wed, 25 May 2011 16:08:15 -0400
Jeff Layton jlay...@redhat.com wrote:

 On Wed, 27 Apr 2011 16:23:07 -0700
 Mark Moseley moseleym...@gmail.com wrote:
 
  
  I posted this to bugzilla a while back but I figured I'd paste it here too:
  
  -
  
  I've been getting bit by the exact same bug and been bisecting for the past
  couple of weeks. It's slow going as it can sometimes take a day for the BUG()
  to show up (though can also at times take 10 minutes). And I've also seen it
  more than once where something was good after a day and then BUG()'d later on,
  just to make things more complicated. So the upshot is that while I feel
  confident enough about this latest batch of bisecting to post it here, I
  wouldn't bet my life on it. I hope this isn't a case where bisecting just shows
  where the bug gets exposed but not where it actually got planted :)
  
  Incidentally, I tried the patch from the top of this thread and it didn't seem
  to make a difference. I still got bit.
  
  I've posted on the linux-fsdevel thread that Jeff Layton started about it,
  http://www.spinics.net/lists/linux-nfs/msg20280.html if you need more details
  on my setup (though I'll be happy to provide anything else you need). Though in
  that thread you'll see that I'm not using autofs explicitly, the Netapp GX
  cluster NFS appears to use autofs to do the implicit submounts (I'm not 100%
  sure that's the correct terminology, so hopefully you know what I mean).
  
  Here's my bisect log, ending up at commit
  e61da20a50d21725ff27571a6dff9468e4fb7146
  
  git bisect start 'fs'
  # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
  git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
  # bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
  git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
  # good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
  git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
  # good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
  git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
  # bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
  git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
  # good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
  git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
  # bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
  git bisect bad 6651149371b842715906311b4631b8489cebf7e8
  # good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
  git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
  # good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
  git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
  # good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
  git bisect good b5b801779d59165c4ecf1009009109545bd1f642
  # bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
  git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
  # good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
  git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
 
 I can more or less reproduce this at will now, I think even with very
 few NFS operations on an automounted nfsv4 mount. Here's an oops from a
 2.6.39 kernel:
 
 [  119.419789] tun0: Features changed: 0x4800 - 0x4000
 [  178.242917] FS-Cache: Netfs 'nfs' registered for caching
 [  178.269980] SELinux: initialized (dev 0:2c, type nfs4), uses genfs_contexts
 [  178.282296] SELinux: initialized (dev 0:2d, type nfs4), uses genfs_contexts
 [  523.953284] BUG: Dentry 8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
 [  523.953306] [ cut here ]
 [  523.954013] kernel BUG at fs/dcache.c:925!
 [  523.954013] invalid opcode:  [#1] SMP 
 [  523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
 [  523.954013] CPU 1 
 [  523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse 
 ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache 
 cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs 
 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec 
 snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi snd_seq_device 
 snd_pcm snd_timer snd uvcvideo ppdev videodev soundcore media sp5100_tco 
 v4l2_compat_ioctl32 edac_core parport_pc snd_page_alloc i2c_piix4 
 edac_mce_amd k10temp parport wmi r8169 microcode mii virtio_net kvm_amd kvm 
 ipv6 ata_generic 

Re: [autofs] BUG() in shrink_dcache_for_umount_subtree on nfs4 mount

2011-05-26 Thread Ian Kent
On Thu, 2011-05-26 at 09:49 -0400, Jeff Layton wrote:
 On Wed, 25 May 2011 16:08:15 -0400
 Jeff Layton jlay...@redhat.com wrote:
 
  On Wed, 27 Apr 2011 16:23:07 -0700
  Mark Moseley moseleym...@gmail.com wrote:
  
   
   I posted this to bugzilla a while back but I figured I'd paste it here too:
   
   -
   
   I've been getting bit by the exact same bug and been bisecting for the past
   couple of weeks. It's slow going as it can sometimes take a day for the BUG()
   to show up (though can also at times take 10 minutes). And I've also seen it
   more than once where something was good after a day and then BUG()'d later on,
   just to make things more complicated. So the upshot is that while I feel
   confident enough about this latest batch of bisecting to post it here, I
   wouldn't bet my life on it. I hope this isn't a case where bisecting just shows
   where the bug gets exposed but not where it actually got planted :)
   
   Incidentally, I tried the patch from the top of this thread and it didn't seem
   to make a difference. I still got bit.
   
   I've posted on the linux-fsdevel thread that Jeff Layton started about it,
   http://www.spinics.net/lists/linux-nfs/msg20280.html if you need more details
   on my setup (though I'll be happy to provide anything else you need). Though in
   that thread you'll see that I'm not using autofs explicitly, the Netapp GX
   cluster NFS appears to use autofs to do the implicit submounts (I'm not 100%
   sure that's the correct terminology, so hopefully you know what I mean).
   
   Here's my bisect log, ending up at commit
   e61da20a50d21725ff27571a6dff9468e4fb7146
   
   git bisect start 'fs'
   # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
   git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
   # bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
   git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
   # good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
   git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
   # good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
   git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
   # bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
   git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
   # good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
   git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
   # bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
   git bisect bad 6651149371b842715906311b4631b8489cebf7e8
   # good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
   git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
   # good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
   git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
   # good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
   git bisect good b5b801779d59165c4ecf1009009109545bd1f642
   # bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
   git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
   # good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
   git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
  
  I can more or less reproduce this at will now, I think even with very
  few NFS operations on an automounted nfsv4 mount. Here's an oops from a
  2.6.39 kernel:
  
  [  119.419789] tun0: Features changed: 0x4800 - 0x4000
  [  178.242917] FS-Cache: Netfs 'nfs' registered for caching
  [  178.269980] SELinux: initialized (dev 0:2c, type nfs4), uses genfs_contexts
  [  178.282296] SELinux: initialized (dev 0:2d, type nfs4), uses genfs_contexts
  [  523.953284] BUG: Dentry 8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
  [  523.953306] [ cut here ]
  [  523.954013] kernel BUG at fs/dcache.c:925!
  [  523.954013] invalid opcode:  [#1] SMP 
  [  523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
  [  523.954013] CPU 1 
  [  523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse 
  ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache 
  cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs 
  snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec 
  snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi snd_seq_device 
  snd_pcm snd_timer snd uvcvideo ppdev videodev soundcore media 

Re: [autofs] BUG() in shrink_dcache_for_umount_subtree on nfs4 mount

2011-05-26 Thread Jeff Layton
On Thu, 26 May 2011 23:02:32 +0800
Ian Kent ra...@themaw.net wrote:

 On Thu, 2011-05-26 at 09:49 -0400, Jeff Layton wrote:
  On Wed, 25 May 2011 16:08:15 -0400
  Jeff Layton jlay...@redhat.com wrote:
  
   On Wed, 27 Apr 2011 16:23:07 -0700
   Mark Moseley moseleym...@gmail.com wrote:
   

I posted this to bugzilla a while back but I figured I'd paste it here too:

-

I've been getting bit by the exact same bug and been bisecting for the past
couple of weeks. It's slow going as it can sometimes take a day for the BUG()
to show up (though can also at times take 10 minutes). And I've also seen it
more than once where something was good after a day and then BUG()'d later on,
just to make things more complicated. So the upshot is that while I feel
confident enough about this latest batch of bisecting to post it here, I
wouldn't bet my life on it. I hope this isn't a case where bisecting just shows
where the bug gets exposed but not where it actually got planted :)

Incidentally, I tried the patch from the top of this thread and it didn't seem
to make a difference. I still got bit.

I've posted on the linux-fsdevel thread that Jeff Layton started about it,
http://www.spinics.net/lists/linux-nfs/msg20280.html if you need more details
on my setup (though I'll be happy to provide anything else you need). Though in
that thread you'll see that I'm not using autofs explicitly, the Netapp GX
cluster NFS appears to use autofs to do the implicit submounts (I'm not 100%
sure that's the correct terminology, so hopefully you know what I mean).

Here's my bisect log, ending up at commit
e61da20a50d21725ff27571a6dff9468e4fb7146

git bisect start 'fs'
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
# good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
# bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
# good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
# bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
git bisect bad 6651149371b842715906311b4631b8489cebf7e8
# good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
# good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
# good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
git bisect good b5b801779d59165c4ecf1009009109545bd1f642
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
   
   I can more or less reproduce this at will now, I think even with very
   few NFS operations on an automounted nfsv4 mount. Here's an oops from a
   2.6.39 kernel:
   
   [  119.419789] tun0: Features changed: 0x4800 - 0x4000
   [  178.242917] FS-Cache: Netfs 'nfs' registered for caching
   [  178.269980] SELinux: initialized (dev 0:2c, type nfs4), uses genfs_contexts
   [  178.282296] SELinux: initialized (dev 0:2d, type nfs4), uses genfs_contexts
   [  523.953284] BUG: Dentry 8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
   [  523.953306] [ cut here ]
   [  523.954013] kernel BUG at fs/dcache.c:925!
   [  523.954013] invalid opcode:  [#1] SMP 
   [  523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
   [  523.954013] CPU 1 
   [  523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse 
   ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache 
   cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs 
   snd_hda_codec_hdmi