Re: [autofs] BUG() in shrink_dcache_for_umount_subtree on nfs4 mount
On Thu, May 26, 2011 at 8:02 AM, Ian Kent ra...@themaw.net wrote:
On Thu, 2011-05-26 at 09:49 -0400, Jeff Layton wrote:
On Wed, 25 May 2011 16:08:15 -0400 Jeff Layton jlay...@redhat.com wrote:
On Wed, 27 Apr 2011 16:23:07 -0700 Mark Moseley moseleym...@gmail.com wrote:

I posted this to bugzilla a while back but I figured I'd paste it here too:

- I've been getting bit by the exact same bug and been bisecting for the past couple of weeks. It's slow going as it can sometimes take a day for the BUG() to show up (though can also at times take 10 minutes). And I've also seen it more than once where something was good after a day and then BUG()'d later on, just to make things more complicated. So the upshot is that while I feel confident enough about this latest batch of bisecting to post it here, I wouldn't bet my life on it. I hope this isn't a case where bisecting just shows where the bug gets exposed but not where it actually got planted :)

Incidentally, I tried the patch from the top of this thread and it didn't seem to make a difference. I still got bit.

I've posted on the linux-fsdevel thread that Jeff Layton started about it, http://www.spinics.net/lists/linux-nfs/msg20280.html if you need more details on my setup (though I'll be happy to provide anything else you need). Though in that thread you'll see that I'm not using autofs explicitly, the Netapp GX cluster NFS appears to use autofs to do the implicit submounts (I'm not 100% sure that's the correct terminology, so hopefully you know what I mean).
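The worry above, that a kernel judged good after a day may still BUG() later, can be put in rough numbers. The sketch below is only an illustration and not something from this thread: it assumes each "good" verdict is independently wrong with some fixed probability, and the function name and the example rates are made up for the purpose.

```python
def chance_all_good_verdicts_hold(p_false_good: float, n_good_steps: int) -> float:
    """Probability that every 'good' verdict in a bisect run was correct.

    Hypothetical model: each 'good' verdict is independently wrong with
    probability p_false_good (the bug simply did not fire during the test
    window). 'bad' verdicts are treated as always right, since a BUG()
    that fired is unambiguous.
    """
    if not 0.0 <= p_false_good <= 1.0:
        raise ValueError("p_false_good must be between 0 and 1")
    if n_good_steps < 0:
        raise ValueError("n_good_steps must not be negative")
    return (1.0 - p_false_good) ** n_good_steps


if __name__ == "__main__":
    # Example rates are invented; 7 is an arbitrary number of 'good' steps.
    for p in (0.05, 0.10, 0.20):
        chance = chance_all_good_verdicts_hold(p, 7)
        print(f"false-good rate {p:.0%}: all 7 verdicts right with probability {chance:.2f}")
```

Under this model even a modest false-good rate erodes confidence quickly over several steps, which is one reason to re-test the commits marked good around the suspected culprit for longer than the others.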
Here's my bisect log, ending up at commit e61da20a50d21725ff27571a6dff9468e4fb7146

git bisect start 'fs'
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
# good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
# bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
# good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
# bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
git bisect bad 6651149371b842715906311b4631b8489cebf7e8
# good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
# good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
# good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
git bisect good b5b801779d59165c4ecf1009009109545bd1f642
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606

I can more or less reproduce this at will now, I think even with very few NFS operations on an automounted nfsv4 mount. Here's an oops from a 2.6.39 kernel:

[ 119.419789] tun0: Features changed: 0x4800 - 0x4000
[ 178.242917] FS-Cache: Netfs 'nfs' registered for caching
[ 178.269980] SELinux: initialized (dev 0:2c, type nfs4), uses genfs_contexts
[ 178.282296] SELinux: initialized (dev 0:2d, type nfs4), uses genfs_contexts
[ 523.953284] BUG: Dentry 8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
[ 523.953306] [ cut here ]
[ 523.954013] kernel BUG at fs/dcache.c:925!
[ 523.954013] invalid opcode: [#1] SMP
[ 523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
[ 523.954013] CPU 1
[ 523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi
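For anyone wanting to double-check where the bisect log quoted in this message ends up, here is a minimal sketch that pulls the verdicts out of a saved log. It is not part of the original report: the function name is hypothetical, the embedded log is only an excerpt of the one above, and it relies on the assumption that in a bisect run that finished, the most recently marked bad commit is the first-bad candidate.

```python
import re

# Matches the command lines of a bisect log, e.g. "git bisect good <sha>".
# The "# good: [...]" comment lines and "git bisect start" do not match.
VERDICT = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})\s*$", re.MULTILINE)


def summarize_bisect_log(text):
    """Return (verdicts, last_bad) for a bisect log.

    verdicts is a list of (verdict, sha) tuples in log order.
    last_bad is the sha most recently marked bad, or None if there is none.
    """
    verdicts = VERDICT.findall(text)
    last_bad = None
    for verdict, sha in verdicts:
        if verdict == "bad":
            last_bad = sha
    return verdicts, last_bad


# Excerpt of the log quoted above (first two and last two verdicts only).
LOG_EXCERPT = """\
git bisect start 'fs'
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
"""

if __name__ == "__main__":
    verdicts, last_bad = summarize_bisect_log(LOG_EXCERPT)
    print("verdicts recorded:", len(verdicts))
    print("last commit marked bad:", last_bad)
```

To repeat the run itself rather than just read the log, the saved log can be fed to `git bisect replay` in a kernel tree.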