Re: some [big] changes to ZPL (ZFS<->VFS)
On r303834 I can no longer reproduce the problem.

-Alan

On Sat, Aug 6, 2016 at 5:05 AM, Andriy Gapon wrote:
> On 05/08/2016 23:31, Alan Somers wrote:
>> I'm not certain it's related, but on a head build at r303767 I see a
>> LOR and a reproducible panic that involve the snapdir code.
>
> Alan,
>
> thank you very much for the clear report and for the very easy
> reproduction scenario.  I am not sure how I missed this simple and
> severe bug.  Please try r303791, it should fix the problem.
>
> I believe that the LOR is not new and has been there since we started
> using distinct tags for .zfs special vnodes.
>
>> First, the LOR:
>> $ zpool destroy foo
>>
>> lock order reversal:
>>  1st 0xf800404c8b78 zfs (zfs) @
>>  /usr/home/alans/freebsd/head/sys/kern/vfs_mount.c:1244
>>  2nd 0xf800404c85f0 zfs_gfs (zfs_gfs) @
>>  /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/gfs.c:484
>> stack backtrace:
>> #0 0x80aa90b0 at witness_debugger+0x70
>> #1 0x80aa8fa4 at witness_checkorder+0xe54
>> #2 0x80a22072 at __lockmgr_args+0x4c2
>> #3 0x80af8e7c at vop_stdlock+0x3c
>> #4 0x81018880 at VOP_LOCK1_APV+0xe0
>> #5 0x80b19f2a at _vn_lock+0x9a
>> #6 0x821b9c53 at gfs_file_create+0x73
>> #7 0x821b9cfd at gfs_dir_create+0x1d
>> #8 0x8228aa07 at zfsctl_mknode_snapdir+0x47
>> #9 0x821ba1a5 at gfs_dir_lookup+0x185
>> #10 0x821ba68d at gfs_vop_lookup+0x1d
>> #11 0x82289a42 at zfsctl_root_lookup+0xf2
>> #12 0x8228a8c3 at zfsctl_umount_snapshots+0x83
>> #13 0x822a1d2b at zfs_umount+0x7b
>> #14 0x80b02a14 at dounmount+0x6f4
>> #15 0x80b0228d at sys_unmount+0x35d
>> #16 0x80ebbb7b at amd64_syscall+0x2db
>> #17 0x80e9b72b at Xfast_syscall+0xfb
>>
>> Here's the panic:
>> $ zpool create testpool da0
>> $ touch /testpool/testfile
>> $ zfs snapshot testpool@testsnap
>> $ cd /testpool/.zfs/snapshots
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address = 0x8
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0x80b19f1c
>> stack pointer = 0x28:0xfe0b54bf7430
>> frame pointer = 0x28:0xfe0b54bf74a0
>> code segment = base 0x0, limit 0xf, type 0x1b
>>              = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 966 (bash)
>> trap number = 12
>> panic: page fault
>> cpuid = 2
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0b54bf6fc0
>> vpanic() at vpanic+0x182/frame 0xfe0b54bf7040
>> panic() at panic+0x43/frame 0xfe0b54bf70a0
>> trap_fatal() at trap_fatal+0x351/frame 0xfe0b54bf7100
>> trap_pfault() at trap_pfault+0x1fd/frame 0xfe0b54bf7160
>> trap() at trap+0x284/frame 0xfe0b54bf7370
>> calltrap() at calltrap+0x8/frame 0xfe0b54bf7370
>> --- trap 0xc, rip = 0x80b19f1c, rsp = 0xfe0b54bf7440, rbp
>> = 0xfe0b54bf74a0 ---
>> _vn_lock() at _vn_lock+0x8c/frame 0xfe0b54bf74a0
>> zfs_lookup() at zfs_lookup+0x50d/frame 0xfe0b54bf7540
>> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x91/frame 0xfe0b54bf7680
>> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0xda/frame 0xfe0b54bf76b0
>> vfs_cache_lookup() at vfs_cache_lookup+0xd6/frame 0xfe0b54bf7710
>> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xda/frame 0xfe0b54bf7740
>> lookup() at lookup+0x5a2/frame 0xfe0b54bf77d0
>> namei() at namei+0x5b2/frame 0xfe0b54bf7890
>> kern_statat() at kern_statat+0xa8/frame 0xfe0b54bf7a40
>> sys_stat() at sys_stat+0x2d/frame 0xfe0b54bf7ae0
>> amd64_syscall() at amd64_syscall+0x2db/frame 0xfe0b54bf7bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe0b54bf7bf0
>>
>> I can provide core files, test scripts, whatever you need.  Thanks for
>> tackling this difficult problem.
>>
>> -Alan
>>
>> On Fri, Aug 5, 2016 at 12:36 AM, Andriy Gapon wrote:
>>> On 03/08/2016 17:25, Andriy Gapon wrote:
>>>> Another change that was not strictly required and which is probably too
>>>> intrusive is killing the support for case insensitive operations.  My
>>>> thinking was that the FreeBSD VFS does not provide support for those
>>>> anyway.  But I'll probably restore the code, at least in the bottom
>>>> half of the ZPL, before committing the change.
>>>
>>> It turned out that most of the removed code was dead anyway and it took
>>> just a few lines of code to restore support for case-insensitive
>>> filesystems.  Filesystems with mixed case sensitivity behave exactly the
>>> same as case-sensitive filesystems, as has always been the case on FreeBSD.
>>>
>>> Anyway, the big change has just been committed:
>>> https://svnweb.freebsd.org/changeset/base/303763
>>> Please test away.
>>>
>>> Another note is that the filesystem name cache is now disabled for case
>>> insensitive filesystems and filesystems with normalization other than
>>> none.  That may hurt the lookup performance, but should ensure
>>> correctness of operations.
Re: some [big] changes to ZPL (ZFS<->VFS)
On 05/08/2016 23:31, Alan Somers wrote:
> I'm not certain it's related, but on a head build at r303767 I see a
> LOR and a reproducible panic that involve the snapdir code.

Alan,

thank you very much for the clear report and for the very easy
reproduction scenario.  I am not sure how I missed this simple and
severe bug.  Please try r303791, it should fix the problem.

I believe that the LOR is not new and has been there since we started
using distinct tags for .zfs special vnodes.

> First, the LOR:
> $ zpool destroy foo
>
> lock order reversal:
>  1st 0xf800404c8b78 zfs (zfs) @
>  /usr/home/alans/freebsd/head/sys/kern/vfs_mount.c:1244
>  2nd 0xf800404c85f0 zfs_gfs (zfs_gfs) @
>  /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/gfs.c:484
> stack backtrace:
> #0 0x80aa90b0 at witness_debugger+0x70
> #1 0x80aa8fa4 at witness_checkorder+0xe54
> #2 0x80a22072 at __lockmgr_args+0x4c2
> #3 0x80af8e7c at vop_stdlock+0x3c
> #4 0x81018880 at VOP_LOCK1_APV+0xe0
> #5 0x80b19f2a at _vn_lock+0x9a
> #6 0x821b9c53 at gfs_file_create+0x73
> #7 0x821b9cfd at gfs_dir_create+0x1d
> #8 0x8228aa07 at zfsctl_mknode_snapdir+0x47
> #9 0x821ba1a5 at gfs_dir_lookup+0x185
> #10 0x821ba68d at gfs_vop_lookup+0x1d
> #11 0x82289a42 at zfsctl_root_lookup+0xf2
> #12 0x8228a8c3 at zfsctl_umount_snapshots+0x83
> #13 0x822a1d2b at zfs_umount+0x7b
> #14 0x80b02a14 at dounmount+0x6f4
> #15 0x80b0228d at sys_unmount+0x35d
> #16 0x80ebbb7b at amd64_syscall+0x2db
> #17 0x80e9b72b at Xfast_syscall+0xfb
>
> Here's the panic:
> $ zpool create testpool da0
> $ touch /testpool/testfile
> $ zfs snapshot testpool@testsnap
> $ cd /testpool/.zfs/snapshots
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address = 0x8
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x80b19f1c
> stack pointer = 0x28:0xfe0b54bf7430
> frame pointer = 0x28:0xfe0b54bf74a0
> code segment = base 0x0, limit 0xf, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 966 (bash)
> trap number = 12
> panic: page fault
> cpuid = 2
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0b54bf6fc0
> vpanic() at vpanic+0x182/frame 0xfe0b54bf7040
> panic() at panic+0x43/frame 0xfe0b54bf70a0
> trap_fatal() at trap_fatal+0x351/frame 0xfe0b54bf7100
> trap_pfault() at trap_pfault+0x1fd/frame 0xfe0b54bf7160
> trap() at trap+0x284/frame 0xfe0b54bf7370
> calltrap() at calltrap+0x8/frame 0xfe0b54bf7370
> --- trap 0xc, rip = 0x80b19f1c, rsp = 0xfe0b54bf7440, rbp
> = 0xfe0b54bf74a0 ---
> _vn_lock() at _vn_lock+0x8c/frame 0xfe0b54bf74a0
> zfs_lookup() at zfs_lookup+0x50d/frame 0xfe0b54bf7540
> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x91/frame 0xfe0b54bf7680
> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0xda/frame 0xfe0b54bf76b0
> vfs_cache_lookup() at vfs_cache_lookup+0xd6/frame 0xfe0b54bf7710
> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xda/frame 0xfe0b54bf7740
> lookup() at lookup+0x5a2/frame 0xfe0b54bf77d0
> namei() at namei+0x5b2/frame 0xfe0b54bf7890
> kern_statat() at kern_statat+0xa8/frame 0xfe0b54bf7a40
> sys_stat() at sys_stat+0x2d/frame 0xfe0b54bf7ae0
> amd64_syscall() at amd64_syscall+0x2db/frame 0xfe0b54bf7bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe0b54bf7bf0
>
> I can provide core files, test scripts, whatever you need.  Thanks for
> tackling this difficult problem.
>
> -Alan
>
> On Fri, Aug 5, 2016 at 12:36 AM, Andriy Gapon wrote:
>> On 03/08/2016 17:25, Andriy Gapon wrote:
>>> Another change that was not strictly required and which is probably too
>>> intrusive is killing the support for case insensitive operations.  My
>>> thinking was that the FreeBSD VFS does not provide support for those
>>> anyway.  But I'll probably restore the code, at least in the bottom
>>> half of the ZPL, before committing the change.
>>
>> It turned out that most of the removed code was dead anyway and it took
>> just a few lines of code to restore support for case-insensitive
>> filesystems.  Filesystems with mixed case sensitivity behave exactly the
>> same as case-sensitive filesystems, as has always been the case on FreeBSD.
>>
>> Anyway, the big change has just been committed:
>> https://svnweb.freebsd.org/changeset/base/303763
>> Please test away.
>>
>> Another note is that the filesystem name cache is now disabled for case
>> insensitive filesystems and filesystems with normalization other than
>> none.  That may hurt the lookup performance, but should ensure
>> correctness of operations.
>>
>> --
>> Andriy Gapon
Re: some [big] changes to ZPL (ZFS<->VFS)
I'm not certain it's related, but on a head build at r303767 I see a
LOR and a reproducible panic that involve the snapdir code.

First, the LOR:
$ zpool destroy foo

lock order reversal:
 1st 0xf800404c8b78 zfs (zfs) @
 /usr/home/alans/freebsd/head/sys/kern/vfs_mount.c:1244
 2nd 0xf800404c85f0 zfs_gfs (zfs_gfs) @
 /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/gfs.c:484
stack backtrace:
#0 0x80aa90b0 at witness_debugger+0x70
#1 0x80aa8fa4 at witness_checkorder+0xe54
#2 0x80a22072 at __lockmgr_args+0x4c2
#3 0x80af8e7c at vop_stdlock+0x3c
#4 0x81018880 at VOP_LOCK1_APV+0xe0
#5 0x80b19f2a at _vn_lock+0x9a
#6 0x821b9c53 at gfs_file_create+0x73
#7 0x821b9cfd at gfs_dir_create+0x1d
#8 0x8228aa07 at zfsctl_mknode_snapdir+0x47
#9 0x821ba1a5 at gfs_dir_lookup+0x185
#10 0x821ba68d at gfs_vop_lookup+0x1d
#11 0x82289a42 at zfsctl_root_lookup+0xf2
#12 0x8228a8c3 at zfsctl_umount_snapshots+0x83
#13 0x822a1d2b at zfs_umount+0x7b
#14 0x80b02a14 at dounmount+0x6f4
#15 0x80b0228d at sys_unmount+0x35d
#16 0x80ebbb7b at amd64_syscall+0x2db
#17 0x80e9b72b at Xfast_syscall+0xfb

Here's the panic:
$ zpool create testpool da0
$ touch /testpool/testfile
$ zfs snapshot testpool@testsnap
$ cd /testpool/.zfs/snapshots

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80b19f1c
stack pointer = 0x28:0xfe0b54bf7430
frame pointer = 0x28:0xfe0b54bf74a0
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 966 (bash)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0b54bf6fc0
vpanic() at vpanic+0x182/frame 0xfe0b54bf7040
panic() at panic+0x43/frame 0xfe0b54bf70a0
trap_fatal() at trap_fatal+0x351/frame 0xfe0b54bf7100
trap_pfault() at trap_pfault+0x1fd/frame 0xfe0b54bf7160
trap() at trap+0x284/frame 0xfe0b54bf7370
calltrap() at calltrap+0x8/frame 0xfe0b54bf7370
--- trap 0xc, rip = 0x80b19f1c, rsp = 0xfe0b54bf7440, rbp
= 0xfe0b54bf74a0 ---
_vn_lock() at _vn_lock+0x8c/frame 0xfe0b54bf74a0
zfs_lookup() at zfs_lookup+0x50d/frame 0xfe0b54bf7540
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x91/frame 0xfe0b54bf7680
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0xda/frame 0xfe0b54bf76b0
vfs_cache_lookup() at vfs_cache_lookup+0xd6/frame 0xfe0b54bf7710
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xda/frame 0xfe0b54bf7740
lookup() at lookup+0x5a2/frame 0xfe0b54bf77d0
namei() at namei+0x5b2/frame 0xfe0b54bf7890
kern_statat() at kern_statat+0xa8/frame 0xfe0b54bf7a40
sys_stat() at sys_stat+0x2d/frame 0xfe0b54bf7ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfe0b54bf7bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe0b54bf7bf0

I can provide core files, test scripts, whatever you need.  Thanks for
tackling this difficult problem.

-Alan

On Fri, Aug 5, 2016 at 12:36 AM, Andriy Gapon wrote:
> On 03/08/2016 17:25, Andriy Gapon wrote:
>> Another change that was not strictly required and which is probably too
>> intrusive is killing the support for case insensitive operations.  My
>> thinking was that the FreeBSD VFS does not provide support for those
>> anyway.  But I'll probably restore the code, at least in the bottom
>> half of the ZPL, before committing the change.
>
> It turned out that most of the removed code was dead anyway and it took
> just a few lines of code to restore support for case-insensitive
> filesystems.  Filesystems with mixed case sensitivity behave exactly the
> same as case-sensitive filesystems, as has always been the case on FreeBSD.
>
> Anyway, the big change has just been committed:
> https://svnweb.freebsd.org/changeset/base/303763
> Please test away.
>
> Another note is that the filesystem name cache is now disabled for case
> insensitive filesystems and filesystems with normalization other than
> none.  That may hurt the lookup performance, but should ensure
> correctness of operations.
>
> --
> Andriy Gapon

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: some [big] changes to ZPL (ZFS<->VFS)
On 03/08/2016 17:25, Andriy Gapon wrote:
> Another change that was not strictly required and which is probably too
> intrusive is killing the support for case insensitive operations.  My
> thinking was that the FreeBSD VFS does not provide support for those
> anyway.  But I'll probably restore the code, at least in the bottom
> half of the ZPL, before committing the change.

It turned out that most of the removed code was dead anyway and it took
just a few lines of code to restore support for case-insensitive
filesystems.  Filesystems with mixed case sensitivity behave exactly the
same as case-sensitive filesystems, as has always been the case on FreeBSD.

Anyway, the big change has just been committed:
https://svnweb.freebsd.org/changeset/base/303763
Please test away.

Another note is that the filesystem name cache is now disabled for case
insensitive filesystems and filesystems with normalization other than
none.  That may hurt the lookup performance, but should ensure
correctness of operations.

--
Andriy Gapon
some [big] changes to ZPL (ZFS<->VFS)
I would like to get more comments about the following review request:
https://reviews.freebsd.org/D6533
Both about the code changes and about the general direction of the changes.

I've been asked to try to get this change into 11.0 because of
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209158
So, there is some urgency to my request.  To be honest, I never wanted to
rush this change in, but I have probably been sitting on it for too long.

The ZFS POSIX Layer was originally written for the Solaris VFS, which is
very different from the FreeBSD VFS.  Many things that the FreeBSD VFS
manages on behalf of all filesystems are also (re-)implemented in the ZPL
in a different way.  As a result, the ZPL contains code that is redundant
on FreeBSD, duplicates VFS functionality, or, in the worst cases,
interacts badly with the VFS.

I am talking mostly about the operations that work on a filesystem's
namespace rather than on the contents and properties of files.  These
operations typically work on two or more vnodes, which are usually
locked, or expected to be locked by the filesystem, in a particular way.
The ZPL, on the other hand, has quite an elaborate locking system of its
own.  It includes a lock that protects a znode, a lock that protects the
znode's notion of its parent, and locks that protect the name-to-znode
relationships for the znode's children.  So we have two competing locking
systems for the filesystem nodes, and they are not aware of each other.

The most prominent problem is a deadlock caused by a lock order reversal
of vnode locks that may happen with concurrent zfs_rename() and lookup().
The deadlock is a result of zfs_rename() not observing the vnode locking
contract expected by the VFS.  I should mention here that VOP_RENAME is
the weirdest method with respect to locking requirements: every
filesystem implementation is expected to perform quite an elaborate
locking "dance" in VOP_RENAME.
All other operations, by contrast, have the necessary locking done by the
VFS before it calls into the filesystem code.

So, I started by fixing zfs_rename(), but that required making some
changes to the "bottom half" of the ZPL (the "upper half" being the code
that implements the vnode and VFS operations).  And that required making
changes in other VOPs anyway.  So the next step was to remove the
redundant re-lookup of child nodes in methods like zfs_rmdir() and
zfs_remove().  Before I could stop, I had removed all of the ZPL-internal
locking that was supposed to protect the parent-child relationships of
filesystem nodes.  Those relationships are now protected exclusively by
the vnode locks, and the code has been changed to take advantage of that
fact and to properly interact with the VFS.  As a minor detail, the
removal of the internal locking allowed all ZPL dmu_tx_assign calls to
use the TXG_WAIT mode.

Another change that was not strictly required, and which is probably too
intrusive, is killing the support for case insensitive operations.  My
thinking was that the FreeBSD VFS does not provide support for those
anyway.  But I'll probably restore the code, at least in the bottom half
of the ZPL, before committing the change.

On a more general level, I decided that the upper half of the ZPL would
necessarily be quite different between different VFS models, so it does
not make much sense to keep huge chunks of ifdef-ed out code (or code
that is compiled in but never reached) that is not useful on FreeBSD at
all.  I think that keeping that code makes it harder to maintain the
FreeBSD code and to _correctly_ merge upstream changes.  E.g., if a
change can be merged without conflicts into an ifdef-ed out block, that
does not mean that the merge is correct.

Last, but not least, the work was sponsored by HybridCluster / ClusterHQ
Inc back in 2013.
--
Andriy Gapon