Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On Mon 13-08-18 12:12:52, Jan Kara wrote:
> On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> > The generic/344 failure seems to be caused by a WARNING triggered in
> > the nvdimm code:
>
> OK, apparently this is nothing new for you as generic/344 fails for you
> even with 4.17. But it should not :). I'll try to see if I can reproduce
> this in my test setup during more test runs (I don't remember seeing it
> during occasional runs I do) and debug it further.

I could reproduce this relatively easily, but it took me quite a while to
figure out how best to fix it. I'll send the fix shortly (mm: Fix warning
in insert_pfn()).

								Honza
-- 
Jan Kara
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On Mon, Aug 13, 2018 at 12:12:52PM +0200, Jan Kara wrote:
> > The generic/081 regression appears to be a device-mapper issue...
>
> I'll see if this reproduces for me. Doesn't seem to be related to the DAX
> patches you carry though.

It does seem to be a DAX-specific failure though.

> > The generic/344 failure seems to be caused by a WARNING triggered in
> > the nvdimm code:
>
> OK, apparently this is nothing new for you as generic/344 fails for you
> even with 4.17. But it should not :). I'll try to see if I can reproduce
> this in my test setup during more test runs (I don't remember seeing it
> during occasional runs I do) and debug it further.

Thanks! In case it wasn't clear, I wasn't planning on letting these
failures prevent the patches from going upstream. As you say, the
generic/081 failure looks unrelated to ext4, and the generic/344 failure
isn't a regression.

					- Ted
Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> > I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> > CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> > concerning. So assuming nothing surprising pops up, yes it should be
> > merged at the next merge window.
>
> ... and here are the results. The first is 4.17, and the second is
> the ext4 git tree:
>
> ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
>   Failures: ext4/033 generic/344 generic/491 generic/503
>
> ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
>   Failures: generic/081 generic/344 generic/388
>
> The generic/388 failure is a known flake (shutdown stress test).
>
> The generic/081 regression appears to be a device-mapper issue:
>
> generic/081 [22:06:33][ 15.079661] run fstests generic/081 at 2018-08-10 22:06:33
> [ 15.795745] device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.
> [failed, exit status 1] [22:06:36]- output mismatch (see /results/ext4/results-dax/generic/081.out.bad)
>     --- tests/generic/081.out 2018-08-09 18:00:42.0 -0400
>     +++ /results/ext4/results-dax/generic/081.out.bad 2018-08-10 22:06:36.440005460 -0400
>     @@ -1,2 +1,4 @@
>      QA output created by 081
>      Silence is golden
>     +Failed to create snapshot
>     +(see /results/ext4/results-dax/generic/081.full for details)
>     ...
>     (Run 'diff -u tests/generic/081.out /results/ext4/results-dax/generic/081.out.bad' to see the entire diff)

I'll see if this reproduces for me. Doesn't seem to be related to the DAX
patches you carry though.

> The generic/344 failure seems to be caused by a WARNING triggered in
> the nvdimm code:

OK, apparently this is nothing new for you as generic/344 fails for you
even with 4.17. But it should not :). I'll try to see if I can reproduce
this in my test setup during more test runs (I don't remember seeing it
during occasional runs I do) and debug it further.

								Honza
Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> concerning. So assuming nothing surprising pops up, yes it should be
> merged at the next merge window.

... and here are the results. The first is 4.17, and the second is
the ext4 git tree:

ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
  Failures: ext4/033 generic/344 generic/491 generic/503

ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
  Failures: generic/081 generic/344 generic/388

The generic/388 failure is a known flake (shutdown stress test).

The generic/081 regression appears to be a device-mapper issue:

generic/081 [22:06:33][ 15.079661] run fstests generic/081 at 2018-08-10 22:06:33
[ 15.795745] device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.
[failed, exit status 1] [22:06:36]- output mismatch (see /results/ext4/results-dax/generic/081.out.bad)
    --- tests/generic/081.out 2018-08-09 18:00:42.0 -0400
    +++ /results/ext4/results-dax/generic/081.out.bad 2018-08-10 22:06:36.440005460 -0400
    @@ -1,2 +1,4 @@
     QA output created by 081
     Silence is golden
    +Failed to create snapshot
    +(see /results/ext4/results-dax/generic/081.full for details)
    ...
    (Run 'diff -u tests/generic/081.out /results/ext4/results-dax/generic/081.out.bad' to see the entire diff)

The generic/344 failure seems to be caused by a WARNING triggered in
the nvdimm code:

generic/344 [22:06:36][ 18.126280] run fstests generic/344 at 2018-08-10 22:06:36
[ 18.303113] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 18.456988] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 97.375912] WARNING: CPU: 2 PID: 1712 at /usr/projects/linux/ext4/mm/memory.c:1801 insert_pfn+0x15a/0x170
[ 97.377261] CPU: 2 PID: 1712 Comm: holetest Not tainted 4.18.0-rc4-xfstests-00039-g863c37fcb14f #497
[ 97.378486] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[ 97.379516] RIP: 0010:insert_pfn+0x15a/0x170
[ 97.380064] Code: 19 1b 01 eb dd 48 85 d2 74 07 48 23 1d 3f 19 1b 01 48 09 df 48 89 f8 0f 1f 40 00 48 b9 00 02 00 00 00 00 00 04 48 09 c1 eb c8 <0f> 0b e9 5e ff ff ff bb f4 ff ff ff e9 5e ff ff ff e8 80 7b ec ff
[ 97.382469] RSP: :acfd0457fc60 EFLAGS: 00010206
[ 97.383123] RAX: 00179e3b RBX: fff0 RCX: 0002
[ 97.384062] RDX: 000f RSI: 8f5c28f5c28f5c29 RDI: 800179e3b225
[ 97.385134] RBP: 923761654558 R08: 0001 R09: 0001
[ 97.386213] R10: 92376f415cc0 R11: 0001 R12: 92377e880cc0
[ 97.387264] R13: 7fab98aab000 R14: 003e860d R15: 92376156
[ 97.388327] FS: 7fab9049c700() GS:92377f20() knlGS:
[ 97.389514] CS: 0010 DS: ES: CR0: 80050033
[ 97.390367] CR2: 7fab98aabc00 CR3: 0006a165a004 CR4: 00360ee0
[ 97.391432] Call Trace:
[ 97.391798] __vm_insert_mixed+0x7e/0xc0
[ 97.392376] vmf_insert_mixed_mkwrite+0xf/0x30
[ 97.393048] dax_iomap_pte_fault+0xb8b/0xe40
[ 97.393691] ext4_dax_huge_fault+0x145/0x200
[ 97.394268] do_wp_page+0x175/0x5b0
[ 97.394710] __handle_mm_fault+0x587/0xbb0
[ 97.395228] __do_page_fault+0x20c/0x490
[ 97.395729] ? async_page_fault+0x8/0x30
[ 97.396251] async_page_fault+0x1e/0x30
[ 97.396719] RIP: 0033:0x5598144275ea
[ 97.397187] Code: 0f 85 8a 00 00 00 31 d2 48 85 db 4b 8d 04 34 7e 1f 0f 1f 80 00 00 00 00 48 89 d1 48 83 c2 01 48 0f af 0d 71 1b 20 00 48 39 d3 <48> 89 2c 08 75 e8 8b 0d 36 1b 20 00 31 c0 85 c9 74 0a 8b 15 2e 1b
[ 97.399752] RSP: 002b:7fab9049bf20 EFLAGS: 00010212
[ 97.400541] RAX: 7fab90c9ec00 RBX: 0001 RCX: 07e0d000
[ 97.401603] RDX: 7e0e RSI: RDI: 0001
[ 97.402673] RBP: 7fab9049c700 R08: 7fab9049c700 R09: 7fab9049c700
[ 97.403755] R10: 7fab9049c9d0 R11: 0202 R12: 7fab90c9e000
[ 97.404851] R13: 7ffc4608c9e0 R14: 0c00 R15: 55981608e250
[ 97.405892] irq event stamp: 968
[ 97.406460] hardirqs last enabled at (967): [] _raw_spin_unlock_irq+0x29/0x40
[ 97.407826] hardirqs last disabled at (968): [] error_entry+0x7f/0x100
[ 97.409080] softirqs last enabled at (390): [] __do_softirq+0x319/0x4d9
[ 97.410363] softirqs last disabled at (383): [] irq_exit+0xc1/0xd0
[ 97.411400] ---[ end trace 69669a34a73c1a49 ]---
[ 117.726077] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 117.727671] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity,dax
[ 117.796623] EXT4-fs (pmem1): DAX enabled. Warning:
Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On Fri, Aug 10, 2018 at 02:52:53PM -0500, Eric Sandeen wrote:
>
> Hi Ted, hadn't seen feedback from you on this by the time it gathered reviews,
> is this something you plan to merge for realz? (I see it's on your dev
> branch now, just not sure of its permanence at this point.)

Yes, the dev branch is pretty much locked down since I assume the merge
window is opening over the weekend.

The reason why I've been silent is that I hadn't had a chance to do a
test run with DAX, because up until recently I wasn't able to run a DAX
regression test run. (That's because of the CONFIG_KASAN incompatibility
with CONFIG_ZONE_DEVICE that caused the kernel to instantly blow up on
boot if I tried to enable emulated /dev/pmem devices.)

I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
CONFIG_KASAN disabled, and I expect it shouldn't show up anything
concerning. So assuming nothing surprising pops up, yes it should be
merged at the next merge window.

					- Ted
Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
On 7/10/18 2:10 PM, Ross Zwisler wrote:
> Inodes using DAX should only ever have exceptional entries in their page
> caches. Make this clear by warning if the iteration in
> dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
> comment for the pagevec_release() call which only deals with struct page
> pointers.
>
> Signed-off-by: Ross Zwisler
> Reviewed-by: Jan Kara

Hi Ted, hadn't seen feedback from you on this by the time it gathered reviews,
is this something you plan to merge for realz? (I see it's on your dev
branch now, just not sure of its permanence at this point.)

Thanks,
-Eric

> ---
>  fs/dax.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 641192808bb6..897b51e41d8f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -566,7 +566,8 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  			if (index >= end)
>  				break;
>  
> -			if (!radix_tree_exceptional_entry(pvec_ent))
> +			if (WARN_ON_ONCE(
> +			     !radix_tree_exceptional_entry(pvec_ent)))
>  				continue;
>  
>  			xa_lock_irq(&mapping->i_pages);
> @@ -578,6 +579,13 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  			if (page)
>  				break;
>  		}
> +
> +		/*
> +		 * We don't expect normal struct page entries to exist in our
> +		 * tree, but we keep these pagevec calls so that this code is
> +		 * consistent with the common pattern for handling pagevecs
> +		 * throughout the kernel.
> +		 */
>  		pagevec_remove_exceptionals(&pvec);
>  		pagevec_release(&pvec);
>  		index++;
>
[PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional
Inodes using DAX should only ever have exceptional entries in their page
caches. Make this clear by warning if the iteration in
dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
comment for the pagevec_release() call which only deals with struct page
pointers.

Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
---
 fs/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 641192808bb6..897b51e41d8f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -566,7 +566,8 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 			if (index >= end)
 				break;
 
-			if (!radix_tree_exceptional_entry(pvec_ent))
+			if (WARN_ON_ONCE(
+			     !radix_tree_exceptional_entry(pvec_ent)))
 				continue;
 
 			xa_lock_irq(&mapping->i_pages);
@@ -578,6 +579,13 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 			if (page)
 				break;
 		}
+
+		/*
+		 * We don't expect normal struct page entries to exist in our
+		 * tree, but we keep these pagevec calls so that this code is
+		 * consistent with the common pattern for handling pagevecs
+		 * throughout the kernel.
+		 */
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		index++;
-- 
2.14.4
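
For readers unfamiliar with the idiom the patch relies on, here is a rough
userspace sketch of "warn once, then skip the unexpected entry". It is not
kernel code: warn_on_once(), entry_is_exceptional(), make_exceptional() and
count_exceptional() are hypothetical stand-ins invented for illustration,
and shifting the payload by two bits is an assumption of the sketch rather
than the DAX entry layout. The only thing taken from the 4.18-era headers is
that an exceptional radix tree entry is a pointer-sized value with bit 1 set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Tag bit that marks a slot as "exceptional" (not a struct page pointer). */
#define EXCEPTIONAL_ENTRY_BIT ((uintptr_t)2)

/* Number of warnings actually printed; exists only so the sketch is testable. */
int warnings_printed;

/* Hypothetical stand-in for radix_tree_exceptional_entry(). */
bool entry_is_exceptional(const void *entry)
{
	return ((uintptr_t)entry & EXCEPTIONAL_ENTRY_BIT) != 0;
}

/* Build a tagged, non-pointer entry from a small payload (sketch encoding). */
void *make_exceptional(uintptr_t payload)
{
	/* The top two bits must be free so the shift loses nothing. */
	assert((payload >> (sizeof(payload) * 8 - 2)) == 0);
	return (void *)((payload << 2) | EXCEPTIONAL_ENTRY_BIT);
}

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): report the first time cond is
 * true for a given call site, and always return cond so it can sit in an if.
 */
bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Walk the slots the way the patched loop does: complain once, skip, go on. */
size_t count_exceptional(void *const *entries, size_t n)
{
	static bool warned;	/* per-call-site state, as in the kernel macro */
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (warn_on_once(!entry_is_exceptional(entries[i]), &warned,
				 "non-exceptional entry in DAX mapping"))
			continue;
		count++;
	}
	return count;
}
```

The point of the design is that the loop behaves exactly as before for
well-formed mappings, while a violated invariant is reported a single time
instead of being silently skipped or flooding the log.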