Hi all,

I'm debugging this from the block layer and brd.c side. After an initial
investigation, the error seems to be generated in
brd.c:brd_make_request():

if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
                if (sector & ((PAGE_SIZE >> SECTOR_SHIFT) - 1) ||
                    bio->bi_iter.bi_size & ~PAGE_MASK)
                        printk(KERN_ERR "ERROR : %s %s %d\n", __FILE__, __func__, __LINE__);
                        goto io_error;
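
To make the alignment test above concrete, here is a minimal user-space sketch of the same arithmetic. It assumes 4 KiB pages and 512-byte sectors, and it assumes the 1024-byte range from the reproducer reaches the driver as a 1024-byte discard bio (the bio size was not logged, so that part is a guess). The EX_* macros and ex_discard_rejected() are hypothetical names for illustration only, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (x86-64 defaults). The EX_ prefix marks these as
 * illustrative stand-ins for PAGE_SIZE, PAGE_MASK and SECTOR_SHIFT. */
#define EX_SECTOR_SHIFT 9
#define EX_PAGE_SIZE    4096UL
#define EX_PAGE_MASK    (~(EX_PAGE_SIZE - 1))

/* Returns 1 if a discard starting at `sector` and covering `size` bytes
 * would fail the alignment test quoted above, 0 otherwise. */
static int ex_discard_rejected(uint64_t sector, uint32_t size)
{
        /* The masks below only work for a power-of-two page size. */
        assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);

        return (sector & ((EX_PAGE_SIZE >> EX_SECTOR_SHIFT) - 1)) != 0 ||
               (size & ~EX_PAGE_MASK) != 0;
}
```

Under these assumptions ex_discard_rejected(0, 1024) is true, because 1024 & 4095 is nonzero, while a page-sized request such as ex_discard_rejected(0, 4096) passes the check.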

<----------------------------------------------------------------------------------------------------------------------------------------------->
[  960.726281] ERROR : /home/neo/work/linux-4.10-rc1/drivers/block/brd.c brd_make_request 348
[  960.726287] CPU: 3 PID: 13185 Comm: xfs_io Tainted: G           OE   4.10.0-rc1 #4
[  960.726289] Hardware name: Supermicro X10SAE/X10SAE, BIOS 2.0a 05/09/2014
[  960.726290] Call Trace:
[  960.726297]  dump_stack+0x63/0x87
[  960.726302]  brd_make_request+0x18d/0x224 [brd]
[  960.726306]  ? bio_alloc_bioset+0x1a2/0x2e0
[  960.726310]  generic_make_request+0x103/0x210
[  960.726313]  submit_bio+0x75/0x150
[  960.726316]  submit_bio_wait+0x60/0x90
[  960.726322]  blkdev_issue_zeroout+0x77/0xc0
[  960.726327]  __dax_zero_page_range+0x9d/0x150
[  960.726332]  iomap_zero_range_actor+0x8d/0x1f0
[  960.726337]  ? iomap_fiemap_actor+0x80/0x80
[  960.726340]  iomap_apply+0xb3/0x130
[  960.726345]  iomap_zero_range+0x58/0x80
[  960.726349]  ? iomap_fiemap_actor+0x80/0x80
[  960.726369]  ext4_block_zero_page_range+0x334/0x4f0 [ext4]
[  960.726398]  ? ext4_fallocate+0x498/0x8b0 [ext4]
[  960.726418]  ext4_zero_partial_blocks+0xcd/0xe0 [ext4]
[  960.726443]  ext4_fallocate+0x4a9/0x8b0 [ext4]
[  960.726449]  vfs_fallocate+0x155/0x220
[  960.726454]  SyS_fallocate+0x44/0x70
[  960.726459]  do_syscall_64+0x67/0x180
[  960.726464]  entry_SYSCALL64_slow_path+0x25/0x25
[  960.726466] RIP: 0033:0x7ff1cb1a78da
[  960.726468] RSP: 002b:00007ffd71218f98 EFLAGS: 00000246 ORIG_RAX: 000000000000011d
[  960.726472] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007ff1cb1a78da
[  960.726474] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000003
[  960.726476] RBP: 0000000000e690d0 R08: 00007ff1cb475060 R09: 0000000000e6710c
[  960.726477] R10: 0000000000000400 R11: 0000000000000246 R12: 0000000000e67100
[  960.726479] R13: 0000000000000001 R14: 0000000000e68940 R15: 0000000000e67100
[  960.726492] ERROR ./include/linux/bio.h bio_io_error 419
<----------------------------------------------------------------------------------------------------------------------------------------------->
 
Give me some time to debug more and come up with the solution.

Regards,
-Chaitanya
  
From: Eryu Guan <eg...@redhat.com>
Sent: Friday, January 6, 2017 2:32:39 AM
To: linux-block@vger.kernel.org
Cc: Chaitanya Kulkarni; linux-...@vger.kernel.org; linux-e...@vger.kernel.org
Subject: [BUG v4.10-rc1] fzero returns EIO on DAX mount
    
Hi all,

I hit a regression in my xfstests run on a DAX mount with the 4.10-rc1
and rc2 kernels: some fzero/fpunch/ftruncate operations started returning
EIO. extN and xfs are all affected. The 4.9 kernel works fine.

This is a simple reproducer:

        modprobe brd rd_size=$((1024*1024))
        mkfs -t ext4 -F /dev/ram0
        mount -o dax /dev/ram0 /mnt/ext4
        xfs_io -fc "pwrite 0 1024" -c "fzero 0 1024" /mnt/ext4/testfile

The last xfs_io returns:
        wrote 1024/1024 bytes at offset 0
        1 KiB, 1 ops; 0.0000 sec (17.013 KiB/sec and 17.0132 ops/sec)
        fallocate: Input/output error
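
For anyone without xfs_io at hand, the last step of the reproducer can be approximated in C. This is only a sketch: it assumes that "fzero" maps to fallocate() with FALLOC_FL_ZERO_RANGE, which is consistent with the SyS_fallocate frame and the mode value 0x10 in RSI in the trace. The helper name pwrite_then_fzero() and the fill pattern are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <unistd.h>

/* Write `len` bytes at offset 0, then zero the same range with
 * fallocate(FALLOC_FL_ZERO_RANGE).
 * Returns 0 on success or a negative errno value. */
static int pwrite_then_fzero(const char *path, size_t len)
{
        char buf[4096];
        ssize_t written;
        int fd, ret = 0;

        assert(path != NULL && len <= sizeof(buf));
        memset(buf, 0xcd, sizeof(buf));  /* arbitrary non-zero pattern */

        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
                return -errno;

        written = pwrite(fd, buf, len, 0);
        if (written < 0)
                ret = -errno;
        else if ((size_t)written != len)
                ret = -EIO;              /* treat a short write as an error */
        else if (fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, (off_t)len) < 0)
                ret = -errno;

        close(fd);
        return ret;
}
```

Calling pwrite_then_fzero("/mnt/ext4/testfile", 1024) on the DAX mount from the steps above would be expected to return -EIO on an affected kernel, matching the xfs_io output. On filesystems that do not implement zero-range, it returns -EOPNOTSUPP instead.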

git bisect pointed to this as the first bad commit:

e73c23ff736e1ea371dfa419d7bf8e77ee53044a
Author: Chaitanya Kulkarni <chaitanya.kulka...@hgst.com>
Date:   Wed Nov 30 12:28:58 2016 -0800

    block: add async variant of blkdev_issue_zeroout

    Similar to __blkdev_issue_discard this variant allows submitting
    the final bio asynchronously and chaining multiple ranges
    into a single completion.

    Signed-off-by: Chaitanya Kulkarni <chaitanya.kulka...@hgst.com>
    Reviewed-by: Christoph Hellwig <h...@lst.de>
    Signed-off-by: Jens Axboe <ax...@fb.com>

I've confirmed that "reverting" it fixes all the new test failures. By
"reverting" I mean reverting it while it is the top commit.

A preliminary investigation shows that it's submit_bio_wait() in
blkdev_issue_zeroout() where EIO is returned.

block/blk-lib.c:blkdev_issue_zeroout
        ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
                        &bio, discard);
        if (ret == 0 && bio) {
                ret = submit_bio_wait(bio);
                bio_put(bio);
        }

Thanks,
Eryu

P.S. failed xfstests cases

xfs:
generic/008 generic/029 generic/030 generic/074 generic/075 generic/086
generic/091 generic/112 generic/127 generic/135 generic/231 generic/263
generic/392 xfs/071 xfs/190 xfs/229 xfs/290

ext4:
generic/008 generic/075 generic/091 generic/112 generic/127 generic/263

ext2,ext3:
generic/091 generic/127 generic/263
