[PATCH RFC] btrfs: clone: Flush data before doing clone

2018-08-27 Thread Qu Wenruo
Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
can still fall back to CoW even if only an (unrelated) part of the
preallocated extent is shared.

This makes the following case do unnecessary CoW:

 # xfs_io -f -c "falloc 0 2M" $mnt/file
 # xfs_io -c "pwrite 0 1M" $mnt/file
 # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
 # sync

The pwrite will still be CoWed: at writeback time the preallocated
extent is already shared, so btrfs_cross_ref_exist() returns 1 and
makes run_delalloc_nocow() fall back to cow_file_range().

Flushing the whole source inode before the clone is definitely overkill
as a workaround, but it should be the simplest way without further
complicating the already complex NOCOW routine.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h |  1 +
 fs/btrfs/file.c  |  4 ++--
 fs/btrfs/ioctl.c | 21 +++++++++++++++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 53af9f5253f4..ddacc41ff124 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3228,6 +3228,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
   struct btrfs_inode *inode);
 int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
 void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
+int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end);
 int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
 void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
 int skip_pinned);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2be00e873e92..118bfd019c6c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1999,7 +1999,7 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
return 0;
 }
 
-static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
+int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
 {
int ret;
struct blk_plug plug;
@@ -2056,7 +2056,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 * multi-task, and make the performance up.  See
 * btrfs_wait_ordered_range for an explanation of the ASYNC check.
 */
-   ret = start_ordered_ops(inode, start, end);
+   ret = btrfs_start_ordered_ops(inode, start, end);
if (ret)
goto out;
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 63600dc2ac4c..866979f530bc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4266,6 +4266,27 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
goto out_unlock;
}
 
+   /*
+* btrfs_cross_ref_exist() only does check at extent level,
+* we could cause unexpected NOCOW write to be COWed.
+* E.g.:
+* falloc 0 2M file1
+* pwrite 0 1M file1 (at this point it should go NOCOW)
+* reflink src=file1 srcoff=1M dst=file1 dstoff=4M len=1M
+* sync
+*
+* In above case, due to the preallocated extent is shared
+* the data at 0~1M can't go NOCOW.
+*
+* So flush the whole src inode to avoid any unneeded CoW.
+*/
+   ret = btrfs_start_ordered_ops(src, 0, -1);
+   if (ret < 0)
+   goto out_unlock;
+   ret = btrfs_wait_ordered_range(src, 0, -1);
+   if (ret < 0)
+   goto out_unlock;
+
/*
 * Lock the target range too. Right after we replace the file extent
 * items in the fs tree (which now point to the cloned data), we might
-- 
2.18.0



Re: [PATCH v2] btrfs: Always check nocow for quota enabled case to make sure we won't reserve unnecessary data space

2018-08-27 Thread Qu Wenruo
On 2018/8/24 4:09 PM, Misono Tomohiro wrote:
[snip]
>>
>> BTW, what's the possibility of such problem in your test environment?
> 
> It's like one in several times.
> It may depend on hardware performance? (the machine is not so fast),
> 
> I also noticed following warning happens too (not always):
> 

After digging into the case, it's more complex than just my patch.

First, we lack many underflow checks when modifying bytes_may_use.
So we need to add underflow detection to every modifier of
bytes_may_use.
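
As a rough userspace illustration of the kind of underflow detection
meant here (only a sketch: struct space_info_sketch and
may_use_sub_checked() are hypothetical names, not the kernel's helpers),
a checked subtraction could clamp the counter at zero and report the
problem instead of letting it wrap around:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of a space_info structure. */
struct space_info_sketch {
	uint64_t bytes_may_use;
};

/*
 * Subtract @bytes from bytes_may_use. Returns true when the subtraction
 * is safe. On underflow the counter is clamped to 0 and false is
 * returned so the caller can warn, instead of letting the u64 wrap.
 */
static bool may_use_sub_checked(struct space_info_sketch *si, uint64_t bytes)
{
	if (si->bytes_may_use < bytes) {
		si->bytes_may_use = 0;
		return false;
	}
	si->bytes_may_use -= bytes;
	return true;
}
```

A caller would warn whenever the helper returns false, which makes a
double release visible at the point where it happens.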

Second, the btrfs_cross_ref_exist() check makes the NODATACOW check in
__btrfs_buffered_write() unreliable.

For the following case, at __btrfs_buffered_write() time we're pretty
sure we can do NODATACOW, but at sync time, due to the cloned range,
btrfs_cross_ref_exist() detects the reflinked prealloc extent, falls
back to CoW, and finally causes a bytes_may_use underflow:

---
mkfs.btrfs -f $dev > $full_log

mount $dev $mnt -o nospace_cache
btrfs quota enable $mnt
btrfs quota rescan -w $mnt

xfs_io -f -c "falloc 0 2M" $mnt/file1 > /dev/null
xfs_io -c "pwrite -b 1M 0 1M" $mnt/file1 > /dev/null
xfs_io -c "reflink $mnt/file1 1M 4M 1M" $mnt/file1 > /dev/null
sync


Even without my patch, the "pwrite" command is still CoWed, which could
be avoided.
And that's the reason my patch is causing the underflow.

To fix this, we need a more accurate btrfs_cross_ref_exist() check that
considers not only @disk_bytenr but also @len.

Or we could try to flush the whole inode in clone_range() so that we go
through the NOCOW routine before the clone actually happens.
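
To make the difference between the two checks concrete, here is a small
userspace model (a sketch only; struct ref_sketch and both functions are
hypothetical and do not reflect how btrfs_cross_ref_exist() walks the
extent tree). The extent-level policy treats any second reference as
sharing, while the range-aware policy only counts references that
overlap the range being written:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one foreign reference into a preallocated extent. */
struct ref_sketch {
	uint64_t offset;	/* offset inside the extent */
	uint64_t len;		/* length of the referenced range */
};

/* Extent-level check: any other reference makes the whole extent shared. */
static bool shared_extent_level(const struct ref_sketch *others, size_t nr)
{
	(void)others;
	return nr > 0;
}

/* Range-aware check: only references overlapping [offset, offset + len). */
static bool shared_range_level(const struct ref_sketch *others, size_t nr,
			       uint64_t offset, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		if (others[i].offset < offset + len &&
		    offset < others[i].offset + others[i].len)
			return true;
	}
	return false;
}
```

In the reproducer above, the reflink references 1M~2M of the extent, so
the extent-level check reports sharing for the write at 0~1M, while the
range-aware check would not.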

Thanks,
Qu





Re: Scrub aborts due to corrupt leaf

2018-08-27 Thread Chris Murphy
On Mon, Aug 27, 2018 at 8:12 PM, Larkin Lowrey
 wrote:
> On 8/27/2018 12:46 AM, Qu Wenruo wrote:
>>
>>
>>> The system uses ECC memory and edac-util has not reported any errors.
>>> However, I will run a memtest anyway.
>>
>> So it should not be the memory problem.
>>
>> BTW, what's the current generation of the fs?
>>
>> # btrfs inspect dump-super  | grep generation
>>
>> The corrupted leaf has generation 2862, I'm not sure how recent did the
>> corruption happen.
>
>
> generation              358392
> chunk_root_generation   357256
> cache_generation        358392
> uuid_tree_generation    358392
> dev_item.generation     0
>
> I don't recall the last time I ran a scrub but I doubt it has been more than
> a year.
>
> I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
> everything up.


I'd expect --init-csum-tree only recreates the data csum tree; it will
not assume a metadata leaf is correct and just recompute a csum for it.


-- 
Chris Murphy


Re: Scrub aborts due to corrupt leaf

2018-08-27 Thread Larkin Lowrey

On 8/27/2018 12:46 AM, Qu Wenruo wrote:
>> The system uses ECC memory and edac-util has not reported any errors.
>> However, I will run a memtest anyway.
>
> So it should not be the memory problem.
>
> BTW, what's the current generation of the fs?
>
> # btrfs inspect dump-super  | grep generation
>
> The corrupted leaf has generation 2862, I'm not sure how recent did the
> corruption happen.


generation  358392
chunk_root_generation   357256
cache_generation    358392
uuid_tree_generation    358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been more 
than a year.


I am running 'btrfs check --init-csum-tree' now. Hopefully that clears 
everything up.


Thank you for your help and advice,

--Larkin


Re: DRDY errors are not consistent with scrub results

2018-08-27 Thread Chris Murphy
On Mon, Aug 27, 2018 at 6:49 PM, Cerem Cem ASLAN  wrote:
> Thanks for your guidance, I'll get the device replaced first thing in
> the morning.
>
> Here are the balance results, which I think are not too bad:
>
> sudo btrfs balance start /mnt/peynir/
> WARNING:
>
> Full balance without filters requested. This operation is very
> intense and takes potentially very long. It is recommended to
> use the balance filters to narrow down the balanced data.
> Use 'btrfs balance start --full-balance' option to skip this
> warning. The operation will start in 10 seconds.
> Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> Done, had to relocate 18 out of 18 chunks
>
> I suppose this means I haven't lost any data, but I'm very likely to,
> given the previous `smartctl ...` results.


OK so nothing fatal anyway. We'd have to see any kernel messages that
appeared during the balance to see if there were read or write errors,
but presumably any failure means the balance fails so... might get you
by for a while actually.







-- 
Chris Murphy


Re: DRDY errors are not consistent with scrub results

2018-08-27 Thread Cerem Cem ASLAN
Thanks for your guidance, I'll get the device replaced first thing in
the morning.

Here are the balance results, which I think are not too bad:

sudo btrfs balance start /mnt/peynir/
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the balanced data.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
Done, had to relocate 18 out of 18 chunks

I suppose this means I haven't lost any data, but I'm very likely to,
given the previous `smartctl ...` results.

Chris Murphy wrote on Tue, 28 Aug 2018 at 03:39:
>
> On Mon, Aug 27, 2018 at 6:38 PM, Chris Murphy  wrote:
>
> >> Metadata,single: Size:8.00MiB, Used:0.00B
> >>    /dev/mapper/master-root     8.00MiB
> >>
> >> Metadata,DUP: Size:2.00GiB, Used:562.08MiB
> >>    /dev/mapper/master-root     4.00GiB
> >>
> >> System,single: Size:4.00MiB, Used:0.00B
> >>    /dev/mapper/master-root     4.00MiB
> >>
> >> System,DUP: Size:32.00MiB, Used:16.00KiB
> >>    /dev/mapper/master-root    64.00MiB
> >>
> >> Unallocated:
> >>    /dev/mapper/master-root   915.24GiB
> >
> >
> > OK this looks like it maybe was created a while ago, it has these
> > empty single chunk items that was common a while back. There is a low
> > risk to clean it up, but I still advise backup first:
> >
> > 'btrfs balance start -mconvert=dup '
>
> You can skip this advice now; it really doesn't matter. But future
> Btrfs file systems shouldn't have both single and DUP chunks like this
> one shows, if you create them with relatively recent btrfs-progs.
>
>
> --
> Chris Murphy


Re: DRDY errors are not consistent with scrub results

2018-08-27 Thread Chris Murphy
On Mon, Aug 27, 2018 at 6:38 PM, Chris Murphy  wrote:

>> Metadata,single: Size:8.00MiB, Used:0.00B
>>    /dev/mapper/master-root     8.00MiB
>>
>> Metadata,DUP: Size:2.00GiB, Used:562.08MiB
>>    /dev/mapper/master-root     4.00GiB
>>
>> System,single: Size:4.00MiB, Used:0.00B
>>    /dev/mapper/master-root     4.00MiB
>>
>> System,DUP: Size:32.00MiB, Used:16.00KiB
>>    /dev/mapper/master-root    64.00MiB
>>
>> Unallocated:
>>    /dev/mapper/master-root   915.24GiB
>
>
> OK this looks like it maybe was created a while ago, it has these
> empty single chunk items that was common a while back. There is a low
> risk to clean it up, but I still advise backup first:
>
> 'btrfs balance start -mconvert=dup '

You can skip this advice now; it really doesn't matter. But future
Btrfs file systems shouldn't have both single and DUP chunks like this
one shows, if you create them with relatively recent btrfs-progs.


-- 
Chris Murphy


Re: DRDY errors are not consistent with scrub results

2018-08-27 Thread Chris Murphy
On Mon, Aug 27, 2018 at 6:05 PM, Cerem Cem ASLAN  wrote:
> Note that I've directly received this reply, not by mail list. I'm not
> sure this is intended or not.

I intended to do Reply to All but somehow this doesn't always work out
between the user and Gmail, I'm just gonna assume gmail is being an
asshole again.


> Chris Murphy wrote on Tue, 28 Aug 2018 at 02:25:
>>
>> On Mon, Aug 27, 2018 at 4:51 PM, Cerem Cem ASLAN  
>> wrote:
>> > Hi,
>> >
>> > I'm getting DRDY ERR messages which causes system crash on the server:
>> >
>> > # tail -n 40 /var/log/kern.log.1
>> > Aug 24 21:04:55 aea3 kernel: [  939.228059] lxc-bridge: port
>> > 5(vethI7JDHN) entered disabled state
>> > Aug 24 21:04:55 aea3 kernel: [  939.300602] eth0: renamed from vethQ5Y2OF
>> > Aug 24 21:04:55 aea3 kernel: [  939.328245] IPv6: ADDRCONF(NETDEV_UP):
>> > eth0: link is not ready
>> > Aug 24 21:04:55 aea3 kernel: [  939.328453] IPv6:
>> > ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> > Aug 24 21:04:55 aea3 kernel: [  939.328474] IPv6:
>> > ADDRCONF(NETDEV_CHANGE): vethI7JDHN: link becomes ready
>> > Aug 24 21:04:55 aea3 kernel: [  939.328491] lxc-bridge: port
>> > 5(vethI7JDHN) entered blocking state
>> > Aug 24 21:04:55 aea3 kernel: [  939.328493] lxc-bridge: port
>> > 5(vethI7JDHN) entered forwarding state
>> > Aug 24 21:04:59 aea3 kernel: [  943.085647] cgroup: cgroup2: unknown
>> > option "nsdelegate"
>> > Aug 24 21:16:15 aea3 kernel: [ 1619.400016] perf: interrupt took too
>> > long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to
>> > 79750
>> > Aug 24 21:17:11 aea3 kernel: [ 1675.515815] perf: interrupt took too
>> > long (3137 > 3132), lowering kernel.perf_event_max_sample_rate to
>> > 63750
>> > Aug 24 21:17:13 aea3 kernel: [ 1677.080837] cgroup: cgroup2: unknown
>> > option "nsdelegate"
>> > Aug 25 22:38:31 aea3 kernel: [92955.512098] usb 4-2: USB disconnect,
>> > device number 2
>> > Aug 26 02:14:21 aea3 kernel: [105906.035038] lxc-bridge: port
>> > 4(vethCTKU4K) entered disabled state
>> > Aug 26 02:15:30 aea3 kernel: [105974.107521] lxc-bridge: port
>> > 4(vethO59BPD) entered disabled state
>> > Aug 26 02:15:30 aea3 kernel: [105974.109991] device vethO59BPD left
>> > promiscuous mode
>> > Aug 26 02:15:30 aea3 kernel: [105974.109995] lxc-bridge: port
>> > 4(vethO59BPD) entered disabled state
>> > Aug 26 02:15:30 aea3 kernel: [105974.710490] lxc-bridge: port
>> > 4(vethBAYODL) entered blocking state
>> > Aug 26 02:15:30 aea3 kernel: [105974.710493] lxc-bridge: port
>> > 4(vethBAYODL) entered disabled state
>> > Aug 26 02:15:30 aea3 kernel: [105974.710545] device vethBAYODL entered
>> > promiscuous mode
>> > Aug 26 02:15:30 aea3 kernel: [105974.710598] IPv6:
>> > ADDRCONF(NETDEV_UP): vethBAYODL: link is not ready
>> > Aug 26 02:15:30 aea3 kernel: [105974.710600] lxc-bridge: port
>> > 4(vethBAYODL) entered blocking state
>> > Aug 26 02:15:30 aea3 kernel: [105974.710601] lxc-bridge: port
>> > 4(vethBAYODL) entered forwarding state
>> > Aug 26 02:16:35 aea3 kernel: [106039.674089] BTRFS: device fsid
>> > 5b844c7a-0cbd-40a7-a8e3-6bc636aba033 devid 1 transid 984 /dev/dm-3
>> > Aug 26 02:17:21 aea3 kernel: [106085.352453] ata4.00: failed command: READ 
>> > DMA
>> > Aug 26 02:17:21 aea3 kernel: [106085.352901] ata4.00: status: { DRDY ERR }
>> > Aug 26 02:18:56 aea3 kernel: [106180.648062] ata4.00: exception Emask
>> > 0x0 SAct 0x0 SErr 0x0 action 0x0
>> > Aug 26 02:18:56 aea3 kernel: [106180.648333] ata4.00: BMDMA stat 0x25
>> > Aug 26 02:18:56 aea3 kernel: [106180.648515] ata4.00: failed command: READ 
>> > DMA
>> > Aug 26 02:18:56 aea3 kernel: [106180.648706] ata4.00: cmd
>> > c8/00:08:80:9c:bb/00:00:00:00:00/e3 tag 0 dma 4096 in
>> > Aug 26 02:18:56 aea3 kernel: [106180.648706]  res
>> > 51/40:00:80:9c:bb/00:00:00:00:00/03 Emask 0x9 (media error)
>> > Aug 26 02:18:56 aea3 kernel: [106180.649380] ata4.00: status: { DRDY ERR }
>> > Aug 26 02:18:56 aea3 kernel: [106180.649743] ata4.00: error: { UNC }
>>
>> Classic case of uncorrectable read error due to sector failure.
>>
>>
>>
>> > Aug 26 02:18:56 aea3 kernel: [106180.779311] ata4.00: configured for 
>> > UDMA/133
>> > Aug 26 02:18:56 aea3 kernel: [106180.779331] sd 3:0:0:0: [sda] tag#0
>> > FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> > Aug 26 02:18:56 aea3 kernel: [106180.779335] sd 3:0:0:0: [sda] tag#0
>> > Sense Key : Medium Error [current]
>> > Aug 26 02:18:56 aea3 kernel: [106180.779339] sd 3:0:0:0: [sda] tag#0
>> > Add. Sense: Unrecovered read error - auto reallocate failed
>> > Aug 26 02:18:56 aea3 kernel: [106180.779343] sd 3:0:0:0: [sda] tag#0
>> > CDB: Read(10) 28 00 03 bb 9c 80 00 00 08 00
>> > Aug 26 02:18:56 aea3 kernel: [106180.779346] blk_update_request: I/O
>> > error, dev sda, sector 62626944
>>
>> And the drive has reported the physical sector that's failing.
>>
>>
>>
>> > Aug 26 02:18:56 aea3 kernel: [106180.779703] BTRFS error (device
>> > dm-2): bdev /dev/mapper/master-root errs: wr 0, rd 40, flush 0,
>> > 

Re: btrfs-progs: btrfs-convert: unable to find block group for 0

2018-08-27 Thread Qu Wenruo


On 2018/8/28 6:33 AM, Tucker Boniface wrote:
> Hello, I am trying to convert an ext4 partition to btrfs using
> btrfs-convert. I am running Arch Linux with kernel 4.18.5 and
> btrfs-progs 4.17.1. The full error is inline below.
> 
> -> # btrfs-convert /dev/sda1
> create btrfs filesystem:
> blocksize: 4096
> nodesize:  16384
> features:  extref, skinny-metadata (default)
> creating ext2 image file
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> extent-tree.c:2743: alloc_tree_block: BUG_ON `ret` triggered, value -28

The value -28 means ENOSPC.
Your ext* filesystem doesn't have enough free space to hold btrfs' metadata.
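
For reference, the mapping from the BUG_ON value to the error name can
be checked with a few lines of userspace C (a minimal sketch, assuming
Linux errno numbering; describe_ret() is a hypothetical helper, not part
of btrfs-progs):

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a negative errno-style return value, such
 * as the -28 printed by the BUG_ON above, into the C library's message.
 */
static const char *describe_ret(int ret)
{
	return strerror(-ret);
}
```

On Linux, ENOSPC is defined as 28, so describe_ret(-28) yields the same
message as strerror(ENOSPC).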

Thanks,
Qu

> btrfs-convert(+0x1d512)[0x564d08cd3512]
> btrfs-convert(btrfs_alloc_free_block+0x1e8)[0x564d08cda188]
> btrfs-convert(+0x15bd1)[0x564d08ccbbd1]
> btrfs-convert(btrfs_search_slot+0xf27)[0x564d08ccd7d7]
> btrfs-convert(btrfs_csum_file_block+0x499)[0x564d08cdf219]
> btrfs-convert(+0xe625)[0x564d08cc4625]
> btrfs-convert(main+0x1abf)[0x564d08cc3adf]
> /usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f1822576223]
> btrfs-convert(_start+0x2e)[0x564d08cc41be]
> Aborted





[RFC PATCH v2 1/6] fs: pass iocb to direct I/O get_block()

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

Split out dio_get_block_t which is the same as get_block_t except that
it takes the iocb as well, and update fs/direct-io.c and all callers to
use it. This is preparation for replacing the use of bh->b_private in
the direct I/O code with iocb->private.
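
The conversion pattern used for filesystems that don't need the iocb
can be modelled in userspace roughly as follows (a sketch with
simplified, hypothetical types; the *_sketch names and the block mapping
are made up for illustration and are not the kernel definitions):

```c
/* Simplified, hypothetical stand-ins for the kernel types. */
struct kiocb_sketch { void *private; };
struct inode_sketch { unsigned long blocks; };

typedef int (get_block_sketch_t)(struct inode_sketch *inode,
				 unsigned long block, unsigned long *result);
typedef int (dio_get_block_sketch_t)(struct kiocb_sketch *iocb,
				     struct inode_sketch *inode,
				     unsigned long block,
				     unsigned long *result);

/* Existing mapping function that knows nothing about the iocb. */
static int fs_get_block(struct inode_sketch *inode, unsigned long block,
			unsigned long *result)
{
	if (block >= inode->blocks)
		return -1;
	*result = block + 100;	/* made-up logical-to-physical mapping */
	return 0;
}

/* Thin wrapper with the new signature; the iocb is simply ignored. */
static int fs_get_block_dio(struct kiocb_sketch *iocb,
			    struct inode_sketch *inode, unsigned long block,
			    unsigned long *result)
{
	(void)iocb;
	return fs_get_block(inode, block, result);
}
```

Only filesystems that actually want per-request state need to look at
the extra argument; the others keep their existing mapping function.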

Signed-off-by: Omar Sandoval 
---
 fs/affs/file.c  |  9 -
 fs/btrfs/inode.c|  3 ++-
 fs/direct-io.c  | 13 ++---
 fs/ext2/inode.c |  9 -
 fs/ext4/ext4.h  |  2 --
 fs/ext4/inode.c | 27 ++-
 fs/f2fs/data.c  |  5 +++--
 fs/fat/inode.c  |  9 -
 fs/gfs2/aops.c  |  5 +++--
 fs/hfs/inode.c  |  9 -
 fs/hfsplus/inode.c  |  9 -
 fs/jfs/inode.c  |  9 -
 fs/nilfs2/inode.c   |  9 -
 fs/ocfs2/aops.c | 15 +--
 fs/reiserfs/inode.c |  4 ++--
 fs/udf/inode.c  |  9 -
 include/linux/fs.h  | 10 ++
 17 files changed, 113 insertions(+), 43 deletions(-)

diff --git a/fs/affs/file.c b/fs/affs/file.c
index a85817f54483..66a1a5601d65 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -389,6 +389,13 @@ static void affs_write_failed(struct address_space *mapping, loff_t to)
}
 }
 
+static int affs_get_block_dio(struct kiocb *iocb, struct inode *inode,
+ sector_t block, struct buffer_head *bh_result,
+ int create)
+{
+   return affs_get_block(inode, block, bh_result, create);
+}
+
 static ssize_t
 affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
@@ -406,7 +413,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return 0;
}
 
-   ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block);
+   ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block_dio);
if (ret < 0 && iov_iter_rw(iter) == WRITE)
affs_write_failed(mapping, offset + count);
return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eba61bcb9bb3..b61ea6dd9956 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7659,7 +7659,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
return ret;
 }
 
-static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
+static int btrfs_get_blocks_direct(struct kiocb *iocb, struct inode *inode,
+  sector_t iblock,
   struct buffer_head *bh_result, int create)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 093fb54cd316..f631aa98849b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -82,7 +82,7 @@ struct dio_submit {
int reap_counter;   /* rate limit reaping */
sector_t final_block_in_request;/* doesn't change */
int boundary;   /* prev block is at a boundary */
-   get_block_t *get_block; /* block mapping function */
+   dio_get_block_t *get_block; /* block mapping function */
dio_submit_t *submit_io;/* IO submition function */
 
loff_t logical_offset_in_bio;   /* current first logical block in bio */
@@ -713,8 +713,8 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
create = 0;
}
 
-   ret = (*sdio->get_block)(dio->inode, fs_startblk,
-   map_bh, create);
+   ret = (*sdio->get_block)(dio->iocb, dio->inode, fs_startblk,
+map_bh, create);
 
/* Store for completion */
dio->private = map_bh->b_private;
@@ -1170,7 +1170,7 @@ static inline int drop_refcount(struct dio *dio)
 static inline ssize_t
 do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
  struct block_device *bdev, struct iov_iter *iter,
- get_block_t get_block, dio_iodone_t end_io,
+ dio_get_block_t get_block, dio_iodone_t end_io,
  dio_submit_t submit_io, int flags)
 {
unsigned i_blkbits = READ_ONCE(inode->i_blkbits);
@@ -1398,9 +1398,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 
 ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 struct block_device *bdev, struct iov_iter *iter,
-get_block_t get_block,
-dio_iodone_t end_io, dio_submit_t submit_io,
-int flags)
+dio_get_block_t get_block, dio_iodone_t end_io,
+dio_submit_t submit_io, int flags)
 {
/*
 * The block device state is needed in the end to finally
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..f390e6392238 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -930,6 +930,13 @@ static sector_t 

[RFC PATCH v2 4/6] fs: stop propagating bh->b_private for direct I/O

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

Currently, the direct I/O code saves the value of bh->b_private set
by the filesystem and passes it to the end_io callback. However, struct
kiocb already has a ->private member which can be used for this purpose,
with the added benefit of being available before get_block is called,
too. The only users of the bh->b_private functionality have been
converted to use iocb->private, so stop passing it around.

Signed-off-by: Omar Sandoval 
---
 fs/direct-io.c | 7 +--
 fs/ext4/inode.c| 3 +--
 fs/ocfs2/aops.c| 5 +
 include/linux/fs.h | 3 +--
 4 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index f631aa98849b..80e488afe6c6 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -122,8 +122,6 @@ struct dio {
loff_t i_size;  /* i_size when submitted */
dio_iodone_t *end_io;   /* IO completion function */
 
-   void *private;  /* copy from map_bh.b_private */
-
/* BIO completion state */
spinlock_t bio_lock;/* protects BIO fields below */
int page_errors;/* errno from get_user_pages() */
@@ -288,7 +286,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 
if (dio->end_io) {
// XXX: ki_pos??
-   err = dio->end_io(dio->iocb, offset, ret, dio->private);
+   err = dio->end_io(dio->iocb, offset, ret);
if (err)
ret = err;
}
@@ -716,9 +714,6 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
ret = (*sdio->get_block)(dio->iocb, dio->inode, fs_startblk,
 map_bh, create);
 
-   /* Store for completion */
-   dio->private = map_bh->b_private;
-
if (ret == 0 && buffer_defer_completion(map_bh))
ret = dio_set_defer_completion(dio);
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 841d79919cef..0f42793765bf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3612,8 +3612,7 @@ const struct iomap_ops ext4_iomap_ops = {
.iomap_end  = ext4_iomap_end,
 };
 
-static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-   ssize_t size, void *private)
+static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset, ssize_t size)
 {
 ext4_io_end_t *io_end = iocb->private;
 
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index fc4a18b6ad3c..c1232df20be5 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2404,10 +2404,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
  * particularly interested in the aio/dio case.  We use the rw_lock DLM lock
  * to protect io on one node from truncation on another.
  */
-static int ocfs2_dio_end_io(struct kiocb *iocb,
-   loff_t offset,
-   ssize_t bytes,
-   void *private)
+static int ocfs2_dio_end_io(struct kiocb *iocb, loff_t offset, ssize_t bytes)
 {
struct ocfs2_dio_write_ctxt *dwc;
struct inode *inode = file_inode(iocb->ki_filp);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 85db69835023..f1a235f0fa21 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,8 +83,7 @@ typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 typedef int (dio_get_block_t)(struct kiocb *iocb, struct inode *inode,
  sector_t iblock, struct buffer_head *bh_result,
  int create);
-typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
-   ssize_t bytes, void *private);
+typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, ssize_t bytes);
 
 #define MAY_EXEC   0x0001
 #define MAY_WRITE  0x0002
-- 
2.18.0



[RFC PATCH v2 6/6] Btrfs: stop abusing current->journal_info in btrfs_direct_IO()

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

Now that we can pass around the struct btrfs_dio_data through the
different callbacks generically, we don't need to shove it in
current->journal_info.

Signed-off-by: Omar Sandoval 
---
 fs/btrfs/inode.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6efa6a6e3e20..38a41e9d6e93 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7654,7 +7654,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
WARN_ON(dio_data->reserve < len);
dio_data->reserve -= len;
dio_data->unsubmitted_oe_range_end = start + len;
-   current->journal_info = dio_data;
 out:
return ret;
 }
@@ -7666,7 +7665,7 @@ static int btrfs_get_blocks_direct(struct kiocb *iocb, struct inode *inode,
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct extent_map *em;
struct extent_state *cached_state = NULL;
-   struct btrfs_dio_data *dio_data = NULL;
+   struct btrfs_dio_data *dio_data = iocb->private;
u64 start = iblock << inode->i_blkbits;
u64 lockstart, lockend;
u64 len = bh_result->b_size;
@@ -7681,25 +7680,13 @@ static int btrfs_get_blocks_direct(struct kiocb *iocb, struct inode *inode,
lockstart = start;
lockend = start + len - 1;
 
-   if (current->journal_info) {
-   /*
-* Need to pull our outstanding extents and set journal_info to NULL so
-* that anything that needs to check if there's a transaction doesn't get
-* confused.
-*/
-   dio_data = current->journal_info;
-   current->journal_info = NULL;
-   }
-
/*
 * If this errors out it's because we couldn't invalidate pagecache for
 * this range and we need to fallback to buffered.
 */
if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
-  create)) {
-   ret = -ENOTBLK;
-   goto err;
-   }
+  create))
+   return -ENOTBLK;
 
em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
if (IS_ERR(em)) {
@@ -7767,9 +7754,6 @@ static int btrfs_get_blocks_direct(struct kiocb *iocb, struct inode *inode,
 unlock_err:
clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 unlock_bits, 1, 0, &cached_state);
-err:
-   if (dio_data)
-   current->journal_info = dio_data;
return ret;
 }
 
@@ -8470,7 +8454,7 @@ static void btrfs_submit_direct(struct kiocb *iocb, struct bio *dio_bio,
 * time by btrfs_direct_IO().
 */
if (write) {
-   struct btrfs_dio_data *dio_data = current->journal_info;
+   struct btrfs_dio_data *dio_data = iocb->private;
 
dio_data->unsubmitted_oe_range_end = dip->logical_offset +
dip->bytes;
@@ -8612,13 +8596,13 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
/*
 * We need to know how many extents we reserved so that we can
 * do the accounting properly if we go over the number we
-* originally calculated.  Abuse current->journal_info for this.
+* originally calculated.
 */
dio_data.reserve = round_up(count,
fs_info->sectorsize);
dio_data.unsubmitted_oe_range_start = (u64)offset;
dio_data.unsubmitted_oe_range_end = (u64)offset;
-   current->journal_info = &dio_data;
+   iocb->private = &dio_data;
down_read(&BTRFS_I(inode)->dio_sem);
} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 &BTRFS_I(inode)->runtime_flags)) {
@@ -8633,7 +8617,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
   btrfs_submit_direct, flags);
if (iov_iter_rw(iter) == WRITE) {
up_read(&BTRFS_I(inode)->dio_sem);
-   current->journal_info = NULL;
if (ret < 0 && ret != -EIOCBQUEUED) {
if (dio_data.reserve)
btrfs_delalloc_release_space(inode, data_reserved,
-- 
2.18.0



[RFC PATCH v2 5/6] fs: pass iocb to direct I/O submit_io()

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

Btrfs abuses current->journal_info in btrfs_direct_IO() in order to pass
around some state to get_block() and submit_io(). However, iocb->private
is free for Btrfs to use; we just need it passed to submit_io(). Btrfs
is the only user of submit_io(), so this doesn't affect any other
filesystems.

Signed-off-by: Omar Sandoval 
---
 fs/btrfs/inode.c   | 4 ++--
 fs/direct-io.c | 3 ++-
 include/linux/fs.h | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b61ea6dd9956..6efa6a6e3e20 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8427,8 +8427,8 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
return 0;
 }
 
-static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
-   loff_t file_offset)
+static void btrfs_submit_direct(struct kiocb *iocb, struct bio *dio_bio,
+   struct inode *inode, loff_t file_offset)
 {
struct btrfs_dio_private *dip = NULL;
struct bio *bio = NULL;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 80e488afe6c6..aa367e70456d 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -473,7 +473,8 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
dio->bio_disk = bio->bi_disk;
 
if (sdio->submit_io) {
-   sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio);
+   sdio->submit_io(dio->iocb, bio, dio->inode,
+   sdio->logical_offset_in_bio);
dio->bio_cookie = BLK_QC_T_NONE;
} else
dio->bio_cookie = submit_bio(bio);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f1a235f0fa21..daf1df811f67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3003,8 +3003,8 @@ extern int generic_file_open(struct inode * inode, struct file * filp);
 extern int nonseekable_open(struct inode * inode, struct file * filp);
 
 #ifdef CONFIG_BLOCK
-typedef void (dio_submit_t)(struct bio *bio, struct inode *inode,
-   loff_t file_offset);
+typedef void (dio_submit_t)(struct kiocb *iocb, struct bio *bio,
+   struct inode *inode, loff_t file_offset);
 
 enum {
/* need locking between buffered and direct access */
-- 
2.18.0



[RFC PATCH v2 3/6] ocfs2: use iocb->private instead of bh->b_private

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

As part of simplifying all of the private data passed around for direct
I/O, bh->b_private will no longer be passed to dio_iodone_t. Instead,
filesystems should use iocb->private. ocfs2 already uses iocb->private
for storing a couple of flag bits, but we can use it as a tagged pointer
and hide all of the messiness in helpers.
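
The tagged-pointer idea can be sketched in plain userspace C as follows
(an illustration only: the SKETCH_* masks and tagged_*() helpers are
hypothetical names, not the helpers this patch adds, and the sketch
assumes the stored pointer is at least 4-byte aligned so its two low
bits are free):

```c
#include <stdint.h>

#define SKETCH_RW_LOCK		1UL	/* flag in bit 0 */
#define SKETCH_RW_LOCK_LEVEL	2UL	/* flag in bit 1 */
#define SKETCH_FLAG_MASK	3UL

/* Return the pointer part, with the two flag bits masked off. */
static void *tagged_get_ptr(uintptr_t word)
{
	return (void *)(word & ~(uintptr_t)SKETCH_FLAG_MASK);
}

/* Store a new pointer while preserving whatever flags are set. */
static uintptr_t tagged_set_ptr(uintptr_t word, void *ptr)
{
	return (uintptr_t)ptr | (word & SKETCH_FLAG_MASK);
}

/* Set one flag bit without disturbing the pointer part. */
static uintptr_t tagged_set_flag(uintptr_t word, uintptr_t flag)
{
	return word | flag;
}

/* Test one flag bit. */
static int tagged_test_flag(uintptr_t word, uintptr_t flag)
{
	return (word & flag) != 0;
}
```

Keeping all of the masking inside small helpers like these means
callers never touch the raw word directly.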

Cc: Mark Fasheh 
Cc: Joel Becker 
Signed-off-by: Omar Sandoval 
---
 fs/ocfs2/aops.c | 19 ---
 fs/ocfs2/aops.h | 64 +
 2 files changed, 54 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 93ca23c56b07..fc4a18b6ad3c 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2104,12 +2104,13 @@ struct ocfs2_dio_write_ctxt {
 };
 
 static struct ocfs2_dio_write_ctxt *
-ocfs2_dio_alloc_write_ctx(struct buffer_head *bh, int *alloc)
+ocfs2_dio_alloc_write_ctx(struct kiocb *iocb, int *alloc)
 {
-   struct ocfs2_dio_write_ctxt *dwc = NULL;
+   struct ocfs2_dio_write_ctxt *dwc;
 
-   if (bh->b_private)
-   return bh->b_private;
+   dwc = ocfs2_iocb_private(iocb);
+   if (dwc)
+   return dwc;
 
dwc = kmalloc(sizeof(struct ocfs2_dio_write_ctxt), GFP_NOFS);
if (dwc == NULL)
@@ -2118,7 +2119,7 @@ ocfs2_dio_alloc_write_ctx(struct buffer_head *bh, int *alloc)
dwc->dw_zero_count = 0;
dwc->dw_orphaned = 0;
dwc->dw_writer_pid = task_pid_nr(current);
-   bh->b_private = dwc;
+   ocfs2_iocb_set_private(iocb, dwc);
*alloc = 1;
 
return dwc;
@@ -2184,7 +2185,7 @@ static int ocfs2_dio_wr_get_block(struct kiocb *iocb, struct inode *inode,
bh_result->b_state = 0;
}
 
-   dwc = ocfs2_dio_alloc_write_ctx(bh_result, &first_get_block);
+   dwc = ocfs2_dio_alloc_write_ctx(iocb, &first_get_block);
if (unlikely(dwc == NULL)) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -2408,6 +2409,7 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
ssize_t bytes,
void *private)
 {
+   struct ocfs2_dio_write_ctxt *dwc;
struct inode *inode = file_inode(iocb->ki_filp);
int level;
int ret = 0;
@@ -2415,8 +2417,9 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
/* this io's submitter should not have unlocked this before we could */
BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
 
-   if (bytes > 0 && private)
-   ret = ocfs2_dio_end_io_write(inode, private, offset, bytes);
+   dwc = ocfs2_iocb_private(iocb);
+   if (bytes > 0 && dwc)
+   ret = ocfs2_dio_end_io_write(inode, dwc, offset, bytes);
 
ocfs2_iocb_clear_rw_locked(iocb);
 
diff --git a/fs/ocfs2/aops.h b/fs/ocfs2/aops.h
index 3494a62ed749..2c3219e0c010 100644
--- a/fs/ocfs2/aops.h
+++ b/fs/ocfs2/aops.h
@@ -63,32 +63,54 @@ int ocfs2_size_fits_inline_data(struct buffer_head *di_bh, u64 new_size);
 
 int ocfs2_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
-/* all ocfs2_dio_end_io()'s fault */
-#define ocfs2_iocb_is_rw_locked(iocb) \
-   test_bit(0, (unsigned long *)&iocb->private)
+
+/*
+ * Direct I/O uses iocb->private as a tagged pointer. The bottom two bits
+ * defined below are used for communication between ocfs2_dio_end_io() and
+ * ocfs2_file_write/read_iter().
+ */
+#define OCFS2_IOCB_RW_LOCK 1
+#define OCFS2_IOCB_RW_LOCK_LEVEL 2
+
+static inline void *ocfs2_iocb_private(struct kiocb *iocb)
+{
+   return (void *)((unsigned long)iocb->private & ~3);
+}
+
+static inline void ocfs2_iocb_set_private(struct kiocb *iocb, void *private)
+{
+   iocb->private = (void *)(((unsigned long)iocb->private & 3) |
+((unsigned long)private & ~3));
+}
+
+static inline bool ocfs2_iocb_is_rw_locked(struct kiocb *iocb)
+{
+   return (unsigned long)iocb->private & OCFS2_IOCB_RW_LOCK;
+}
+
 static inline void ocfs2_iocb_set_rw_locked(struct kiocb *iocb, int level)
 {
-   set_bit(0, (unsigned long *)&iocb->private);
+   unsigned long private = (unsigned long)iocb->private;
+
+   private |= OCFS2_IOCB_RW_LOCK;
if (level)
-   set_bit(1, (unsigned long *)&iocb->private);
+   private |= OCFS2_IOCB_RW_LOCK_LEVEL;
else
-   clear_bit(1, (unsigned long *)&iocb->private);
+   private &= ~OCFS2_IOCB_RW_LOCK_LEVEL;
+   iocb->private = (void *)private;
 }
 
-/*
- * Using a named enum representing lock types in terms of #N bit stored in
- * iocb->private, which is going to be used for communication between
- * ocfs2_dio_end_io() and ocfs2_file_write/read_iter().
- */
-enum ocfs2_iocb_lock_bits {
-   OCFS2_IOCB_RW_LOCK = 0,
-   OCFS2_IOCB_RW_LOCK_LEVEL,
-   OCFS2_IOCB_NUM_LOCKS
-};
-
-#define ocfs2_iocb_clear_rw_locked(iocb) \
-   clear_bit(OCFS2_IOCB_RW_LOCK, (unsigned long *)&iocb->private)
-#define 

[RFC PATCH v2 2/6] ext4: use iocb->private instead of bh->b_private

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

As part of simplifying all of the private data passed around for direct
I/O, bh->b_private will no longer be passed to dio_iodone_t. iocb is
still available there, however, so convert ext4 to use it. Note that
ext4_file_write_iter() also uses iocb->private, but
ext4_direct_IO_write() resets it to NULL after reading it.

Also note that the comment above ext4_should_dioread_nolock() is no
longer accurate. It seems that it should be possible to remove the data
journaling restriction now?

Cc: "Theodore Ts'o" 
Cc: Andreas Dilger 
Signed-off-by: Omar Sandoval 
---
 fs/ext4/inode.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 18ad91b1c8f6..841d79919cef 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -884,18 +884,16 @@ static int ext4_dio_get_block_unwritten_async(struct kiocb *iocb,
/*
 * When doing DIO using unwritten extents, we need io_end to convert
 * unwritten extents to written on IO completion. We allocate io_end
-* once we spot unwritten extent and store it in b_private. Generic
-* DIO code keeps b_private set and furthermore passes the value to
-* our completion callback in 'private' argument.
+* once we spot unwritten extent and store it in iocb->private.
 */
if (!ret && buffer_unwritten(bh_result)) {
-   if (!bh_result->b_private) {
+   if (!iocb->private) {
ext4_io_end_t *io_end;
 
io_end = ext4_init_io_end(inode, GFP_KERNEL);
if (!io_end)
return -ENOMEM;
-   bh_result->b_private = io_end;
+   iocb->private = io_end;
ext4_set_io_unwritten_flag(inode, io_end);
}
set_buffer_defer_completion(bh_result);
@@ -3617,7 +3615,7 @@ const struct iomap_ops ext4_iomap_ops = {
 static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
ssize_t size, void *private)
 {
-ext4_io_end_t *io_end = private;
+ext4_io_end_t *io_end = iocb->private;
 
/* if not async direct IO just return */
if (!io_end)
-- 
2.18.0



[RFC PATCH v2 0/6] Btrfs: stop abusing current->journal_info for direct I/O

2018-08-27 Thread Omar Sandoval
From: Omar Sandoval 

Hi,

This is a different approach from v1 [1] of this series to stop abusing
current->journal_info in Btrfs. This approach unifies everything to use
iocb->private instead of map_bh->b_private. Patches 1 and 5 pass the
iocb to a couple of callbacks which need it. Patches 2 and 3 migrate
the users of b_private to use iocb->private, and patch 4 gets rid of the
b_private handling in the direct I/O code. Patch 6 cleans up Btrfs.

I'm not convinced that this is cleaner than my first approach, but it at
least avoids growing the argument list to do_blockdev_direct_IO(), which
was Al's complaint of v1.

Thanks!

1: https://www.spinics.net/lists/linux-btrfs/msg77859.html

Omar Sandoval (6):
  fs: pass iocb to direct I/O get_block()
  ext4: use iocb->private instead of bh->b_private
  ocfs2: use iocb->private instead of bh->b_private
  fs: stop propagating bh->b_private for direct I/O
  fs: pass iocb to direct I/O submit_io()
  Btrfs: stop abusing current->journal_info in btrfs_direct_IO()

 fs/affs/file.c  |  9 ++-
 fs/btrfs/inode.c| 36 +++--
 fs/direct-io.c  | 23 +++-
 fs/ext2/inode.c |  9 ++-
 fs/ext4/ext4.h  |  2 --
 fs/ext4/inode.c | 40 
 fs/f2fs/data.c  |  5 ++--
 fs/fat/inode.c  |  9 ++-
 fs/gfs2/aops.c  |  5 ++--
 fs/hfs/inode.c  |  9 ++-
 fs/hfsplus/inode.c  |  9 ++-
 fs/jfs/inode.c  |  9 ++-
 fs/nilfs2/inode.c   |  9 ++-
 fs/ocfs2/aops.c | 39 ++-
 fs/ocfs2/aops.h | 64 ++---
 fs/reiserfs/inode.c |  4 +--
 fs/udf/inode.c  |  9 ++-
 include/linux/fs.h  | 17 ++--
 18 files changed, 187 insertions(+), 120 deletions(-)

-- 
2.18.0



corruption_errs

2018-08-27 Thread John Petrini
Hi List,

I'm seeing corruption errors when running btrfs device stats but I'm
not sure what that means exactly. I've just completed a full scrub and
it reported no errors. I'm hoping someone here can enlighten me.
Thanks!

[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 331
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 324
[/dev/sde].generation_errs 0
[/dev/sdi].write_io_errs   0
[/dev/sdi].read_io_errs0
[/dev/sdi].flush_io_errs   0
[/dev/sdi].corruption_errs 381
[/dev/sdi].generation_errs 0
[/dev/sdk].write_io_errs   0
[/dev/sdk].read_io_errs0
[/dev/sdk].flush_io_errs   0
[/dev/sdk].corruption_errs 492
[/dev/sdk].generation_errs 0
[/dev/sdl].write_io_errs   0
[/dev/sdl].read_io_errs0
[/dev/sdl].flush_io_errs   0
[/dev/sdl].corruption_errs 449
[/dev/sdl].generation_errs 0
[/dev/sdj].write_io_errs   0
[/dev/sdj].read_io_errs0
[/dev/sdj].flush_io_errs   0
[/dev/sdj].corruption_errs 391
[/dev/sdj].generation_errs 0
[/dev/sdg].write_io_errs   0
[/dev/sdg].read_io_errs0
[/dev/sdg].flush_io_errs   0
[/dev/sdg].corruption_errs 485
[/dev/sdg].generation_errs 0
[/dev/sdh].write_io_errs   0
[/dev/sdh].read_io_errs0
[/dev/sdh].flush_io_errs   0
[/dev/sdh].corruption_errs 444
[/dev/sdh].generation_errs 0
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 398
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 400
[/dev/sdc].generation_errs 0


DRDY errors are not consistent with scrub results

2018-08-27 Thread Cerem Cem ASLAN
Hi,

I'm getting DRDY ERR messages, which cause a system crash on the server:

# tail -n 40 /var/log/kern.log.1
Aug 24 21:04:55 aea3 kernel: [  939.228059] lxc-bridge: port
5(vethI7JDHN) entered disabled state
Aug 24 21:04:55 aea3 kernel: [  939.300602] eth0: renamed from vethQ5Y2OF
Aug 24 21:04:55 aea3 kernel: [  939.328245] IPv6: ADDRCONF(NETDEV_UP):
eth0: link is not ready
Aug 24 21:04:55 aea3 kernel: [  939.328453] IPv6:
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Aug 24 21:04:55 aea3 kernel: [  939.328474] IPv6:
ADDRCONF(NETDEV_CHANGE): vethI7JDHN: link becomes ready
Aug 24 21:04:55 aea3 kernel: [  939.328491] lxc-bridge: port
5(vethI7JDHN) entered blocking state
Aug 24 21:04:55 aea3 kernel: [  939.328493] lxc-bridge: port
5(vethI7JDHN) entered forwarding state
Aug 24 21:04:59 aea3 kernel: [  943.085647] cgroup: cgroup2: unknown
option "nsdelegate"
Aug 24 21:16:15 aea3 kernel: [ 1619.400016] perf: interrupt took too
long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to
79750
Aug 24 21:17:11 aea3 kernel: [ 1675.515815] perf: interrupt took too
long (3137 > 3132), lowering kernel.perf_event_max_sample_rate to
63750
Aug 24 21:17:13 aea3 kernel: [ 1677.080837] cgroup: cgroup2: unknown
option "nsdelegate"
Aug 25 22:38:31 aea3 kernel: [92955.512098] usb 4-2: USB disconnect,
device number 2
Aug 26 02:14:21 aea3 kernel: [105906.035038] lxc-bridge: port
4(vethCTKU4K) entered disabled state
Aug 26 02:15:30 aea3 kernel: [105974.107521] lxc-bridge: port
4(vethO59BPD) entered disabled state
Aug 26 02:15:30 aea3 kernel: [105974.109991] device vethO59BPD left
promiscuous mode
Aug 26 02:15:30 aea3 kernel: [105974.109995] lxc-bridge: port
4(vethO59BPD) entered disabled state
Aug 26 02:15:30 aea3 kernel: [105974.710490] lxc-bridge: port
4(vethBAYODL) entered blocking state
Aug 26 02:15:30 aea3 kernel: [105974.710493] lxc-bridge: port
4(vethBAYODL) entered disabled state
Aug 26 02:15:30 aea3 kernel: [105974.710545] device vethBAYODL entered
promiscuous mode
Aug 26 02:15:30 aea3 kernel: [105974.710598] IPv6:
ADDRCONF(NETDEV_UP): vethBAYODL: link is not ready
Aug 26 02:15:30 aea3 kernel: [105974.710600] lxc-bridge: port
4(vethBAYODL) entered blocking state
Aug 26 02:15:30 aea3 kernel: [105974.710601] lxc-bridge: port
4(vethBAYODL) entered forwarding state
Aug 26 02:16:35 aea3 kernel: [106039.674089] BTRFS: device fsid
5b844c7a-0cbd-40a7-a8e3-6bc636aba033 devid 1 transid 984 /dev/dm-3
Aug 26 02:17:21 aea3 kernel: [106085.352453] ata4.00: failed command: READ DMA
Aug 26 02:17:21 aea3 kernel: [106085.352901] ata4.00: status: { DRDY ERR }
Aug 26 02:18:56 aea3 kernel: [106180.648062] ata4.00: exception Emask
0x0 SAct 0x0 SErr 0x0 action 0x0
Aug 26 02:18:56 aea3 kernel: [106180.648333] ata4.00: BMDMA stat 0x25
Aug 26 02:18:56 aea3 kernel: [106180.648515] ata4.00: failed command: READ DMA
Aug 26 02:18:56 aea3 kernel: [106180.648706] ata4.00: cmd
c8/00:08:80:9c:bb/00:00:00:00:00/e3 tag 0 dma 4096 in
Aug 26 02:18:56 aea3 kernel: [106180.648706]  res
51/40:00:80:9c:bb/00:00:00:00:00/03 Emask 0x9 (media error)
Aug 26 02:18:56 aea3 kernel: [106180.649380] ata4.00: status: { DRDY ERR }
Aug 26 02:18:56 aea3 kernel: [106180.649743] ata4.00: error: { UNC }
Aug 26 02:18:56 aea3 kernel: [106180.779311] ata4.00: configured for UDMA/133
Aug 26 02:18:56 aea3 kernel: [106180.779331] sd 3:0:0:0: [sda] tag#0
FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 26 02:18:56 aea3 kernel: [106180.779335] sd 3:0:0:0: [sda] tag#0
Sense Key : Medium Error [current]
Aug 26 02:18:56 aea3 kernel: [106180.779339] sd 3:0:0:0: [sda] tag#0
Add. Sense: Unrecovered read error - auto reallocate failed
Aug 26 02:18:56 aea3 kernel: [106180.779343] sd 3:0:0:0: [sda] tag#0
CDB: Read(10) 28 00 03 bb 9c 80 00 00 08 00
Aug 26 02:18:56 aea3 kernel: [106180.779346] blk_update_request: I/O
error, dev sda, sector 62626944
Aug 26 02:18:56 aea3 kernel: [106180.779703] BTRFS error (device
dm-2): bdev /dev/mapper/master-root errs: wr 0, rd 40, flush 0,
corrupt 0, gen 0
Aug 26 02:18:56 aea3 kernel: [106180.779936] ata4: EH complete
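
As a cross-check on the log above, the READ(10) CDB can be decoded by hand: bytes 2..5 carry the big-endian LBA and bytes 7..8 the transfer length in blocks. The sketch below is only an illustration (the function names and the logged_cdb array are made up for this example). It assumes 512-byte logical blocks, in which case the 8-block transfer corresponds to the "dma 4096 in" in the ata4.00 line, and the decoded LBA is the sector that blk_update_request reports.

```c
#include <stdint.h>

/* CDB bytes copied from the "CDB: Read(10)" log line. */
static const uint8_t logged_cdb[10] = {
	0x28, 0x00, 0x03, 0xbb, 0x9c, 0x80, 0x00, 0x00, 0x08, 0x00
};

/* Bytes 2..5 of a READ(10) CDB hold the big-endian logical block address. */
static uint32_t read10_lba(const uint8_t cdb[10])
{
	return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	       ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
}

/* Bytes 7..8 hold the big-endian transfer length, counted in blocks. */
static uint16_t read10_blocks(const uint8_t cdb[10])
{
	return (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
}

/* Transfer size in bytes for a given logical block size (assumed 512 here). */
static uint32_t read10_bytes(const uint8_t cdb[10], uint32_t block_size)
{
	return (uint32_t)read10_blocks(cdb) * block_size;
}
```

Calling read10_lba(logged_cdb) yields 62626944 (0x03bb9c80), the same sector named in the I/O error, so the failing read is one 4 KiB request at a fixed location on sda rather than a scattered pattern.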


I have always seen these DRDY errors whenever I experienced physical hard
drive errors, so I expected `btrfs scrub` to show similar errors, but it
doesn't:

btrfs scrub status /mnt/peynir/
scrub status for 8827cb0e-52d7-4f99-90fd-a975cafbfa46
scrub started at Tue Aug 28 00:43:55 2018 and finished after 00:02:07
total bytes scrubbed: 12.45GiB with 0 errors

I took new snapshots of both the root and the LXC containers and nothing
went wrong. To be safe, I reformatted the swap partition (I saw some
messages about the swap partition on the crash screen).

I'm not sure how to proceed at the moment. Taking successful backups
made me think that everything might be okay, but I'm not sure whether I
should continue trusting the drive. What additional checks
should I perform?


btrfs-progs: btrfs-convert: unable to find block group for 0

2018-08-27 Thread Tucker Boniface
Hello, I am trying to convert an ext4 partition to btrfs using 
btrfs-convert. I am running Arch Linux with kernel 4.18.5 and 
btrfs-progs 4.17.1. The full error is inline below.


-> # btrfs-convert /dev/sda1
create btrfs filesystem:
blocksize: 4096
nodesize:  16384
features:  extref, skinny-metadata (default)
creating ext2 image file
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
extent-tree.c:2743: alloc_tree_block: BUG_ON `ret` triggered, value -28
btrfs-convert(+0x1d512)[0x564d08cd3512]
btrfs-convert(btrfs_alloc_free_block+0x1e8)[0x564d08cda188]
btrfs-convert(+0x15bd1)[0x564d08ccbbd1]
btrfs-convert(btrfs_search_slot+0xf27)[0x564d08ccd7d7]
btrfs-convert(btrfs_csum_file_block+0x499)[0x564d08cdf219]
btrfs-convert(+0xe625)[0x564d08cc4625]
btrfs-convert(main+0x1abf)[0x564d08cc3adf]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f1822576223]
btrfs-convert(_start+0x2e)[0x564d08cc41be]
Aborted


BTRFS supports per-subvolume compression, doesn't it?

2018-08-27 Thread Eugene Bright
Greetings!

BTRFS wiki says there is no per-subvolume compression option [1].

At the same time, the following command allows me to set properties per subvolume:
btrfs property set /volume compression zstd

The corresponding get command shows distinct properties for every subvolume.
Should the wiki be updated?

-- 
Kind regards
Eugene Bright
IT engineer


[1] 
https://btrfs.wiki.kernel.org/index.php/Compression#Can_I_set_compression_per-subvolume.3F
