Re: Full filesystem btrfs rebalance kernel panic to read-only lock

2018-11-08 Thread Qu Wenruo


On 2018/11/9 6:40 AM, Pieter Maes wrote:
> Hello,
> 
> So, I've had the full disk issue, so when I tried re-balancing,
> I got a panic, that pushed filesystem read-only and I'm unable to
> balance or grow the filesystem now.
> 
> fs info:
> btrfs fi show /
> Label: none  uuid: 9b591b6b-6040-437e-9398-6883ca3bf1bb
>     Total devices 1 FS bytes used 614.94GiB
>     devid    1 size 750.00GiB used 750.00GiB path /dev/mapper/vg0-root
> 
> btrfs fi df /
> Data, single: total=740.94GiB, used=610.75GiB
> System, DUP: total=32.00MiB, used=112.00KiB
> Metadata, DUP: total=4.50GiB, used=3.94GiB

Metadata usage is the biggest problem.
It's already used up.

> GlobalReserve, single: total=512.00MiB, used=255.06MiB

And the reserved space has also been used; that's pretty bad news.
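
For a single combined view, assuming the fs is still mounted at /, you could
also run:

btrfs filesystem usage /

It should report "Device allocated" equal to the 750.00GiB device size and 0
bytes unallocated (that expectation is only inferred from the fi show / fi df
output above), which is exactly why no new metadata chunk can be created.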

> 
> btrfs sub list -ta /
> ID    gen    top level    path   
> --    ---    -       
> 
> btrfs --version
> btrfs-progs v4.4
> 
> Log when booting machine now from root:
> 
> 
> 
> [   54.746700] [ cut here ]
> [   54.746701] BTRFS: Transaction aborted (error -28)

The transaction can't even be committed due to lack of space.

[snip]
> 
> 
> 
> When booting to a net/livecd rescue
> First I run a check with repair:
> 
> 
> 
> enabling repair mode
> Checking filesystem on /dev/vg0/root
> UUID: 9b591b6b-6040-437e-9398-6883ca3bf1bb
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> reset nbytes for ino 6228034 root 5

That's a minor problem.
So the fs itself is still pretty healthy.

> checking csums
> checking root refs
> found 664259596288 bytes used err is 0
> total csum bytes: 619404608
> total tree bytes: 4237737984
> total fs tree bytes: 1692581888
> total extent tree bytes: 1461665792
> btree space waste bytes: 945044758
> file data blocks allocated: 1568329531392
>  referenced 537131163648
> 
> 
> But then when I try to mount the fs:
> 
> 
[snip]
> 
> rescue kernel: 4.9.120
> 
> 
> 
> I've grown the blockdevice, but there is no way I can grow the fs,
> it doesn't want to mount in my rescue system, and it only mounts
> read-only when booting from it, so I can't do it from there either

Btrfs-progs could do it with some extra dirty work.
(I proposed the offline device resize idea, but haven't implemented it yet.)

You could use this branch:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix

It's a quick and dirty fix that allows "btrfs-corrupt-block -X " to
extend the device size to max.

Please try the above command to see if it solves your problem.
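
A rough sequence from the rescue environment, assuming the usual btrfs-progs
build steps and that the target is the /dev/vg0/root device shown above; this
is experimental, so work on a backup or an image of the device first:

git clone -b dirty_fix https://github.com/adam900710/btrfs-progs.git
cd btrfs-progs
./autogen.sh && ./configure && make
make btrfs-corrupt-block    # in case it is not built by the default target
# with the filesystem unmounted:
./btrfs-corrupt-block -X /dev/vg0/root
# then re-check with "btrfs fi show" and try mounting again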

Thanks,
Qu

> 
> I hope someone can help me out with this.
> Thanks!
> 





Re: [PATCH] Btrfs: incremental send, fix infinite loop when apply children dir moves

2018-11-08 Thread robbieko

robbieko wrote on 2018-11-06 20:23:

Hi,

I can reproduce the infinite loop, the following will describe the
reason and example.

Example:
tree --inodes parent/ send/
parent/
`-- [261]  261
`-- [271]  271
`-- [266]  266
|-- [259]  259
|-- [260]  260
|   `-- [267]  267
|-- [264]  264
|   `-- [258]  258
|   `-- [257]  257
|-- [265]  265
|-- [268]  268
|-- [269]  269
|   `-- [262]  262
|-- [270]  270
|-- [272]  272
|   |-- [263]  263
|   `-- [275]  275
`-- [274]  274
`-- [273]  273
send/
`-- [275]  275
`-- [274]  274
`-- [273]  273
`-- [262]  262
`-- [269]  269
`-- [258]  258
`-- [271]  271
`-- [268]  268
`-- [267]  267
`-- [270]  270
|-- [259]  259
|   `-- [265]  265
`-- [272]  272
`-- [257]  257
|-- [260]  260
`-- [264]  264
`-- [263]  263
`-- [261]  261
`-- [266]  266



1. While processing inode 257, we delay its rename operation because inode 272
has not been renamed (since 272 > 257, that is, beyond the current progress).


2. And so on (inodes 258-274), we get the following waiting
relationships:
257 -> (wait for) 272
258 -> 269
259 -> 270
260 -> 272
261 -> 263
262 -> 274
263 -> 264
264 -> 257
265 -> 270
266 -> 263
267 -> 268
268 -> 271
269 -> 262
270 -> 271
271 -> 258
272 -> 274
274 -> 275

3. While processing inode 275, we rename ./261/271/272/275 to ./275,
and now we start processing the waiting subdirectories in
apply_children_dir_moves.

4. We first initialize the stack as an empty list, then we add 274 to the
stack because 274 is waiting for 275 to complete.
Each time we take the first object from the stack and process it.

5. So we can observe how the objects in the stack change:
loop:
 round  1. 274
        2. 262 -> 272
        3. 272 -> 269
        4. 269 -> 257 -> 260
        5. 257 -> 260 -> 258
        6. 260 -> 258 -> 264
        7. 258 -> 264
        8. 264 -> 271
        9. 271 -> 263
       10. 263 -> 268 -> 270
       11. 268 -> 270 -> 261 -> 266
       12. 270 -> 261 -> 266 -> 267
       13. 261 -> 266 -> 267 -> 259 -> 265  (since 270 has a path loop, we add 270 waiting for 267)
       14. 266 -> 267 -> 259 -> 265
       15. 267 -> 266 -> 259 -> 265  (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
       16. 266 -> 259 -> 265 -> 270
       17. 266 -> 259 -> 265 -> 270  (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
       18. 266 -> 259 -> 265 -> 270  (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
       19. 266 -> 259 -> 265 -> 270  (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
       ... infinite loop

6. In round 13, while processing 270, we delay its rename because 270
has a path loop with 267, and then we add 259 and 265 to the stack,
but we don't remove them from the pending_dir_moves rb_tree.

7. In round 15, while processing 266, we delay its rename because 266
has a path loop with 270. So we look for parent_ino equal to 270 in
pending_dir_moves, and we find ino 259 because it was not removed from
pending_dir_moves. Then we create a new pending_dir_move and attach it
to ino 259; because ino 259 is currently on the stack, the new
pending_dir_move for ino 266 is also indirectly added to the stack,
placed between 267 and 259.

So we fix this problem by removing the node from pending_dir_moves,
which avoids adding a new pending_dir_move to the stack list.



Does anyone have any suggestions?
Later, I will submit the test case to xfstests.



Qu Wenruo wrote on 2018-11-05 22:35:

On 2018/11/5 7:11 PM, Filipe Manana wrote:
On Mon, Nov 5, 2018 at 4:10 AM robbieko wrote:


Filipe Manana wrote on 2018-10-30 19:36:
On Tue, Oct 30, 2018 at 7:00 AM robbieko wrote:


From: Robbie Ko 

In apply_children_dir_moves, we first create an empty list (stack),
then we get an entry from pending_dir_moves and add it to the stack,
but we didn't delete the entry from the rb_tree.

So, in add_pending_dir_move, we create a new entry and then use the
parent_ino in the current rb_tree to find the

Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread Qu Wenruo


On 2018/11/8 10:48 PM, Filipe Manana wrote:
> On Thu, Nov 8, 2018 at 2:37 PM Filipe Manana  wrote:
>>
>> On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo  wrote:
>>>
>>>
>>>
>>> On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
 From: Filipe Manana 

 When creating a block group we don't need to set the log for full commit
 if the new block group is not used for data. Logged items can only point
 to logical addresses of data block groups (through file extent items) so
 there is no need for the next fsync to fall back to a transaction commit
 if the new block group is for metadata.
>>>
>>> Is it possible for the log tree blocks to be allocated in that new block
>>> group?
>>
>> Yes.
> 
> Now I realize what might be your concern, and this would cause trouble.
> Surprised this didn't trigger any problem, as I had this (together
> with other changes) running tests for some weeks already.

Maybe it's related to metadata chunk pre-allocation, so it will be super
hard to hit in the normal case, or some extent allocation policy is
preventing us from allocating tree blocks from the newly created bg.

Thanks,
Qu

> 
>>
>>>
>>> Thanks,
>>> Qu
>>>

 Signed-off-by: Filipe Manana 
 ---
  fs/btrfs/extent-tree.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 577878324799..588fbd1606fb 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
   struct btrfs_block_group_cache *cache;
   int ret;

 - btrfs_set_log_full_commit(fs_info, trans);
 + if (type & BTRFS_BLOCK_GROUP_DATA)
 + btrfs_set_log_full_commit(fs_info, trans);

   cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
   if (!cache)

>>>





Full filesystem btrfs rebalance kernel panic to read-only lock

2018-11-08 Thread Pieter Maes
Hello,

So, I've had the full disk issue, so when I tried re-balancing,
I got a panic, that pushed filesystem read-only and I'm unable to
balance or grow the filesystem now.

fs info:
btrfs fi show /
Label: none  uuid: 9b591b6b-6040-437e-9398-6883ca3bf1bb
    Total devices 1 FS bytes used 614.94GiB
    devid    1 size 750.00GiB used 750.00GiB path /dev/mapper/vg0-root

btrfs fi df /
Data, single: total=740.94GiB, used=610.75GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=4.50GiB, used=3.94GiB
GlobalReserve, single: total=512.00MiB, used=255.06MiB

btrfs sub list -ta /
ID    gen    top level    path   
--    ---    -       

btrfs --version
btrfs-progs v4.4

Log when booting machine now from root:



[   54.746700] [ cut here ]
[   54.746701] BTRFS: Transaction aborted (error -28)
[   54.746734] WARNING: CPU: 6 PID: 481 at
/build/linux-hwe-q2wgwz/linux-hwe-4.15.0/fs/btrfs/extent-tree.c:6997
__btrfs_free_extent.isra.62+0x2a7/0xdf0 [btrfs]
[   54.746734] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
sunrpc autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0
multipath linear ast igb ttm drm_kms_helper dca i2c_algo_bit syscopyarea
sysfillrect sysimgblt raid1 fb_sys_fops bcache ahci ptp drm libahci
pps_core nvme nvme_core wmi
[   54.746748] CPU: 6 PID: 481 Comm: mount Not tainted 4.15.0-36-generic
#39~16.04.1-Ubuntu
[   54.746749] Hardware name: ASUSTeK COMPUTER INC. Z10PA-U8
Series/Z10PA-U8 Series, BIOS 3403 03/01/2017
[   54.746757] RIP: 0010:__btrfs_free_extent.isra.62+0x2a7/0xdf0 [btrfs]
[   54.746757] RSP: 0018:b9540d66b858 EFLAGS: 00010286
[   54.746758] RAX:  RBX: 019518102000 RCX:
0001
[   54.746759] RDX: 0001 RSI: 0002 RDI:
0246
[   54.746759] RBP: b9540d66b900 R08:  R09:
0026
[   54.746760] R10:  R11:  R12:
995a0b52
[   54.746760] R13: ffe4 R14: 9959f7114230 R15:
0005
[   54.746761] FS:  7f467684a840() GS:995a3f38()
knlGS:
[   54.746762] CS:  0010 DS:  ES:  CR0: 80050033
[   54.746762] CR2: 7fca430351e4 CR3: 003f6dd6a004 CR4:
001606e0
[   54.746763] Call Trace:
[   54.746768]  ? check_preempt_wakeup+0x210/0x240
[   54.746771]  ? tracing_record_taskinfo_skip+0x24/0x50
[   54.746772]  ? tracing_record_taskinfo+0x13/0x90
[   54.746780]  __btrfs_run_delayed_refs+0x322/0x11b0 [btrfs]
[   54.746782]  ? __set_page_dirty_nobuffers+0x11e/0x160
[   54.746791]  ? btree_set_page_dirty+0xe/0x10 [btrfs]
[   54.746800]  ? btrfs_mark_buffer_dirty+0x79/0xa0 [btrfs]
[   54.746808]  btrfs_run_delayed_refs+0xf6/0x1c0 [btrfs]
[   54.746817]  btrfs_truncate_inode_items+0xaf7/0x1000 [btrfs]
[   54.746825]  ? reserve_metadata_bytes+0x2e7/0xb10 [btrfs]
[   54.746835]  btrfs_evict_inode+0x47d/0x5a0 [btrfs]
[   54.746838]  evict+0xca/0x1a0
[   54.746839]  iput+0x1d2/0x220
[   54.746849]  btrfs_orphan_cleanup+0x20f/0x490 [btrfs]
[   54.746858]  btrfs_cleanup_fs_roots+0x11b/0x1c0 [btrfs]
[   54.746868]  ? lookup_extent_mapping+0x13/0x20 [btrfs]
[   54.746879]  ? btrfs_check_rw_degradable+0xf5/0x170 [btrfs]
[   54.746885]  btrfs_remount+0x2f1/0x520 [btrfs]
[   54.746887]  ? shrink_dcache_sb+0x12e/0x180
[   54.746889]  do_remount_sb+0x6d/0x1e0
[   54.746890]  do_mount+0x797/0xd00
[   54.746910]  ? memdup_user+0x4f/0x70
[   54.746912]  SyS_mount+0x95/0xe0
[   54.746914]  do_syscall_64+0x73/0x130
[   54.746916]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   54.746917] RIP: 0033:0x7f4676129b9a
[   54.746917] RSP: 002b:7ffc5d515838 EFLAGS: 0202 ORIG_RAX:
00a5
[   54.746918] RAX: ffda RBX: 00978030 RCX:
7f4676129b9a
[   54.746919] RDX: 00978210 RSI: 0097a520 RDI:
00978230
[   54.746919] RBP:  R08:  R09:
0014
[   54.746920] R10: c0ed0020 R11: 0202 R12:
00978230
[   54.746920] R13: 00978210 R14:  R15:
0003
[   54.746921] Code: 8b 45 90 48 8b 40 60 f0 0f ba a8 d8 cd 00 00 02 72
1b 41 83 fd fb 0f 84 5f 03 00 00 44 89 ee 48 c7 c7 58 76 51 c0 e8 a9 55
a2 de <0f> 0b 48 8b 7d 90 44 89 e9 ba 55 1b 00 00 48 c7 c6 80 08 51 c0
[   54.746934] ---[ end trace 18d422c4358ee800 ]---
[   54.746936] BTRFS: error (device dm-0) in __btrfs_free_extent:6997:
errno=-28 No space left
[   54.746937] BTRFS: error (device dm-0) in
btrfs_run_delayed_refs:3082: errno=-28 No space left
[   54.746976] BTRFS error (device dm-0): Error removing orphan entry,
stopping orphan cleanup
[   54.746977] BTRFS error (device dm-0): could not do orphan cleanup -22

root kernel: 4.15.0-36-generic #39~16.04.1-Ubuntu



When booting to a net/livecd rescue
First I run a check with repair:



enabling repair mode
Checking 

Re: [PATCH v15.1 03/13] btrfs: dedupe: Introduce function to add hash into in-memory tree

2018-11-08 Thread Timofey Titovets
On Tue, Nov 6, 2018 at 9:41, Lu Fengqi wrote:
>
> From: Wang Xiaoguang 
>
> Introduce static function inmem_add() to add hash into in-memory tree.
> And now we can implement the btrfs_dedupe_add() interface.
>
> Signed-off-by: Qu Wenruo 
> Signed-off-by: Wang Xiaoguang 
> Reviewed-by: Josef Bacik 
> Signed-off-by: Lu Fengqi 
> ---
>  fs/btrfs/dedupe.c | 150 ++
>  1 file changed, 150 insertions(+)
>
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> index 06523162753d..784bb3a8a5ab 100644
> --- a/fs/btrfs/dedupe.c
> +++ b/fs/btrfs/dedupe.c
> @@ -19,6 +19,14 @@ struct inmem_hash {
> u8 hash[];
>  };
>
> +static inline struct inmem_hash *inmem_alloc_hash(u16 algo)
> +{
> +   if (WARN_ON(algo >= ARRAY_SIZE(btrfs_hash_sizes)))
> +   return NULL;
> +   return kzalloc(sizeof(struct inmem_hash) + btrfs_hash_sizes[algo],
> +   GFP_NOFS);
> +}
> +
>  static struct btrfs_dedupe_info *
>  init_dedupe_info(struct btrfs_ioctl_dedupe_args *dargs)
>  {
> @@ -167,3 +175,145 @@ int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
> /* Place holder for bisect, will be implemented in later patches */
> return 0;
>  }
> +
> +static int inmem_insert_hash(struct rb_root *root,
> +struct inmem_hash *hash, int hash_len)
> +{
> +   struct rb_node **p = &root->rb_node;
> +   struct rb_node *parent = NULL;
> +   struct inmem_hash *entry = NULL;
> +
> +   while (*p) {
> +   parent = *p;
> +   entry = rb_entry(parent, struct inmem_hash, hash_node);
> +   if (memcmp(hash->hash, entry->hash, hash_len) < 0)
> +   p = &(*p)->rb_left;
> +   else if (memcmp(hash->hash, entry->hash, hash_len) > 0)
> +   p = &(*p)->rb_right;
> +   else
> +   return 1;
> +   }
> +   rb_link_node(&hash->hash_node, parent, p);
> +   rb_insert_color(&hash->hash_node, root);
> +   return 0;
> +}
> +
> +static int inmem_insert_bytenr(struct rb_root *root,
> +  struct inmem_hash *hash)
> +{
> +   struct rb_node **p = &root->rb_node;
> +   struct rb_node *parent = NULL;
> +   struct inmem_hash *entry = NULL;
> +
> +   while (*p) {
> +   parent = *p;
> +   entry = rb_entry(parent, struct inmem_hash, bytenr_node);
> +   if (hash->bytenr < entry->bytenr)
> +   p = &(*p)->rb_left;
> +   else if (hash->bytenr > entry->bytenr)
> +   p = &(*p)->rb_right;
> +   else
> +   return 1;
> +   }
> +   rb_link_node(&hash->bytenr_node, parent, p);
> +   rb_insert_color(&hash->bytenr_node, root);
> +   return 0;
> +}
> +
> +static void __inmem_del(struct btrfs_dedupe_info *dedupe_info,
> +   struct inmem_hash *hash)
> +{
> +   list_del(&hash->lru_list);
> +   rb_erase(&hash->hash_node, &dedupe_info->hash_root);
> +   rb_erase(&hash->bytenr_node, &dedupe_info->bytenr_root);
> +
> +   if (!WARN_ON(dedupe_info->current_nr == 0))
> +   dedupe_info->current_nr--;
> +
> +   kfree(hash);
> +}
> +
> +/*
> + * Insert a hash into in-memory dedupe tree
> + * Will remove the least recently used hash if the limit is exceeded.
> + *
> + * If the hash matched an existing one, we won't insert it, to
> + * save memory
> + */
> +static int inmem_add(struct btrfs_dedupe_info *dedupe_info,
> +struct btrfs_dedupe_hash *hash)
> +{
> +   int ret = 0;
> +   u16 algo = dedupe_info->hash_algo;
> +   struct inmem_hash *ihash;
> +
> +   ihash = inmem_alloc_hash(algo);
> +
> +   if (!ihash)
> +   return -ENOMEM;
> +
> +   /* Copy the data out */
> +   ihash->bytenr = hash->bytenr;
> +   ihash->num_bytes = hash->num_bytes;
> +   memcpy(ihash->hash, hash->hash, btrfs_hash_sizes[algo]);
> +
> +   mutex_lock(&dedupe_info->lock);
> +
> +   ret = inmem_insert_bytenr(&dedupe_info->bytenr_root, ihash);
> +   if (ret > 0) {
> +   kfree(ihash);
> +   ret = 0;
> +   goto out;
> +   }
> +
> +   ret = inmem_insert_hash(&dedupe_info->hash_root, ihash,
> +   btrfs_hash_sizes[algo]);
> +   if (ret > 0) {
> +   /*
> +* We only keep one hash in tree to save memory, so if
> +* hash conflicts, free the one to insert.
> +*/
> +   rb_erase(&ihash->bytenr_node, &dedupe_info->bytenr_root);
> +   kfree(ihash);
> +   ret = 0;
> +   goto out;
> +   }
> +
> +   list_add(&ihash->lru_list, &dedupe_info->lru_list);
> +   dedupe_info->current_nr++;
> +
> +   /* Remove the last dedupe hash if we exceed limit */
> +   while (dedupe_info->current_nr > dedupe_info->limit_nr) {
> +   struct inmem_hash *last;
> +
> +   last = list_entry(dedupe_info->lru_list.prev,
> + 

Re: Where is my disk space ?

2018-11-08 Thread Chris Murphy
On Thu, Nov 8, 2018 at 2:27 AM, Barbet Alain  wrote:
> Hi !
> Just to give you end of the story:
> I move my /var/lib/docker to my home (other partition), and my space
> come back ...

I'm not sure why that would matter. Both btrfs du and regular du
showed only ~350M used in /var which is about what I'd expect. And
also the 'btrfs sub list' output doesn't show any subvolumes/snapshots
for Docker. The upstream Docker behavior on Btrfs is that it uses
subvolumes and snapshots for everything, quickly you'll see a lot of
them. However, many distributions override the default Docker behavior,
e.g. with Docker storage setup, and will cause it to always favor a
particular driver. For example the Docker overlay2 driver, which
leverages kernel overlayfs, which will work on any file system
including Btrfs. And I'm not exactly sure where the upper dirs are
stored, but I'd be surprised if they're not in /var.

Anyway, if you're using Docker, moving stuff around will almost
certainly break it. And as I'm an extreme expert in messing up Docker
storage, I can vouch for the strategy of stopping the docker daemon,
recursively deleting everything in /var/lib/docker/ and then starting
Docker. Now you get to go fetch all your images again. And anyway, you
shouldn't be storing any data in the containers, they should be
throwaway things, important data should be stored elsewhere including
any state information for the container. :-D Avoid container misery by
having a workflow that expects containers to be transient disposable
objects.
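
A minimal sketch of that reset, assuming a systemd host and that nothing
under /var/lib/docker needs to be kept (images are re-pulled, named volumes
and container state are destroyed):

systemctl stop docker
rm -rf /var/lib/docker/*
systemctl start docker
docker info | grep -i 'storage driver'   # confirm which graph driver is in use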


-- 
Chris Murphy


Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread Filipe Manana
On Thu, Nov 8, 2018 at 2:37 PM Filipe Manana  wrote:
>
> On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo  wrote:
> >
> >
> >
> > On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> > > From: Filipe Manana 
> > >
> > > When creating a block group we don't need to set the log for full commit
> > > if the new block group is not used for data. Logged items can only point
> > > to logical addresses of data block groups (through file extent items) so
> > > there is no need for the next fsync to fall back to a transaction commit
> > > if the new block group is for metadata.
> >
> > Is it possible for the log tree blocks to be allocated in that new block
> > group?
>
> Yes.

Now I realize what might be your concern, and this would cause trouble.
Surprised this didn't trigger any problem, as I had this (together
with other changes) running tests for some weeks already.

>
> >
> > Thanks,
> > Qu
> >
> > >
> > > Signed-off-by: Filipe Manana 
> > > ---
> > >  fs/btrfs/extent-tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 577878324799..588fbd1606fb 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
> > >   struct btrfs_block_group_cache *cache;
> > >   int ret;
> > >
> > > - btrfs_set_log_full_commit(fs_info, trans);
> > > + if (type & BTRFS_BLOCK_GROUP_DATA)
> > > + btrfs_set_log_full_commit(fs_info, trans);
> > >
> > >   cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
> > >   if (!cache)
> > >
> >


Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread Filipe Manana
On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo  wrote:
>
>
>
> On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When creating a block group we don't need to set the log for full commit
> > if the new block group is not used for data. Logged items can only point
> > to logical addresses of data block groups (through file extent items) so
> > there is no need for the next fsync to fall back to a transaction commit
> > if the new block group is for metadata.
>
> Is it possible for the log tree blocks to be allocated in that new block
> group?

Yes.

>
> Thanks,
> Qu
>
> >
> > Signed-off-by: Filipe Manana 
> > ---
> >  fs/btrfs/extent-tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 577878324799..588fbd1606fb 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
> >   struct btrfs_block_group_cache *cache;
> >   int ret;
> >
> > - btrfs_set_log_full_commit(fs_info, trans);
> > + if (type & BTRFS_BLOCK_GROUP_DATA)
> > + btrfs_set_log_full_commit(fs_info, trans);
> >
> >   cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
> >   if (!cache)
> >
>


Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread Qu Wenruo


On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When creating a block group we don't need to set the log for full commit
> if the new block group is not used for data. Logged items can only point
> to logical addresses of data block groups (through file extent items) so
> there is no need for the next fsync to fall back to a transaction commit
> if the new block group is for metadata.

Is it possible for the log tree blocks to be allocated in that new block
group?

Thanks,
Qu

> 
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/extent-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..588fbd1606fb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>   struct btrfs_block_group_cache *cache;
>   int ret;
>  
> - btrfs_set_log_full_commit(fs_info, trans);
> + if (type & BTRFS_BLOCK_GROUP_DATA)
> + btrfs_set_log_full_commit(fs_info, trans);
>  
>   cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
>   if (!cache)
> 





[PATCH] btrfs: Check for missing device before bio submission in btrfs_map_bio

2018-11-08 Thread Nikolay Borisov
Before btrfs_map_bio submits all stripe bios it does a number of checks
to ensure the device for every stripe is present. However, it doesn't
do a DEV_STATE_MISSING check; instead this is relegated to the lower
level btrfs_schedule_bio (in the async submission case; sync submission
doesn't check DEV_STATE_MISSING at all). Additionally,
btrfs_schedule_bio does a duplicate device->bdev check which has
already been performed in btrfs_map_bio.

This patch moves the DEV_STATE_MISSING check into btrfs_map_bio and
removes the duplicate device->bdev check. Doing so ensures that no bio
cloning/submission happens for either async or sync requests in the face
of a missing device. This makes the async io submission path slightly
shorter in terms of instruction count. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 44c5e8ccb644..3312cad65209 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6106,12 +6106,6 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device,
int should_queue = 1;
struct btrfs_pending_bios *pending_bios;
 
-   if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) ||
-   !device->bdev) {
-   bio_io_error(bio);
-   return;
-   }
-
/* don't bother with additional async steps for reads, right now */
if (bio_op(bio) == REQ_OP_READ) {
btrfsic_submit_bio(bio);
@@ -6240,7 +6234,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
dev = bbio->stripes[dev_nr].dev;
-   if (!dev || !dev->bdev ||
+   if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
+  &dev->dev_state) ||
(bio_op(first_bio) == REQ_OP_WRITE &&
!test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
bbio_error(bbio, first_bio, logical);
-- 
2.17.1



[PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread fdmanana
From: Filipe Manana 

When creating a block group we don't need to set the log for full commit
if the new block group is not used for data. Logged items can only point
to logical addresses of data block groups (through file extent items) so
there is no need for the next fsync to fall back to a transaction commit
if the new block group is for metadata.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..588fbd1606fb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
struct btrfs_block_group_cache *cache;
int ret;
 
-   btrfs_set_log_full_commit(fs_info, trans);
+   if (type & BTRFS_BLOCK_GROUP_DATA)
+   btrfs_set_log_full_commit(fs_info, trans);
 
cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
if (!cache)
-- 
2.11.0



Re: [PATCH v15.1 02/13] btrfs: dedupe: Introduce function to initialize dedupe info

2018-11-08 Thread Timofey Titovets
On Tue, Nov 6, 2018 at 9:41, Lu Fengqi wrote:
>
> From: Wang Xiaoguang 
>
> Add generic function to initialize dedupe info.
>
> Signed-off-by: Qu Wenruo 
> Signed-off-by: Wang Xiaoguang 
> Reviewed-by: Josef Bacik 
> Signed-off-by: Lu Fengqi 
> ---
>  fs/btrfs/Makefile  |   2 +-
>  fs/btrfs/dedupe.c  | 169 +
>  fs/btrfs/dedupe.h  |  12 +++
>  include/uapi/linux/btrfs.h |   3 +
>  4 files changed, 185 insertions(+), 1 deletion(-)
>  create mode 100644 fs/btrfs/dedupe.c
>
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index ca693dd554e9..78fdc87dba39 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -10,7 +10,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o 
> root-tree.o dir-item.o \
>export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
>compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
>reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
> -  uuid-tree.o props.o free-space-tree.o tree-checker.o
> +  uuid-tree.o props.o free-space-tree.o tree-checker.o dedupe.o
>
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> new file mode 100644
> index ..06523162753d
> --- /dev/null
> +++ b/fs/btrfs/dedupe.c
> @@ -0,0 +1,169 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2016 Fujitsu.  All rights reserved.
> + */
> +
> +#include "ctree.h"
> +#include "dedupe.h"
> +#include "btrfs_inode.h"
> +#include "delayed-ref.h"
> +
> +struct inmem_hash {
> +   struct rb_node hash_node;
> +   struct rb_node bytenr_node;
> +   struct list_head lru_list;
> +
> +   u64 bytenr;
> +   u32 num_bytes;
> +
> +   u8 hash[];
> +};
> +
> +static struct btrfs_dedupe_info *
> +init_dedupe_info(struct btrfs_ioctl_dedupe_args *dargs)
> +{
> +   struct btrfs_dedupe_info *dedupe_info;
> +
> +   dedupe_info = kzalloc(sizeof(*dedupe_info), GFP_NOFS);
> +   if (!dedupe_info)
> +   return ERR_PTR(-ENOMEM);
> +
> +   dedupe_info->hash_algo = dargs->hash_algo;
> +   dedupe_info->backend = dargs->backend;
> +   dedupe_info->blocksize = dargs->blocksize;
> +   dedupe_info->limit_nr = dargs->limit_nr;
> +
> +   /* only support SHA256 yet */
> +   dedupe_info->dedupe_driver = crypto_alloc_shash("sha256", 0, 0);
> +   if (IS_ERR(dedupe_info->dedupe_driver)) {
> +   kfree(dedupe_info);
> +   return ERR_CAST(dedupe_info->dedupe_driver);
> +   }
> +
> +   dedupe_info->hash_root = RB_ROOT;
> +   dedupe_info->bytenr_root = RB_ROOT;
> +   dedupe_info->current_nr = 0;
> +   INIT_LIST_HEAD(&dedupe_info->lru_list);
> +   mutex_init(&dedupe_info->lock);
> +
> +   return dedupe_info;
> +}
> +
> +/*
> + * Helper to check if parameters are valid.
> + * The first invalid field will be set to (-1), to info user which parameter
> + * is invalid.
> + * Except for dargs->limit_nr or dargs->limit_mem; in that case, 0 will be returned
> + * to info user, since user can specify any value to limit, except 0.
> + */
> +static int check_dedupe_parameter(struct btrfs_fs_info *fs_info,
> + struct btrfs_ioctl_dedupe_args *dargs)
> +{
> +   u64 blocksize = dargs->blocksize;
> +   u64 limit_nr = dargs->limit_nr;
> +   u64 limit_mem = dargs->limit_mem;
> +   u16 hash_algo = dargs->hash_algo;
> +   u8 backend = dargs->backend;
> +
> +   /*
> +* Set all reserved fields to -1, allow user to detect
> +* unsupported optional parameters.
> +*/
> +   memset(dargs->__unused, -1, sizeof(dargs->__unused));
> +   if (blocksize > BTRFS_DEDUPE_BLOCKSIZE_MAX ||
> +   blocksize < BTRFS_DEDUPE_BLOCKSIZE_MIN ||
> +   blocksize < fs_info->sectorsize ||
> +   !is_power_of_2(blocksize) ||
> +   blocksize < PAGE_SIZE) {
> +   dargs->blocksize = (u64)-1;
> +   return -EINVAL;
> +   }
> +   if (hash_algo >= ARRAY_SIZE(btrfs_hash_sizes)) {
> +   dargs->hash_algo = (u16)-1;
> +   return -EINVAL;
> +   }
> +   if (backend >= BTRFS_DEDUPE_BACKEND_COUNT) {
> +   dargs->backend = (u8)-1;
> +   return -EINVAL;
> +   }
> +
> +   /* Backend specific check */
> +   if (backend == BTRFS_DEDUPE_BACKEND_INMEMORY) {
> +   /* only one limit is accepted for enable*/
> +   if (dargs->limit_nr && dargs->limit_mem) {
> +   dargs->limit_nr = 0;
> +   dargs->limit_mem = 0;
> +   return -EINVAL;
> +   }
> +
> +   if (!limit_nr && !limit_mem)
> +   dargs->limit_nr = BTRFS_DEDUPE_LIMIT_NR_DEFAULT;
> +   else {
> +   u64 tmp = (u64)-1;

Re: [PATCH -next] btrfs: remove set but not used variable 'tree'

2018-11-08 Thread David Sterba
On Thu, Nov 08, 2018 at 02:14:43AM +, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> fs/btrfs/extent_io.c: In function 'end_extent_writepage':
> fs/btrfs/extent_io.c:2406:25: warning:
>  variable 'tree' set but not used [-Wunused-but-set-variable]
> 
> It is not used any more after
> commit 2922040236f9 ("btrfs: Remove extent_io_ops::writepage_end_io_hook")
> 
> Signed-off-by: YueHaibing 

Thanks, the patches are still out of mainline so the commit id is not
stable and I can fold in the fixup. Same for the other one.


Re: [PATCH v15.1 01/13] btrfs: dedupe: Introduce dedupe framework and its header

2018-11-08 Thread Timofey Titovets
On Tue, Nov 6, 2018 at 9:41, Lu Fengqi wrote:
>
> From: Wang Xiaoguang 
>
> Introduce the header for the btrfs in-band (write time) de-duplication
> framework and the needed headers.
>
> The new de-duplication framework is going to support 2 different dedupe
> methods and 1 dedupe hash.
>
> Signed-off-by: Qu Wenruo 
> Signed-off-by: Wang Xiaoguang 
> Signed-off-by: Lu Fengqi 
> ---
>  fs/btrfs/ctree.h   |   7 ++
>  fs/btrfs/dedupe.h  | 128 -
>  fs/btrfs/disk-io.c |   1 +
>  include/uapi/linux/btrfs.h |  34 ++
>  4 files changed, 168 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 80953528572d..910050d904ef 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1118,6 +1118,13 @@ struct btrfs_fs_info {
> spinlock_t ref_verify_lock;
> struct rb_root block_tree;
>  #endif
> +
> +   /*
> +* Inband de-duplication related structures
> +*/
> +   unsigned long dedupe_enabled:1;
> +   struct btrfs_dedupe_info *dedupe_info;
> +   struct mutex dedupe_ioctl_lock;
>  };
>
>  static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h
> index 90281a7a35a8..222ce7b4d827 100644
> --- a/fs/btrfs/dedupe.h
> +++ b/fs/btrfs/dedupe.h
> @@ -6,7 +6,131 @@
>  #ifndef BTRFS_DEDUPE_H
>  #define BTRFS_DEDUPE_H
>
> -/* later in-band dedupe will expand this struct */
> -struct btrfs_dedupe_hash;
> +#include 
>
> +/* 32 bytes for SHA256 */
> +static const int btrfs_hash_sizes[] = { 32 };
> +
> +/*
> + * For caller outside of dedupe.c
> + *
> + * Different dedupe backends should have their own hash structure
> + */
> +struct btrfs_dedupe_hash {
> +   u64 bytenr;
> +   u32 num_bytes;
> +
> +   /* last field is a variable length array of dedupe hash */
> +   u8 hash[];
> +};
> +
> +struct btrfs_dedupe_info {
> +   /* dedupe blocksize */
> +   u64 blocksize;
> +   u16 backend;
> +   u16 hash_algo;
> +
> +   struct crypto_shash *dedupe_driver;
> +
> +   /*
> +* Use a mutex to protect both backends
> +* Even for in-memory backends, the rb-tree can be quite large,
> +* so mutex is better for such use case.
> +*/
> +   struct mutex lock;
> +
> +   /* following members are only used in in-memory backend */
> +   struct rb_root hash_root;
> +   struct rb_root bytenr_root;
> +   struct list_head lru_list;
> +   u64 limit_nr;
> +   u64 current_nr;
> +};
> +
> +static inline int btrfs_dedupe_hash_hit(struct btrfs_dedupe_hash *hash)
> +{
> +   return (hash && hash->bytenr);
> +}
> +
> +/*
> + * Initial inband dedupe info
> + * Called at dedupe enable time.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (from unsupported param to tree creation error for some backends)
> + */
> +int btrfs_dedupe_enable(struct btrfs_fs_info *fs_info,
> +   struct btrfs_ioctl_dedupe_args *dargs);
> +
> +/*
> + * Disable dedupe and invalidate all its dedupe data.
> + * Called at dedupe disable time.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info);
> +
> +/*
> + * Get current dedupe status.
> + * Return 0 for success
> + * No possible error yet
> + */
> +void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
> +struct btrfs_ioctl_dedupe_args *dargs);
> +
> +/*
> + * Calculate hash for dedupe.
> + * Caller must ensure [start, start + dedupe_bs) has valid data.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (error from hash codes)
> + */
> +int btrfs_dedupe_calc_hash(struct btrfs_fs_info *fs_info,
> +  struct inode *inode, u64 start,
> +  struct btrfs_dedupe_hash *hash);
> +
> +/*
> + * Search for duplicated extents by calculated hash
> + * Caller must call btrfs_dedupe_calc_hash() first to get the hash.
> + *
> + * @inode: the inode for we are writing
> + * @file_pos: offset inside the inode
> + * As we will increase extent ref immediately after a hash match,
> + * we need @file_pos and @inode in this case.
> + *
> + * Return > 0 for a hash match, and the extent ref will be
> + * *INCREASED*, and hash->bytenr/num_bytes will record the existing
> + * extent data.
> + * Return 0 for a hash miss. Nothing is done
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_search(struct btrfs_fs_info *fs_info,
> +   struct inode *inode, u64 file_pos,
> +   struct btrfs_dedupe_hash *hash);
> +
> +/*
> + * Add a dedupe hash into dedupe info
> + * Return 0 for success
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_add(struct btrfs_fs_info *fs_info,
> +

Re: Where is my disk space ?

2018-11-08 Thread Barbet Alain
Hi!
Just to give you the end of the story:
I moved my /var/lib/docker to my home (another partition), and my space
came back ...
I'll leave docker there & won't try to put it back on / to see if the problem comes back.
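(For anyone doing the same move, a rough sequence, assuming a systemd host, a
reasonably recent Docker, and no existing /etc/docker/daemon.json that would
need merging; the target path is only an example:

systemctl stop docker
mv /var/lib/docker /home/docker-data
echo '{ "data-root": "/home/docker-data" }' > /etc/docker/daemon.json
systemctl start docker

Setting "data-root" in daemon.json is the supported way to relocate Docker's
storage directory, rather than symlinking the old path.)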
On Wed, Oct 31, 2018 at 08:34, Barbet Alain wrote:
>
> > Also, since you don't have any snapshots, you could also find this
> > conventionally:
> >
> > # du -sh /*
>
>
> Usually yes, but not here. It's just like when you remove a file while
> a process still uses it and writes to it, and fsck will not be happy
> next time.
> But I rebooted & checked with btrfs check without any issue :-/
>
> alian@alian:/> sudo btrfs fi du -s *
>  Total   Exclusive  Set shared  Filename
>1.58MiB 1.58MiB   0.00B  bin
>   42.69MiB42.69MiB   0.00B  boot
>   14.78MiB14.78MiB   0.00B  etc
>  532.40MiB   532.40MiB   0.00B  lib
>9.88MiB 9.88MiB   0.00B  lib64
>  0.00B   0.00B   0.00B  mnt
>   23.96MiB23.96MiB   0.00B  opt
>  128.00KiB   128.00KiB   0.00B  root
>9.74MiB 9.74MiB   0.00B  sbin
>  0.00B   0.00B   0.00B  selinux
>  0.00B   0.00B   0.00B  srv
>   15.92MiB15.92MiB   0.00B  tmp
>4.86GiB 4.86GiB   0.00B  usr
>  345.65MiB   345.65MiB   0.00B  var
>
> alian@alian:~> sudo du --exclude /home -sh /*
> 2,1M/bin
> 43M /boot
> 0   /dev
> 20M /etc
> 534M/lib
> 11M /lib64
> 0   /mnt
> 24M /opt
> 0   /proc
> 172K/root
> 18M /run
> 11M /sbin
> 0   /selinux
> 0   /srv
> 0   /sys
> 16M /tmp
> 5,2G/usr
> 355M/var


Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state

2018-11-08 Thread Anand Jain




On 11/08/2018 04:52 PM, Nikolay Borisov wrote:



On 8.11.18 10:33, Anand Jain wrote:



On 11/07/2018 08:19 PM, Nikolay Borisov wrote:



On 7.11.18 13:43, Anand Jain wrote:

+    /* scrub for replace must not be running in suspended state */
+    if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
+    ASSERT(0);


ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)



There will be a substantial difference in the code when compiled with and
without CONFIG_BTRFS_ASSERT [1]. That is, btrfs_scrub_cancel(fs_info)
won't be run at all, so I would like to keep it as it is.


Fair point, in that case do:

ret = btrfs_scrub_cancel(fs_info);
ASSERT(ret != -ENOTCONN);


Fixed.

Thanks, Anand


result




[1]
--
./fs/btrfs/ctree.h
#ifdef CONFIG_BTRFS_ASSERT

__cold
static inline void assfail(const char *expr, const char *file, int line)
{
     pr_err("assertion failed: %s, file: %s, line: %d\n",
    expr, file, line);
     BUG();
}

#define ASSERT(expr)    \
     (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
#else
#define ASSERT(expr)    ((void)0)
#endif
---

Thanks, Anand




Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state

2018-11-08 Thread Nikolay Borisov



On 8.11.18 10:33, Anand Jain wrote:
> 
> 
> On 11/07/2018 08:19 PM, Nikolay Borisov wrote:
>>
>>
>> On 7.11.18 13:43, Anand Jain wrote:
>>> +    /* scrub for replace must not be running in suspended state */
>>> +    if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
>>> +    ASSERT(0);
>>
>> ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)
>>
> 
> There will be a substantial difference in the code when compiled with and
> without CONFIG_BTRFS_ASSERT [1]. That is, btrfs_scrub_cancel(fs_info)
> won't be run at all, so I would like to keep it as it is.

Fair point, in that case do:

ret = btrfs_scrub_cancel(fs_info);
ASSERT(ret != -ENOTCONN);

result


> 
> [1]
> --
> ./fs/btrfs/ctree.h
> #ifdef CONFIG_BTRFS_ASSERT
> 
> __cold
> static inline void assfail(const char *expr, const char *file, int line)
> {
>     pr_err("assertion failed: %s, file: %s, line: %d\n",
>    expr, file, line);
>     BUG();
> }
> 
> #define ASSERT(expr)    \
>     (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
> #else
> #define ASSERT(expr)    ((void)0)
> #endif
> ---
> 
> Thanks, Anand
> 
> 


Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state

2018-11-08 Thread Anand Jain




On 11/07/2018 08:19 PM, Nikolay Borisov wrote:



On 7.11.18 13:43, Anand Jain wrote:

+   /* scrub for replace must not be running in suspended state */
+   if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
+   ASSERT(0);


ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)



There will be a substantial difference in the code when compiled with and
without CONFIG_BTRFS_ASSERT [1]. That is, btrfs_scrub_cancel(fs_info)
won't be run at all, so I would like to keep it as it is.

[1]
--
./fs/btrfs/ctree.h
#ifdef CONFIG_BTRFS_ASSERT

__cold
static inline void assfail(const char *expr, const char *file, int line)
{
pr_err("assertion failed: %s, file: %s, line: %d\n",
   expr, file, line);
BUG();
}

#define ASSERT(expr)\
(likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
#else
#define ASSERT(expr)((void)0)
#endif
---

Thanks, Anand




[PATCH 0/3] Cleanups following optional extent_io_ops callbacks removal

2018-11-08 Thread Nikolay Borisov
Here are 3 minor patches that further clean up writepage_delalloc. The first
one moves the extent_locked check into the caller of writepage_delalloc, since
this seems more natural. This paves the way for the second patch, which removes
epd as an argument to writepage_delalloc. The final patch was suggested by 
Josef and removes an extent_state argument which has never been used. 

Nikolay Borisov (3):
  btrfs: Move epd::extent_locked check to writepage_delalloc's caller
  btrfs: Remove extent_page_data argument from writepage_delalloc
  btrfs: Remove unused extent_state argument from
btrfs_writepage_endio_finish_ordered

 fs/btrfs/compression.c |  6 --
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/extent_io.c   | 36 +---
 fs/btrfs/inode.c   |  7 +++
 4 files changed, 26 insertions(+), 27 deletions(-)

-- 
2.17.1



[PATCH 3/3] btrfs: Remove unused extent_state argument from btrfs_writepage_endio_finish_ordered

2018-11-08 Thread Nikolay Borisov
This parameter was never used, yet was part of the interface of the
function ever since its introduction as extent_io_ops::writepage_end_io_hook
in e6dcd2dc9c48 ("Btrfs: New data=ordered implementation"). Now that
NULL is passed everywhere as a value for this parameter let's remove it
for good. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/compression.c |  6 --
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/extent_io.c   | 12 +---
 fs/btrfs/inode.c   |  7 +++
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index bde8d0487bbb..717d9300dd18 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -251,8 +251,10 @@ static void end_compressed_bio_write(struct bio *bio)
tree = &BTRFS_I(inode)->io_tree;
cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
-   cb->start, cb->start + cb->len - 1, NULL,
-   bio->bi_status ? BLK_STS_OK : BLK_STS_NOTSUPP);
+cb->start,
+cb->start + cb->len - 1,
+bio->bi_status ?
+BLK_STS_OK : BLK_STS_NOTSUPP);
cb->compressed_pages[0]->mapping = NULL;
 
end_compressed_writeback(inode, cb);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b41ec42f405..c48fcaf4004d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3179,7 +3179,7 @@ int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
 struct btrfs_root *new_root,
 struct btrfs_root *parent_root,
 u64 new_dirid);
-void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state,
+ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state 
*state,
   unsigned *bits);
 void btrfs_clear_delalloc_extent(struct inode *inode,
 struct extent_state *state, unsigned *bits);
@@ -3231,7 +3231,7 @@ int btrfs_run_delalloc_range(void *private_data, struct 
page *locked_page,
struct writeback_control *wbc);
 int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
 void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
-   u64 end, struct extent_state *state, int uptodate);
+ u64 end, int uptodate);
 extern const struct dentry_operations btrfs_dentry_operations;
 
 /* ioctl.c */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cca9d3cbe74a..f3bf7f9c13c0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2408,7 +2408,7 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
 
tree = &BTRFS_I(page->mapping->host)->io_tree;
 
-   btrfs_writepage_endio_finish_ordered(page, start, end, NULL, uptodate);
+   btrfs_writepage_endio_finish_ordered(page, start, end, uptodate);
 
if (!uptodate) {
ClearPageUptodate(page);
@@ -3329,8 +3329,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 
end = page_end;
if (i_size <= start) {
-   btrfs_writepage_endio_finish_ordered(page, start, page_end,
-NULL, 1);
+   btrfs_writepage_endio_finish_ordered(page, start, page_end, 1);
goto done;
}
 
@@ -3342,7 +3341,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 
if (cur >= i_size) {
btrfs_writepage_endio_finish_ordered(page, cur,
-page_end, NULL, 1);
+page_end, 1);
break;
}
em = btrfs_get_extent(BTRFS_I(inode), page, pg_offset, cur,
@@ -3379,7 +3378,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
if (!compressed)
btrfs_writepage_endio_finish_ordered(page, cur,
cur + iosize - 1,
-   NULL, 1);
+   1);
else if (compressed) {
/* we don't want to end_page_writeback on
 * a compressed extent.  this happens
@@ -4066,8 +4065,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
ret = __extent_writepage(page, &wbc_writepages, &epd);
else {
btrfs_writepage_endio_finish_ordered(page, start,

[PATCH 1/3] btrfs: Move epd::extent_locked check to writepage_delalloc's caller

2018-11-08 Thread Nikolay Borisov
If epd::extent_locked is set then writepage_delalloc terminates. Make
this a bit more apparent in the caller by simply bubbling the check up.
This makes it possible to remove epd as an argument to writepage_delalloc in a
future patch. No functional change.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 46c299560f4f..e1ce07b2d33a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3215,8 +3215,6 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
int ret;
int page_started = 0;
 
-   if (epd->extent_locked)
-   return 0;
 
while (delalloc_end < page_end) {
nr_delalloc = find_lock_delalloc_range(inode, tree,
@@ -3472,11 +3470,14 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 
set_page_extent_mapped(page);
 
-   ret = writepage_delalloc(inode, page, wbc, epd, start, &nr_written);
-   if (ret == 1)
-   goto done_unlocked;
-   if (ret)
-   goto done;
+   if (!epd->extent_locked) {
+   ret = writepage_delalloc(inode, page, wbc, epd, start,
+&nr_written);
+   if (ret == 1)
+   goto done_unlocked;
+   if (ret)
+   goto done;
+   }
 
ret = __extent_writepage_io(inode, page, wbc, epd,
i_size, nr_written, write_flags, );
-- 
2.17.1



[PATCH 2/3] btrfs: Remove extent_page_data argument from writepage_delalloc

2018-11-08 Thread Nikolay Borisov
The only remaining use of the 'epd' argument in writepage_delalloc is
to reference the extent_io_tree which was set in extent_writepages. Since
it is guaranteed that page->mapping of any page passed to
writepage_delalloc (and __extent_writepage as the sole caller) is
equal to that passed to extent_writepages, we can directly get the
io_tree via the already passed inode (which is also taken from
page->mapping->host). No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e1ce07b2d33a..cca9d3cbe74a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3202,12 +3202,12 @@ static void update_nr_written(struct writeback_control *wbc,
  * This returns < 0 if there were errors (page still locked)
  */
 static noinline_for_stack int writepage_delalloc(struct inode *inode,
- struct page *page, struct writeback_control *wbc,
- struct extent_page_data *epd,
- u64 delalloc_start,
- unsigned long *nr_written)
+struct page *page,
+struct writeback_control *wbc,
+u64 delalloc_start,
+unsigned long *nr_written)
 {
-   struct extent_io_tree *tree = epd->tree;
+   struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
u64 page_end = delalloc_start + PAGE_SIZE - 1;
u64 nr_delalloc;
u64 delalloc_to_write = 0;
@@ -3471,8 +3471,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
set_page_extent_mapped(page);
 
if (!epd->extent_locked) {
-   ret = writepage_delalloc(inode, page, wbc, epd, start,
-&nr_written);
+   ret = writepage_delalloc(inode, page, wbc, start, &nr_written);
if (ret == 1)
goto done_unlocked;
if (ret)
-- 
2.17.1



Re: [PATCH 7/9] btrfs: quiten warn if the replace is canceled at finish

2018-11-08 Thread Anand Jain




On 11/07/2018 08:17 PM, Nikolay Borisov wrote:



On 7.11.18 13:43, Anand Jain wrote:

-   WARN_ON(ret);
+   if (ret != -ECANCELED)
+   WARN_ON(ret);


WARN_ON(ret && ret != -ECANCELED)



Will fix.
Thanks, Anand