Re: Full filesystem btrfs rebalance kernel panic to read-only lock
On 2018/11/9 6:40 AM, Pieter Maes wrote:
> Hello,
>
> So, I've had the full disk issue, so when I tried re-balancing,
> I got a panic, that pushed filesystem read-only and I'm unable to
> balance or grow the filesystem now.
>
> fs info:
> btrfs fi show /
> Label: none uuid: 9b591b6b-6040-437e-9398-6883ca3bf1bb
> Total devices 1 FS bytes used 614.94GiB
> devid 1 size 750.00GiB used 750.00GiB path /dev/mapper/vg0-root
>
> btrfs fi df /
> Data, single: total=740.94GiB, used=610.75GiB
> System, DUP: total=32.00MiB, used=112.00KiB
> Metadata, DUP: total=4.50GiB, used=3.94GiB

Metadata usage is the biggest problem. It's already used up.

> GlobalReserve, single: total=512.00MiB, used=255.06MiB

And the reserved space has also been used; that's pretty bad news.

>
> btrfs sub list -ta /
> ID gen top level path
> -- --- -
>
> btrfs --version
> btrfs-progs v4.4
>
> Log when booting machine now from root:
>
> [ 54.746700] [ cut here ]
> [ 54.746701] BTRFS: Transaction aborted (error -28)

The transaction can't even be committed due to lack of space.

[snip]

> When booting to a net/livecd rescue
> First I run a check with repair:
>
> enabling repair mode
> Checking filesystem on /dev/vg0/root
> UUID: 9b591b6b-6040-437e-9398-6883ca3bf1bb
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> reset nbytes for ino 6228034 root 5

It's a minor problem, so the fs itself is still pretty healthy.
> checking csums
> checking root refs
> found 664259596288 bytes used err is 0
> total csum bytes: 619404608
> total tree bytes: 4237737984
> total fs tree bytes: 1692581888
> total extent tree bytes: 1461665792
> btree space waste bytes: 945044758
> file data blocks allocated: 1568329531392
> referenced 537131163648
>
> But then when I try to mount the fs:
>
> [snip]
>
> rescue kernel: 4.9.120
>
> I've grown the blockdevice, but there is no way I can grow the fs,
> it doesn't want to mount in my rescue system, and it only mounts
> read-only when booting from it, so I can't do it from there either

Btrfs-progs could do it with some extra dirty work.
(I proposed the offline device resize idea, but didn't implement it yet.)

You could use this branch:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix

It's a quick and dirty fix to allow "btrfs-corrupt-block -X " to extend the device size to the max.

Please try the above command to see if it solves your problem.

Thanks,
Qu

> I hope someone can help me out with this.
> Thanks!
Re: [PATCH] Btrfs: incremental send, fix infinite loop when apply children dir moves
robbieko wrote on 2018-11-06 20:23:

Hi,

I can reproduce the infinite loop; the following describes the reason, with an example.

Example: tree --inodes parent/ send/

parent/
`-- [261] 261
`-- [271] 271
`-- [266] 266
|-- [259] 259
|-- [260] 260
| `-- [267] 267
|-- [264] 264
| `-- [258] 258
| `-- [257] 257
|-- [265] 265
|-- [268] 268
|-- [269] 269
| `-- [262] 262
|-- [270] 270
|-- [272] 272
| |-- [263] 263
| `-- [275] 275
`-- [274] 274
`-- [273] 273

send/
`-- [275] 275
`-- [274] 274
`-- [273] 273
`-- [262] 262
`-- [269] 269
`-- [258] 258
`-- [271] 271
`-- [268] 268
`-- [267] 267
`-- [270] 270
|-- [259] 259
| `-- [265] 265
`-- [272] 272
`-- [257] 257
|-- [260] 260
`-- [264] 264
`-- [263] 263
`-- [261] 261
`-- [266] 266

1. While processing inode 257, we delay its rename operation because inode 272 has not been renamed (since 272 > 257, that is, beyond the current progress).

2. And so on (inodes 258-274); we get a bunch of waiting relationships:

257 -> (waits for) 272
258 -> 269
259 -> 270
260 -> 272
261 -> 263
262 -> 274
263 -> 264
264 -> 257
265 -> 270
266 -> 263
267 -> 268
268 -> 271
269 -> 262
270 -> 271
271 -> 258
272 -> 274
274 -> 275

3. While processing inode 275, we rename ./261/271/272/275 to ./275, and then we start processing the waiting subdirectories in apply_children_dir_moves.

4. We first initialize the stack to an empty list, then we add 274 to the stack because 274 is waiting for 275 to complete. Each time we take the first object on the stack and process it.

5. So we can observe the changes of the objects on the stack, round by round:

1. 274
2. 262 -> 272
3. 272 -> 269
4. 269 -> 257 -> 260
5. 257 -> 260 -> 258
6. 260 -> 258 -> 264
7. 258 -> 264
8. 264 -> 271
9. 271 -> 263
10. 263 -> 268 -> 270
11. 268 -> 270 -> 261 -> 266
12. 270 -> 261 -> 266 -> 267
13. 261 -> 266 -> 267 -> 259 -> 265 (since 270 has a path loop, we add 270 waiting for 267)
14. 266 -> 267 -> 259 -> 265
15. 267 -> 266 -> 259 -> 265 (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
16. 266 -> 259 -> 265 -> 270
17. 266 -> 259 -> 265 -> 270 (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
18. 266 -> 259 -> 265 -> 270 (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
19. 266 -> 259 -> 265 -> 270 (since 266 has a path loop, we add 266 waiting for 270, but we don't add it to the stack)
... infinite loop

6. In round 13, we process 270; we delay the rename because 270 has a path loop with 267, and then we add 259 and 265 to the stack, but we don't remove 270 from the pending_dir_moves rb-tree.

7. In round 15, we process 266; we delay the rename because 266 has a path loop with 270, so we look for parent_ino equal to 270 in pending_dir_moves, and we find ino 259 because it was not removed from pending_dir_moves. Then we create a new pending_dir and join it to ino 259; because ino 259 is currently on the stack, the new pending_dir (ino 266) is also indirectly added to the stack, placed between 267 and 259.

So we fix this problem by removing the node from pending_dir_moves, which avoids adding a new pending_dir_move to the stack list.

Does anyone have any suggestions? Later, I will submit the test case to xfstests.

Qu Wenruo wrote on 2018-11-05 22:35:
On 2018/11/5 7:11 PM, Filipe Manana wrote:
On Mon, Nov 5, 2018 at 4:10 AM robbieko wrote:
Filipe Manana wrote on 2018-10-30 19:36:
On Tue, Oct 30, 2018 at 7:00 AM robbieko wrote:
From: Robbie Ko

In apply_children_dir_moves, we first create an empty list (stack), then we get an entry from pending_dir_moves and add it to the stack, but we didn't delete the entry from the rb_tree. So, in add_pending_dir_move, we create a new entry and then use the parent_ino in the current rb_tree to find the
Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups
On 2018/11/8 10:48 PM, Filipe Manana wrote:
> On Thu, Nov 8, 2018 at 2:37 PM Filipe Manana wrote:
>>
>> On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo wrote:
>>>
>>> On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
>>>> From: Filipe Manana
>>>>
>>>> When creating a block group we don't need to set the log for full commit
>>>> if the new block group is not used for data. Logged items can only point
>>>> to logical addresses of data block groups (through file extent items) so
>>>> there is no need for the next fsync to fall back to a transaction commit
>>>> if the new block group is for metadata.
>>>
>>> Is it possible for the log tree blocks to be allocated in that new block
>>> group?
>>
>> Yes.
>
> Now I realize what might be your concern, and this would cause trouble.
> Surprised this didn't trigger any problem, and I had this (together
> with other changes) running tests for some weeks already.

Maybe it's related to metadata chunk pre-allocation, so it will be super hard to hit in the normal case, or some extent allocation policy prevents us from allocating a tree block from the newly created bg.

Thanks,
Qu

>>> Thanks,
>>> Qu
>>>
>>>> Signed-off-by: Filipe Manana
>>>> ---
>>>>  fs/btrfs/extent-tree.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 577878324799..588fbd1606fb 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>>>>  	struct btrfs_block_group_cache *cache;
>>>>  	int ret;
>>>>
>>>> -	btrfs_set_log_full_commit(fs_info, trans);
>>>> +	if (type & BTRFS_BLOCK_GROUP_DATA)
>>>> +		btrfs_set_log_full_commit(fs_info, trans);
>>>>
>>>>  	cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
>>>>  	if (!cache)
Full filesystem btrfs rebalance kernel panic to read-only lock
Hello,

So, I've had the full disk issue, so when I tried re-balancing,
I got a panic, that pushed filesystem read-only and I'm unable to
balance or grow the filesystem now.

fs info:

btrfs fi show /
Label: none uuid: 9b591b6b-6040-437e-9398-6883ca3bf1bb
Total devices 1 FS bytes used 614.94GiB
devid 1 size 750.00GiB used 750.00GiB path /dev/mapper/vg0-root

btrfs fi df /
Data, single: total=740.94GiB, used=610.75GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=4.50GiB, used=3.94GiB
GlobalReserve, single: total=512.00MiB, used=255.06MiB

btrfs sub list -ta /
ID gen top level path
-- --- -

btrfs --version
btrfs-progs v4.4

Log when booting machine now from root:

[ 54.746700] [ cut here ]
[ 54.746701] BTRFS: Transaction aborted (error -28)
[ 54.746734] WARNING: CPU: 6 PID: 481 at /build/linux-hwe-q2wgwz/linux-hwe-4.15.0/fs/btrfs/extent-tree.c:6997 __btrfs_free_extent.isra.62+0x2a7/0xdf0 [btrfs]
[ 54.746734] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ast igb ttm drm_kms_helper dca i2c_algo_bit syscopyarea sysfillrect sysimgblt raid1 fb_sys_fops bcache ahci ptp drm libahci pps_core nvme nvme_core wmi
[ 54.746748] CPU: 6 PID: 481 Comm: mount Not tainted 4.15.0-36-generic #39~16.04.1-Ubuntu
[ 54.746749] Hardware name: ASUSTeK COMPUTER INC. Z10PA-U8 Series/Z10PA-U8 Series, BIOS 3403 03/01/2017
[ 54.746757] RIP: 0010:__btrfs_free_extent.isra.62+0x2a7/0xdf0 [btrfs]
[ 54.746757] RSP: 0018:b9540d66b858 EFLAGS: 00010286
[ 54.746758] RAX: RBX: 019518102000 RCX: 0001
[ 54.746759] RDX: 0001 RSI: 0002 RDI: 0246
[ 54.746759] RBP: b9540d66b900 R08: R09: 0026
[ 54.746760] R10: R11: R12: 995a0b52
[ 54.746760] R13: ffe4 R14: 9959f7114230 R15: 0005
[ 54.746761] FS: 7f467684a840() GS:995a3f38() knlGS:
[ 54.746762] CS: 0010 DS: ES: CR0: 80050033
[ 54.746762] CR2: 7fca430351e4 CR3: 003f6dd6a004 CR4: 001606e0
[ 54.746763] Call Trace:
[ 54.746768] ? check_preempt_wakeup+0x210/0x240
[ 54.746771] ? tracing_record_taskinfo_skip+0x24/0x50
[ 54.746772] ? tracing_record_taskinfo+0x13/0x90
[ 54.746780] __btrfs_run_delayed_refs+0x322/0x11b0 [btrfs]
[ 54.746782] ? __set_page_dirty_nobuffers+0x11e/0x160
[ 54.746791] ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 54.746800] ? btrfs_mark_buffer_dirty+0x79/0xa0 [btrfs]
[ 54.746808] btrfs_run_delayed_refs+0xf6/0x1c0 [btrfs]
[ 54.746817] btrfs_truncate_inode_items+0xaf7/0x1000 [btrfs]
[ 54.746825] ? reserve_metadata_bytes+0x2e7/0xb10 [btrfs]
[ 54.746835] btrfs_evict_inode+0x47d/0x5a0 [btrfs]
[ 54.746838] evict+0xca/0x1a0
[ 54.746839] iput+0x1d2/0x220
[ 54.746849] btrfs_orphan_cleanup+0x20f/0x490 [btrfs]
[ 54.746858] btrfs_cleanup_fs_roots+0x11b/0x1c0 [btrfs]
[ 54.746868] ? lookup_extent_mapping+0x13/0x20 [btrfs]
[ 54.746879] ? btrfs_check_rw_degradable+0xf5/0x170 [btrfs]
[ 54.746885] btrfs_remount+0x2f1/0x520 [btrfs]
[ 54.746887] ? shrink_dcache_sb+0x12e/0x180
[ 54.746889] do_remount_sb+0x6d/0x1e0
[ 54.746890] do_mount+0x797/0xd00
[ 54.746910] ? memdup_user+0x4f/0x70
[ 54.746912] SyS_mount+0x95/0xe0
[ 54.746914] do_syscall_64+0x73/0x130
[ 54.746916] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 54.746917] RIP: 0033:0x7f4676129b9a
[ 54.746917] RSP: 002b:7ffc5d515838 EFLAGS: 0202 ORIG_RAX: 00a5
[ 54.746918] RAX: ffda RBX: 00978030 RCX: 7f4676129b9a
[ 54.746919] RDX: 00978210 RSI: 0097a520 RDI: 00978230
[ 54.746919] RBP: R08: R09: 0014
[ 54.746920] R10: c0ed0020 R11: 0202 R12: 00978230
[ 54.746920] R13: 00978210 R14: R15: 0003
[ 54.746921] Code: 8b 45 90 48 8b 40 60 f0 0f ba a8 d8 cd 00 00 02 72 1b 41 83 fd fb 0f 84 5f 03 00 00 44 89 ee 48 c7 c7 58 76 51 c0 e8 a9 55 a2 de <0f> 0b 48 8b 7d 90 44 89 e9 ba 55 1b 00 00 48 c7 c6 80 08 51 c0
[ 54.746934] ---[ end trace 18d422c4358ee800 ]---
[ 54.746936] BTRFS: error (device dm-0) in __btrfs_free_extent:6997: errno=-28 No space left
[ 54.746937] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:3082: errno=-28 No space left
[ 54.746976] BTRFS error (device dm-0): Error removing orphan entry, stopping orphan cleanup
[ 54.746977] BTRFS error (device dm-0): could not do orphan cleanup -22

root kernel: 4.15.0-36-generic #39~16.04.1-Ubuntu

When booting to a net/livecd rescue
First I run a check with repair:

enabling repair mode
Checking
Re: [PATCH v15.1 03/13] btrfs: dedupe: Introduce function to add hash into in-memory tree
Tue, 6 Nov 2018 at 9:41, Lu Fengqi :
>
> From: Wang Xiaoguang
>
> Introduce static function inmem_add() to add hash into in-memory tree.
> And now we can implement the btrfs_dedupe_add() interface.
>
> Signed-off-by: Qu Wenruo
> Signed-off-by: Wang Xiaoguang
> Reviewed-by: Josef Bacik
> Signed-off-by: Lu Fengqi
> ---
>  fs/btrfs/dedupe.c | 150 ++
>  1 file changed, 150 insertions(+)
>
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> index 06523162753d..784bb3a8a5ab 100644
> --- a/fs/btrfs/dedupe.c
> +++ b/fs/btrfs/dedupe.c
> @@ -19,6 +19,14 @@ struct inmem_hash {
>  	u8 hash[];
>  };
>
> +static inline struct inmem_hash *inmem_alloc_hash(u16 algo)
> +{
> +	if (WARN_ON(algo >= ARRAY_SIZE(btrfs_hash_sizes)))
> +		return NULL;
> +	return kzalloc(sizeof(struct inmem_hash) + btrfs_hash_sizes[algo],
> +		       GFP_NOFS);
> +}
> +
>  static struct btrfs_dedupe_info *
>  init_dedupe_info(struct btrfs_ioctl_dedupe_args *dargs)
>  {
> @@ -167,3 +175,145 @@ int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
>  	/* Place holder for bisect, will be implemented in later patches */
>  	return 0;
>  }
> +
> +static int inmem_insert_hash(struct rb_root *root,
> +			     struct inmem_hash *hash, int hash_len)
> +{
> +	struct rb_node **p = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct inmem_hash *entry = NULL;
> +
> +	while (*p) {
> +		parent = *p;
> +		entry = rb_entry(parent, struct inmem_hash, hash_node);
> +		if (memcmp(hash->hash, entry->hash, hash_len) < 0)
> +			p = &(*p)->rb_left;
> +		else if (memcmp(hash->hash, entry->hash, hash_len) > 0)
> +			p = &(*p)->rb_right;
> +		else
> +			return 1;
> +	}
> +	rb_link_node(&hash->hash_node, parent, p);
> +	rb_insert_color(&hash->hash_node, root);
> +	return 0;
> +}
> +
> +static int inmem_insert_bytenr(struct rb_root *root,
> +			       struct inmem_hash *hash)
> +{
> +	struct rb_node **p = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct inmem_hash *entry = NULL;
> +
> +	while (*p) {
> +		parent = *p;
> +		entry = rb_entry(parent, struct inmem_hash, bytenr_node);
> +		if (hash->bytenr < entry->bytenr)
> +			p = &(*p)->rb_left;
> +		else if (hash->bytenr > entry->bytenr)
> +			p = &(*p)->rb_right;
> +		else
> +			return 1;
> +	}
> +	rb_link_node(&hash->bytenr_node, parent, p);
> +	rb_insert_color(&hash->bytenr_node, root);
> +	return 0;
> +}
> +
> +static void __inmem_del(struct btrfs_dedupe_info *dedupe_info,
> +			struct inmem_hash *hash)
> +{
> +	list_del(&hash->lru_list);
> +	rb_erase(&hash->hash_node, &dedupe_info->hash_root);
> +	rb_erase(&hash->bytenr_node, &dedupe_info->bytenr_root);
> +
> +	if (!WARN_ON(dedupe_info->current_nr == 0))
> +		dedupe_info->current_nr--;
> +
> +	kfree(hash);
> +}
> +
> +/*
> + * Insert a hash into in-memory dedupe tree
> + * Will remove exceeding last recent use hash.
> + *
> + * If the hash matched with an existing one, we won't insert it, to
> + * save memory
> + */
> +static int inmem_add(struct btrfs_dedupe_info *dedupe_info,
> +		     struct btrfs_dedupe_hash *hash)
> +{
> +	int ret = 0;
> +	u16 algo = dedupe_info->hash_algo;
> +	struct inmem_hash *ihash;
> +
> +	ihash = inmem_alloc_hash(algo);
> +
> +	if (!ihash)
> +		return -ENOMEM;
> +
> +	/* Copy the data out */
> +	ihash->bytenr = hash->bytenr;
> +	ihash->num_bytes = hash->num_bytes;
> +	memcpy(ihash->hash, hash->hash, btrfs_hash_sizes[algo]);
> +
> +	mutex_lock(&dedupe_info->lock);
> +
> +	ret = inmem_insert_bytenr(&dedupe_info->bytenr_root, ihash);
> +	if (ret > 0) {
> +		kfree(ihash);
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	ret = inmem_insert_hash(&dedupe_info->hash_root, ihash,
> +				btrfs_hash_sizes[algo]);
> +	if (ret > 0) {
> +		/*
> +		 * We only keep one hash in tree to save memory, so if
> +		 * hash conflicts, free the one to insert.
> +		 */
> +		rb_erase(&ihash->bytenr_node, &dedupe_info->bytenr_root);
> +		kfree(ihash);
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	list_add(&ihash->lru_list, &dedupe_info->lru_list);
> +	dedupe_info->current_nr++;
> +
> +	/* Remove the last dedupe hash if we exceed limit */
> +	while (dedupe_info->current_nr > dedupe_info->limit_nr) {
> +		struct inmem_hash *last;
> +
> +		last = list_entry(dedupe_info->lru_list.prev,
Re: Where is my disk space ?
On Thu, Nov 8, 2018 at 2:27 AM, Barbet Alain wrote:
> Hi !
> Just to give you end of the story:
> I move my /var/lib/docker to my home (other partition), and my space
> come back ...

I'm not sure why that would matter. Both btrfs du and regular du showed only ~350M used in /var, which is about what I'd expect. And also the 'btrfs sub list' output doesn't show any subvolumes/snapshots for Docker.

The upstream Docker behavior on Btrfs is that it uses subvolumes and snapshots for everything; quickly you'll see a lot of them. However, many distributions override the default Docker behavior, e.g. with Docker storage setup, and will cause it to always favor a particular driver. For example the Docker overlay2 driver, which leverages kernel overlayfs, which will work on any file system including Btrfs. And I'm not exactly sure where the upper dirs are stored, but I'd be surprised if they're not in /var.

Anyway, if you're using Docker, moving stuff around will almost certainly break it. And as I'm an extreme expert in messing up Docker storage, I can vouch for the strategy of stopping the docker daemon, recursively deleting everything in /var/lib/docker/ and then starting Docker. Now you get to go fetch all your images again.

And anyway, you shouldn't be storing any data in the containers; they should be throwaway things, and important data should be stored elsewhere, including any state information for the container. :-D Avoid container misery by having a workflow that expects containers to be transient disposable objects.

-- 
Chris Murphy
Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups
On Thu, Nov 8, 2018 at 2:37 PM Filipe Manana wrote:
>
> On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo wrote:
> >
> > On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> > > From: Filipe Manana
> > >
> > > When creating a block group we don't need to set the log for full commit
> > > if the new block group is not used for data. Logged items can only point
> > > to logical addresses of data block groups (through file extent items) so
> > > there is no need for the next fsync to fall back to a transaction commit
> > > if the new block group is for metadata.
> >
> > Is it possible for the log tree blocks to be allocated in that new block
> > group?
>
> Yes.

Now I realize what might be your concern, and this would cause trouble.
Surprised this didn't trigger any problem, and I had this (together
with other changes) running tests for some weeks already.

> > Thanks,
> > Qu
> >
> > > Signed-off-by: Filipe Manana
> > > ---
> > >  fs/btrfs/extent-tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 577878324799..588fbd1606fb 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
> > >  	struct btrfs_block_group_cache *cache;
> > >  	int ret;
> > >
> > > -	btrfs_set_log_full_commit(fs_info, trans);
> > > +	if (type & BTRFS_BLOCK_GROUP_DATA)
> > > +		btrfs_set_log_full_commit(fs_info, trans);
> > >
> > >  	cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
> > >  	if (!cache)
Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups
On Thu, Nov 8, 2018 at 2:35 PM Qu Wenruo wrote:
>
> On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> > From: Filipe Manana
> >
> > When creating a block group we don't need to set the log for full commit
> > if the new block group is not used for data. Logged items can only point
> > to logical addresses of data block groups (through file extent items) so
> > there is no need for the next fsync to fall back to a transaction commit
> > if the new block group is for metadata.
>
> Is it possible for the log tree blocks to be allocated in that new block
> group?

Yes.

> Thanks,
> Qu
>
> > Signed-off-by: Filipe Manana
> > ---
> >  fs/btrfs/extent-tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 577878324799..588fbd1606fb 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
> >  	struct btrfs_block_group_cache *cache;
> >  	int ret;
> >
> > -	btrfs_set_log_full_commit(fs_info, trans);
> > +	if (type & BTRFS_BLOCK_GROUP_DATA)
> > +		btrfs_set_log_full_commit(fs_info, trans);
> >
> >  	cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
> >  	if (!cache)
Re: [PATCH] Btrfs: do not set log for full commit when creating non-data block groups
On 2018/11/8 9:17 PM, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> When creating a block group we don't need to set the log for full commit
> if the new block group is not used for data. Logged items can only point
> to logical addresses of data block groups (through file extent items) so
> there is no need for the next fsync to fall back to a transaction commit
> if the new block group is for metadata.

Is it possible for the log tree blocks to be allocated in that new block
group?

Thanks,
Qu

> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/extent-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..588fbd1606fb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>  	struct btrfs_block_group_cache *cache;
>  	int ret;
>
> -	btrfs_set_log_full_commit(fs_info, trans);
> +	if (type & BTRFS_BLOCK_GROUP_DATA)
> +		btrfs_set_log_full_commit(fs_info, trans);
>
>  	cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
>  	if (!cache)
[PATCH] btrfs: Check for missing device before bio submission in btrfs_map_bio
Before btrfs_map_bio submits all stripe bios it does a number of checks to ensure the device for every stripe is present. However, it doesn't do a DEV_STATE_MISSING check; instead, this is relegated to the lower-level btrfs_schedule_bio (in the async submission case; sync submission doesn't check DEV_STATE_MISSING at all). Additionally, btrfs_schedule_bio does a duplicate device->bdev check which has already been performed in btrfs_map_bio.

This patch moves the DEV_STATE_MISSING check into btrfs_map_bio and removes the duplicate device->bdev check. Doing so ensures that no bio cloning/submission happens for both async/sync requests in the face of a missing device. This makes the async io submission path slightly shorter in terms of instruction count. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/volumes.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 44c5e8ccb644..3312cad65209 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6106,12 +6106,6 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device,
 	int should_queue = 1;
 	struct btrfs_pending_bios *pending_bios;
 
-	if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) ||
-	    !device->bdev) {
-		bio_io_error(bio);
-		return;
-	}
-
 	/* don't bother with additional async steps for reads, right now */
 	if (bio_op(bio) == REQ_OP_READ) {
 		btrfsic_submit_bio(bio);
@@ -6240,7 +6234,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
 		dev = bbio->stripes[dev_nr].dev;
-		if (!dev || !dev->bdev ||
+		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
+						   &dev->dev_state) ||
 		    (bio_op(first_bio) == REQ_OP_WRITE &&
 		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			bbio_error(bbio, first_bio, logical);
-- 
2.17.1
[PATCH] Btrfs: do not set log for full commit when creating non-data block groups
From: Filipe Manana

When creating a block group we don't need to set the log for full commit
if the new block group is not used for data. Logged items can only point
to logical addresses of data block groups (through file extent items) so
there is no need for the next fsync to fall back to a transaction commit
if the new block group is for metadata.

Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..588fbd1606fb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10112,7 +10112,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	struct btrfs_block_group_cache *cache;
 	int ret;
 
-	btrfs_set_log_full_commit(fs_info, trans);
+	if (type & BTRFS_BLOCK_GROUP_DATA)
+		btrfs_set_log_full_commit(fs_info, trans);
 
 	cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size);
 	if (!cache)
-- 
2.11.0
Re: [PATCH v15.1 02/13] btrfs: dedupe: Introduce function to initialize dedupe info
Tue, 6 Nov 2018 at 9:41, Lu Fengqi :
>
> From: Wang Xiaoguang
>
> Add generic function to initialize dedupe info.
>
> Signed-off-by: Qu Wenruo
> Signed-off-by: Wang Xiaoguang
> Reviewed-by: Josef Bacik
> Signed-off-by: Lu Fengqi
> ---
>  fs/btrfs/Makefile | 2 +-
>  fs/btrfs/dedupe.c | 169 +
>  fs/btrfs/dedupe.h | 12 +++
>  include/uapi/linux/btrfs.h | 3 +
>  4 files changed, 185 insertions(+), 1 deletion(-)
>  create mode 100644 fs/btrfs/dedupe.c
>
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index ca693dd554e9..78fdc87dba39 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -10,7 +10,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>  	   export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
>  	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
>  	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
> -	   uuid-tree.o props.o free-space-tree.o tree-checker.o
> +	   uuid-tree.o props.o free-space-tree.o tree-checker.o dedupe.o
>
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> new file mode 100644
> index ..06523162753d
> --- /dev/null
> +++ b/fs/btrfs/dedupe.c
> @@ -0,0 +1,169 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2016 Fujitsu. All rights reserved.
> + */
> +
> +#include "ctree.h"
> +#include "dedupe.h"
> +#include "btrfs_inode.h"
> +#include "delayed-ref.h"
> +
> +struct inmem_hash {
> +	struct rb_node hash_node;
> +	struct rb_node bytenr_node;
> +	struct list_head lru_list;
> +
> +	u64 bytenr;
> +	u32 num_bytes;
> +
> +	u8 hash[];
> +};
> +
> +static struct btrfs_dedupe_info *
> +init_dedupe_info(struct btrfs_ioctl_dedupe_args *dargs)
> +{
> +	struct btrfs_dedupe_info *dedupe_info;
> +
> +	dedupe_info = kzalloc(sizeof(*dedupe_info), GFP_NOFS);
> +	if (!dedupe_info)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dedupe_info->hash_algo = dargs->hash_algo;
> +	dedupe_info->backend = dargs->backend;
> +	dedupe_info->blocksize = dargs->blocksize;
> +	dedupe_info->limit_nr = dargs->limit_nr;
> +
> +	/* only support SHA256 yet */
> +	dedupe_info->dedupe_driver = crypto_alloc_shash("sha256", 0, 0);
> +	if (IS_ERR(dedupe_info->dedupe_driver)) {
> +		kfree(dedupe_info);
> +		return ERR_CAST(dedupe_info->dedupe_driver);
> +	}
> +
> +	dedupe_info->hash_root = RB_ROOT;
> +	dedupe_info->bytenr_root = RB_ROOT;
> +	dedupe_info->current_nr = 0;
> +	INIT_LIST_HEAD(&dedupe_info->lru_list);
> +	mutex_init(&dedupe_info->lock);
> +
> +	return dedupe_info;
> +}
> +
> +/*
> + * Helper to check if parameters are valid.
> + * The first invalid field will be set to (-1), to inform the user which
> + * parameter is invalid.
> + * Except dargs->limit_nr or dargs->limit_mem; in that case, 0 will be
> + * returned to inform the user, since the user can specify any value as
> + * the limit, except 0.
> + */
> +static int check_dedupe_parameter(struct btrfs_fs_info *fs_info,
> +				  struct btrfs_ioctl_dedupe_args *dargs)
> +{
> +	u64 blocksize = dargs->blocksize;
> +	u64 limit_nr = dargs->limit_nr;
> +	u64 limit_mem = dargs->limit_mem;
> +	u16 hash_algo = dargs->hash_algo;
> +	u8 backend = dargs->backend;
> +
> +	/*
> +	 * Set all reserved fields to -1, allow user to detect
> +	 * unsupported optional parameters.
> +	 */
> +	memset(dargs->__unused, -1, sizeof(dargs->__unused));
> +	if (blocksize > BTRFS_DEDUPE_BLOCKSIZE_MAX ||
> +	    blocksize < BTRFS_DEDUPE_BLOCKSIZE_MIN ||
> +	    blocksize < fs_info->sectorsize ||
> +	    !is_power_of_2(blocksize) ||
> +	    blocksize < PAGE_SIZE) {
> +		dargs->blocksize = (u64)-1;
> +		return -EINVAL;
> +	}
> +	if (hash_algo >= ARRAY_SIZE(btrfs_hash_sizes)) {
> +		dargs->hash_algo = (u16)-1;
> +		return -EINVAL;
> +	}
> +	if (backend >= BTRFS_DEDUPE_BACKEND_COUNT) {
> +		dargs->backend = (u8)-1;
> +		return -EINVAL;
> +	}
> +
> +	/* Backend specific check */
> +	if (backend == BTRFS_DEDUPE_BACKEND_INMEMORY) {
> +		/* only one limit is accepted for enable */
> +		if (dargs->limit_nr && dargs->limit_mem) {
> +			dargs->limit_nr = 0;
> +			dargs->limit_mem = 0;
> +			return -EINVAL;
> +		}
> +
> +		if (!limit_nr && !limit_mem)
> +			dargs->limit_nr = BTRFS_DEDUPE_LIMIT_NR_DEFAULT;
> +		else {
> +			u64 tmp = (u64)-1;
Re: [PATCH -next] btrfs: remove set but not used variable 'tree'
On Thu, Nov 08, 2018 at 02:14:43AM +, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
>
> fs/btrfs/extent_io.c: In function 'end_extent_writepage':
> fs/btrfs/extent_io.c:2406:25: warning:
> variable 'tree' set but not used [-Wunused-but-set-variable]
>
> It is not used anymore after
> commit 2922040236f9 ("btrfs: Remove extent_io_ops::writepage_end_io_hook")
>
> Signed-off-by: YueHaibing

Thanks, the patches are still out of mainline, so the commit id is not stable and I can fold in the fixup. Same for the other one.
Re: [PATCH v15.1 01/13] btrfs: dedupe: Introduce dedupe framework and its header
Tue, 6 Nov 2018 at 9:41, Lu Fengqi :
>
> From: Wang Xiaoguang
>
> Introduce the header for the btrfs in-band (write time) de-duplication
> framework and needed header.
>
> The new de-duplication framework is going to support 2 different dedupe
> methods and 1 dedupe hash.
>
> Signed-off-by: Qu Wenruo
> Signed-off-by: Wang Xiaoguang
> Signed-off-by: Lu Fengqi
> ---
>  fs/btrfs/ctree.h | 7 ++
>  fs/btrfs/dedupe.h | 128 -
>  fs/btrfs/disk-io.c | 1 +
>  include/uapi/linux/btrfs.h | 34 ++
>  4 files changed, 168 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 80953528572d..910050d904ef 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1118,6 +1118,13 @@ struct btrfs_fs_info {
>  	spinlock_t ref_verify_lock;
>  	struct rb_root block_tree;
>  #endif
> +
> +	/*
> +	 * Inband de-duplication related structures
> +	 */
> +	unsigned long dedupe_enabled:1;
> +	struct btrfs_dedupe_info *dedupe_info;
> +	struct mutex dedupe_ioctl_lock;
>  };
>
>  static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h
> index 90281a7a35a8..222ce7b4d827 100644
> --- a/fs/btrfs/dedupe.h
> +++ b/fs/btrfs/dedupe.h
> @@ -6,7 +6,131 @@
>  #ifndef BTRFS_DEDUPE_H
>  #define BTRFS_DEDUPE_H
>
> -/* later in-band dedupe will expand this struct */
> -struct btrfs_dedupe_hash;
> +#include <crypto/hash.h>
>
> +/* 32 bytes for SHA256 */
> +static const int btrfs_hash_sizes[] = { 32 };
> +
> +/*
> + * For caller outside of dedupe.c
> + *
> + * Different dedupe backends should have their own hash structure
> + */
> +struct btrfs_dedupe_hash {
> +	u64 bytenr;
> +	u32 num_bytes;
> +
> +	/* last field is a variable length array of dedupe hash */
> +	u8 hash[];
> +};
> +
> +struct btrfs_dedupe_info {
> +	/* dedupe blocksize */
> +	u64 blocksize;
> +	u16 backend;
> +	u16 hash_algo;
> +
> +	struct crypto_shash *dedupe_driver;
> +
> +	/*
> +	 * Use mutex to protect both backends.
> +	 * Even for in-memory backends, the rb-tree can be quite large,
> +	 * so mutex is better for such use case.
> +	 */
> +	struct mutex lock;
> +
> +	/* following members are only used in in-memory backend */
> +	struct rb_root hash_root;
> +	struct rb_root bytenr_root;
> +	struct list_head lru_list;
> +	u64 limit_nr;
> +	u64 current_nr;
> +};
> +
> +static inline int btrfs_dedupe_hash_hit(struct btrfs_dedupe_hash *hash)
> +{
> +	return (hash && hash->bytenr);
> +}
> +
> +/*
> + * Initialize inband dedupe info.
> + * Called at dedupe enable time.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (from unsupported param to tree creation error for some backends)
> + */
> +int btrfs_dedupe_enable(struct btrfs_fs_info *fs_info,
> +			struct btrfs_ioctl_dedupe_args *dargs);
> +
> +/*
> + * Disable dedupe and invalidate all its dedupe data.
> + * Called at dedupe disable time.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info);
> +
> +/*
> + * Get current dedupe status.
> + * Return 0 for success
> + * No possible error yet
> + */
> +void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
> +			 struct btrfs_ioctl_dedupe_args *dargs);
> +
> +/*
> + * Calculate hash for dedupe.
> + * Caller must ensure [start, start + dedupe_bs) has valid data.
> + *
> + * Return 0 for success
> + * Return <0 for any error
> + * (error from hash codes)
> + */
> +int btrfs_dedupe_calc_hash(struct btrfs_fs_info *fs_info,
> +			   struct inode *inode, u64 start,
> +			   struct btrfs_dedupe_hash *hash);
> +
> +/*
> + * Search for duplicated extents by calculated hash.
> + * Caller must call btrfs_dedupe_calc_hash() first to get the hash.
> + *
> + * @inode: the inode we are writing
> + * @file_pos: offset inside the inode
> + * As we will increase extent ref immediately after a hash match,
> + * we need @file_pos and @inode in this case.
> + *
> + * Return > 0 for a hash match, and the extent ref will be
> + * *INCREASED*, and hash->bytenr/num_bytes will record the existing
> + * extent data.
> + * Return 0 for a hash miss. Nothing is done
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_search(struct btrfs_fs_info *fs_info,
> +			struct inode *inode, u64 file_pos,
> +			struct btrfs_dedupe_hash *hash);
> +
> +/*
> + * Add a dedupe hash into dedupe info
> + * Return 0 for success
> + * Return <0 for any error
> + * (tree operation error for some backends)
> + */
> +int btrfs_dedupe_add(struct btrfs_fs_info *fs_info,
> +
Re: Where is my disk space ?
Hi! Just to give you the end of the story: I moved my /var/lib/docker to my home directory (another partition), and my space came back ... I'm leaving docker there and not putting it back on / to see whether the problem returns.

On Wed, 31 Oct 2018 at 08:34, Barbet Alain wrote:
>
> > Also, since you don't have any snapshots, you could also find this
> > conventionally:
> >
> > # du -sh /*
>
> Usually yes, but not here. It's just like when you remove a file while
> a process still uses it and writes to it, and fsck will not be happy
> next time.
> But I rebooted & checked with btrfs check without any issue :-/
>
> alian@alian:/> sudo btrfs fi du -s *
>      Total   Exclusive  Set shared  Filename
>    1.58MiB     1.58MiB       0.00B  bin
>   42.69MiB    42.69MiB       0.00B  boot
>   14.78MiB    14.78MiB       0.00B  etc
>  532.40MiB   532.40MiB       0.00B  lib
>    9.88MiB     9.88MiB       0.00B  lib64
>      0.00B       0.00B       0.00B  mnt
>   23.96MiB    23.96MiB       0.00B  opt
>  128.00KiB   128.00KiB       0.00B  root
>    9.74MiB     9.74MiB       0.00B  sbin
>      0.00B       0.00B       0.00B  selinux
>      0.00B       0.00B       0.00B  srv
>   15.92MiB    15.92MiB       0.00B  tmp
>    4.86GiB     4.86GiB       0.00B  usr
>  345.65MiB   345.65MiB       0.00B  var
>
> alian@alian:~> sudo du --exclude /home -sh /*
> 2,1M    /bin
> 43M     /boot
> 0       /dev
> 20M     /etc
> 534M    /lib
> 11M     /lib64
> 0       /mnt
> 24M     /opt
> 0       /proc
> 172K    /root
> 18M     /run
> 11M     /sbin
> 0       /selinux
> 0       /srv
> 0       /sys
> 16M     /tmp
> 5,2G    /usr
> 355M    /var
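For what it's worth, the "removed file still held open" situation suspected in the quoted reply is easy to reproduce. A minimal shell demonstration (the /proc paths are Linux-specific; the filename and sizes are arbitrary):

```shell
#!/bin/sh
# Show how an unlinked-but-still-open file keeps consuming space that
# `du` can no longer account for, until the last descriptor is closed.
set -e

dir=$(mktemp -d)
dd if=/dev/zero of="$dir/blob" bs=1M count=8 2>/dev/null

# Hold the file open on descriptor 3, then unlink it.
exec 3<"$dir/blob"
rm "$dir/blob"

# du now reports (almost) nothing for the directory, but the blocks stay
# allocated on the filesystem as long as descriptor 3 is open.
du -s "$dir"
ls -l /proc/$$/fd/3   # on Linux the link target shows "... (deleted)"

exec 3<&-             # close it; only now is the space reclaimed
rmdir "$dir"
echo ok
```

`lsof +L1` is the usual way to list such files system-wide: it shows open files whose on-disk link count is zero.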
Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state
On 11/08/2018 04:52 PM, Nikolay Borisov wrote:
> On 8.11.18 г. 10:33 ч., Anand Jain wrote:
>> On 11/07/2018 08:19 PM, Nikolay Borisov wrote:
>>> On 7.11.18 г. 13:43 ч., Anand Jain wrote:
>>>> +	/* scrub for replace must not be running in suspended state */
>>>> +	if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
>>>> +		ASSERT(0);
>>>
>>> ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)
>>
>> There will be a substantial difference in the generated code when
>> compiled with and without CONFIG_BTRFS_ASSERT [1]. That is,
>> btrfs_scrub_cancel(fs_info) won't be run at all, so I would like to
>> keep it as it is.
>
> Fair point, in that case do:
>
> ret = btrfs_scrub_cancel(fs_info);
> ASSERT(ret != -ENOTCONN);

Fixed.

Thanks, Anand

>> [1]
>> --
>> ./fs/btrfs/ctree.h
>> #ifdef CONFIG_BTRFS_ASSERT
>>
>> __cold
>> static inline void assfail(const char *expr, const char *file, int line)
>> {
>> 	pr_err("assertion failed: %s, file: %s, line: %d\n",
>> 	       expr, file, line);
>> 	BUG();
>> }
>>
>> #define ASSERT(expr)	\
>> 	(likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
>> #else
>> #define ASSERT(expr)	((void)0)
>> #endif
>> ---
>>
>> Thanks, Anand
Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state
On 8.11.18 г. 10:33 ч., Anand Jain wrote:
>
> On 11/07/2018 08:19 PM, Nikolay Borisov wrote:
>>
>> On 7.11.18 г. 13:43 ч., Anand Jain wrote:
>>> +	/* scrub for replace must not be running in suspended state */
>>> +	if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
>>> +		ASSERT(0);
>>
>> ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)
>
> There will be a substantial difference in the generated code when
> compiled with and without CONFIG_BTRFS_ASSERT [1]. That is,
> btrfs_scrub_cancel(fs_info) won't be run at all, so I would like to
> keep it as it is.

Fair point, in that case do:

ret = btrfs_scrub_cancel(fs_info);
ASSERT(ret != -ENOTCONN);

>
> [1]
> --
> ./fs/btrfs/ctree.h
> #ifdef CONFIG_BTRFS_ASSERT
>
> __cold
> static inline void assfail(const char *expr, const char *file, int line)
> {
> 	pr_err("assertion failed: %s, file: %s, line: %d\n",
> 	       expr, file, line);
> 	BUG();
> }
>
> #define ASSERT(expr)	\
> 	(likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
> #else
> #define ASSERT(expr)	((void)0)
> #endif
> ---
>
> Thanks, Anand
Re: [PATCH 6/9] btrfs: replace's scrub must not be running in replace suspended state
On 11/07/2018 08:19 PM, Nikolay Borisov wrote:
> On 7.11.18 г. 13:43 ч., Anand Jain wrote:
>> +	/* scrub for replace must not be running in suspended state */
>> +	if (btrfs_scrub_cancel(fs_info) != -ENOTCONN)
>> +		ASSERT(0);
>
> ASSERT(btrfs_scrub_cancel(fs_info) == -ENOTCONN)

There will be a substantial difference in the generated code when
compiled with and without CONFIG_BTRFS_ASSERT [1]. That is,
btrfs_scrub_cancel(fs_info) won't be run at all, so I would like to
keep it as it is.

[1]
--
./fs/btrfs/ctree.h
#ifdef CONFIG_BTRFS_ASSERT

__cold
static inline void assfail(const char *expr, const char *file, int line)
{
	pr_err("assertion failed: %s, file: %s, line: %d\n",
	       expr, file, line);
	BUG();
}

#define ASSERT(expr)	\
	(likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
#else
#define ASSERT(expr)	((void)0)
#endif
---

Thanks, Anand
[PATCH 0/3] Cleanups following optional extent_io_ops callbacks removal
Here are 3 minor patches that further clean up writepage_delalloc. The first one moves the extent locked check in the caller of writepage_delalloc since this seems more natural. This paves the way for the second patch which removes epd as an argument to writepage_delalloc. The final patch was suggested by Josef and removes an extent_state argument which has never been used. Nikolay Borisov (3): btrfs: Move epd::extent_locked check to writepage_delalloc's caller btrfs: Remove extent_page_data argument from writepage_delalloc btrfs: Remove unused extent_state argument from btrfs_writepage_endio_finish_ordered fs/btrfs/compression.c | 6 -- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/extent_io.c | 36 +--- fs/btrfs/inode.c | 7 +++ 4 files changed, 26 insertions(+), 27 deletions(-) -- 2.17.1
[PATCH 3/3] btrfs: Remove unused extent_state argument from btrfs_writepage_endio_finish_ordered
This parameter was never used, yet was part of the interface of the function ever since its introduction as extent_io_ops::writepage_end_io_hook in e6dcd2dc9c48 ("Btrfs: New data=ordered implementation"). Now that NULL is passed everywhere as a value for this parameter let's remove it for good. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/compression.c | 6 -- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/extent_io.c | 12 +--- fs/btrfs/inode.c | 7 +++ 4 files changed, 14 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index bde8d0487bbb..717d9300dd18 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -251,8 +251,10 @@ static void end_compressed_bio_write(struct bio *bio) tree = _I(inode)->io_tree; cb->compressed_pages[0]->mapping = cb->inode->i_mapping; btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0], - cb->start, cb->start + cb->len - 1, NULL, - bio->bi_status ? BLK_STS_OK : BLK_STS_NOTSUPP); +cb->start, +cb->start + cb->len - 1, +bio->bi_status ? 
+BLK_STS_OK : BLK_STS_NOTSUPP); cb->compressed_pages[0]->mapping = NULL; end_compressed_writeback(inode, cb); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8b41ec42f405..c48fcaf4004d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3179,7 +3179,7 @@ int btrfs_create_subvol_root(struct btrfs_trans_handle *trans, struct btrfs_root *new_root, struct btrfs_root *parent_root, u64 new_dirid); -void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state, + void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state, unsigned *bits); void btrfs_clear_delalloc_extent(struct inode *inode, struct extent_state *state, unsigned *bits); @@ -3231,7 +3231,7 @@ int btrfs_run_delalloc_range(void *private_data, struct page *locked_page, struct writeback_control *wbc); int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end); void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, - u64 end, struct extent_state *state, int uptodate); + u64 end, int uptodate); extern const struct dentry_operations btrfs_dentry_operations; /* ioctl.c */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index cca9d3cbe74a..f3bf7f9c13c0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2408,7 +2408,7 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end) tree = _I(page->mapping->host)->io_tree; - btrfs_writepage_endio_finish_ordered(page, start, end, NULL, uptodate); + btrfs_writepage_endio_finish_ordered(page, start, end, uptodate); if (!uptodate) { ClearPageUptodate(page); @@ -3329,8 +3329,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode, end = page_end; if (i_size <= start) { - btrfs_writepage_endio_finish_ordered(page, start, page_end, -NULL, 1); + btrfs_writepage_endio_finish_ordered(page, start, page_end, 1); goto done; } @@ -3342,7 +3341,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode, if (cur >= i_size) { 
btrfs_writepage_endio_finish_ordered(page, cur, -page_end, NULL, 1); +page_end, 1); break; } em = btrfs_get_extent(BTRFS_I(inode), page, pg_offset, cur, @@ -3379,7 +3378,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode, if (!compressed) btrfs_writepage_endio_finish_ordered(page, cur, cur + iosize - 1, - NULL, 1); + 1); else if (compressed) { /* we don't want to end_page_writeback on * a compressed extent. this happens @@ -4066,8 +4065,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end, ret = __extent_writepage(page, _writepages, ); else { btrfs_writepage_endio_finish_ordered(page, start,
[PATCH 1/3] btrfs: Move epd::extent_locked check to writepage_delalloc's caller
If epd::extent_locked is set then writepage_delalloc terminates. Make this a bit more apparent in the caller by simply bubbling the check up. This enables to remove epd as an argument to writepage_delalloc in a future patch. No functional change. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 46c299560f4f..e1ce07b2d33a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3215,8 +3215,6 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode, int ret; int page_started = 0; - if (epd->extent_locked) - return 0; while (delalloc_end < page_end) { nr_delalloc = find_lock_delalloc_range(inode, tree, @@ -3472,11 +3470,14 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, set_page_extent_mapped(page); - ret = writepage_delalloc(inode, page, wbc, epd, start, _written); - if (ret == 1) - goto done_unlocked; - if (ret) - goto done; + if (!epd->extent_locked) { + ret = writepage_delalloc(inode, page, wbc, epd, start, +_written); + if (ret == 1) + goto done_unlocked; + if (ret) + goto done; + } ret = __extent_writepage_io(inode, page, wbc, epd, i_size, nr_written, write_flags, ); -- 2.17.1
[PATCH 2/3] btrfs: Remove extent_page_data argument from writepage_delalloc
The only remaining use of the 'epd' argument in writepage_delalloc is to reference the extent_io_tree which was set in extent_writepages. Since it is guaranteed that page->mapping of any page passed to writepage_delalloc (and __extent_writepage as the sole caller) to be equal to that passed in extent_writepages we can directly get the io_tree via the already passed inode (which is also taken from page->mapping->host). No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e1ce07b2d33a..cca9d3cbe74a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3202,12 +3202,12 @@ static void update_nr_written(struct writeback_control *wbc, * This returns < 0 if there were errors (page still locked) */ static noinline_for_stack int writepage_delalloc(struct inode *inode, - struct page *page, struct writeback_control *wbc, - struct extent_page_data *epd, - u64 delalloc_start, - unsigned long *nr_written) +struct page *page, +struct writeback_control *wbc, +u64 delalloc_start, +unsigned long *nr_written) { - struct extent_io_tree *tree = epd->tree; + struct extent_io_tree *tree = _I(inode)->io_tree; u64 page_end = delalloc_start + PAGE_SIZE - 1; u64 nr_delalloc; u64 delalloc_to_write = 0; @@ -3471,8 +3471,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, set_page_extent_mapped(page); if (!epd->extent_locked) { - ret = writepage_delalloc(inode, page, wbc, epd, start, -_written); + ret = writepage_delalloc(inode, page, wbc, start, _written); if (ret == 1) goto done_unlocked; if (ret) -- 2.17.1
Re: [PATCH 7/9] btrfs: quiten warn if the replace is canceled at finish
On 11/07/2018 08:17 PM, Nikolay Borisov wrote:
> On 7.11.18 г. 13:43 ч., Anand Jain wrote:
>> -	WARN_ON(ret);
>> +	if (ret != -ECANCELED)
>> +		WARN_ON(ret);
>
> WARN_ON(ret && ret != -ECANCELED)

Will fix.

Thanks, Anand