Re: [PATCH] btrfs-progs: Move -rdynamic linker only option to LDFLAGS.
On Wed, Feb 11, 2015 at 08:46:24AM +0800, Qu Wenruo wrote:
> Same thing as clang cleanup patch:
>
> commit 040b3f11ba6bd793a9ef79ed4d9032d22370
> Author: Qu Wenruo quwen...@cn.fujitsu.com
> Date: Fri Dec 19 14:13:08 2014 +0800
>
>     btrfs-progs: Makefile: Move linker only option to LDFLAGS
>
> But the move to autoconfig seems using old Makefile.
> So do it again.
>
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/10] Enhance btrfs-find-root and open_ctree() to provide better chance on damaged btrfs.
On Wed, Feb 11, 2015 at 08:33:03AM +0800, Qu Wenruo wrote:
> > > Also, since only 2 patches are modified (although other parts are
> > > slightly modified to match the change), to avoid mail bombing, I
> > > created the pull request on github and only sent the first 2 patches
> > > with the cover letter.
> > >
> > > https://github.com/kdave/btrfs-progs/pull/5
> >
> > Sending the changed patches only is ok (if you point me at the rest of
> > the patches), but it's not necessary to open the github pull request.
> >
> > The version to version changelogs are also stored in the commit
> > changelogs, that's a bit unexpected for a branch to be pulled.
>
> Oh, very sorry for this.
> I meant to save your time, but I forgot that pull branch won't emit the
> changelog like patches.

Pulled except the last patch, and I've cleaned up some bits so please have
a look. It's basically what I'd tell you during a normal review but now it
was easier to do myself.

My concern about the patch

  btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary...

is the 'automatically' part. Falling back to the backup roots should IMO be
on request. The tools should have (and some of them already do have)
commandline options to request a given backup root. That way the user can
try the default action and then decide if the backup roots are fine for use.
Re: [PATCH] btrfs-progs: Fix 2 extent buffer leak in btrfs-debug-tree.
On Wed, Feb 11, 2015 at 10:02:14AM +0800, Qu Wenruo wrote:
> > There are 2 known extent buffer:
>
> Oh, a small typo: it should read "2 known extent buffer leak:", the word
> "leak" is missing.

Fixed and applied, thanks.
Re: [PATCH 23/24] Btrfs: sysfs: support seed devices in the sysfs layout
On Mon, Feb 09, 2015 at 07:56:24AM +0800, Anand Jain wrote:
> This adds an enhancement to show the seed fsid and devices.
>
> The way sprouting handles fs_devices:
>   clone seed fs_devices and add to the fs_uuids
>   mem copy seed fs_devices and assign to fs_devices->seed (move dev_list)
>   evacuate seed fs_devices contents to hold sprout fs devices contents
>
> So to be inline with this fs_devices changes during seeding,
> represent seed fsid under the sprout fsid, this is achieved
> by using the kobject_move()
>
> eg: showing two levels of seeding.

That's new to me, how does nested seeding work?

> find /sys/fs/btrfs/ -type d -name devices -exec ls {} \; -print
> sde
> /sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/devices
> sdd
> /sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/53ac3265-0c34-4afd-9453-cc0d1a07be64/devices

The plain uuid is IMHO not the best naming convention, although it's
acceptable in the global list in /sys/fs/btrfs/*. I'd rather avoid it if
it's mixed with other files.

Would it be enough to print all relevant seeding information into a single
file? If the UUID directories do not contain anything else, that would be
IMHO best.

Do the seeding fsids exist on their own in /sys/fs/btrfs? I haven't tested
the patchset so I'd probably find that out myself.
Re: [PATCH 00/24 V2] provide framework so that sysfs attributes from the fs_devices can be added
On Mon, Feb 09, 2015 at 07:56:01AM +0800, Anand Jain wrote:
> This patch set will provide a framework and help to create attributes
> from the structure btrfs_fs_devices which are available even before
> fs_info is created. So by moving the parent kobject super_kobj from
> fs_info to btrfs_fs_devices, it will help to create attributes
> from the btrfs_fs_devices as well.
>
> Just to note, this does not change any of the existing btrfs sysfs
> external kobject names and its attributes and not even the life
> cycle of them. Changes are internal only. And to ensure the same,
> this patch has been tested with various device operations and,
> checking and comparing the sysfs kobjects and attributes with
> sysfs kobject and attributes without this patch, and they remain
> same. These test cases are added to the progs as test-btrfs-devmgt.sh,
> its patch is below as well.

I went through the patchset, looks ok to me in general. The only concern is
about the new seeding representation, but the other changes seem ok (but I
did not do in-depth review).

I like the patch separation, that really helps to understand the changes
although there are 20+ patches in total.

We can merge patches 1-22, patch 23 should be folded into 24 as it fixes a
bug introduced there.
Re: [PATCH 1/4] Btrfs-progs: add regression tests for sysfs contents during btrfs device management
On Mon, Feb 09, 2015 at 04:24:12PM +0800, Qu Wenruo wrote:
> These tests are not even bound to btrfs-progs. They are kernel tests in
> fact. So btrfs-progs isn't the best place for them.

Well, I agree. Userspace tools mostly exercise the checker, repair or image,
ie. mostly offline actions. The mount test that now exists is to really
check that the fixed filesystem can be mounted.

The tests Anand proposes perform add, replace, seeding etc. That really
belongs to fstests.
Re: Repair broken btrfs raid6?
Hmm, it looks like it is getting worse... Here are some parts of my syslog,
including two crashed btrfs-threads:

So I am still getting many of these:

BTRFS (device dm-5): parent transid verify failed on 25033166798848 wanted 108976 found 108958
BTRFS warning (device dm-5): page private not zero on page 25033166798848
BTRFS warning (device dm-5): page private not zero on page 25033166802944
BTRFS warning (device dm-5): page private not zero on page 25033166807040
BTRFS warning (device dm-5): page private not zero on page 25033166811136
BTRFS info (device dm-5): force lzo compression
BTRFS info (device dm-5): disk space caching is enabled
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0

Then there is this crash of super/btrfs_abort_transaction:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 30526 at /home/kernel/COD/linux/fs/btrfs/super.c:260 __btrfs_abort_transaction+0x5f/0x140 [btrfs]()
BTRFS: Transaction aborted (error -5)
Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) mac_hid(E) pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) drm(E) scsi_transport_sas(E)
CPU: 0 PID: 30526 Comm: kworker/u16:6 Tainted: GW E 3.19.0-031900-generic #201502091451
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
 0104 880002743c18 817c4c00 0007 880002743c68 880002743c58 81076e87 880002743c58 88020a8694d0 8801fb715800 fffb 0ae8
Call Trace:
 [817c4c00] dump_stack+0x45/0x57
 [81076e87] warn_slowpath_common+0x97/0xe0
 [81076f86] warn_slowpath_fmt+0x46/0x50
 [c06375cf] __btrfs_abort_transaction+0x5f/0x140 [btrfs]
 [c0655105] btrfs_run_delayed_refs.part.82+0x175/0x290 [btrfs]
 [c0655237] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
 [c0655507] delayed_ref_async_start+0x37/0x90 [btrfs]
 [c069720e] normal_work_helper+0x7e/0x1b0 [btrfs]
 [c0697572] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
 [8108f76d] process_one_work+0x14d/0x460
 [8109014b] worker_thread+0x11b/0x3f0
 [81090030] ? create_worker+0x1e0/0x1e0
 [81095d59] kthread+0xc9/0xe0
 [81095c90] ? flush_kthread_worker+0x90/0x90
 [817d1e7c] ret_from_fork+0x7c/0xb0
 [81095c90] ? flush_kthread_worker+0x90/0x90
---[ end trace dd65465954546462 ]---
BTRFS: error (device dm-5) in btrfs_run_delayed_refs:2792: errno=-5 IO failure
BTRFS info (device dm-5): forced readonly

and this crash of delayed-ref/btrfs_select_ref_head:

------------[ cut here ]------------
WARNING: CPU: 7 PID: 3159 at /home/kernel/COD/linux/fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0x120/0x130 [btrfs]()
Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) mac_hid(E) pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) drm(E) scsi_transport_sas(E)
CPU: 7 PID: 3159 Comm: btrfs-transacti Tainted: GW E 3.19.0-031900-generic #201502091451
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 01b6 8801cb687c48 817c4c00 0007 8801cb687c88 81076e87 0001 8801fe80bf00 8801fe80bfc8 8802345d8280
Call Trace:
 [817c4c00] dump_stack+0x45/0x57
 [81076e87] warn_slowpath_common+0x97/0xe0
 [81076eea] warn_slowpath_null+0x1a/0x20
 [c06b2d40] btrfs_select_ref_head+0x120/0x130
[PATCH 3/3] Btrfs: account for large extents with enospc
On our gluster boxes we stream large tar balls of backups onto our fses. With
160gb of ram this means we get really large contiguous ranges of dirty data, but
the way our ENOSPC stuff works is that as long as it's contiguous we only hold
metadata reservation for one extent. The problem is we limit our extents to
128mb, so we'll end up with at least 800 extents so our enospc accounting is
quite a bit lower than what we need. To keep track of this make sure we
increase outstanding_extents for every multiple of the max extent size so we can
be sure to have enough reserved metadata space. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 16 +
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/inode.c       | 63 +-
 4 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0b4683f..1675602 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -198,6 +198,8 @@ static int btrfs_csum_sizes[] = { 4, 0 };
 
 #define BTRFS_DIRTY_METADATA_THRESH	(32 * 1024 * 1024)
 
+#define BTRFS_MAX_EXTENT_SIZE (128 * 1024 * 1024)
+
 /*
  * The key defines the order in the tree, and so it also defines (optimal)
  * block layout.
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a346487..eb30b90 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4966,19 +4966,25 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 /**
  * drop_outstanding_extent - drop an outstanding extent
  * @inode: the inode we're dropping the extent for
+ * @num_bytes: the number of bytes we're releasing.
  *
  * This is called when we are freeing up an outstanding extent, either called
  * after an error or after an extent is written. This will return the number of
  * reserved extents that need to be freed. This must be called with
  * BTRFS_I(inode)->lock held.
  */
-static unsigned drop_outstanding_extent(struct inode *inode)
+static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes)
 {
 	unsigned drop_inode_space = 0;
 	unsigned dropped_extents = 0;
+	unsigned num_extents = 0;
 
-	BUG_ON(!BTRFS_I(inode)->outstanding_extents);
-	BTRFS_I(inode)->outstanding_extents--;
+	num_extents = (unsigned)div64_u64(num_bytes +
+					  BTRFS_MAX_EXTENT_SIZE - 1,
+					  BTRFS_MAX_EXTENT_SIZE);
+	ASSERT(num_extents);
+	ASSERT(BTRFS_I(inode)->outstanding_extents >= num_extents);
+	BTRFS_I(inode)->outstanding_extents -= num_extents;
 
 	if (BTRFS_I(inode)->outstanding_extents == 0 &&
 	    test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
@@ -5149,7 +5155,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 
 out_fail:
 	spin_lock(&BTRFS_I(inode)->lock);
-	dropped = drop_outstanding_extent(inode);
+	dropped = drop_outstanding_extent(inode, num_bytes);
 	/*
 	 * If the inodes csum_bytes is the same as the original
 	 * csum_bytes then we know we haven't raced with any free()ers
@@ -5228,7 +5234,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 
 	num_bytes = ALIGN(num_bytes, root->sectorsize);
 	spin_lock(&BTRFS_I(inode)->lock);
-	dropped = drop_outstanding_extent(inode);
+	dropped = drop_outstanding_extent(inode, num_bytes);
 
 	if (num_bytes)
 		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ebabd2..3fbc177 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3239,7 +3239,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
 					       page,
 					       &delalloc_start,
 					       &delalloc_end,
-					       128 * 1024 * 1024);
+					       BTRFS_MAX_EXTENT_SIZE);
 		if (nr_delalloc == 0) {
 			delalloc_start = delalloc_end + 1;
 			continue;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fb16fd3..4564975 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1530,10 +1530,45 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 static void btrfs_split_extent_hook(struct inode *inode,
 				    struct extent_state *orig, u64 split)
 {
+	u64 size;
+
 	/* not delalloc, ignore it */
 	if (!(orig->state & EXTENT_DELALLOC))
 		return;
 
+	size = orig->end - orig->start + 1;
+	if (size > BTRFS_MAX_EXTENT_SIZE) {
+		u64 num_extents;
+		u64 new_size;
+
+		/*
+		 * We need the largest size of the remaining extent
[PATCH 2/3] Btrfs: don't set and clear delalloc for O_DIRECT writes
We do this to get the space accounting, but this is just needless churn on the
io_tree, so just drop setting/clearing delalloc and just drop the reserved data
space when we have a successful allocation. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/inode.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e78a2fd..fb16fd3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7142,7 +7142,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	int ret = 0;
 
 	if (create)
-		unlock_bits |= EXTENT_DELALLOC | EXTENT_DIRTY;
+		unlock_bits |= EXTENT_DIRTY;
 	else
 		len = min_t(u64, len, root->sectorsize);
 
@@ -7278,11 +7278,7 @@ unlock:
 			BTRFS_I(inode)->outstanding_extents++;
 			spin_unlock(&BTRFS_I(inode)->lock);
 		}
-
-		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-				     lockstart + len - 1, EXTENT_DELALLOC, NULL,
-				     &cached_state, GFP_NOFS);
-		BUG_ON(ret);
+		btrfs_free_reserved_data_space(inode, len);
 	}
 
 	/*
--
1.8.3.1
[PATCH 1/3] Btrfs: only adjust outstanding_extents when we do a short write
We have this weird dance where we always inc outstanding_extents when we do a
O_DIRECT write, even if we allocate the entire range. To get around this we
also drop the metadata space if we successfully write. This is an unnecessary
dance, we only need to jack up outstanding_extents if we don't satisfy the
entire range request in get_blocks_direct, otherwise we are good using our
original reservation. So drop the unconditional inc and the drop of the
metadata space that we have for the unconditional inc. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/inode.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8a036ed..e78a2fd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7137,6 +7137,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
 	u64 len = bh_result->b_size;
+	u64 orig_len = len;
 	int unlock_bits = EXTENT_LOCKED;
 	int ret = 0;
 
@@ -7272,9 +7273,11 @@ unlock:
 		if (start + len > i_size_read(inode))
 			i_size_write(inode, start + len);
 
-		spin_lock(&BTRFS_I(inode)->lock);
-		BTRFS_I(inode)->outstanding_extents++;
-		spin_unlock(&BTRFS_I(inode)->lock);
+		if (len < orig_len) {
+			spin_lock(&BTRFS_I(inode)->lock);
+			BTRFS_I(inode)->outstanding_extents++;
+			spin_unlock(&BTRFS_I(inode)->lock);
+		}
 
 		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 				     lockstart + len - 1, EXTENT_DELALLOC, NULL,
@@ -8056,8 +8059,6 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 		else if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode,
 						     count - (size_t)ret);
-		else
-			btrfs_delalloc_release_metadata(inode, 0);
 	}
 out:
 	if (wakeup)
--
1.8.3.1
Re: [PATCH v3] btrfs: Fix out-of-space bug
On 02/11/2015 05:01 AM, Zhaolei wrote:
> From: Zhao Lei zhao...@cn.fujitsu.com
>
> Btrfs will report NO_SPACE when we create and remove files for several
> times, and we can't write to filesystem until mount it again.
>
> Steps to reproduce:
>  1: Create a single-dev btrfs fs with default option
>  2: Write a file into it to take up most fs space
>  3: Delete above file
>  4: Wait about 100s to let chunk removed
>  5: goto 2
>
> Script is like following:
>  #!/bin/bash
>
>  # Recommend 1.2G space, too large disk will make test slow
>  DEV="/dev/sda16"
>  MNT="/mnt/tmp"
>
>  dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
>  file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))
>
>  echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"
>
>  # Unmount any stale mounts left over from a previous run
>  for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
>  echo "mkfs $DEV"
>  mkfs.btrfs -f "$DEV" >/dev/null || exit 2
>  echo "mount $DEV $MNT"
>  mount "$DEV" "$MNT" || exit 2
>
>  for ((loop_i = 0; loop_i < 20; loop_i++)); do
>      echo
>      echo "loop $loop_i"
>
>      echo "dd file..."
>      cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
>      "${cmd[@]}" 2>/dev/null || {
>          # NO_SPACE error triggered
>          echo "dd failed: ${cmd[*]}"
>          exit 1
>      }
>
>      echo "rm file..."
>      rm -f "$MNT"/file0 || exit 2
>
>      # Give the chunk removal about 100s to happen
>      for ((i = 0; i < 10; i++)); do
>          df "$MNT" | tail -1
>          sleep 10
>      done
>  done

Excellent find btw, please make sure to turn this into an xfstest.

An atomic is a bit heavy handed for this, just use an int and set it to 1,
we don't need to worry about races since handles will have exited out in
time. Thanks,

Josef
Re: [PATCH 22/24] Btrfs: sysfs: don't fail seeding for the sake of sysfs kobject issue
On 02/12/2015 02:40 AM, David Sterba wrote:
> On Mon, Feb 09, 2015 at 07:56:23AM +0800, Anand Jain wrote:
> > Signed-off-by: Anand Jain anand.j...@oracle.com
> > ---
> >  fs/btrfs/volumes.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 51873ec..1490723 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2249,7 +2249,8 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
> >  				root->fs_info->fsid);
> >  		if (kobject_rename(&root->fs_info->fs_devices->super_kobj,
> >  				   fsid_buf))
> > -			goto error_trans;
> > +			printk(KERN_WARNING\
> > +			"BTRFS: sysfs: failed to create fsid for sprout\n");
>
> You can safely use btrfs_warn here.

Right. I tried to find out what to use before, but wasn't sure.
Would you be able to accept it as it is? Or I can send a new patch to
correct this. It is just that changing this commit would make later commits
fail to apply, like
 Btrfs: sysfs: support seed devices in the sysfs layout.

Thanks, Anand
Re: [PATCH 3/3] Btrfs: account for large extents with enospc
On Wed, Feb 11, 2015 at 03:08:59PM -0500, Josef Bacik wrote:
> On our gluster boxes we stream large tar balls of backups onto our fses. With
> 160gb of ram this means we get really large contiguous ranges of dirty data, but
> the way our ENOSPC stuff works is that as long as it's contiguous we only hold
> metadata reservation for one extent. The problem is we limit our extents to
> 128mb, so we'll end up with at least 800 extents so our enospc accounting is
> quite a bit lower than what we need. To keep track of this make sure we
> increase outstanding_extents for every multiple of the max extent size so we can
> be sure to have enough reserved metadata space. Thanks,
>
> Signed-off-by: Josef Bacik jba...@fb.com
> ---
>  fs/btrfs/ctree.h       |  2 ++
>  fs/btrfs/extent-tree.c | 16 +
>  fs/btrfs/extent_io.c   |  2 +-
>  fs/btrfs/inode.c       | 63 +-
>  4 files changed, 76 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0b4683f..1675602 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -198,6 +198,8 @@ static int btrfs_csum_sizes[] = { 4, 0 };
>  
>  #define BTRFS_DIRTY_METADATA_THRESH	(32 * 1024 * 1024)
>  
> +#define BTRFS_MAX_EXTENT_SIZE (128 * 1024 * 1024)
> +
>  /*
>   * The key defines the order in the tree, and so it also defines (optimal)
>   * block layout.
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a346487..eb30b90 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4966,19 +4966,25 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
>  /**
>   * drop_outstanding_extent - drop an outstanding extent
>   * @inode: the inode we're dropping the extent for
> + * @num_bytes: the number of bytes we're releasing.
>   *
>   * This is called when we are freeing up an outstanding extent, either called
>   * after an error or after an extent is written. This will return the number of
>   * reserved extents that need to be freed. This must be called with
>   * BTRFS_I(inode)->lock held.
>   */
> -static unsigned drop_outstanding_extent(struct inode *inode)
> +static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes)
>  {
>  	unsigned drop_inode_space = 0;
>  	unsigned dropped_extents = 0;
> +	unsigned num_extents = 0;
>  
> -	BUG_ON(!BTRFS_I(inode)->outstanding_extents);
> -	BTRFS_I(inode)->outstanding_extents--;
> +	num_extents = (unsigned)div64_u64(num_bytes +
> +					  BTRFS_MAX_EXTENT_SIZE - 1,
> +					  BTRFS_MAX_EXTENT_SIZE);

A fastpath is better, like btrfs_merge_extent_hook().

(num_extents < BTRFS_MAX_EXTENT_SIZE) ? num_extents = 1 : (div64_u64(...))

> +	ASSERT(num_extents);
> +	ASSERT(BTRFS_I(inode)->outstanding_extents >= num_extents);
> +	BTRFS_I(inode)->outstanding_extents -= num_extents;
>  
>  	if (BTRFS_I(inode)->outstanding_extents == 0 &&
>  	    test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
> @@ -5149,7 +5155,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>  
>  out_fail:
>  	spin_lock(&BTRFS_I(inode)->lock);
> -	dropped = drop_outstanding_extent(inode);
> +	dropped = drop_outstanding_extent(inode, num_bytes);
>  	/*
>  	 * If the inodes csum_bytes is the same as the original
>  	 * csum_bytes then we know we haven't raced with any free()ers
> @@ -5228,7 +5234,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>  
>  	num_bytes = ALIGN(num_bytes, root->sectorsize);
>  	spin_lock(&BTRFS_I(inode)->lock);
> -	dropped = drop_outstanding_extent(inode);
> +	dropped = drop_outstanding_extent(inode, num_bytes);
>  
>  	if (num_bytes)
>  		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 4ebabd2..3fbc177 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3239,7 +3239,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
>  					       page,
>  					       &delalloc_start,
>  					       &delalloc_end,
> -					       128 * 1024 * 1024);
> +					       BTRFS_MAX_EXTENT_SIZE);
>  		if (nr_delalloc == 0) {
>  			delalloc_start = delalloc_end + 1;
>  			continue;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fb16fd3..4564975 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1530,10 +1530,45 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
>  static void btrfs_split_extent_hook(struct inode *inode,
>  				    struct extent_state *orig, u64 split)
>  {
> +	u64 size;
> +
>  	/* not delalloc, ignore it */
>  	if (!(orig->state & EXTENT_DELALLOC))
>  		return;
>  
> +
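The extent-count arithmetic discussed above can be tried out in userspace. The following is only a sketch: count_extents() and MAX_EXTENT_SIZE are hypothetical names that mirror the patch's div64_u64() rounding and BTRFS_MAX_EXTENT_SIZE, and the fast path assumes the reviewer's comparison is meant to apply to the byte count. It is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the patch's BTRFS_MAX_EXTENT_SIZE (128 MiB); the name is ours. */
#define MAX_EXTENT_SIZE (128ULL * 1024 * 1024)

/*
 * Hypothetical helper: how many max-sized extents are needed to cover
 * num_bytes of contiguous dirty data.
 */
static uint64_t count_extents(uint64_t num_bytes)
{
	/*
	 * Fast path in the spirit of the review comment: a range smaller
	 * than one max extent needs exactly one extent, no division.
	 * Note this also returns 1 for num_bytes == 0, whereas the plain
	 * rounding below would give 0.
	 */
	if (num_bytes < MAX_EXTENT_SIZE)
		return 1;

	/* Slow path: round up to the next multiple of the max extent size. */
	return (num_bytes + MAX_EXTENT_SIZE - 1) / MAX_EXTENT_SIZE;
}
```

With these assumptions, 100 GiB of contiguous dirty data would map to 800 extents, which is the order of magnitude the commit message mentions.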
Re: [PATCH 2/3] Btrfs: don't set and clear delalloc for O_DIRECT writes
On Wed, Feb 11, 2015 at 03:08:58PM -0500, Josef Bacik wrote:
> We do this to get the space accounting, but this is just needless churn on the
> io_tree, so just drop setting/clearing delalloc and just drop the reserved data
> space when we have a successful allocation. Thanks,

Looks good.

Reviewed-by: Liu Bo bo.li@oracle.com

> Signed-off-by: Josef Bacik jba...@fb.com
> ---
>  fs/btrfs/inode.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e78a2fd..fb16fd3 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7142,7 +7142,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>  	int ret = 0;
>  
>  	if (create)
> -		unlock_bits |= EXTENT_DELALLOC | EXTENT_DIRTY;
> +		unlock_bits |= EXTENT_DIRTY;
>  	else
>  		len = min_t(u64, len, root->sectorsize);
>  
> @@ -7278,11 +7278,7 @@ unlock:
>  			BTRFS_I(inode)->outstanding_extents++;
>  			spin_unlock(&BTRFS_I(inode)->lock);
>  		}
> -
> -		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> -				     lockstart + len - 1, EXTENT_DELALLOC, NULL,
> -				     &cached_state, GFP_NOFS);
> -		BUG_ON(ret);
> +		btrfs_free_reserved_data_space(inode, len);
>  	}
>  
>  	/*
> --
> 1.8.3.1
[PATCH 1/1 V2] export symbol kobject_move()
drivers/cpufreq/cpufreq.c is already using this function. And now btrfs
needs it as well. export symbol kobject_move().

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v1->v2: Didn't notice there wasn't my signed-off, now added. Thanks Dave.

 lib/kobject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/kobject.c b/lib/kobject.c
index 58751bb..e055c06 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -548,6 +548,7 @@ out:
 	kfree(devpath);
 	return error;
 }
+EXPORT_SYMBOL_GPL(kobject_move);
 
 /**
  * kobject_del - unlink kobject from hierarchy.
--
2.0.0.153.g79d
[PATCH v2 1/3] btrfs: cleanup: remove unused DEFINE_WAIT() in btrfs_bio_counter_inc_blocked()
From: Zhao Lei zhao...@cn.fujitsu.com

1: Remove unused DEFINE_WAIT(wait)
2: Add likely() for BTRFS_FS_STATE_DEV_REPLACING condition
3: Use a loop instead of goto

Changelog v1->v2:
 s/look/loop in description.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/dev-replace.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ca6a3a3..92109b7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -932,15 +932,15 @@ void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 {
-	DEFINE_WAIT(wait);
-again:
-	percpu_counter_inc(&fs_info->bio_counter);
-	if (test_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state)) {
+	while (1) {
+		percpu_counter_inc(&fs_info->bio_counter);
+		if (likely(!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
+				     &fs_info->fs_state)))
+			break;
+
 		btrfs_bio_counter_dec(fs_info);
 		wait_event(fs_info->replace_wait,
 			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
 				     &fs_info->fs_state));
-		goto again;
 	}
-
 }
--
1.8.5.1
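The control flow of the reworked function (increment, check the flag, undo and wait if a replace is running, then retry) can be modelled in plain C. This is a single-threaded sketch under stated assumptions: struct fake_fs, fake_wait_for_replace() and counter_inc_blocked() are hypothetical stand-ins for fs_info, wait_event() and the patched function, and the per-CPU counter is reduced to a plain long.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fs_info fields the patch touches. */
struct fake_fs {
	long bio_counter;  /* models the percpu bio_counter */
	bool replacing;    /* models BTRFS_FS_STATE_DEV_REPLACING */
	int waits;         /* how many times the caller had to wait */
};

/* Hypothetical stand-in for wait_event(): here the replace just finishes. */
static void fake_wait_for_replace(struct fake_fs *fs)
{
	fs->waits++;
	fs->replacing = false;
}

/* Same shape as the while (1) loop in the patch, without the goto. */
static void counter_inc_blocked(struct fake_fs *fs)
{
	while (1) {
		fs->bio_counter++;
		if (!fs->replacing)
			break;          /* common case: keep the increment */

		fs->bio_counter--;      /* undo, then sleep until replace ends */
		fake_wait_for_replace(fs);
	}
}
```

Whichever path is taken, the function returns with exactly one net increment of the counter, which is the property both the goto and the loop versions are meant to keep.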
[PATCH 04/24] Btrfs: sysfs: fix, kobject pointer clean up needed after kobject release
From: Anand Jain anand.j...@oracle.com

The sysfs clean up self test like in the below code fails, since
fs_info->device_dir_kobj still points to its stale kobject.
Resetting this pointer will help to fix this.

open_ctree()
{
	ret = btrfs_sysfs_add_one(fs_info);
	::
+	btrfs_sysfs_remove_one(fs_info);
+	ret = btrfs_sysfs_add_one(fs_info);
+	if (ret) {
+		pr_err("BTRFS: failed to init sysfs interface: %d\n", ret);
+		goto fail_block_groups;
+	}

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index adfac3e..15fead2 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -525,6 +525,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 	btrfs_kobj_rm_device(fs_info, NULL);
 	kobject_del(fs_info->device_dir_kobj);
 	kobject_put(fs_info->device_dir_kobj);
+	fs_info->device_dir_kobj = NULL;
 	addrm_unknown_feature_attrs(fs_info, false);
 	sysfs_remove_group(&fs_info->super_kobj, &btrfs_feature_attr_group);
 	__btrfs_sysfs_remove_one(fs_info);
--
2.0.0.153.g79d
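The hazard this one-line fix addresses, a pointer that still refers to a released object, can be illustrated generically. The sketch below is not the btrfs code: fake_kobj, fake_fs_info, fake_sysfs_add() and fake_sysfs_remove() are hypothetical, and the "already present" check in the add path is an assumption made only so the stale pointer has a visible effect.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; a real kobject is reference counted. */
struct fake_kobj { int live; };
struct fake_fs_info { struct fake_kobj *device_dir_kobj; };

/* Returns 0 on success, -1 if an object already seems to be registered. */
static int fake_sysfs_add(struct fake_fs_info *fi)
{
	if (fi->device_dir_kobj)
		return -1;      /* a stale pointer would land here */

	fi->device_dir_kobj = calloc(1, sizeof(*fi->device_dir_kobj));
	if (!fi->device_dir_kobj)
		return -1;
	fi->device_dir_kobj->live = 1;
	return 0;
}

static void fake_sysfs_remove(struct fake_fs_info *fi)
{
	if (!fi->device_dir_kobj)
		return;         /* second remove is a harmless no-op */

	free(fi->device_dir_kobj);
	fi->device_dir_kobj = NULL;   /* models the line the patch adds */
}
```

With the pointer cleared, the remove-then-add sequence from the self test in the commit message succeeds in this model, and a repeated remove does nothing.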
[PATCH 11/24] Btrfs: sysfs: move super_kobj and device_dir_kobj from fs_info to btrfs_fs_devices
From: Anand Jain anand.j...@oracle.com

This patch will provide a framework and help to create attributes
from the structure btrfs_fs_devices which are available even before
fs_info is created. So by moving the parent kobject super_kobj from
fs_info to btrfs_fs_devices, it will help to create attributes
from the btrfs_fs_devices as well.

Patches on top of this patch now will be able to create the
sys/fs/btrfs/fsid kobject and attributes from btrfs_fs_devices
when devices are scanned and registered to the kernel.

Just to note, this does not change any of the existing btrfs sysfs
external kobject names and its attributes and not even the life
cycle of them. Changes are internal only. And to ensure the same,
this patch has been tested with various device operations and,
checking and comparing the sysfs kobjects and attributes with
sysfs kobject and attributes without this patch, and they remain
same.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/ctree.h   |  3 --
 fs/btrfs/sysfs.c   | 88 ++
 fs/btrfs/volumes.c |  3 +-
 fs/btrfs/volumes.h |  5 
 4 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7e60741..9493b91 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1580,10 +1580,7 @@ struct btrfs_fs_info {
 	struct task_struct *cleaner_kthread;
 	int thread_pool_size;
 
-	struct kobject super_kobj;
 	struct kobject *space_info_kobj;
-	struct kobject *device_dir_kobj;
-	struct completion kobj_unregister;
 	int do_barriers;
 	int closing;
 	int log_root_recovering;
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 2cb4c69..ac15fbb 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -33,6 +33,7 @@
 #include "volumes.h"
 
 static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj);
+static inline struct btrfs_fs_devices *to_fs_devs(struct kobject *kobj);
 
 static u64 get_features(struct btrfs_fs_info *fs_info,
 			enum btrfs_feature_set set)
@@ -438,10 +439,10 @@ static const struct attribute *btrfs_attrs[] = {
 
 static void btrfs_release_super_kobj(struct kobject *kobj)
 {
-	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	struct btrfs_fs_devices *fs_devs = to_fs_devs(kobj);
 
-	memset(&fs_info->super_kobj, 0, sizeof(struct kobject));
-	complete(&fs_info->kobj_unregister);
+	memset(&fs_devs->super_kobj, 0, sizeof(struct kobject));
+	complete(&fs_devs->kobj_unregister);
 }
 
 static struct kobj_type btrfs_ktype = {
@@ -449,11 +450,18 @@ static struct kobj_type btrfs_ktype = {
 	.release	= btrfs_release_super_kobj,
 };
 
+static inline struct btrfs_fs_devices *to_fs_devs(struct kobject *kobj)
+{
+	if (kobj->ktype != &btrfs_ktype)
+		return NULL;
+	return container_of(kobj, struct btrfs_fs_devices, super_kobj);
+}
+
 static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj)
 {
 	if (kobj->ktype != &btrfs_ktype)
 		return NULL;
-	return container_of(kobj, struct btrfs_fs_info, super_kobj);
+	return to_fs_devs(kobj)->fs_info;
 }
 
 #define NUM_FEATURE_BITS 64
@@ -494,12 +502,12 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add)
 			attrs[0] = &fa->kobj_attr.attr;
 			if (add) {
 				int ret;
-				ret = sysfs_merge_group(&fs_info->super_kobj,
+				ret = sysfs_merge_group(&fs_info->fs_devices->super_kobj,
 							&agroup);
 				if (ret)
 					return ret;
 			} else
-				sysfs_unmerge_group(&fs_info->super_kobj,
+				sysfs_unmerge_group(&fs_info->fs_devices->super_kobj,
 						    &agroup);
 		}
 
@@ -507,18 +515,17 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add)
 	return 0;
 }
 
-static void btrfs_sysfs_remove_fsid(struct btrfs_fs_info *fs_info)
+static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs)
 {
-	if (fs_info->device_dir_kobj) {
-		btrfs_kobj_rm_device(fs_info, NULL);
-		kobject_del(fs_info->device_dir_kobj);
-		kobject_put(fs_info->device_dir_kobj);
-		fs_info->device_dir_kobj = NULL;
+	if (fs_devs->device_dir_kobj) {
+		kobject_del(fs_devs->device_dir_kobj);
+		kobject_put(fs_devs->device_dir_kobj);
+		fs_devs->device_dir_kobj = NULL;
 	}
 
-	kobject_del(&fs_info->super_kobj);
-	kobject_put(&fs_info->super_kobj);
-	wait_for_completion(&fs_info->kobj_unregister);
+	kobject_del(&fs_devs->super_kobj);
+	kobject_put(&fs_devs->super_kobj);
+
[PATCH 12/24] Btrfs: sysfs: add pointer to access fs_info from fs_devices
From: Anand Jain anand.j...@oracle.com adds fs_info pointer with struct btrfs_fs_devices. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/sysfs.c | 4 fs/btrfs/volumes.h | 1 + 2 files changed, 5 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index ac15fbb..4b5bac6 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -530,6 +530,8 @@ static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info) { + fs_info-fs_devices-fs_info = NULL; + if (fs_info-space_info_kobj) { sysfs_remove_files(fs_info-space_info_kobj, allocation_attrs); kobject_del(fs_info-space_info_kobj); @@ -729,6 +731,8 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info) struct btrfs_fs_devices *fs_devs = fs_info-fs_devices; struct kobject *super_kobj = fs_devs-super_kobj; + fs_devs-fs_info = fs_info; + error = btrfs_sysfs_add_fsid(fs_devs, NULL); if (error) return error; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index c2e5bd0..53fd278 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -254,6 +254,7 @@ struct btrfs_fs_devices { */ int rotating; + struct btrfs_fs_info *fs_info; /* sysfs kobjects */ struct kobject super_kobj; struct kobject *device_dir_kobj; -- 2.0.0.153.g79d -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/24] Btrfs: sysfs: let default_attrs be separate from the kset
From: Anand Jain anand.j...@oracle.com As of now btrfs_attrs are provided using the default_attrs through the kset. Separate them and create the default_attrs using the sysfs_create_files instead. By doing this we will have the flexibility that device discovery thread could create fsid kobject. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/sysfs.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index f42d8fd..5208a49 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -428,7 +428,7 @@ static ssize_t btrfs_clone_alignment_show(struct kobject *kobj, BTRFS_ATTR(clone_alignment, btrfs_clone_alignment_show); -static struct attribute *btrfs_attrs[] = { +static const struct attribute *btrfs_attrs[] = { BTRFS_ATTR_PTR(label), BTRFS_ATTR_PTR(nodesize), BTRFS_ATTR_PTR(sectorsize), @@ -447,7 +447,6 @@ static void btrfs_release_super_kobj(struct kobject *kobj) static struct kobj_type btrfs_ktype = { .sysfs_ops = kobj_sysfs_ops, .release= btrfs_release_super_kobj, - .default_attrs = btrfs_attrs, }; static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj) @@ -531,6 +530,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info) } addrm_unknown_feature_attrs(fs_info, false); sysfs_remove_group(fs_info-super_kobj, btrfs_feature_attr_group); + sysfs_remove_files(fs_info-super_kobj, btrfs_attrs); btrfs_sysfs_remove_fsid(fs_info); } @@ -720,13 +720,17 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info) return error; } - error = sysfs_create_group(fs_info-super_kobj, - btrfs_feature_attr_group); + error = sysfs_create_files(fs_info-super_kobj, btrfs_attrs); if (error) { btrfs_sysfs_remove_fsid(fs_info); return error; } + error = sysfs_create_group(fs_info-super_kobj, + btrfs_feature_attr_group); + if (error) + goto failure; + error = addrm_unknown_feature_attrs(fs_info, true); if (error) goto failure; -- 2.0.0.153.g79d -- To unsubscribe from this list: send the line unsubscribe linux-btrfs 
in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/24] Btrfs: sysfs: fix, btrfs_release_super_kobj() should clean up the kobject data
From: Anand Jain anand.j...@oracle.com

The following test case fails, indicating that a thread tried to init an
already initialized object.

kernel: [232104.016513] kobject (880006c1c980): tried to init an initialized object, something is seriously wrong.

btrfs_sysfs_remove_one() self test code:

open_tree()
{
	::
	ret = btrfs_sysfs_add_one(fs_info);
	if (ret) {
		pr_err("BTRFS: failed to init sysfs interface: %d\n", ret);
		goto fail_block_groups;
	}

+	btrfs_sysfs_remove_one(fs_info);
+	ret = btrfs_sysfs_add_one(fs_info);
+	if (ret) {
+		pr_err("BTRFS: failed to init sysfs interface: %d\n", ret);
+		goto fail_block_groups;
+	}

Cleaning up the unregistered kobject fixes this.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 92db3f6..68dcd17 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -439,6 +439,8 @@ static struct attribute *btrfs_attrs[] = {
 static void btrfs_release_super_kobj(struct kobject *kobj)
 {
 	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+
+	memset(&fs_info->super_kobj, 0, sizeof(struct kobject));
 	complete(&fs_info->kobj_unregister);
 }
--
2.0.0.153.g79d
[PATCH 03/24] Btrfs: sysfs: fix, undo sysfs device links
From: Anand Jain anand.j...@oracle.com

Theoretically we need to remove the device link attributes, but since the
entire device kobject was removed anyway, this never caused an issue.
Just do it nicely.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 68dcd17..adfac3e 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -522,6 +522,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 		kobject_del(fs_info->space_info_kobj);
 		kobject_put(fs_info->space_info_kobj);
 	}
+	btrfs_kobj_rm_device(fs_info, NULL);
 	kobject_del(fs_info->device_dir_kobj);
 	kobject_put(fs_info->device_dir_kobj);
 	addrm_unknown_feature_attrs(fs_info, false);
@@ -604,6 +605,8 @@ static void init_feature_attrs(void)
 	}
 }

+/* when one_device is NULL, it removes all device links */
+
 int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 		struct btrfs_device *one_device)
 {
@@ -621,6 +624,20 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 						disk_kobj->name);
 	}

+	if (one_device)
+		return 0;
+
+	list_for_each_entry(one_device,
+			&fs_info->fs_devices->devices, dev_list) {
+		if (!one_device->bdev)
+			continue;
+		disk = one_device->bdev->bd_part;
+		disk_kobj = &part_to_dev(disk)->kobj;
+
+		sysfs_remove_link(fs_info->device_dir_kobj,
+						disk_kobj->name);
+	}
+
 	return 0;
 }
--
2.0.0.153.g79d
[PATCH 14/24] Btrfs: sysfs: provide framework to remove all fsid kobject
Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/sysfs.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 4b5bac6..83d7535 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -515,7 +515,7 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add) return 0; } -static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) +static void __btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) { if (fs_devs-device_dir_kobj) { kobject_del(fs_devs-device_dir_kobj); @@ -528,6 +528,21 @@ static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) wait_for_completion(fs_devs-kobj_unregister); } +/* when fs_devs is NULL it will remove all fsid kobject */ +static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) +{ + struct list_head *fs_uuids = btrfs_get_fs_uuids(); + + if (fs_devs) { + __btrfs_sysfs_remove_fsid(fs_devs); + return; + } + + list_for_each_entry(fs_devs, fs_uuids, list) { + __btrfs_sysfs_remove_fsid(fs_devs); + } +} + void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info) { fs_info-fs_devices-fs_info = NULL; -- 2.0.0.153.g79d -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Fix out-of-space bug
From: Zhao Lei zhao...@cn.fujitsu.com

Btrfs will report NO_SPACE after we create and remove files several times,
and we can't write to the filesystem until it is mounted again.

Steps to reproduce:
 1: Create a single-dev btrfs fs with default options
 2: Write a file into it to take up most of the fs space
 3: Delete the above file
 4: Wait about 100s to let the chunk be removed
 5: goto 2

The script looks like this:
 #!/bin/bash

 # Recommend 1.2G space, too large disk will make test slow
 DEV="/dev/sda16"
 MNT="/mnt/tmp"

 dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
 file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))

 echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"

 for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
 echo "mkfs $DEV"
 mkfs.btrfs -f "$DEV" >/dev/null || exit 2
 echo "mount $DEV $MNT"
 mount "$DEV" "$MNT" || exit 2

 for ((loop_i = 0; loop_i < 20; loop_i++)); do
     echo
     echo "loop $loop_i"

     echo "dd file..."
     cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
     "${cmd[@]}" 2>/dev/null || {
         # NO_SPACE error triggered
         echo "dd failed: ${cmd[*]}"
         exit 1
     }

     echo "rm file..."
     rm -f "$MNT"/file0 || exit 2

     for ((i = 0; i < 10; i++)); do
         df "$MNT" | tail -1
         sleep 10
     done
 done

Reason:
 It is triggered by commit: 47ab2a6c689913db23ccae38349714edf8365e0a
 which is used to remove empty block groups automatically, but the
 cause is not in that patch. The code before it worked well because
 btrfs did not need to create and delete chunks so many times, with
 such high complexity.
 The above bug has several causes, any of which can trigger it.

Reason1:
 btrfs_check_data_free_space() tries to commit the transaction and
 retry allocating a chunk when the first allocation fails, but
 space_info->full is set by the first allocation and prevents the
 second allocation in the retry.
 When we commit a transaction with removed bgs, we need to clear
 space_info->full.
 Fixed in this patch.

Reason2:
 When we remove some contiguous chunks but leave other chunks after
 them, the freed disk space should be reusable for recreating chunks,
 but with the current code only the first creation succeeds.
 Fixed by Forrest Liu forre...@synology.com in:
 Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Reason3:
 contains_pending_extent() returns a wrong value in its calculation.
 Fixed by Forrest Liu forre...@synology.com in:
 Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Changelog v2->v3:
 v2 fixed the bug by adding more transaction commits, but we only need
 to reclaim space when we really have no space for a new chunk.
 Actually, the code already has this type of commit-and-retry; we only
 need to make it work with removed bgs.
 v3 fixes the bug in the above way.

Changelog v1->v2:
 v1 would introduce a new bug when deleting and creating a chunk in the
 same disk space in the same transaction, noticed by:
 Filipe David Manana fdman...@gmail.com
 v2 fixes this bug by committing the transaction after removing block
 groups.

Tested several times with the above script.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 4 ++++
 fs/btrfs/transaction.h | 5 +++++
 fs/btrfs/volumes.c     | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8cc0e97..f9299f3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -220,6 +220,7 @@ loop:
 	 * commit the transaction.
 	 */
 	atomic_set(&cur_trans->use_count, 2);
+	atomic_set(&cur_trans->have_free_bgs, 0);
 	cur_trans->start_time = get_seconds();

 	cur_trans->delayed_refs.href_root = RB_ROOT;
@@ -2030,6 +2031,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,

 	btrfs_finish_extent_commit(trans, root);

+	if (atomic_read(&cur_trans->have_free_bgs))
+		btrfs_clear_space_info_full(root->fs_info);
+
 	root->fs_info->last_trans_committed = cur_trans->transid;
 	/*
 	 * We needn't acquire the lock here because there is no other task
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 00ed29c..5225326 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -47,6 +47,11 @@ struct btrfs_transaction {
 	atomic_t num_writers;
 	atomic_t use_count;

+	/*
+	 * true if there is free bgs operations in this transaction
+	 */
+	atomic_t have_free_bgs;
+
 	/* Be protected by fs_info->trans_lock when we want to change it. */
 	enum btrfs_trans_state state;
 	struct list_head list;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ba9857d..46495a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1323,6 +1323,8 @@ again:
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
 			    "Failed to remove dev extent item");
+	} else {
+		atomic_set(&trans->transaction->have_free_bgs, 1);
 	}
 out:
 	btrfs_free_path(path);
--
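The interaction described under "Reason1" can be modelled in userspace. The following is only a toy sketch, not kernel code: every `toy_*` name is hypothetical, `full` stands in for space_info->full, and the atomic flag stands in for the `have_free_bgs` member added by the patch. It shows why a retry after commit can only succeed if the commit clears the "full" state when a dev extent was freed in that transaction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; NOT the kernel structures. */
struct toy_space_info {
	bool full;                 /* models space_info->full */
};

struct toy_transaction {
	atomic_int have_free_bgs;  /* models the flag added by the patch */
};

/* Models the reset done when a new transaction is set up. */
static void toy_start_transaction(struct toy_transaction *t)
{
	atomic_init(&t->have_free_bgs, 0);
}

/* Models chunk allocation: a failure marks the space as full. */
static bool toy_alloc_chunk(struct toy_space_info *s, bool device_has_room)
{
	if (s->full)
		return false;      /* short-circuit: allocation is not even tried */
	if (!device_has_room) {
		s->full = true;
		return false;
	}
	return true;
}

/* Models a dev extent being removed inside the transaction. */
static void toy_free_dev_extent(struct toy_transaction *t)
{
	atomic_store(&t->have_free_bgs, 1);
}

/* Models the commit: clear "full" only if something was freed. */
static void toy_commit(struct toy_transaction *t, struct toy_space_info *s)
{
	if (atomic_load(&t->have_free_bgs))
		s->full = false;
}

/* Runs one fail / (optional free) / commit / retry sequence. */
static bool toy_retry_succeeds(bool free_extent_before_commit)
{
	struct toy_space_info s = { .full = false };
	struct toy_transaction t;
	bool first;

	toy_start_transaction(&t);
	first = toy_alloc_chunk(&s, false);    /* first attempt fails */
	assert(!first);
	(void)first;
	if (free_extent_before_commit)
		toy_free_dev_extent(&t);
	toy_commit(&t, &s);
	return toy_alloc_chunk(&s, true);      /* retry after the commit */
}
```

In this model, `toy_retry_succeeds(true)` returns true while `toy_retry_succeeds(false)` returns false, because without a freed extent the stale "full" state still blocks the retry.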
Re: kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123
I can confirm that Liu's aforementioned patch made the filesystem accessible again. Thanks for your help. J. 2015-02-09 6:05 GMT+01:00 Qu Wenruo quwen...@cn.fujitsu.com: Original Message Subject: kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123 From: Jeroen Van den Keybus jeroen.vandenkey...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2015年02月09日 06:14 Hi, I have a LUKS encrypted raw external (USB) disk mapped to /dev/mapper/sd.backup. This mapped device was default btrfs formatted. I can mount the mapped device to /mnt/backup. There used to be a subvolume in /mnt/backup, which I deleted. I now seem unable to either add a directory to /mnt/backup or umount /mnt/backup; the command never finishes and the dmesg log reports: [17937.939438] [ cut here ] [17937.939523] kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123! [17937.939602] invalid opcode: [#1] SMP [17937.939664] Modules linked in: xts gf128mul rfcomm bluetooth joydev hid_logitech_dj xt_multiport nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_physdev br_netfilter xt_tcpudp xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables ebtable_filter ebtable_broute bridge stp llc ebtables x_tables eeepc_wmi asus_wmi sparse_keymap video kvm_amd kvm pl2303 usbserial serio_raw k10temp snd_usb_audio snd_usbmidi_lib snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel sp5100_tco snd_hda_controller i2c_piix4 snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd shpchp soundcore 8250_fintek mac_hid parport_pc [17937.940798] ppdev lp parport nct6775 nls_iso8859_1 hwmon_vid btrfs raid10 raid1 multipath linear raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx hid_generic usbhid hid xor raid6_pq psmouse radeon uas usb_storage r8169 i2c_algo_bit ttm mii drm_kms_helper drm wmi ahci 
libahci [17937.941259] CPU: 1 PID: 16331 Comm: btrfs-cleaner Not tainted 3.18.3-031803-generic #201501161810 [17937.941358] Hardware name: System manufacturer System Product Name/E45M1-M PRO, BIOS 0801 01/10/2012 [17937.941464] task: 88023383a800 ti: 880183868000 task.ti: 880183868000 [17937.941547] RIP: 0010:[c04dd109] [c04dd109] btrfs_orphan_add+0x1a9/0x1c0 [btrfs] [17937.941705] RSP: 0018:88018386bc98 EFLAGS: 00010286 [17937.941769] RAX: ffe4 RBX: 8801ebe6b000 RCX: [17937.941849] RDX: 5cb8 RSI: 0004 RDI: 8801b7f85138 [17937.941928] RBP: 88018386bcd8 R08: 88023ed1db40 R09: 88019ab07b40 [17937.942007] R10: R11: 0010 R12: 880233513630 [17937.942086] R13: 880041414d58 R14: 8801ebe6b458 R15: 0001 [17937.942168] FS: 7f8d60824740() GS:88023ed0() knlGS: [17937.942261] CS: 0010 DS: ES: CR0: 8005003b [17937.942327] CR2: 7f70b2ec CR3: 000124627000 CR4: 07e0 [17937.942406] Stack: [17937.942435] 88018386bcd8 c051bd4f 8801b93b9000 880204bbe200 [17937.942537] 8801b93b9000 88019ab07b40 880233513630 0001 [17937.942640] 88018386bd58 c04c5310 8801b7f85000 0004c04abffa [17937.942742] Call Trace: [17937.942819] [c051bd4f] ? lookup_free_space_inode+0x4f/0x100 [btrfs] [17937.942934] [c04c5310] btrfs_remove_block_group+0x140/0x490 [btrfs] [17937.943056] [c0500065] btrfs_remove_chunk+0x245/0x380 [btrfs] [17937.943163] [c04c5896] btrfs_delete_unused_bgs+0x236/0x270 [btrfs] [17937.943272] [c04ced6c] cleaner_kthread+0x12c/0x190 [btrfs] [17937.943374] [c04cec40] ? btrfs_destroy_all_delalloc_inodes+0x120/0x120 [btrfs] [17937.943471] [85093a49] kthread+0xc9/0xe0 [17937.943531] [85093980] ? flush_kthread_worker+0x90/0x90 [17937.943608] [857b3b7c] ret_from_fork+0x7c/0xb0 [17937.943673] [85093980] ? 
flush_kthread_worker+0x90/0x90 [17937.943746] Code: e8 4d 9f fc ff 8b 45 c8 e9 6d ff ff ff 0f 1f 44 00 00 f0 41 80 65 80 fd 4c 89 ef 89 45 c8 e8 bf 1e fe ff 8b 45 c8 e9 48 ff ff ff 0f 0b 4c 89 f7 45 31 f6 e8 ea 64 2d c5 e9 f9 fe ff ff 0f 1f 44 [17937.944273] RIP [c04dd109] btrfs_orphan_add+0x1a9/0x1c0 [btrfs] [17937.944392] RSP 88018386bc98 [17937.944503] ---[ end trace cee2bcd2393b84fb ]--- The BUG_ON in btrfs_orphan_add() seems have already been fixed by the patch from Forrest Liu. [PATCH] Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group https://patchwork.kernel.org/patch/5759741/ Thanks, Qu $ uname -a: Linux zacate 3.18.3-031803-generic #201501161810 SMP Fri Jan 16 18:12:22 UTC 2015 x86_64 x86_64
[PATCH v3] btrfs: Fix out-of-space bug
From: Zhao Lei zhao...@cn.fujitsu.com

Btrfs will report NO_SPACE after we create and remove files several times,
and we can't write to the filesystem until it is mounted again.

Steps to reproduce:
 1: Create a single-dev btrfs fs with default options
 2: Write a file into it to take up most of the fs space
 3: Delete the above file
 4: Wait about 100s to let the chunk be removed
 5: goto 2

The script looks like this:
 #!/bin/bash

 # Recommend 1.2G space, too large disk will make test slow
 DEV="/dev/sda16"
 MNT="/mnt/tmp"

 dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
 file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))

 echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"

 for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
 echo "mkfs $DEV"
 mkfs.btrfs -f "$DEV" >/dev/null || exit 2
 echo "mount $DEV $MNT"
 mount "$DEV" "$MNT" || exit 2

 for ((loop_i = 0; loop_i < 20; loop_i++)); do
     echo
     echo "loop $loop_i"

     echo "dd file..."
     cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
     "${cmd[@]}" 2>/dev/null || {
         # NO_SPACE error triggered
         echo "dd failed: ${cmd[*]}"
         exit 1
     }

     echo "rm file..."
     rm -f "$MNT"/file0 || exit 2

     for ((i = 0; i < 10; i++)); do
         df "$MNT" | tail -1
         sleep 10
     done
 done

Reason:
 It is triggered by commit: 47ab2a6c689913db23ccae38349714edf8365e0a
 which is used to remove empty block groups automatically, but the
 cause is not in that patch. The code before it worked well because
 btrfs did not need to create and delete chunks so many times, with
 such high complexity.
 The above bug has several causes, any of which can trigger it.

Reason1:
 When we remove some contiguous chunks but leave other chunks after
 them, the freed disk space should be reusable for recreating chunks,
 but with the current code only the first creation succeeds.
 Fixed by Forrest Liu forre...@synology.com in:
 Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Reason2:
 contains_pending_extent() returns a wrong value in its calculation.
 Fixed by Forrest Liu forre...@synology.com in:
 Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Reason3:
 btrfs_check_data_free_space() tries to commit the transaction and
 retry allocating a chunk when the first allocation fails, but
 space_info->full is set by the first allocation and prevents the
 second allocation in the retry.
 Fixed in this patch by clearing space_info->full when committing the
 transaction.

Changelog v2->v3:
 v2 fixed the bug by adding more transaction commits, but we only need
 to reclaim space when we really have no space for a new chunk,
 noticed by:
 Filipe David Manana fdman...@gmail.com

 Actually, our code already has this type of commit-and-retry; we only
 need to make it work with removed bgs.
 v3 fixes the bug in the above way.

Changelog v1->v2:
 v1 would introduce a new bug when deleting and creating a chunk in the
 same disk space in the same transaction, noticed by:
 Filipe David Manana fdman...@gmail.com
 v2 fixes this bug by committing the transaction after removing block
 groups.

Tested several times with the above script.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 4 ++++
 fs/btrfs/transaction.h | 5 +++++
 fs/btrfs/volumes.c     | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8cc0e97..f9299f3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -220,6 +220,7 @@ loop:
 	 * commit the transaction.
 	 */
 	atomic_set(&cur_trans->use_count, 2);
+	atomic_set(&cur_trans->have_free_bgs, 0);
 	cur_trans->start_time = get_seconds();

 	cur_trans->delayed_refs.href_root = RB_ROOT;
@@ -2030,6 +2031,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,

 	btrfs_finish_extent_commit(trans, root);

+	if (atomic_read(&cur_trans->have_free_bgs))
+		btrfs_clear_space_info_full(root->fs_info);
+
 	root->fs_info->last_trans_committed = cur_trans->transid;
 	/*
 	 * We needn't acquire the lock here because there is no other task
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 00ed29c..5225326 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -47,6 +47,11 @@ struct btrfs_transaction {
 	atomic_t num_writers;
 	atomic_t use_count;

+	/*
+	 * true if there is free bgs operations in this transaction
+	 */
+	atomic_t have_free_bgs;
+
 	/* Be protected by fs_info->trans_lock when we want to change it. */
 	enum btrfs_trans_state state;
 	struct list_head list;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ba9857d..46495a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1323,6 +1323,8 @@ again:
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
 			    "Failed to remove dev extent item");
+	} else {
+		atomic_set(&trans->transaction->have_free_bgs, 1);
 	}
 out:
[PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
kfree checks whether the pointer it is passed is NULL. The two foregoing
checks are therefore unnecessary.

This issue was detected using Coccinelle.

Signed-off-by: Bas Peters baspeter...@gmail.com
---
 fs/btrfs/free-space-cache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d6c03f7..7d2d817 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1976,8 +1976,7 @@ new_bitmap:

 out:
 	if (info) {
-		if (info->bitmap)
-			kfree(info->bitmap);
+		kfree(info->bitmap);
 		kmem_cache_free(btrfs_free_space_cachep, info);
 	}

@@ -3427,8 +3426,7 @@ again:

 	if (info)
 		kmem_cache_free(btrfs_free_space_cachep, info);
-	if (map)
-		kfree(map);
+	kfree(map);
 	return 0;
 }
--
2.1.0
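The same cleanup can be illustrated in userspace, where the C standard guarantees that free(NULL) does nothing, much as the commit message states for kfree(). This is only an analogy under that assumption: `struct toy_entry` and the `toy_*` helpers are hypothetical names, not btrfs code. Note that the outer NULL check is kept, since the entry itself is dereferenced.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a free-space entry. */
struct toy_entry {
	unsigned char *bitmap;   /* may legitimately be NULL */
};

/* Allocates an entry; the bitmap is only allocated on request. */
static struct toy_entry *toy_entry_new(int with_bitmap)
{
	struct toy_entry *e = calloc(1, sizeof(*e));

	if (e && with_bitmap)
		e->bitmap = calloc(1, 64);
	return e;
}

/*
 * Releases an entry. free(NULL) is a no-op, so the bitmap pointer is
 * passed through without a guard.
 * Returns 1 if an entry was released, 0 if e was NULL.
 */
static int toy_entry_free(struct toy_entry *e)
{
	if (!e)
		return 0;        /* still needed: e is dereferenced below */
	free(e->bitmap);         /* no "if (e->bitmap)" required */
	free(e);
	return 1;
}

/* Exercises the three cases: with bitmap, without bitmap, NULL entry. */
static int toy_demo(void)
{
	int a = toy_entry_free(toy_entry_new(1));
	int b = toy_entry_free(toy_entry_new(0));
	int c = toy_entry_free(NULL);

	assert(c == 0);
	return a == 1 && b == 1 && c == 0;
}
```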
Re: [PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
On Wed, 11 Feb 2015, Bas Peters wrote: kfree checks whether the pointer it is passed is NULL. The two foregoing checks are therefore unnecessary. This issue was detected using Coccinelle. Signed-off-by: Bas Peters baspeter...@gmail.com --- fs/btrfs/free-space-cache.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index d6c03f7..7d2d817 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1976,8 +1976,7 @@ new_bitmap: out: if (info) { - if (info-bitmap) - kfree(info-bitmap); + kfree(info-bitmap); kmem_cache_free(btrfs_free_space_cachep, info); } @@ -3427,8 +3426,7 @@ again: if (info) kmem_cache_free(btrfs_free_space_cachep, info); - if (map) - kfree(map); + kfree(map); A certain lack of parallelism arises in the latter case. julia return 0; } -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kernel-janitors in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix scheduler warning when syncing log
We try to lock a mutex while the current task state is not TASK_RUNNING,
which results in the following warning when CONFIG_DEBUG_LOCK_ALLOC=y:

[30736.772501] [ cut here ]
[30736.774545] WARNING: CPU: 9 PID: 19972 at kernel/sched/core.c:7300 __might_sleep+0x8b/0xa8()
[30736.783453] do not call blocking ops when !TASK_RUNNING; state=2 set at [8107499b] prepare_to_wait+0x43/0x89
[30736.786261] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc psmouse parport pcspkr microcode serio_raw evdev processor thermal_sys i2c_piix4 i2c_core button ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring e1000 virtio scsi_mod
[30736.794323] CPU: 9 PID: 19972 Comm: fsstress Not tainted 3.19.0-rc7-btrfs-next-5+ #1
[30736.795821] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[30736.798788] 0009 88042743fbd8 814248ed 88043d32f2d8
[30736.800504] 88042743fc28 88042743fc18 81045338 0001
[30736.802131] 81064514 817c52d1 026d
[30736.803676] Call Trace:
[30736.804256] [814248ed] dump_stack+0x4c/0x65
[30736.805245] [81045338] warn_slowpath_common+0xa1/0xbb
[30736.806360] [81064514] ? __might_sleep+0x8b/0xa8
[30736.807391] [81045398] warn_slowpath_fmt+0x46/0x48
[30736.808511] [8107499b] ? prepare_to_wait+0x43/0x89
[30736.809620] [8107499b] ? prepare_to_wait+0x43/0x89
[30736.810691] [81064514] __might_sleep+0x8b/0xa8
[30736.811703] [81426eaf] mutex_lock_nested+0x2f/0x3a0
[30736.812889] [8107bfa1] ? trace_hardirqs_on_caller+0x18f/0x1ab
[30736.814138] [8107bfca] ? trace_hardirqs_on+0xd/0xf
[30736.819878] [a038cfff] wait_for_writer.isra.12+0x91/0xaa [btrfs]
[30736.821260] [810748bd] ? signal_pending_state+0x31/0x31
[30736.822410] [a0391f0a] btrfs_sync_log+0x160/0x947 [btrfs]
[30736.823574] [8107bfa1] ? trace_hardirqs_on_caller+0x18f/0x1ab
[30736.824847] [8107bfca] ? trace_hardirqs_on+0xd/0xf
[30736.825972] [a036e555] btrfs_sync_file+0x2b0/0x319 [btrfs]
[30736.827684] [8117901a] vfs_fsync_range+0x21/0x23
[30736.828932] [81179038] vfs_fsync+0x1c/0x1e
[30736.829917] [8117928b] do_fsync+0x34/0x4e
[30736.830862] [811794b3] SyS_fsync+0x10/0x14
[30736.831819] [8142a512] system_call_fastpath+0x12/0x17
[30736.832982] ---[ end trace c0b57df60d32ae5c ]---

Fix this by acquiring the mutex after calling finish_wait(), which sets the
task's state to TASK_RUNNING.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/tree-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index a266587..ea0431d 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2448,8 +2448,8 @@ static void wait_for_writer(struct btrfs_trans_handle *trans,
 		mutex_unlock(&root->log_mutex);
 		if (atomic_read(&root->log_writers))
 			schedule();
-		mutex_lock(&root->log_mutex);
 		finish_wait(&root->log_writer_wait, &wait);
+		mutex_lock(&root->log_mutex);
 	}
 }
--
2.1.3
Re: [PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
kfree checks whether the pointer it is passed is NULL. The two foregoing
checks are therefore unnecessary. This issue was detected using Coccinelle.

Would you like to integrate my update suggestion "btrfs: Deletion of
unnecessary checks before six function calls"?

https://lkml.org/lkml/2014/10/31/606
http://article.gmane.org/gmane.linux.kernel/1818924
https://systeme.lip6.fr/pipermail/cocci/2014-October/001321.html

Regards,
Markus
Re: [PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
2015-02-11 12:30 GMT+01:00 SF Markus Elfring elfr...@users.sourceforge.net:

kfree checks whether the pointer it is passed is NULL. The two foregoing
checks are therefore unnecessary. This issue was detected using Coccinelle.

Would you like to integrate my update suggestion "btrfs: Deletion of
unnecessary checks before six function calls"?

https://lkml.org/lkml/2014/10/31/606
http://article.gmane.org/gmane.linux.kernel/1818924
https://systeme.lip6.fr/pipermail/cocci/2014-October/001321.html

Oh, I see you already made the exact same change. I'll just drop my patch
in that case.

Regards,
Markus
Re: [PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
Markus,

2015-02-11 13:18 GMT+01:00 SF Markus Elfring elfr...@users.sourceforge.net:

https://lkml.org/lkml/2014/10/31/606
http://article.gmane.org/gmane.linux.kernel/1818924
https://systeme.lip6.fr/pipermail/cocci/2014-October/001321.html

Oh, I see you already made the exact same change.

Would you like to add any tags to my update suggestion?

No, it's fine, I should've checked before submitting the patch.

Regards,
Markus
Re: [PATCH] fs: btrfs: free-space-cache.c: remove two unnecessary checks before calling kfree()
https://lkml.org/lkml/2014/10/31/606 http://article.gmane.org/gmane.linux.kernel/1818924 https://systeme.lip6.fr/pipermail/cocci/2014-October/001321.html Oh, I see you already made the exact same change. Would you like to add any tags to my update suggestion? Regards, Markus
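For readers without the kernel tree at hand, the point of this thread can be illustrated in userspace: free(NULL), like kfree(NULL), is defined to do nothing, so a guard in front of it is redundant. The sketch below is only an analogy. The struct and function names are made up for illustration and are not taken from free-space-cache.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of two heap buffers (not a btrfs structure). */
struct two_buffers {
	void *first;
	void *second;
};

/*
 * Release both buffers. free(NULL) is a no-op by the C standard, so no
 * "if (ptr)" guard is needed, which is the same property kfree() has.
 */
static void two_buffers_release(struct two_buffers *b)
{
	assert(b != NULL);
	free(b->first);
	free(b->second);
	b->first = NULL;
	b->second = NULL;
}
```

Calling two_buffers_release() twice in a row, or on a struct whose members were never allocated, is safe for the same reason.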
Re: btrfs performance, sudden drop to 0 IOPs
On 2015-02-09 12:26, P. Remek wrote: Hello, I am benchmarking Btrfs and when benchmarking random writes with the fio utility, I noticed the following two things: Based on what I know about BTRFS, I think that these issues actually have distinct causes. 1) On the first run, when the target file doesn't exist yet, performance is about 8000 IOPs. On the second, and every other run, performance goes up to 7 IOPs. It's a massive difference. The target file is the one created during the first run. I've noticed that almost always, file creation on BTRFS is slower than file re-writes. This seems to especially be the case when using AIO and/or O_DIRECT (although O_DIRECT on a COW filesystem is _really_ complicated to get right). I don't know that there is really any way currently to solve this, although it would be interesting to see if fallocate'ing the files prior to the initial run would have any significant performance impact. 2) There are windows during the test where IOPs drop to 0 and stay at 0 for about 10 seconds, then go back up again, and after a couple of seconds drop to 0 again. This is reproducible 100% of the time. I've seen this same behavior on a number of filesystems (not just BTRFS) when using the default I/O scheduler with its default parameters, especially on systems with high performance storage. IIRC, Ubuntu 13.10 switched from using the upstream default I/O scheduler (CFQ) to using the Deadline I/O scheduler because it has better performance (and is more deterministic) on most cheap commodity desktop/laptop hardware. I've found however that the Deadline scheduler actually tends to perform worse than CFQ when used on higher-end server systems and/or SSDs, although CFQ with default parameters only does marginally better. I'd suggest experimenting with some of the parameters under /sys/block (check the files in the Documentation/block directory of the Linux kernel sources for information about what (almost) everything there does).
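Before changing anything under /sys/block it helps to know which scheduler is active. The file /sys/block/<dev>/queue/scheduler lists the available schedulers on one line with the active one in square brackets, for example "noop deadline [cfq]". A minimal sketch of picking out the bracketed name follows. The helper name active_scheduler is made up here, and reading the file itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "noop deadline [cfq]" into out. Returns 0 on success, -1 if there is
 * no bracketed entry or it does not fit in out.
 */
static int active_scheduler(const char *line, char *out, size_t out_len)
{
	const char *lb, *rb;
	size_t n;

	assert(line != NULL && out != NULL);
	lb = strchr(line, '[');
	if (lb == NULL)
		return -1;
	rb = strchr(lb + 1, ']');
	if (rb == NULL)
		return -1;
	n = (size_t)(rb - (lb + 1));
	if (n + 1 > out_len)
		return -1;
	memcpy(out, lb + 1, n);
	out[n] = '\0';
	return 0;
}
```

Writing a scheduler name back to the same file switches the scheduler for that device, so the benchmark can be repeated under each one.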
[PATCH v4] btrfs: Fix out-of-space bug
From: Zhao Lei zhao...@cn.fujitsu.com

Btrfs will report NO_SPACE after we create and remove files several times, and we can't write to the filesystem until it is mounted again.

Steps to reproduce:
 1: Create a single-dev btrfs fs with default options
 2: Write a file into it to take up most of the fs space
 3: Delete the above file
 4: Wait about 100s to let the chunk be removed
 5: goto 2

The script is as follows:

 #!/bin/bash

 # Recommend 1.2G space, too large disk will make test slow
 DEV="/dev/sda16"
 MNT="/mnt/tmp"

 dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
 file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))

 echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"

 for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
 echo "mkfs $DEV"
 mkfs.btrfs -f "$DEV" >/dev/null || exit 2
 echo "mount $DEV $MNT"
 mount "$DEV" "$MNT" || exit 2

 for ((loop_i = 0; loop_i < 20; loop_i++)); do
     echo
     echo "loop $loop_i"

     echo "dd file..."
     cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
     "${cmd[@]}" 2>/dev/null || {
         # NO_SPACE error triggered
         echo "dd failed: ${cmd[*]}"
         exit 1
     }

     echo "rm file..."
     rm -f "$MNT"/file0 || exit 2

     for ((i = 0; i < 10; i++)); do
         df "$MNT" | tail -1
         sleep 10
     done
 done

Reason:
It is triggered by commit 47ab2a6c689913db23ccae38349714edf8365e0a, which removes empty block groups automatically, but the cause is not in that patch. The code before that commit worked well because btrfs did not need to create and delete chunks so many times, with such complexity.

The bug has several causes, any of which can trigger it.

Reason1: When we remove some contiguous chunks but leave other chunks after them, this disk space should be reused when chunks are recreated, but in the current code only the first create will succeed. Fixed by Forrest Liu forre...@synology.com in: Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Reason2: contains_pending_extent() returns a wrong value in its calculation.
Fixed by Forrest Liu forre...@synology.com in: Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole

Reason3: btrfs_check_data_free_space() tries to commit the transaction and retry allocating a chunk when the first allocation fails, but space_info->full is set by the first allocation and prevents the second allocation in the retry. Fixed in this patch by clearing space_info->full in commit transaction.

Tested several times with the above script.

Changelog v3->v4: use a lightweight int instead of atomic_t to record have_remove_bgs in the transaction, suggested by: Josef Bacik jba...@fb.com

Changelog v2->v3: v2 fixed the bug by adding more commit-transaction calls, but we only need to reclaim space when we really have no space for a new chunk, noticed by: Filipe David Manana fdman...@gmail.com Actually, our code already has this type of commit-and-retry; we only need to make it work with removed bgs. v3 fixed the bug in that way.

Changelog v1->v2: v1 would introduce a new bug when deleting and creating a chunk in the same disk space in the same transaction, noticed by: Filipe David Manana fdman...@gmail.com v2 fixes this bug by committing the transaction after removing block groups.

Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Suggested-by: Filipe David Manana fdman...@gmail.com
Suggested-by: Josef Bacik jba...@fb.com
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 4 ++++
 fs/btrfs/transaction.h | 5 +++++
 fs/btrfs/volumes.c     | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e88b59d..2c192f9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -220,6 +220,7 @@ loop:
 	 * commit the transaction.
 	 */
 	atomic_set(&cur_trans->use_count, 2);
+	cur_trans->have_free_bgs = 0;
 	cur_trans->start_time = get_seconds();

 	cur_trans->delayed_refs.href_root = RB_ROOT;
@@ -2026,6 +2027,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,

 	btrfs_finish_extent_commit(trans, root);

+	if (cur_trans->have_free_bgs)
+		btrfs_clear_space_info_full(root->fs_info);
+
 	root->fs_info->last_trans_committed = cur_trans->transid;
 	/*
 	 * We needn't acquire the lock here because there is no other task
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 00ed29c..22e45d1 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -47,6 +47,11 @@ struct btrfs_transaction {
 	atomic_t num_writers;
 	atomic_t use_count;

+	/*
+	 * true if there is free bgs operations in this transaction
+	 */
+	int have_free_bgs;
+
 	/* Be protected by fs_info->trans_lock when we want to change it. */
 	enum btrfs_trans_state state;
 	struct list_head list;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 50c5a87..e86f4ca 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1310,6
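Reason3 can be modelled outside the kernel. The sketch below is a toy, not btrfs code: the struct and function names are invented, and only the field names full and have_free_bgs follow the patch description. It shows why a commit-and-retry cannot succeed while the "full" flag set by the first failed allocation is still in place, and why clearing it at commit (when block groups were freed in that transaction) lets the retry go through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; the logic is heavily simplified. */
struct toy_space_info {
	int full;		/* set once a chunk allocation fails */
	long unallocated;	/* bytes available for new chunks */
};

struct toy_transaction {
	int have_free_bgs;	/* block groups were removed in this transaction */
	long pending_free;	/* bytes that become reusable at commit */
};

static int toy_alloc_chunk(struct toy_space_info *si, long size)
{
	if (si->full)
		return -ENOSPC;	/* short-circuit: space is not re-examined */
	if (si->unallocated < size) {
		si->full = 1;
		return -ENOSPC;
	}
	si->unallocated -= size;
	return 0;
}

/* clear_full selects the patched (1) or unpatched (0) commit behaviour. */
static void toy_commit(struct toy_transaction *t, struct toy_space_info *si,
		       int clear_full)
{
	si->unallocated += t->pending_free;
	t->pending_free = 0;
	if (clear_full && t->have_free_bgs)
		si->full = 0;
	t->have_free_bgs = 0;
}

/* Allocate; on ENOSPC commit once and retry. */
static int toy_alloc_with_retry(struct toy_transaction *t,
				struct toy_space_info *si, long size,
				int clear_full)
{
	int ret;

	assert(t != NULL && si != NULL && size > 0);
	ret = toy_alloc_chunk(si, size);
	if (ret == -ENOSPC) {
		toy_commit(t, si, clear_full);
		ret = toy_alloc_chunk(si, size);
	}
	return ret;
}
```

With clear_full set to 0 the retry fails even though the commit released enough space. With it set to 1 the retry succeeds.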
Re: [PATCH 23/24] Btrfs: sysfs: support seed devices in the sysfs layout
Thanks for commenting. More below. On 02/12/2015 02:52 AM, David Sterba wrote: On Mon, Feb 09, 2015 at 07:56:24AM +0800, Anand Jain wrote: This adds an enhancement to show the seed fsid and devices. The way sprouting handles fs_devices: clone seed fs_devices and add to the fs_uuids; mem copy seed fs_devices and assign to fs_devices->seed (move dev_list); evacuate seed fs_devices contents to hold sprout fs devices contents. So, to be in line with these fs_devices changes during seeding, represent the seed fsid under the sprout fsid; this is achieved by using kobject_move(). eg: showing two levels of seeding. That's new to me, how does nested seeding work? I call the operation below nested seeding: mark a sprout as seed, mount it, add a new sprout to it. eg:

 mkfs.btrfs /dev/sdz
 btrfstune -S 1 /dev/sdz
 mount /dev/sdz /btrfs
 btrfs dev add /dev/sdy /btrfs
 umount /btrfs
 btrfstune -S 1 /dev/sdy
 mount /dev/sdy /btrfs
 btrfs dev add /dev/sdx /btrfs

(It's a bit complicated during seeding, as fs_devices and devices move around. /proc/fs/btrfs/devlist helped to understand it; it's on the ML.) { Since we are on this topic: btrfs-progs shouldn't have had this patch: git log -p 2513077 - commit 2513077f2f830b4bc83d528bfb6979eb461918bd Author: Gui Hecheng guihc.f...@cn.fujitsu.com Date: Mon Oct 6 18:16:46 2014 +0800 btrfs-progs: fix device missing of btrfs fi show with seed devices - it doesn't work with nested seed, as I commented in http://marc.info/?l=linux-btrfs&m=141102300324251&w=2 - btrfs fi show -d warning devid 1 not found already warning devid 2 not found already Check tree block failed, want=29425664, have=0 read block failed check_tree_block Couldn't setup csum tree Check tree block failed, want=29360128, have=0 read block failed check_tree_block - I haven't seen the next version of this patch from Gui.
(Gui ?, copied) }

 find /sys/fs/btrfs/ -type d -name devices -exec ls {} \; -print
 sde
 /sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/devices
 sdd
 /sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/53ac3265-0c34-4afd-9453-cc0d1a07be64/devices

The plain uuid is IMHO not the best naming convention, although it's acceptable in the global list in /sys/fs/btrfs/* I'd rather avoid it if it's mixed with other files. Just to clarify, the above aren't uuids, they are fsids; sorry I didn't mention that.

 sde
 /sys/fs/btrfs/sprout-fsid/seed-fsid/devices
 sdd
 /sys/fs/btrfs/sprout-fsid/seed-fsid/2nd-level-seed-fsid/devices

In any case, as in the previous RFC patch [PATCH RFC] btrfs: add sysfs layout to show volume info, the uuid will be there. The reasons are: first, in the btrfs kernel code device uniqueness is determined by the fsid-uuid-devid combination (which means that if _any one_ of these is different it is going to create a new struct btrfs_device), so it's easy to be in line with that. Name abstraction links on top of it can be created as well. Next, we originally have the device name link under /sys/fs/btrfs/fsid/device. Since that came first, I doubt we could alter it to a kobject dir instead of a link. Some script might be using it. So I am planning to put the uuid under /sys/fs/btrfs/fsid/device to contain info about the device, as shown in the RFC patch above. Would it be enough to print all relevant seeding information into a single file? If the UUID directories do not contain anything else, that would be IMHO best. Hmm, nope, it will contain more info. Do the seeding fsids exist on their own in /sys/fs/btrfs? I haven't tested the patchset so I'd probably find that out myself.
Thanks, Anand
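As an illustration of the layout proposed in this thread (sprout fsid at the top, each seed fsid nested under the fsid that sprouted from it), here is a small sketch that builds the expected "devices" directory path from a chain of fsids. The helper name is made up, and the layout is only what the patchset proposes, so treat the result as an assumption rather than a stable interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/fs/btrfs/<fsid0>/<fsid1>/.../devices" from a chain of fsid
 * strings ordered sprout first, deepest seed last. Returns 0 on success,
 * -1 if the result does not fit in out.
 */
static int seed_devices_path(const char *const *fsids, size_t count,
			     char *out, size_t out_len)
{
	size_t used;
	size_t i;
	int n;

	assert(fsids != NULL && out != NULL && count > 0);
	n = snprintf(out, out_len, "/sys/fs/btrfs");
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	used = (size_t)n;
	for (i = 0; i < count; i++) {
		n = snprintf(out + used, out_len - used, "/%s", fsids[i]);
		if (n < 0 || (size_t)n >= out_len - used)
			return -1;
		used += (size_t)n;
	}
	n = snprintf(out + used, out_len - used, "/devices");
	if (n < 0 || (size_t)n >= out_len - used)
		return -1;
	return 0;
}
```

For the two-level example above, the chain { sprout-fsid, seed-fsid, 2nd-level-seed-fsid } yields the directory that lists sdd.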
[PATCH 1/3] Btrfs: procfs-devlist: introduce procfs interface for the device list
From: Anand Jain anand.j...@oracle.com (added RFC prefix to the patch header) (as of now just an experimental interface) This patch introduces the procfs interface /proc/fs/btrfs/devlist, which as of now exports all the members of the kernel fs_devices. The current /sys/fs/btrfs interface works only when the fs is mounted, is spread over a file and directory hierarchy, and also has the sysfs limitation of max output of U64 per file. Here the btrfs procfs interface uses seq_file to export all the members of fs_devices. It also shows the contents when a device is not mounted but has been registered with the btrfs kernel (useful as an alternative to the buggy ready ioctl). An attempt is made to follow a standard file format for the output, such as ini, so that a simple wrapper python script can provide useful interfaces to the end user. Further, I am planning to add a few more members to the interface, such as group profile info. The long term idea is to make the procfs interface a one-stop btrfs application interface for the device and fs info from the kernel, which a simple python script can make use of.
Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.h | 4 ++ fs/btrfs/procfs.c | 142 + fs/btrfs/super.c | 4 ++ fs/btrfs/volumes.h | 1 + 5 files changed, 152 insertions(+), 1 deletion(-) create mode 100644 fs/btrfs/procfs.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 6d1d0b9..134a62f 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -4,7 +4,7 @@ obj-$(CONFIG_BTRFS_FS) := btrfs.o btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ file-item.o inode-item.o inode-map.o disk-io.o \ transaction.o inode.o file.o tree-defrag.o \ - extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ + extent_map.o sysfs.o procfs.o struct-funcs.o xattr.o ordered-data.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9493b91..a83a16a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3986,6 +3986,10 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size); int btrfs_parse_options(struct btrfs_root *root, char *options); int btrfs_sync_fs(struct super_block *sb, int wait); +/* procfs.c */ +void btrfs_exit_procfs(void); +void btrfs_init_procfs(void); + #ifdef CONFIG_PRINTK __printf(2, 3) void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...); diff --git a/fs/btrfs/procfs.c b/fs/btrfs/procfs.c new file mode 100644 index 000..d16a76b --- /dev/null +++ b/fs/btrfs/procfs.c @@ -0,0 +1,142 @@ +#include linux/seq_file.h +#include linux/vmalloc.h +#include linux/proc_fs.h +#include linux/rcustring.h +#include ctree.h +#include volumes.h + +#define BTRFS_PROC_PATHfs/btrfs +#define BTRFS_PROC_DEVLIST devlist + +struct proc_dir_entry *btrfs_proc_root; + +void btrfs_print_devlist(struct seq_file *seq) +{ + + /* Btrfs Procfs String Len */ +#define BPSL 256 +#define 
BTRFS_SEQ_PRINT(plist, arg)\ + snprintf(str, BPSL, plist, arg);\ + if (sprt)\ + seq_printf(seq, \t);\ + seq_printf(seq, str) + + char str[BPSL]; + struct btrfs_device *device; + struct btrfs_fs_devices *fs_devices; + struct btrfs_fs_devices *cur_fs_devices; + struct btrfs_fs_devices *sprt; //sprout fs devices + struct list_head *fs_uuids = btrfs_get_fs_uuids(); + struct list_head *cur_uuid; + + seq_printf(seq, \n#Its Experimental, parameters may change without notice.\n\n); + + mutex_lock(uuid_mutex); + /* Todo: there must be better way than nested locks */ + list_for_each(cur_uuid, fs_uuids) { + cur_fs_devices = list_entry(cur_uuid, struct btrfs_fs_devices, list); + + mutex_lock(cur_fs_devices-device_list_mutex); + + fs_devices = cur_fs_devices; + sprt = NULL; + +again_fs_devs: + if (sprt) { + BTRFS_SEQ_PRINT([[seed_fsid: %pU]]\n, fs_devices-fsid); + BTRFS_SEQ_PRINT(\tsprout_fsid:\t\t%pU\n, sprt-fsid); + } else { + BTRFS_SEQ_PRINT([fsid: %pU]\n, fs_devices-fsid); + } + if (fs_devices-seed) { + BTRFS_SEQ_PRINT(\tseed_fsid:\t\t%pU\n, fs_devices-seed-fsid); + } + BTRFS_SEQ_PRINT(\tfs_devs_addr:\t\t%p\n, fs_devices); + BTRFS_SEQ_PRINT(\tnum_devices:\t\t%llu\n, fs_devices-num_devices); + BTRFS_SEQ_PRINT(\topen_devices:\t\t%llu\n, fs_devices-open_devices); + BTRFS_SEQ_PRINT(\trw_devices:\t\t%llu\n, fs_devices-rw_devices); +
[PATCH 0/3] [Not for integration, experimental] Introduce /proc/fs/btrfs/devlist
An example is the easiest way to explain, so one is below. Output is in Python config reader module format, so just importing this output within a python script is enough. Useful for troubleshoot/debug, storage based on btrfs to render info on their bui/gui and further btrfs-progs could be very sleek if it uses this.

[fsid: 763c600a-7af6-4b8a-a421-6611de307dbf]
    seed_fsid: e962e198-ef98-4782-ae99-d0128c9f5c37
    fs_devs_addr: 880046fb4400
    num_devices: 1
    open_devices: 1
    rw_devices: 1
    missing_devices: 0
    total_rw_devices: 1633799168
    total_devices: 3
    opened: 1
    seeding: 0
    rotating: 1
    super_kobj_state: 1
    super_kobj_insysfs: 1
    device_kobj_state: 1
    device_kobj_insysfs: 1
[[uuid: 762f7b41-419e-438a-8a58-8a90f6642c18]]
    dev_addr: 88004699
    device: /dev/sdg
    devid: 3
    dev_root_fsid: 763c600a-7af6-4b8a-a421-6611de307dbf
    generation: 37
    total_bytes: 1633799168
    dev_totalbytes: 1633799168
    bytes_used: 234881024
    type: 0
    io_align: 4096
    io_width: 4096
    sector_size: 4096
    mode: 0x83
    writeable: 1
    in_fs_metadata: 1
    missing: 0
    can_discard: 0
    replace_tgtdev: 0
    active_pending: 0
    nobarriers: 0
    devstats_valid: 1
    bdev: not_null
[[seed_fsid: e962e198-ef98-4782-ae99-d0128c9f5c37]]
    sprout_fsid: 763c600a-7af6-4b8a-a421-6611de307dbf
    seed_fsid: 1c52f894-0ead-43d6-847a-d42359f78370
    fs_devs_addr: 88004586a400
    num_devices: 1
    open_devices: 1
    rw_devices: 0
    missing_devices: 0
    total_rw_devices: 0
    total_devices: 2
    opened: 1
    seeding: 1
    rotating: 1
    super_kobj_state: 1
    super_kobj_insysfs: 1
    device_kobj_state: 1
    device_kobj_insysfs: 1
[[uuid: d064a43c-e9ce-42fb-9c01-140d2bdcd528]]
    dev_addr: 880046991400
    device: /dev/sdf
    devid: 2
    dev_root_fsid: 763c600a-7af6-4b8a-a421-6611de307dbf
    generation: 27
    total_bytes: 1633799168
    dev_totalbytes: 1633799168
    bytes_used: 167772160
    type: 0
    io_align: 4096
    io_width: 4096
    sector_size: 4096
    mode: 0x81
    writeable: 0
    in_fs_metadata: 1
    missing: 0
    can_discard: 0
    replace_tgtdev: 0
    active_pending: 0
    nobarriers: 0
    devstats_valid: 0
    bdev: not_null
[[seed_fsid: 1c52f894-0ead-43d6-847a-d42359f78370]]
    sprout_fsid: e962e198-ef98-4782-ae99-d0128c9f5c37
    fs_devs_addr: 88004586a000
    num_devices: 1
    open_devices: 1
    rw_devices: 0
    missing_devices: 0
    total_rw_devices: 0
    total_devices: 1
    opened: 1
    seeding: 1
    rotating: 1
    super_kobj_state: 1
    super_kobj_insysfs: 1
    device_kobj_state: 1
    device_kobj_insysfs: 1
[[uuid: 4c9b2e81-e4b9-474c-9462-cc2dcd6117d5]]
    dev_addr: 880046990800
    device: /dev/sde
    devid: 1
    dev_root_fsid: 763c600a-7af6-4b8a-a421-6611de307dbf
    generation: 5
    total_bytes: 1633796096
    dev_totalbytes: 1633796096
    bytes_used: 180092928
    type:
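A consumer of this output only has to tell four kinds of lines apart: "[fsid: ...]", "[[seed_fsid: ...]]", "[[uuid: ...]]" and "key: value". The following is a rough sketch of such a classifier. It assumes the format shown in the example above, which the cover letter itself describes as experimental. The enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum devlist_line { DL_OTHER, DL_FSID, DL_SEED_FSID, DL_DEVICE, DL_KEYVAL };

/*
 * Classify one line of the experimental devlist output. For section
 * headers the id is copied to val; for "key: value" lines both key and
 * val are filled. key and val must each hold at least 64 bytes.
 */
static enum devlist_line devlist_classify(const char *line, char *key, char *val)
{
	assert(line != NULL && key != NULL && val != NULL);
	key[0] = '\0';
	val[0] = '\0';
	/* The kernel side indents with tabs; skip leading blanks. */
	while (*line == ' ' || *line == '\t')
		line++;
	if (sscanf(line, "[[seed_fsid: %63[^]]]]", val) == 1)
		return DL_SEED_FSID;
	if (sscanf(line, "[[uuid: %63[^]]]]", val) == 1)
		return DL_DEVICE;
	if (sscanf(line, "[fsid: %63[^]]]", val) == 1)
		return DL_FSID;
	if (sscanf(line, "%63[^:]: %63s", key, val) == 2)
		return DL_KEYVAL;
	key[0] = '\0';
	val[0] = '\0';
	return DL_OTHER;
}
```

Feeding the file line by line through this function and tracking the most recent header is enough to rebuild the fsid / seed / device hierarchy.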
[PATCH 3/3] Btrfs: procfs-devlist: update the sysfs contents
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/procfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/procfs.c b/fs/btrfs/procfs.c
index 4aec759..0211aec 100644
--- a/fs/btrfs/procfs.c
+++ b/fs/btrfs/procfs.c
@@ -62,6 +62,17 @@ again_fs_devs:
 		BTRFS_SEQ_PRINT("\tseeding:\t\t%d\n", fs_devices->seeding);
 		BTRFS_SEQ_PRINT("\trotating:\t\t%d\n", fs_devices->rotating);

+		BTRFS_SEQ_PRINT("\tsuper_kobj_state:\t%d\n", fs_devices->super_kobj.state_initialized);
+		BTRFS_SEQ_PRINT("\tsuper_kobj_insysfs:\t%d\n", fs_devices->super_kobj.state_in_sysfs);
+
+		if (fs_devices->device_dir_kobj) {
+			BTRFS_SEQ_PRINT("\tdevice_kobj_state:\t%d\n", fs_devices->device_dir_kobj->state_initialized);
+			BTRFS_SEQ_PRINT("\tdevice_kobj_insysfs:\t%d\n", fs_devices->device_dir_kobj->state_in_sysfs);
+		} else {
+			BTRFS_SEQ_PRINT("\tdevice_kobj_state:\t%s\n", "null");
+			BTRFS_SEQ_PRINT("\tdevice_kobj_insysfs:\t%s\n", "null");
+		}
+
 		list_for_each_entry(device, &fs_devices->devices, dev_list) {
 			BTRFS_SEQ_PRINT("\t[[uuid: %pU]]\n", device->uuid);
 			BTRFS_SEQ_PRINT("\t\tdev_addr:\t%p\n", device);
--
2.0.0.153.g79d
Re: btrfs performance, sudden drop to 0 IOPs
Duncan 1i5t5.dun...@cox.net wrote: P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted: In the test, I use the --direct=1 parameter for fio, which basically does O_DIRECT on the target file. O_DIRECT should guarantee that the filesystem cache is bypassed and IO is sent directly to the underlying storage. Are you saying that btrfs buffers writes despite O_DIRECT? I'm out of my (admin, no claims at developer) league on that. I see someone else replied, and would defer to them on this. I don't think that O_DIRECT can work efficiently on COW filesystems. It probably has a negative effect and cannot be faster than normal access. Linus himself once said that O_DIRECT is broken and should go away, and that cache hinting should be used instead. Think of this: for the _unbuffered_ direct-io request to be fulfilled, the file system has to go through its COW logic first, which it otherwise would have buffered and done in the background. Bypassing the cache is probably only a side-effect of O_DIRECT, not its purpose. At least I'd try with a nocow file for the benchmark if you still have to use O_DIRECT. -- Replies to list only preferred.
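For the nocow suggestion, the usual route is "chattr +C" on an empty file (or on the directory before the file is created). The same flag can be set programmatically through the inode-flags ioctl. The following is only a sketch. The function name is made up, and on filesystems that do not support the flag the ioctl simply fails and the errno value is returned.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Create path (if needed) and set the NOCOW inode flag on it, the same
 * flag "chattr +C" sets. Returns 0 on success or a positive errno value.
 * On btrfs the flag should be set while the file is still empty.
 */
static int create_nocow_file(const char *path)
{
	int fd, flags = 0, err = 0;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return errno;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		err = errno;
	} else {
		flags |= FS_NOCOW_FL;
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
			err = errno;
	}
	close(fd);
	return err;
}
```

The benchmark target would be created this way first and then handed to fio via --filename.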
Re: [PATCH 1/3] Btrfs: only adjust outstanding_extents when we do a short write
On Wed, Feb 11, 2015 at 03:08:57PM -0500, Josef Bacik wrote: We have this weird dance where we always inc outstanding_extents when we do a O_DIRECT write, even if we allocate the entire range. To get around this we also drop the metadata space if we successfully write. This is an unnecessary dance, we only need to jack up outstanding_extents if we don't satisfy the entire range request in get_blocks_direct, otherwise we are good using our original reservation. So drop the unconditional inc and the drop of the metadata space that we have for the unconditional inc. Thanks,

Looks good. Reviewed-by: Liu Bo bo.li@oracle.com Thanks, -liubo

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/inode.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8a036ed..e78a2fd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7137,6 +7137,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
 	u64 len = bh_result->b_size;
+	u64 orig_len = len;
 	int unlock_bits = EXTENT_LOCKED;
 	int ret = 0;

@@ -7272,9 +7273,11 @@ unlock:
 		if (start + len > i_size_read(inode))
 			i_size_write(inode, start + len);

-		spin_lock(&BTRFS_I(inode)->lock);
-		BTRFS_I(inode)->outstanding_extents++;
-		spin_unlock(&BTRFS_I(inode)->lock);
+		if (len < orig_len) {
+			spin_lock(&BTRFS_I(inode)->lock);
+			BTRFS_I(inode)->outstanding_extents++;
+			spin_unlock(&BTRFS_I(inode)->lock);
+		}

 		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 				     lockstart + len - 1, EXTENT_DELALLOC, NULL,
@@ -8056,8 +8059,6 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 		else if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode,
 						     count - (size_t)ret);
-		else
-			btrfs_delalloc_release_metadata(inode, 0);
 	}
 out:
 	if (wakeup)
--
1.8.3.1
Re: btrfs performance, sudden drop to 0 IOPs
On Mon, Feb 09, 2015 at 06:26:49PM +0100, P. Remek wrote: Hello, I am benchmarking Btrfs and when benchmarking random writes with the fio utility, I noticed the following two things: 1) On the first run, when the target file doesn't exist yet, performance is about 8000 IOPs. On the second, and every other run, performance goes up to 7 IOPs. It's a massive difference. The target file is the one created during the first run.

I was doing similar tests in the last few days. Well, the huge performance difference comes from the AIO+DIO path, fs/direct-io.c:1170

	/*
	 * For file extending writes updating i_size before data writeouts
	 * complete can expose uninitialized blocks in dumb filesystems.
	 * In that case we need to wait for I/O completion even if asked
	 * for an asynchronous write.
	 */
	if (is_sync_kiocb(iocb))
		dio->is_async = false;
	else if (!(dio->flags & DIO_ASYNC_EXTEND) &&
		 (rw & WRITE) && end > i_size_read(inode))
		dio->is_async = false;
	else
		dio->is_async = true;

So you may like to play with fio's fallocate option. It is 'posix' by default, which should have set the proper i_size for you, but I wouldn't trust that unless I set it explicitly.

2) There are windows during the test where IOPs drop to 0 and stay at 0 for about 10 seconds, then go back up again, and after a couple of seconds drop to 0 again. This is reproducible 100% of the time. Can somebody shed some light on what's happening?

I'd use a blktrace based tool like iowatcher or seekwatcher to see what's really happening during the performance drops.

Command: fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256 --size=10G --numjobs=1 --readwrite=randwrite

Since this is just a libaio-dio random write, I think it has nothing to do with the progs side.
Thanks, -liubo

Environment:
 CPU: dual socket: E5-2630 v2
 RAM: 32 GB ram
 OS: Ubuntu server 14.10
 Kernel: 3.19.0-031900rc2-generic
 btrfs tools: Btrfs v3.14.1
 2x LSI 9300 HBAs - SAS3 12/Gbs
 8x SSD Ultrastar SSD1600MM 400GB SAS3 12/Gbs

Regards, Premek
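The i_size check quoted from fs/direct-io.c only forces synchronous completion for writes that extend the file. One way to take that out of the picture is to preallocate the target file to its full size before the benchmark, so no write lands beyond i_size. A minimal sketch using posix_fallocate() follows. The helper names are made up for illustration, and whether this closes the gap on btrfs is exactly what the thread suggests testing, not something shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create path and reserve size bytes so that later writes inside that
 * range do not extend i_size. Returns 0 or a positive errno value.
 */
static int preallocate_file(const char *path, off_t size)
{
	int fd, err;

	assert(path != NULL && size > 0);
	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return errno;
	/* posix_fallocate() returns the error number directly, not -1. */
	err = posix_fallocate(fd, 0, size);
	close(fd);
	return err;
}

/* Report the current file size, or -1 on error. */
static off_t file_size(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

For the fio command above, the file would be preallocated to 10G under the name test9 before the first run.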
[PATCH 16/24] Btrfs: sysfs btrfs_kobj_rm_device() pass fs_devices instead of fs_info
since btrfs_kobj_rm_device() does nothing with fs_info

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/sysfs.c       | 12 ++++++------
 fs/btrfs/sysfs.h       |  2 +-
 fs/btrfs/volumes.c     |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2acc0aa..124b60f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -592,7 +592,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	mutex_unlock(&uuid_mutex);

 	/* replace the sysfs entry */
-	btrfs_kobj_rm_device(fs_info, src_device);
+	btrfs_kobj_rm_device(fs_info->fs_devices, src_device);
 	btrfs_kobj_add_device(fs_info->fs_devices, tgt_device);
 	btrfs_rm_dev_replace_free_srcdev(fs_info, src_device);

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 15e4d54..4c86e62 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -555,7 +555,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 	addrm_unknown_feature_attrs(fs_info, false);
 	sysfs_remove_group(&fs_info->fs_devices->super_kobj, &btrfs_feature_attr_group);
 	sysfs_remove_files(&fs_info->fs_devices->super_kobj, btrfs_attrs);
-	btrfs_kobj_rm_device(fs_info, NULL);
+	btrfs_kobj_rm_device(fs_info->fs_devices, NULL);
 	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
 }

@@ -636,20 +636,20 @@ static void init_feature_attrs(void)

 /* when one_device is NULL, it removes all device links */

-int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
+int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device)
 {
 	struct hd_struct *disk;
 	struct kobject *disk_kobj;

-	if (!fs_info->fs_devices->device_dir_kobj)
+	if (!fs_devices->device_dir_kobj)
 		return -EINVAL;

 	if (one_device && one_device->bdev) {
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;

-		sysfs_remove_link(fs_info->fs_devices->device_dir_kobj,
+		sysfs_remove_link(fs_devices->device_dir_kobj,
 						disk_kobj->name);
 	}

@@ -657,13 +657,13 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 		return 0;

 	list_for_each_entry(one_device,
-			&fs_info->fs_devices->devices, dev_list) {
+			&fs_devices->devices, dev_list) {
 		if (!one_device->bdev)
 			continue;
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;

-		sysfs_remove_link(fs_info->fs_devices->device_dir_kobj,
+		sysfs_remove_link(fs_devices->device_dir_kobj,
 						disk_kobj->name);
 	}

diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index eeb86a8..3938ac1 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -72,6 +72,6 @@ extern struct kobj_type space_info_ktype;
 extern struct kobj_type btrfs_raid_ktype;
 int btrfs_kobj_add_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device);
-int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
+int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device);
 #endif /* _BTRFS_SYSFS_H_ */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e567d54..51873ec 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1701,7 +1701,7 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	if (device->bdev) {
 		device->fs_devices->open_devices--;
 		/* remove sysfs entry */
-		btrfs_kobj_rm_device(root->fs_info, device);
+		btrfs_kobj_rm_device(root->fs_info->fs_devices, device);
 	}

 	call_rcu(&device->rcu, free_device);
@@ -2285,7 +2285,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 error_trans:
 	btrfs_end_transaction(trans, root);
 	rcu_string_free(device->name);
-	btrfs_kobj_rm_device(root->fs_info, device);
+	btrfs_kobj_rm_device(root->fs_info->fs_devices, device);
 	kfree(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
--
2.0.0.153.g79d
[PATCH 24/24 V3] Btrfs: sysfs: add check if super kobject is already initialized
This patch will be useful when we have to change the context in which we create and destroy the sysfs fsid and device kobjects. It is also a good change to have on its own, as it just does the right thing in general.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v2->v3: add missing signed-off, update commit
v1->v2: when kobject is already created return EEXIST, not sent to ML

 fs/btrfs/sysfs.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index f8358d2..6ebbe6c 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -750,10 +750,14 @@ int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
 	int error = 0;

 	while (fs_devs) {
-		init_completion(&fs_devs->kobj_unregister);
-		fs_devs->super_kobj.kset = btrfs_kset;
-		error = kobject_init_and_add(&fs_devs->super_kobj,
+		if (!fs_devs->super_kobj.state_initialized) {
+			init_completion(&fs_devs->kobj_unregister);
+			fs_devs->super_kobj.kset = btrfs_kset;
+			error = kobject_init_and_add(&fs_devs->super_kobj,
 				&btrfs_ktype, parent, "%pU", fs_devs->fsid);
+		} else {
+			error = -EEXIST;
+		}
 		if (!follow_seed)
 			return error;
 		parent = &fs_devs->super_kobj;
--
2.0.0.153.g79d
[PATCH 20/24] Btrfs: sysfs: separate kobject and attribute creation
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/disk-io.c | 18 +++++++++++++++++-
 fs/btrfs/sysfs.c   | 15 ++-------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0cd6550..4b7f3b8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2785,10 +2785,22 @@ retry_root_backup:

 	btrfs_close_extra_devices(fs_info, fs_devices, 1);

+	ret = btrfs_sysfs_add_fsid(fs_devices, NULL);
+	if (ret) {
+		pr_err("BTRFS: failed to init sysfs fsid interface: %d\n", ret);
+		goto fail_block_groups;
+	}
+
+	ret = btrfs_sysfs_add_device(fs_devices);
+	if (ret) {
+		pr_err("BTRFS: failed to init sysfs device interface: %d\n", ret);
+		goto fail_fsdev_sysfs;
+	}
+
 	ret = btrfs_sysfs_add_one(fs_info);
 	if (ret) {
 		pr_err("BTRFS: failed to init sysfs interface: %d\n", ret);
-		goto fail_block_groups;
+		goto fail_fsdev_sysfs;
 	}

 	ret = btrfs_init_space_info(fs_info);
@@ -3002,6 +3014,9 @@ fail_cleaner:
 fail_sysfs:
 	btrfs_sysfs_remove_one(fs_info);

+fail_fsdev_sysfs:
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
+
 fail_block_groups:
 	btrfs_put_block_group_cache(fs_info);
 	btrfs_free_block_groups(fs_info);
@@ -3679,6 +3694,7 @@ void close_ctree(struct btrfs_root *root)
 	}

 	btrfs_sysfs_remove_one(fs_info);
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);

 	btrfs_free_fs_roots(fs_info);

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index ff9e5f6..d0caa32 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -556,7 +556,6 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 	sysfs_remove_group(&fs_info->fs_devices->super_kobj, &btrfs_feature_attr_group);
 	sysfs_remove_files(&fs_info->fs_devices->super_kobj, btrfs_attrs);
 	btrfs_kobj_rm_device(fs_info->fs_devices, NULL);
-	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
 }

 const char * const btrfs_feature_set_names[3] = {
@@ -688,10 +687,6 @@ int btrfs_kobj_add_device(struct btrfs_fs_devices *fs_devices,
 	int error = 0;
 	struct btrfs_device *dev;

-	error = btrfs_sysfs_add_device(fs_devices);
-	if (error)
-		return error;
-
 	list_for_each_entry(dev, &fs_devices->devices, dev_list) {
 		struct hd_struct *disk;
 		struct kobject *disk_kobj;
@@ -747,19 +742,13 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)

 	fs_devs->fs_info = fs_info;

-	error = btrfs_sysfs_add_fsid(fs_devs, NULL);
-	if (error)
-		return error;
-
 	error = btrfs_kobj_add_device(fs_devs, NULL);
-	if (error) {
-		btrfs_sysfs_remove_fsid(fs_devs);
+	if (error)
 		return error;
-	}

 	error = sysfs_create_files(super_kobj, btrfs_attrs);
 	if (error) {
-		btrfs_sysfs_remove_fsid(fs_devs);
+		btrfs_kobj_rm_device(fs_devs, NULL);
 		return error;
 	}

--
2.0.0.153.g79d
[PATCH 17/24] Btrfs: sysfs: make btrfs_sysfs_add_fsid() non static
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 2 +-
 fs/btrfs/sysfs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4c86e62..2dbb064 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -727,7 +727,7 @@ u64 btrfs_debugfs_test;
  * Can be called by the device discovery thread.
  * And parent can be specified for seed device
  */
-static int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
+int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
 				struct kobject *parent)
 {
 	int error;
diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index 3938ac1..aaff124 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -74,4 +74,6 @@ int btrfs_kobj_add_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device);
 int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device);
+int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
+				struct kobject *parent);
 #endif /* _BTRFS_SYSFS_H_ */
--
2.0.0.153.g79d
[PATCH 23/24 V3] Btrfs: sysfs: support seed devices in the sysfs layout
This adds an enhancement to show the seed fsid and devices.

The way sprouting handles fs_devices:
  clone seed fs_devices and add to the fs_uuids
  mem copy seed fs_devices and assign to fs_devices->seed (move dev_list)
  evacuate seed fs_devices contents to hold sprout fs devices contents

So, to be in line with these fs_devices changes during seeding,
represent the seed fsid under the sprout fsid. This is achieved by
using kobject_move().

The end result will be:
  /sys/fs/btrfs/sprout-fsid/level-1-seed-fsid/(if)level-2-seed-fsid

e.g., showing two levels of seeding:

find /sys/fs/btrfs/ -type d -name devices -exec ls {} \; -print
sde
/sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/devices
sdd
/sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/f7ef2904-ce89-4421-bfb0-49fd999e9a0b/53ac3265-0c34-4afd-9453-cc0d1a07be64/devices
sdf
/sys/fs/btrfs/8c2772d4-6951-43c3-89b6-3ab3c70a13f8/devices

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v2-v3: commit updates, Thanks Dave.
v1-v2: does not exist fs/btrfs/dev-replace.c | 4 +-- fs/btrfs/disk-io.c | 4 +-- fs/btrfs/sysfs.c | 66 +++--- fs/btrfs/sysfs.h | 8 +++--- fs/btrfs/volumes.c | 40 +- 5 files changed, 89 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 124b60f..e72b986 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -592,8 +592,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, mutex_unlock(uuid_mutex); /* replace the sysfs entry */ - btrfs_kobj_rm_device(fs_info-fs_devices, src_device); - btrfs_kobj_add_device(fs_info-fs_devices, tgt_device); + btrfs_kobj_rm_device(fs_info-fs_devices, src_device, 0); + btrfs_kobj_add_device(fs_info-fs_devices, tgt_device, 0); btrfs_rm_dev_replace_free_srcdev(fs_info, src_device); /* write back the superblocks */ diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4b7f3b8..77372af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2785,13 +2785,13 @@ retry_root_backup: btrfs_close_extra_devices(fs_info, fs_devices, 1); - ret = btrfs_sysfs_add_fsid(fs_devices, NULL); + ret = btrfs_sysfs_add_fsid(fs_devices, NULL, 1); if (ret) { pr_err(BTRFS: failed to init sysfs fsid interface: %d\n, ret); goto fail_block_groups; } - ret = btrfs_sysfs_add_device(fs_devices); + ret = btrfs_sysfs_add_device(fs_devices, 1); if (ret) { pr_err(BTRFS: failed to init sysfs device interface: %d\n, ret); goto fail_fsdev_sysfs; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index d89bf4d..f8358d2 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -517,15 +517,20 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add) static void __btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) { + if (fs_devs-seed) + __btrfs_sysfs_remove_fsid(fs_devs-seed); + if (fs_devs-device_dir_kobj) { kobject_del(fs_devs-device_dir_kobj); kobject_put(fs_devs-device_dir_kobj); fs_devs-device_dir_kobj = NULL; } - kobject_del(fs_devs-super_kobj); - 
kobject_put(fs_devs-super_kobj); - wait_for_completion(fs_devs-kobj_unregister); + if (fs_devs-super_kobj.state_initialized) { + kobject_del(fs_devs-super_kobj); + kobject_put(fs_devs-super_kobj); + wait_for_completion(fs_devs-kobj_unregister); + } } /* when fs_devs is NULL it will remove all fsid kobject */ @@ -555,7 +560,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info) addrm_unknown_feature_attrs(fs_info, false); sysfs_remove_group(fs_info-fs_devices-super_kobj, btrfs_feature_attr_group); sysfs_remove_files(fs_info-fs_devices-super_kobj, btrfs_attrs); - btrfs_kobj_rm_device(fs_info-fs_devices, NULL); + btrfs_kobj_rm_device(fs_info-fs_devices, NULL, 1); } const char * const btrfs_feature_set_names[3] = { @@ -636,7 +641,7 @@ static void init_feature_attrs(void) /* when one_device is NULL, it removes all device links */ int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices, - struct btrfs_device *one_device) + struct btrfs_device *one_device, int follow_seed) { struct hd_struct *disk; struct kobject *disk_kobj; @@ -666,27 +671,39 @@ int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices, disk_kobj-name); } + if (follow_seed fs_devices-seed) + btrfs_kobj_rm_device(fs_devices-seed, NULL, follow_seed); + return 0; } -int btrfs_sysfs_add_device(struct btrfs_fs_devices *fs_devs)
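The nested layout described in the commit message of patch 23 (seed fsid directories placed under the sprout fsid) can be illustrated with a small userspace sketch. This is not kernel code: `struct fs_devs_model` and `seed_sysfs_path()` are hypothetical names invented for the illustration, and the only thing taken from the patch is the idea of following a `seed` pointer from the sprout downwards.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct btrfs_fs_devices. */
struct fs_devs_model {
	const char *fsid;            /* textual fsid of this level */
	struct fs_devs_model *seed;  /* next seed level, or NULL */
};

/*
 * Build the path of the "devices" directory that sits `depth` levels
 * below the sprout (depth 0 is the sprout itself).
 * Returns 0 on success, -1 if the seed chain is shorter than `depth`
 * or if `buf` is too small.
 */
int seed_sysfs_path(const struct fs_devs_model *sprout, int depth,
		    char *buf, size_t len)
{
	const struct fs_devs_model *cur = sprout;
	int n, m, i;

	n = snprintf(buf, len, "/sys/fs/btrfs");
	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Append one path component per level, sprout first. */
	for (i = 0; cur && i <= depth; i++, cur = cur->seed) {
		m = snprintf(buf + n, len - n, "/%s", cur->fsid);
		if (m < 0 || (size_t)m >= len - n)
			return -1;
		n += m;
	}
	if (i <= depth)  /* the chain ended before reaching `depth` */
		return -1;

	m = snprintf(buf + n, len - n, "/devices");
	if (m < 0 || (size_t)m >= len - n)
		return -1;
	return 0;
}
```

With the three fsids from the find output in the commit message chained as sprout, level-1 seed and level-2 seed, depth 0 gives the directory holding sdf and depth 2 gives the directory holding sdd.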
[PATCH 08/24] Btrfs: sysfs: introduce function btrfs_sysfs_add_fsid() to create sysfs fsid
From: Anand Jain anand.j...@oracle.com

We need it in a separate function so that it can be called from the
device discovery thread as well.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c923e8b..f42d8fd 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -690,7 +690,12 @@ static struct dentry *btrfs_debugfs_root_dentry;
 /* Debugging tunables and exported data */
 u64 btrfs_debugfs_test;

-int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
+/*
+ * Can be called by the device discovery thread.
+ * And parent can be specified for seed device
+ */
+int btrfs_sysfs_add_fsid(struct btrfs_fs_info *fs_info,
+				struct kobject *parent)
 {
 	int error;

@@ -698,6 +703,14 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
 	fs_info->super_kobj.kset = btrfs_kset;
 	error = kobject_init_and_add(&fs_info->super_kobj, &btrfs_ktype, NULL,
 				     "%pU", fs_info->fsid);
+	return error;
+}
+
+int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
+{
+	int error;
+
+	error = btrfs_sysfs_add_fsid(fs_info, NULL);
 	if (error)
 		return error;

--
2.0.0.153.g79d
[PATCH 10/24] Btrfs: sysfs: separate device kobject and its attribute creation
From: Anand Jain anand.j...@oracle.com

Separate device kobject and its attribute creation so that device
kobject can be created from the device discovery thread.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 5208a49..2cb4c69 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -645,13 +645,8 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 	return 0;
 }

-int btrfs_kobj_add_device(struct btrfs_fs_info *fs_info,
-		struct btrfs_device *one_device)
+int btrfs_sysfs_add_device(struct btrfs_fs_info *fs_info)
 {
-	int error = 0;
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-	struct btrfs_device *dev;
-
 	if (!fs_info->device_dir_kobj)
 		fs_info->device_dir_kobj = kobject_create_and_add("devices",
 						&fs_info->super_kobj);
@@ -659,6 +654,20 @@ int btrfs_kobj_add_device(struct btrfs_fs_info *fs_info,
 	if (!fs_info->device_dir_kobj)
 		return -ENOMEM;

+	return 0;
+}
+
+int btrfs_kobj_add_device(struct btrfs_fs_info *fs_info,
+		struct btrfs_device *one_device)
+{
+	int error = 0;
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *dev;
+
+	error = btrfs_sysfs_add_device(fs_info);
+	if (error)
+		return error;
+
 	list_for_each_entry(dev, &fs_devices->devices, dev_list) {
 		struct hd_struct *disk;
 		struct kobject *disk_kobj;
--
2.0.0.153.g79d
Re: [PATCH 00/24 V2] provide frame work so that sysfs attributs from the fs_devices can be added
Thanks for commenting. V3 is out. More below.

On 02/12/2015 03:01 AM, David Sterba wrote:
> On Mon, Feb 09, 2015 at 07:56:01AM +0800, Anand Jain wrote:
>> This patch set will provide a framework and help to create attributes
>> from the structure btrfs_fs_devices, which are available even before
>> fs_info is created. So by moving the parent kobject super_kobj from
>> fs_info to btrfs_fs_devices, it will help to create attributes from
>> the btrfs_fs_devices as well.
>>
>> Just to note, this does not change any of the existing btrfs sysfs
>> external kobject names and their attributes, and not even their life
>> cycle. Changes are internal only. To ensure this, the patch set has
>> been tested with various device operations, checking and comparing
>> the sysfs kobjects and attributes against the sysfs kobjects and
>> attributes without this patch set, and they remain the same. These
>> test cases are added to the progs as test-btrfs-devmgt.sh; its patch
>> is below as well.
>
> I went through the patchset, looks ok to me in general. The only
> concern is about the new seeding representation, but the other changes
> seem ok (but I did not do in-depth review). I like the patch
> separation, that really helps to understand the changes although there
> are 20+ patches in total.
>
> We can merge patches 1-22, patch 23 should be folded into 24 as it
> fixes a bug introduced there.

Actually there isn't a bug; 24 provides the framework for the upcoming
RFC patch, which is under test. Since I wanted that upcoming RFC patch
to be sleek, I pushed 24 with the framework patch set.

Thanks, Anand
[PATCH 19/24] Btrfs: sysfs: btrfs_sysfs_remove_fsid() make it non static
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 2 +-
 fs/btrfs/sysfs.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 2dbb064..ff9e5f6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -529,7 +529,7 @@ static void __btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs)
 }

 /* when fs_devs is NULL it will remove all fsid kobject */
-static void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs)
+void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs)
 {
 	struct list_head *fs_uuids = btrfs_get_fs_uuids();

diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index ac06b5c..2b31f6f 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -77,4 +77,5 @@ int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices,
 int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
 				struct kobject *parent);
 int btrfs_sysfs_add_device(struct btrfs_fs_devices *fs_devs);
+void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs);
 #endif /* _BTRFS_SYSFS_H_ */
--
2.0.0.153.g79d
[PATCH 21/24 V3] Btrfs: sysfs: add support to add parent for fsid
To support seed sysfs layout and represent seed fsid under the sprout
we need the facility to create fsid under the specified parent.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v2-v3: added missing signed-off
v1-v2: does not exist

 fs/btrfs/sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d0caa32..d89bf4d 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -729,8 +729,8 @@ int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
 	init_completion(&fs_devs->kobj_unregister);
 	fs_devs->super_kobj.kset = btrfs_kset;

-	error = kobject_init_and_add(&fs_devs->super_kobj, &btrfs_ktype, NULL,
-				     "%pU", fs_devs->fsid);
+	error = kobject_init_and_add(&fs_devs->super_kobj,
+				&btrfs_ktype, parent, "%pU", fs_devs->fsid);

 	return error;
 }
--
2.0.0.153.g79d
[PATCH 13/24] Btrfs: introduce btrfs_get_fs_uuids
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 4 ++++
 fs/btrfs/volumes.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 218a14a..c1b1038 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -52,6 +52,10 @@ static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);

 DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
+struct list_head *btrfs_get_fs_uuids(void)
+{
+	return &fs_uuids;
+}

 static struct btrfs_fs_devices *__alloc_fs_devices(void)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 53fd278..4e99f06 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -540,5 +540,6 @@ static inline void unlock_chunks(struct btrfs_root *root)
 	mutex_unlock(&root->fs_info->chunk_mutex);
 }

+struct list_head *btrfs_get_fs_uuids(void);
 #endif
--
2.0.0.153.g79d
[PATCH 18/24] Btrfs: sysfs: make btrfs_sysfs_add_device() non static
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index aaff124..ac06b5c 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -76,4 +76,5 @@ int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices,
 		struct btrfs_device *one_device);
 int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs,
 				struct kobject *parent);
+int btrfs_sysfs_add_device(struct btrfs_fs_devices *fs_devs);
 #endif /* _BTRFS_SYSFS_H_ */
--
2.0.0.153.g79d
[PATCH 22/24] Btrfs: sysfs: don't fail seeding for the sake of sysfs kobject issue
Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 51873ec..1490723 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2249,7 +2249,8 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 						root->fs_info->fsid);
 		if (kobject_rename(&root->fs_info->fs_devices->super_kobj,
 								fsid_buf))
-			goto error_trans;
+			printk(KERN_WARNING\
+				"BTRFS: sysfs: failed to create fsid for sprout\n");
 	}

 	root->fs_info->num_tolerated_disk_barrier_failures =
--
2.0.0.153.g79d
[PATCH 06/24] Btrfs: sysfs: reorder the kobject creations
From: Anand Jain anand.j...@oracle.com

As of now, the order in which the kobjects are created in
btrfs_sysfs_add_one() is:
  fsid
  features
  unknown features (dynamic features)
  devices

Since we would move the fsid and device kobjects from the fs_info
structure to fs_devices, this patch reorders the kobject creation
as below:
  fsid
  devices
  features
  unknown features (dynamic features)

Hence btrfs_sysfs_remove_one() follows the same order in reverse,
and the device kobject destruction can now be moved into the function
__btrfs_sysfs_remove_one().

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 506f7e4..c3e7f06 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -510,6 +510,13 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add)

 static void __btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 {
+	if (fs_info->device_dir_kobj) {
+		btrfs_kobj_rm_device(fs_info, NULL);
+		kobject_del(fs_info->device_dir_kobj);
+		kobject_put(fs_info->device_dir_kobj);
+		fs_info->device_dir_kobj = NULL;
+	}
+
 	kobject_del(&fs_info->super_kobj);
 	kobject_put(&fs_info->super_kobj);
 	wait_for_completion(&fs_info->kobj_unregister);
@@ -522,12 +529,6 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 		kobject_del(fs_info->space_info_kobj);
 		kobject_put(fs_info->space_info_kobj);
 	}
-	if (fs_info->device_dir_kobj) {
-		btrfs_kobj_rm_device(fs_info, NULL);
-		kobject_del(fs_info->device_dir_kobj);
-		kobject_put(fs_info->device_dir_kobj);
-		fs_info->device_dir_kobj = NULL;
-	}
 	addrm_unknown_feature_attrs(fs_info, false);
 	sysfs_remove_group(&fs_info->super_kobj, &btrfs_feature_attr_group);
 	__btrfs_sysfs_remove_one(fs_info);
@@ -700,6 +701,12 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
 	if (error)
 		return error;

+	error = btrfs_kobj_add_device(fs_info, NULL);
+	if (error) {
+		__btrfs_sysfs_remove_one(fs_info);
+		return error;
+	}
+
 	error = sysfs_create_group(&fs_info->super_kobj,
 				   &btrfs_feature_attr_group);
 	if (error) {
@@ -711,10 +718,6 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
 	if (error)
 		goto failure;

-	error = btrfs_kobj_add_device(fs_info, NULL);
-	if (error)
-		goto failure;
-
 	fs_info->space_info_kobj = kobject_create_and_add("allocation",
 						&fs_info->super_kobj);
 	if (!fs_info->space_info_kobj) {
--
2.0.0.153.g79d
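The ordering rule in patch 06 (create fsid, devices, features, unknown features; tear down in the reverse order) is the usual unwind-on-error pattern. A minimal userspace sketch of that pattern follows. It is only a model: `add_all()`, `log_step()` and the `fail_at` parameter are hypothetical names made up for the illustration, and the steps are just strings rather than real kobjects.

```c
#include <stdio.h>
#include <string.h>

/* Creation order taken from the commit message of patch 06. */
static const char *const steps[] = {
	"fsid", "devices", "features", "unknown-features",
};
#define NSTEPS ((int)(sizeof(steps) / sizeof(steps[0])))

/* Append "<sign><name> " to the NUL-terminated log buffer. */
static void log_step(char *log, size_t len, char sign, const char *name)
{
	size_t used = strlen(log);

	snprintf(log + used, len - used, "%c%s ", sign, name);
}

/*
 * Run every creation step in order. `fail_at` is the index of a step
 * that should fail (or -1 for none). On failure, the steps that did
 * succeed are undone in reverse order and -1 is returned.
 */
int add_all(int fail_at, char *log, size_t len)
{
	int i;

	log[0] = '\0';
	for (i = 0; i < NSTEPS; i++) {
		if (i == fail_at)
			goto unwind;
		log_step(log, len, '+', steps[i]);
	}
	return 0;

unwind:
	/* Undo steps i-1 .. 0, newest first. */
	while (--i >= 0)
		log_step(log, len, '-', steps[i]);
	return -1;
}
```

If the "features" step is made to fail, the log records fsid and devices being added and then removed in the opposite order, which mirrors why the device kobject teardown belongs next to the fsid teardown once devices are created right after the fsid.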
[PATCH 07/24] Btrfs: sysfs: rename __btrfs_sysfs_remove_one to btrfs_sysfs_remove_fsid
From: Anand Jain anand.j...@oracle.com

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/sysfs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c3e7f06..c923e8b 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -508,7 +508,7 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add)
 	return 0;
 }

-static void __btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
+static void btrfs_sysfs_remove_fsid(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->device_dir_kobj) {
 		btrfs_kobj_rm_device(fs_info, NULL);
@@ -531,7 +531,7 @@ void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info)
 	}
 	addrm_unknown_feature_attrs(fs_info, false);
 	sysfs_remove_group(&fs_info->super_kobj, &btrfs_feature_attr_group);
-	__btrfs_sysfs_remove_one(fs_info);
+	btrfs_sysfs_remove_fsid(fs_info);
 }

 const char * const btrfs_feature_set_names[3] = {
@@ -703,14 +703,14 @@ int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)

 	error = btrfs_kobj_add_device(fs_info, NULL);
 	if (error) {
-		__btrfs_sysfs_remove_one(fs_info);
+		btrfs_sysfs_remove_fsid(fs_info);
 		return error;
 	}

 	error = sysfs_create_group(&fs_info->super_kobj,
 				   &btrfs_feature_attr_group);
 	if (error) {
-		__btrfs_sysfs_remove_one(fs_info);
+		btrfs_sysfs_remove_fsid(fs_info);
 		return error;
 	}

--
2.0.0.153.g79d
[PATCH 2/3] btrfs: cleanup: use wait_event() for btrfs_rm_dev_replace_blocked()
From: Zhao Lei zhao...@cn.fujitsu.com

wait_event() is a good fit for this hand-written wait loop.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/dev-replace.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 92109b7..1a2d440 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -440,18 +440,9 @@ leave:
  */
 static void btrfs_rm_dev_replace_blocked(struct btrfs_fs_info *fs_info)
 {
-	s64 writers;
-	DEFINE_WAIT(wait);
-
 	set_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state);
-	do {
-		prepare_to_wait(&fs_info->replace_wait, &wait,
-				TASK_UNINTERRUPTIBLE);
-		writers = percpu_counter_sum(&fs_info->bio_counter);
-		if (writers)
-			schedule();
-		finish_wait(&fs_info->replace_wait, &wait);
-	} while (writers);
+	wait_event(fs_info->replace_wait,
+		   !percpu_counter_sum(&fs_info->bio_counter));
 }

 /*
--
1.8.5.1
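To see why the open-coded loop in patch 2/3 and the wait_event() form behave the same, here is a single-threaded userspace model. It does not use any kernel API: `model_wait_event()`, `model_schedule()`, `pending` and `yields` are hypothetical stand-ins, where `pending` plays the role of the summed bio counter and each call to `model_schedule()` pretends that one in-flight writer finished while the caller slept.

```c
/* Stand-in for percpu_counter_sum(&fs_info->bio_counter). */
static long pending;
/* Number of times the caller had to sleep. */
static int yields;

/* Pretend to sleep; one outstanding writer completes meanwhile. */
static void model_schedule(void)
{
	yields++;
	if (pending > 0)
		pending--;
}

/* Model of wait_event(): sleep repeatedly until cond becomes true. */
#define model_wait_event(cond)		\
	do {				\
		while (!(cond))		\
			model_schedule();	\
	} while (0)

/* Shape of the hand-written loop that the patch removes. */
int drain_handmade(long start)
{
	long writers;

	pending = start;
	yields = 0;
	do {
		writers = pending;
		if (writers)
			model_schedule();
	} while (writers);
	return yields;
}

/* Shape of the replacement: a single wait on the condition. */
int drain_wait_event(long start)
{
	pending = start;
	yields = 0;
	model_wait_event(pending == 0);
	return yields;
}
```

Both functions sleep once per outstanding writer and not at all when the counter is already zero. The real wait_event() additionally handles wait-queue registration and wakeups, which this model deliberately leaves out.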
[PATCH 1/3] btrfs: cleanup: remove unused DEFINE_WAIT() in btrfs_bio_counter_inc_blocked()
From: Zhao Lei zhao...@cn.fujitsu.com

1: Remove unused DEFINE_WAIT(wait)
2: Add likely() for BTRFS_FS_STATE_DEV_REPLACING condition
3: Use a loop instead of goto

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/dev-replace.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ca6a3a3..92109b7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -932,15 +932,15 @@ void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)

 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 {
-	DEFINE_WAIT(wait);
-again:
-	percpu_counter_inc(&fs_info->bio_counter);
-	if (test_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state)) {
+	while (1) {
+		percpu_counter_inc(&fs_info->bio_counter);
+		if (likely(!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
+				     &fs_info->fs_state)))
+			break;
+
 		btrfs_bio_counter_dec(fs_info);
 		wait_event(fs_info->replace_wait,
 			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
 				     &fs_info->fs_state));
-		goto again;
 	}
-
 }
--
1.8.5.1
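The loop introduced by patch 1/3 follows an "increment first, back out if a replace is in progress, wait, retry" shape. The single-threaded sketch below models only that control flow. `struct replace_model`, `model_wait()` and `inc_blocked()` are hypothetical names, and the assumption that the flag is clear as soon as the wait returns is a simplification of what wait_event() guarantees at the moment it returns.

```c
/* Hypothetical userspace model of the state touched by the loop. */
struct replace_model {
	long bio_counter;  /* stands in for fs_info->bio_counter */
	int replacing;     /* stands in for BTRFS_FS_STATE_DEV_REPLACING */
	int waits;         /* how often the caller had to wait */
};

/* Model of wait_event(): returns once the replacing flag is clear. */
static void model_wait(struct replace_model *m)
{
	m->waits++;
	m->replacing = 0;
}

/* Same control flow as the while (1) loop added by the patch. */
void inc_blocked(struct replace_model *m)
{
	while (1) {
		m->bio_counter++;          /* optimistic increment */
		if (!m->replacing)
			break;             /* common case: keep the count */

		m->bio_counter--;          /* back out and wait */
		model_wait(m);
	}
}
```

Whichever path is taken, the counter ends up exactly one higher than before the call, and the wait only happens when the flag was set at the time of the check.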
[PATCH 3/3] btrfs: cleanup: use for() loop in btrfs_map_bio()
From: Zhao Lei zhao...@cn.fujitsu.com

A for() loop is clearly a better fit for this code block. Also remove
the unused initial values, which reduces the binary size by about
6 bytes.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/volumes.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e86f4ca..46d07b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5812,8 +5812,8 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 	u64 map_length;
 	u64 *raid_map = NULL;
 	int ret;
-	int dev_nr = 0;
-	int total_devs = 1;
+	int dev_nr;
+	int total_devs;
 	struct btrfs_bio *bbio = NULL;

 	length = bio->bi_iter.bi_size;
@@ -5856,11 +5856,10 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 		BUG();
 	}

-	while (dev_nr < total_devs) {
+	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
 		dev = bbio->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || (rw & WRITE && !dev->writeable)) {
 			bbio_error(bbio, first_bio, logical);
-			dev_nr++;
 			continue;
 		}

@@ -5873,7 +5872,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 			ret = breakup_stripe_bio(root, bbio, first_bio, dev,
 						 dev_nr, rw, async_submit);
 			BUG_ON(ret);
-			dev_nr++;
 			continue;
 		}

@@ -5888,7 +5886,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 		submit_stripe_bio(root, bbio, bio,
 				  bbio->stripes[dev_nr].physical, dev_nr, rw,
 				  async_submit);
-		dev_nr++;
 	}
 	btrfs_bio_counter_dec(root->fs_info);
 	return 0;
--
1.8.5.1
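The conversion in patch 3/3 relies on the fact that `continue` in a for() loop still runs the increment expression, so the three manual `dev_nr++` statements become redundant. A small userspace sketch of that equivalence follows. The names `stripe_kind`, `walk_while()` and `walk_for()` are hypothetical, and the three kinds only mimic the three paths in the loop body (error, split, plain submit) without doing any I/O.

```c
#include <stddef.h>

/* Which of the three loop-body paths a stripe takes (illustrative). */
enum stripe_kind { STRIPE_MISSING, STRIPE_SPLIT, STRIPE_PLAIN };

/* Old shape: while loop with an increment before every continue. */
size_t walk_while(const enum stripe_kind *kind, size_t total, int *log)
{
	size_t dev_nr = 0, n = 0;

	while (dev_nr < total) {
		/* Record (stripe index, path taken) as one integer. */
		log[n++] = (int)dev_nr * 3 + (int)kind[dev_nr];
		if (kind[dev_nr] == STRIPE_MISSING) {
			dev_nr++;
			continue;
		}
		if (kind[dev_nr] == STRIPE_SPLIT) {
			dev_nr++;
			continue;
		}
		dev_nr++;
	}
	return n;
}

/* New shape: the for() increment also runs after each continue. */
size_t walk_for(const enum stripe_kind *kind, size_t total, int *log)
{
	size_t dev_nr, n = 0;

	for (dev_nr = 0; dev_nr < total; dev_nr++) {
		log[n++] = (int)dev_nr * 3 + (int)kind[dev_nr];
		if (kind[dev_nr] == STRIPE_MISSING)
			continue;
		if (kind[dev_nr] == STRIPE_SPLIT)
			continue;
	}
	return n;
}
```

Both walkers visit every stripe exactly once and in the same order, whichever path each stripe takes. Forgetting one of the manual increments in the while form would spin forever on that stripe, which is the kind of mistake the for() form rules out.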